
Groq

AI & Machine Learning · Fast AI Inference

Ultra-fast AI inference powered by Groq's custom LPU chips. Free tier offers blazing-fast access to LLaMA, Mixtral, and Gemma models.

No Credit Card · Forever Free · Ultra Fast · Best for Students
Duration

Forever (with rate limits)

Credit Card

Not Required

Rating

4.7/5 (376)

Geo Restrictions

None

Rate Limits

  • Requests per Minute: 30 RPM
  • Requests per Day: 14,400 RPD
  • Tokens per Minute: 6,000 TPM

Rate limits vary by model; the figures above apply to LLaMA 3.1 70B (30 RPM, 14,400 RPD).
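At 30 RPM and 6,000 TPM, pacing is usually the first thing a client has to handle. A minimal client-side throttle sketch, using the RPM/TPM figures above (the class and its API are illustrative, not part of any Groq SDK):

```python
import time
from collections import deque

class MinuteBudget:
    """Tracks requests and tokens used in a sliding 60-second window."""
    def __init__(self, rpm=30, tpm=6000):
        self.rpm, self.tpm = rpm, tpm
        self.events = deque()  # (timestamp, tokens) per recorded request

    def _prune(self, now):
        # Drop events that have aged out of the 60-second window.
        while self.events and now - self.events[0][0] >= 60:
            self.events.popleft()

    def wait_time(self, tokens, now=None):
        """Seconds to wait before a request of `tokens` fits the budget."""
        if tokens > self.tpm:
            raise ValueError("request alone exceeds the per-minute token cap")
        now = time.monotonic() if now is None else now
        self._prune(now)
        used = sum(t for _, t in self.events)
        if len(self.events) < self.rpm and used + tokens <= self.tpm:
            return 0.0
        # Wait until the oldest event ages out of the window.
        return max(0.0, 60 - (now - self.events[0][0]))

    def record(self, tokens, now=None):
        now = time.monotonic() if now is None else now
        self.events.append((now, tokens))

budget = MinuteBudget()
budget.record(tokens=500, now=0.0)
print(budget.wait_time(tokens=500, now=1.0))  # 0.0 (well within budget)
```

Call `wait_time` before each request, `time.sleep` for that long, then `record` the tokens the response reports; the explicit `now` parameter exists only to make the logic easy to test.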

Free Tier Details

✅ Included

  • LLaMA 3.1 70B & 8B
  • Mixtral 8x7B
  • Gemma 2 9B
  • Whisper Large v3 (speech-to-text)
  • LLaVA (vision)
  • Tool use / function calling

❌ Not Included

  • Dedicated capacity
  • SLA guarantee
  • Priority queue

How to Get Your Free API Key

1. Sign up at console.groq.com with email or Google.
   https://console.groq.com/signup

2. Go to API Keys and create a new key.
   https://console.groq.com/keys

Groq's API is OpenAI-compatible: existing OpenAI client code typically works after pointing the base URL at https://api.groq.com/openai/v1 and swapping in your Groq key.
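To see what "OpenAI-compatible" means concretely, here is a stdlib-only sketch that builds (but does not send) the same request the curl example below makes; the request body and headers follow the OpenAI chat-completions shape, and only the base URL is Groq-specific. `GROQ_API_KEY` is a placeholder, and `build_chat_request` is an illustrative helper, not part of any SDK:

```python
import json
import urllib.request

BASE_URL = "https://api.groq.com/openai/v1"  # only this differs from OpenAI

def build_chat_request(api_key, model, messages):
    """Builds (but does not send) an OpenAI-style chat completion request."""
    payload = json.dumps({"model": model, "messages": messages}).encode()
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=payload,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_chat_request(
    "GROQ_API_KEY",
    "llama-3.1-70b-versatile",
    [{"role": "user", "content": "Hello!"}],
)
print(req.full_url)  # https://api.groq.com/openai/v1/chat/completions
```

Sending it is one call: `urllib.request.urlopen(req)`; in practice most people just point the official OpenAI SDK at `BASE_URL` instead.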

How to Test Your Key

Send a chat completion request. Notice the blazing-fast response time!

curl https://api.groq.com/openai/v1/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model":"llama-3.1-70b-versatile","messages":[{"role":"user","content":"Hello!"}]}'

Expected: a JSON response in roughly 200 ms, noticeably faster than typical cloud inference.
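Because the API is OpenAI-compatible, the response follows the standard chat-completions shape. A sketch of pulling out the reply text and token usage; the sample below is abridged and its values are illustrative, not a real Groq response:

```python
import json

# Abridged OpenAI-style chat completion response (illustrative values).
sample = json.loads("""
{
  "model": "llama-3.1-70b-versatile",
  "choices": [
    {"index": 0,
     "message": {"role": "assistant", "content": "Hello! How can I help?"},
     "finish_reason": "stop"}
  ],
  "usage": {"prompt_tokens": 9, "completion_tokens": 7, "total_tokens": 16}
}
""")

reply = sample["choices"][0]["message"]["content"]
tokens_used = sample["usage"]["total_tokens"]
print(reply)        # Hello! How can I help?
print(tokens_used)  # 16
```

The `usage.total_tokens` field is worth reading on every call, since it is what counts against the 6,000 TPM budget.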

Hidden Limitations

  • Token-per-minute limits are relatively low (6,000 TPM on some models)
  • Model selection is more limited than OpenAI/Anthropic
  • No fine-tuning support
  • Occasional capacity issues during peak demand
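The low TPM ceiling means a single long prompt can consume much of a minute's budget. A crude pre-flight check, assuming the common ~4-characters-per-token heuristic for English text (an approximation, not Groq's actual tokenizer):

```python
def rough_token_count(text, chars_per_token=4):
    """Crude token estimate: roughly 4 English characters per token."""
    return max(1, len(text) // chars_per_token)

def fits_tpm(prompt, max_completion_tokens, tpm=6000):
    """True if prompt plus worst-case completion fits one minute's budget."""
    return rough_token_count(prompt) + max_completion_tokens <= tpm

print(fits_tpm("Hello!" * 100, max_completion_tokens=512))  # True
```

If the check fails, split the work across minutes or trim the prompt; the `usage` field in each response gives the exact counts after the fact.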

Last verified: 2026-02-15 · Last updated: 2026-02-15