Groq
AI & Machine Learning · Fast AI Inference
Ultra-fast AI inference powered by Groq's custom LPU (Language Processing Unit) chips. The free tier offers blazing-fast access to LLaMA, Mixtral, and Gemma models.
No Credit Card · Forever Free · Ultra Fast · Best for Students
Rate Limits
Requests per Minute: 30 RPM
Requests per Day: 14,400 RPD
Tokens per Minute: 6,000 TPM
Note: rate limits vary by model; the figures above apply to LLaMA 3.1 70B (30 RPM, 14,400 RPD).
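The per-minute cap can be respected client-side with a simple throttle. A minimal sketch, assuming you space requests evenly rather than bursting (the 30 RPM figure comes from the limits above; the class name is illustrative):

```python
import time

class RateLimiter:
    """Naive client-side throttle: spaces calls so at most `rpm`
    requests are started per rolling 60-second window."""

    def __init__(self, rpm: int = 30):
        self.min_interval = 60.0 / rpm  # seconds between request starts
        self.last_start = None

    def wait(self) -> float:
        """Sleep if the previous request was too recent; return seconds waited."""
        now = time.monotonic()
        waited = 0.0
        if self.last_start is not None:
            elapsed = now - self.last_start
            if elapsed < self.min_interval:
                waited = self.min_interval - elapsed
                time.sleep(waited)
        self.last_start = time.monotonic()
        return waited

limiter = RateLimiter(rpm=30)  # 30 RPM per the table above
```

Call `limiter.wait()` before each request; it returns immediately the first time and delays subsequent calls as needed.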
Free Tier Details
✅ Included
- LLaMA 3.1 70B & 8B
- Mixtral 8x7B
- Gemma 2 9B
- Whisper Large v3 (speech-to-text)
- LLaVA (vision)
- Tool use / function calling
❌ Not Included
- Dedicated capacity
- SLA guarantee
- Priority queue
How to Get Your Free API Key
1. Sign up at console.groq.com with email or Google. (https://console.groq.com/signup)
2. Go to API Keys and create a new key. (https://console.groq.com/keys)

Groq uses an OpenAI-compatible API format for easy integration.
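Because the format is OpenAI-compatible, a request is just an OpenAI-style POST to Groq's endpoint. A minimal stdlib-only sketch that builds (but does not send) such a request — the helper name is an illustration, while the URL, headers, and payload shape match the curl example below:

```python
import json
import os
import urllib.request

GROQ_BASE = "https://api.groq.com/openai/v1"  # OpenAI-compatible endpoint

def build_chat_request(model: str, user_message: str,
                       api_key: str) -> urllib.request.Request:
    """Build an OpenAI-style chat completion request for Groq."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
    }
    return urllib.request.Request(
        url=f"{GROQ_BASE}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_chat_request("llama-3.1-70b-versatile", "Hello!",
                         os.environ.get("GROQ_API_KEY", "YOUR_API_KEY"))
# urllib.request.urlopen(req) would send it; omitted to keep this offline.
```

The same compatibility means the official `openai` Python client also works if you point its `base_url` at `https://api.groq.com/openai/v1`.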
How to Test Your Key
Send a chat completion request. Notice the blazing-fast response time!
curl https://api.groq.com/openai/v1/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model":"llama-3.1-70b-versatile","messages":[{"role":"user","content":"Hello!"}]}'

Expected: a JSON response in ~200 ms, much faster than typical cloud inference.
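The JSON that comes back follows the OpenAI chat-completion shape, with the assistant text under `choices[0].message.content`. A sketch that parses a trimmed sample response (the field layout is the OpenAI-compatible format; the sample values are illustrative, not a real Groq reply):

```python
import json

# Trimmed, illustrative sample of an OpenAI-style chat completion response.
sample = json.loads("""
{
  "model": "llama-3.1-70b-versatile",
  "choices": [
    {"index": 0,
     "message": {"role": "assistant", "content": "Hello! How can I help?"},
     "finish_reason": "stop"}
  ],
  "usage": {"prompt_tokens": 9, "completion_tokens": 8, "total_tokens": 17}
}
""")

def extract_reply(resp: dict) -> tuple[str, int]:
    """Return the assistant text and total tokens consumed
    (handy for tracking the 6,000 TPM budget)."""
    text = resp["choices"][0]["message"]["content"]
    tokens = resp.get("usage", {}).get("total_tokens", 0)
    return text, tokens

text, tokens = extract_reply(sample)
# text   -> "Hello! How can I help?"
# tokens -> 17
```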
Hidden Limitations
- Token-per-minute limits are relatively low (6,000 TPM for some models)
- Model selection is more limited than OpenAI/Anthropic
- No fine-tuning support
- Occasional capacity issues during peak demand
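Capacity issues during peak demand typically surface as retryable HTTP errors, so a small backoff wrapper smooths them over. A minimal sketch; the specific status codes (429, 503) and delays are assumptions, not documented Groq behavior:

```python
import time

def with_retries(call, max_attempts: int = 4, base_delay: float = 0.5,
                 retryable=(429, 503)):
    """Invoke `call()` (which returns a (status, body) pair) and retry
    on retryable HTTP status codes with exponential backoff."""
    for attempt in range(max_attempts):
        status, body = call()
        if status not in retryable:
            return status, body
        if attempt < max_attempts - 1:
            time.sleep(base_delay * (2 ** attempt))  # 0.5s, 1s, 2s, ...
    return status, body
```

Wrap your actual request function in `call`; after `max_attempts` failures the last response is returned so the caller can surface the error.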
Official Links
Last verified: 2026-02-15 · Last updated: 2026-02-15