Groq (inference platform)
Also known as: GroqCloud, LPU, Language Processing Unit
Groq is a US provider of exceptionally fast LLM inference. Instead of graphics cards, the company uses custom-built chips: Language Processing Units (LPUs). They are designed for one job only — running language models, not training them.
Not to be confused with Grok — the AI model from xAI. Groq is an infrastructure platform, Grok is a language model.
What the platform offers
GroqCloud serves open models via API — including Llama from Meta, OpenAI’s GPT-OSS, plus Qwen and DeepSeek variants. Closed models such as ChatGPT or Claude are deliberately absent. In late 2025, Nvidia licensed the LPU technology for around 20 billion US dollars. GroqCloud continues to operate as an independent service.
What sets it apart: speed
LPUs deliver answers far faster than typical GPU inference. Llama 3.3 70B reaches roughly 300–400 tokens per second on GroqCloud. Smaller models hit up to 1,000. That equals several pages of text per second.
Typical use cases
Speed matters wherever people wait for answers: chatbots, voice assistants, and AI agents that execute many steps in sequence. The faster each individual response, the smoother the entire workflow.
Discuss the next step in a free diagnostic call. Book a call →
As of: June 2026