Topic

Infrastructure

3 articles with this tag

Groq (inference platform)

Groq runs LLM inference on custom LPU chips at hundreds of tokens per second. Models, speed, and typical use cases at a glance.

LLM inference is what happens when a trained language model runs in day-to-day operation. How token costs add up, what drives speed, and which providers matter.

Self-hosted AI means running language models on a company's own hardware. Requirements, tools like Ollama and vLLM, benefits and limits at a glance.