netzstrategen AI Operations.
Tools & Regulation

Self-hosted AI

Also known as: Self-hosting, local LLMs, on-premise AI

Self-hosted AI means language models run on a company’s own infrastructure instead of a vendor’s cloud. Requests and data never leave the organisation. The foundation is usually an open source LLM.

Technical requirements

Graphics memory (VRAM) is the deciding factor. Small models with 7–8 billion parameters run compressed from around 8 GB of VRAM — a decent consumer GPU. The 70B class needs roughly 40–48 GB compressed, for example two 24 GB cards. Uncompressed, it takes around 140 GB — data centre hardware.

Common tools

Three open-source tools dominate: Ollama for an easy start on single machines, vLLM for high-performance production serving, and Open WebUI as a browser-based chat interface. All three are actively maintained.

Benefits and drawbacks

The benefits: full data sovereignty, no per-token API costs (LLM inference), and adaptability up to fine-tuning. The drawbacks: hardware investment, operational responsibility — updates, security, monitoring — and model upgrades handled in-house.

Who it makes sense for

Self-hosting pays off with sensitive data, high request volumes, or strict compliance requirements — such as data residency in the EU. At low volumes, cloud inference is usually cheaper and faster to launch.

Positioning

Discuss the next step in a free diagnostic call. Book a call →

As of: June 2026