Ollama vs LiteLLM vs llama.cpp vs vLLM vs LM Studio
How to run a local LLM server, step by step
These tools represent different layers of the AI stack. While they overlap, they generally serve distinct purposes:
Serving (llama.cpp, vLLM)
Managing (Ollama, LM Studio)
Routing (LiteLLM)
Managing (Ollama, LM Studio)
Ollama
A local LLM inference/runtime platform. It handles model downloads, storage, and execution through a simple CLI and HTTP API. Think of it as a “local LLM server”.
Run AI models locally and integrate via API.
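A minimal session might look like this, assuming Ollama is already installed. The `llama3` tag is just an example model name; substitute any tag from the Ollama library. These commands need a running local Ollama instance.

```shell
# Pull an example model (llama3 is illustrative; any library tag works)
ollama pull llama3

# Chat interactively in the terminal
ollama run llama3

# Ollama also serves an HTTP API, by default on port 11434
curl http://localhost:11434/api/generate -d '{
  "model": "llama3",
  "prompt": "Why is the sky blue?",
  "stream": false
}'
```

The same API is what other tools (including LiteLLM, below) can route requests to.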
LM Studio
A desktop application for discovering, downloading, and running models, with a built-in chat interface and an optional OpenAI-compatible local server.
Run AI models locally with a chat UI.
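LM Studio is mostly driven through its GUI, but once a model is loaded its built-in local server speaks the OpenAI API (port 1234 by default). A sketch of querying it, assuming the server is enabled; the `"model"` value is a placeholder for whichever model you loaded:

```shell
# Query LM Studio's OpenAI-compatible local server (default port 1234)
curl http://localhost:1234/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "local-model",
    "messages": [{"role": "user", "content": "Hello"}]
  }'
```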
Serving (Llama.cpp, vLLM)
llama.cpp
A lightweight C/C++ inference engine that runs quantized GGUF models on CPUs, GPUs, and edge devices.
Run a model on a Raspberry Pi.
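A build-and-run sketch, assuming a CMake toolchain and a GGUF model file already on disk (the model path is a placeholder):

```shell
# Build from source (see the repo README for platform-specific flags)
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
cmake -B build && cmake --build build --config Release

# One-off generation with a local GGUF file (path is a placeholder)
./build/bin/llama-cli -m ./models/model.gguf -p "Hello"

# Or start an OpenAI-compatible HTTP server on port 8080
./build/bin/llama-server -m ./models/model.gguf --port 8080
```

On something like a Raspberry Pi you would pick a small, heavily quantized model; the workflow is the same.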
vLLM
A high-throughput GPU inference engine (continuous batching, PagedAttention). Use it to build a high-traffic AI startup or production API.
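Serving with vLLM is a one-liner once it is installed on a supported GPU machine. The model ID below is an example; any compatible Hugging Face model works, and gated models require a Hugging Face token:

```shell
# Install (Linux + NVIDIA GPU is the typical target)
pip install vllm

# Serve a Hugging Face model behind an OpenAI-compatible API (default port 8000)
vllm serve meta-llama/Meta-Llama-3-8B-Instruct

# Query it like any OpenAI endpoint
curl http://localhost:8000/v1/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "meta-llama/Meta-Llama-3-8B-Instruct", "prompt": "Hello", "max_tokens": 32}'
```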
Routing (LiteLLM)
LiteLLM
LiteLLM is not an inference engine; it is a proxy/router: a gateway layer that exposes a unified, OpenAI-compatible API for calling many LLM providers, both cloud and local.
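A minimal gateway sketch that routes one alias to a local Ollama model and one to a cloud provider. The model names and environment variable are illustrative, and the cloud route needs a valid API key:

```shell
# Install the proxy extras
pip install 'litellm[proxy]'

# Minimal config: one local route (Ollama) and one cloud route (OpenAI)
cat > litellm_config.yaml <<'EOF'
model_list:
  - model_name: local-llama
    litellm_params:
      model: ollama/llama3
  - model_name: gpt-4o
    litellm_params:
      model: openai/gpt-4o
      api_key: os.environ/OPENAI_API_KEY
EOF

# Start the gateway (port 4000 by default)
litellm --config litellm_config.yaml
```

Clients then point any OpenAI SDK at the gateway and switch backends by changing only the model name.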