How to run a local LLM server step by step

Ollama vs LiteLLM vs llama.cpp vs vLLM vs LM Studio

These tools represent different layers of the AI stack. While they overlap, they generally serve distinct purposes:

Serving (llama.cpp, vLLM)

Managing (Ollama, LM Studio)

Routing (LiteLLM)

 

Managing (Ollama, LM Studio)

Ollama

A local LLM inference and runtime platform. It handles model downloads, storage, and execution through a simple CLI and API. Think of it as a "local LLM server".
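As a sketch of how that "local LLM server" is used: after pulling a model with the CLI, you talk to Ollama over its local HTTP API. The model name `llama3` and the default port 11434 are assumptions here; the helper below only builds the JSON body that Ollama's `/api/generate` endpoint expects.

```python
import json

# Typical Ollama workflow (run these in a terminal first):
#   ollama pull llama3    # download a model ("llama3" is an example name)
#   ollama serve          # start the local server (default port 11434)

def generate_payload(model: str, prompt: str, stream: bool = False) -> str:
    """Build the JSON body for a POST to Ollama's /api/generate endpoint."""
    return json.dumps({"model": model, "prompt": prompt, "stream": stream})

# Send this body to http://localhost:11434/api/generate with any HTTP client.
body = generate_payload("llama3", "Why is the sky blue?")
```

With `stream` set to false, the server returns a single JSON response instead of a token-by-token stream.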

 

LM Studio

A desktop application with a graphical interface for discovering, downloading, and chatting with local models, aimed at users who prefer a GUI over a command line.

 

Serving (llama.cpp, vLLM)

llama.cpp

A lightweight C/C++ inference engine designed to run quantized models efficiently on consumer hardware. Use it when the goal is running a model on a laptop, a phone, or even a Raspberry Pi.
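llama.cpp is typically driven from the command line. As a minimal sketch, the snippet below assembles the argument list for its `llama-cli` binary; the binary path and the GGUF model path are placeholders you would replace with your own build and model file.

```python
import subprocess

def llama_cpp_cmd(binary: str, model_path: str, prompt: str,
                  n_predict: int = 64) -> list:
    """Assemble the command line for llama.cpp's llama-cli binary:
    -m selects the GGUF model file, -p the prompt, -n the token budget."""
    return [binary, "-m", model_path, "-p", prompt, "-n", str(n_predict)]

cmd = llama_cpp_cmd("./llama-cli", "models/model.gguf", "Hello")
# To actually run it (requires a built binary and a downloaded model):
#   subprocess.run(cmd, check=True)
```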

 

vLLM

A high-throughput GPU serving engine (using techniques such as PagedAttention and continuous batching). Use it when you are building a high-traffic AI startup or a production API that must serve many concurrent requests.
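Once started, vLLM exposes an OpenAI-compatible HTTP API, which is why clients written against OpenAI's schema work against it unchanged. The model name below is an example, and port 8000 is vLLM's default; the helper just builds the chat-completion request body.

```python
import json

# Start the server first (model name is an example, not a requirement):
#   vllm serve meta-llama/Meta-Llama-3-8B-Instruct
# It listens on port 8000 by default.

def chat_payload(model: str, messages: list, max_tokens: int = 128) -> str:
    """Build the JSON body for POST /v1/chat/completions
    (the OpenAI-compatible endpoint vLLM serves)."""
    return json.dumps({"model": model,
                       "messages": messages,
                       "max_tokens": max_tokens})

body = chat_payload("meta-llama/Meta-Llama-3-8B-Instruct",
                    [{"role": "user", "content": "Hello"}])
```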

 

Routing (LiteLLM)

LiteLLM

LiteLLM is not an inference engine; it is a proxy/router: a gateway layer that exposes a unified, OpenAI-compatible API for calling many LLM providers, both cloud and local.
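The routing idea can be illustrated with a toy dispatcher (this is a sketch of the concept, not LiteLLM's actual internals): a provider-prefixed model name such as `ollama/llama3` is split so that one OpenAI-style call can be sent to different backends. The base URLs below are hypothetical local defaults.

```python
# Hypothetical backend table for this sketch; LiteLLM manages this for you.
BACKENDS = {
    "ollama": "http://localhost:11434",  # local Ollama server
    "vllm": "http://localhost:8000",     # local vLLM server
}

def route(model: str):
    """Split 'provider/model' and look up the backend base URL."""
    provider, _, name = model.partition("/")
    return BACKENDS[provider], name

base_url, name = route("ollama/llama3")
```

In LiteLLM itself the same prefix convention appears in calls like `litellm.completion(model="ollama/llama3", messages=...)`, which is what lets one codebase swap between cloud and local models.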
