How to run a local LLM server step by step

Ollama vs. LiteLLM vs. llama.cpp vs. vLLM vs. LM Studio

These tools represent different layers of the AI stack. While they overlap, they generally serve distinct purposes:

Serving (llama.cpp, vLLM)

Managing (Ollama, LM Studio)

Routing (LiteLLM)


Managing (Ollama, LM Studio)

Ollama

A local LLM inference/runtime platform. It handles model downloads, storage, and execution through a simple CLI and API. Think of it as a “local LLM server”.

Run AI models locally and integrate via API.
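For example, once Ollama is installed, pulling and using a model takes only a couple of commands (the model name `llama3` is illustrative; any model from the Ollama library works):

```shell
# Download a model from the Ollama library (model name is illustrative)
ollama pull llama3

# Chat interactively in the terminal
ollama run llama3

# Or call the local HTTP API the Ollama server exposes (default port 11434)
curl http://localhost:11434/api/generate -d '{
  "model": "llama3",
  "prompt": "Why is the sky blue?",
  "stream": false
}'
```

The same API is what other tools (editors, agents, LiteLLM) point at when they integrate with Ollama.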


LM Studio

A desktop application for downloading and running models locally.

Run AI models locally through a chat UI.
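Besides the chat UI, LM Studio can also expose a local OpenAI-compatible server (by default on port 1234). A sketch of calling it, assuming a model is already loaded in the app:

```shell
# Query LM Studio's local OpenAI-compatible endpoint (default port 1234);
# a model must already be loaded in the application
curl http://localhost:1234/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "local-model",
    "messages": [{"role": "user", "content": "Hello"}]
  }'
```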


Serving (llama.cpp, vLLM)

llama.cpp: a C/C++ inference engine that runs models efficiently on CPUs and edge devices.

Example use case: run a model on a Raspberry Pi.
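A minimal sketch of building llama.cpp from source and running a local GGUF model (the model path is a placeholder; download a GGUF file first):

```shell
# Build llama.cpp from source (CMake build; binaries land in build/bin)
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
cmake -B build
cmake --build build --config Release

# Run a one-off prompt against a local GGUF model
# (./models/model.gguf is a placeholder path)
./build/bin/llama-cli -m ./models/model.gguf -p "Hello"

# Or start an OpenAI-compatible HTTP server on port 8080
./build/bin/llama-server -m ./models/model.gguf --port 8080
```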


vLLM

A high-throughput serving engine built for GPUs (continuous batching, PagedAttention). Use case: a high-traffic AI startup or production API.
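A sketch of serving a model with vLLM behind an OpenAI-compatible API (the model ID is illustrative; any compatible Hugging Face model works, and a CUDA-capable GPU is assumed):

```shell
# Install vLLM (a CUDA-capable GPU is assumed for a typical setup)
pip install vllm

# Serve a model behind an OpenAI-compatible API (default port 8000);
# the model ID is illustrative
vllm serve Qwen/Qwen2.5-0.5B-Instruct

# Query it like any OpenAI endpoint
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "Qwen/Qwen2.5-0.5B-Instruct",
    "messages": [{"role": "user", "content": "Hi"}]
  }'
```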


Routing (LiteLLM)

LiteLLM

LiteLLM is not an inference engine; it is a proxy/router: a gateway layer that provides a unified, OpenAI-compatible API for calling many LLM providers (cloud and local).
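A sketch of putting LiteLLM's proxy in front of a local Ollama model, so clients speak the OpenAI API while LiteLLM handles the routing (the model name `ollama/llama3` is illustrative, and Ollama is assumed to be running):

```shell
# Install the proxy extras
pip install 'litellm[proxy]'

# Start the proxy in front of a local Ollama model (default port 4000)
litellm --model ollama/llama3

# Clients now speak the OpenAI API to LiteLLM, which routes the call
curl http://localhost:4000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "ollama/llama3",
    "messages": [{"role": "user", "content": "Hi"}]
  }'
```

Swapping `ollama/llama3` for a cloud provider's model string is the whole point: the client code does not change.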
