Yes, LM Studio is free to download and use on macOS, Windows and Linux. You only pay for the hardware you run models on — there's no subscription and no API key required.

What hardware do I need for LM Studio?

Any modern laptop can run a small model on CPU, but slowly. A GPU with 8 GB or more of VRAM, or an Apple Silicon Mac with 16 GB or more of unified memory, makes 7B–8B models genuinely usable. Larger models need more: roughly 12–16 GB for 13B–14B models and 24 GB or more for 30B-class models.

Does LM Studio work offline?

Yes. After you download a model, everything runs locally on your machine — no internet needed, no data leaves your computer. You only need a connection to browse and download new models.

LM Studio: A Complete Beginner's Guide (2026)

By LocalLLMGear Editorial · Editorial Team · Updated 2026-06-29

If you want to run a local LLM but the terminal feels intimidating, LM Studio is the easiest place to start. It’s a free desktop app that lets you download models and chat with them in a clean, ChatGPT-style window — fully private, fully offline, no API keys. Here’s the whole process from install to running your own local API server.

The 30-second answer: Install LM Studio, open the built-in model browser, download a quantized model that fits your VRAM, and start chatting. When you’re ready, flip on the local server to get an OpenAI-compatible API at localhost. It’s all free.

What LM Studio actually is

LM Studio is a free desktop application (macOS, Windows, Linux) that wraps everything you need to run local language models into one graphical interface:

A built-in model browser to search and download open models (Llama, Mistral, Qwen, Gemma, DeepSeek and more) without hunting around the web.
A chat window that looks and feels like a normal AI chat app.
A local API server that exposes an OpenAI-compatible endpoint on your machine, so your own apps and scripts can talk to the model.

If you’d rather drive everything from the command line, the alternative is Ollama — see How to run Llama locally with Ollama for that route, and LM Studio vs Ollama for a direct comparison.

Step 1 — Install LM Studio

Download the installer from the official site for your OS:

macOS: the app runs natively on Apple Silicon (M-series), which is genuinely good for local models thanks to unified memory.
Windows: standard installer; an NVIDIA GPU gives the biggest speed-up.
Linux: an AppImage build is provided — make it executable and run it.

It installs like any normal desktop app. No command line, no configuration files to edit.
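
On Linux, the one step that may send you to a terminal is marking the AppImage as executable (most file managers can also do this from the file’s permissions dialog). A minimal sketch, assuming the download landed in ~/Downloads under the hypothetical name LM-Studio.AppImage; your file’s name will likely differ, so substitute it:

```shell
# Hypothetical path: point this at the AppImage file you actually downloaded.
APPIMAGE="${APPIMAGE:-$HOME/Downloads/LM-Studio.AppImage}"

# Mark a file as executable; fails if the file does not exist.
make_runnable() {
  chmod +x "$1"
}

if [ -f "$APPIMAGE" ]; then
  make_runnable "$APPIMAGE"
  echo "Ready: start it with \"$APPIMAGE\""
else
  echo "No AppImage found at $APPIMAGE" >&2
fi
```

After that, double-clicking the file or running its path from a terminal should start the app.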

Step 2 — Download a model

Open the Discover / search tab inside the app and look for a model. The single most important thing for beginners: pick a quantized GGUF version that fits your VRAM.

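Quantization stores a model’s weights at lower precision (for example, about 4 bits each instead of 16), so the model uses far less memory with only a small quality trade-off. You’ll see labels like Q4_K_M (a popular 4-bit balance) or Q5_K_M (5-bit: slightly larger, slightly better quality). A rough rule of thumb: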

8 GB VRAM (or 16 GB Mac): a 7B–8B model at 4-bit fits comfortably.
12–16 GB: you can step up to 13B–14B models or higher-quality quants.
24 GB+: larger 30B-class models become realistic.

LM Studio usually flags whether a given file will fit your machine, which takes the guesswork out of it. Start small — an 8B model is plenty to learn with — and size up later.
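
If you want to sanity-check the rule of thumb yourself, the arithmetic is simple: weight memory is roughly parameters × bits per weight ÷ 8. The sketch below is a back-of-the-envelope estimate, not how LM Studio decides whether a file fits. The function name estimate_gb is hypothetical, and the 1.2 overhead factor (headroom for context and runtime buffers) is an assumption you should adjust. Real quants such as Q4_K_M also use a little more than 4 bits per weight in practice.

```shell
# estimate_gb PARAMS_IN_BILLIONS BITS_PER_WEIGHT [OVERHEAD_FACTOR]
# Rough memory estimate in GB. The default overhead of 1.2 is an assumption.
estimate_gb() {
  LC_ALL=C awk -v p="$1" -v b="$2" -v o="${3:-1.2}" \
    'BEGIN { printf "%.1f\n", p * b / 8 * o }'
}

estimate_gb 8 4     # 8B model at 4-bit   -> 4.8
estimate_gb 14 4    # 14B model at 4-bit  -> 8.4
estimate_gb 32 4    # 32B model at 4-bit  -> 19.2
```

Those numbers line up with the tiers above: an 8B model sits comfortably inside 8 GB, a 14B model wants 12 GB or more, and a 30B-class model needs a 24 GB card.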

Step 3 — Start chatting

Switch to the Chat tab, load the model you downloaded from the dropdown, and type. The first response may take a moment while the model loads into memory; after that it streams replies like any chat app. Everything stays on your machine — nothing is sent to the cloud.

Step 4 — Run the local API server

This is where LM Studio gets powerful. Open the Developer / Local Server tab and start the server. LM Studio then serves an OpenAI-compatible API locally, typically at http://localhost:1234/v1.

That means any tool or library that already speaks the OpenAI format can point at your local model just by changing the base URL — no code rewrite needed. A minimal request looks like:

curl http://localhost:1234/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "local-model",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'

This is how you wire a local model into note apps, coding assistants, scripts or your own side projects — all running offline and free.
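
For scripting, it can help to wrap the request in a couple of small shell functions. This is a sketch under stated assumptions: the names BASE_URL, json_escape, chat_payload, list_models and ask are made up for illustration, the escaping only handles single-line text, and the server is assumed to expose the standard OpenAI-style /models and /chat/completions routes under the base URL mentioned above.

```shell
# Base URL of the local server; change the port if your server uses another one.
BASE_URL="${BASE_URL:-http://localhost:1234/v1}"

# Escape backslashes and double quotes so single-line text is safe in a JSON string.
json_escape() {
  printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# Build the request body for one user message: chat_payload MODEL_ID PROMPT
chat_payload() {
  printf '{"model": "%s", "messages": [{"role": "user", "content": "%s"}]}' \
    "$(json_escape "$1")" "$(json_escape "$2")"
}

# Ask the server which model identifiers it knows about.
list_models() {
  curl -s "$BASE_URL/models"
}

# Send one prompt and print the raw JSON response: ask MODEL_ID PROMPT
ask() {
  curl -s "$BASE_URL/chat/completions" \
    -H "Content-Type: application/json" \
    -d "$(chat_payload "$1" "$2")"
}

# Usage, with the server running:
#   list_models
#   ask "local-model" "Hello!"
```

Running list_models first is a handy way to find the identifier to pass as the model name, rather than guessing it.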

Step 5 — Tips that make it better

A few settings worth knowing once you’re past “hello world”:

Context length: how much text the model can “remember” in one conversation. Longer context uses more memory, so raise it only when you need it.
GPU offload: LM Studio lets you choose how many model layers run on the GPU. The more layers on the GPU, the faster generation gets, until you run out of VRAM. If you hit out-of-memory errors, lower the offload or pick a smaller quant.
Keep models small first: it’s far better to run an 8B model quickly than to fight a 13B model that swaps to disk and crawls.
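
To put rough numbers on the first two tips, here is a hedged sketch of both estimates. The function names are hypothetical. The cache formula assumes 16-bit cache values and needs the model’s layer count, number of key/value heads and head size, which vary by model; the example uses the shape of a Llama-3-8B-style model as an assumption. The layer estimate pretends every layer is the same size and ignores the memory the context itself needs, so treat its answer as an upper bound.

```shell
# kv_cache_gib LAYERS KV_HEADS HEAD_DIM CONTEXT_TOKENS [BYTES_PER_VALUE]
# Keys and values are both cached, hence the factor of 2.
kv_cache_gib() {
  LC_ALL=C awk -v l="$1" -v h="$2" -v d="$3" -v c="$4" -v b="${5:-2}" \
    'BEGIN { printf "%.2f\n", 2 * l * h * d * c * b / 1073741824 }'
}

# layers_on_gpu MODEL_FILE_GB TOTAL_LAYERS FREE_VRAM_GB
# Assumes every layer is the same size, which is only roughly true.
layers_on_gpu() {
  LC_ALL=C awk -v m="$1" -v n="$2" -v v="$3" \
    'BEGIN { fit = int(v / (m / n)); if (fit > n) fit = n; print fit }'
}

# Assumed model shape: 32 layers, 8 KV heads, head size 128.
kv_cache_gib 32 8 128 8192     # prints 1.00
kv_cache_gib 32 8 128 32768    # prints 4.00

# Hypothetical 8 GB model file with 32 layers and 4 GB of VRAM to spare.
layers_on_gpu 8 32 4           # prints 16
```

Under these assumptions, quadrupling the context from 8K to 32K tokens costs about 3 GiB extra, which is why raising context length can push a model that used to fit out of VRAM.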

If models are slow or won’t load

In most cases the bottleneck isn’t LM Studio; it’s VRAM. If a model refuses to load, loads partly onto the CPU, or generates text painfully slowly, your GPU most likely doesn’t have enough memory for that model at that quant. The fix is a smaller or more aggressively quantized model, a shorter context length, or more VRAM.

If you’re thinking about an upgrade, our Best GPU for local LLMs guide breaks down exactly how much VRAM each model tier needs, and the rest of the hardware category covers budget picks, prebuilts and full rig builds.

Going further

LM Studio gets you running in minutes, but understanding prompting, quantization and building on top of local models is where it gets genuinely useful.

Once you’re comfortable here, try the command-line route too — Ollama pairs nicely with LM Studio for scripting and servers.