Local AI on a MacBook Pro with 16 GB RAM: What Actually Works

June 14, 2026 · 8 min read

Software Architect

Running AI coding assistants locally sounds appealing — no API costs, your code stays on your machine, and you get a Claude Code-like experience for free. But getting it to actually work well on a 16 GB MacBook Pro M1 takes more trial and error than most guides admit.

This post summarizes months of real-world testing across multiple tools, runtimes, and models. The conclusion surprised me.

The Goal

Build a local, subscription-free AI development environment that can support:

Architecture discussions and documentation
Repository exploration and planning
Code modifications, refactoring, and bug fixes
ADRs and Obsidian notes
Angular, NestJS, .NET, Docker, and Kubernetes projects

The setup should run entirely locally on a MacBook Pro with 16 GB RAM.

The Biggest Discovery

Before getting into tool comparisons and benchmarks, the most important finding was this:

Agent capability > Coding benchmark score

A model that can find files, edit code, use tools, and execute multi-step workflows is often more useful than a model with a higher coding benchmark score that cannot reliably do those things.

This single insight changed which models and tools came out on top.

Tools Evaluated

Three coding agent frameworks were tested in depth:

Tool	What It Is
OpenCode	Agent-oriented terminal interface
Aider	Git-focused code editing assistant
Continue.dev	VSCode extension with chat and autocomplete

Two local model runtimes were compared:

Runtime	What It Is
Ollama	Simple, popular CLI-based local model server
LM Studio	GUI-based model manager with an OpenAI-compatible API

Tool Comparison Results

Tool	Architecture	Docs	Coding	Tool Use	Ease of Use	Verdict
OpenCode	10/10	10/10	8/10	9/10	8/10	Primary Workspace
Continue.dev	6/10	4/10	7/10	5/10	10/10	IDE Companion
Aider	2/10	1/10	5/10	3/10	6/10	Not recommended

OpenCode

OpenCode won as the primary workspace because it is built for agentic workflows. It plans, explores repositories, generates documents, and writes long-form content reliably.

Strengths: Planning, architecture analysis, documentation, repository exploration, long-form content
Weaknesses: Requires a larger context window; performance depends heavily on model tool-use capability

Aider: The Surprise Disappointment

Aider has a good reputation for code editing, and that reputation may be deserved in other setups. In this workflow it underperformed significantly.

During testing, a simple prompt like Create a file containing one line: hello caused Aider to generate unrelated JavaScript instead of creating the file. It also failed at file discovery and documentation generation.

In this workflow it was not recommended.

Continue.dev

The simplest of the three. Install the VSCode extension, point it at a local model, and it works. Great for IDE chat, quick explanations, and autocomplete. Not the right tool for architecture work or anything requiring multi-step reasoning.

Runtime Comparison

The runtime turned out to matter more than expected.

Runtime	OpenCode Experience	Model Management	Troubleshooting	Verdict
LM Studio	10/10	10/10	10/10	Preferred for OpenCode
Ollama	6/10	7/10	6/10	Secondary (Continue.dev)

What went wrong with Ollama and OpenCode

When connecting OpenCode to local models through Ollama, several problems appeared:

Some models returned no answer at all
Some responses were truncated mid-output
Context handling was opaque — hard to know what was actually loaded
OpenCode's agent prompt was too heavy for small contexts
The experience was inconsistent across sessions

Why LM Studio worked better

After switching OpenCode to LM Studio, the problems mostly disappeared:

The exact loaded model ID was visible via /v1/models
Context length was visible and configurable in the UI
The OpenAI-compatible API made troubleshooting straightforward
Models that failed through Ollama worked correctly through LM Studio

Key observation: The models themselves were not the main problem. The Ollama + OpenCode combination was less reliable on this machine. LM Studio + OpenCode behaved better with the same models.

You can verify what LM Studio is serving at any time:

curl http://localhost:1234/v1/models

Model Comparison

Model	Speed	Tool Use	Architecture	Docs	Coding	Final Role
qwen3.5:9b	7/10	10/10	10/10	9/10	8/10	Main model
qwen3.5:2b	10/10	9/10	6/10	7/10	5/10	Fast agent
Ministral 3B	8/10	8/10	7/10	8/10	6/10	Alternative
qwen2.5-coder:7b	6/10	4/10	6/10	5/10	8/10	Coding specialist (weak agent)
llama3.1:8b	5/10	4/10	6/10	6/10	6/10	Rejected
gemma4	5/10	2/10	6/10	5/10	7/10	Rejected

Real-world tool use test results

One of the most revealing tests was simply asking each model to create and edit files using OpenCode's tools:

Model	Creates Files	Edits Files	Uses Tools
qwen3.5:9b	✅	✅	✅
qwen3.5:2b	✅	✅	✅
Ministral 3B	✅	Sometimes	✅
qwen2.5-coder:7b	Mixed	Mixed	Mixed
gemma4	❌	❌	❌

gemma4 refused editing tasks despite seeing the available tools. llama3.1:8b worked but delivered poor value for its resource cost.

qwen2.5-coder:7b scores well on coding benchmarks but has mixed tool use in practice — it underperformed as an OpenCode agent compared to expectations. This is the core finding: benchmark scores do not predict agent capability.

qwen3.5:9b is the main model: strong reasoning, reliable tool use, excellent for architecture and documentation work. qwen3.5:2b is the fast companion for quick tasks and notes.

The Context Window Problem

This was one of the most important — and most surprising — findings.

OpenCode needs more context than you expect

When OpenCode was connected to a model with 8k context, this error appeared immediately:

n_keep: 26460 >= n_ctx: 8192

Then later, after attempts to reduce it:

n_keep: 8232 >= n_ctx: 8192

OpenCode's initial system prompt alone exceeds 8k tokens. This means the model never even received the user's first message — the context was already full before the conversation started.

The fix: set LM Studio context to 32,768 tokens and match that in the OpenCode model configuration.

Recommended context sizes

Tool	Runtime	Context
OpenCode	LM Studio	32,768
Continue.dev	Either	8,192

The Final Stack

After all this testing, the recommended setup has three modes:

Primary: Architecture & Documentation Work

OpenCode + LM Studio + qwen3.5:9b
Context: 32,768

Use for: architecture reviews, documentation, ADRs, planning, repository exploration, Obsidian notes

Fast Mode: Quick Tasks

OpenCode + LM Studio + qwen3.5:2b
Context: 32,768

Use for: quick questions, notes, summaries, repository exploration

IDE Companion

Continue.dev + LM Studio or Ollama + qwen3.5:2b
Context: 8,192

Use for: IDE chat, code explanations, autocomplete, small edits

Quick Decision Table

Task	Tool	Runtime	Model
Architecture discussion	OpenCode	LM Studio	qwen3.5:9b
Writing documentation	OpenCode	LM Studio	qwen3.5:9b
Obsidian notes / ADRs	OpenCode	LM Studio	qwen3.5:9b
Repository exploration	OpenCode	LM Studio	qwen3.5:9b
Planning	OpenCode	LM Studio	qwen3.5:9b
Quick questions / notes	OpenCode	LM Studio	qwen3.5:2b
IDE chat / explanation	Continue.dev	LM Studio or Ollama	qwen3.5:2b
Autocomplete	Continue.dev	LM Studio or Ollama	qwen3.5:2b

Installation

Ollama (for Continue.dev)

curl -fsSL https://ollama.com/install.sh | sh
ollama pull qwen3.5:2b

LM Studio (for OpenCode)

Download from the official LM Studio website. Load qwen3.5:9b for primary work or qwen3.5:2b for fast mode, set context to 32,768, and start the local server.

Tip: LM Studio's Zen models are particularly effective on machines with limited RAM. They are well-optimized for Apple Silicon and worth trying if you want good performance without pushing memory limits.

Verify the server is running:

curl http://localhost:1234/v1/models

OpenCode

curl -fsSL https://opencode.ai/install | bash

Configure ~/.config/opencode/opencode.json to point at LM Studio. The model name in the config must match the exact id returned by /v1/models:

{
  "$schema": "https://opencode.ai/config.json",
  "provider": {
    "lmstudio": {
      "npm": "@ai-sdk/openai-compatible",
      "name": "LM Studio Local",
      "options": {
        "baseURL": "http://localhost:1234/v1"
      },
      "models": {
        "qwen/qwen3.5-9b": {
          "name": "Qwen 3.5 9B (Main)",
          "limit": {
            "context": 32768,
            "output": 2048
          }
        },
        "qwen/qwen3.5-2b": {
          "name": "Qwen 3.5 2B (Fast)",
          "limit": {
            "context": 32768,
            "output": 2048
          }
        }
      }
    }
  }
}

Continue.dev

Install the Continue extension in VSCode and point it at LM Studio or Ollama with qwen3.5:2b.

Lessons Learned

Assumption	Reality
Best coding benchmark = best agent	False — tool use capability matters more
Bigger model = better experience	False on 16 GB RAM
qwen2.5-coder:7b is the best local model	Strong coder, weak agent — mixed tool use
Aider is the best local coding tool	Not recommended for this workflow
Ollama is the best runtime	Not for OpenCode on this machine
Context size is a minor detail	False — OpenCode requires 32k minimum
LM Studio is just a GUI	False — it is the better OpenCode runtime

Future Upgrade Path

If upgrading to 32 GB RAM: evaluate larger models and bigger contexts.

If upgrading to 64 GB RAM: re-evaluate 32B-class models and 64k contexts.

On 16 GB RAM, avoid running memory-heavy apps (Docker Desktop, multiple browser windows, IDE indexing) at the same time as a loaded 9B model.

Conclusion

The biggest discovery was not a model — it was a combination:

OpenCode + LM Studio delivered a substantially better experience than any other setup tested.

The setup that works on a 16 GB MacBook Pro M1:

OpenCode for thinking, planning, and writing
Continue.dev for IDE support and autocomplete
LM Studio as the OpenCode runtime
32k context for OpenCode, 8k for Continue.dev
qwen3.5:9b as the main model, qwen3.5:2b for fast tasks

The principle that guided every decision: agent capability matters more than benchmark scores.

Go to Top

The Goal​

The Biggest Discovery​

Tools Evaluated​

Tool Comparison Results​

OpenCode​

Aider: The Surprise Disappointment​

Continue.dev​

Runtime Comparison​

What went wrong with Ollama and OpenCode​

Why LM Studio worked better​

Model Comparison​

Real-world tool use test results​

The Context Window Problem​

OpenCode needs more context than you expect​

Recommended context sizes​

The Final Stack​

Primary: Architecture & Documentation Work​

Fast Mode: Quick Tasks​

IDE Companion​

Quick Decision Table​

Installation​

Ollama (for Continue.dev)​

LM Studio (for OpenCode)​

OpenCode​

Continue.dev​

Lessons Learned​

Future Upgrade Path​

Conclusion​

The Goal

The Biggest Discovery

Tools Evaluated

Tool Comparison Results

OpenCode

Aider: The Surprise Disappointment

Continue.dev

Runtime Comparison

What went wrong with Ollama and OpenCode

Why LM Studio worked better

Model Comparison

Real-world tool use test results

The Context Window Problem

OpenCode needs more context than you expect

Recommended context sizes

The Final Stack

Primary: Architecture & Documentation Work

Fast Mode: Quick Tasks

IDE Companion

Quick Decision Table

Installation

Ollama (for Continue.dev)

LM Studio (for OpenCode)

OpenCode

Continue.dev

Lessons Learned

Future Upgrade Path

Conclusion