- Understanding AI Coding Models
Learn how LLMs work, explore the major model families (Claude, GPT, Gemini), and understand context windows and tokens.
Behind every AI coding tool is a Large Language Model (LLM). Understanding how these models work will help you use them more effectively and set realistic expectations.
What Are Large Language Models?
Large Language Models are artificial neural networks trained on massive amounts of text data—including millions of lines of code. They're built on the Transformer architecture, which uses a mechanism called "self-attention" to understand relationships between different parts of text.
How LLMs Generate Code
When you ask an AI tool to write code, here's what happens:
- Tokenization: Your prompt is broken into smaller units called tokens (roughly 4 characters each)
- Context Analysis: The model processes all tokens, understanding relationships between them
- Next-Token Prediction: Based on patterns learned during training, it predicts the most likely next token
- Iterative Generation: This process repeats token-by-token until the response is complete
Think of it like a very sophisticated autocomplete—but one that understands programming patterns, syntax rules, and even coding conventions.
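The loop above can be sketched in a few lines of Python. This is a toy, not a real model: a hard-coded bigram table stands in for the billions of learned parameters, and the names (`BIGRAMS`, `generate`) are invented for illustration. What it does share with a real LLM is the structure of steps 2–4: predict the next token, append it, repeat.

```python
# Toy stand-in for "patterns learned during training": each token maps to
# the single most likely token that follows it.
BIGRAMS = {
    "def": "add",
    "add": "(",
    "(": "a",
    "a": ",",
    ",": "b",
    "b": ")",
    ")": ":",
}

def generate(prompt_tokens, max_tokens=10, stop=":"):
    """Iterative generation: predict the next token, append, repeat."""
    tokens = list(prompt_tokens)
    for _ in range(max_tokens):
        next_tok = BIGRAMS.get(tokens[-1])  # "most likely next token"
        if next_tok is None:
            break
        tokens.append(next_tok)
        if next_tok == stop:  # real models emit a special end-of-sequence token
            break
    return tokens

print(generate(["def"]))  # → ['def', 'add', '(', 'a', ',', 'b', ')', ':']
```

A real model scores every token in a vocabulary of tens of thousands and samples from that distribution, but the one-token-at-a-time loop is the same.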
Major Model Families
Claude (Anthropic)
Claude models power Claude Code and are widely used in Cursor, Lovable, and other tools.
| Model | Best For | Context Window |
|---|---|---|
| Claude Opus 4.6 | Complex architecture, deep reasoning, 14.5-hour task horizon | 200K tokens |
| Claude Sonnet 4.6 | Daily coding tasks, balanced performance, 40% cheaper than Opus | 200K tokens |
| Claude Haiku 4.5 | Quick completions, simple tasks, cost-efficient | 200K tokens |
Key Strengths:
- Excellent at following complex instructions
- Strong code comprehension and reasoning
- 80.8% on SWE-Bench (highest score, February 2026)
- Extended thinking capabilities for complex problems
GPT (OpenAI)
OpenAI's models power GitHub Copilot and OpenAI Codex, and are also available in Cursor and other tools.
| Model | Best For | Context Window |
|---|---|---|
| GPT-5.4 | Current flagship, native computer-use | 1M tokens |
| GPT-5.4 Thinking | Complex reasoning with extended thinking | 1M tokens |
| GPT-5-mini | Cost-optimized, fast responses | 128K tokens |
Key Strengths:
- Native computer-use capabilities
- Multiple reasoning levels (from fast to deep thinking)
- Excellent multimodal capabilities (code + images)
- Powers the new OpenAI Codex CLI and app
Note: GPT-4 series (GPT-4 Turbo, GPT-4o, GPT-4.1) has been retired from ChatGPT and replaced by the GPT-5 series.
Gemini (Google)
Google's Gemini models excel at web development and visual tasks.
| Model | Best For | Context Window |
|---|---|---|
| Gemini 3.1 Pro | Latest flagship, advanced reasoning | 2M tokens |
| Gemini 3 Flash | High-volume, cost-sensitive tasks | 1M tokens |
| Gemini 2.5 Pro | Still available, proven reliability | 2M tokens |
Key Strengths:
- Huge context windows (up to 2M tokens)
- Excellent at visually compelling web apps
- Strong performance on web development benchmarks
- Powers Google Jules autonomous coding agent
Note: Gemini 3 Pro Preview was deprecated on March 9, 2026. Use Gemini 3.1 Pro Preview for the latest features.
Which Tools Use Which Models?
| Tool | Default Model | Other Options |
|---|---|---|
| Cursor | Composer (custom model) | Claude, GPT-5, Gemini |
| Claude Code | Claude Sonnet 4.6 | Opus 4.6, Haiku 4.5 |
| GitHub Copilot | GPT-5.4 | Claude Opus 4.5, Gemini 3 Flash |
| Windsurf | Windsurf SWE | GPT-5.4, Claude 4, Gemini 3 |
| OpenAI Codex | GPT-5.4 | GPT-5.4 Thinking |
| Lovable | Gemini 3 Flash | Claude 4, GPT-5.2 |
| Bolt.new | Claude Opus 4.6 | Claude Sonnet 4.6 |
| Google Jules | Gemini 3 Pro | — |
Understanding Context Windows
A context window is the maximum amount of text a model can process at once. Think of it as the model's working memory.
What Fits in the Context Window?
Everything you send and receive must fit:
- Your prompt and instructions
- Any code files you include
- The conversation history
- The model's response
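A rough budget check for the items above can be sketched using the ~4-characters-per-token rule of thumb discussed later in this article. The function names and the 4,000-token reply reserve are illustrative assumptions, not any real API:

```python
def estimate_tokens(text: str) -> int:
    """Rough heuristic: ~4 characters per token in English (not a real tokenizer)."""
    return max(1, len(text) // 4)

def fits_in_context(prompt, files, history, window=200_000, reply_budget=4_000):
    """Check that prompt + files + history + expected reply fit the window."""
    used = estimate_tokens(prompt)
    used += sum(estimate_tokens(f) for f in files)
    used += sum(estimate_tokens(m) for m in history)
    return used + reply_budget <= window

# A ~40 KB file fits comfortably in a 200K-token window...
print(fits_in_context("fix the bug", ["x" * 40_000], []))   # → True
# ...a ~900 KB dump of the whole repo does not.
print(fits_in_context("fix the bug", ["x" * 900_000], []))  # → False
```

Real tokenizers vary by model, so treat estimates like this as a sanity check, not an exact count.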
Context Window Sizes
| Size | Approximate Content |
|---|---|
| 8K tokens | ~6,000 words (short document) |
| 32K tokens | ~24,000 words (novella) |
| 128K tokens | ~96,000 words (300+ pages) |
| 200K tokens | ~150,000 words (entire codebase) |
The "Lost in the Middle" Problem
Research shows that models often struggle with information placed in the middle of very long contexts. They pay more attention to the beginning and end.
Best Practice: Put your most important context at the beginning of your prompt.
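One way to act on this best practice is a small prompt-assembly helper (hypothetical, names invented) that places the task and key code at the start, bulk reference material in the middle, and a one-line restatement of the task at the end, where attention is strongest:

```python
def build_prompt(task: str, key_code: str, reference: str) -> str:
    """Order sections so the important parts sit at the edges of the context."""
    return "\n\n".join([
        f"TASK: {task}",                      # start: highest attention
        f"KEY CODE:\n{key_code}",             # start: still well-attended
        f"REFERENCE MATERIAL:\n{reference}",  # middle: most likely to be skimmed
        f"Reminder of the task: {task}",      # end: recency helps recall
    ])

print(build_prompt("fix the off-by-one bug",
                   "for i in range(n + 1):",
                   "team style guide, API docs, ..."))
```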
Understanding Tokens
Tokens are the fundamental units that LLMs process. A token is approximately:
- 4 characters in English
- 0.75 words
Token Examples
| Text | Approximate Tokens |
|---|---|
| "function" | 1 token |
| "getUserData" | 2-3 tokens |
| 1,000 words | ~1,333 tokens |
| Typical React component | 500-2,000 tokens |
Why Tokens Matter
- Cost: API pricing is per token (input and output separately)
- Limits: Context windows are measured in tokens
- Speed: More tokens = longer generation time
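The cost point can be made concrete with a quick calculation. The per-million-token prices below are placeholders chosen for illustration, not any vendor's actual rates; input and output are billed separately, as noted above:

```python
INPUT_PRICE_PER_MTOK = 3.00    # hypothetical $ per million input tokens
OUTPUT_PRICE_PER_MTOK = 15.00  # hypothetical $ per million output tokens

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Input and output tokens are priced separately."""
    return ((input_tokens / 1_000_000) * INPUT_PRICE_PER_MTOK
            + (output_tokens / 1_000_000) * OUTPUT_PRICE_PER_MTOK)

# 50K tokens of context in, a 2K-token response out:
print(f"${request_cost(50_000, 2_000):.4f}")  # → $0.1800
```

Note how a large pasted context dominates the bill even when the response is short, which is one practical reason to trim what you send.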
Choosing the Right Model
For most daily coding tasks, a balanced model like Claude Sonnet 4.6 or GPT-5.4 works well. Here's when to use different tiers:
| Use Case | Recommended Model |
|---|---|
| Quick code completions | Haiku 4.5 / GPT-5-mini |
| Daily development | Sonnet 4.6 / GPT-5.4 |
| Complex architecture | Opus 4.6 / GPT-5.4 Thinking |
| Large codebases | Gemini 3.1 Pro (2M context) |
| Extended reasoning | Opus 4.6 / GPT-5.4 Thinking |
Understanding "Reasoning" Models
Modern model lineups include reasoning models (like GPT-5.4 Thinking and Claude with extended thinking) that "think through" problems before responding. These models:
- Don't need explicit "think step by step" prompts—they reason internally
- Are better for complex multi-step problems
- Take longer but produce more accurate results
- Cost more per request
Key Benchmarks
When comparing models, these benchmarks help measure coding ability:
| Benchmark | What It Tests | Top Scores (March 2026) |
|---|---|---|
| HumanEval | Basic function generation | 99% (top models) |
| SWE-Bench Verified | Real GitHub issue fixing | 80.8% (Claude Opus 4.6) |
| SWE-Bench Pro | Harder real-world tasks | ~25% (shows realistic limitations) |
| WebDev Arena | Web application building | v0 leads |
Current SWE-Bench Leaders:
- Claude Opus 4.6 — 80.8%
- Claude Sonnet 4.6 — 79.6%
- GPT-5.4 Thinking — ~78%
Important: Benchmark scores don't always reflect your specific use case. The best model depends on your workflow and the types of tasks you do most often.
Summary
- LLMs generate code by predicting the most likely next token based on patterns learned during training
- Major model families include Claude (Anthropic), GPT (OpenAI), and Gemini (Google)
- Context windows determine how much code the model can "see" at once
- Tokens are the fundamental units—roughly 4 characters each
- Different models excel at different tasks; most tools let you choose
Next Steps
Now that you understand how AI models work, let's learn how to communicate with them effectively through prompt engineering.