Beginner · 20 min

Learn how LLMs work, explore the major model families (Claude, GPT, Gemini), and understand context windows and tokens.

Understanding AI Coding Models

Behind every AI coding tool is a Large Language Model (LLM). Understanding how these models work will help you use them more effectively and set realistic expectations.

What Are Large Language Models?

Large Language Models are artificial neural networks trained on massive amounts of text data—including millions of lines of code. They're built on the Transformer architecture, which uses a mechanism called "self-attention" to understand relationships between different parts of text.

How LLMs Generate Code

When you ask an AI tool to write code, here's what happens:

  1. Tokenization: Your prompt is broken into smaller units called tokens (roughly 4 characters each)
  2. Context Analysis: The model processes all tokens, understanding relationships between them
  3. Next-Token Prediction: Based on patterns learned during training, it predicts the most likely next token
  4. Iterative Generation: This process repeats token-by-token until the response is complete

Think of it like a very sophisticated autocomplete—but one that understands programming patterns, syntax rules, and even coding conventions.
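The four-step loop above can be sketched as a toy program. The vocabulary and "learned" probabilities here are invented purely for illustration; a real LLM has a vocabulary of ~100K tokens and billions of parameters, but the generation loop has the same shape:

```python
# Toy next-token predictor: a hand-written probability table standing in
# for a trained model. This only illustrates the iterative loop, not
# how a real Transformer computes these probabilities.
NEXT_TOKEN_PROBS = {
    "def": {"add": 0.6, "main": 0.4},
    "add": {"(": 1.0},
    "(":   {"a": 1.0},
    "a":   {",": 0.5, ")": 0.5},
    ",":   {"b": 1.0},
    "b":   {")": 1.0},
    ")":   {":": 1.0},
    ":":   {"<end>": 1.0},
    "main": {"(": 1.0},
}

def generate(prompt_tokens, max_tokens=10):
    tokens = list(prompt_tokens)
    for _ in range(max_tokens):
        dist = NEXT_TOKEN_PROBS.get(tokens[-1], {"<end>": 1.0})
        # Greedy decoding: always pick the most likely next token.
        next_tok = max(dist, key=dist.get)
        if next_tok == "<end>":
            break
        tokens.append(next_tok)
    return tokens

print(" ".join(generate(["def"])))  # → def add ( a , b ) :
```

Real tools sample from the probability distribution rather than always taking the maximum, which is why the same prompt can produce different completions.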

Major Model Families

Claude (Anthropic)

Claude models power Claude Code and are widely used in Cursor, Lovable, and other tools.

Model | Best For | Context Window
Claude Opus 4.6 | Complex architecture, deep reasoning, 14.5-hour task horizon | 200K tokens
Claude Sonnet 4.6 | Daily coding tasks, balanced performance, 40% cheaper than Opus | 200K tokens
Claude Haiku 4.5 | Quick completions, simple tasks, cost-efficient | 200K tokens

Key Strengths:

  • Excellent at following complex instructions
  • Strong code comprehension and reasoning
  • 80.8% on SWE-Bench (highest score, February 2026)
  • Extended thinking capabilities for complex problems

GPT (OpenAI)

OpenAI's models power GitHub Copilot, OpenAI Codex, and are available in Cursor and other tools.

Model | Best For | Context Window
GPT-5.4 | Current flagship, native computer-use | 1M tokens
GPT-5.4 Thinking | Complex reasoning with extended thinking | 1M tokens
GPT-5-mini | Cost-optimized, fast responses | 128K tokens

Key Strengths:

  • Native computer-use capabilities
  • Multiple reasoning levels (from fast to deep thinking)
  • Excellent multimodal capabilities (code + images)
  • Powers the new OpenAI Codex CLI and app

Note: GPT-4 series (GPT-4 Turbo, GPT-4o, GPT-4.1) has been retired from ChatGPT and replaced by the GPT-5 series.

Gemini (Google)

Google's Gemini models excel at web development and visual tasks.

Model | Best For | Context Window
Gemini 3.1 Pro | Latest flagship, advanced reasoning | 2M tokens
Gemini 3 Flash | High-volume, cost-sensitive tasks | 1M tokens
Gemini 2.5 Pro | Still available, proven reliability | 2M tokens

Key Strengths:

  • Huge context windows (up to 2M tokens)
  • Excellent at visually compelling web apps
  • Strong performance on web development benchmarks
  • Powers Google Jules autonomous coding agent

Note: Gemini 3 Pro Preview was deprecated on March 9, 2026. Use Gemini 3.1 Pro Preview for the latest features.

Which Tools Use Which Models?

Tool | Default Model | Other Options
Cursor | Composer (custom model) | Claude, GPT-5, Gemini
Claude Code | Claude Sonnet 4.6 | Opus 4.6, Haiku 4.5
GitHub Copilot | GPT-5.4 | Claude Opus 4.5, Gemini 3 Flash
Windsurf | Windsurf SWE | GPT-5.4, Claude 4, Gemini 3
OpenAI Codex | GPT-5.4 | GPT-5.4 Thinking
Lovable | Gemini 3 Flash | Claude 4, GPT-5.2
Bolt.new | Claude Opus 4.6 | Claude Sonnet 4.6
Google Jules | Gemini 3 Pro | (none listed)

Understanding Context Windows

A context window is the maximum amount of text a model can process at once. Think of it as the model's working memory.

What Fits in the Context Window?

Everything you send and receive must fit:

  • Your prompt and instructions
  • Any code files you include
  • The conversation history
  • The model's response

Context Window Sizes

Size | Approximate Content
8K tokens | ~6,000 words (short document)
32K tokens | ~24,000 words (novella)
128K tokens | ~96,000 words (300+ pages)
200K tokens | ~150,000 words (entire codebase)
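Assuming the rough 4-characters-per-token rule, you can do a quick back-of-envelope check of whether a set of files fits a given window. The function names and the 4,000-token response budget below are illustrative choices, not any tool's API:

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate using the ~4 characters/token heuristic."""
    return max(1, len(text) // 4)

def fits_in_context(files: dict[str, str], context_window: int,
                    reserved_for_response: int = 4_000) -> bool:
    """Check whether the files plus a budget for the model's response
    fit inside the context window. Everything must fit: prompt, files,
    history, and output all share the same window."""
    used = sum(estimate_tokens(src) for src in files.values())
    return used + reserved_for_response <= context_window

files = {"app.py": "x" * 40_000, "utils.py": "x" * 8_000}  # ~12K tokens
print(fits_in_context(files, context_window=8_000))    # False: too big for 8K
print(fits_in_context(files, context_window=200_000))  # True: fits easily
```

Real tokenizers vary by language and content (code often tokenizes differently than prose), so treat this as an estimate, not an exact count.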

The "Lost in the Middle" Problem

Research shows that models often struggle with information placed in the middle of very long contexts. They pay more attention to the beginning and end.

Best Practice: Put your most important context at the beginning of your prompt.
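One way to apply this in practice is to assemble prompts so the high-priority material always leads. The `build_prompt` helper and its section headings below are a hypothetical sketch, not any tool's API:

```python
def build_prompt(task: str, key_files: list[str], background: list[str]) -> str:
    """Order prompt sections to work around the 'lost in the middle'
    effect: the task and key files lead, lower-priority background
    material goes last."""
    sections = [
        "## Task\n" + task,
        "## Key files\n" + "\n".join(key_files),
        "## Background (lower priority)\n" + "\n".join(background),
    ]
    return "\n\n".join(sections)

prompt = build_prompt(
    task="Fix the off-by-one error in paginate()",
    key_files=["pagination.py: def paginate(items, page, size): ..."],
    background=["Style guide: use snake_case", "CI runs on Python 3.12"],
)
print(prompt.splitlines()[0])  # → ## Task
```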

Understanding Tokens

Tokens are the fundamental units that LLMs process. A token is approximately:

  • 4 characters in English
  • 0.75 words

Token Examples

Text | Approximate Tokens
"function" | 1 token
"getUserData" | 2-3 tokens
1,000 words | ~1,333 tokens
Typical React component | 500-2,000 tokens

Why Tokens Matter

  1. Cost: API pricing is per token (input and output separately)
  2. Limits: Context windows are measured in tokens
  3. Speed: More tokens = longer generation time
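Putting cost and token counts together, here is a simple cost estimator. The $3 and $15 per-million-token rates are placeholders, not any provider's actual pricing; check your provider's current rate card:

```python
def estimate_cost(input_tokens: int, output_tokens: int,
                  input_price_per_m: float, output_price_per_m: float) -> float:
    """API cost in dollars. Input and output tokens are priced
    separately; prices are quoted per million tokens."""
    return (input_tokens * input_price_per_m
            + output_tokens * output_price_per_m) / 1_000_000

# Hypothetical rates: $3 per 1M input tokens, $15 per 1M output tokens.
# A request with a 50K-token prompt and a 2K-token response:
cost = estimate_cost(50_000, 2_000, 3.0, 15.0)
print(f"${cost:.2f}")  # → $0.18
```

Note how a large context is cheap relative to generated output: output tokens typically cost several times more per token than input tokens.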

Choosing the Right Model

For most daily coding tasks, a balanced model like Claude Sonnet 4.6 or GPT-5.4 works well. Here's when to use different tiers:

Use Case | Recommended Model
Quick code completions | Haiku 4.5 / GPT-5-mini
Daily development | Sonnet 4.6 / GPT-5.4
Complex architecture | Opus 4.6 / GPT-5.4 Thinking
Large codebases | Gemini 3.1 Pro (2M context)
Extended reasoning | Opus 4.6 / GPT-5.4 Thinking
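If your tool lets you select models programmatically, the tiers above can become a simple routing map. The model ID strings below are illustrative stand-ins for whatever identifiers your tool actually accepts:

```python
# Map task categories to model tiers (IDs here are hypothetical;
# substitute the identifiers your tool or API exposes).
MODEL_FOR_TASK = {
    "completion":    "claude-haiku-4.5",
    "daily":         "claude-sonnet-4.6",
    "architecture":  "claude-opus-4.6",
    "large-context": "gemini-3.1-pro",
}

def pick_model(task_type: str) -> str:
    """Route a task to a model tier, falling back to the balanced
    daily-driver tier for anything unrecognized."""
    return MODEL_FOR_TASK.get(task_type, MODEL_FOR_TASK["daily"])

print(pick_model("architecture"))
print(pick_model("quick-question"))  # unknown type falls back to daily
```

The fallback choice reflects the guidance above: when in doubt, a balanced mid-tier model is usually the right default.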

Understanding "Reasoning" Models

Modern AI includes reasoning models (like GPT-5.4 Thinking and Claude with extended thinking) that "think through" problems before responding. These models:

  • Don't need explicit "think step by step" prompts—they reason internally
  • Are better for complex multi-step problems
  • Take longer but produce more accurate results
  • Cost more per request

Key Benchmarks

When comparing models, these benchmarks help measure coding ability:

Benchmark | What It Tests | Top Scores (March 2026)
HumanEval | Basic function generation | 99% (top models)
SWE-Bench Verified | Real GitHub issue fixing | 80.8% (Claude Opus 4.6)
SWE-Bench Pro | Harder real-world tasks | ~25% (shows realistic limitations)
WebDev Arena | Web application building | v0 leads

Current SWE-Bench Leaders:

  1. Claude Opus 4.6 — 80.8%
  2. Claude Sonnet 4.6 — 79.6%
  3. GPT-5.4 Thinking — ~78%

Important: Benchmark scores don't always reflect your specific use case. The best model depends on your workflow and the types of tasks you do most often.

Summary

  • LLMs generate code by predicting the most likely next token based on patterns learned during training
  • Major model families include Claude (Anthropic), GPT (OpenAI), and Gemini (Google)
  • Context windows determine how much code the model can "see" at once
  • Tokens are the fundamental units—roughly 4 characters each
  • Different models excel at different tasks; most tools let you choose

Next Steps

Now that you understand how AI models work, let's learn how to communicate with them effectively through prompt engineering.
