Beginner · 20 min

Learn how LLMs work, explore the major model families (Claude, GPT, Gemini), and understand context windows and tokens.

Understanding AI Coding Models

Behind every AI coding tool is a Large Language Model (LLM). Understanding how these models work will help you use them more effectively and set realistic expectations.

What Are Large Language Models?

Large Language Models are artificial neural networks trained on massive amounts of text data—including millions of lines of code. They're built on the Transformer architecture, which uses a mechanism called "self-attention" to understand relationships between different parts of text.

How LLMs Generate Code

When you ask an AI tool to write code, here's what happens:

  1. Tokenization: Your prompt is broken into smaller units called tokens (roughly 4 characters each)
  2. Context Analysis: The model processes all tokens, understanding relationships between them
  3. Next-Token Prediction: Based on patterns learned during training, it predicts the most likely next token
  4. Iterative Generation: This process repeats token-by-token until the response is complete

Think of it like a very sophisticated autocomplete—but one that understands programming patterns, syntax rules, and even coding conventions.
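The four-step loop above can be sketched as a toy program. The vocabulary and "learned" probabilities here are invented purely for illustration; a real LLM has a vocabulary of ~100K tokens and billions of parameters, but the generation loop has the same shape:

```python
# Toy next-token predictor: a hand-written probability table standing in
# for a trained model. This only illustrates the iterative loop, not
# how a real Transformer computes these probabilities.
NEXT_TOKEN_PROBS = {
    "def": {"add": 0.6, "main": 0.4},
    "add": {"(": 1.0},
    "(":   {"a": 1.0},
    "a":   {",": 0.5, ")": 0.5},
    ",":   {"b": 1.0},
    "b":   {")": 1.0},
    ")":   {":": 1.0},
    ":":   {"<end>": 1.0},
    "main": {"(": 1.0},
}

def generate(prompt_tokens, max_tokens=10):
    tokens = list(prompt_tokens)
    for _ in range(max_tokens):
        dist = NEXT_TOKEN_PROBS.get(tokens[-1], {"<end>": 1.0})
        # Greedy decoding: always pick the most likely next token.
        next_tok = max(dist, key=dist.get)
        if next_tok == "<end>":
            break
        tokens.append(next_tok)
    return tokens

print(" ".join(generate(["def"])))  # → def add ( a , b ) :
```

Real tools sample from the probability distribution rather than always taking the maximum, which is why the same prompt can produce different completions.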

Major Model Families

Claude (Anthropic)

Claude models power Claude Code and are widely used in Cursor, Lovable, and other tools.

Model | Best For | Context Window
Claude Opus 4.6 | Complex architecture, deep reasoning, 14.5-hour task horizon | 200K tokens
Claude Sonnet 4.6 | Daily coding tasks, balanced performance, 40% cheaper than Opus | 200K tokens
Claude Haiku 4.5 | Quick completions, simple tasks, cost-efficient | 200K tokens

Key Strengths:

  • Excellent at following complex instructions
  • Strong code comprehension and reasoning
  • 80.8% on SWE-Bench (highest score, February 2026)
  • Extended thinking capabilities for complex problems

GPT (OpenAI)

OpenAI's models power GitHub Copilot, OpenAI Codex, and are available in Cursor and other tools.

Model | Best For | Context Window
GPT-5.4 | Current flagship, native computer-use | 1M tokens
GPT-5.4 Thinking | Complex reasoning with extended thinking | 1M tokens
GPT-5-mini | Cost-optimized, fast responses | 128K tokens

Key Strengths:

  • Native computer-use capabilities
  • Multiple reasoning levels (from fast to deep thinking)
  • Excellent multimodal capabilities (code + images)
  • Powers the new OpenAI Codex CLI and app

Note: GPT-4 series (GPT-4 Turbo, GPT-4o, GPT-4.1) has been retired from ChatGPT and replaced by the GPT-5 series.

Gemini (Google)

Google's Gemini models excel at web development and visual tasks.

Model | Best For | Context Window
Gemini 3.1 Pro | Latest flagship, advanced reasoning | 2M tokens
Gemini 3 Flash | High-volume, cost-sensitive tasks | 1M tokens
Gemini 2.5 Pro | Still available, proven reliability | 2M tokens

Key Strengths:

  • Huge context windows (up to 2M tokens)
  • Excellent at visually compelling web apps
  • Strong performance on web development benchmarks
  • Powers Google Jules autonomous coding agent

Note: Gemini 3 Pro Preview was deprecated on March 9, 2026. Use Gemini 3.1 Pro Preview for the latest features.

Which Tools Use Which Models?

Tool | Default Model | Other Options
Cursor | Composer (custom model) | Claude, GPT-5, Gemini
Claude Code | Claude Sonnet 4.6 | Opus 4.6, Haiku 4.5
GitHub Copilot | GPT-5.4 | Claude Opus 4.5, Gemini 3 Flash
Windsurf | Windsurf SWE | GPT-5.4, Claude 4, Gemini 3
OpenAI Codex | GPT-5.4 | GPT-5.4 Thinking
Lovable | Gemini 3 Flash | Claude 4, GPT-5.2
Bolt.new | Claude Opus 4.6 | Claude Sonnet 4.6
Google Jules | Gemini 3 Pro | (none listed)

Understanding Context Windows

A context window is the maximum amount of text a model can process at once. Think of it as the model's working memory.

What Fits in the Context Window?

Everything you send and receive must fit:

  • Your prompt and instructions
  • Any code files you include
  • The conversation history
  • The model's response

Context Window Sizes

Size | Approximate Content
8K tokens | ~6,000 words (short document)
32K tokens | ~24,000 words (novella)
128K tokens | ~96,000 words (300+ pages)
200K tokens | ~150,000 words (entire codebase)
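Assuming the rough 4-characters-per-token rule, you can do a quick back-of-envelope check of whether a set of files fits a given window. The function names and the 4,000-token response budget below are illustrative choices, not any tool's API:

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate using the ~4 characters/token heuristic."""
    return max(1, len(text) // 4)

def fits_in_context(files: dict[str, str], context_window: int,
                    reserved_for_response: int = 4_000) -> bool:
    """Check whether the files plus a budget for the model's response
    fit inside the context window. Everything must fit: prompt, files,
    history, and output all share the same window."""
    used = sum(estimate_tokens(src) for src in files.values())
    return used + reserved_for_response <= context_window

files = {"app.py": "x" * 40_000, "utils.py": "x" * 8_000}  # ~12K tokens
print(fits_in_context(files, context_window=8_000))    # False: too big for 8K
print(fits_in_context(files, context_window=200_000))  # True: fits easily
```

Real tokenizers vary by language and content (code often tokenizes differently than prose), so treat this as an estimate, not an exact count.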

The "Lost in the Middle" Problem

Research shows that models often struggle with information placed in the middle of very long contexts. They pay more attention to the beginning and end.

Best Practice: Put your most important context at the beginning of your prompt.
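One way to apply this in practice is to assemble prompts so the high-priority material always leads. The `build_prompt` helper and its section headings below are a hypothetical sketch, not any tool's API:

```python
def build_prompt(task: str, key_files: list[str], background: list[str]) -> str:
    """Order prompt sections to work around the 'lost in the middle'
    effect: the task and key files lead, lower-priority background
    material goes last."""
    sections = [
        "## Task\n" + task,
        "## Key files\n" + "\n".join(key_files),
        "## Background (lower priority)\n" + "\n".join(background),
    ]
    return "\n\n".join(sections)

prompt = build_prompt(
    task="Fix the off-by-one error in paginate()",
    key_files=["pagination.py: def paginate(items, page, size): ..."],
    background=["Style guide: use snake_case", "CI runs on Python 3.12"],
)
print(prompt.splitlines()[0])  # → ## Task
```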

Understanding Tokens

Tokens are the fundamental units that LLMs process. A token is approximately:

  • 4 characters in English
  • 0.75 words

Token Examples

Text | Approximate Tokens
"function" | 1 token
"getUserData" | 2-3 tokens
1,000 words | ~1,333 tokens
Typical React component | 500-2,000 tokens

Why Tokens Matter

  1. Cost: API pricing is per token (input and output separately)
  2. Limits: Context windows are measured in tokens
  3. Speed: More tokens = longer generation time
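Putting cost and token counts together, here is a simple cost estimator. The $3 and $15 per-million-token rates are placeholders, not any provider's actual pricing; check your provider's current rate card:

```python
def estimate_cost(input_tokens: int, output_tokens: int,
                  input_price_per_m: float, output_price_per_m: float) -> float:
    """API cost in dollars. Input and output tokens are priced
    separately; prices are quoted per million tokens."""
    return (input_tokens * input_price_per_m
            + output_tokens * output_price_per_m) / 1_000_000

# Hypothetical rates: $3 per 1M input tokens, $15 per 1M output tokens.
# A request with a 50K-token prompt and a 2K-token response:
cost = estimate_cost(50_000, 2_000, 3.0, 15.0)
print(f"${cost:.2f}")  # → $0.18
```

Note how a large context is cheap relative to generated output: output tokens typically cost several times more per token than input tokens.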

Choosing the Right Model

For most daily coding tasks, a balanced model like Claude Sonnet 4.6 or GPT-5.4 works well. Here's when to use different tiers:

Use Case | Recommended Model
Quick code completions | Haiku 4.5 / GPT-5-mini
Daily development | Sonnet 4.6 / GPT-5.4
Complex architecture | Opus 4.6 / GPT-5.4 Thinking
Large codebases | Gemini 3.1 Pro (2M context)
Extended reasoning | Opus 4.6 / GPT-5.4 Thinking
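If your tool lets you select models programmatically, the tiers above can become a simple routing map. The model ID strings below are illustrative stand-ins for whatever identifiers your tool actually accepts:

```python
# Map task categories to model tiers (IDs here are hypothetical;
# substitute the identifiers your tool or API exposes).
MODEL_FOR_TASK = {
    "completion":    "claude-haiku-4.5",
    "daily":         "claude-sonnet-4.6",
    "architecture":  "claude-opus-4.6",
    "large-context": "gemini-3.1-pro",
}

def pick_model(task_type: str) -> str:
    """Route a task to a model tier, falling back to the balanced
    daily-driver tier for anything unrecognized."""
    return MODEL_FOR_TASK.get(task_type, MODEL_FOR_TASK["daily"])

print(pick_model("architecture"))
print(pick_model("quick-question"))  # unknown type falls back to daily
```

The fallback choice reflects the guidance above: when in doubt, a balanced mid-tier model is usually the right default.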

Understanding "Reasoning" Models

Modern AI includes reasoning models (like GPT-5.4 Thinking and Claude with extended thinking) that "think through" problems before responding. These models:

  • Don't need explicit "think step by step" prompts—they reason internally
  • Are better for complex multi-step problems
  • Take longer but produce more accurate results
  • Cost more per request

Key Benchmarks

When comparing models, these benchmarks help measure coding ability:

Benchmark | What It Tests | Top Scores (March 2026)
HumanEval | Basic function generation | 99% (top models)
SWE-Bench Verified | Real GitHub issue fixing | 80.8% (Claude Opus 4.6)
SWE-Bench Pro | Harder real-world tasks | ~25% (shows realistic limitations)
WebDev Arena | Web application building | v0 leads

Current SWE-Bench Leaders:

  1. Claude Opus 4.6 — 80.8%
  2. Claude Sonnet 4.6 — 79.6%
  3. GPT-5.4 Thinking — ~78%

Important: Benchmark scores don't always reflect your specific use case. The best model depends on your workflow and the types of tasks you do most often.

Summary

  • LLMs generate code by predicting the most likely next token based on patterns learned during training
  • Major model families include Claude (Anthropic), GPT (OpenAI), and Gemini (Google)
  • Context windows determine how much code the model can "see" at once
  • Tokens are the fundamental units—roughly 4 characters each
  • Different models excel at different tasks; most tools let you choose

Next Steps

Now that you understand how AI models work, let's learn how to communicate with them effectively through prompt engineering.
