AI Models
A model is the reasoning engine at the heart of every AI system. It is the part that reads your input (text, images, code, documents) and generates a response. Everything else (tools, memory, harnesses) is infrastructure built around the model.
What a model actually does
A language model is trained on enormous amounts of text: books, websites, code, conversations. Through this training it learns patterns, facts, reasoning strategies, and language structure.
When you send it a message, it does not look up a pre-stored answer. It generates a statistically likely continuation of your input, token by token, based on what it learned during training. Remarkably, this simple process produces useful reasoning, writing, and code at scale.
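To make "token by token" concrete, here is a toy sketch. The probability table is invented for illustration, and it conditions only on the previous token; a real model computes probabilities with a neural network over the whole context and usually samples rather than always taking the top choice.

```python
# Hypothetical next-token probabilities, keyed by the previous token.
# A real model derives these from the entire context, not a lookup table.
NEXT_TOKEN_PROBS = {
    "<start>": {"the": 0.6, "a": 0.4},
    "the": {"model": 0.7, "cat": 0.3},
    "a": {"model": 0.5, "cat": 0.5},
    "model": {"generates": 0.8, "<end>": 0.2},
    "cat": {"sleeps": 0.9, "<end>": 0.1},
    "generates": {"text": 0.9, "<end>": 0.1},
    "sleeps": {"<end>": 1.0},
    "text": {"<end>": 1.0},
}


def generate(max_tokens: int = 10) -> list[str]:
    """Greedy decoding: repeatedly append the most likely next token."""
    tokens = ["<start>"]
    for _ in range(max_tokens):
        candidates = NEXT_TOKEN_PROBS[tokens[-1]]
        next_token = max(candidates, key=candidates.get)
        if next_token == "<end>":
            break  # the "model" decided the reply is complete
        tokens.append(next_token)
    return tokens[1:]  # drop the <start> marker


print(" ".join(generate()))  # prints "the model generates text"
```

Nothing in the loop retrieves a stored answer: each step only asks which token is likely to come next given what has been produced so far.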
Modern frontier models also support multimodality: depending on the model, they can process images, PDFs, audio, and even video alongside text.
Key properties to understand
Context window
How much text the model can process at once: your prompt, its reply, any attached documents, tool outputs, and conversation history. Measured in tokens (1 token ≈ ¾ of a word, or ~4 characters).
With a small context window, the model loses earlier parts of a long conversation. With a large context window (100k+ tokens), it can process a sizable codebase or a lengthy contract in one pass.
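The 4-characters-per-token rule of thumb is enough for a quick budget check. The sketch below is an approximation only: the function names and the 1,000-token reply reserve are assumptions for illustration, and a real tokenizer will give different counts depending on the model and language.

```python
import math

CHARS_PER_TOKEN = 4  # rule of thumb; real tokenizers vary by model and language


def estimate_tokens(text: str) -> int:
    """Rough token count from character length (not a real tokenizer)."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


def fits_in_context(parts: list[str], context_window: int,
                    reserved_for_reply: int = 1000) -> bool:
    """Check whether all inputs plus a reply budget fit in the window."""
    used = sum(estimate_tokens(part) for part in parts)
    return used + reserved_for_reply <= context_window


# A 400,000-character document is roughly 100,000 tokens.
document = "x" * 400_000
print(estimate_tokens(document))             # 100000
print(fits_in_context([document], 128_000))  # True
print(fits_in_context([document], 32_000))   # False
```

Remember that the prompt, conversation history, tool outputs, and the reply all share the same window, which is why the check reserves room for the reply.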
Reasoning capability
Not all models are equally capable at multi-step reasoning, catching logical errors, handling ambiguity, or following complex instructions. More capable models tend to be slower and more expensive, but for complex tasks the quality difference often justifies the cost.
Speed and cost
Smaller, faster models can cost 10–100x less per token than frontier models. For high-volume, simple tasks (classifying support tickets, extracting structured data from forms, generating short summaries), a smaller model often works just as well at a fraction of the cost.
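A quick back-of-the-envelope calculation shows how the per-token price gap adds up at volume. The prices below are hypothetical placeholders, not any provider's actual rates; substitute current pricing before drawing conclusions.

```python
def monthly_cost(requests: int, tokens_per_request: int,
                 price_per_million_tokens: float) -> float:
    """Total monthly spend in dollars for a given volume and price."""
    total_tokens = requests * tokens_per_request
    return total_tokens / 1_000_000 * price_per_million_tokens


# Hypothetical prices in dollars per million tokens (assumptions, not real rates).
FRONTIER_PRICE = 10.00
SMALL_PRICE = 0.50

# Example workload: one million support tickets, ~500 tokens each.
frontier = monthly_cost(1_000_000, 500, FRONTIER_PRICE)
small = monthly_cost(1_000_000, 500, SMALL_PRICE)

print(f"Frontier model: ${frontier:,.2f}")  # Frontier model: $5,000.00
print(f"Small model:    ${small:,.2f}")     # Small model:    $250.00
```

At low volume the difference is negligible; at millions of requests it can decide whether a feature is economical at all.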
Instruction following
How reliably the model does exactly what you ask. This varies significantly across models and is often more important than raw reasoning capability for production use cases. A model that consistently follows a detailed system prompt is more useful than a more intelligent model that goes off-script.
Safety and refusals
How the model handles sensitive, ambiguous, or potentially harmful requests. Models vary significantly in their calibration: some refuse too aggressively (blocking legitimate use cases), while others are too permissive. For agents taking real-world actions, this matters.
The major models (2025–2026)
| Model | Made by | Context | Best for |
|---|---|---|---|
| Claude Opus 4 / Sonnet 4 | Anthropic | 200k tokens | Complex reasoning, long documents, nuanced instruction following, safety-critical applications |
| GPT-4o | OpenAI | 128k tokens | General purpose, strong tool use, multimodal (images, audio) |
| o3 / o4-mini | OpenAI | 128k tokens | Deep reasoning tasks, math, competitive coding |
| Gemini 1.5 / 2.0 | Google | 1M tokens | Very long documents, multimodal, Google Workspace integration |
| Llama 3.3 / 4 | Meta | 128k tokens | Self-hosted deployment, fine-tuning, cost control, no data leaving your infrastructure |
| Mistral Large | Mistral AI | 128k tokens | European data compliance, efficient inference, strong multilingual support |
| DeepSeek V3 / R1 | DeepSeek | 128k tokens | Strong reasoning, very low cost, competitive with frontier models on benchmarks |
Starting out? Use Claude for most tasks. It follows complex, nuanced instructions reliably, handles long documents well, and is designed with safety in mind. Switch models only when you have a specific reason, not because another model scored better on a benchmark for a task unrelated to yours.
Choosing the right model for your task
Different tasks call for different models. Here is a practical decision guide:
Use a frontier model (Claude Opus, GPT-4o, Gemini 2.0) when:
- The task requires complex multi-step reasoning
- Mistakes are costly or hard to catch
- You are processing long, unstructured documents
- Nuanced instruction following is critical
- You are still in development and optimizing for quality
Use a fast/small model (Claude Haiku, GPT-4o-mini, Gemini Flash) when:
- You have high volume and cost matters
- The task is well-defined and relatively simple (classification, extraction, short summaries)
- Latency matters, as in real-time user interactions
- You can validate quality with a test set before deploying
Use an open-source model (Llama, Mistral) when:
- Data cannot leave your infrastructure
- You need to fine-tune on proprietary data
- You have the infrastructure to run it
- Cost at scale is a primary constraint
Use a reasoning model (o3, DeepSeek R1) when:
- The task is a hard reasoning problem: math, logic, competitive programming
- You need the model to show its work and check itself
- You can tolerate higher latency (reasoning models think longer before answering)
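The decision guide above can be sketched as a small routing function. The `Task` fields, the tier names, and especially the order in which the rules are checked are assumptions made for this illustration; the guide itself does not rank the criteria, so adjust the priorities to your own constraints.

```python
from dataclasses import dataclass


@dataclass
class Task:
    """Hypothetical description of a workload, mirroring the guide's criteria."""
    data_must_stay_on_prem: bool = False
    needs_fine_tuning: bool = False
    hard_reasoning: bool = False          # math, logic, competitive programming
    latency_tolerant: bool = True
    complex_or_costly_mistakes: bool = False
    high_volume: bool = False


def choose_model_tier(task: Task) -> str:
    """Return a model tier; the rule order is an assumed priority."""
    # Hard constraints first: data residency and fine-tuning needs.
    if task.data_must_stay_on_prem or task.needs_fine_tuning:
        return "open-source"
    # Reasoning models think longer, so they need latency headroom.
    if task.hard_reasoning:
        return "reasoning" if task.latency_tolerant else "frontier"
    if task.complex_or_costly_mistakes:
        return "frontier"
    if task.high_volume or not task.latency_tolerant:
        return "small"
    # Default while still in development and optimizing for quality.
    return "frontier"


print(choose_model_tier(Task(high_volume=True)))             # small
print(choose_model_tier(Task(hard_reasoning=True)))          # reasoning
print(choose_model_tier(Task(data_must_stay_on_prem=True)))  # open-source
```

Whatever the function returns, treat it as a starting point and confirm the choice against a test set drawn from your own task.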
Running models: your options
| Option | Example | Control | Cost | Setup |
|---|---|---|---|---|
| API (hosted) | Anthropic API, OpenAI API | Low | Pay per token | Minutes |
| API (third-party) | Together AI, Groq, Fireworks | Medium | Often cheaper | Minutes |
| Self-hosted | Ollama, vLLM with Llama | High | Infrastructure cost | Hours/days |
| Fine-tuned | Your model on OpenAI or Replicate | Very high | Higher | Days/weeks |
For most use cases, start with a hosted API. Self-hosting only makes sense when you have a genuine data sovereignty requirement or are running at a scale where the infrastructure savings outweigh the operational complexity.
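One rough way to judge the scale question is a break-even estimate: the monthly token volume at which fixed self-hosting costs are paid back by cheaper tokens. The function and every number below are hypothetical, and the fixed cost should include operations effort, not just hardware.

```python
def breakeven_tokens_per_month(api_price_per_million: float,
                               self_hosted_fixed_monthly: float,
                               self_hosted_price_per_million: float = 0.0) -> float:
    """Monthly token volume above which self-hosting is cheaper than the API."""
    saving_per_million = api_price_per_million - self_hosted_price_per_million
    if saving_per_million <= 0:
        return float("inf")  # self-hosting never pays off on cost alone
    return self_hosted_fixed_monthly / saving_per_million * 1_000_000


# Assumed figures: $2 per million tokens via API, $8,000 per month for GPUs and upkeep.
volume = breakeven_tokens_per_month(2.00, 8_000.00)
print(f"Break-even at {volume:,.0f} tokens per month")
# Break-even at 4,000,000,000 tokens per month
```

Below the break-even volume, the hosted API is cheaper even before counting the operational complexity of running your own infrastructure.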
Models vs. agents
A model alone is like a very capable person sitting in a room with no phone, no computer, no pen: just their mind. They can reason and advise brilliantly, but they cannot do anything in the world.
An agent gives the model hands. It can browse the web, run code, send emails, update databases. The model is the brain. The agent is the entire system built around that brain.
Next: What are agents? →