AI Engineering Glossary

A friendly glossary of the jargon you'll hit in Anthropic's "Building effective agents" essay and most other AI engineering reading.

How to use this: skim once. Don't memorize. When you hit a confusing word in the essay, come back here, read the entry, return to the essay. After a week of building, most of these will feel obvious.

The absolute basics

LLM (Large Language Model) A neural network trained on huge amounts of text. It generates text by predicting one token at a time. Examples: Claude, GPT, Gemini. When people say "the model" in AI engineering, they usually mean an LLM.

Token The chunks of text an LLM reads and writes — roughly ¾ of a word in English. "Hello world" is about 2 tokens. Pricing and context limits are measured in tokens.

Prompt The text you send to an LLM. Includes any instructions, examples, and the actual question or task.

Completion / response / output The text the LLM generates back. Same thing, three names.

Inference A fancy word for "running the model to get a response." When you "do inference" you're sending a prompt and getting back a completion.

Context / context window The total amount of text (input + output) the LLM can hold in its working memory at once, measured in tokens. Modern models have 100,000+ token windows.

System prompt A special part of the prompt that frames the model's role, persona, or rules. Higher priority than regular user messages. CLAUDE.md and .github/copilot-instructions.md are stored versions of system prompts.

User message vs assistant message In multi-turn conversations: "user" messages are what you say; "assistant" messages are what the model says. The model sees the whole history each turn.

Patterns and techniques

Prompt engineering The craft of writing prompts that reliably produce good output. Includes phrasing, structure, examples, formatting.

Zero-shot Asking the model to do something without giving examples. Just instructions.

Few-shot Giving the model 2–5 examples of input-output pairs before the real question. Dramatically improves consistency for many tasks.

Chain-of-thought (CoT) Asking the model to "think step by step" before answering. The model writes its reasoning out, then gives the answer. Improves performance on complex tasks.

Structured output Forcing the model to return data in a specific format (JSON with specific fields, for example). Critical for downstream automation.

Hallucination When a model confidently produces something false because it sounds plausible. The model isn't lying — it's pattern-matching wrong. RAG and tool use reduce hallucinations.

Grounding Giving the model real source material (documents, databases, search results) to base its answer on, instead of relying on its training data. Reduces hallucinations.

RAG and retrieval

RAG (Retrieval-Augmented Generation) A pattern: before the model answers, search a database for relevant documents and stuff them into the prompt. The model then answers based on those documents instead of just its training. This is how AI systems "know" your company's data.

Embedding A list of numbers (a vector) that represents the meaning of a piece of text. Two pieces of text with similar meanings have embeddings that are close to each other in number-space.

Vector / vector store / vector database A database that stores embeddings and lets you search for similar ones quickly. Examples: pgvector (Postgres extension), Pinecone, Turbopuffer.

Semantic search Searching by meaning rather than by exact keyword. Built on embeddings.

Chunking Splitting long documents into smaller pieces before embedding them. The chunk size and strategy matters a lot for retrieval quality.

Top-K retrieval Searching the vector store and returning the top K most similar results (e.g. top 3 or top 5).

Re-ranking After retrieving the top K, using a more accurate (and slower) model to re-order them. Improves quality at the cost of latency.

Hybrid search Combining vector (semantic) search with traditional keyword search. Often better than either alone.

Tools and agents

Tool use / function calling / tool calling Giving the LLM access to functions it can ask your code to run (e.g. get_weather, read_file, search_database). The LLM doesn't run them itself — it outputs a request, your code runs it, and you feed the result back.

Tool Any function you've made available to the LLM. "Tools" can be simple (read a file) or complex (call an API, search the web).

Agent An LLM in a loop with tools, where the LLM observes results and decides what to do next, until it thinks the task is done. This is the central concept.

Workflow A predefined sequence of LLM calls (often using tools) where the steps are decided by the developer, not the LLM. The LLM follows a script.

Workflow vs agent (the key Anthropic distinction) A workflow has the path fixed; the LLM only fills in the steps. An agent has the path decided by the LLM each turn. Workflows are more predictable and cheaper. Agents are more flexible. Most "agent" projects are actually workflows in disguise — and that's often a good thing.

Multi-agent / agent orchestration Several specialized agents working together (e.g. analyzer agent → drafter agent → reviewer agent), passing work between each other.

Tool result What your code returns to the model after running a tool. The model uses this to decide its next move.

MCP (Model Context Protocol) A standard for plugging tools into agents. Anthropic published MCP in 2024 and it's become widely adopted. Lets you connect agents to external services (databases, GitHub, Slack, etc.) without reinventing the wheel. MCP connects agents to tools.

Orchestration layer The "nervous system" of an agent — the code that runs the Think→Act→Observe loop, manages memory/state, and decides when the model should reason vs. call a tool.

ReAct (Reason + Act) A prompting/orchestration pattern where the model alternates between reasoning steps and tool actions, feeding each observation back in. The workhorse loop behind most agents.

Context engineering The successor term to "prompt engineering": deliberately curating everything in the context window for each LM call — instructions, facts, tools, history, memory — so the model has just the right information. Most "AI doesn't work" cases are really wrong-context cases.

HITL (Human-in-the-Loop) A deliberate pause where a human approves or corrects the agent before it takes a significant or irreversible action. Both a tool (ask_for_confirmation()) and a design pattern.

A2A (Agent2Agent) An open standard for agents to discover and talk to each other. Each agent publishes an Agent Card (a JSON "business card" of its capabilities and endpoint). A2A connects agents to agents (vs. MCP, which connects agents to tools).

Common Anthropic-essay words

Augmented LLM Anthropic's term for an LLM with retrieval, tools, and memory. The basic building block before you have an agent.

Prompt chaining Breaking a task into multiple sequential LLM calls, where each call's output feeds the next. (e.g. summarize → translate → format.) This is a workflow pattern.

Routing Using one LLM call to classify the input, then routing it to a specialized prompt or sub-agent based on the classification. Workflow pattern.

Parallelization Running multiple LLM calls in parallel and combining results. Two flavors: sectioning (split work into pieces, do each, combine) and voting (run the same prompt N times, take the consensus).

Orchestrator-workers A central LLM ("orchestrator") plans the work and delegates pieces to other LLMs ("workers"). Then aggregates the results.

Evaluator-optimizer Two LLMs in a loop: one generates output, the other evaluates and gives feedback, the first revises. Like having a writer and an editor.

Autonomous agent An LLM that operates in an open-ended loop, deciding its own actions until it thinks the task is done. Higher freedom, higher risk, harder to predict.

Production engineering words

Eval / evaluation A test of an AI system's output quality. Crucial for knowing if your changes help or hurt.

Golden dataset / golden set A curated set of input-output pairs used as the source of truth for evals.

LLM-as-judge Using one LLM to evaluate another LLM's output against a rubric. The standard way to evaluate at scale.

Agent Ops (GenAIOps) DevOps/MLOps adapted for agents: define business KPIs, score quality with LLM-as-judge over a golden dataset, gate deploys on metrics, debug with traces, and feed human feedback back into evals.

Latency How long it takes to get a response. Big deal in production. Streaming helps.

Streaming The model returns its response token-by-token as it generates, rather than waiting for the full response. Better UX.

Throughput How many requests you can handle per second. Different concern from latency.

Cost / token spend Every API call costs money proportional to tokens used. Cost optimization is real engineering.

Caching / prompt caching Saving the result of expensive API calls (or parts of prompts) to avoid re-paying. Anthropic and OpenAI both have prompt caching features.

Rate limiting APIs limit how many calls per minute you can make. Production systems need to handle this gracefully.

Backoff / exponential backoff When an API call fails, wait a bit, retry. If it fails again, wait longer. Standard pattern.

Idempotency A property of operations: running the same operation twice has the same effect as running it once. Important when retries can happen.

Observability Being able to see what your system is doing in production. Logs, metrics, traces. Tools: Langfuse, OpenTelemetry.

Model families and tiers

Frontier model The most capable model from a given provider. Examples in 2026: Claude Opus, GPT-5, Gemini Ultra. More expensive, slower, smarter.

Smaller / cheaper / faster models Less capable but much cheaper and faster. Good for simple tasks, classification, formatting. Examples: Claude Haiku.

Model routing Using cheaper models for simple tasks, expensive ones for hard tasks. Saves money.

Fine-tuning Training a model further on your own data so it gets better at your specific task. Less common in 2026 than people think — usually prompting + RAG is enough.

Foundation model A general-purpose pre-trained LLM (like Claude or GPT). The starting point that gets fine-tuned or used directly.

Words you'll see in the Anthropic essay specifically

"Agentic systems" Their umbrella term for both workflows and agents.

"Prescribed paths" A workflow's fixed step sequence.

"Dynamic" or "open-ended" What an agent does — chooses its own steps.

"Building blocks" The basic patterns (augmented LLM, etc.) you compose into bigger systems.

"Composability" The ability to combine simple patterns into more complex ones.

"Predictability" A virtue of workflows. You know what they'll do.

"Flexibility" A virtue of agents. They handle unexpected cases.

"Tradeoffs between latency, cost, and performance" The constant engineering tension. More LLM calls = better quality, more cost, more latency.

"Production" "Real, used by real users, with real consequences." As opposed to demos and prototypes.

Quick decoder for the essay's headings

When you read the Anthropic essay, expect these section names. Now you have a head start:

What are agents? — Their definition. The workflow vs agent distinction.
When (and when not) to use agents — Most cases don't need agents; simpler is usually better.
When and how to use frameworks — Their advice on LangChain et al.
Building blocks, workflows, and agents — The patterns:
- Augmented LLM (basic block)
- Prompt chaining (workflow)
- Routing (workflow)
- Parallelization (workflow)
- Orchestrator-workers (workflow)
- Evaluator-optimizer (workflow)
- Agents (when LLM controls the loop)
Combining and customizing these patterns — Mix and match.
Summary — Their recommendations.

How to read the essay now

With this glossary nearby, the essay should read more smoothly. A few tips:

Don't try to absorb everything. First pass is just to get the shape.
Notice the patterns. Each one solves a different kind of problem. You don't need to memorize the names — you'll come back to them.
The most important takeaway: start with the simplest thing that works. Don't reach for agents when a workflow will do. Don't reach for a workflow when one prompt will do.
The second most important: evals matter. Without them you're guessing.

When a word still confuses you after the glossary, just keep reading. Most of it clicks after you've actually built something.

After the essay: what to do

When you finish, take a 10-minute break. Then open building-an-agentic-system.md and start the build. The build will make the essay's concepts click in a way no amount of reading can.

AI Engineering Glossary

The absolute basics​

Patterns and techniques​

RAG and retrieval​

Tools and agents​

Common Anthropic-essay words​

Production engineering words​

Model families and tiers​

Words you'll see in the Anthropic essay specifically​

Quick decoder for the essay's headings​

How to read the essay now​

After the essay: what to do​