The Six Components of an AI-First Engineering Methodology
A reference for the core components that underpin modern AI-first engineering methodologies (such as the publicly documented Gen-e2 framework). Written for someone new to AI engineering.
What an AI-first methodology actually is
It's a methodology (a way of working), not a tool you install or a service you call. It's a coordinated approach to using an agent runtime (GitHub Copilot, Claude Code, and increasingly multi-agent systems) across an entire engineering team.
Strip away any particular branding and the methodology consists of six components:
| Component | What it is | Public learning path |
|---|---|---|
| 1. Custom Copilot Instructions | Markdown files that frame Copilot's behavior per repo | GitHub's official docs |
| 2. Reusable prompt libraries | Folders of saved prompts for recurring tasks | Anthropic Cookbook, OpenAI Cookbook |
| 3. Agentic workflows in markdown | Markdown files in .github/workflows/ that run agents on triggers | github.com/githubnext/agentics |
| 4. Multi-agent orchestration | Specialized agents working in sequence | Anthropic's "Building effective agents" essay |
| 5. Project context management | Discipline of deciding what context the agent needs | Eugene Yan, Hamel Husain, Simon Willison blogs |
| 6. Human-in-the-loop checkpoints | Designed approval gates for agent output | Standard agent design practice |
Each component is described in detail below.
Component 1: Custom Copilot Instructions
What it is
A markdown file at .github/copilot-instructions.md (or CLAUDE.md for Claude Code)
that tells the agent everything it should know about a repo: conventions, style,
domain context, common pitfalls, output format expectations. Loaded automatically
every time the agent operates in that folder.
Why it matters
Without this file, Copilot defaults to generic behavior: it works for any repo and fits no repo well. With this file, Copilot is tuned to the specific codebase. Same model, noticeably better output.
This is the single highest-leverage component of the methodology. Most of its value compresses into "write good instructions files."
Structure that works
# [Project Name]
## Role
[What persona should the agent adopt? Senior engineer? Reviewer? Junior pair?]
## Domain context
- [What does the codebase do?]
- [Key entities and concepts]
- [Database / framework / language specifics]
- [Anything unusual about this codebase]
## Style
- [Naming conventions]
- [Code formatting preferences]
- [Common patterns the codebase uses]
- [Patterns the codebase avoids]
## Workflow
[For complex tasks, the steps to follow]
## Constraints
- [What the agent should never do]
- [What the agent should always check]
## Output format
[How responses should be structured]
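One way to keep an instructions file from drifting is to check that it still contains the sections in the template above. The following is a minimal sketch, assuming the section names from that template; `REQUIRED_SECTIONS` and `missing_sections` are hypothetical names for illustration, not part of Copilot or Claude Code.

```python
import re

# Section names taken from the template above; adjust to your own file.
REQUIRED_SECTIONS = [
    "Role",
    "Domain context",
    "Style",
    "Workflow",
    "Constraints",
    "Output format",
]


def missing_sections(markdown_text: str) -> list[str]:
    """Return the required section names that have no '## ' heading."""
    found = {
        match.group(1).strip().lower()
        for match in re.finditer(
            r"^##[ \t]+(.+?)[ \t]*$", markdown_text, flags=re.MULTILINE
        )
    }
    return [name for name in REQUIRED_SECTIONS if name.lower() not in found]


# A deliberately incomplete sample file (hypothetical project).
sample = """# Billing Service
## Role
Senior backend engineer.
## Style
- snake_case for functions
## Constraints
- Never edit migrations by hand
"""

print(missing_sections(sample))
```

A check like this could run in CI so that a missing section is caught in review rather than discovered through poor agent output.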
What "good" looks like in practice
A bad instructions file says: "Write good Python code."
A good instructions file says:
## Style
- Python 3.11 only; no Python 2 patterns
- We use Pydantic v2 (not v1): `field_validator` and `model_validator`, not `validator` and `root_validator`
- Type hints are required on all function signatures
- Async functions throughout; never mix sync/async in the same module
- We use snake_case for variables and functions, PascalCase for classes
- Test files live alongside source files, suffixed with `_test.py`
- We use pytest, not unittest
## Common pitfalls
- Don't use `dict()` to spread Pydantic models; use `model_dump()`
- Don't import from internal modules across package boundaries; use the public `__init__.py`
- Async context managers in test fixtures must use `pytest_asyncio.fixture`
The difference is specificity. Generic advice is useless to an agent (and to a human). Specific conventions are gold.
Public learning path
- GitHub's docs: search "GitHub Copilot custom instructions" for the official documentation
- Anthropic's Claude Code docs: the CLAUDE.md file pattern is documented in Claude Code's quickstart
- Real examples: browse .github/copilot-instructions.md files in popular open-source repos that use Copilot
Component 2: Reusable prompt libraries
What it is
A folder (commonly named prompts/) of saved prompt files for recurring tasks.
Version-controlled. Composable. Each file is a focused instruction for one type of work.
Why it matters
Engineers tend to retype prompts every time. The good prompts are forgotten; the bad prompts are reinvented; quality drifts. A library captures what works and lets you reuse it.
Same impact as code reusability, applied to prompts.
Examples of useful prompts in a library
For a data-processing project:
- draft-script.md: the standard drafting prompt
- review-script.md: review a draft for safety
- extract-intent.md: for ambiguous tickets
- rollback-strategy.md: generate the rollback section
For a general engineering team:
- explain-this-code.md: explain a function or module
- propose-refactor.md: suggest a refactoring with rationale
- code-review.md: review a PR by a teammate
- write-unit-tests.md: generate tests for a function
- migrate-deprecated-api.md: find and migrate uses of a deprecated API
- summarize-pr.md: generate a PR description from a diff
What "good" looks like
Each prompt file has:
- A clear single purpose
- Context about how to behave (role, persona)
- Step-by-step workflow
- Output format
- A note on when to use it
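Treating prompts like reusable code can be sketched as a small loader that reads a file from the library and fills in its task-specific values. This is an illustration only: the `$placeholder` syntax (Python's `string.Template`), the `render_prompt` helper, and the sample prompt text are assumptions, not a convention of any particular tool.

```python
from pathlib import Path
from string import Template
import tempfile


def render_prompt(library_dir: Path, name: str, **values: str) -> str:
    """Load <library_dir>/<name>.md and fill its $placeholders."""
    text = (library_dir / f"{name}.md").read_text(encoding="utf-8")
    # substitute() raises KeyError when a placeholder is left unfilled,
    # which surfaces a stale or misused prompt early.
    return Template(text).substitute(**values)


# Demo with a throwaway library so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    library = Path(tmp)
    (library / "write-unit-tests.md").write_text(
        "Role: careful test author.\n"
        "Write $framework tests for the function `$function_name`.\n"
        "Output: a single test file.\n",
        encoding="utf-8",
    )
    rendered = render_prompt(
        library,
        "write-unit-tests",
        framework="pytest",
        function_name="parse_invoice",
    )
    print(rendered)
```

Because `string.Template` treats every `$` as a placeholder marker, prompts that contain literal dollar signs would need `$$` escaping or a different templating choice.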
Public learning path
- Anthropic Cookbook: github.com/anthropics/anthropic-cookbook (real, working prompts for common tasks)
- OpenAI Cookbook: github.com/openai/openai-cookbook
- Awesome Copilot: github.com/github/awesome-copilot (community-curated Copilot prompts)
- Practitioner blogs: Hamel Husain, Simon Willison post specific prompts they use
Component 3: Agentic workflows in markdown
What it is
A specific GitHub feature: markdown files placed in .github/workflows/ that describe
agent tasks in natural language, run by GitHub Actions on triggers (a new issue, a new
PR, a scheduled time, etc).
Format roughly:
---
on:
  issues:
    types: [opened]
permissions:
  contents: read
  issues: write
engine: copilot
---
# Triage New Issues
When a new issue is opened:
1. Read the issue title and body
2. Determine the appropriate label from the existing label list
3. Add the label to the issue
4. If the issue is a bug report, add a comment asking for reproduction steps if missing
The agent reads this, runs the steps, takes the actions.
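The file above has two parts: a configuration header between the `---` lines (trigger, permissions, engine) and a natural-language body that the agent follows. The sketch below only illustrates that split; `split_workflow` is a hypothetical helper and is not how GitHub's tooling processes these files.

```python
def split_workflow(text: str) -> tuple[str, str]:
    """Split a markdown workflow into (frontmatter, instructions)."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("workflow must start with a '---' frontmatter line")
    try:
        # Index of the closing '---' line, searching after the opening one.
        end = next(
            i for i, line in enumerate(lines[1:], start=1)
            if line.strip() == "---"
        )
    except StopIteration:
        raise ValueError("frontmatter is missing its closing '---'") from None
    frontmatter = "\n".join(lines[1:end])
    instructions = "\n".join(lines[end + 1:]).strip()
    return frontmatter, instructions


workflow = """---
on:
  issues:
    types: [opened]
permissions:
  contents: read
  issues: write
engine: copilot
---
# Triage New Issues
When a new issue is opened:
1. Read the issue title and body
"""

frontmatter, instructions = split_workflow(workflow)
print(frontmatter)
print(instructions)
```

Keeping the two parts distinct makes reviews easier: the header is where permissions are granted, so it deserves the same scrutiny as any other workflow configuration.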
Why it matters
This is how agents become background workers instead of just interactive chat tools. They run when something happens, without you having to invoke them.
For a team, agentic workflows can:
- Auto-triage incoming issues
- Auto-summarize merged PRs at end of day
- Auto-generate weekly status updates from repo activity
- Auto-flag PRs that change critical files
- Auto-draft responses to common bug reports
Public learning path
- GitHub's official docs on Agentic Workflows: search "GitHub Agentic Workflows"
- Awesome Copilot's agentic workflows learning hub: examples and templates
- github.com/githubnext/agentics: community-maintained sample workflows you can install
When to use it
Agentic workflows shine once a project has moved past being a purely interactive tool. Good candidates: "when a new ticket is created, auto-draft a starting point and post it as a comment," or "every Friday, summarize this week's tickets." Don't force it: use agentic workflows where they genuinely add value, not just to use the feature.
Component 4: Multi-agent orchestration
What it is
Using multiple specialized agents in coordination instead of one general agent. Each agent has a focused role, focused context, focused output.
A common split for a drafting pipeline:
- Analyzer agent β reads the input, extracts intent, identifies ambiguities
- Drafter agent β takes the intent + past examples, drafts a result
- Reviewer agent β checks the draft for safety and correctness
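The analyzer, drafter, reviewer split can be sketched as three functions chained in sequence. This is a minimal illustration under stated assumptions: each "agent" is a plain function standing in for a model call, and `run_pipeline`, `PipelineResult`, and the stub agents are hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable

# Each "agent" is just a function from text to text here; in a real system
# it would wrap a model call with its own instructions and context.
Agent = Callable[[str], str]


@dataclass
class PipelineResult:
    intent: str
    draft: str
    review: str
    approved: bool


def run_pipeline(
    ticket: str, analyzer: Agent, drafter: Agent, reviewer: Agent
) -> PipelineResult:
    intent = analyzer(ticket)  # stage 1: extract intent from the raw input
    draft = drafter(intent)    # stage 2: draft from the intent only
    review = reviewer(draft)   # stage 3: independent check of the draft
    return PipelineResult(intent, draft, review, approved=review.startswith("OK"))


# Stub agents (hypothetical stand-ins for model calls).
def stub_analyzer(ticket: str) -> str:
    return f"intent: {ticket.strip().lower()}"


def stub_drafter(intent: str) -> str:
    return f"Draft addressing {intent}"


def stub_reviewer(draft: str) -> str:
    return "OK: draft is non-empty" if draft.strip() else "REJECT: draft is empty"


result = run_pipeline("Close account 42", stub_analyzer, stub_drafter, stub_reviewer)
print(result.approved, result.review)
```

Because every stage's output is kept in the result, a bad outcome can be traced to the stage that produced it, which is the localization benefit this component relies on.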
Why it matters (and when it doesn't)
The benefit of multi-agent: each agent has focused context, which means more accurate output. Bugs are easier to localize ("the analyzer got it wrong" vs "the giant prompt got something wrong"). You can apply evals per agent.
The cost: more orchestration complexity, more tokens used, more chances for the chain to break.
Important nuance: there's an active debate in the field about whether multi-agent systems are usually better than a single well-prompted agent. Anthropic's "Building effective agents" essay argues for them when warranted. Cognition's "Don't build multi-agents" essay argues most teams overcomplicate. Read both; the disagreement is the lesson.
Start single-agent. Move to multi-agent only when you have evidence that a single well-prompted agent can't reach the quality you need. Don't add complexity speculatively.
Public learning path
- Anthropic's "Building effective agents": anthropic.com/engineering/building-effective-agents (the foundational essay)
- Cognition's "Don't build multi-agents": the counter-argument
- Microsoft's Semantic Kernel cookbook: examples of multi-agent in production
Component 5: Project context management
What it is
The discipline of deliberately deciding what context an agent needs to do its job, and what context it doesn't need.
This sounds simple. It isn't. The natural tendency is either to dump everything ("just give the agent the whole repo") or starve it ("just send the immediate question"). Both fail. The skill is curating the right context for the task.
Why it matters
Models perform markedly better with the right context. The wrong context leads to hallucinations, irrelevant output, or confused responses. Models also have token limits: too much context costs money and slows responses; too little misses critical information.
Most "AI doesn't work" experiences are actually "I gave the AI the wrong context" experiences.
What this looks like in practice
For a drafting agent over past examples:
- The agent needs access to past examples (otherwise it has no patterns to follow)
- The agent needs the new input (otherwise it has no task)
- The agent doesn't need the entire codebase (would be overwhelming and confuse it)
- The agent needs style conventions (in copilot-instructions.md)
- The agent might benefit from the schema of affected data (consider adding to instructions)
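Curating context can be sketched as ranking candidate items and keeping only what fits a token budget. This is a simplified illustration: the priorities, the budget, the sample items, and the 4-characters-per-token estimate are all assumptions, and `ContextItem`, `estimate_tokens`, and `build_context` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class ContextItem:
    label: str
    text: str
    priority: int  # lower number = more important


def estimate_tokens(text: str) -> int:
    # Crude assumption: roughly 4 characters per token. Use the model
    # provider's tokenizer when accuracy matters.
    return max(1, len(text) // 4)


def build_context(items: list[ContextItem], budget: int) -> list[ContextItem]:
    """Keep the most important items that fit within the token budget."""
    chosen: list[ContextItem] = []
    used = 0
    for item in sorted(items, key=lambda i: i.priority):
        cost = estimate_tokens(item.text)
        if used + cost <= budget:
            chosen.append(item)
            used += cost
    return chosen


items = [
    ContextItem("new input", "Ticket: archive inactive users", priority=0),
    ContextItem("style conventions", "snake_case; pytest; type hints", priority=1),
    ContextItem("past examples", "example " * 50, priority=2),
    ContextItem("entire codebase", "x" * 100_000, priority=9),
]
selected = build_context(items, budget=200)
print([item.label for item in selected])
```

With these numbers the input, conventions, and past examples fit, while the whole codebase is left out, which mirrors the list above.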
Public learning path
- Eugene Yan: eugeneyan.com/writing (read his posts on patterns for LLM systems)
- Hamel Husain: hamel.dev (read his series on building LLM products)
- Simon Willison: simonwillison.net (practitioner blog with frequent context-management posts)
- The emerging field of "context engineering": search "context engineering LLM"; this is becoming a recognized skill area
Component 6: Human-in-the-loop checkpoints
What it is
Designing where a human approves agent output, and what the agent is allowed to do without approval.
A simple rule for a drafting agent: the agent drafts, but never merges. The merge is the human checkpoint.
For other systems, checkpoints might be:
- Agent proposes a code change → human reviews PR → human merges
- Agent drafts an email → human reads and clicks send
- Agent identifies a problem → human approves the fix
- Agent runs in low-stakes environment freely → escalates to human for high-stakes decisions
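The last pattern, acting freely on low-stakes work and escalating high-stakes decisions, can be sketched as a gate in front of the action. This is a minimal illustration, assuming a two-level risk label assigned upstream; `Risk`, `ProposedAction`, `execute_with_gate`, and the stub reviewer are hypothetical names.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable


class Risk(Enum):
    LOW = "low"
    HIGH = "high"


@dataclass
class ProposedAction:
    description: str
    risk: Risk


def execute_with_gate(
    action: ProposedAction,
    run: Callable[[], str],
    ask_human: Callable[[ProposedAction], bool],
) -> str:
    """Run low-risk actions directly; require approval for high-risk ones."""
    if action.risk is Risk.HIGH and not ask_human(action):
        return f"blocked: {action.description}"
    return run()


# Stub reviewer that rejects everything, standing in for a real approval UI.
def always_reject(action: ProposedAction) -> bool:
    return False


draft = ProposedAction("post draft as a PR comment", Risk.LOW)
merge = ProposedAction("merge PR into main", Risk.HIGH)

print(execute_with_gate(draft, lambda: "comment posted", always_reject))
print(execute_with_gate(merge, lambda: "merged", always_reject))
```

The design choice worth noting is that the gate sits outside the agent: the agent can propose anything, but only the gate decides what runs without a human.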
Why it matters
Two reasons:
- Trust. Even when agents are right 95% of the time, that 5% can be catastrophic in production. A human checkpoint catches the 5%.
- Adoption. Engineers won't use a tool they don't trust. Visible approval gates build trust over time.
What "good" looks like
A well-designed human-in-the-loop system:
- Has the human reviewing the right things (not everything; not nothing)
- Makes the review fast and ergonomic (a 30-second yes/no, not a 30-minute deep dive)
- Gives the reviewer context (why did the agent make this choice? what did it consider?)
- Has clear escalation paths when the human isn't sure
Public learning path
This is mostly absorbed from practice and from blog posts on agent design. There's no canonical text. Look for:
- Cognition's blog on Devin's design choices
- Posts on PR-review-as-checkpoint patterns
- Discussions of "agentic workflows with approval gates"
Putting it together
These six components compose. A mature AI-first setup typically has:
- Custom instructions per repo (Component 1)
- A reusable prompt library (Component 2)
- Deliberate context management (Component 5)
- Human checkpoints at the right places (Component 6)
Agentic workflows (Component 3) and multi-agent orchestration (Component 4) are added later, only where they earn their complexity.
The branding around any particular methodology is unimportant. The underlying skills (specificity, reuse, context discipline, and well-placed human review) are what matter. Learn them gradually, one component at a time, by applying each to a real project.