The Anatomy of an Agent Skill: Markdown, Scripts, and Repeatable AI Workflows

Agent skills are a quiet but important shift in how teams work with coding agents.

Instead of pasting the same prompt, checklist, style guide, deployment procedure, or review rubric into every session, you package that knowledge once. The agent can then discover it, decide when it applies, and load the instructions only when needed.

Both OpenAI Codex and Claude Code now use this pattern. Their implementations differ in details, but the shared idea is simple: a skill is a small folder that teaches an agent how to do one job reliably.

The important part is not the file format. The important part is operational discipline. A good skill turns repeated instructions into a versioned, reviewable workflow.

The Basic Shape

At its core, a skill is a directory with a SKILL.md file.

The OpenAI Codex skills documentation describes a skill as a package of instructions, resources, and optional scripts. Codex follows the open Agent Skills standard, which defines the same basic structure: a folder containing SKILL.md, plus optional supporting files.

A typical skill looks like this:

my-skill/
├── SKILL.md
├── scripts/
│   └── validate.sh
├── references/
│   └── policy.md
└── assets/
    └── template.md

The exact optional folders vary by tool and team convention, but the pattern is stable:

SKILL.md contains metadata and operating instructions.
scripts/ contains deterministic helpers the agent can run.
references/ contains longer documentation the agent should read only when needed.
assets/ contains templates, examples, images, fixtures, or other resources.

That structure tells the agent three things: when to use the skill, what to do, and where to find deeper material.
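As a rough illustration of that layout, the following Python sketch reports which conventional parts a skill folder contains. The function name describe_skill_layout and the OPTIONAL_DIRS tuple are hypothetical helpers for this article, not part of Codex, Claude Code, or the Agent Skills standard; the folder names are simply the ones from the example tree.

```python
from pathlib import Path

# Folder names taken from the example tree above; tools and teams may use others.
OPTIONAL_DIRS = ("scripts", "references", "assets")


def describe_skill_layout(skill_dir):
    """Report which parts of the conventional skill layout are present."""
    skill_dir = Path(skill_dir)
    return {
        # SKILL.md is the only required file in the pattern described above.
        "has_skill_md": (skill_dir / "SKILL.md").is_file(),
        # Optional folders are reported in a fixed order for stable output.
        "optional_dirs": [d for d in OPTIONAL_DIRS if (skill_dir / d).is_dir()],
    }
```

A check like this could run in CI so that a folder without SKILL.md is caught before anyone expects an agent to discover it.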

The Front Door: Metadata

The first part of SKILL.md is usually YAML frontmatter.

In Codex, SKILL.md must include name and description. Codex uses those fields to list available skills and decide which ones may be relevant to the user's task. The Codex docs also note that skills are loaded with progressive disclosure: Codex initially sees the skill name, description, and file path; it reads the full SKILL.md only after selecting the skill.

A minimal Codex-style skill looks like this:

---
name: pr-review
description: Review a pull request for regressions, missing tests, risky changes, and unclear behavior. Use when the user asks for a code review or PR review.
---

Review the current changes with a focus on defects, behavioral regressions, and missing tests.
Return findings first, ordered by severity.
Include file and line references where possible.
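To make the split between frontmatter and body concrete, here is a minimal Python sketch that separates the two and checks for the fields Codex requires. It is not how any tool actually parses SKILL.md: split_frontmatter and missing_required are hypothetical names, and the parser handles only flat key: value lines rather than full YAML.

```python
def split_frontmatter(text):
    """Split SKILL.md text into (metadata, body).

    Handles only flat `key: value` lines, which is enough for name and
    description; a real loader would use a full YAML parser.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text  # no frontmatter at all
    meta = {}
    for i, line in enumerate(lines[1:], start=1):
        if line.strip() == "---":
            # Everything after the closing delimiter is the Markdown body.
            return meta, "\n".join(lines[i + 1:]).strip()
        key, sep, value = line.partition(":")
        if sep and key.strip():
            meta[key.strip()] = value.strip()
    return {}, text  # no closing delimiter: treat the file as plain Markdown


def missing_required(meta, required=("name", "description")):
    """Return the required fields that are absent or empty."""
    return [field for field in required if not meta.get(field)]
```

Because partition splits on the first colon only, a description that itself contains colons is kept whole.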

Claude Code follows the same broad idea, but its documented frontmatter is more expansive. The Claude Code skills documentation says a SKILL.md file has YAML frontmatter plus Markdown instructions. Claude recommends a description, uses the directory name as the command name when name is omitted, and supports additional fields such as when_to_use, allowed-tools, disable-model-invocation, user-invocable, argument-hint, arguments, context, agent, model, effort, paths, hooks, and shell.

The practical lesson: the description is the routing layer.

If the description is vague, the agent will load the skill at the wrong time or fail to load it when it matters. A good description is specific, front-loaded, and scoped.

Weak:

description: Helps with releases.

Better:

description: Prepare a release checklist, inspect changelog entries, verify version bumps, and identify release blockers. Use when the user asks to prepare or review a software release.

The better version tells the agent what the skill does and which requests should trigger it. It still leaves the boundary implicit; a stricter description would also say what the skill should not be used for.
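The gap between the weak and better descriptions can be approximated with a small lint check. The Python sketch below is only a heuristic: lint_description is a hypothetical helper, and both the eight-word threshold and the search for the phrase "Use when" are assumptions made for illustration, not rules from any tool.

```python
def lint_description(description, min_words=8):
    """Return human-readable warnings for a skill description.

    The checks are illustrative heuristics, not rules from any tool.
    """
    warnings = []
    text = description.strip()
    # Assumed threshold: very short descriptions rarely say what the skill does.
    if len(text.split()) < min_words:
        warnings.append("too short to say what the skill does")
    # Assumed convention: an explicit trigger phrase helps routing.
    if "use when" not in text.lower():
        warnings.append("no explicit trigger such as 'Use when ...'")
    return warnings
```

Run against the two examples above, the weak description produces both warnings and the better one produces none. A check this crude cannot judge whether a description is accurate, only whether it has the expected shape.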

The Body: Markdown as Procedure

After the frontmatter, the body of SKILL.md is Markdown.

Most skills should start here, with plain Markdown instructions and no scripts. Markdown is cheap to author, easy to review, and fits naturally into version control. It is also safer than executable automation when the task is mostly procedural.

A good skill body is not a long essay. It is an operating procedure.

It should answer:

What inputs should the agent inspect?
What steps should it follow?
What should it avoid?
What output format should it produce?
What evidence should it cite?
When should it stop and ask for human approval?

For example:

## Inputs

- Current git diff
- Relevant tests
- Existing code review conventions

## Procedure

1. Inspect the changed files before forming conclusions.
2. Identify behavioral regressions before style issues.
3. Check whether tests cover the changed behavior.
4. Treat missing validation, permissions, data loss, and security exposure as high-risk.

## Output

Return findings first.
If there are no findings, state that clearly and mention residual test gaps.

This works because it narrows the agent's job. It does not say "be helpful." It says what kind of help is expected.

Progressive Disclosure

Skills exist partly because context is scarce.

If every procedure, style guide, API reference, and checklist were always loaded into every session, the prompt would become noisy and expensive. Skills solve this by loading in stages.

The Agent Skills standard describes three stages: discovery, activation, and execution. During discovery, the agent sees minimal metadata. During activation, it reads the full SKILL.md. During execution, it follows instructions and may load referenced files or run scripts.

Codex uses the same idea. Its documentation says Codex starts with each skill's name, description, and file path, and then reads the full SKILL.md when it decides to use that skill. It also warns that the initial skills list has a context budget, so descriptions may be shortened when many skills are installed.

Claude Code documents a similar lifecycle: skill descriptions are available so Claude knows what can be used, while the full skill content loads only when invoked. Once invoked, the rendered skill content remains in the conversation for the rest of the session, subject to compaction behavior.

That creates a design rule: put routing information in the description and operating instructions in the body.

Do not bury trigger conditions halfway down the Markdown body. The agent may not read the body until after it has already decided whether the skill applies.
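The staged loading described in this section can be sketched in a few lines of Python. This is an illustration of the idea, not the loader used by Codex or Claude Code: SkillEntry, read_metadata, discover, and activate are hypothetical names, the frontmatter handling covers only flat key: value lines, and falling back to the directory name when name is missing is an assumption borrowed from the Claude Code behavior mentioned earlier.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass
class SkillEntry:
    name: str
    description: str
    path: Path  # location of SKILL.md; the body is not read at discovery time


def read_metadata(skill_md):
    """Discovery: read only the frontmatter lines, never the body."""
    meta = {}
    with skill_md.open(encoding="utf-8") as fh:
        if fh.readline().strip() == "---":
            for line in fh:
                if line.strip() == "---":
                    break  # stop at the closing delimiter
                key, sep, value = line.partition(":")
                if sep:
                    meta[key.strip()] = value.strip()
    # Assumption: use the directory name when no name field is present.
    name = meta.get("name", skill_md.parent.name)
    return SkillEntry(name, meta.get("description", ""), skill_md)


def discover(skills_root):
    """Build the lightweight index the agent sees up front."""
    return [read_metadata(p) for p in sorted(Path(skills_root).glob("*/SKILL.md"))]


def activate(entry):
    """Activation: load the full SKILL.md only for the chosen skill."""
    return entry.path.read_text(encoding="utf-8")
```

The index built by discover holds only names, descriptions, and paths, which is why a trigger condition written in the body cannot influence whether the skill is selected.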

Scripts: Use Them for Determinism

Skills can include scripts, but scripts should not be the default.

OpenAI's best practices for Codex skills advise preferring instructions over scripts unless deterministic behavior or external tooling is needed. That is the right default. Markdown is easier to audit, easier to adapt, and less likely to surprise the user.

Use scripts when the task requires precision:

Parsing a file format
Validating generated output
Calling an internal CLI
Running a repeatable transformation
Producing a report from structured data
Checking policy rules
Generating artifacts from templates

For example, a documentation skill might include:

docs-refresh/
├── SKILL.md
├── scripts/
│   ├── extract_routes.py
│   └── validate_links.sh
└── references/
    └── docs-style.md

The Markdown tells the agent when and how to refresh documentation. The scripts do the parts that should not depend on model judgment: extracting route metadata and checking links.

That separation matters. Let the model handle judgment, synthesis, and adaptation. Let scripts handle deterministic mechanics.
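As an example of the deterministic side of that split, here is a Python sketch of a link check, loosely in the spirit of the validate_links.sh script in the tree above. The original script's contents are not shown in this article, so everything here is an assumption: the function name broken_relative_links, the choice to check only inline Markdown links, and the decision to skip URLs and in-page anchors.

```python
import re
from pathlib import Path

# Matches inline Markdown links of the form [text](target).
LINK_RE = re.compile(r"\[[^\]]*\]\(([^)\s]+)\)")
# Matches targets that start with a URL scheme such as https: or mailto:.
SCHEME_RE = re.compile(r"^[a-zA-Z][a-zA-Z0-9+.-]*:")


def broken_relative_links(markdown_file):
    """Return relative link targets in a Markdown file that do not exist on disk."""
    markdown_file = Path(markdown_file)
    text = markdown_file.read_text(encoding="utf-8")
    broken = []
    for target in LINK_RE.findall(text):
        if SCHEME_RE.match(target) or target.startswith("#"):
            continue  # skip URLs and in-page anchors
        path_part = target.split("#", 1)[0]  # drop any fragment
        if not (markdown_file.parent / path_part).exists():
            broken.append(target)
    return broken
```

The same input always yields the same list, which is the property that makes this a better fit for a script than for model judgment.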

References: Keep the Main Skill Small

Long reference material does not belong in the top-level SKILL.md.

Claude Code explicitly recommends keeping SKILL.md focused and moving detailed material into supporting files. Its docs say supporting files can include reference material, examples, templates, and scripts, and that SKILL.md should point to them so Claude knows when to load them.

Codex documents the same optional shape with references/ and assets/ folders. The goal is context hygiene. A skill should load the minimum instructions needed to start, then pull deeper material only when relevant.

For example:

## Additional resources

- Use [references/security-checklist.md](references/security-checklist.md) when reviewing authentication or permissions changes.
- Use [assets/pr-template.md](assets/pr-template.md) when drafting a pull request description.
- Run [scripts/validate-report.sh](scripts/validate-report.sh) before finalizing a compliance report.

This makes the skill navigable. The agent does not have to guess what is inside the folder.
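One way to keep a skill navigable over time is to check that every supporting file is actually linked from SKILL.md. The Python sketch below does that under simple assumptions: unreferenced_files is a hypothetical helper, it recognizes only inline Markdown links, and it treats any file that is never linked as unreferenced even if the body mentions it in plain text.

```python
import posixpath
import re
from pathlib import Path

# Matches inline Markdown links of the form [text](target).
LINK_RE = re.compile(r"\[[^\]]*\]\(([^)\s]+)\)")


def unreferenced_files(skill_dir):
    """List supporting files that SKILL.md never links to."""
    skill_dir = Path(skill_dir)
    text = (skill_dir / "SKILL.md").read_text(encoding="utf-8")
    # Normalize targets so ./references/x.md and references/x.md compare equal.
    referenced = {
        posixpath.normpath(target.split("#", 1)[0])
        for target in LINK_RE.findall(text)
    }
    present = {
        p.relative_to(skill_dir).as_posix()
        for p in skill_dir.rglob("*")
        if p.is_file() and p.name != "SKILL.md"
    }
    return sorted(present - referenced)
```

An empty result means every file in the folder has at least one pointer from the main instructions.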

Discovery in Codex

Codex reads skills from several scopes.

For repository skills, Codex scans .agents/skills in the current working directory and parent directories up to the repository root. It also supports user skills, admin skills, and system skills. The Codex docs list these locations as repo-level .agents/skills, user-level $HOME/.agents/skills, admin-level /etc/codex/skills, and system skills bundled with Codex.
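The repository scan can be pictured as a walk from the working directory up to the repository root. The Python sketch below illustrates that walk only; it is not Codex's implementation. The function name repo_skill_dirs is hypothetical, the caller is assumed to already know the repository root, and returning the nearest directory first is an assumption about ordering that the description above does not specify.

```python
from pathlib import Path


def repo_skill_dirs(cwd, repo_root):
    """Collect .agents/skills directories from cwd up to repo_root, nearest first."""
    cwd, repo_root = Path(cwd).resolve(), Path(repo_root).resolve()
    if cwd != repo_root and repo_root not in cwd.parents:
        raise ValueError("cwd must be inside repo_root")
    found = []
    current = cwd
    while True:
        candidate = current / ".agents" / "skills"
        if candidate.is_dir():
            found.append(candidate)
        if current == repo_root:
            break  # never look above the repository root
        current = current.parent
    return found
```

In a monorepo this means a package can carry its own skills alongside the ones defined at the root.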

Codex also supports explicit and implicit invocation.

Explicit invocation means the user names the skill directly, for example by using /skills or typing $ in supported Codex interfaces. Implicit invocation means Codex chooses a skill because the task matches the skill description.

Codex makes an important distinction between skills and plugins. Skills are the authoring format for reusable workflows. Plugins are the installable distribution unit when you want to package one or more skills, app mappings, MCP configuration, or presentation assets for broader use.

That is a useful architecture:

Use repo skills for team workflows tied to one codebase.
Use user skills for personal workflows across projects.
Use admin skills for default machine or container workflows.
Use plugins when you want distribution, installation, or bundling.

Discovery in Claude Code

Claude Code uses a similar but not identical filesystem convention.

Project skills live under .claude/skills/<skill-name>/SKILL.md. Personal skills live under ~/.claude/skills/<skill-name>/SKILL.md. Plugin skills live under a plugin's skills/<skill-name>/SKILL.md. Claude Code also documents enterprise-managed skills.

Claude Code skills are invoked with slash commands such as /summarize-changes. The directory name becomes the command name unless overridden. Claude can also load skills automatically when the user request matches the description.

Claude Code has additional controls that are useful for risk management:

disable-model-invocation: true prevents Claude from automatically loading the skill, which is useful for side-effectful workflows such as deploys.
user-invocable: false hides a skill from the slash menu, which is useful for background knowledge that should not be run manually.
allowed-tools can pre-approve the listed tools while the skill is active; tools that are not listed still fall under the baseline permission settings.
context: fork can run a skill in an isolated subagent context.
paths can limit automatic activation to matching files.

Those fields make Claude Code skills feel more command-like. A skill can be a passive reference, a manually invoked operation, or an isolated task runner.

Dynamic Context

Claude Code documents a feature that is especially important: dynamic context injection.

In a SKILL.md, a line such as:

!`git diff HEAD`

can run before the skill content is sent to Claude. The command output replaces the placeholder, so Claude sees the actual current diff rather than a vague instruction to inspect it. Claude Code also supports multi-line shell injection blocks.

This is powerful, but it changes the risk profile. Dynamic context is not just text. It is executable preprocessing. Teams should be careful about where those commands come from, what they can access, and whether repository skills are trusted.
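To see why this counts as executable preprocessing, consider a simplified Python sketch of the substitution step. It is not Claude Code's implementation: PLACEHOLDER_RE, run_shell, and render_skill are hypothetical names, only the single-line placeholder form shown above is handled, and the 30-second timeout is an arbitrary assumption.

```python
import re
import subprocess

# Matches the single-line placeholder form: an exclamation mark
# followed by a command wrapped in backticks.
PLACEHOLDER_RE = re.compile(r"!`([^`]+)`")


def run_shell(command):
    """Run a command through the shell and return its stdout."""
    result = subprocess.run(
        command, shell=True, capture_output=True, text=True, timeout=30
    )
    return result.stdout.strip()


def render_skill(body, runner=run_shell):
    """Replace each placeholder with the output of its command."""
    # The runner is injectable so the substitution can be tested without a shell.
    return PLACEHOLDER_RE.sub(lambda m: runner(m.group(1)), body)
```

Whatever text sits inside the placeholder is handed to a shell before the model sees anything, so whoever can edit the skill file can run commands on the machine.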

For high-trust internal workflows, dynamic context can make skills much more reliable. For example:

Include the current diff in a review skill.
Include the latest test output in a debugging skill.
Include a generated route map in a documentation skill.
Include dependency versions in an upgrade skill.

For low-trust repositories, dynamic shell execution should be treated like any other untrusted executable code: reviewed before it is allowed to run.

Skills vs CLAUDE.md, AGENTS.md, and Rules

Skills should not become dumping grounds for all agent instructions.

Use persistent project guidance files for facts and norms that should always apply. Use skills for workflows that are conditional.

Good always-on guidance:

Project architecture
Coding style
Test commands
Security boundaries
Deployment environments

Good skill content:

"Review a PR using our severity model"
"Prepare a release checklist"
"Convert a customer call transcript into CRM notes"
"Generate a report from this data export"
"Migrate a component following our design-system rules"

The dividing line is invocation.

If the agent should always know it, put it in the project's standing guidance. If the agent should load it only for a particular job, make it a skill.

What Makes a Skill Good

A high-quality skill has a narrow job.

It does not try to make the agent generally better. It makes the agent reliably better at one repeatable task.

Good skills have:

A specific trigger description
A short SKILL.md
Clear inputs and outputs
Explicit stop conditions
References for deeper material
Scripts only where determinism is needed
Safe defaults around side effects
Version control and code review

Bad skills usually fail in predictable ways:

The description is too broad.
The body is a long essay instead of a procedure.
The skill mixes unrelated jobs.
Scripts perform surprising side effects.
Supporting files are present but not referenced.
The output format is not specified.
The skill assumes context the agent does not have.

Skills are not magic prompts. They are operational artifacts.

A Practical Skill Template

Here is a portable starting point that works conceptually across skills-compatible tools:

---
name: focused-task-name
description: Perform one specific workflow. Use when the user asks for the exact task, related trigger phrases, or matching outcomes. Do not use for unrelated adjacent tasks.
---

## Purpose

State the result this skill should produce.

## Inputs

- List the files, commands, systems, or user-provided data needed.

## Procedure

1. Inspect the required inputs.
2. Follow the workflow steps in order.
3. Use scripts or references only when relevant.
4. Stop and ask for approval before sensitive or irreversible actions.

## Output

Return the final result in this structure:

- Summary
- Findings or changes
- Evidence
- Open questions

## Supporting files

- Use [references/example.md](references/example.md) for detailed conventions.
- Run [scripts/validate.sh](scripts/validate.sh) before final output when artifacts are generated.

For Claude Code, you might add fields such as disable-model-invocation, allowed-tools, arguments, or context: fork. For Codex, you might package the skill into a plugin when it needs to be distributed beyond one repository or user setup.
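Teams that create skills often may want a small scaffolding helper so every new skill starts from the same skeleton. The Python sketch below is one possible shape, with assumptions throughout: scaffold_skill and TEMPLATE are hypothetical, the skeleton keeps only the section headings from the template above, and the description is written into the frontmatter as-is without any YAML escaping.

```python
from pathlib import Path

# Skeleton based on the template above; section bodies are left for the author.
TEMPLATE = """---
name: {name}
description: {description}
---

## Purpose

## Inputs

## Procedure

## Output
"""


def scaffold_skill(skills_root, name, description):
    """Create <skills_root>/<name>/SKILL.md from the skeleton, refusing to overwrite."""
    skill_md = Path(skills_root) / name / "SKILL.md"
    if skill_md.exists():
        raise FileExistsError(str(skill_md))
    skill_md.parent.mkdir(parents=True, exist_ok=True)
    skill_md.write_text(
        TEMPLATE.format(name=name, description=description), encoding="utf-8"
    )
    return skill_md
```

Refusing to overwrite an existing SKILL.md keeps the helper safe to rerun in a repository where skills are already under review.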

Conclusion

The value of a skill is not that it saves typing.

The value is that it makes agent behavior reviewable.

When a workflow lives only in a chat prompt, it is temporary and inconsistent. When it lives in a skill, the team can inspect it, test it, improve it, and decide where it is allowed to run. The skill becomes part of the system design.

That is where skills fit into serious AI operations. They are small, composable control surfaces for repeatable work.

Markdown provides the procedure. Scripts provide deterministic execution. References provide depth without bloating context. Metadata provides discovery. Together, they turn "ask the agent" into something closer to a governed workflow.