
Claude Code Skills Are Just Procedures With Memory

Skills aren't magic — they're documented procedures in a markdown file. Here's when to create one and when to leave it in CLAUDE.md.



A Claude Code skill is a markdown file. That's it. It lives under .claude/skills/ (typically as a SKILL.md inside a folder named after the skill), Claude reads it before doing the work, and the file itself is the memory. There is no training, no fine-tuning, no persistent state between sessions. When you invoke a skill, you are asking Claude to read a documented procedure and execute it against your current codebase. If the procedure is clear, the output is consistent. If the procedure is vague, the results are vague every single time. Understanding this one thing changes how you design skills.
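Because a skill is only a file on disk, listing a project's skills takes nothing more than a directory walk. The sketch below assumes skills are markdown files somewhere under `.claude/skills/`. The `list_skill_files` helper is made up for illustration and is not part of Claude Code.

```python
from pathlib import Path


def list_skill_files(project_root: str) -> list[str]:
    """Return markdown files under <project_root>/.claude/skills, sorted.

    Hypothetical helper: it only walks the directory the article names.
    """
    skills_dir = Path(project_root) / ".claude" / "skills"
    if not skills_dir.is_dir():
        return []
    # rglob also finds files nested inside per-skill folders.
    return sorted(
        p.relative_to(skills_dir).as_posix() for p in skills_dir.rglob("*.md")
    )
```

Calling `list_skill_files(".")` from a project root returns paths relative to the skills folder, or an empty list when the project has no skills yet.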

The File Is the Memory

People reach for the word "memory" when talking about AI agents, and it usually means something fuzzy — some ambient awareness the model carries around. With Claude Code skills, it's literal. The skill file is a document. Claude reads it. The document tells Claude what to do, what to check, what to output, and when to stop.

Here's a real skill I use on every design-system project:

# Code Review — Accessibility + Type Safety + Bundle

## Trigger
Run this when reviewing a component PR before merge.

## Inputs
- The component file path(s) being reviewed
- The PR diff (or a description of what changed)

## Procedure

### 1. Accessibility audit (WCAG 2.1 AA)
- Check all interactive elements for keyboard accessibility (focus management, tab order)
- Verify all images have descriptive `alt` text, or an empty `alt=""` (or `aria-hidden="true"`) if decorative
- Confirm color contrast meets 4.5:1 for normal text, 3:1 for large text
- Check that form inputs have associated `<label>` elements or `aria-label`
- Look for missing `role` attributes on custom interactive components

### 2. Type safety
- Identify any `any` types that could be narrowed
- Flag missing return types on exported functions
- Check that component props interfaces are fully typed — no implicit `children: any`

### 3. Bundle size regression
- Note any new dependencies added in this diff
- Flag imports that pull in an entire library when a subpath import would work
  (e.g., `import _ from 'lodash'` instead of `import debounce from 'lodash/debounce'`)

## Output format
Return a markdown report with three sections: **Accessibility**, **Type Safety**, **Bundle**. 
Under each section, list findings as checkboxes. If no issues found, write "No issues found."
Do not summarize — just list findings.

## Stop condition
Stop after producing the report. Do not automatically fix issues.

A junior developer could follow those instructions. That's the test. If you hand this file to a competent human who has never seen your codebase, could they produce a useful review? If yes, you have a good skill. If they'd need to ask ten follow-up questions, the skill needs more context embedded in the file itself.
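The follow-up-question test is a human judgment, but one part of it can be checked mechanically: whether the file has every section a reader would need. A minimal sketch, assuming the five `##` headings used in the example skill. Those section names are this article's convention, not something Claude Code requires, and `missing_sections` is a hypothetical helper.

```python
import re

# Section names taken from the example skill above (a convention, not a rule).
REQUIRED_SECTIONS = ("Trigger", "Inputs", "Procedure", "Output format", "Stop condition")


def missing_sections(skill_text: str) -> list[str]:
    """Return the required level-2 headings that the skill file lacks."""
    # Collect the text of every '## ' heading; '###' subheadings do not match.
    found = {
        m.group(1).strip().lower()
        for m in re.finditer(r"^##[ \t]+(.+?)[ \t]*$", skill_text, flags=re.MULTILINE)
    }
    return [name for name in REQUIRED_SECTIONS if name.lower() not in found]


draft = (
    "# Lint\n\n"
    "## Trigger\nRun before merge.\n\n"
    "## Procedure\n### 1. Check\n- Run the linter\n"
)
print(missing_sections(draft))  # ['Inputs', 'Output format', 'Stop condition']
```

An empty list does not prove the skill is good, only that nothing structural is missing. Whether each section is specific enough is still the human test.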

CLAUDE.md vs a Skill File

These two things serve different purposes and I see them conflated constantly.

CLAUDE.md is for ambient context — everything Claude should know before touching the project. Architecture decisions, coding standards, which component library you're using, how the monorepo is structured, what not to touch. Claude reads this on every session. It's the briefing document for a new contractor.

A skill is for a specific repeatable task. It has a trigger, defined inputs, and a defined output format. It's a procedure, not context.

If you're writing "always check accessibility when reviewing components" into CLAUDE.md, stop. That's a skill trigger, not ambient context. CLAUDE.md gets bloated fast when you dump procedures into it, and bloated context produces worse results: every instruction is loaded on every session and competes for attention with whatever is actually relevant to the task at hand.

The division I use: CLAUDE.md answers "what is this project?" A skill answers "how do I do X in this project?"

When the Trigger Condition Matters

Skills work when the task has clear entry and exit criteria.

"Make this code better" is not a skill. It has no defined input, no stop condition, and "better" is undefined. You'll get a different response every time you run it.

"Check this component against the WCAG 2.1 AA checklist and report violations as checkboxes" is a skill. The input is the component. The process is the checklist. The output is a checkbox report. The stop condition is after producing the report, not after fixing everything.

That stop condition matters more than most people realize. Without it, an agentic Claude Code session can keep going — fixing issues it found, which surfaces new issues, which it also fixes — until you've got a diff you didn't review and can't easily explain. The skill file should say explicitly when to stop and what done looks like.
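The runaway-session argument can be made concrete with a toy model. This is only a sketch of the reasoning, not of how Claude Code schedules work: the issue names and the `surfaced_by_fix` map (which issues appear once another one is fixed) are invented for illustration.

```python
def run_session(
    findings: list[str],
    surfaced_by_fix: dict[str, list[str]],
    auto_fix: bool,
) -> tuple[list[str], int]:
    """Toy model of one review session; returns (issues seen, number of edits)."""
    issues = list(findings)
    if not auto_fix:
        # Stop condition: the report is the deliverable and nothing is edited.
        return issues, 0

    edits = 0
    queue = list(findings)
    seen = set(findings)
    while queue:
        issue = queue.pop(0)
        edits += 1  # every fix is an edit that nobody has reviewed yet
        for new_issue in surfaced_by_fix.get(issue, []):
            if new_issue not in seen:
                seen.add(new_issue)
                queue.append(new_issue)
                issues.append(new_issue)
    return issues, edits


surfaced = {
    "missing label": ["label contrast too low"],
    "label contrast too low": ["token override breaks dark theme"],
}
print(run_session(["missing label"], surfaced, auto_fix=False))  # (['missing label'], 0)
print(run_session(["missing label"], surfaced, auto_fix=True)[1])  # 3
```

With the report-only stop condition, one finding stays one finding. Without it, the same starting point turns into three unreviewed edits, and the count depends on a chain you never saw.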

What Makes a Skill Fail

Four failure modes I've hit:

Vague instructions. "Review the component for quality issues" tells Claude nothing it doesn't already know. Enumerate the checks. Specificity is the whole point.

Missing context. If the skill references your design token naming convention but that convention isn't in the skill file or CLAUDE.md, Claude will guess. Sometimes it guesses right. Often it doesn't. Either embed the context directly or reference where to find it.

No expected output format. If you don't specify the format, you'll get a different structure every run. A markdown report, a bulleted list, a paragraph of prose — all are valid unless you've said which one you want. Say which one you want.

No stop condition. Define done. "Stop after producing the report" is a valid stop condition. "Stop when the component is fully accessible" is not: it describes a state of the code rather than a deliverable, so every fix can justify another pass and the work is unbounded.
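Of these failure modes, the output format is the easiest to verify after the fact, because a specified format can be checked by a script. The sketch below validates a report against the format in the example skill: three sections, each holding either checkboxes or the line "No issues found." It assumes each section name appears on its own line as a markdown heading or in bold, which the skill does not actually pin down, and `report_problems` is a hypothetical helper.

```python
import re

SECTIONS = ("Accessibility", "Type Safety", "Bundle")
CHECKBOX = re.compile(r"^- \[[ xX]\] \S")


def report_problems(report: str) -> list[str]:
    """Return format violations; an empty list means the report conforms."""
    problems: list[str] = []
    bodies: dict[str, list[str]] = {}
    current = None
    for raw in report.splitlines():
        line = raw.strip()
        if not line:
            continue
        # Assumption: a section starts with its name as a heading or a bold line.
        title = line.strip("#* ")
        if title in SECTIONS and line.startswith(("#", "**")):
            current = title
            bodies[current] = []
        elif current is None:
            problems.append(f"text before first section: {line!r}")
        else:
            bodies[current].append(line)

    for name in SECTIONS:
        if name not in bodies:
            problems.append(f"missing section: {name}")
            continue
        items = bodies[name]
        if items == ["No issues found."]:
            continue
        if not items or not all(CHECKBOX.match(item) for item in items):
            problems.append(f"{name}: expected checkboxes or 'No issues found.'")
    return problems


good = (
    "## Accessibility\n- [ ] Icon button has no accessible name\n\n"
    "## Type Safety\nNo issues found.\n\n"
    "## Bundle\n- [ ] `import _ from 'lodash'` pulls in the whole library\n"
)
print(report_problems(good))  # []
```

A report that answers in prose, or drops a section, comes back with a non-empty list, which is a cheap signal that the skill's output instructions need tightening.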

The Honest Tradeoff

Skills add maintenance overhead. Every time you change your accessibility audit criteria, your component API, or your TypeScript configuration, the skill file needs to be updated. A stale skill is worse than no skill — it produces confident wrong output.
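One cheap guard against staleness is to compare the skill file's modification time with the files its instructions depend on. This is a rough sketch under two assumptions: you maintain the dependency list by hand, and file timestamps are meaningful in your checkout. Git does not preserve modification times, so last-commit dates would be a better signal in practice. `stale_dependencies` and the file names in the usage note are hypothetical.

```python
import os


def stale_dependencies(skill_path: str, dependency_paths: list[str]) -> list[str]:
    """Return the dependencies modified more recently than the skill file."""
    skill_mtime = os.path.getmtime(skill_path)
    return [
        dep
        for dep in dependency_paths
        # Missing files are skipped rather than treated as changes.
        if os.path.exists(dep) and os.path.getmtime(dep) > skill_mtime
    ]
```

For example, `stale_dependencies(".claude/skills/code-review/SKILL.md", ["tsconfig.json"])` would return `["tsconfig.json"]` if the TypeScript configuration changed after the skill was last edited. That is a prompt to reread the skill, not proof that it is wrong.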

My rule: don't create a skill unless you expect to run it at least five times. If it's a one-off, write the instructions inline in the chat. If it's a recurring task (PR reviews, component scaffolding, deploying a staging branch, running a specific audit), make it a skill. The fifth time you type the same instructions into a chat window is the sign you should have made a file.
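If you keep your prompts anywhere, such as a notes file or an exported chat history, the five-times rule can be applied by counting. A minimal sketch, assuming you already have the prompts as a list of strings. How you collect them is up to you, and `skill_candidates` is a hypothetical helper.

```python
from collections import Counter


def skill_candidates(prompts: list[str], threshold: int = 5) -> list[str]:
    """Return prompts typed at least `threshold` times, most frequent first."""
    # Normalize case and whitespace so trivial retyping differences still match.
    counts = Counter(" ".join(p.lower().split()) for p in prompts)
    return [text for text, n in counts.most_common() if n >= threshold]


history = ["Review this PR for a11y"] * 5 + ["Rename this variable"] * 2
print(skill_candidates(history))  # ['review this pr for a11y']
```

Exact matching after normalization is deliberately strict. Prompts that are reworded each time will slip through, so treat the result as a lower bound on what deserves a file.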

The mental model that makes this click: you're not programming Claude, you're writing a procedure. The procedure lives in a file. The file is the memory. When the procedure is good, the output is good. When the procedure is bad, no amount of AI will fix it — you just get confidently executed bad instructions.

Write better procedures. That's the whole thing.