Vol. I · Issue 01 · Spring 2026

AI Agent Skills: Build Agents That Improve Themselves

Skills are how your AI agent does things - saved, reusable procedures it authors, loads, and prunes on its own. Here's the skills architecture that makes an agent improve every week.

If memory is what your agent knows, skills are how it does things. A skill is a saved, reusable procedure (the exact steps, commands, and pitfalls for a recurring task), so the agent skips the trial and error next time. Done right, skills compound: the agent authors them automatically after a hard task, loads only the ones it needs, prunes the dead ones on its own, and rewrites them when you give feedback. This is the difference between an agent that's clever once and an agent that gets better every week. Here's how the architecture actually works in Hermes (Claude Code), and how to build skills that don't rot.

Why most "skills" are dead weight

Go download a pack of skills off the internet and most of them are dead markdown files: generic, static, and amnesiac. They don't know your tools, they never improve, and half of them never fire because their descriptions don't match how you actually ask.

A skill that works stands on four legs, like a table. Remove one and the whole thing falls over:

  • Authored properly, not hand-typed from scratch
  • Fed real data through connectors so it has eyes
  • Backed by persistent memory so it remembers across runs
  • Refined in a loop where you grade output and it rewrites its own file

Skip any one and you get a Superman who can't fly. Let's build all four.

Progressive disclosure: how the agent picks the right skill

You can keep a huge library of skills loaded without blowing your context budget, because of how they load.

There are three levels:

  • Level 0: only the YAML frontmatter (name + description, roughly 30 tokens each) enters the system prompt. This is all the agent reads when deciding *which* skill to invoke.
  • Level 1: once a skill is chosen, the full body loads.
  • Level 2: scripts and reference files load only if the task needs them.

The practical consequence: because only names and descriptions are always loaded, a large skill library is cheap to keep on hand. But it also means the description is your only trigger signal. Write it to match the exact prompts a user would actually type, not an abstract capability statement.

Bad: description: "Utilities for data processing and transformation workflows"

Good: description: "Export Stripe payouts to a monthly CSV and email me the summary"

The second one fires when you ask for it. The first one never gets picked because nothing you type sounds like it.
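To make the three levels concrete, here is a minimal sketch under stated assumptions: a SKILL.md with flat `key: value` frontmatter between `---` markers, and a toy word-overlap selector standing in for the model's own matching. The skill names (`data-utils`, `stripe-payout-export`) and every function name are hypothetical, not part of any agent's API.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class Skill:
    name: str
    description: str
    body: str  # Level 1 content: only handed to the model after selection


def parse_skill(text: str) -> Skill:
    """Split a SKILL.md string into its frontmatter fields and body."""
    # Assumed layout: '---', flat 'key: value' lines, '---', then the body.
    _, frontmatter, body = text.split("---", 2)
    fields = {}
    for line in frontmatter.strip().splitlines():
        key, _, value = line.partition(":")
        fields[key.strip()] = value.strip().strip('"')
    return Skill(fields["name"], fields["description"], body.strip())


def level0_index(skills: list[Skill]) -> str:
    """Level 0: only names and descriptions are always in context."""
    return "\n".join(f"{s.name}: {s.description}" for s in skills)


def words(text: str) -> set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def pick_skill(prompt: str, skills: list[Skill]) -> Optional[Skill]:
    """Toy selector: the description sharing the most words with the prompt wins."""
    if not skills:
        return None
    best = max(skills, key=lambda s: len(words(prompt) & words(s.description)))
    return best if words(prompt) & words(best.description) else None


BAD = (
    '---\nname: data-utils\n'
    'description: "Utilities for data processing and transformation workflows"\n'
    '---\nSteps for generic data work...'
)
GOOD = (
    '---\nname: stripe-payout-export\n'
    'description: "Export Stripe payouts to a monthly CSV and email me the summary"\n'
    '---\nSteps for the payout export...'
)

skills = [parse_skill(BAD), parse_skill(GOOD)]
chosen = pick_skill("export my Stripe payouts to CSV for this month", skills)
print(chosen.name)           # the concrete description wins the match
print(level0_index(skills))  # bodies never appear at Level 0
```

A real agent matches descriptions with the model rather than word overlap, but the effect is the same: a description written in the user's own vocabulary has far more to match against.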

The self-improvement loop: skills that write themselves

You should almost never hand-write a skill. The agent will do it for you.

After about 5 tool calls on a single task, the agent automatically asks itself whether this pattern is worth saving as a skill. If you want to force it:

That took a lot of steps. Please create a skill for this procedure.

It then authors a SKILL.md capturing the working sequence. Next time you ask for the same thing, it loads the skill and skips the fumbling.
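As a rough illustration of that trigger, the sketch below counts tool calls and, past a threshold, renders the working sequence as a SKILL.md string. The threshold constant, the tool names, and both function names are assumptions for illustration. The text only says "about 5", and the real agent writes the file with the model rather than a template.

```python
TOOL_CALL_THRESHOLD = 5  # "about 5" in the text; the exact trigger is an assumption


def worth_saving(tool_calls: list[str], threshold: int = TOOL_CALL_THRESHOLD) -> bool:
    """Heuristic: a task that needed many tool calls is a candidate for a skill."""
    return len(tool_calls) >= threshold


def draft_skill_md(name: str, description: str, steps: list[str]) -> str:
    """Render the working sequence as frontmatter plus a numbered procedure."""
    numbered = "\n".join(f"{i}. {step}" for i, step in enumerate(steps, start=1))
    return (
        f"---\nname: {name}\n"
        f'description: "{description}"\n'
        f"---\n\n## Procedure\n{numbered}\n"
    )


# Hypothetical tool-call log from one finished task.
calls = ["list_payouts", "filter_by_month", "write_csv", "summarize", "send_email"]

skill_md = None
if worth_saving(calls):
    skill_md = draft_skill_md(
        "stripe-payout-export",
        "Export Stripe payouts to a monthly CSV and email me the summary",
        calls,
    )
    print(skill_md)
```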

Two upgrades that separate a real skill from a toy one:

  • Use a skill-creator, don't hand-roll. Prompt: *"Use your skill-creator skill; my intended outcome is X."* A proper generator tests its own output before finalizing, so it beats anything you'd type by hand. Then actually answer its clarifying questions: scope, output cadence, data sources, memory, the action it should take. That back-and-forth is the depth most people skip, and it's where the quality lives.
  • Push deterministic work into scripts. If part of the skill is deterministic (call a specific API with a computed prompt), write a Python script for it and reference it from the skill. That removes LLM variability from the part that should never vary, and saves tokens on every single invocation. The reasoning stays in the prompt; the mechanical call lives in scripts/.
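A minimal sketch of what such a script might contain, assuming a monthly export task: the date window and output filename are computed in code, so they come out identical on every run. The function name, field names, and filename pattern are hypothetical, and the actual API call is deliberately left out.

```python
import calendar
from datetime import date


def build_export_request(year: int, month: int) -> dict:
    """Compute the fixed parts of a monthly export: date window and output file."""
    first = date(year, month, 1)
    # monthrange returns (weekday of the 1st, number of days in the month).
    last = date(year, month, calendar.monthrange(year, month)[1])
    return {
        "start": first.isoformat(),
        "end": last.isoformat(),
        "output_file": f"payouts-{year:04d}-{month:02d}.csv",
    }


request = build_export_request(2026, 2)
print(request)
```

The skill body would then only need a line telling the agent to run the script and reason about its result, rather than recomputing dates in the prompt.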

A well-structured skill directory looks like this:

brand-banner/
  SKILL.md       # frontmatter + procedure
  references/    # your design system, style rules
  assets/        # 4 example banners for visual consistency
  scripts/       # deterministic API call

That assets/ folder is the trick behind brand-consistent output: feed a designer skill four example images, tell it *"analyze my design principles and create a skill that reproduces this style,"* and it stores the examples as references it loads every time โ€” no re-explaining your brand on each request.

The refinement loop is where the value sits

Here's the part almost everyone misses. After a skill runs, grade it:

The shortlist is too long and the picks are weak. Improve that in the skill.

The skill edits its own core file. Run it again and the output is better. Most people use a skill once, never give feedback, and wonder why it never improves. All the value is in the correction cycles. A self-refining morning brief that you grade a few times will outperform any static template you could write.
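The file-level bookkeeping behind a correction cycle can be sketched as follows. This assumes corrections are kept as bullets under a trailing `## Corrections` heading in SKILL.md. That heading, the `morning-brief` skill, and the function name are illustrative assumptions, and a real agent would rewrite the procedure itself rather than only append notes.

```python
def apply_feedback(skill_text: str, feedback: str, heading: str = "## Corrections") -> str:
    """Append one graded correction to the skill file, creating the section if needed."""
    note = f"- {feedback.strip()}"
    base = skill_text.rstrip()
    if heading in base:
        # Assumes the corrections section is the last section in the file.
        return f"{base}\n{note}\n"
    return f"{base}\n\n{heading}\n{note}\n"


skill = (
    '---\nname: morning-brief\n'
    'description: "Send me a morning brief"\n'
    '---\n\n## Procedure\n1. Build a shortlist\n'
)

# Two grading rounds: each one leaves a durable trace in the file.
skill = apply_feedback(skill, "The shortlist is too long and the picks are weak")
skill = apply_feedback(skill, "Give a one-line reason for each pick")
print(skill)
```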

This pairs naturally with memory: a skill that remembers your past corrections compounds faster. See how that layer works in our guide to AI agent memory.

Keep the library clean with a curator

Skills accumulate. Without maintenance you end up with forty near-duplicates and no idea which one fires. A curator handles garbage collection in two phases:

  • Phase 1 (rule-based, zero LLM cost): skills unused for 30 days get marked stale; at 90 days they're archived (recoverable, not deleted).
  • Phase 2 (LLM): consolidates near-duplicate skills into one.

Pre-built skills are protected automatically. For a workflow-critical skill you never want touched:

hermes curator pin <skill-name>
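The rule-based phase is simple enough to sketch directly. The version below assumes each skill records a last-used date and a pinned flag; the function and constant names are hypothetical and this is not the curator's actual implementation, only the 30-day and 90-day rules from the list above.

```python
from datetime import date, timedelta

STALE_AFTER = timedelta(days=30)    # unused this long: marked stale
ARCHIVE_AFTER = timedelta(days=90)  # unused this long: archived, still recoverable


def classify(last_used: date, today: date, pinned: bool = False) -> str:
    """Phase 1 rule: no LLM involved, only the idle time decides."""
    if pinned:
        return "active"  # pinned skills are never touched
    idle = today - last_used
    if idle >= ARCHIVE_AFTER:
        return "archived"
    if idle >= STALE_AFTER:
        return "stale"
    return "active"


today = date(2026, 4, 1)
for days_idle in (12, 45, 120):
    print(days_idle, classify(today - timedelta(days=days_idle), today))
```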

Bundles and team registries

Two patterns that scale skills beyond a single agent:

  • Skill bundles: name a bundle that loads several skills in a set order. One trigger runs a whole workflow. A backend-feature bundle might load code-review, then test-driven-development, then the PR workflow, with explicit ordering instructions. Useful for shipping apps where the same multi-step process repeats.
  • GitHub team registry: hermes skills tab add <username>/<skills-repo> adds a private repo as a skills source. Your team version-controls skills, collaborates on them, and installs from the registry. Skills stop being personal scratch files and become shared infrastructure.
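A bundle can be thought of as an ordered list of skill names resolved against the library. The sketch below assumes that shape; the dictionary layout, the `pr-workflow` name, and the function name are hypothetical rather than the agent's actual bundle format.

```python
def load_bundle(
    bundle: str,
    bundles: dict[str, list[str]],
    library: dict[str, str],
) -> list[tuple[str, str]]:
    """Return (skill name, skill body) pairs in the order the bundle declares."""
    order = bundles[bundle]
    missing = [name for name in order if name not in library]
    if missing:
        # Fail early instead of running half a workflow.
        raise KeyError(f"bundle {bundle!r} references unknown skills: {missing}")
    return [(name, library[name]) for name in order]


bundles = {
    "backend-feature": ["code-review", "test-driven-development", "pr-workflow"],
}
library = {
    "pr-workflow": "Open the PR and fill in the template.",
    "code-review": "Review the diff against the checklist.",
    "test-driven-development": "Write the failing test first.",
}

steps = load_bundle("backend-feature", bundles, library)
for position, (name, _body) in enumerate(steps, start=1):
    print(position, name)
```

Keeping the order in the bundle, not in the library, lets the same skill appear at different positions in different workflows.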

Optimizing the prompts inside skills

Once a skill is in heavy rotation (a cron job, a pipeline stage), you can optimize its prompts without touching model weights using gradient-free evolutionary prompt adaptation. It runs on CPU and tends to beat reinforcement-learning approaches in multi-step agentic workflows. Feed it your skill prompts and examples, let it evolve them over iterations, and swap in the optimized version. This is the bridge into systematic AI agent optimization.
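To show the general shape of a gradient-free loop, here is a toy sketch: mutate the prompt, score each variant, keep the best. It is a plain elitist hill-climb, not the specific method the paragraph refers to. The scoring function is a stand-in; in practice it would run the skill on your examples and grade the output. All names and instruction strings are hypothetical.

```python
import random
from typing import Callable

Prompt = list[str]  # a prompt modeled as a list of instruction lines


def mutate(prompt: Prompt, pool: list[str], rng: random.Random) -> Prompt:
    """Randomly drop one instruction or add one from the candidate pool."""
    child = list(prompt)
    if child and rng.random() < 0.5:
        child.pop(rng.randrange(len(child)))
    else:
        candidate = rng.choice(pool)
        if candidate not in child:
            child.append(candidate)
    return child


def evolve(
    seed: Prompt,
    pool: list[str],
    score: Callable[[Prompt], float],
    generations: int = 20,
    children: int = 4,
    rng_seed: int = 0,
) -> tuple[Prompt, float]:
    """Keep the best-scoring prompt seen so far; no gradients, no weight updates."""
    rng = random.Random(rng_seed)
    best, best_score = list(seed), score(seed)
    for _ in range(generations):
        for _ in range(children):
            child = mutate(best, pool, rng)
            child_score = score(child)
            if child_score > best_score:
                best, best_score = child, child_score
    return best, best_score


# Stand-in scorer: rewards two useful instructions, penalizes prompt length.
WANTED = {"Cite the data source.", "Limit output to five bullets."}


def toy_score(prompt: Prompt) -> float:
    return len(WANTED & set(prompt)) - 0.1 * len(prompt)


seed_prompt = ["Be thorough."]
pool = ["Cite the data source.", "Limit output to five bullets.", "Use formal tone."]
best_prompt, best_score = evolve(seed_prompt, pool, toy_score)
print(best_prompt, best_score)
```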

FAQ

What is an AI agent skill?

A skill is a saved, reusable procedure (steps, commands, and known pitfalls for a recurring task) stored as a markdown file with optional scripts and references. The agent loads it when the task comes up again so it doesn't re-derive the approach from scratch.

How do agents decide which skill to use?

Through progressive disclosure. Only each skill's name and short description load by default (about 30 tokens each), and the agent matches your prompt against those descriptions. The full skill body loads only after one is selected, which is why the description must match how you actually phrase requests.

Can AI agents create their own skills?

Yes. After roughly five tool calls on a task, the agent considers saving the pattern as a skill, and you can force it explicitly. Using a skill-creator that tests its own output produces stronger skills than writing them by hand.

How do I stop my skill library from getting messy?

Run a curator. It marks skills unused for 30 days as stale, archives them at 90, and consolidates duplicates. Pin the ones you can't afford to lose so they're never touched.

Build an agent that gets better on its own

Self-improving skills are the engine that makes a private agent worth keeping around, but wiring the four legs (authoring, data, memory, refinement) by hand is fiddly. The OpenClawCrew starter kits come with the skill architecture, curator, and self-improvement loop already set up. Get a kit for $49, or let us build your skill library for you, including brand-aware and team-shared skills tuned to your stack.

Then connect the rest of the system: give your skills a memory with AI agent memory, and put them on a measurable improvement loop with AI agent optimization.