
AI Agent Optimization: Self-Improving Loops That Compound

Point your AI agent at a metric and let it run a baseline-versus-challenger loop that compounds forever. Here's the self-improving experimentation loop, set up in Hermes.

June 4, 2026 · OpenClawCrew · 7 min read


Most people use an AI agent to do a task once. The real leverage is pointing it at a number you care about — conversion rate, click-through, open rate — and letting it run a baseline-versus-challenger loop that improves that number a little at a time, forever. The agent generates a variant, you deploy it, it reads the result, and if the challenger wins it becomes the new baseline. Every experiment gets logged and never forgotten. By day 365 you've stacked improvements you could never have reasoned your way to up front. Here's how to set up that loop with Hermes (Claude Code), including the part nobody covers: closing the data loop on platforms that have no API.

The core loop: baseline vs challenger

The whole system is one repeating cycle:

1. Pick one thing to change.
2. Pick one objective metric.
3. Pick one way to read that metric.
4. The agent generates a challenger variant.
5. You deploy it; it harvests the metric.
6. If the challenger beats the control, it becomes the new baseline. Repeat.
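Step 6 needs a concrete definition of "beats." Below is a minimal sketch that assumes a conversion-rate metric and uses a one-sided two-proportion z-test. The threshold (z ≥ 1.645, roughly 95% confidence one-sided) is our illustrative choice, and `challenger_wins` is a hypothetical helper, not part of Hermes or any particular repo.

```python
import math

def challenger_wins(base_conv: int, base_n: int,
                    chal_conv: int, chal_n: int,
                    z_threshold: float = 1.645) -> bool:
    """True if the challenger's conversion rate is significantly higher.

    One-sided two-proportion z-test with a pooled standard error.
    """
    p_base = base_conv / base_n
    p_chal = chal_conv / chal_n
    pooled = (base_conv + chal_conv) / (base_n + chal_n)
    se = math.sqrt(pooled * (1 - pooled) * (1 / base_n + 1 / chal_n))
    if se == 0:
        return False  # no variance at all (0% or 100% everywhere): nothing to conclude
    z = (p_chal - p_base) / se
    return z >= z_threshold

# One round of the loop: the challenger is promoted only on a significant win.
baseline = {"name": "v1", "conv": 250, "n": 10_000}
challenger = {"name": "v2", "conv": 310, "n": 10_000}
if challenger_wins(baseline["conv"], baseline["n"],
                   challenger["conv"], challenger["n"]):
    baseline = challenger
print(baseline["name"])
```

Requiring significance rather than "any higher number" is what keeps the loop from promoting a lucky wiggle as the new baseline.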

Small wins compound: each promoted winner becomes the floor for the next test, so a 1% lift per round multiplies rather than adds. The mental model is three files: a one-time data-prep step, a single file the agent edits each iteration, and an instructions file that tells the agent what to do. Point the agent at the instructions and let it run.
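The compounding arithmetic is worth seeing once. This sketch assumes a steady 1% lift per winning round, which is illustrative only: real loops don't win every round, and lifts usually shrink as the easy wins get taken.

```python
def compounded(lift_per_win: float, wins: int) -> float:
    """Multiplier on the original metric after `wins` promoted winners."""
    return (1 + lift_per_win) ** wins

def additive(lift_per_win: float, wins: int) -> float:
    """What you'd get if gains merely added up instead of stacking."""
    return 1 + lift_per_win * wins

for wins in (10, 52, 365):
    print(wins, round(compounded(0.01, wins), 2), round(additive(0.01, wins), 2))
```

At 365 wins the multiplicative path ends near 37.8x the starting metric versus 4.65x if the gains only added, which is the whole case for letting the loop keep running.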

The phrase that matters: the log is the asset. Every iteration is recorded — the hypothesis, the change, the result. The agent never forgets a failed test, so it stops repeating dead ends and starts building on what worked. Or, as the people who run these loops put it: it's called WD-40 because WD-39 failed.
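Here is a minimal sketch of "the log is the asset," assuming an append-only JSON Lines file. The record fields mirror the hypothesis, change, and result named above; the function names are hypothetical. The agent checks the log before proposing a change so it doesn't rerun a known dead end.

```python
import json
from pathlib import Path

def log_experiment(path: Path, hypothesis: str, change: str,
                   result: float, won: bool) -> None:
    """Append one experiment record as a JSON line; history is never overwritten."""
    record = {"hypothesis": hypothesis, "change": change,
              "result": result, "won": won}
    with path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")

def already_tried(path: Path, change: str) -> bool:
    """True if this exact change string already appears anywhere in the log."""
    if not path.exists():
        return False
    with path.open(encoding="utf-8") as f:
        return any(json.loads(line)["change"] == change
                   for line in f if line.strip())
```

Exact string matching is the simplest possible check; an agent reading the log can also compare hypotheses more loosely, which is where most of the "never repeat a dead end" value comes from.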

Pick the right metric, or the loop is noise

Optimization only works on the right target. A good candidate is:

Objective and measurable — a number a computer can read, not a vibe.
Data-obtainable — reachable via API or scraping.
Frequent — enough volume for fast feedback.

Start with the easiest targets: high-volume, fast-feedback metrics like email subject lines and landing-page click-through. Avoid slow, subjective targets like brand perception or SEO for your first loops; the feedback is too slow and too noisy to learn from.

A rule of thumb for statistical stability:

~100 conversion events = unstable, don't trust it
~250 events = reasonably stable
~500 events = comfortable

Size your iteration window to the metric's volume instead of reacting to every wiggle. Reading noise as signal is how you "optimize" yourself in circles.
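One way to sanity-check the rule of thumb, under our own simplifying assumption that a conversion count behaves roughly like a Poisson count, is that its relative noise is about 1/√n. The sketch below prints that noise for each tier and sizes a window from daily volume; the 40-conversions-a-day page is hypothetical.

```python
import math

def relative_noise(events: int) -> float:
    """Approximate relative standard error of a count, treated as Poisson: 1/sqrt(n)."""
    return 1 / math.sqrt(events)

def window_days(target_events: int, events_per_day: float) -> int:
    """Whole days an iteration window must run to collect `target_events`."""
    return math.ceil(target_events / events_per_day)

for n in (100, 250, 500):
    print(f"{n} events: about ±{relative_noise(n):.1%} noise")

# Hypothetical page converting 40 times a day, aiming for a ~250-event window.
print(window_days(250, 40), "days per iteration")
```

Roughly ±10% noise at 100 events, ±6.3% at 250, and ±4.5% at 500 lines up with the tiers above: a lift smaller than the noise band is hard to tell apart from a wiggle.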

One more discipline: change one element at a time on things that already work. If a page already converts, test exactly one variable per round. Save the radical multi-variable swings for greenfield pages where you have nothing to lose.
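The one-variable rule is easy to enforce mechanically. This sketch assumes each variant is stored as a flat dict of page elements (a representation we chose for illustration) and rejects any challenger on a working page that differs from the baseline in more than one field.

```python
from __future__ import annotations

def changed_fields(baseline: dict, challenger: dict) -> set[str]:
    """Names of fields whose values differ between two variants."""
    keys = baseline.keys() | challenger.keys()
    return {k for k in keys if baseline.get(k) != challenger.get(k)}

def is_single_variable_test(baseline: dict, challenger: dict) -> bool:
    """True only if exactly one element changed."""
    return len(changed_fields(baseline, challenger)) == 1

baseline = {"headline": "Ship faster", "cta": "Start free trial", "hero": "team.png"}
challenger = {**baseline, "headline": "Ship in half the time"}
print(is_single_variable_test(baseline, challenger))
```

For greenfield pages you would simply skip this check and let the agent swing freely.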

What you can put on the loop

Anything with a number a machine can read:

Landing-page copy and layout
Email subject lines
Ad creative
YouTube titles and thumbnails
Checkout flow steps
Chatbot scripts

If you can measure it and read it on a schedule, the loop can optimize it.

Closing the data loop when there's no API

This is the part almost everyone gets stuck on. The loop needs to *read the result*, and plenty of platforms — course platforms, community tools, internal dashboards — have no API. The loop dies there for most people.

The fix is browser automation on a schedule. The agent logs in, navigates to the dashboard, scrapes the number, and writes it to Notion or Sheets, which the optimizer then reads to generate the next variant.
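The scrape-and-write half of that pipeline can be sketched with the standard library. The post writes to Notion or Sheets; here a local CSV stands in so the example runs anywhere, and `scrape` is a placeholder for whatever browser-automation step logs in and reads the dashboard. All function names are hypothetical.

```python
from __future__ import annotations

import csv
from datetime import datetime, timezone
from pathlib import Path
from typing import Callable

def harvest(scrape: Callable[[], float], sheet: Path, variant: str) -> float:
    """Run one scrape and append a timestamped row the optimizer can read."""
    value = scrape()
    new_file = not sheet.exists()
    with sheet.open("a", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        if new_file:
            writer.writerow(["timestamp", "variant", "value"])
        writer.writerow([datetime.now(timezone.utc).isoformat(), variant, value])
    return value

def latest_value(sheet: Path, variant: str) -> float | None:
    """What the optimizer reads: the most recent value logged for a variant."""
    with sheet.open(newline="", encoding="utf-8") as f:
        rows = [r for r in csv.DictReader(f) if r["variant"] == variant]
    return float(rows[-1]["value"]) if rows else None
```

Keeping the harvester and the optimizer on opposite sides of a shared table means either one can fail or be swapped out without breaking the other.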

A few hard-won details:

Enable keep-awake so the machine doesn't sleep through a scheduled run. Build in slack with multiple daily attempts; failed runs auto-retry when the laptop reopens.
Cookie-based scraping skips the browser agent entirely. For some sites you can grab a session cookie and run a recurring job that fires on laptop-open — no live browser automation needed.
Store keys as env vars, not in chat. If you'd rather keep secrets out of conversation history, have the agent reference the key by its environment-variable name and paste the actual value in yourself afterward. (More on locking this down in AI agent security.)
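Here is a sketch of the cookie-based variant, combining the three details above. It makes several assumptions: the session cookie lives in an environment variable we've named `DASHBOARD_SESSION_COOKIE` (a placeholder), the dashboard renders the number in plain HTML rather than via JavaScript, and numbers use commas as thousands separators. The trigger itself (on laptop-open, on a timer) is left to whatever scheduler you use; this is only the job.

```python
from __future__ import annotations

import os
import re
import time
import urllib.request
from typing import Callable

def build_request(url: str, env_var: str = "DASHBOARD_SESSION_COOKIE") -> urllib.request.Request:
    """Attach a session cookie read from the environment, never pasted into chat."""
    cookie = os.environ.get(env_var)
    if not cookie:
        raise RuntimeError(f"Set {env_var} before running the job")
    return urllib.request.Request(url, headers={"Cookie": cookie})

def extract_metric(html: str, label: str) -> float:
    """Pull the first number that follows `label` in the page HTML."""
    match = re.search(re.escape(label) + r"\D*?(\d[\d,]*(?:\.\d+)?)", html)
    if match is None:
        raise ValueError(f"label {label!r} not found; the page layout may have changed")
    return float(match.group(1).replace(",", ""))

def with_retries(job: Callable[[], float], attempts: int = 3, wait_s: float = 60.0) -> float:
    """Retry a flaky harvest a few times before giving up until the next scheduled run."""
    last_error: Exception | None = None
    for attempt in range(1, attempts + 1):
        try:
            return job()
        except Exception as err:  # network drop, expired cookie, layout change
            last_error = err
            if attempt < attempts:
                time.sleep(wait_s)
    raise RuntimeError(f"harvest failed after {attempts} attempts") from last_error

def fetch_conversions(url: str) -> float:
    """One harvest: fetch the dashboard with the cookie and read the number."""
    with urllib.request.urlopen(build_request(url), timeout=30) as resp:
        html = resp.read().decode("utf-8", errors="replace")
    return extract_metric(html, "Conversions")
```

A scheduled run would call something like `with_retries(lambda: fetch_conversions(url))`. Regex extraction is brittle by design here; when it breaks, the `ValueError` tells you the layout changed rather than silently logging a wrong number.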

As the people running these loops put it: APIs are not going to stop us. No API means the agent does the browser task and dumps the data where the optimizer can read it.

You can run the whole scrape-and-harvest step on a schedule so it happens whether you're at your desk or not — see cron automation for AI agents.

Ground the variants in real customer language

A loop that generates generic copy plateaus fast. The upgrade is feeding the optimizer real voice-of-customer data so its variants use language your customers actually use and frameworks that actually convert.

Two things to vectorize and point the optimizer at:

An expert corpus — books and talks from a domain authority whose frameworks you trust, vectorized into a semantic index.
Your own raw signal — survey responses, YouTube comments, support tickets, sales-call notes.
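To show the shape of "vectorize and point the optimizer at it," here is a toy retrieval sketch. A real setup would use an embedding model and a vector index; as a plainly labeled stand-in, this uses word-count vectors and cosine similarity from the standard library. The corpus snippets are made up for illustration.

```python
from __future__ import annotations

import math
import re
from collections import Counter

def vectorize(text: str) -> Counter:
    """Toy 'embedding': lowercase word counts. Swap in a real embedding model in practice."""
    return Counter(re.findall(r"[a-z']+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def top_k(query: str, corpus: list[str], k: int = 2) -> list[str]:
    """Return the k snippets most similar to the query, for the optimizer's prompt."""
    q = vectorize(query)
    return sorted(corpus, key=lambda doc: cosine(q, vectorize(doc)), reverse=True)[:k]

corpus = [
    "Survey: I didn't think the course was for beginners like me",
    "Support ticket: checkout page failed on mobile",
    "Expert note: name the reader's pain before presenting the offer",
    "YouTube comment: finally someone explains this for beginners",
]
print(top_k("beginners not sure the course is for them", corpus))
```

The retrieved snippets get pasted into the optimizer's prompt next to the variant it's rewriting, so its next challenger borrows the customers' own words.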

With that grounding, the loop stops guessing. Instead of "make the copy better," you get reasoned, specific hypotheses: *"insert a 3-line pain block between the headline and the offer — the objection here isn't trust, it's self-identification,"* with a prioritized testing roadmap behind it. (The grounding corpus is just a knowledge base — see how to build one in AI agent memory.)

It also helps enormously to give the optimizer a business.md and about.md: brand DNA, target customer, the problems you solve, the metrics that matter. The optimizer with brand context plus raw survey data is in a different league from one handed "make better copy" cold.
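A minimal sketch of handing the optimizer that context, assuming `business.md` and `about.md` sit in one folder and survey responses arrive as a list of strings. The function name, section labels, and the cap on survey quotes are our choices, not a fixed format.

```python
from __future__ import annotations

from pathlib import Path

def build_context(context_dir: Path, survey_lines: list[str], max_survey: int = 20) -> str:
    """Concatenate brand files and verbatim customer language into one prompt preamble."""
    parts = []
    for name in ("business.md", "about.md"):
        path = context_dir / name
        if path.exists():  # missing files are skipped, not fatal
            parts.append(f"## {name}\n{path.read_text(encoding='utf-8').strip()}")
    if survey_lines:
        quotes = "\n".join(f"- {line}" for line in survey_lines[:max_survey])
        parts.append(f"## Customer language (verbatim)\n{quotes}")
    return "\n\n".join(parts)
```

Keeping survey quotes verbatim, rather than summarized, preserves the exact phrasing the optimizer should reuse in its variants.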

A minimal setup

1. Clone the optimization repo into your project folder.
2. Pick the metric — the biggest constraining number you have.
   Confirm it's objective, frequent, and readable.
3. Add context: business.md (brand/customer DNA) + survey data.
4. Wire the data harvest:
     has API  -> give the agent the endpoint + key (env var)
     no API   -> scheduled browser scrape -> Notion/Sheets -> optimizer reads it
5. Ground the variants: point it at a vectorized expert corpus
   + your own survey/comment data.
6. Run a SMALL first iteration (one element). Review the
   hypothesis and roadmap. Deploy. Harvest. Promote the winner.
7. Tune cadence to ~250+ conversion events per window.
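If you want the checklist enforced rather than remembered, a small config object can flag the common mistakes before the first run. This is a hypothetical sketch: the field names and checks are ours, mapped to steps 2, 4, 6, and 7 above.

```python
from __future__ import annotations

from dataclasses import dataclass

@dataclass
class LoopConfig:
    metric: str
    harvest: str                    # "api" or "scrape"
    key_env_var: str | None = None  # name of the env var holding the API key
    elements_per_round: int = 1
    target_events: int = 250

    def problems(self) -> list[str]:
        """List checklist violations; an empty list means ready to run."""
        issues = []
        if self.harvest not in ("api", "scrape"):
            issues.append("harvest must be 'api' or 'scrape'")
        if self.harvest == "api" and not self.key_env_var:
            issues.append("API harvest needs key_env_var (keep the key out of chat)")
        if self.elements_per_round != 1:
            issues.append("first iterations should change exactly one element")
        if self.target_events < 250:
            issues.append("windows under ~250 events are likely too noisy")
        return issues

print(LoopConfig("signup_rate", "api", key_env_var="SIGNUP_API_KEY").problems())
```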

Start small on purpose. One element, one clear hypothesis, one readable result. The compounding does the heavy lifting.

This pairs tightly with AI agent skills: once a skill is in a pipeline, you can evolve its prompts the same way — gradient-free, on CPU, without touching model weights.

FAQ

What is AI agent optimization?

It's using an agent to run a continuous experiment on a metric you care about. The agent generates a variant, you deploy it, the agent reads the result, and winning variants become the new baseline. Every experiment is logged, so improvements compound over time.

How is this different from normal A/B testing?

Standard A/B testing is manual and episodic — you set up a test, wait, read it, and usually stop. An agent loop automates variant generation, result harvesting (including via browser scraping when there's no API), and promotion of winners, then keeps going. The logged history means it never re-tests a dead end.

What metrics should I optimize first?

Start with objective, high-volume, fast-feedback metrics like email subject lines and landing-page click-through. Avoid slow or subjective targets like brand or SEO until you have the loop running cleanly.

How many conversions do I need before trusting a result?

Roughly 250 conversion events is reasonably stable and 500 is comfortable; around 100 is too noisy to trust. Size your iteration window to the metric's volume rather than reacting to daily swings.

What if the platform has no API?

Use scheduled browser automation. The agent logs in, scrapes the number, and writes it to Notion or Sheets, where the optimizer reads it. For some sites a session cookie plus a recurring job skips the browser step entirely.

Put your numbers on a compounding loop

A private agent that quietly improves your conversion rate while you sleep is the highest-leverage thing you can build — and the data-loop plumbing (especially for no-API platforms) is exactly the kind of setup that eats a weekend. The OpenClawCrew starter kits ship with the optimization loop, scraping fallbacks, and grounding wiring ready to go. Grab a kit for $49, or have us stand up your full experimentation engine done-for-you.

Round out the system: feed the loop real customer data with AI agent memory, and turn your winning workflows into reusable AI agent skills.
