Autoresearch — tuning without a human in the loop

Where the loop closes around the agent itself — and how it tunes without anyone watching.

The Reality Twin claims the loop closes. The simulator cascade shows what it closes around. This post is about how it closes around the agent itself.

What autoresearch is

The runtime executes an agent .ts file. Every time it runs, it emits a row of evaluation data: which intent fired, which simulator signals it consumed, which decision it made, which rollback it carried, how the success condition scored.

That row is appended to a growing corpus. One row is one evaluation: one agent, one run, one scored outcome. The corpus is simply the accumulated memory of every run so far.
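To make the row concrete, here is a minimal sketch of what one evaluation might look like and how it could be flattened into a tab-separated line. The field names, the `EvaluationRow` type, and the helper functions are all assumptions for illustration; the real runtime's schema may differ. The corpus is held in memory here so the sketch stays self-contained.

```typescript
// Hypothetical shape of one corpus row. Field names are assumptions
// derived from the description above, not the runtime's actual schema.
interface EvaluationRow {
  agent: string;      // which agent .ts ran
  intent: string;     // which intent fired
  signals: string[];  // simulator signals consumed
  decision: string;   // decision the agent made
  rollback: string;   // rollback it carried
  score: number;      // how the success condition scored
}

// Header line for the assumed column order.
const HEADER = ["agent", "intent", "signals", "decision", "rollback", "score"].join("\t");

// Tabs and newlines inside a value would break the row, so flatten them to spaces.
function clean(value: string): string {
  return value.replace(/[\t\r\n]+/g, " ");
}

// Serialise one evaluation as a single tab-separated line.
function toTsvLine(row: EvaluationRow): string {
  return [
    clean(row.agent),
    clean(row.intent),
    clean(row.signals.join(",")),
    clean(row.decision),
    clean(row.rollback),
    String(row.score),
  ].join("\t");
}

// Append-only: return a new corpus with the row added at the end.
function appendRow(corpus: string[], row: EvaluationRow): string[] {
  return [...corpus, toTsvLine(row)];
}
```

Keeping the corpus append-only means a later sweep can always be replayed against the same history.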

Autoresearch reads that corpus and tunes the next iteration of the agent. Not the skills. Not the language. The agent .ts directly — within the constraints the compiler embedded as comments.

Optimisation-constraint comments

The DOIL compiler doesn't just emit code. It emits code plus comments telling the runtime what can be tuned:

// @doil:tunable threshold range=0.1..0.9 step=0.05
const LOW_TRAFFIC = 0.3;

// @doil:tunable schedule_interval range=5m..60m step=5m
const SCHEDULE = "15m";

// @doil:rollback safe true
function applyOptimization(...) { ... }

Autoresearch reads those comments. It knows LOW_TRAFFIC can sweep between 0.1 and 0.9. It doesn't need to read DOIL. The contract between DOIL and the runtime is the comment.
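As a rough sketch of that contract, the snippet below pulls `@doil:tunable` comments out of an agent source string and pairs each with the constant declared on the next line. The `Tunable` type and `parseTunables` function are hypothetical, and the comment grammar is inferred only from the two examples above. Range and step values are kept as raw strings because the examples mix plain numbers and durations such as `5m`.

```typescript
// Hypothetical record for one tunable declaration.
interface Tunable {
  name: string;      // label in the comment, e.g. "threshold"
  constant: string;  // identifier declared on the following line
  min: string;       // raw lower bound, e.g. "0.1" or "5m"
  max: string;       // raw upper bound
  step: string;      // raw step size
}

// Grammar assumed from the examples: // @doil:tunable <name> range=<min>..<max> step=<step>
const TUNABLE_RE = /^\s*\/\/\s*@doil:tunable\s+(\S+)\s+range=(\S+?)\.\.(\S+)\s+step=(\S+)\s*$/;
const CONST_RE = /^\s*const\s+([A-Za-z_$][\w$]*)\s*=/;

function parseTunables(source: string): Tunable[] {
  const lines = source.split(/\r?\n/);
  const found: Tunable[] = [];
  for (let i = 0; i < lines.length; i++) {
    const comment = TUNABLE_RE.exec(lines[i] ?? "");
    if (!comment) continue;
    // Only accept the comment if a const declaration follows immediately.
    const decl = CONST_RE.exec(lines[i + 1] ?? "");
    if (!decl) continue;
    found.push({
      name: comment[1] as string,
      constant: decl[1] as string,
      min: comment[2] as string,
      max: comment[3] as string,
      step: comment[4] as string,
    });
  }
  return found;
}

const exampleSource = [
  "// @doil:tunable threshold range=0.1..0.9 step=0.05",
  "const LOW_TRAFFIC = 0.3;",
  "",
  "// @doil:tunable schedule_interval range=5m..60m step=5m",
  'const SCHEDULE = "15m";',
].join("\n");

const tunables = parseTunables(exampleSource);
```

Nothing here needs to understand DOIL itself; the parser only needs the comment and the line beneath it.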

The tuning loop

1. Compiler emits agent .ts with constraint comments.
2. Runtime executes against the simulator constellation.
3. Every run writes a row to autoresearch-results.tsv.
4. Autoresearch picks a tunable parameter, sweeps it, scores outcomes.
5. Best configuration becomes the next baseline.
6. Cycle repeats. The agent "learns" without anyone editing the .ts by hand.
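The sweep-and-promote step can be sketched as a plain grid search over one numeric tunable. This is an illustration under assumptions, not the actual autoresearch implementation: the `score` callback stands in for a run against the simulators, higher scores are assumed to be better, and `sweepValues` and `tune` are hypothetical names.

```typescript
// Stand-in for "run the agent with this value and score the outcome".
type Scorer = (value: number) => number;

// Enumerate min, min+step, ... up to max, rounding away float drift.
function sweepValues(min: number, max: number, step: number): number[] {
  const count = Math.floor((max - min) / step + 1e-9) + 1;
  const values: number[] = [];
  for (let i = 0; i < count; i++) {
    values.push(Number((min + i * step).toFixed(10)));
  }
  return values;
}

// Try every value in the range; keep the current baseline unless
// some candidate scores strictly higher.
function tune(
  baseline: number,
  min: number,
  max: number,
  step: number,
  score: Scorer,
): { value: number; score: number } {
  let best = { value: baseline, score: score(baseline) };
  for (const value of sweepValues(min, max, step)) {
    const candidateScore = score(value);
    if (candidateScore > best.score) {
      best = { value, score: candidateScore };
    }
  }
  return best;
}

// Toy scorer (an assumption for the demo): outcomes peak at 0.45.
const toyScore: Scorer = (v) => -Math.abs(v - 0.45);

// Sweep LOW_TRAFFIC from its current value of 0.3 across 0.1..0.9 in steps of 0.05.
const nextBaseline = tune(0.3, 0.1, 0.9, 0.05, toyScore);
```

Requiring a strict improvement means a tie leaves the baseline alone, so the agent only changes when the corpus gives a reason to change it.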

What the corpus gives us

Enough density per agent to do non-trivial parameter sweeps.
Enough cross-scenario coverage to know whether a tune holds across the cascade or only on one provider.
A clustered visual map — which agents have been tuned hardest, where the gaps are.

The corpus is not training data for a model. It is a record of evaluations against a Reality Twin — used to tune constants in deterministic code. The value is in the coverage, not in any particular count.
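One way to picture the cross-scenario check is to compare mean scores per provider for the baseline and the candidate configuration. The sketch below is a guess at the shape of that check: the `ScoredRun` type, the `provider` field, and the rule that a tune "holds" only when no provider regresses and none is missing data are all assumptions.

```typescript
// Hypothetical slice of a corpus row: where it ran, under which config, and its score.
interface ScoredRun {
  provider: string;
  config: "baseline" | "candidate";
  score: number;
}

// Mean score per provider for one configuration.
function meanScores(runs: ScoredRun[], config: ScoredRun["config"]): Map<string, number> {
  const sums = new Map<string, { total: number; n: number }>();
  runs.forEach((run) => {
    if (run.config !== config) return;
    const acc = sums.get(run.provider) ?? { total: 0, n: 0 };
    acc.total += run.score;
    acc.n += 1;
    sums.set(run.provider, acc);
  });
  const means = new Map<string, number>();
  sums.forEach((acc, provider) => means.set(provider, acc.total / acc.n));
  return means;
}

// A tune holds only if every provider has both configs scored
// and the candidate is at least as good as the baseline everywhere.
function checkCoverage(runs: ScoredRun[]): { holds: boolean; regressions: string[]; gaps: string[] } {
  const base = meanScores(runs, "baseline");
  const cand = meanScores(runs, "candidate");
  const providers = new Set<string>();
  runs.forEach((run) => providers.add(run.provider));

  const regressions: string[] = [];
  const gaps: string[] = [];
  providers.forEach((provider) => {
    const b = base.get(provider);
    const c = cand.get(provider);
    if (b === undefined || c === undefined) gaps.push(provider);
    else if (c < b) regressions.push(provider);
  });
  return { holds: regressions.length === 0 && gaps.length === 0, regressions, gaps };
}
```

Reporting gaps separately from regressions reflects the point above: missing coverage is a different finding from a tune that made things worse.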

Why this matters

Autoresearch is what makes the Reality Twin loop close around the agent itself, not just around the network. The agent is the moving part. The corpus is the memory of how it's moved. The simulators are the safe place to move it.

Last in the series. Loop back to The Reality Twin — and re-read it. Different sentences land the second time.
