noemica · field reports

What real participants find when nobody is looking.

Notes from the field. Real participants, real environments (websites today, retail clones and product lines next), the leaks they surface, and what it cost the businesses on the other side of the screen.

argument · coding-agent UX

The AI labs are saving you from yourself.

Coding agents are asking more clarifying questions, even when they don’t need to. I think the AI labs are trying to save us from ourselves, and most of us (me included) are mashing past those questions anyway.

June 2026

series · the loop on itself · part 1

User-testing the user-tester.

I had a coding agent run user studies on the product that runs user studies. Twenty-five-plus iterations, six hours of agent-on-task time, three phases. Here’s what it tried to get away with, and what stayed in.

May 2026

series · the loop on itself · part 2

Generation is cheap. The decisions are the artifact.

Where formal testing methods stop and user feedback begins. Formal tests prove that the system did what the spec said. User feedback proves that someone could figure out what to do.

May 2026

series · the loop on itself · part 3

The path of least resistance.

What a coding agent does when it can’t ask for help, and the skill I built that made sure it couldn’t.

May 2026

series · the loop on itself · part 4

Participants do what they’re told.

A line landed in the participant’s brief on iteration 9: “you MUST wait at least 40 minutes after launching before considering giving up.” The participant complied. The study stopped measuring anything.

May 2026

series · the loop on itself · part 5

The meta caught itself.

When the experiment became self-referential, the participant collapsed the inner study’s objective into her own and declared the run done in 65 seconds. The verdict LLM accepted it. The scaffolding around the iteration caught it.

May 2026

series · the loop on itself · part 6

Pigeons and variance.

One passing run is a coin flip. Two is a release gate. The difference between those two sentences is most of what turned the experiment into something I’d bet on.

May 2026

series · the loop on itself · part 7

The migration the experiment didn’t notice.

Mid-experiment, I moved the entire substrate from production to staging. The loop kept running. The new environment had a different bug; the loop found it. Neither I nor the audit agent surfaced the migration until I went looking.

May 2026

case study · noemica on noemica

A coding agent improved noemica autonomously across 25 iterations.

An autonomous coding agent edited noemica’s codebase, ran studies against each new build, read the verdicts, and decided the next change. Twenty-five iterations across three phases, with every run ID linked.

May 2026

field notes · primer

An engineer’s primer for autonomous-improvement loops.

The four parts of an improvement loop, the two design decisions that determine whether it converges, the three failure modes you will hit, and when a unit test is the better tool. Five minutes, end to end.

May 2026

argument · on agent autonomy

What the autonomous loop got wrong (and what the operator caught).

Three structural failures from one autonomous-improvement experiment. None were capability gaps in the agent; each was cleanup the human had to do. The receipts, and a checklist for the next person.

May 2026

framing · ML perspective

An autonomous UX-improvement loop, reframed as RL.

An LLM-driven UX-improvement loop mapped onto the canonical RL diagram. Five components, one load-bearing reward signal: user-experience feedback at code-review cadence. Includes a worked reward-shaping example with real numbers.

May 2026

field receipts · 5 vignettes

Five defects only real participants found.

Five case files from the meta-study. In each, a real participant ran the product end to end and produced evidence that exposed a defect. None of them would have been caught by a unit test, integration test, or e2e test.

May 2026

field report · four sites

Four sites. Four ways money was leaving the table.

A taxonomy of revenue defects, sorted not by industry or size, but by the shape of the leak. Each one was caught by watching real participants try to do real things on a live site.

May 2026

field report · DTC fashion

The clothes were ready. The experience wasn’t.

Nine shoppers walked into lane201.com with money in hand and Mother’s Day on the calendar. Two left empty-handed for two specific reasons.

May 2026

field report · YC startup

Two bugs that compiled. One in code, one in perception.

A YC-backed startup shipped two defects to production. One trapped a user in an auth loop. The other made the primary CTA invisible to 10 of 10 visitors.

May 2026

field report · ICP test

What your site quietly disqualifies.

Not a UX bug report. An ICP test. Two participants on the same B2B site, opposite paths. The vocabulary gap your funnel can’t see is filtering customers out before they convert.

May 2026

incident report · DTC skincare

The form submitted. The content didn’t.

A teenage participant filled out the skincare quiz with intent. The page rendered. The recommendations didn’t. No alert tripped, no event fired.

May 2026

build story · chrome extension

Reverse-engineering Claude’s browser extension.

The official extension blocks 58 domains across 11 categories. So I reverse-engineered the whole thing from the tool schemas up. 2,200 lines, 7 commits, 6 production bugs I didn’t expect.

April 2026

case study · developer blind spots

He knew his users. His product disagreed.

I hit two problems that would have killed a normal user’s session. I never said a word. Then synthetic participants found something worse.

March 2026

case study · vibe coding

I vibe-coded an agent and didn’t know what it couldn’t do.

I told Claude to give my agent calendar capabilities. It did, minus the ability to invite anyone to a meeting. I found out by accident.

March 2026

More reports as they ship. If you have a surface you would like watched the same way, write to seb@noemica.io.