Engineering notes on agentic QA
How we actually build, operate, and trust agentic test fleets in production, written from inside live engagements, with the numbers included.
Engineering Notes · Agentic QA
Making an Agentic Test Run Boring: Determinism, Retries, and the Flake Budget
9 min read
Agentic tests fail in a different shape from traditional end-to-end tests. Here's how to engineer a flake budget, a failure taxonomy, and the determinism levers that actually move the number.
Read the post
Engineering Notes · Agentic QA
Evals Are the Test Suite for Your Test Suite: Running Agentic QA in Production
10 min read
Once you ship agentic QA, you have two systems that can regress: the product and the agent. Most teams only instrument the first. Here's the eval harness, golden traces, and model-upgrade protocol that keep an agentic fleet honest in production.
Read the post