Blog

Engineering notes on agentic QA

How we actually build, operate, and trust agentic test fleets in production — written from inside live engagements, with the numbers on.

Engineering Notes · Agentic QA

Making an Agentic Test Run Boring: Determinism, Retries, and the Flake Budget

9 min read

Agentic tests fail in a different shape from traditional end-to-end tests. Here's how to engineer a flake budget, a failure taxonomy, and the determinism levers that actually move the number.

Engineering Notes · Agentic QA

Evals Are the Test Suite for Your Test Suite: Running Agentic QA in Production

10 min read

Once you ship agentic QA, you have two systems that can regress — the product, and the agent. Most teams only instrument the first. Here's the eval harness, golden traces, and model-upgrade protocol that keep an agentic fleet honest in production.