Real benchmarks, honestly scoped
Independent lab studies and reference builds, measured with retries off, with every number traceable to a logged run. Where something is unmeasured, we say so.
Case Study · Web E2E
Your CI Isn't Broken. It's Flaky — And Retries Are Hiding the Bill
3.30% flake rate at retries: 0 (2.93% genuine)
We ran a mature Playwright suite five times with no code changes: 7, 2, 6, 1, and 1 failures. That is a 3.3% flake rate and zero real bugs. What that means for your CI.
Read the case study
Case Study · Mobile / React Native
Your Mobile Test Suite Is Probably Healthy. Your Mobile QA Still Isn't
474 / 474 unit tests passing; types and lint clean
A major React Native app on current Apple tooling: 474 unit tests green, types and lint clean, yet no visual-regression gating and no runtime accessibility (a11y) gating.
Read the case study
Case Study · Reference Pipeline
We Built a SaaS With Agentic QA Wired In From Commit One
30 / 30 OpenAPI operations with passing contract tests
We built a SaaS with agentic QA wired in from commit one: contract tests 30/30, mutation score 72% → 85%, performance within budget, self-healing E2E, and what it catches on day one.
Read the case study
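For readers who want to reproduce the kind of figure quoted in the Web E2E card, here is a minimal sketch of one common way to compute a flake rate from repeated runs of unchanged code: failed test executions divided by total test executions. This definition is an assumption; the summary above does not state the suite size or how the "genuine" figure is derived, so `HYPOTHETICAL_TESTS_PER_RUN` is a placeholder and the printed rate will not match the 3.30% in the study.

```python
from typing import Sequence


def flake_rate(failures_per_run: Sequence[int], tests_per_run: int) -> float:
    """Return failed executions / total executions across repeated runs.

    Assumes every run executes the same number of tests and that the code
    under test did not change between runs, so any failure is a flake.
    """
    if tests_per_run <= 0:
        raise ValueError("tests_per_run must be positive")
    if not failures_per_run:
        raise ValueError("at least one run is required")
    if any(f < 0 or f > tests_per_run for f in failures_per_run):
        raise ValueError("each failure count must be between 0 and tests_per_run")

    total_executions = tests_per_run * len(failures_per_run)
    return sum(failures_per_run) / total_executions


# Failure counts for the five runs, as listed in the Web E2E card.
failures = [7, 2, 6, 1, 1]

# Hypothetical suite size: NOT taken from the case study.
HYPOTHETICAL_TESTS_PER_RUN = 200

rate = flake_rate(failures, HYPOTHETICAL_TESTS_PER_RUN)
print(f"{sum(failures)} failed executions, flake rate {rate:.2%}")
```

Substituting the real number of tests per run from your own logs gives a figure comparable to the one above, provided retries are disabled so that a retried pass does not mask a failure.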