How we measure outcomes
We publish what changes, how we calculate it, and when we call it a success. The same rules apply to every deployment.
What we track
Latency (p95)
95th-percentile end-to-end request time for defined operations.
Security (critical CVEs)
Count of critical vulnerabilities open at go-live (target: zero).
Infra spend
Comparable monthly run-rate for compute, storage, and egress for the scoped system.
Adoption & engagement
Usage of shipped capabilities (eligible population, active users, events).
Accuracy & drift (ML/CV)
Precision/recall or class-wise accuracy vs. a labeled sample; drift deltas on key features.
Windows & sampling
Pre-window
Minimum 14 days of production baseline (exclude incidents).
Post-window
Minimum 14 days after cutover (exclude incident days; allow a 48-hour warm-up).
Like-for-like
Identical operation sets, identical time-of-day/day-of-week distribution.
Confidence
If the pre/post difference is not significant (p-value > 0.1) or seasonality bias is detected, extend the windows or rerun; see the sketch at the end of this section.
Scope
Only the system(s) touched by the engagement; shared services allocated pro-rata.
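As a minimal sketch of that significance check (assuming per-request latencies for both windows can be exported from the tracing backend; function names and sample data are illustrative, not our tooling):

```python
# Minimal sketch: decide whether the post-window shows a real shift or the
# comparison windows need to be extended. Assumes two arrays of per-request
# latencies (ms) exported from the tracing backend; data below is synthetic.
import numpy as np
from scipy.stats import mannwhitneyu

def windows_are_conclusive(pre_ms, post_ms, alpha=0.1):
    """Return True if the pre/post difference is significant at alpha."""
    stat, p_value = mannwhitneyu(pre_ms, post_ms, alternative="two-sided")
    return p_value <= alpha

pre_ms = np.random.lognormal(mean=6.7, sigma=0.4, size=5000)   # ~800 ms baseline (toy data)
post_ms = np.random.lognormal(mean=6.1, sigma=0.4, size=5000)  # ~450 ms after cutover (toy data)

if windows_are_conclusive(pre_ms, post_ms):
    print("Difference is significant; proceed with the KPI report")
else:
    print("p-value > 0.1: extend both windows or rerun the comparison")
```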
Formulas & examples
Latency (p95 Δ%)
(p95_pre − p95_post) ÷ p95_pre
Example: 840 ms → 450 ms ⇒ (840−450)/840 = 46% lower
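A minimal sketch of the same calculation from raw per-request samples (assuming latencies in milliseconds are available for both windows):

```python
# Minimal sketch: p95 delta (%) from raw per-request latencies in ms.
import numpy as np

def p95_delta_pct(pre_ms, post_ms):
    p95_pre = np.percentile(pre_ms, 95)
    p95_post = np.percentile(post_ms, 95)
    return (p95_pre - p95_post) / p95_pre * 100

# Reproduces the worked example above (840 ms -> 450 ms, ~46% lower)
print(round(p95_delta_pct([840.0], [450.0]), 1))  # 46.4
```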
Critical CVEs at go-live
count(severity = critical, status = open) on the release branch at T0 (go-live)
Example: target is 0 open critical CVEs at T0
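A minimal sketch of the go-live gate, assuming the scanner can export findings as JSON; the `severity`/`status` field names, the `findings` key, and the file path are illustrative, not a specific scanner's schema:

```python
# Minimal sketch: count open critical CVEs from a scanner JSON export
# taken on the release branch at T0. Field names and file path are
# assumptions about the export format, not a specific scanner's schema.
import json

def open_critical_cves(findings):
    return sum(
        1
        for f in findings
        if f.get("severity", "").lower() == "critical"
        and f.get("status", "").lower() == "open"
    )

with open("scan-report.json") as fh:
    findings = json.load(fh)["findings"]

count = open_critical_cves(findings)
print(f"Open critical CVEs at T0: {count} (target: 0)")
```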
Infra spend Δ%
(run-rate_pre − run-rate_post) ÷ run-rate_pre
Example: $42k → $33k ⇒ 21% lower
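A minimal sketch of the run-rate comparison, including the pro-rata allocation of shared services mentioned under Scope (line items, amounts, and the allocation share are illustrative, not real billing data):

```python
# Minimal sketch: comparable monthly run-rate delta, with shared services
# allocated pro-rata to the scoped system. All numbers are illustrative.
def run_rate(line_items, shared_items, allocation_share):
    return sum(line_items.values()) + allocation_share * sum(shared_items.values())

pre = run_rate(
    {"compute": 28_000, "storage": 6_000, "egress": 4_000},
    {"shared_gateway": 8_000}, allocation_share=0.5,
)
post = run_rate(
    {"compute": 21_000, "storage": 5_000, "egress": 3_000},
    {"shared_gateway": 8_000}, allocation_share=0.5,
)
print(f"Run-rate delta: {(pre - post) / pre:.1%} lower")  # matches the $42k -> $33k example
```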
Adoption rate
active_users_feature ÷ eligible_population (same window)
Example: 1,250 active / 2,000 eligible = 62.5%
CV accuracy & drift
per-class precision/recall vs. a labeled sample, with site weighting
Drift: KS/PSI on selected features; Δ accuracy vs. the acceptance gate
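A minimal sketch of both computations, assuming a labeled sample and a reference vs. production feature split; site weighting is omitted for brevity (sklearn's sample_weight could carry it), and the class labels, bin count, and data are illustrative:

```python
# Minimal sketch: per-class precision/recall against a labeled sample,
# plus PSI drift on one feature. Labels, bins, and data are illustrative.
import numpy as np
from sklearn.metrics import precision_recall_fscore_support

def per_class_scores(y_true, y_pred, labels):
    precision, recall, _, _ = precision_recall_fscore_support(
        y_true, y_pred, labels=labels, average=None, zero_division=0
    )
    return dict(zip(labels, zip(precision, recall)))

def psi(reference, production, bins=10):
    """Population Stability Index between a reference and a production sample."""
    edges = np.histogram_bin_edges(reference, bins=bins)
    ref_pct = np.histogram(reference, bins=edges)[0] / len(reference)
    prod_pct = np.histogram(production, bins=edges)[0] / len(production)
    # Production values outside the reference range are ignored in this sketch.
    ref_pct = np.clip(ref_pct, 1e-6, None)   # avoid log(0)
    prod_pct = np.clip(prod_pct, 1e-6, None)
    return float(np.sum((prod_pct - ref_pct) * np.log(prod_pct / ref_pct)))

labels = ["defect", "ok"]
print(per_class_scores(
    y_true=["defect", "ok", "ok", "defect"],
    y_pred=["defect", "ok", "defect", "defect"],
    labels=labels,
))  # {'defect': (precision, recall), 'ok': (precision, recall)}
print(psi(np.random.normal(0, 1, 2000), np.random.normal(0.3, 1, 2000)))
```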
Instrumentation & tools
Latency
Distributed tracing/metrics (e.g., OpenTelemetry → Prometheus/Grafana), sampled by operation.
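For the Prometheus path specifically, a minimal sketch of recording per-operation latency as a histogram (assuming the prometheus_client Python library; the metric name, label, operation, and bucket boundaries are illustrative), with the p95 derived in PromQL:

```python
# Minimal sketch: per-operation latency histogram exposed to Prometheus.
# Metric name, label, and bucket boundaries are illustrative choices.
import time
from prometheus_client import Histogram, start_http_server

REQUEST_LATENCY = Histogram(
    "request_latency_seconds",
    "End-to-end request latency by operation",
    ["operation"],
    buckets=(0.05, 0.1, 0.25, 0.5, 1.0, 2.5, 5.0),
)

def handle_checkout():
    with REQUEST_LATENCY.labels(operation="checkout").time():
        time.sleep(0.2)  # stand-in for real work

if __name__ == "__main__":
    start_http_server(8000)   # /metrics endpoint for Prometheus to scrape
    while True:
        handle_checkout()

# PromQL for the p95 used in the KPI:
# histogram_quantile(0.95, sum(rate(request_latency_seconds_bucket[5m])) by (le, operation))
```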
Security
SCA/SAST/DAST scanners plus OS package scanners; SBOM at release.
Infra
Cloud bills and usage (compute/storage/egress), plus on-prem meter data where applicable.
Adoption
App analytics + server events; anonymized where required.
Accuracy & drift
Eval harness (fixed seed), site-stratified samples, drift monitors on features and outputs.
Acceptance criteria
Performance
p95 lower by an agreed target (typically 30–60%), sustained for the post-window, with no feature freeze.
Security
0 critical CVEs before go-live; high/medium tracked with owner and SLA.
Cost
Infra run-rate 20–40% lower for scoped workloads, same or better SLOs.
ML/CV
Accuracy at or above the gate; drift within agreed bounds; reviewer load at the agreed target.
Adoption
Feature usage reaches agreed floor within the window.
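A minimal sketch that ties the criteria above into a single go/no-go check (thresholds, field names, and sample values are illustrative placeholders; real targets come from the engagement agreement):

```python
# Minimal sketch: evaluate the acceptance criteria from already-computed KPIs.
# Thresholds and field names are illustrative placeholders.
def acceptance_check(kpis, targets):
    results = {
        "performance": kpis["p95_delta_pct"] >= targets["p95_delta_pct"],
        "security": kpis["open_critical_cves"] == 0,
        "cost": kpis["run_rate_delta_pct"] >= targets["run_rate_delta_pct"],
        "ml_cv": kpis["accuracy"] >= targets["accuracy_gate"]
                 and kpis["psi"] <= targets["psi_bound"],
        "adoption": kpis["adoption_rate"] >= targets["adoption_floor"],
    }
    return all(results.values()), results

kpis = {"p95_delta_pct": 46.4, "open_critical_cves": 0,
        "run_rate_delta_pct": 21.4, "accuracy": 0.94, "psi": 0.08,
        "adoption_rate": 0.625}
targets = {"p95_delta_pct": 30, "run_rate_delta_pct": 20,
           "accuracy_gate": 0.92, "psi_bound": 0.2, "adoption_floor": 0.5}

passed, detail = acceptance_check(kpis, targets)
print("GO" if passed else "NO-GO", detail)
```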
Evidence we export
- Before/after KPI chart pack (PNG/PDF)
- Scanner reports + SBOM summary at release
- Cost deltas with line items and allocation notes
- Eval summary (confusion matrices, drift plots)
- Change log and rollback plan snapshot
Last updated: October 28, 2025
Ready to ship with measurable outcomes?
Every sprint ends with verifiable metrics. Let's discuss your KPIs.