What is p95 and why not average?

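p95 is the 95th-percentile latency: 95% of requests complete at or below it. Averages hide long-tail pain, while p95 captures the slow paths users actually notice. We measure it on the same operation set before and after the change.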

Do you cherry-pick endpoints?

No. The operation set is defined up front and sampled identically before and after changes.

How do you avoid moving the cost?

Shared services are allocated pro-rata and the allocation method is published with the cost deltas.

Can we include planned discounts or commitments?

Only if they are already executed and apply to both the pre- and post-windows.

How is seasonality handled?

If seasonality bias is detected, we extend the measurement windows or normalize by hour-of-day/day-of-week mix.
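One possible way to normalize by hour/day mix is to reweight post-window samples so that each time bucket carries the same share it had in the pre-window, then take a weighted p95. The sketch below is an illustration under that assumption, not a description of the production pipeline; the function name, bucket labels, and latency values are hypothetical.

```python
from collections import Counter

def reweighted_p95(samples, target_mix):
    """samples: (bucket, latency_ms) pairs; target_mix: bucket -> desired share.

    Each sample is weighted by target_share / observed_share for its bucket,
    then a weighted nearest-rank 95th percentile is taken.
    """
    kept = [(b, v) for b, v in samples if target_mix.get(b, 0) > 0]
    if not kept:
        raise ValueError("no samples fall in the target buckets")
    counts = Counter(b for b, _ in kept)
    n = len(kept)
    weighted = sorted((v, target_mix[b] / (counts[b] / n)) for b, v in kept)
    total = sum(w for _, w in weighted)
    running = 0.0
    for value, weight in weighted:
        running += weight
        if running >= 0.95 * total:
            return value
    return weighted[-1][0]  # guard against float round-off

# Hypothetical post-window data that over-samples peak hours.
post = [("offpeak", v) for v in (100, 110, 120, 130, 140, 150)]
post += [("peak", v) for v in (400, 420, 440, 460, 480)]

# Assumed pre-window mix: 80% off-peak, 20% peak.
print(reweighted_p95(post, {"offpeak": 0.8, "peak": 0.2}))  # 460
```

Without reweighting, the nearest-rank p95 of the same eleven samples is 480, so the mix difference alone would shift the reported figure.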

What if a critical CVE appears after go-live?

It is logged, triaged, and patched per SLA. Go-live criteria apply at the release point; post-release issues are tracked separately.

How do you validate CV accuracy on my sites?

We evaluate on site-stratified labeled samples. Accuracy gates are set per class and agreed in advance.

Can everything be on-prem?

Yes. We run with the same instrumentation and export artifacts to your environment.


How we measure outcomes

We publish what changes, how we calculate it, and when we call success. The same rules apply to every deployment.

What we track

Latency (p95)

95th-percentile end-to-end request time for defined operations.

Security (critical CVEs)

Count of critical vulnerabilities open at go-live (target: zero).

Infra spend

Comparable monthly run-rate for compute, storage, and egress for the scoped system.

Adoption & engagement

Usage of shipped capabilities (eligible population, active users, events).

Accuracy & drift (ML/CV)

Precision/recall or class-wise accuracy vs. a labeled sample; drift deltas on key features.
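To make the per-class metrics concrete, here is a minimal sketch of class-wise precision and recall against a labeled sample. The function name, class labels, and predictions are hypothetical and only illustrate the calculation.

```python
from collections import Counter

def per_class_precision_recall(y_true, y_pred):
    """Return {class: (precision, recall)}; None where a ratio is undefined."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted this class, but it was wrong
            fn[truth] += 1  # missed the true class
    result = {}
    for cls in sorted(set(y_true) | set(y_pred)):
        predicted = tp[cls] + fp[cls]
        actual = tp[cls] + fn[cls]
        precision = tp[cls] / predicted if predicted else None
        recall = tp[cls] / actual if actual else None
        result[cls] = (precision, recall)
    return result

# Hypothetical labeled sample with three classes.
y_true = ["crack", "crack", "dent", "dent", "ok", "ok", "ok", "ok"]
y_pred = ["crack", "dent", "dent", "dent", "ok", "ok", "ok", "crack"]
metrics = per_class_precision_recall(y_true, y_pred)
print(metrics["crack"])  # (0.5, 0.5)
print(metrics["ok"])     # (1.0, 0.75)
```

Each class can then be compared against its own agreed gate rather than a single blended accuracy figure.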

Windows & sampling

Pre-window

Minimum 14 days of production baseline (exclude incident days).

Post-window

Minimum 14 days after cutover (exclude incident days; allow warm-up of 48 hours).

Like-for-like

Identical operation sets, identical time-of-day/day-of-week distribution.

Confidence

If the pre/post difference is not statistically significant (p-value > 0.1) or seasonality bias is detected, we extend the windows or rerun the measurement.
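The page does not name a specific statistical test. As one possible choice, a permutation test on the difference in p95 between the two windows could look like the sketch below; the function names, iteration count, and sample data are assumptions for illustration.

```python
import random

def p95(samples):
    """Nearest-rank 95th percentile."""
    ordered = sorted(samples)
    rank = (95 * len(ordered) + 99) // 100  # ceil(0.95 * n), 1-based
    return ordered[rank - 1]

def permutation_p_value(pre, post, n_iter=2000, seed=0):
    """Two-sided permutation test on p95(pre) - p95(post)."""
    rng = random.Random(seed)  # fixed seed keeps the result reproducible
    observed = abs(p95(pre) - p95(post))
    pooled = list(pre) + list(post)
    n_pre = len(pre)
    hits = 0
    for _ in range(n_iter):
        rng.shuffle(pooled)
        diff = abs(p95(pooled[:n_pre]) - p95(pooled[n_pre:]))
        if diff >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_iter + 1)

# Hypothetical latency samples in milliseconds.
pre = [800 + i for i in range(50)]
post = [400 + i for i in range(50)]
p = permutation_p_value(pre, post)
print("extend or rerun" if p > 0.1 else "difference is significant")
```

A permutation test makes no distributional assumption about latency, which tends to be heavily skewed; other tests could be substituted if agreed up front.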

Scope

Only the system(s) touched by the engagement; shared services allocated pro-rata.

Formulas & examples

Latency (p95 Δ%)

(p95_pre − p95_post) ÷ p95_pre

Example: 840 ms → 450 ms ⇒ (840 − 450) ÷ 840 ≈ 46% lower
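The latency formula can be sketched in a few lines of Python. The nearest-rank percentile definition used here is an assumption (monitoring tools often interpolate or use histogram buckets instead), and the function names are ours for illustration.

```python
def p95(samples_ms):
    """Nearest-rank 95th percentile: the smallest sample with at least
    95% of all samples at or below it."""
    if not samples_ms:
        raise ValueError("need at least one sample")
    ordered = sorted(samples_ms)
    rank = (95 * len(ordered) + 99) // 100  # ceil(0.95 * n), 1-based
    return ordered[rank - 1]

def pct_lower(pre, post):
    """(pre - post) / pre as a fraction; positive means post is lower."""
    if pre <= 0:
        raise ValueError("pre must be positive")
    return (pre - post) / pre

# Figures from the example above.
print(f"{pct_lower(840, 450):.1%} lower")  # 46.4% lower
```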

Critical CVEs at go-live

count(severity = critical, status=open) on release branch at T0

Example: Target 0
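As a rough sketch of the count above, assuming scanner findings have already been normalized into records with a severity and a status (the `Finding` type, its field names, and the placeholder IDs are hypothetical, not a real scanner schema):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Finding:
    cve_id: str
    severity: str  # e.g. "critical", "high", "medium"
    status: str    # e.g. "open", "fixed"

def open_critical(findings):
    """Findings that block go-live: severity = critical and status = open."""
    return [
        f for f in findings
        if f.severity.lower() == "critical" and f.status.lower() == "open"
    ]

# Placeholder IDs, not real CVEs.
findings_at_t0 = [
    Finding("CVE-0000-0001", "critical", "fixed"),
    Finding("CVE-0000-0002", "high", "open"),
    Finding("CVE-0000-0003", "critical", "open"),
]
blockers = open_critical(findings_at_t0)
print(len(blockers))  # 1, so the target of 0 is not yet met
```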

Infra spend Δ%

(run-rate_pre − run-rate_post) ÷ run-rate_pre

Example: $42k → $33k ⇒ (42 − 33) ÷ 42 ≈ 21% lower
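The sketch below shows how a run-rate with a pro-rata share of shared services might feed the spend formula. The allocation basis (a fixed usage share) and the cost breakdown are assumptions chosen so the totals match the $42k and $33k in the example; only those totals come from this page.

```python
def run_rate(direct_monthly, shared_monthly, usage_share):
    """Scoped monthly run-rate: direct costs plus a pro-rata slice of shared services."""
    if not 0 <= usage_share <= 1:
        raise ValueError("usage_share must be between 0 and 1")
    return direct_monthly + shared_monthly * usage_share

# Hypothetical breakdown; the same 25% allocation is applied to both windows.
pre = run_rate(direct_monthly=36_000, shared_monthly=24_000, usage_share=0.25)
post = run_rate(direct_monthly=28_000, shared_monthly=20_000, usage_share=0.25)

delta = (pre - post) / pre
print(f"${pre:,.0f} -> ${post:,.0f}: {delta:.1%} lower")  # $42,000 -> $33,000: 21.4% lower
```

Holding the allocation method constant across both windows is what keeps the comparison from simply moving cost between systems.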

Adoption rate

active_users_feature ÷ eligible_population (same window)

Example: 1,250 active / 2,000 eligible = 62.5%
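A minimal sketch of the adoption calculation from user ID sets follows. Counting only active users who are also in the eligible population is an assumption made here to keep the numerator and denominator on the same basis; the IDs are synthetic.

```python
def adoption_rate(active_user_ids, eligible_user_ids):
    """Share of the eligible population that used the feature in the window."""
    eligible = set(eligible_user_ids)
    if not eligible:
        raise ValueError("eligible population is empty")
    active_eligible = set(active_user_ids) & eligible
    return len(active_eligible) / len(eligible)

# Synthetic IDs matching the example above: 1,250 active of 2,000 eligible.
eligible = range(2000)
active = list(range(1250)) + [5000]  # user 5000 is active but not eligible
print(f"{adoption_rate(active, eligible):.1%}")  # 62.5%
```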

CV accuracy

per-class precision/recall vs. labeled sample, with site weighting

Example: drift reported as KS/PSI (Kolmogorov–Smirnov statistic, Population Stability Index) on selected features, plus Δ accuracy vs. gate
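For readers unfamiliar with PSI, here is a minimal sketch of the Population Stability Index between a baseline and a current sample of one feature. The bin edges, the small floor used to avoid taking the log of zero, and the sample values are assumptions for illustration; the KS statistic is not shown.

```python
import bisect
import math

def psi(expected, actual, edges, eps=1e-6):
    """Population Stability Index over bins defined by sorted edges.

    PSI = sum over bins of (a_i - e_i) * ln(a_i / e_i), where e_i and a_i
    are the baseline and current fractions of samples in bin i.
    """
    if not expected or not actual:
        raise ValueError("both samples must be non-empty")

    def fractions(values):
        counts = [0] * (len(edges) + 1)
        for v in values:
            counts[bisect.bisect_right(edges, v)] += 1
        # Floor empty bins at eps so the logarithm stays defined.
        return [max(c / len(values), eps) for c in counts]

    e, a = fractions(expected), fractions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))

# Hypothetical feature values.
baseline = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]
current = [v + 0.3 for v in baseline]
edges = [0.25, 0.5, 0.75]

print(psi(baseline, baseline, edges))      # 0.0 for identical samples
print(psi(baseline, current, edges) > 0)   # True: the shift registers as drift
```

Every term in the sum is non-negative, so PSI grows as the two distributions move apart; the threshold that counts as "bounded" drift would be agreed per feature.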

Instrumentation & tools

Latency

Distributed tracing/metrics (e.g., OpenTelemetry → Prometheus/Grafana), sampled by operation.

Security

SCA/SAST/DAST scanners plus OS package scanners; SBOM at release.

Infra

Cloud bills and usage (compute/storage/egress), plus on-prem meter data where applicable.

Adoption

App analytics + server events; anonymized where required.

Accuracy & drift

Eval harness (fixed seed), site-stratified samples, drift monitors on features and outputs.

Acceptance criteria

Performance

p95 lower by an agreed target (typically 30–60%), sustained through the post-window, without a feature freeze.

Security

0 open critical CVEs at go-live; high and medium findings tracked with an owner and SLA.

Cost

Infra run-rate 20–40% lower for scoped workloads, with the same or better SLOs.

ML/CV

Accuracy at or above gate; drift bounded; reviewer load at target.

Adoption

Feature usage reaches agreed floor within the window.

Evidence we export

Before/after KPI chart pack (PNG/PDF)
Scanner reports + SBOM summary at release
Cost deltas with line items and allocation notes
Eval summary (confusion matrices, drift plots)
Change log and rollback plan snapshot

Frequently asked questions

Last updated: October 28, 2025

Ready to ship with measurable outcomes?

Every sprint ends with verifiable metrics. Let's discuss your KPIs.
