
Choosing sites and datasets that generalize
PoCs often succeed on carefully selected test data but fail in production when conditions vary. Building models that generalize starts with deliberate site and dataset selection.
Site selection strategy
Choose pilot sites that represent your operational diversity. Don't pick your "easiest" site for the pilot; pick representative sites that will stress-test your models.
Dataset requirements
Collect training data that covers the full range of conditions you expect in production.
Aim for at least 1,000-5,000 labeled examples per class. More is better, but quality trumps quantity: ensure diverse, representative samples.
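A quick audit of label counts helps catch under-represented classes before training. This is a minimal sketch that assumes a hypothetical `labels.csv` with a `class_label` column; adapt the path and parsing to your own annotation format.

```python
"""Audit per-class label counts against the minimum target.

Assumes a hypothetical labels.csv with one row per example (filename,class_label).
"""
import csv
from collections import Counter

MIN_PER_CLASS = 1000  # lower bound from the guidance above

counts = Counter()
with open("labels.csv", newline="") as f:
    for row in csv.DictReader(f):  # expects a "class_label" column
        counts[row["class_label"]] += 1

for cls, n in sorted(counts.items()):
    status = "OK" if n >= MIN_PER_CLASS else "UNDER-SAMPLED"
    print(f"{cls:>20}: {n:6d}  {status}")
```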
Accuracy & latency gates (and how to measure)
Set quantitative acceptance criteria before deployment. Don't settle for "it looks good"—measure precisely.
Accuracy gates
Define per-class precision and recall thresholds rather than a single aggregate accuracy figure.
Measure on held-out test sets drawn from production sites, and track confusion matrices to identify systematic errors.
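As a concrete gate check, the sketch below computes per-class precision and recall with scikit-learn and flags any class that misses its gate. The class names and thresholds are hypothetical placeholders; `y_true` and `y_pred` are your held-out labels and model predictions.

```python
"""Check per-class accuracy gates on a held-out production test set."""
from sklearn.metrics import precision_recall_fscore_support, confusion_matrix

# Hypothetical per-class gates (min precision, min recall); set these before deployment.
GATES = {"person": (0.95, 0.90), "forklift": (0.90, 0.85), "pallet": (0.85, 0.80)}
LABELS = list(GATES)

def check_gates(y_true, y_pred):
    prec, rec, _, _ = precision_recall_fscore_support(
        y_true, y_pred, labels=LABELS, zero_division=0
    )
    print(confusion_matrix(y_true, y_pred, labels=LABELS))  # spot systematic errors
    failures = []
    for cls, p, r in zip(LABELS, prec, rec):
        min_p, min_r = GATES[cls]
        if p < min_p or r < min_r:
            failures.append(
                f"{cls}: precision={p:.3f} (gate {min_p}), recall={r:.3f} (gate {min_r})"
            )
    return failures  # empty list means every class passes its gate
```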
Latency budgets
Define end-to-end latency requirements and allocate a budget to each pipeline stage, as in the example below.
Measure in production conditions, not just on development machines. Account for network latency, concurrent workloads, and thermal throttling on edge devices.
| Pipeline Stage | Budget | Measurement |
|---------------|--------|-------------|
| Frame capture | 33ms (30fps) | Camera specs |
| Preprocessing | 10ms | Profiler |
| Inference | 100ms | Triton metrics |
| Post-processing | 20ms | Profiler |
| Alert dispatch | 50ms | Network monitor |
| Total | 213ms | End-to-end test |
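To verify a budget table like this in practice, a lightweight timing harness can record p95 latency per stage under real load. This is a minimal sketch; the stage names and budgets mirror the illustrative table above, and the pipeline calls are placeholders.

```python
"""Measure per-stage latency against a budget table."""
import time
from collections import defaultdict
from statistics import quantiles

BUDGET_MS = {"preprocess": 10, "inference": 100, "postprocess": 20, "alert": 50}
samples = defaultdict(list)

class StageTimer:
    """Context manager that records one stage's wall-clock time in milliseconds."""
    def __init__(self, stage):
        self.stage = stage
    def __enter__(self):
        self.t0 = time.perf_counter()
    def __exit__(self, *exc):
        samples[self.stage].append((time.perf_counter() - self.t0) * 1000.0)

def report():
    for stage, ms in samples.items():
        p95 = quantiles(ms, n=20)[18]  # 95th percentile (needs at least 2 samples)
        flag = "OVER BUDGET" if p95 > BUDGET_MS[stage] else "ok"
        print(f"{stage:12s} p95={p95:7.1f} ms (budget {BUDGET_MS[stage]} ms) {flag}")

# Usage inside the pipeline loop:
#   with StageTimer("inference"):
#       run_inference(batch)   # placeholder for your inference call
```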
Edge pipelines (DeepStream/Triton) with batching
NVIDIA DeepStream and Triton Inference Server provide a production-grade edge inference stack.
Pipeline architecture
1. Capture: RTSP streams from IP cameras
2. Decode: Hardware-accelerated video decode (NVDEC)
3. Batch: Accumulate frames from multiple streams
4. Inference: Run batched inference on GPU (Triton)
5. Track: Multi-object tracking across frames
6. Alert: Detect events and dispatch to upstream systems
Use DeepStream's gst-launch pipelines for low-latency streaming or custom Python/C++ applications for complex logic.
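When inference runs behind Triton, a client application (or DeepStream's Triton integration) sends batched tensors to the server. The sketch below uses Triton's Python HTTP client; the model name, tensor names, and input shape are placeholders and must match your model's config.pbtxt.

```python
"""Send one batch of frames to a Triton server and read back detections."""
import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

# Hypothetical batch: 8 preprocessed frames accumulated from multiple streams.
batch = np.zeros((8, 3, 720, 1280), dtype=np.float32)

inp = httpclient.InferInput("images", list(batch.shape), "FP32")  # name/shape must match the model
inp.set_data_from_numpy(batch)
out = httpclient.InferRequestedOutput("detections")               # output tensor name is an assumption

result = client.infer(model_name="detector", inputs=[inp], outputs=[out])
detections = result.as_numpy("detections")  # post-process and track downstream
```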
Batching strategy
Batch size trades latency against throughput: larger batches improve GPU utilization but add queuing delay to every frame.
Choose based on your use case. Real-time alerting needs small batches; overnight analysis can use large batches.
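On the Triton side, dynamic batching expresses this trade-off declaratively: the server coalesces requests up to a preferred batch size but only waits a bounded time for the batch to fill. A minimal excerpt from a hypothetical model config; the values are illustrative, not recommendations.

```protobuf
# Excerpt from a hypothetical Triton model config (config.pbtxt).
dynamic_batching {
  preferred_batch_size: [ 4, 8 ]
  max_queue_delay_microseconds: 2000   # cap the added queuing latency at ~2 ms
}
```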
Drift detection and retraining hooks
Models degrade over time as conditions change. Detect drift and trigger retraining before accuracy falls below your gates.
Monitoring signals
Continuously track drift signals such as prediction confidence distributions, class frequency shifts, and the rate of human corrections.
Set thresholds and alert when metrics cross into red zones.
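One concrete drift signal is a shift in the model's confidence distribution. The sketch below compares a recent window of confidences against a baseline window using the Population Stability Index; the 0.2 alert threshold is a common rule of thumb, not a universal constant.

```python
"""Drift check on prediction-confidence distributions (values in [0, 1])."""
import numpy as np

def psi(baseline, current, bins=10, eps=1e-6):
    """Population Stability Index between two confidence distributions."""
    edges = np.linspace(0.0, 1.0, bins + 1)
    p = np.histogram(baseline, bins=edges)[0] / len(baseline) + eps
    q = np.histogram(current, bins=edges)[0] / len(current) + eps
    return float(np.sum((p - q) * np.log(p / q)))

def drift_alert(baseline_conf, recent_conf, threshold=0.2):
    score = psi(np.asarray(baseline_conf), np.asarray(recent_conf))
    return score > threshold, score  # (should_alert, drift_score)
```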
Retraining workflow
1. Detect drift signal
2. Sample recent edge cases (low confidence, human corrections)
3. Label and add to training set
4. Retrain model with augmented dataset
5. Validate on test set (must meet original accuracy gates)
6. Deploy to edge devices via OTA update
Automate steps 1-2 and 6. Keep humans in the loop for steps 3-5 until you have high confidence in automated pipelines.
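Step 2 is straightforward to automate. A minimal sketch, assuming the edge pipeline emits per-detection records with confidence scores, correction flags, and frame paths (all hypothetical field names):

```python
"""Sample recent edge cases (low confidence, human corrections) for labeling."""
import random

def sample_edge_cases(records, low_conf=0.5, max_items=500, seed=0):
    low_confidence = [r for r in records if r["confidence"] < low_conf]
    corrected = [r for r in records if r.get("corrected")]        # human overrides
    pool = {r["frame_path"]: r for r in low_confidence + corrected}  # dedupe by frame
    rng = random.Random(seed)
    selected = rng.sample(list(pool.values()), k=min(max_items, len(pool)))
    return selected  # send these frames to the labeling queue
```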
Reviewer tooling and evidence packaging
Edge CV isn't fully autonomous—humans review edge cases, validate alerts, and provide ground truth for retraining.
Review UI requirements
Build review tooling that lets humans triage flagged events quickly.
Measure reviewer throughput (items/hour) and tune the UI to maximize it.
Evidence packaging
When CV detects an event, package the evidence (metadata, detection results, and the relevant video clip) for downstream consumers.
Export in standard formats (JSON + MP4) for integration with MES, ERP, or analyst tools.
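A minimal evidence manifest can be plain JSON stored next to the MP4 clip. The schema below is illustrative, not a standard; align field names with whatever your MES/ERP integration expects.

```python
"""Package an event's evidence as JSON metadata plus a reference to its video clip."""
import json
from datetime import datetime, timezone
from pathlib import Path

def package_evidence(event_id, site, camera, detections, clip_path, out_dir="evidence"):
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    manifest = {
        "event_id": event_id,
        "site": site,
        "camera": camera,
        "detected_at": datetime.now(timezone.utc).isoformat(),
        "detections": detections,   # e.g. [{"class": "person", "confidence": 0.97, "bbox": [...]}]
        "clip": str(clip_path),     # MP4 clip already saved by the pipeline
        "model_version": "v1.4.2",  # placeholder; record what produced the result
    }
    manifest_path = out / f"{event_id}.json"
    manifest_path.write_text(json.dumps(manifest, indent=2))
    return manifest_path
```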
Scaling: health telemetry, upgrades, and costs
Deploying to dozens or hundreds of edge devices requires operational discipline.
Health telemetry
Monitor every device for CPU/GPU utilization, temperature, memory, disk, network connectivity, and inference throughput.
Aggregate the metrics in a central dashboard and alert on anomalies (thermal throttling, memory leaks, network issues).
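A per-device heartbeat can be as small as a JSON snapshot shipped on an interval. The sketch below uses psutil for host metrics and leaves GPU metrics as a placeholder, since the right source differs by device (for example pynvml on discrete GPUs, tegrastats on Jetson).

```python
"""Collect a device health snapshot for the central dashboard."""
import json
import socket
import time

import psutil

def health_snapshot():
    return {
        "device": socket.gethostname(),
        "timestamp": time.time(),
        "cpu_percent": psutil.cpu_percent(interval=1),
        "memory_percent": psutil.virtual_memory().percent,
        "disk_percent": psutil.disk_usage("/").percent,
        "uptime_s": time.time() - psutil.boot_time(),
        # "gpu_temp_c" / "gpu_util_percent": add via pynvml or tegrastats on the target device
    }

if __name__ == "__main__":
    print(json.dumps(health_snapshot()))  # ship this to your telemetry endpoint
```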
OTA upgrade strategy
Deploy model and software updates safely:
1. Canary: Deploy to 1-2 devices, monitor for 24 hours
2. Pilot: Expand to 10% of fleet, monitor for 48 hours
3. Rollout: Deploy to remaining devices in waves
4. Rollback: Maintain previous version as fallback
Use device management platforms (Balena, AWS IoT, custom) to orchestrate deployments.
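The staging logic itself is simple to encode. The sketch below only plans group membership (device IDs are whatever your platform uses) and leaves the actual deployment and monitoring to the management platform.

```python
"""Partition a device fleet into the canary / pilot / wave stages above."""
import random

def plan_rollout(device_ids, canary=2, pilot_fraction=0.10, waves=3, seed=42):
    devices = list(device_ids)
    random.Random(seed).shuffle(devices)           # deterministic, unbiased ordering
    canary_group = devices[:canary]
    pilot_end = canary + max(1, int(len(devices) * pilot_fraction))
    pilot_group = devices[canary:pilot_end]
    rest = devices[pilot_end:]
    wave_size = max(1, len(rest) // waves) if rest else 1
    wave_groups = [rest[i:i + wave_size] for i in range(0, len(rest), wave_size)]
    return {"canary": canary_group, "pilot": pilot_group, "waves": wave_groups}
```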
Cost model
Edge CV costs include hardware (edge devices, cameras, networking), deployment (installation and site work), and ongoing operations (model retraining, monitoring, and support).
Factor in total cost of ownership over a 3-5 year lifespan when comparing to cloud-based alternatives.
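A rough TCO comparison can be a few lines of arithmetic. Every number in the sketch below is a hypothetical placeholder, not a benchmark; substitute quotes from your own vendors and cloud pricing.

```python
"""Rough total-cost-of-ownership comparison, edge vs. cloud (placeholder numbers)."""
def edge_tco(cameras, years=5, device_cost=1200, cameras_per_device=8,
             install_per_site=3000, sites=1, annual_ops_per_device=400):
    devices = -(-cameras // cameras_per_device)              # ceiling division
    capex = devices * device_cost + sites * install_per_site
    opex = devices * annual_ops_per_device * years            # retraining, monitoring, support
    return capex + opex

def cloud_tco(cameras, years=5, per_camera_monthly=60):
    # streaming + cloud inference priced per camera-month (placeholder rate)
    return cameras * per_camera_monthly * 12 * years

if __name__ == "__main__":
    cams = 40
    print(f"edge 5-yr TCO:  ${edge_tco(cams):,}")
    print(f"cloud 5-yr TCO: ${cloud_tco(cams):,}")
```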