Choosing sites and datasets that generalize
PoCs often succeed on carefully selected test data but fail in production when conditions vary. Building models that generalize starts with where you pilot and what data you collect.
Site selection strategy
Choose pilot sites that represent your operational diversity:
- Varied lighting conditions (natural, artificial, mixed)
- Different camera angles and mounting heights
- Range of product/object variations
- Typical background clutter and occlusions
Don't pick your "easiest" site for the pilot. Pick representative sites that will stress-test your models.
Dataset requirements
Collect training data that covers:
- All expected object classes and variations
- Edge cases and failure modes
- Seasonal and temporal variations (if relevant)
- Multiple sites and camera positions
Aim for at least 1,000-5,000 labeled examples per class. More is better, but quality trumps quantity, so make sure the samples are diverse and representative.
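The coverage requirements above can be checked mechanically before training starts. The following is a minimal sketch, assuming labels can be exported as (class, site) pairs; that export format, the function names, and the class and site names are all hypothetical.

```python
from collections import Counter

MIN_PER_CLASS = 1000  # lower bound of the 1,000-5,000 guideline

def coverage_gaps(labels, min_per_class=MIN_PER_CLASS):
    """Return {class: shortfall} for classes below the minimum count.

    `labels` is an iterable of (class_name, site_id) pairs
    (a hypothetical export format from the labeling tool).
    """
    per_class = Counter(cls for cls, _site in labels)
    return {cls: min_per_class - n
            for cls, n in per_class.items() if n < min_per_class}

def sites_per_class(labels):
    """Return {class: number of distinct sites contributing examples}."""
    sites = {}
    for cls, site in labels:
        sites.setdefault(cls, set()).add(site)
    return {cls: len(s) for cls, s in sites.items()}

# Illustrative data only
labels = (
    [("pallet", "site_a")] * 1200
    + [("forklift", "site_a")] * 300
    + [("forklift", "site_b")] * 100
)
print(coverage_gaps(labels))    # classes that still need more labels
print(sites_per_class(labels))  # classes seen at only one site need more variety
```

A class that meets the count but comes from a single site still fails the "multiple sites and camera positions" requirement, which is why the two checks are kept separate.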
Accuracy & latency gates (and how to measure)
Set quantitative acceptance criteria before deployment. Don't settle for "it looks good." Measure precisely.
Accuracy gates
Define per-class thresholds:
- Precision: ≥95% (minimize false positives)
- Recall: ≥90% (minimize false negatives)
- F1 score: ≥92% (balanced measure)
Measure on held-out test sets from production sites. Track confusion matrices to identify systematic errors.
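As an illustration of the gates above, per-class precision, recall, and F1 can be derived directly from a confusion matrix. This is a sketch with hypothetical function names and made-up counts, not a prescribed evaluation harness.

```python
GATES = {"precision": 0.95, "recall": 0.90, "f1": 0.92}

def per_class_metrics(confusion, classes):
    """Compute precision, recall, and F1 per class.

    confusion[i][j] is the number of samples whose true class is
    classes[i] and whose predicted class is classes[j].
    """
    results = {}
    for k, name in enumerate(classes):
        tp = confusion[k][k]
        fp = sum(row[k] for row in confusion) - tp  # predicted k, actually other
        fn = sum(confusion[k]) - tp                 # actually k, predicted other
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        results[name] = {"precision": precision, "recall": recall, "f1": f1}
    return results

def failed_gates(metrics, gates=GATES):
    """Return {class: [metric names below threshold]}."""
    failures = {}
    for name, vals in metrics.items():
        missed = [m for m, threshold in gates.items() if vals[m] < threshold]
        if missed:
            failures[name] = missed
    return failures

# Illustrative counts only
classes = ["ok", "defect"]
confusion = [[960, 40],
             [15, 185]]
metrics = per_class_metrics(confusion, classes)
print(failed_gates(metrics))
```

In this example the "defect" class clears the recall gate (185/200) but misses precision (185/225), which is the kind of systematic error the confusion matrix makes visible.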
Latency budgets
Define end-to-end latency requirements:
- Camera capture → inference → alert: <500ms for real-time use cases
- Batch processing: <1 hour for overnight jobs
- Model load time: <30 seconds for edge device startup
Measure in production conditions, not just on development machines. Account for network latency, concurrent workloads, and thermal throttling on edge devices.
| Pipeline Stage | Budget | Measurement |
|---|---|---|
| Frame capture | 33ms (30fps) | Camera specs |
| Preprocessing | 10ms | Profiler |
| Inference | 100ms | Triton metrics |
| Post-processing | 20ms | Profiler |
| Alert dispatch | 50ms | Network monitor |
| Total | 213ms | End-to-end test |
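The budget table can be turned into a simple check against measured timings. The sketch below assumes per-stage latency samples are already collected in milliseconds; the stage keys, the choice of the 95th percentile, and the sample values are assumptions for illustration.

```python
import math

# Per-stage budgets from the table, in milliseconds
BUDGET_MS = {
    "capture": 33,
    "preprocess": 10,
    "inference": 100,
    "postprocess": 20,
    "alert": 50,
}
END_TO_END_LIMIT_MS = 500  # real-time requirement

def percentile(samples, pct):
    """Nearest-rank percentile of a non-empty list of numbers."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]

def over_budget(measured, budget=BUDGET_MS, pct=95):
    """Return {stage: observed percentile} for stages exceeding their budget."""
    violations = {}
    for stage, samples in measured.items():
        observed = percentile(samples, pct)
        if observed > budget[stage]:
            violations[stage] = observed
    return violations

total_budget = sum(BUDGET_MS.values())
headroom = END_TO_END_LIMIT_MS - total_budget
print(total_budget, headroom)

# Illustrative measurements only
measured = {
    "inference": [80, 90, 95, 100, 140],
    "preprocess": [5, 6, 7, 8, 9],
}
print(over_budget(measured))
```

Checking a high percentile rather than the mean is a design choice here: occasional slow frames (for example under thermal throttling) are what break a real-time alert path.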
Edge pipelines (DeepStream/Triton) with batching
NVIDIA DeepStream and Triton Inference Server provide production-grade edge inference.
Pipeline architecture
- Capture: RTSP streams from IP cameras
- Decode: Hardware-accelerated video decode (NVDEC)
- Batch: Accumulate frames from multiple streams
- Inference: Run batched inference on GPU (Triton)
- Track: Multi-object tracking across frames
- Alert: Detect events and dispatch to upstream systems
Use gst-launch-1.0 pipelines built from DeepStream's GStreamer plugins for simple low-latency streaming, or custom Python/C++ applications for complex logic.
Batching strategy
Batch size trades off latency against throughput. The latencies below are illustrative and depend on the model and hardware:
- Batch size 1: Lowest latency (~50ms), lower GPU utilization
- Batch size 8: Moderate latency (~120ms), high throughput
- Batch size 32: High latency (~300ms), maximum throughput
Choose based on your use case. Real-time alerting needs small batches; overnight analysis can use large batches.
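To make the trade-off concrete, throughput follows from batch size divided by batch latency. The sketch below reuses the approximate latencies listed above; the function names are hypothetical, and real numbers should come from profiling your own model on the target device.

```python
# Approximate batch latencies from the list above, in milliseconds
BATCH_LATENCY_MS = {1: 50, 8: 120, 32: 300}

def throughput_fps(batch_size, latency_ms):
    """Frames processed per second when batches run back to back."""
    return batch_size / (latency_ms / 1000)

def largest_batch_within(latency_limit_ms, table=BATCH_LATENCY_MS):
    """Largest profiled batch size whose latency fits the limit, or None."""
    candidates = [b for b, ms in table.items() if ms <= latency_limit_ms]
    return max(candidates) if candidates else None

for batch, ms in BATCH_LATENCY_MS.items():
    print(batch, ms, round(throughput_fps(batch, ms), 1))

# With a 100ms inference budget only batch size 1 fits
print(largest_batch_within(100))
```

Under these figures, going from batch size 1 to 32 raises throughput roughly fivefold while latency grows sixfold, which is why the choice depends on whether the workload is latency-bound or throughput-bound.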
Drift detection and retraining hooks
Models degrade over time as conditions change. Detect drift and trigger retraining.
Monitoring signals
Track these metrics continuously:
- Confidence distribution: Falling average confidence indicates drift
- Prediction entropy: Rising entropy suggests uncertainty
- Human corrections: Increased override rate signals model mismatch
- Performance metrics: Declining accuracy on validation sets
Set thresholds and alert when metrics cross into red zones.
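Two of the signals above, confidence and prediction entropy, can be computed from the model's class probabilities alone. The following is a minimal sketch, assuming softmax outputs are logged for a baseline window and a recent window; the thresholds and function names are placeholders, not recommended values.

```python
import math

def mean_confidence(probabilities):
    """Average top-class probability over a window of predictions."""
    return sum(max(p) for p in probabilities) / len(probabilities)

def mean_entropy(probabilities):
    """Average Shannon entropy (in nats) of the predicted distributions."""
    total = 0.0
    for p in probabilities:
        total += -sum(q * math.log(q) for q in p if q > 0)
    return total / len(probabilities)

def drift_alerts(baseline, recent, max_conf_drop=0.05, max_entropy_rise=0.10):
    """Return the names of signals that moved past their (assumed) thresholds."""
    alerts = []
    if mean_confidence(baseline) - mean_confidence(recent) > max_conf_drop:
        alerts.append("confidence")
    if mean_entropy(recent) - mean_entropy(baseline) > max_entropy_rise:
        alerts.append("entropy")
    return alerts

# Illustrative probabilities only
baseline = [[0.97, 0.02, 0.01], [0.95, 0.03, 0.02]]
recent = [[0.70, 0.20, 0.10], [0.60, 0.30, 0.10]]
print(drift_alerts(baseline, recent))
```

Both signals need no labels, so they can run continuously on the device; override rates and validation accuracy require human input and arrive later.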
Retraining workflow
1. Detect drift signal
2. Sample recent edge cases (low confidence, human corrections)
3. Label and add to training set
4. Retrain model with augmented dataset
5. Validate on test set (must meet original accuracy gates)
6. Deploy to edge devices via OTA update
Automate steps 1-2 and 6. Keep humans in the loop for steps 3-5 until you have high confidence in automated pipelines.
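The sampling step is the easiest one to automate. One possible sketch follows, assuming each logged prediction carries an ID, a confidence score, and a flag for whether a reviewer corrected it; the record keys, the confidence floor, and the queue limit are all hypothetical.

```python
def select_for_labeling(predictions, confidence_floor=0.6, limit=500):
    """Pick prediction IDs to send to the labeling queue.

    `predictions` is a list of dicts with hypothetical keys:
    "id", "confidence" (0-1), and "corrected" (True if a human overrode it).
    """
    flagged = [
        p for p in predictions
        if p["corrected"] or p["confidence"] < confidence_floor
    ]
    # Human corrections first, then lowest confidence first
    flagged.sort(key=lambda p: (not p["corrected"], p["confidence"]))
    return [p["id"] for p in flagged[:limit]]

# Illustrative records only
predictions = [
    {"id": "a", "confidence": 0.95, "corrected": False},
    {"id": "b", "confidence": 0.40, "corrected": False},
    {"id": "c", "confidence": 0.90, "corrected": True},
    {"id": "d", "confidence": 0.55, "corrected": False},
]
print(select_for_labeling(predictions))
```

Prioritizing corrected items is a design choice: a confident prediction that a human overrode is direct evidence of model mismatch, whereas low confidence is only a hint.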
Reviewer tooling and evidence packaging
Edge CV isn't fully autonomous. Humans review edge cases, validate alerts, and provide ground truth for retraining.
Review UI requirements
Build tooling that enables efficient review:
- Queue of flagged items (low confidence, alerts, samples)
- Side-by-side comparison (model prediction vs. ground truth)
- Quick annotation actions (approve, reject, correct)
- Keyboard shortcuts for power users
- Progress tracking and quotas
Measure reviewer throughput (items/hour) and tune UI to maximize efficiency.
Evidence packaging
When CV detects events, package evidence for downstream consumers:
- Video clip: 5-10 seconds surrounding event
- Metadata: Timestamp, camera ID, confidence scores
- Annotations: Bounding boxes, class labels
- Context: Related events, historical patterns
Export in standard formats (JSON + MP4) for integration with MES, ERP, or analyst tools.
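A JSON sidecar for each MP4 clip might look like the sketch below. The field names and the example values are assumptions chosen to mirror the list above, not a standard schema; align them with whatever your downstream system expects.

```python
import json
from dataclasses import dataclass, asdict, field

@dataclass
class Detection:
    label: str
    confidence: float
    bbox: tuple  # (x, y, width, height) in pixels

@dataclass
class EvidencePackage:
    event_id: str
    camera_id: str
    timestamp: str   # ISO 8601, UTC
    clip_path: str   # MP4 covering the seconds around the event
    detections: list = field(default_factory=list)
    related_event_ids: list = field(default_factory=list)

    def to_json(self):
        """Serialize the package, including nested detections, to JSON."""
        return json.dumps(asdict(self), indent=2)

# Illustrative values only
package = EvidencePackage(
    event_id="evt-0001",
    camera_id="cam-07",
    timestamp="2024-01-15T08:30:00Z",
    clip_path="clips/evt-0001.mp4",
    detections=[Detection("forklift", 0.93, (120, 80, 64, 48))],
)
print(package.to_json())
```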
Scaling: health telemetry, upgrades, and costs
Deploying to dozens or hundreds of edge devices requires operational discipline.
Health telemetry
Monitor every device:
- System: CPU, GPU, memory, disk, temperature
- Pipeline: Frame rate, inference latency, queue depth
- Model: Prediction counts, confidence distribution
- Network: Bandwidth usage, packet loss, latency
Aggregate metrics in a central dashboard. Alert on anomalies (thermal throttling, memory leaks, network issues).
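Anomaly alerting can start with fixed limits plus a trend check. The sketch below uses hypothetical metric names and placeholder limits; the memory-leak heuristic is deliberately crude and would need tuning for real devices.

```python
# Placeholder limits; tune per device type
LIMITS = {
    "gpu_temp_c": 85,
    "memory_used_pct": 90,
    "packet_loss_pct": 2,
    "inference_latency_ms": 100,
}

def anomalies(sample, limits=LIMITS):
    """Return the sorted names of metrics in `sample` that exceed their limit."""
    return sorted(k for k, limit in limits.items() if sample.get(k, 0) > limit)

def steadily_rising(values, min_points=5):
    """Crude leak heuristic: every reading is higher than the one before."""
    return len(values) >= min_points and all(
        later > earlier for earlier, later in zip(values, values[1:])
    )

# Illustrative telemetry only
sample = {
    "gpu_temp_c": 91,
    "memory_used_pct": 72,
    "packet_loss_pct": 3.5,
    "inference_latency_ms": 80,
}
print(anomalies(sample))
print(steadily_rising([61, 63, 66, 70, 75]))
```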
OTA upgrade strategy
Deploy model and software updates safely:
- Canary: Deploy to 1-2 devices, monitor for 24 hours
- Pilot: Expand to 10% of fleet, monitor for 48 hours
- Rollout: Deploy to remaining devices in waves
- Rollback: Maintain previous version as fallback
Use device management platforms (Balena, AWS IoT, custom) to orchestrate deployments.
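The staged rollout can be planned as a list of device groups with a monitoring window each. This is a sketch under stated assumptions: the wave size is invented, the pilot stage is taken to bring the cumulative total (canary included) to 10% of the fleet, and no monitoring window is assumed for later waves because none is specified above.

```python
import math

def plan_rollout(device_ids, canary_count=2, pilot_percent=10, wave_size=25):
    """Split a fleet into (stage name, devices, monitoring hours) tuples."""
    devices = list(device_ids)
    canary = devices[:canary_count]
    # Pilot brings the cumulative updated share up to pilot_percent of the fleet
    pilot_total = max(math.ceil(len(devices) * pilot_percent / 100), len(canary))
    pilot = devices[len(canary):pilot_total]
    rest = devices[pilot_total:]
    waves = [rest[i:i + wave_size] for i in range(0, len(rest), wave_size)]

    stages = [("canary", canary, 24), ("pilot", pilot, 48)]
    # Monitoring window for later waves is left unset (None)
    stages += [(f"wave-{n}", wave, None) for n, wave in enumerate(waves, start=1)]
    return stages

# Illustrative fleet of 100 devices
fleet = [f"edge-{i:03d}" for i in range(100)]
stages = plan_rollout(fleet)
for name, group, hours in stages:
    print(name, len(group), hours)
```

Each stage should only start once the previous one has passed its monitoring window, with the previous version kept on the device so a rollback does not need a fresh download.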
Cost model
Edge CV costs include:
- Hardware: $500-$5,000 per device (Jetson Orin, industrial PCs)
- Cameras: $200-$1,000 per camera (IP cameras, lenses, mounts)
- Connectivity: $50-$200/month per site (network, VPN)
- Maintenance: 10-20% of hardware cost annually
Factor in total cost of ownership over a 3-5 year lifespan when comparing to cloud-based alternatives.
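A rough total-cost-of-ownership calculation using the cost categories above might look like this. The deployment size and the mid-range prices in the example are invented for illustration; substitute your own quotes.

```python
def edge_tco(devices, cameras, sites, years,
             hardware_per_device, cost_per_camera,
             connectivity_per_site_month, maintenance_percent):
    """Return a cost breakdown over the deployment lifespan.

    Maintenance is charged annually as a percentage of hardware cost.
    """
    hardware = devices * hardware_per_device
    camera_cost = cameras * cost_per_camera
    connectivity = sites * connectivity_per_site_month * 12 * years
    maintenance = hardware * maintenance_percent / 100 * years
    return {
        "hardware": hardware,
        "cameras": camera_cost,
        "connectivity": connectivity,
        "maintenance": maintenance,
        "total": hardware + camera_cost + connectivity + maintenance,
    }

# Illustrative deployment: 10 devices, 40 cameras, 2 sites, 3 years
costs = edge_tco(
    devices=10, cameras=40, sites=2, years=3,
    hardware_per_device=2000, cost_per_camera=500,
    connectivity_per_site_month=100, maintenance_percent=15,
)
print(costs)
```

The one-off costs (hardware and cameras) dominate in this example, so extending the lifespan from 3 to 5 years changes the total less than it would for a usage-billed cloud deployment.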