People Detection, Tracking, and Zone Analytics: Technical Description
A practical breakdown of a people detection system built in Python with YOLOv26: from video stream inference to entry/exit counting, zone dwell time, and realtime alerting.
TL;DR
The task is to detect people in a video stream, assign each person a unique ID (tracking), and run behavior analytics: line crossing, zone dwell time, counting.
YOLOv26 for detection + ByteTrack/BoT-SORT for tracking. ONNX/TensorRT for realtime GPU inference.
Line Crossing: detecting when a person crosses a virtual line, including the direction (entry/exit).
Zone Dwell Time: precise measurement of how long each person stays in a zone (useful for retail: checkout time, queue time, shelf time).
The system logs all events to CSV/DB and sends realtime alerts via webhook/Telegram on violations.
Problem statement
The system must process a video stream from IP cameras in realtime, detect people, assign unique IDs, and produce structured analytics: who entered, who exited, how long they spent in a zone, and whether perimeter violations occurred.
Detection
YOLOv26 (Ultralytics): a state-of-the-art detector. Variants: nano/small for edge, medium/large for servers.
Tracking
ByteTrack or BoT-SORT: assign a unique ID to each person and track them across frames.
Zone analytics
Line crossing (entry/exit), dwell time (time in zone), crowd density (people count in an area).
Alerting
Realtime notifications on intrusion, loitering, or a zone's crowd threshold being exceeded.
Key scenario: a retail store, where visitor in/out counts, checkout time, queue analytics, and density heatmaps all come automatically from the video stream.
Architecture Pipeline
Camera (RTSP)
↓
Frame Reader (OpenCV / ffmpeg)
↓
YOLOv26 Inference (ONNX / TensorRT)
↓
Tracking (ByteTrack / BoT-SORT)
↓
Zone Engine (line crossing + dwell time)
↓
Event Logger (CSV / DB / webhook / Telegram)
↓
Dashboard / API
Each pipeline stage is independent: swap the model, switch the tracker, or configure zone analytics via config, all without code changes.
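As an illustration of "configure via config", here is a minimal sketch of how lines and zones for one camera could be described in a plain dict (for example loaded from JSON or YAML) and parsed into typed objects. The schema (`camera_id`, `lines`, `zones`, `points`, `in_side`, `dwell_alert_s`) is a hypothetical example, not a format defined by Ultralytics or any tracker.

```python
from __future__ import annotations

from dataclasses import dataclass, field

Point = tuple[float, float]  # (x, y) in frame pixels


@dataclass
class LineConfig:
    line_id: str
    start: Point
    end: Point
    in_side: int = 1  # +1 or -1: which side of start->end counts as "in" (hypothetical convention)


@dataclass
class ZoneConfig:
    zone_id: str
    polygon: list[Point]
    dwell_alert_s: float | None = None  # optional dwell threshold in seconds


@dataclass
class CameraConfig:
    camera_id: str
    lines: list[LineConfig] = field(default_factory=list)
    zones: list[ZoneConfig] = field(default_factory=list)


def parse_camera_config(raw: dict) -> CameraConfig:
    """Turn a plain dict (e.g. the result of json.load) into typed config objects."""
    lines = [
        LineConfig(ln["id"], tuple(ln["points"][0]), tuple(ln["points"][1]), ln.get("in_side", 1))
        for ln in raw.get("lines", [])
    ]
    zones = []
    for z in raw.get("zones", []):
        if len(z["points"]) < 3:
            raise ValueError(f"zone {z['id']!r} needs at least 3 points")
        zones.append(ZoneConfig(z["id"], [tuple(p) for p in z["points"]], z.get("dwell_alert_s")))
    return CameraConfig(raw["camera_id"], lines, zones)


config = parse_camera_config({
    "camera_id": "entrance-1",
    "lines": [{"id": "door", "points": [[0, 360], [1280, 360]], "in_side": 1}],
    "zones": [{"id": "checkout",
               "points": [[100, 100], [400, 100], [400, 300], [100, 300]],
               "dwell_alert_s": 600}],
})
print(config.lines[0].start, config.zones[0].dwell_alert_s)  # (0, 360) 600
```

Validating the config once at startup (here: rejecting zones with fewer than three points) keeps geometry errors out of the per-frame loop.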
YOLOv26: people detection
YOLOv26 (Ultralytics) is the foundation. The model takes a frame (RGB, 640×640) and returns bounding boxes with confidence scores and class labels (person = class 0 in COCO).
Model choice depends on hardware and FPS requirements:
Inference: PyTorch for development and debugging; ONNX or TensorRT for production, with a 2–5x GPU speedup. For edge devices (NVIDIA Jetson), TensorRT is mandatory.
Filtering: confidence threshold (default 0.5), NMS IoU threshold (0.45), class filter (person only), minimum bbox size (removes false positives in the background).
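The filtering step can be sketched in pure Python. Detections are assumed to be `(x1, y1, x2, y2, conf, cls)` tuples in pixel coordinates, and `min_area` is a hypothetical name for the minimum bbox size. The greedy NMS shown is the textbook version, included only to make the IoU threshold concrete; in practice these thresholds are usually passed to the inference library rather than re-implemented.

```python
PERSON_CLASS = 0  # COCO class id for "person"


def iou(a, b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2, ...)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def filter_detections(dets, conf_thresh=0.5, iou_thresh=0.45, min_area=32 * 32):
    # 1) keep confident person boxes that are large enough (drops far-background noise)
    kept = [d for d in dets
            if d[5] == PERSON_CLASS and d[4] >= conf_thresh
            and (d[2] - d[0]) * (d[3] - d[1]) >= min_area]
    # 2) greedy NMS: keep the most confident box, drop boxes that overlap a kept one too much
    kept.sort(key=lambda d: d[4], reverse=True)
    result = []
    for d in kept:
        if all(iou(d, r) <= iou_thresh for r in result):
            result.append(d)
    return result


dets = [
    (100, 100, 200, 300, 0.90, 0),  # person
    (105, 102, 205, 305, 0.80, 0),  # near-duplicate of the first box (IoU ~0.88): removed by NMS
    (400, 100, 480, 260, 0.40, 0),  # below the confidence threshold: removed
    (600, 50, 610, 60, 0.95, 0),    # 10x10 px, too small: removed
    (300, 100, 400, 300, 0.85, 2),  # class 2 (car): removed
]
print(filter_detections(dets))  # [(100, 100, 200, 300, 0.9, 0)]
```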
Tracking: ByteTrack / BoT-SORT
Tracking turns a set of disconnected bounding boxes into a coherent movement history for each person. Without tracking, counting entries/exits or dwell time is impossible.
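ByteTrack: associates bboxes across frames by IoU (Intersection over Union) between Kalman-filter-predicted track boxes and new detections; it uses no appearance features. Its distinctive step is a second association pass that also matches low-confidence detections, which helps keep partially occluded people. Each person gets a track_id that persists while they remain in frame.
BoT-SORT: an improved alternative that adds camera motion compensation and optional appearance (ReID) features, so it copes better with moving cameras and partial occlusions. Recommended for dense scenes.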
Tracking parameters:
1) track_buffer β how many frames to keep a lost track (default: 30). At 15 FPS, the track "survives" 2 seconds after disappearing.
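2) match_thresh: association threshold (default: 0.8). In the ByteTrack reference implementation it is applied to a cost based on 1 - IoU, and a pair is matched only if its cost stays below the threshold, so a higher value is more permissive and a lower value makes matching stricter.
3) A new ID is assigned when a detection cannot be matched to any existing track. Lost tracks are removed after track_buffer expires.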
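To show what `track_buffer` and `match_thresh` control, here is a deliberately simplified IoU tracker. It is not ByteTrack: there is no Kalman motion prediction, no second pass for low-confidence boxes, and association is greedy instead of Hungarian. `SimpleIoUTracker` is a hypothetical name; `match_thresh` is treated as a maximum cost on 1 - IoU, and `track_buffer` counts frames (at 15 FPS, 30 frames = 2 s).

```python
from itertools import count


def iou(a, b):
    """IoU of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


class SimpleIoUTracker:
    """Toy tracker: greedy IoU matching plus a lost-track buffer (not ByteTrack)."""

    def __init__(self, track_buffer=30, match_thresh=0.8):
        self.track_buffer = track_buffer  # frames a lost track is kept alive
        self.match_thresh = match_thresh  # max allowed cost, where cost = 1 - IoU
        self.tracks = {}                  # track_id -> {"box": box, "lost": frames unmatched}
        self._ids = count(1)

    def update(self, boxes):
        """Associate this frame's boxes with existing tracks; return {track_id: box}."""
        pairs = sorted(
            (1.0 - iou(t["box"], b), tid, i)
            for tid, t in self.tracks.items()
            for i, b in enumerate(boxes)
        )
        matched_tracks, matched_boxes, out = set(), set(), {}
        for cost, tid, i in pairs:  # cheapest pairs first (greedy, not Hungarian)
            if cost > self.match_thresh:
                break  # every remaining pair is even more expensive
            if tid in matched_tracks or i in matched_boxes:
                continue
            self.tracks[tid] = {"box": boxes[i], "lost": 0}
            matched_tracks.add(tid)
            matched_boxes.add(i)
            out[tid] = boxes[i]
        # unmatched tracks age and are dropped once they exceed track_buffer
        for tid in list(self.tracks):
            if tid not in matched_tracks:
                self.tracks[tid]["lost"] += 1
                if self.tracks[tid]["lost"] > self.track_buffer:
                    del self.tracks[tid]
        # unmatched boxes start new tracks with fresh IDs
        for i, b in enumerate(boxes):
            if i not in matched_boxes:
                tid = next(self._ids)
                self.tracks[tid] = {"box": b, "lost": 0}
                out[tid] = b
        return out


tracker = SimpleIoUTracker(track_buffer=2)
print(tracker.update([(100, 100, 150, 250)]))  # new person gets ID 1
print(tracker.update([(105, 100, 155, 250)]))  # moved slightly: still ID 1
tracker.update([])
tracker.update([])                             # occluded for 2 frames: track still buffered
print(tracker.update([(110, 100, 160, 250)]))  # reappears within the buffer: ID 1 again
```

If the person stays hidden for more than `track_buffer` frames, the track is deleted and the next detection gets a new ID, which is exactly the dwell-time reset described under common mistakes.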
Line Crossing: detecting line intersections
Line Crossing detects when the center of a person's bounding box crosses a virtual line. This is the basis for counting entries and exits.
How it works:
1) A virtual line is defined on the image by two points: (x1, y1) and (x2, y2). The line can have any orientation: horizontal, vertical, or diagonal.
2) For each track on each frame, the position of the bbox center relative to the line is checked using the cross product. If the sign changes between two consecutive frames, a crossing occurred.
3) Direction is determined by the sign of the cross product after the crossing: moving to the positive side counts as one direction (e.g., entry) and moving to the negative side as the other (exit). Which side is positive depends on the order of the line's two points, so directions are configurable.
4) Each crossing is logged: track_id, direction (in/out), timestamp, crossing coordinate, bbox confidence.
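def cross_product(p, a, b):
    # z-component of (b - a) x (p - a): positive on one side of the line a->b, negative on the other
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])

def check_line_crossing(prev_pos, curr_pos, line_start, line_end):
    d1 = cross_product(prev_pos, line_start, line_end)
    d2 = cross_product(curr_pos, line_start, line_end)
    if d1 * d2 < 0:  # sign changed: the bbox center moved to the other side of the line
        direction = 'in' if d2 > 0 else 'out'  # which side is "in" depends on endpoint order
        return True, direction
    return False, None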
Multiple independent lines are supported on a single frame, e.g., for different entry/exit points, gates, and passages.
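The sign test above checks crossing of the infinite line through the two points, so a person walking past the end of a short gate line would still be counted. A minimal sketch of a per-line counter that remembers the previous center of each track_id and additionally requires the movement segment to intersect the line segment itself. `LineCounter` and the `in_side` convention are hypothetical names; a center landing exactly on the line is skipped until it leaves the line again.

```python
def cross(p, a, b):
    """Sign tells which side of the directed line a->b the point p lies on."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])


def segments_cross(p1, p2, a, b):
    """True if segment p1->p2 strictly crosses segment a->b."""
    d1, d2 = cross(p1, a, b), cross(p2, a, b)
    d3, d4 = cross(a, p1, p2), cross(b, p1, p2)
    return d1 * d2 < 0 and d3 * d4 < 0


class LineCounter:
    def __init__(self, start, end, in_side=1):
        self.start, self.end = start, end
        self.in_side = in_side          # +1: moving to the positive side counts as "in"
        self.prev = {}                  # track_id -> last off-line bbox center
        self.counts = {"in": 0, "out": 0}

    def update(self, track_id, center):
        """Feed one track position per frame; return 'in', 'out', or None."""
        side_value = cross(center, self.start, self.end)
        if side_value == 0:
            return None                 # exactly on the line: keep the last off-line position
        prev = self.prev.get(track_id)
        self.prev[track_id] = center
        if prev is None or not segments_cross(prev, center, self.start, self.end):
            return None
        side = 1 if side_value > 0 else -1
        direction = "in" if side == self.in_side else "out"
        self.counts[direction] += 1
        return direction


counter = LineCounter(start=(0, 360), end=(1280, 360))
print(counter.update(7, (640, 300)))  # None: first sighting of track 7
print(counter.update(7, (640, 400)))  # 'in': moved to the positive side (larger y here)
print(counter.update(7, (640, 300)))  # 'out'
print(counter.counts)                 # {'in': 1, 'out': 1}
```

One counter per configured line keeps lines independent; all of them can be fed the same tracker output each frame.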
Zone Dwell Time: time spent in a zone
Zone Dwell Time measures the total time each person spends inside a defined area. A key metric for retail (checkout time, shelf time, queue time) and industrial (time in a hazardous zone).
How it's calculated:
1) A zone is defined as a polygon (array of points) or rectangle on the frame. One camera can have multiple zones.
2) On each frame, the system checks whether the bbox center is inside the polygon (point-in-polygon test).
3) If the track is in the zone, delta_time (1/FPS seconds) is added to the dwell_time accumulator for this track_id in this zone.
4) When the track leaves the zone or is lost, a "dwell complete" event is logged with track_id, zone_id, total_dwell_time.
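from collections import defaultdict

# track_id -> zone_id -> accumulated seconds; created once, outside the per-frame loop
dwell_tracker = defaultdict(lambda: defaultdict(float))

if point_in_polygon(center, zone_polygon):
    dwell_tracker[track_id][zone_id] += 1.0 / fps  # one frame's worth of time
elif zone_id in dwell_tracker.get(track_id, {}):
    # the track was in this zone and has left it: close the interval and log it
    total = dwell_tracker[track_id].pop(zone_id)
    log_event('dwell_complete', track_id, zone_id, total)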
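The snippet above relies on a `point_in_polygon` helper that is not shown. A standard even-odd ray-casting test is one common choice for zone polygons given as lists of (x, y) vertices, and it also handles concave shapes; OpenCV's `cv2.pointPolygonTest` is an alternative if OpenCV is already in the pipeline.

```python
def point_in_polygon(point, polygon):
    """Even-odd ray casting: cast a ray to the right of `point` and count edge crossings."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # edge straddles the horizontal line through y (half-open rule avoids double-counting vertices)
        if (y1 > y) != (y2 > y):
            # x coordinate where this edge meets that horizontal line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


checkout = [(100, 100), (400, 100), (400, 300), (100, 300)]
print(point_in_polygon((250, 200), checkout))  # True
print(point_in_polygon((50, 200), checkout))   # False
```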
Practical applications:
Retail: checkout time. Average dwell time at registers = average service time; rising dwell time means not enough cashiers.
Retail: queue analytics. Dwell time in the queue zone + people count in the zone = predicted wait time.
Industrial: dangerous zones. Dwell time exceeding a threshold triggers an alert: the person has been in the danger zone too long.
Warehouses: loading time. Dwell time in the gate zone = vehicle loading/unloading time, used for efficiency analytics.
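The accumulator above can be refined in two ways that matter for these use cases: adding real elapsed time between frame timestamps instead of a fixed 1/FPS (which drifts when the frame rate varies), and closing a dwell only after the track has been absent for a short grace period, so a brief occlusion does not split one visit into two. The sketch below also emits a one-time alert when a threshold is exceeded, as in the dangerous-zone case. `DwellTracker`, `alert_after_s`, and `grace_s` are hypothetical names, and timestamps are assumed to be in seconds (e.g. from `time.monotonic()` or the stream's timestamps).

```python
class DwellTracker:
    """Per-zone dwell time from frame timestamps, with an occlusion grace period and an alert."""

    def __init__(self, alert_after_s=None, grace_s=1.0):
        self.alert_after_s = alert_after_s  # e.g. danger-zone limit; None disables alerts
        self.grace_s = grace_s              # how long a track may be absent before its dwell closes
        self.active = {}                    # track_id -> {"dwell": s, "last_ts": ts, "alerted": bool}
        self.events = []                    # (event_type, track_id, seconds)

    def update(self, ts, tracks_in_zone):
        """`tracks_in_zone`: set of track_ids whose bbox center is inside the zone at time `ts`."""
        for tid in tracks_in_zone:
            state = self.active.get(tid)
            if state is None:  # track just entered the zone: start its clock
                self.active[tid] = {"dwell": 0.0, "last_ts": ts, "alerted": False}
                continue
            state["dwell"] += ts - state["last_ts"]  # real elapsed time, not 1/FPS
            state["last_ts"] = ts
            if (self.alert_after_s is not None and not state["alerted"]
                    and state["dwell"] >= self.alert_after_s):
                state["alerted"] = True
                self.events.append(("dwell_alert", tid, state["dwell"]))
        # tracks absent from the zone for longer than the grace period: close their dwell
        for tid in list(self.active):
            state = self.active[tid]
            if tid not in tracks_in_zone and ts - state["last_ts"] > self.grace_s:
                self.events.append(("dwell_complete", tid, state["dwell"]))
                del self.active[tid]


dt = DwellTracker(alert_after_s=3.0, grace_s=1.0)
for i in range(9):            # t = 0.0 .. 4.0 s in 0.5 s steps, person 1 inside the zone
    dt.update(i * 0.5, {1})
dt.update(4.5, set())         # absent for 0.5 s: within the grace period, dwell stays open
dt.update(5.5, set())         # absent for 1.5 s: dwell closed
print(dt.events)              # [('dwell_alert', 1, 3.0), ('dwell_complete', 1, 4.0)]
```

The grace period plays the same role as track_buffer in the tracker; keeping them roughly aligned avoids closing a dwell for a person the tracker still considers present.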
Alerting and logging
The system generates structured events of five types:
Delivery channels:
CSV (default): for analytics and reports.
PostgreSQL / SQLite: for production.
Webhook (HTTP POST): for ERP/helpdesk integration.
Telegram Bot API: realtime alerts to responsible staff.
MQTT: for IoT/edge integrations.
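A minimal sketch of the logging side, using only the standard library: one event record type appended to CSV, plus a JSON body that could be sent as the webhook POST. The field names (`ts`, `camera_id`, `event_type`, `track_id`, `zone_or_line`, `value`) and the payload shape are assumptions for illustration; the actual HTTP, Telegram, and MQTT delivery is left out.

```python
import csv
import io
import json
from dataclasses import asdict, dataclass, fields


@dataclass
class Event:
    ts: float           # unix timestamp in seconds
    camera_id: str
    event_type: str     # e.g. "line_cross", "dwell_complete" (hypothetical names)
    track_id: int
    zone_or_line: str
    value: str          # direction for crossings, seconds for dwell, etc.


FIELDNAMES = [f.name for f in fields(Event)]


def write_events_csv(events, stream):
    """Append events to a text stream (file opened in append mode, io.StringIO, ...)."""
    writer = csv.DictWriter(stream, fieldnames=FIELDNAMES)
    if stream.tell() == 0:  # write the header only into an empty stream
        writer.writeheader()
    for e in events:
        writer.writerow(asdict(e))


def webhook_payload(event):
    """JSON body for an HTTP POST to a webhook endpoint."""
    return json.dumps({"type": event.event_type, "data": asdict(event)})


buf = io.StringIO()
crossing = Event(1700000000.0, "entrance-1", "line_cross", 7, "door", "in")
write_events_csv([crossing], buf)
print(buf.getvalue())
print(webhook_payload(crossing))
```

Keeping one event schema for every channel means the CSV, database, and webhook consumers all see the same fields.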
Tech stack
Common implementation mistakes
1) Counting entries/exits without tracking: each new bbox is a "new person", causing double-counting on occlusions.
2) Not tuning the confidence threshold for lighting conditions: too many false positives at night or false negatives with backlight.
3) Using a single line for bidirectional passages without tracking direction: entries and exits merge into one number.
4) Ignoring track loss during occlusions: a person behind a column loses their ID and gets a new one, so their dwell time resets.
5) Computing dwell time without a reliable time reference: with variable FPS, adding a fixed 1/FPS per frame makes the accumulator inaccurate.
6) Not filtering small bboxes in the background: detections on the far plane add noise to the statistics.
FAQ
Which YOLO model is best for people detection?
YOLOv26s is the best accuracy/speed tradeoff for most tasks. For edge (Jetson), YOLOv26n. For maximum accuracy, YOLOv26l with TensorRT.
How many cameras can one GPU handle?
It depends on the model. YOLOv26s + TensorRT on an RTX 5090: up to 16 streams at 15 FPS. On a Jetson Orin Nano: 2–4 streams at 10 FPS.
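Stream counts like these translate directly into a per-frame latency budget. A back-of-the-envelope check, assuming frames are processed one at a time with no batching and ignoring decode and tracking cost (`per_frame_budget_ms` is a hypothetical helper):

```python
def per_frame_budget_ms(streams, fps):
    """Max average time per inference so that `streams` cameras at `fps` keep up."""
    return 1000.0 / (streams * fps)


print(round(per_frame_budget_ms(16, 15), 2))  # 16 x 15 = 240 frames/s -> 4.17 ms per frame
print(round(per_frame_budget_ms(4, 10), 2))   # 4 x 10 = 40 frames/s -> 25.0 ms per frame
```

Batching frames from several cameras into one inference call relaxes this budget per frame, which is one reason multi-stream throughput depends on more than raw model latency.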
How is checkout time measured?
A checkout zone is defined as a polygon. When a person's track enters the zone, timing starts; on exit, dwell_time is logged. The average across all visitors is the average service time.
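Aggregating logged dwell_complete values for a checkout zone could look like the sketch below. Dropping very short dwells (people walking through the zone) is an assumption here, as is the 10-second default cutoff; the median is reported alongside the mean because a few very long stays skew the average.

```python
from statistics import mean, median


def service_time_stats(dwell_seconds, min_dwell_s=10.0):
    """Mean/median checkout time, ignoring passers-by shorter than `min_dwell_s`."""
    served = [d for d in dwell_seconds if d >= min_dwell_s]
    if not served:
        return None
    return {"visitors": len(served), "mean_s": mean(served), "median_s": median(served)}


print(service_time_stats([3.0, 95.0, 120.0, 110.0, 400.0]))
# {'visitors': 4, 'mean_s': 181.25, 'median_s': 115.0}
```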
What happens when people overlap?
The tracker (BoT-SORT) maintains the ID for track_buffer frames (default: 30). If the person reappears within that window, the ID is preserved. Otherwise, a new ID is assigned.
Can employees be distinguished from visitors?
Not through YOLO detection alone β all people are the same class. Separation is done at the business logic level: by entry zone, time of day, PPE markers, or a separate classification model.
Key Takeaways
1) YOLOv26 + ByteTrack = reliable combo for realtime people detection and tracking.
2) Line Crossing via cross product: a simple and accurate method for entry/exit counting.
3) Zone Dwell Time: an FPS-based accumulator tied to track_id and zone_id. Key metric for retail.
4) ONNX/TensorRT is essential for production (TensorRT on Jetson): a 2–5x inference speedup, with negligible accuracy loss at FP16; INT8 requires calibration and should be checked on your own footage.
5) Proper filtering (confidence, min bbox, track_buffer) matters more than model choice.
Who this is for
ML engineers, CV developers, video surveillance teams, retail analytics specialists, security professionals, and anyone building people counting, queue analytics, or zone monitoring systems.