Human detection

Tracking people and their movement

Where It's Applied

In my practice, I developed a real-time people detection and tracking system for attendance analytics and people flow management. The system uses computer vision to detect people in camera video streams, tracks their movement, measures time spent in specific zones (queue, checkout, sales floor), and generates visitor analytics. This enables retail companies, shopping centers, and other venues to optimize operations, improve service, and make decisions based on people flow data.

Who Will Benefit

I recommend this solution to: retail stores and shopping centers, for attendance analytics, peak load identification, and cashier optimization; cafes and restaurants, for queue management and service time monitoring; banks and government offices, for people flow management and schedule optimization; airports and transportation hubs, for passenger flow tracking; logistics centers and warehouses, for people movement control and safety compliance; sports venues and events, for visitor counting; security and surveillance teams, for unauthorized access monitoring and suspicious activity tracking.

Technologies

YOLOv12 for Object and People Detection

I use YOLOv12 (You Only Look Once) — a modern neural network architecture for real-time object detection. YOLOv12 provides several model sizes (nano, small, medium, large, xlarge) for speed and accuracy optimization. In practice, the system detects people in video with 95%+ accuracy even in challenging conditions (crowds, partial occlusion, various poses).

YOLO's advantage over two-stage detectors: the network processes the whole image in a single pass (hence "Look Once"), which makes real-time operation possible. For each detected person it returns the bounding box coordinates (a rectangle around the person) and a detection confidence score.

Python and OpenCV Integration

The entire system is developed in Python using OpenCV for camera video capture and frame processing. I use the ultralytics library for convenient YOLOv12 operation. Python enables rapid system development and component integration. In practice: video stream loading → YOLO processing of each frame → people tracking → data analysis and report generation.
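The pipeline above can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the production code: it assumes the `ultralytics` and `opencv-python` packages are installed, that the weights file is named `yolo12n.pt`, and that the tracker configuration `bytetrack.yaml` shipped with ultralytics is used. The function name `track_people` and the shape of the yielded dictionaries are hypothetical.

```python
from typing import Iterator


def track_people(source, weights: str = "yolo12n.pt") -> Iterator[list]:
    """Yield, per frame, a list of tracked people (hypothetical helper)."""
    # Imported lazily so this module loads even where the GPU stack is absent.
    import cv2
    from ultralytics import YOLO

    model = YOLO(weights)
    cap = cv2.VideoCapture(source)  # RTSP URL, file path, or camera index
    try:
        while True:
            ok, frame = cap.read()
            if not ok:
                break  # end of stream or read failure
            # classes=[0] keeps only the COCO "person" class;
            # persist=True carries tracker state over between calls.
            result = model.track(frame, persist=True, classes=[0],
                                 tracker="bytetrack.yaml", verbose=False)[0]
            boxes = result.boxes
            if boxes.id is None:  # nothing tracked in this frame
                yield []
                continue
            yield [
                {"id": tid, "xyxy": xyxy, "conf": conf}
                for tid, xyxy, conf in zip(boxes.id.int().tolist(),
                                           boxes.xyxy.tolist(),
                                           boxes.conf.tolist())
            ]
    finally:
        cap.release()
```

Each yielded list can then be passed to the zone, counting, and reporting steps described in the following sections.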

People Tracking Algorithms

Detection is just the first step. To follow people over time, I apply tracking algorithms that link detections across frames: ByteTrack or DeepSORT assign each person a unique ID and keep it throughout the video. The system knows that the person with ID #123 entered the store, went to the checkout, then left, and it can calculate how long they spent in the store.

In practice: tracking accuracy is ~90% (sometimes the system may lose track in crowds but usually recovers). Unique IDs allow building people trajectories and analyzing their behavior.
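To illustrate how detections get linked into IDs, here is a deliberately simplified tracker that matches boxes between frames by overlap (IoU). It is not ByteTrack or DeepSORT, which add motion prediction, confidence handling, and (for DeepSORT) appearance features. The class name `GreedyIouTracker` and its thresholds are assumptions made for this sketch.

```python
from itertools import count


def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


class GreedyIouTracker:
    """Toy tracker: match each detection to the best-overlapping live track."""

    def __init__(self, iou_threshold=0.3, max_missed=15):
        self.iou_threshold = iou_threshold
        self.max_missed = max_missed      # frames a track survives unmatched
        self.tracks = {}                  # id -> {"box": ..., "missed": int}
        self._ids = count(1)

    def update(self, boxes):
        """Return a list of (track_id, box) for this frame's detections."""
        unmatched = set(self.tracks)      # tracks from earlier frames only
        assigned = []
        for box in boxes:
            best = max(unmatched,
                       key=lambda t: iou(self.tracks[t]["box"], box),
                       default=None)
            if best is not None and iou(self.tracks[best]["box"], box) >= self.iou_threshold:
                unmatched.discard(best)
                tid = best
            else:
                tid = next(self._ids)     # no good match: start a new track
            self.tracks[tid] = {"box": box, "missed": 0}
            assigned.append((tid, box))
        for tid in unmatched:             # age out tracks that were not seen
            self.tracks[tid]["missed"] += 1
            if self.tracks[tid]["missed"] > self.max_missed:
                del self.tracks[tid]
        return assigned
```

The `max_missed` counter is what lets a track survive a short occlusion; when it is exceeded, the person gets a new ID on reappearance, which is one source of the tracking errors mentioned above.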

Region of Interest (ROI) Definition

I define zones of interest in the video: checkout queue, checkout counter, entrance door, sales floor. For each zone, the system tracks how many people are in the zone right now, how long each person spent there, and when people entered and exited. In practice: the system can report "checkout #1 queue has 5 people currently, average wait time is 3 minutes."
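A zone check can be sketched as a point-in-polygon test on each tracked person. The sketch assumes zones are polygons in pixel coordinates and that a person's position is the bottom-center of their bounding box; the helper names are hypothetical.

```python
def point_in_polygon(point, polygon):
    """Ray-casting test: is (x, y) inside the polygon given as [(x, y), ...]?"""
    x, y = point
    inside = False
    for (x1, y1), (x2, y2) in zip(polygon, polygon[1:] + polygon[:1]):
        # Does this edge straddle the horizontal ray through y?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def foot_point(box):
    """Bottom-center of an (x1, y1, x2, y2) box, roughly where the person stands."""
    x1, _, x2, y2 = box
    return ((x1 + x2) / 2, y2)


def zone_occupancy(tracked, zones):
    """Map zone name -> set of track IDs whose foot point lies in that zone."""
    return {
        name: {tid for tid, box in tracked
               if point_in_polygon(foot_point(box), poly)}
        for name, poly in zones.items()
    }
```

Using the foot point rather than the box center is a design choice: with a camera mounted above head height, the feet are a better proxy for where a person stands on the floor.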

People Counting and Entry-Exit Analytics

The system defines virtual "entry" and "exit" lines in the video. When a person crosses one of these lines, the system counts them. This gives real-time figures for how many people entered the store, how many exited, and the current occupancy. The system can track these metrics over time and build graphs (attendance peaks, busiest days of the week).

In practice: a store receives a detailed daily report "from 9:00 to 10:00 — 150 people entered, 120 exited, peak occupancy at 9:45 was 80 people simultaneously."
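Line-crossing counting can be sketched with a side-of-line test: a person is counted when their tracked position moves from one side of the virtual line to the other. The sketch treats the line as infinite and calls the positive side "inside"; both are simplifying assumptions, and the class name `LineCounter` is hypothetical.

```python
def side(line, point):
    """Sign of the cross product: which side of the directed line a point is on."""
    (ax, ay), (bx, by) = line
    px, py = point
    cross = (bx - ax) * (py - ay) - (by - ay) * (px - ax)
    return (cross > 0) - (cross < 0)


class LineCounter:
    """Counts tracks crossing a virtual line; the positive side is 'inside'."""

    def __init__(self, line):
        self.line = line
        self.last_side = {}   # track id -> last non-zero side seen
        self.entered = 0
        self.exited = 0

    def update(self, track_id, point):
        s = side(self.line, point)
        if s == 0:
            return            # exactly on the line: wait for a clear side
        prev = self.last_side.get(track_id)
        if prev is not None and prev != s:
            if s > 0:
                self.entered += 1
            else:
                self.exited += 1
        self.last_side[track_id] = s

    @property
    def occupancy(self):
        return self.entered - self.exited
```

Because counting depends on a stable ID on both sides of the line, the line is best placed where people are rarely occluded.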

CUDA GPU Acceleration

All computations run on an NVIDIA GPU with CUDA for maximum performance. On CPU, processing one frame can take 100+ ms, which is under 10 fps and too slow for real-time use. On GPU (CUDA), processing one Full HD frame takes 10-30 ms, enabling real-time processing at 30-60 fps.

I use optimized model versions (INT8 quantization, pruning) that run faster without significant accuracy loss. On an NVIDIA RTX 3070, the system can process 3-5 Full HD video streams simultaneously, so one machine can monitor several cameras.
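The relationship between per-frame latency and the number of streams one GPU can serve is simple arithmetic. The sketch below assumes frames are processed one at a time with no batching or frame skipping, and a hypothetical camera rate of 25 fps; real capacity also depends on video decoding and tracking overhead.

```python
def max_streams(frame_ms: float, stream_fps: float) -> int:
    """How many streams fit if every frame of every stream is processed in turn."""
    frames_per_second = 1000.0 / frame_ms
    return int(frames_per_second // stream_fps)
```

At 10 ms per frame and 25 fps per camera, `max_streams(10, 25)` gives 4, which is in line with the 3-5 streams quoted above. At 100 ms per frame it gives 0, which is why CPU-only processing cannot keep up with a full-rate stream.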

Multi-Camera System

The system supports simultaneous connection of multiple IP cameras (via RTSP protocol) or USB cameras. Each video stream is processed in a separate thread or process. In practice: a store can install cameras at entrance, checkouts, and sales floor, with the system analyzing all streams and providing unified store statistics.
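The one-thread-per-camera layout can be sketched with the standard library alone. Frame sources are plain iterables here and `process` stands in for detection plus tracking; both are placeholders for this sketch, as are the function names.

```python
import queue
import threading

_DONE = object()  # sentinel marking the end of one camera's stream


def camera_worker(camera_id, frames, process, out):
    """Run `process` on every frame of one camera and push results to `out`."""
    for frame in frames:
        out.put((camera_id, process(frame)))
    out.put((camera_id, _DONE))


def run_cameras(sources, process):
    """Process several frame sources in parallel threads; collect results per camera."""
    out = queue.Queue()
    threads = [
        threading.Thread(target=camera_worker,
                         args=(cid, frames, process, out), daemon=True)
        for cid, frames in sources.items()
    ]
    for t in threads:
        t.start()
    results = {cid: [] for cid in sources}
    finished = 0
    while finished < len(threads):
        cid, item = out.get()
        if item is _DONE:
            finished += 1
        else:
            results[cid].append(item)
    return results
```

A single shared queue gives the consumer one place to build unified statistics. Separate processes are the usual alternative when per-frame work is CPU-bound Python code rather than GPU inference.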

Time Metrics and Dwell Time

The system tracks how long people spend in each zone. For checkouts this is critical: if the average time at checkout exceeds 5 minutes, another checkout should be opened. The system calculates the minimum, maximum, average, and median dwell time per zone, which helps identify issues (if one person stays for an hour, something is wrong).
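The dwell-time statistics can be computed from zone entry and exit timestamps. The event layout `(track_id, enter_ts, exit_ts)` in seconds is an assumption made for this sketch.

```python
from statistics import mean, median


def dwell_times(events):
    """events: iterable of (track_id, enter_ts, exit_ts) in seconds -> durations."""
    return [exit_ts - enter_ts for _, enter_ts, exit_ts in events]


def dwell_summary(durations):
    """Min, max, mean, and median dwell time for one zone."""
    return {
        "min": min(durations),
        "max": max(durations),
        "mean": mean(durations),
        "median": median(durations),
    }
```

With durations of 120, 180, and 3600 seconds, the mean is 1300 while the median is 180. The median is the better indicator of a typical visit, and a large gap between the two points to an outlier worth checking.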

Integrated Security System

The system can detect abnormal behavior: a person standing too long in one place (potentially suspicious), a large crowd in a specific zone (a potential safety issue), or people moving in the wrong direction (e.g., against the flow). When it detects an anomaly, the system can alert security or record video for later analysis.
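Two of these rules, loitering and crowding, can be sketched as simple threshold checks on tracked positions. All thresholds (300 seconds, 50 pixels, 20 people) are placeholder values for illustration and would need tuning per camera; the function name is hypothetical.

```python
def find_anomalies(positions, now, loiter_seconds=300, still_radius=50, crowd_limit=20):
    """positions: track id -> non-empty list of (timestamp, x, y), oldest first."""
    alerts = []
    for tid, samples in positions.items():
        t0, x0, y0 = samples[0]
        # "Standing still" here means never leaving a circle around the first sample.
        stayed_put = all((x - x0) ** 2 + (y - y0) ** 2 <= still_radius ** 2
                         for _, x, y in samples)
        if stayed_put and now - t0 >= loiter_seconds:
            alerts.append(("loitering", tid))
    if len(positions) > crowd_limit:
        alerts.append(("crowd", len(positions)))
    return alerts
```

Rules like these produce false alarms (a cashier also stands in one place for hours), so in practice they are usually limited to specific zones.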

Video Recording and Archiving

The system can record video to disk or cloud (S3) with annotations (bounding boxes around people, their IDs, movement lines). Recordings are indexed by time and metadata for quick retrieval of specific periods. The system can automatically delete old videos per retention policy.
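For recordings kept on local disk, a retention policy can be sketched as a periodic cleanup by file age. The `.mp4` extension and the function name are assumptions; for S3, the same effect is normally achieved with bucket lifecycle rules rather than code like this.

```python
import time
from pathlib import Path


def purge_old_recordings(root, max_age_days, now=None):
    """Delete *.mp4 files under `root` older than `max_age_days`; return their paths."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    removed = []
    # Materialize the listing first so files are not deleted mid-iteration.
    for path in sorted(Path(root).rglob("*.mp4")):
        if path.is_file() and path.stat().st_mtime < cutoff:
            path.unlink()
            removed.append(path)
    return removed
```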

Heatmap and Trajectory Generation

Based on tracked people trajectories, the system can build a "heatmap" — a visualization showing where people spend most time or have highest movement frequency. This information helps retail stores optimize layout: if people frequently gather near a spot, high-demand merchandise can be placed there.
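A heatmap of this kind can be sketched as a grid of counters over the frame. The sketch assumes trajectory points are sampled at a fixed rate, so the number of samples in a cell is proportional to time spent there; the cell size of 40 pixels is an arbitrary choice.

```python
def build_heatmap(trajectories, frame_size, cell=40):
    """Count trajectory samples per grid cell; more samples means more time spent."""
    width, height = frame_size
    cols, rows = -(-width // cell), -(-height // cell)   # ceiling division
    grid = [[0] * cols for _ in range(rows)]
    for points in trajectories.values():
        for x, y in points:
            if 0 <= x < width and 0 <= y < height:       # ignore out-of-frame points
                grid[int(y) // cell][int(x) // cell] += 1
    return grid
```

The resulting grid can be normalized and drawn as a color overlay on a still frame from the camera.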

Dashboards and Real-Time Monitoring

I created a web application (Vue.js + FastAPI) for live statistics display: current store occupancy, queue lengths, average checkout wait times, entry-exit graphs. Managers can open the dashboard on an iPad on the store wall for real-time information. The system can send alerts for anomalies (queue too long, peak load).

Store Management System Integration

The system can integrate with POS systems and store management systems: if it detects 20 people in the queue, it can signal that another checkout should be opened. Or, if the sales floor is nearly empty, it can suggest launching a promotion.

Architecture and Scaling

Architecture: a Python backend built on ultralytics (YOLO) and OpenCV, with video processed locally on GPU machines; a PostgreSQL database for analytics storage (entries and exits, zone dwell times, trajectories); Redis for caching real-time data; S3 for the video archive; and a Vue.js frontend for dashboards. The system scales horizontally: additional GPU machines can be added for new cameras, with camera streams distributed across them by a load balancer.

Privacy and GDPR Compliance

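The analytics store no faces or other directly identifying data, only anonymous IDs, trajectories, and time metrics. This reduces privacy risk, but it does not by itself make the system GDPR-compliant or remove the need for a legal basis: video of identifiable people is personal data while it is being captured and processed, even if faces are never stored. In practice: the system blurs or crops out faces before archiving video, and compliance should be confirmed against the legislation that applies to each deployment.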

Usage Examples

The system can: in a clothing store — identify overcrowding at fitting rooms, recommend opening more fitting rooms; in a supermarket — show checkout #3 is slower (customers spend more time), recommend cashier training; at an airport — warn if one zone accumulates 500+ people (safety risk); at a restaurant — show peak load 12:30-13:00, recommend increasing staffing during that period.

Important Organizational Considerations

First — camera quality and placement. The system works only if cameras are properly installed: they must see all people, without blind spots, with good lighting. I recommend auditing camera placement before system deployment — poorly placed cameras give incorrect statistics.

Second — detection accuracy depends on conditions. In crowds, poor lighting, or when people wear camouflaging clothing, the system may perform worse. I always recommend testing the system in real conditions before final deployment.

Third — privacy and notification. People must know they're being video monitored. I recommend posting visible "Video Surveillance" signs. Also ensure the system complies with local legislation (GDPR, local privacy laws).

Fourth — data interpretation requires attention. High people counts aren't always good (could indicate queues, frustrated customers). Low counts aren't always bad. Context and reasons matter. I recommend using the system alongside customer surveys and sales analysis.

Fifth — maintenance. Cameras need periodic cleaning and position verification (cameras can shift over time). The system can self-detect issues (sudden drop in detected people indicates camera problems). I recommend weekly checks.
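The self-check mentioned above can be sketched as a comparison of recent detection counts with a baseline. The 20% threshold and the function name are assumptions, and the baseline is assumed to come from comparable periods (same hour, similar day) so that a quiet morning is not mistaken for a fault.

```python
from statistics import mean


def camera_looks_unhealthy(baseline_counts, recent_counts, drop_ratio=0.2):
    """True when recent detections fall below `drop_ratio` of the usual level."""
    baseline = mean(baseline_counts)
    if baseline == 0:
        return False          # nothing expected, so a drop cannot be judged
    return mean(recent_counts) < drop_ratio * baseline
```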

Sixth: scaling as the business grows. If the store expands into new areas, new cameras are needed. The system scales easily, but first verify that the GPU machine has sufficient resources. Alternatively, add a second GPU machine to process the new streams.

Seventh — business process integration. The system is most valuable when results drive actual optimization: if data shows a problem, act on it (open another checkout, change schedule, optimize process). Without action, the system becomes just "pretty visualization" with no value.
