Artificial Intelligence-Driven Detection of Pushing Behavior in Human Crowds

In this research, a novel automatic AI-based framework has been developed to detect pushing behavior in human crowds, specifically in video recordings and live camera streams of crowded event entrances. The primary goal is to provide organizers and security teams with the necessary knowledge to alleviate pushing behavior and its associated risks, enhancing crowd comfort and preventing potential life-threatening situations. The framework consists of three phases:

An example of pushing behavior.

First phase: Identifying pushing regions in video recordings of crowds

In this phase, we developed a new methodology utilizing a pre-trained deep learning optical flow model, an adapted and trained EfficientNetV1B0-based Convolutional Neural Network (CNN), and a false reduction algorithm. This approach aims to identify regions in crowd videos where individuals are engaging in pushing. Each identified region corresponds to an area between 1 and 2 square meters on the ground. By identifying these pushing regions, we can gain a better understanding of the timing and location of such behavior. This knowledge is essential for developing effective crowd management strategies and improving the design of public spaces.
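The sketch below illustrates this kind of region-based pipeline under stated assumptions: OpenCV's Farnebäck flow stands in for the pre-trained deep optical flow model, a Keras EfficientNetB0 backbone with a binary head plays the role of the adapted CNN, and the grid size, input resolution, and decision threshold are illustrative placeholders rather than the values used in this work.

```python
# Minimal sketch of a region-based pushing detector (not the authors' exact code).
import cv2
import numpy as np
import tensorflow as tf

def build_region_classifier(input_size=224):
    """EfficientNetB0 backbone with a binary head: pushing vs. non-pushing region."""
    base = tf.keras.applications.EfficientNetB0(
        include_top=False, weights="imagenet", pooling="avg",
        input_shape=(input_size, input_size, 3))
    out = tf.keras.layers.Dense(1, activation="sigmoid")(base.output)
    return tf.keras.Model(base.input, out)

def flow_to_image(flow):
    """Pack the dense flow field into a 3-channel uint8 image (dx, dy, magnitude)."""
    mag = np.linalg.norm(flow, axis=-1)
    stacked = np.dstack([flow[..., 0], flow[..., 1], mag])
    return cv2.normalize(stacked, None, 0, 255, cv2.NORM_MINMAX).astype(np.uint8)

def detect_pushing_regions(prev_frame, frame, model, grid=(4, 4), threshold=0.5):
    """Classify each grid cell (roughly 1-2 square meters on the ground) as pushing or not."""
    gray0 = cv2.cvtColor(prev_frame, cv2.COLOR_BGR2GRAY)
    gray1 = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    # Farnebäck flow is only a lightweight stand-in for the deep flow model.
    flow = cv2.calcOpticalFlowFarneback(gray0, gray1, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    motion = flow_to_image(flow)
    h, w = motion.shape[:2]
    patches, boxes = [], []
    for r in range(grid[0]):
        for c in range(grid[1]):
            y0, y1 = r * h // grid[0], (r + 1) * h // grid[0]
            x0, x1 = c * w // grid[1], (c + 1) * w // grid[1]
            patches.append(cv2.resize(motion[y0:y1, x0:x1], (224, 224)))
            boxes.append((x0, y0, x1, y1))
    scores = model.predict(np.stack(patches), verbose=0).ravel()
    # A false reduction step (e.g. requiring a region to stay flagged across
    # several consecutive clips) would further filter these raw detections.
    return [box for box, s in zip(boxes, scores) if s > threshold]
```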

The approach has been trained and evaluated using several real-world experiments, each simulating a straight entrance with a single gate. Experimental results show that the proposed approach achieved an accuracy and F1 score of 88%.
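For completeness, the short sketch below shows how such region-level accuracy and F1 scores can be computed with scikit-learn; the labels and predictions are illustrative and not data from the experiments.

```python
# Illustrative computation of accuracy and F1 score for region-level labels.
from sklearn.metrics import accuracy_score, f1_score

y_true = [1, 0, 1, 1, 0, 0, 1, 0]   # ground truth: 1 = pushing region
y_pred = [1, 0, 1, 0, 0, 0, 1, 1]   # thresholded classifier output

print("Accuracy:", accuracy_score(y_true, y_pred))   # 0.75
print("F1 score:", f1_score(y_true, y_pred))         # 0.75
```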

Visualized example of an annotated video using the proposed approach (first phase). Green boxes indicate identified pushing regions.
Confusion matrix of the proposed approach in the first phase.

Second phase: Real-time detection of pushing regions in live camera streams of dense crowds

In the second phase, we proposed a new cloud-based deep learning approach to identify pushing regions within dense crowds at an early stage. Automatic and timely identification of such behavior would enable organizers and security forces to intervene before situations become uncomfortable. Furthermore, this early detection can help assess the efficiency of implemented plans and strategies, allowing for identification of vulnerabilities and potential areas of improvement.

This approach combines a robust, fast, and pre-trained deep optical flow model, an adapted and trained EfficientNetV2B0-based CNN model, and a color wheel method to accurately analyze the video streams and detect pushing regions. Additionally, it leverages live capturing technology and a cloud environment to provide powerful computational resources, enabling the real-time collection of crowd video streams and the delivery of early-stage results.
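To make the color wheel idea concrete, the sketch below maps flow direction to hue and flow magnitude to brightness, producing a regular RGB image that an EfficientNetV2B0-based classifier can consume; the input size and the binary head are assumptions for illustration, not the exact configuration used in this work.

```python
# Minimal sketch of the color wheel encoding and a stream classifier head.
import cv2
import numpy as np
import tensorflow as tf

def color_wheel(flow):
    """Visualize dense optical flow with the classic HSV color wheel."""
    mag, ang = cv2.cartToPolar(flow[..., 0], flow[..., 1])
    hsv = np.zeros((*flow.shape[:2], 3), dtype=np.uint8)
    hsv[..., 0] = (ang * 180 / np.pi / 2).astype(np.uint8)   # direction -> hue
    hsv[..., 1] = 255                                        # full saturation
    hsv[..., 2] = cv2.normalize(mag, None, 0, 255,
                                cv2.NORM_MINMAX).astype(np.uint8)  # speed -> value
    return cv2.cvtColor(hsv, cv2.COLOR_HSV2RGB)

def build_stream_classifier(input_size=224):
    """EfficientNetV2B0 backbone with a binary pushing / non-pushing head."""
    base = tf.keras.applications.EfficientNetV2B0(
        include_top=False, weights="imagenet", pooling="avg",
        input_shape=(input_size, input_size, 3))
    out = tf.keras.layers.Dense(1, activation="sigmoid")(base.output)
    return tf.keras.Model(base.input, out)
```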

Several real-world experiments, simulating both straight and 90° corner crowded event entrances with one and two gates, were used to train and evaluate the cloud-based approach. The experimental findings indicate that this approach detected pushing regions from the live camera stream of crowded event entrances with an 87% accuracy rate and within a reasonable time delay.
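The loop below is a minimal sketch of the live-stream side of such an approach, assuming an RTSP camera feed and a detection routine like the hypothetical detect_pushing_regions() sketched earlier; the stream URL, clip length, and frame rate are placeholders. In the cloud setting described above, each clip would be uploaded and analyzed remotely rather than processed locally.

```python
# Minimal sketch of a live-stream processing loop (placeholder URL and timings).
import time
import cv2

STREAM_URL = "rtsp://camera.example.org/entrance"  # placeholder camera URL

def stream_loop(model, clip_seconds=2, fps=25):
    cap = cv2.VideoCapture(STREAM_URL)
    frames = []
    while cap.isOpened():
        ok, frame = cap.read()
        if not ok:
            break
        frames.append(frame)
        if len(frames) >= clip_seconds * fps:
            start = time.time()
            # Here the clip is analyzed locally for illustration only.
            boxes = detect_pushing_regions(frames[0], frames[-1], model)
            print(f"{len(boxes)} pushing regions, "
                  f"analysis took {time.time() - start:.1f}s")
            frames = []
    cap.release()
```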

Visualized example of an annotated video stream. Green boxes indicate the predicted pushing regions identified by the cloud-based approach. The annotation in the stream may be delayed by up to four seconds.
Confusion matrix of the cloud-based approach.

Third phase: Annotating individuals engaged in pushing within crowds

While the first two phases focus on region-based detection, the third phase introduces a novel approach to identify individuals engaged in pushing within crowd videos. By analyzing the dynamics of pushing behavior at the microscopic level, we can gain more precise insights into crowd behavior and interactions.

The presented approach extends the previous models by leveraging the Voronoi diagram. Additionally, it uses pedestrian trajectory data as an auxiliary input source. The approach was trained and evaluated in the same way as in the second phase, but on a larger set of real-world experiments. The experimental findings demonstrate that the approach achieved an accuracy of 85%.
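The sketch below illustrates the Voronoi step on a handful of made-up pedestrian positions: each person's cell, computed from trajectory data, delimits a personal region whose bounding box could be cropped from the motion representation and classified individually. The positions and the bounding-box mapping are purely illustrative assumptions, not the exact procedure used here.

```python
# Minimal sketch of per-person Voronoi cells from pedestrian positions.
import numpy as np
from scipy.spatial import Voronoi

positions = np.array([[1.0, 1.2], [1.8, 1.1], [1.4, 2.0],
                      [2.3, 1.9], [0.9, 2.4]])   # illustrative positions (m)

vor = Voronoi(positions)
for person, region_idx in enumerate(vor.point_region):
    region = vor.regions[region_idx]
    if -1 in region or not region:      # skip unbounded cells at the crowd border
        continue
    cell = vor.vertices[region]         # polygon vertices of this person's cell
    x0, y0 = cell.min(axis=0)
    x1, y1 = cell.max(axis=0)
    # The cell's bounding box could be mapped to image coordinates and cropped
    # from the motion representation for per-person classification.
    print(f"person {person}: cell bounding box "
          f"({x0:.2f}, {y0:.2f}) - ({x1:.2f}, {y1:.2f})")
```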

Visualized example of an annotated video stream generated by the Voronoi-based CNN approach. Green boxes indicate individuals identified as pushing.
Confusion matrix of the Voronoi-based CNN approach.


Last Modified: 04.07.2024