1 Introduction

Gathering a large number of people in a confined area may give rise to dangerous events. Participants of concerts, sports games and similar events are exposed to serious physical injury and, in the worst case, may even lose their lives. Such emergencies have occurred many times in history [7, 24, 25]. One of the dangerous situations is a crush, which may be caused by an obstructed pedestrian pathway. Obstruction of passages or exits may occur during overcrowded events, where many people are gathered in a small area. Such a situation may take place, e.g., in a sports facility or an entertainment hall during a football game or a concert, at the moment when all participants intend to leave the building. Even when the building has multiple exits, people tend to choose a route they know, for example the way by which they entered the facility. This may result in a significant slowdown or in the formation of blockages in places such as passages and halls near doors or elevators.

Nowadays, monitoring systems are commonly used in such facilities. In parallel with their popularization, surveillance algorithms are continuously being refined, not only for video but often also for audio processing [12]. Conventional video surveillance systems employ object detection and tracking in order to extract images of moving people or cars from the background [4, 5]. Research subjects range from particular algorithms [6, 23] to complete video monitoring systems, such as systems for event detection in underground stations [13]. The analysis of crowded scenes is much more complex than that of non-crowded ones because individuals are difficult to detect and track.

Researchers address various aspects of the video analysis of crowds. For example, crowd behavior is classified as normal or abnormal by finding motion attributes related to opposite movement in a crowd, division, fighting and others [10, 19, 21]. Other works focus on the estimation of crowd density using diverse methods, such as recognition of the head contour using the Haar wavelet transform (HWT) and support vector machines (SVM) [27], or statistical analysis of image texture [18]. As another example, a system can serve for the detection of overcrowding on underground station platforms [16].

In this paper a video analytics algorithm for monitoring egress from buildings or rooms is proposed. The main focus is on detecting the blocking of pedestrian flow near bottlenecks such as doors or narrow corridors. The state of pedestrian flow in a given area is determined by examining the speed of movement of pedestrians and their density, both obtained by video analysis algorithms. Determining the flow velocity of the crowd does not require detection and tracking of individuals, which would in any case be cumbersome to implement. Hence, a method based on optical flow combined with fuzzy logic is utilized for this purpose. The average flow velocity is calculated at selected points in the frame, according to the direction of the movement of people. The state of congestion is then estimated employing fuzzy logic. For example, a fuzzy rule can be derived as follows: if the observed velocity at checkpoints is decreasing and the occupancy of the area is large, a blockage of the area is probable. Automatic recognition of the degree of congestion at critical points allows the blockage to be detected and responded to appropriately, for example by indicating other exit paths.

Processing the video signal in nearly real time is required to provide a practically usable solution. Therefore, a supercomputer was employed as the hardware base for performing the calculations on video streams. The management of the resources of the computing cluster during acquisition and processing of multimedia data is supported by a dedicated platform called KASKADA, engineered at Gdansk University of Technology [14].

2 Algorithm

Crowding detection is realized in several stages. In the initial phase the optical flow field is determined in order to estimate the speed and the direction of movement of pedestrians. The input to the algorithm performing this task consists of two consecutive image frames. The next step is to analyze the traffic by calculating the average velocity of the stream of pedestrians at each of the previously defined checkpoints. Then the occupancy of the area is determined by image background subtraction. The obtained information is fed to the input of the fuzzy logic system, whose task is to give the final result, i.e. the state of congestion of the analyzed area. The concept of the detector is illustrated in Fig. 1. It is assumed that no objects other than people appear in the investigated area; therefore, no additional recognition of object type is required. The control lines \( k_i \) are perpendicular to the exit path and are defined separately for each camera. The average speed of people \( v_i \), obtained by the optical flow method, is determined from the vectors crossing each control line. The speeds \( v_i \) are calculated synchronously in each image frame.

Fig. 1 The concept of the clogging detector: \( k_0, k_1, k_2 \) denote control lines and \( v_0, \ldots \) the pedestrian flow speed

2.1 Motion detection

A method based on optical flow calculation was utilized for detecting the speed and direction of crowd motion. The optical flow field is obtained with the CLG (Combined Local Global) method [3]. Similarly to the coarse-to-fine strategy characteristic of optical flow algorithms, this algorithm uses a multigrid approach, in which estimates of the flow are passed both up and down the hierarchy of approximations. The algorithm combines the advantages of the global Horn-Schunck approach [8] and the local Lucas-Kanade method [17]. Moreover, it was the best-performing algorithm according to the comparison study [1].

The CLG method computes the optical flow field \( (u(x,y), v(x,y))^T \) of the image sequence \( f(x,y,t) \) at instant \( t \) by solving a system of partial differential equations [3]. The solution is found by multigrid methods [2, 26]. Typically 4 levels of grid density were utilized, starting from the full image size.

2.2 Pedestrian flow analysis

The obtained continuous flow field (motion direction and velocity determined for each pixel) is sampled with a fixed spatial density \( (\Delta x, \Delta y) \). The vectors extracted in this way are tested for intersection with the control lines. During the processing of subsequent image frames \( m \) and \( m + 1 \), for a defined number of control lines \( K \), we obtain sets of vectors representing the instantaneous velocity \( v_{m,k,i} \), where \( k = 1 \ldots K \) is the control line number and \( i = 1 \ldots N_k \), with \( N_k \) being the number of vectors which intersect control line \( k \).

Motion velocity at each control line k is calculated according to Eq. (1):

$$ v_{m,k} = \frac{1}{N_k} \sum_{i=1}^{N_k} v_{m,k,i} $$
(1)

The final value of the velocity \( v_k \) is obtained by temporal averaging of the speed from Eq. (1) over a defined period of \( M \) frames:

$$ v_k = \frac{1}{M} \sum_{m=1}^{M} v_{m,k} $$
(2)
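
Eqs. (1) and (2) can be illustrated with a minimal sketch. It assumes that each flow vector crossing a control line has already been reduced to a scalar speed, and that a frame with no crossing vectors contributes a speed of zero; both are assumptions made for illustration, and the function names and sample values are hypothetical.

```python
from statistics import fmean


def line_velocity(speeds):
    """Eq. (1): mean speed of the N_k vectors crossing one control line in one frame.

    Returning 0.0 for an empty frame is an assumption, not stated in the text.
    """
    return fmean(speeds) if speeds else 0.0


def averaged_velocity(per_frame_speeds):
    """Eq. (2): temporal average of Eq. (1) over the M frames supplied."""
    return fmean([line_velocity(s) for s in per_frame_speeds])


# Hypothetical speeds (e.g. pixels per frame) at one control line over M = 3 frames.
frames = [[2.0, 4.0], [3.0], [1.0, 2.0, 3.0]]
v_k = averaged_velocity(frames)  # per-frame means are 3.0, 3.0 and 2.0
```

Averaging per frame first and then over time, as in the two equations, gives every frame the same weight regardless of how many vectors crossed the line in it.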

The parameter \( z \), which represents the occupancy of the area, is obtained using the background subtraction method [9], as defined in Eq. (3):

$$ z=\frac{P_{FG}}{P_{TOTAL}} $$
(3)

where \( P_{FG} \) is the number of pixels classified as foreground and \( P_{TOTAL} \) is the total number of pixels in the image or, more precisely, in the detection area.
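
As a small illustration of Eq. (3), the sketch below takes \( P_{FG} \) as the count of pixels that background subtraction marks as foreground (moving people) inside the detection area. The binary mask is assumed to be given as nested lists; the function name and the sample mask are hypothetical.

```python
def occupancy(mask):
    """Eq. (3): fraction of detection-area pixels marked as foreground.

    `mask` is a list of rows; a truthy entry means a foreground pixel.
    """
    total = sum(len(row) for row in mask)
    if total == 0:
        raise ValueError("detection area contains no pixels")
    foreground = sum(1 for row in mask for pixel in row if pixel)
    return foreground / total


# Hypothetical 3 x 4 detection area with 5 foreground pixels.
mask = [
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 1, 0],
]
z = occupancy(mask)  # 5 foreground pixels out of 12
```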

Fuzzy logic is employed to assess the state of pedestrian flow [11, 28]. The parameters determined at the previous stages of processing, namely velocity and area occupancy, constitute the input data to the decision-making system. Mamdani's method was used as the fuzzy inference technique. The membership functions of the parameters \( v_k \) and \( z \), defined by Eqs. (2) and (3) and named Speed{k} and Occupancy, have a triangular shape. In the discussed case, 3 fuzzy sets were used to partition the input space: low (L), medium (M) and high (H). The output fuzzy sets are as follows: none (N), low (L), medium (M) and high (H). The rules, in this example for 3 control lines, are of the form:

IF (Speed1 is {L,M,H}) AND (Speed2 is {L,M,H}) AND (Speed3 is {L,M,H}) AND (Occupancy is {L,M,H}) THEN (Output is {N,L,M,H}).

The membership functions for the discussed system are presented in Fig. 2. For the fuzzy rule inference the fuzzy union and intersection operators are applied. The centroid method is utilized in the defuzzification procedure.
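
The inference scheme can be sketched as follows. This is not the rule base or the membership functions of the described system: the sketch is reduced to a single speed input, the inputs are assumed to be normalized to [0, 1], the triangle breakpoints and the nine rules are hypothetical, and min and max are assumed as the intersection and union operators (a common choice in Mamdani inference).

```python
def tri(x, a, b, c):
    """Triangular membership with feet a and c and peak b (a == b or b == c gives a shoulder)."""
    if x < a or x > c:
        return 0.0
    if x == b:
        return 1.0
    if x < b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


# Hypothetical fuzzy sets on normalized [0, 1] axes.
IN_SETS = {"L": (0.0, 0.0, 0.5), "M": (0.0, 0.5, 1.0), "H": (0.5, 1.0, 1.0)}
OUT_SETS = {
    "N": (0.0, 0.0, 1 / 3),
    "L": (0.0, 1 / 3, 2 / 3),
    "M": (1 / 3, 2 / 3, 1.0),
    "H": (2 / 3, 1.0, 1.0),
}

# Hypothetical rules: (Speed, Occupancy) -> Output.
RULES = {
    ("L", "H"): "H", ("L", "M"): "M", ("L", "L"): "N",
    ("M", "H"): "M", ("M", "M"): "L", ("M", "L"): "N",
    ("H", "H"): "L", ("H", "M"): "N", ("H", "L"): "N",
}


def infer(speed, occupancy, steps=200):
    """Mamdani inference with min (AND), max (aggregation) and centroid defuzzification."""
    strengths = {}
    for (s, o), out in RULES.items():
        w = min(tri(speed, *IN_SETS[s]), tri(occupancy, *IN_SETS[o]))
        strengths[out] = max(strengths.get(out, 0.0), w)
    num = den = 0.0
    for i in range(steps + 1):  # discretized centroid of the clipped output sets
        y = i / steps
        mu = max(min(w, tri(y, *OUT_SETS[o])) for o, w in strengths.items())
        num += y * mu
        den += mu
    return num / den if den else 0.0


blocked = infer(speed=0.0, occupancy=1.0)  # slow and dense
free = infer(speed=1.0, occupancy=0.0)     # fast and sparse
```

With these assumed sets, a slow and dense scene yields an output near the upper end of the scale and a fast and sparse scene an output near the lower end, which matches the example rule quoted in the introduction.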

Fig. 2 Membership functions of input and output variables

3 Experiments

The experiments were carried out on a set of video recordings gathered from surveillance cameras installed on the campus of Gdansk University of Technology. Two cameras mounted in the proximity of a lecture hall exit were utilized for gathering the test material. The camera views with the crowding detection areas indicated are presented in Fig. 3. The experimental material consisted of 60 recordings. Two types of egress were recorded: a normal one, in which the flow of people is fluent, and an obstructed one. The efficiency of the algorithm was determined by comparing its outcomes with reference data prepared manually by an expert. The recordings show people exiting the lecture hall, whereas the reference data describe the degree of crowding near the door, determined for each frame of the video. The degree of crowding can be regarded as a function of two variables, as in Eq. (4):

$$ R(t) = f\left( v, z, t \right) $$
(4)

where \( v \) is the average speed of pedestrian flow and \( z \) is the density of the crowd in the area adjacent to the door.

Fig. 3 Camera view and crowding detection area in Auditorium L (left) and Auditorium R (right)

The expert prepared the reference data by analyzing the number of people and their movement speed on the way leading towards the exit, according to literature guidelines [20]. The annotation is a textual description of the degree of crowding, categorized as none, low, medium or high. ‘None’ means a normal situation (undisturbed flow), whereas ‘high’ corresponds to crowded flow. Crowding was classified by observing the number of people and their movement speed in the area adjacent to the exit door. For example, if the number of people was high (a high density per square meter) and their movement speed was low in a specified time period, the category ‘high’ was annotated for this period. Mathematically, each category \( i \) can be regarded as a tuple \( (L_{i,min}, L_{i,max}) \) describing its lower and upper boundaries. For example, ‘none’ is represented by (0, 0.25). The measure of algorithm quality (Q) is defined as the ratio of the number of algorithm results \( R_t \) matching the expert indication over time (\( R_{ref} \)) to the total number of results (N):

$$ Q=\frac{R_{ref}}{N} $$
(5)

where:

$$ R_{ref} = \sum_{t=0}^{T} \sum_{i=1}^{C} r_{i,t} $$
(6)

$$ r_{i,t} = \begin{cases} 1, & L_{i,min} \le R_t < L_{i,max} \\ 0, & \text{otherwise} \end{cases} $$
(7)

\( C \) is the number of categories, in this case equal to 4.

For a single experiment, N is equal to the number of video frames.
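
A minimal sketch of Eqs. (5) to (7) is given below. It reads \( r_{i,t} \) as being counted only for the category the expert assigned to frame \( t \), which is an interpretation of the equations rather than something the text states. Only the boundaries of ‘none’ come from the text; the other three intervals are assumed to be equal quarters of [0, 1], and the function name and sample data are hypothetical.

```python
# Category boundaries (L_min, L_max). Only "none" = (0, 0.25) is given in the
# text; the remaining intervals are assumed for illustration.
BOUNDS = {
    "none": (0.0, 0.25),
    "low": (0.25, 0.5),
    "medium": (0.5, 0.75),
    "high": (0.75, 1.0),
}


def quality(outputs, expert_labels, bounds=BOUNDS):
    """Eq. (5): share of frames whose output R_t lies in the expert's category."""
    if not outputs or len(outputs) != len(expert_labels):
        raise ValueError("outputs and expert_labels must be non-empty and equally long")
    matches = 0
    for r_t, label in zip(outputs, expert_labels):
        lower, upper = bounds[label]
        # Eq. (7): half-open interval, so R_t equal to the upper bound does not match.
        if lower <= r_t < upper:
            matches += 1
    return matches / len(outputs)  # N is the number of frames


# Hypothetical per-frame algorithm outputs and expert annotations.
outputs = [0.10, 0.20, 0.40, 0.80, 0.90]
labels = ["none", "none", "none", "high", "high"]
q = quality(outputs, labels)  # the third frame (0.40) falls outside "none"
```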

Before testing the algorithm, pedestrian movement speed and direction were analyzed for selected recordings representing undisturbed flow and crowded flow. Movement speed was calculated as the mean optical flow magnitude in the detection area, and a threshold of 0.5 was applied in order to eliminate insignificant vectors. Movement direction was calculated as the mean optical flow phase in the detection area, using the two-argument arctangent:

$$ \overline{\varphi} = \arctan\left( \frac{1}{n}\sum_{j=1}^{n} \sin\left(\varphi_j\right),\; \frac{1}{n}\sum_{j=1}^{n} \cos\left(\varphi_j\right) \right) $$
(8)
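
The speed and direction statistics can be sketched as below. The sketch assumes that the flow is given as (dx, dy) displacement pairs, that the 0.5 threshold applies to the vector magnitude, and that vectors exactly at the threshold are kept; the function name and the sample vectors are hypothetical.

```python
import math


def mean_speed_and_direction(vectors, threshold=0.5):
    """Mean magnitude and circular mean angle (Eq. 8) of the significant flow vectors.

    Returns (0.0, None) when no vector reaches the threshold.
    """
    kept = [(dx, dy) for dx, dy in vectors if math.hypot(dx, dy) >= threshold]
    if not kept:
        return 0.0, None
    n = len(kept)
    speed = sum(math.hypot(dx, dy) for dx, dy in kept) / n
    angles = [math.atan2(dy, dx) for dx, dy in kept]
    mean_sin = sum(math.sin(a) for a in angles) / n
    mean_cos = sum(math.cos(a) for a in angles) / n
    return speed, math.atan2(mean_sin, mean_cos)


# Two vectors at +45 and -45 degrees; the short third vector is discarded.
speed, direction = mean_speed_and_direction([(1.0, 1.0), (1.0, -1.0), (0.1, 0.0)])
```

Averaging sines and cosines instead of the raw angles avoids the wrap-around problem: two directions just either side of 180 degrees average to about 180 degrees rather than to 0.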

The pedestrian movement speed and direction obtained for undisturbed flow are presented in Fig. 4, and those for the crowded flow case in Fig. 5. In the undisturbed case the motion speed is about 3 times higher than in the crowded case. The movement direction is uniform in the undisturbed case, whereas in the crowded case fluctuations of the angle can be noted. They are related to the commonly observed swaying of people at low movement speed, referred to as lateral oscillations [15, 22]. Moreover, during the first 40 s (1,000 frames) of the crowded case (Fig. 5) the pedestrians move fluently, so a high speed and a stable angle are observed.

Fig. 4 Average motion speed and mean angle in case of undisturbed flow, Auditorium L

Fig. 5 Average motion speed and mean angle in case of crowded flow, Auditorium L

The chart in Fig. 4 represents a continuous fragment of the recorded video; therefore, a distortion caused by one person moving against the flow can be observed. The low pedestrian movement speed in frames 0 to 180, 747 to 965, 1,170 to 1,280 and 2,380 to 2,460 indicates periods in which no people were present in the detection area.

Figures 6 and 7 illustrate the movement speed and angle values obtained for crowded and for free flow, respectively, in Auditorium R. Observations similar to those above can be made; however, in the free flow case the motion speed is more coherent than in the sample presented in Fig. 4. This is caused by the continuous pedestrian flow while exiting.

Fig. 6 Average motion speed and mean angle in case of crowded flow, Auditorium R

Fig. 7 Average motion speed and mean angle in case of undisturbed flow, Auditorium R

Table 1 presents the time-averaged values of the movement direction \( \overline{\varphi} \) and its standard deviation for each discussed case. For the undisturbed flow in Auditorium L, the time intervals in which counterflow and movement discontinuities occurred were not included in the calculation of the time average.

Table 1 Comparison of time-averaged value and standard deviation of angle representing movement direction \( \overline{\varphi} \)

The level of crowding (R) obtained by applying the algorithm is shown in Fig. 8. The temporal averaging period was 10 frames (cf. Eq. (2)). High crowding is represented by a high value of R and, similarly, low crowding by a low value of R.

Fig. 8 Comparison of algorithm output (R) for Auditorium L for undisturbed flow (black) and for overcrowding (gray)

The algorithm quality (Q) was assessed for the discussed cases. An illustration for crowded flow in Auditorium L is presented in Fig. 9. For crowded flow, Q was 0.94 for Auditorium L and 0.96 for Auditorium R. For free flow, Q was 0.99 for Auditorium L and 0.95 for Auditorium R.

Fig. 9 Comparison of algorithm output (gray) with expert decision (black) for overcrowding case in Auditorium L

4 Conclusions

The concept, the implementation and the practical use of an algorithm for detecting potentially dangerous situations in a crowd were presented. Based on the experimental results shown in the paper and on the larger set of all obtained results, it can be concluded that the proposed algorithm is sufficiently effective in detecting pedestrian crowding near passage bottlenecks. In the future, the algorithm can be enhanced, for example by adding statistics of crowding near a particular building exit. Moreover, connecting multiple cameras to the system is planned in order to enable pedestrian route prediction correlated with the current situation in large buildings.