Open Access Article

Multimodal Video Analysis for Crowd Anomaly Detection Using Open Access Tourism Cameras

Alejandro Dionis-Ros *, Joan Vila-Francés, Rafael Magdalena-Benedito, Fernando Mateo and Antonio J. Serrano-López

Intelligent Data Analysis Laboratory (IDAL), University of Valencia, 46100 Burjassot, Spain

* Author to whom correspondence should be addressed.

Appl. Sci. 2024, 14(23), 11075; https://doi.org/10.3390/app142311075

Submission received: 19 October 2024 / Revised: 19 November 2024 / Accepted: 25 November 2024 / Published: 28 November 2024

(This article belongs to the Special Issue Advanced Image Analysis and Processing Technologies and Applications)


Figure 1. Original and background frames shown before and after applying CLAHE processing and grayscale transformation.
Figure 2. SSIM comparison results. (a) Displays detections provided by YOLO. (b) Shows various cutouts from the background frame corresponding to examples of a false positive, a false negative, and a true positive, respectively. (c) Presents cutouts from the original frame representing examples of a false positive, a false negative, and a true positive, respectively.
Figure 3. Weekly distribution of median detection series. X-axis in intervals of 15 min.
Figure 4. Weekly distribution of IQR detection series. X-axis in intervals of 15 min.
Figure 5. Augmented detection series.
Figure 6. Weekly distribution of the median of the heatmap series. X-axis in intervals of 15 min.
Figure 7. Weekly distribution of the standard deviation of the heatmap series. X-axis in intervals of 15 min.
Figure 8. Augmented heatmap saturation percentage series.
Figure 9. Diagram of the employed methodology.
Figure 10. STL decomposition of the detection series.
Figure 11. STL decomposition of the heatmap saturation percentage series.
Figure 12. Trend threshold in detection series.
Figure 13. Trend threshold in heatmap saturation percentage series.
Figure 14. Plot of detection residual with point anomalies.
Figure 15. Justification of anomalies in detection series. (a) 11 October 2023 10:15:00 (Anomaly). (b) 4 October 2023 10:15:00 (Previous week). (c) 20 September 2023 10:15:00 (3 weeks earlier).
Figure 16. Plot of heatmap saturation percentage residual with point anomalies.
Figure 17. Justification of anomalies in heatmap series. (a) 1 October 2023 10:45:00 (Previous day) [0.001601]. (b) 2 October 2023 10:45:00 (Anomaly) [0.012045]. (c) 3 October 2023 10:45:00 (Next day) [0.008394].


Abstract

In this article, we propose a multimodal approach for detecting crowd anomalies from time series extracted from video. Using pattern recognition and segmentation algorithms, measures of the number of people and of image occupancy are extracted at regular intervals and then analyzed to obtain trends and anomalous behaviors. Specifically, temporal decomposition and residual analysis are used to identify intervals or specific situations of unusual behavior, which can support decision-making and improve actions in sectors related to human movement, such as tourism or security. The methodology is novel and privacy-focused: it analyzes anonymized metrics rather than tracking or recognizing individuals. Applied to the Turisme Comunitat Valenciana webcam in the town of Morella (Comunitat Valenciana, Spain), the approach correctly detected specific anomalous situations, as well as unusual overall increases during the October 2023 festivities and the weekend preceding them. These results were obtained while preserving the confidentiality of individuals at all times, using measures that maximize anonymity, without trajectory recording or person recognition.

Keywords: anomaly detection; multimodal analysis; time series; open data

1. Introduction

In today’s digital era, the amount of generated and stored data is growing at an exponential rate, with unstructured data types standing out among them. These data types, including text, images, audio, and video, offer new sources of knowledge. Applying a multimodal analysis approach, as in [1,2,3], involves combining different data formats and allows leveraging the richness of information inherent in each format, providing a more comprehensive and accurate understanding of the phenomena under study. Due to the high complexity of unstructured data, exploring the application of anomaly detection techniques becomes essential. These techniques allow identifying unusual patterns or atypical behaviors within datasets that can be extremely heterogeneous and difficult to interpret using conventional methods.

Given the growing reliance on such complex data, managing crowd flow efficiently and securely has become a critical priority in settings like tourism destinations, urban areas, and large public events. Public safety concerns and the need for optimized crowd management make anomaly detection essential for timely, informed decision-making. However, traditional approaches to crowd monitoring often pose privacy challenges, particularly when using video data.

To address these challenges, we propose applying statistical anomaly detection techniques to time series derived from video data, using a multimodal approach. These series represent two key metrics: the number of people detected in a given time interval and the percentage of heatmap saturation indicating image occupancy. Anomalies are identified in intervals where the detected counts or saturation percentages deviate significantly from typical patterns, providing insights that can support decision-making and improve processes in sectors such as tourism and video surveillance. By foregoing tracking algorithms and applying measures that preserve anonymity during the analysis, the confidentiality of individuals is maintained at all times.

This work presents an innovative approach focused on analyzing anonymized metrics rather than employing tracking or individual identification methods. This addresses a gap in the literature, where privacy-centered, multimodal crowd analysis methods based on time series are rare.

Section 2 provides a review of the relevant literature in this area of study. Section 3 discusses the data and methodology employed in this research. The results obtained from applying this methodology are presented in Section 4. Section 5 offers an analysis of these results. Finally, Section 6 presents the main conclusions derived from this study.

2. Related Work

This section presents a synthesis of recent research on anomaly detection, object detection in public safety, and crowd analysis. It encompasses a range of methodologies, including advanced machine learning techniques for time series analysis, video surveillance, and real-time occupancy monitoring. Prominent approaches, such as Generative Adversarial Networks (GANs), Convolutional Neural Networks (CNNs), and YOLO (You Only Look Once) models, are examined for their effectiveness in anomaly detection, object tracking, and crowd behavior analysis. The studies also address key challenges related to enhancing detection accuracy, managing spatio-temporal data, and optimizing efficiency in complex real-world scenarios. A summary of the principal contributions in these areas follows.

In [4], a literature review is conducted, and a taxonomy for anomaly detection in time series is proposed. This taxonomy is distinguished according to the type of anomaly detected, which can be point anomalies, contextual anomalies, or collective anomalies; the characteristics of the series, i.e., whether it deals with univariate or multivariate series; the context of the data, i.e., considering the spatio-temporal characteristics of the series in the analysis; and the methodology used, either through machine learning or statistical methods. Additionally, it also provides a review of the challenges that may be encountered when using these techniques, such as defining the normality of the data, the under-representation of anomalies in datasets, or the need to differentiate anomalies from possible noise present in the series.

In [5], a system based on the Spectral Residual (SR) algorithm and CNNs is proposed, allowing online anomaly detection in business metric time series. In [6], a system for real-time anomaly detection is proposed, capable of detecting different types of anomalies due to the integration of various anomaly detection and prediction models. In [7], a model for unsupervised anomaly detection based on GANs is proposed, with both the actors and critics of these networks based on LSTM recurrent networks, where series reconstruction allows anomaly detection. In [8], a model based on Transformers for unsupervised anomaly detection is proposed. In this case, a version of Transformer with modified attention mechanism (Anomaly-Attention) is implemented, aiming to overcome the limitations of these in the task of anomaly detection.

In [9], a methodology for real-time anomaly detection in video is introduced. This approach utilizes feature maps, extracted from the energy of each frame, enabling the detection of object movement speed. Consequently, frames where the energy undergoes abrupt changes are identified as anomalous. In [10], a methodology for detecting suspicious behaviors in video is presented. Through object detection and tracking algorithms, the spatio-temporal characteristics of individuals in the video are extracted. Subsequently, the series obtained from these characteristics are classified based on whether they exhibit anomalous behaviors or not.

In [11], a novel approach for abandoned object detection is introduced. This approach uses a dual background model that adapts to scene changes, an enhanced Pixel-based Finite State Machine (PFSM) with occlusion handling, and the SAO-FPN network for improved small-object feature extraction. Integrating the SODHead decoupling head and self-attention further emphasizes occluded features. Tests on datasets like ABODA and VisDrone2019 show significant accuracy improvements, with a mean detection accuracy of 91.1%, outperforming other advanced methods in recovery rate and accuracy.

In [12], a super-resolution method for enhancing public safety through the analysis of surveillance videos is presented, integrating object detection, key frame selection, and super-resolution reconstruction. The system employs a real-time object detection algorithm, a key frame selection algorithm to identify significant scene changes, and a super-resolution algorithm to improve object resolution and visual quality. The proposed super-resolution network combines pixel and feature spaces, using an asymmetric, recursive deep back-projection approach to efficiently reconstruct high-resolution images. Experiments on videos from various scenarios show significant improvements in object detection accuracy, key frame selection, and super-resolution quality, contributing to more efficient video analysis tools for public safety.

In [13], the use of YOLO models for real-time person detection to monitor occupancy in indoor spaces, a key measure during the COVID-19 pandemic, is explored. It proposes an algorithm that estimates the area of a region in square meters using YOLO-generated bounding boxes, assuming each person occupies 0.66 square meters. The maximum occupancy is calculated based on a density of one person per square meter. The performance of various YOLO models (v3, v4, v5s, v3-Tiny, v4-Tiny) was evaluated in terms of accuracy, FPS, and processing time. YOLO v3 showed the highest accuracy, while YOLO v5s had the highest FPS. The study highlights the algorithm’s ability to adapt to different camera resolutions, but notes potential inaccuracies in low-resolution videos. The findings offer insights into developing intelligent surveillance systems for occupancy monitoring and social distancing compliance.

3. Materials and Methods

This section outlines the approach taken in this study to collect, process, and analyze the data. Section 3.1 begins by describing the origin and characteristics of the data used in the analysis. Section 3.2 details the procedures employed to gather these data. Section 3.3 explains the techniques applied to prepare the data for further analysis. Section 3.4 describes the steps used to construct both time series, an essential part of the study. Finally, Section 3.5 provides an in-depth explanation of the analytical approach and techniques utilized to achieve the study’s objectives.

3.1. Source of the Data

The Valencian Tourism Agency has deployed over 50 web cameras in the Comunitat Valenciana since 2001, in a project sponsored by Tourism. These cameras are accessible on the tourist portal Webcams de la Comunitat Valenciana (https://www.comunitatvalenciana.com/es/webcams (accessed on 24 November 2024)) and broadcast images of various destinations live throughout the day, offering users the opportunity to view the status of these places in real time. Strategically located in collaboration with municipalities and sector entrepreneurs, they allow users to follow events, festivities, and sports competitions.

Specifically, the data used were obtained from the town of Morella (https://en.wikipedia.org/wiki/Morella,_Spain (accessed on 24 November 2024)), Spain. Morella's geographical location has been important throughout history, which is why it is currently considered a town of high tourist value. The webcam located in this town is a static camera which broadcasts at a resolution of 1920 × 1080 pixels, at a rate of 30 frames per second.

Currently, the aforementioned camera has been upgraded and features different characteristics from the previous one. Because of this, the amount of data available for the study is limited (from 20 September 2023 to 15 October 2023). Therefore, as a solution, it is proposed to extend the series backwards using probability distributions, starting from the statistics of the original series. This extension allows initializing the anomaly detection models.

3.2. Data Acquisition

Regarding data acquisition, video segments equivalent to 15 min are obtained. Due to the high amount of space required for storage, the resolution is reduced to 1280 × 720 pixels and the frame rate to 1 FPS.

3.3. Data Processing

Video processing begins with the generation of a background model, based on Gaussian mixture models [14,15], aimed at removing static objects from the image that could generate false positives. For each frame of the video, the background model is updated and a pre-trained YOLOv8 model [16] is applied for the object segmentation task. YOLO is an object detection model based on convolutional neural networks that processes entire images in a single stage to identify and locate objects. Specifically, the employed model provides information corresponding to both segmentation and object detection. We use the ‘small’ version of this model with a confidence threshold of 0.5. While more recent versions of YOLO are available, preliminary testing indicated that YOLOv8 yields optimal performance for this particular application. An analysis of video processing efficiency reported an average frame rate of 61.74 FPS for YOLOv8, compared with an average of 59.63 FPS for YOLOv11.

To extract detections of dynamic objects, we compare the areas corresponding to the bounding boxes of all detections in the frame with those areas in the background model. These cutouts are in grayscale and undergo Contrast Limited Adaptive Histogram Equalization (CLAHE) processing [17,18,19]. An example can be seen in Figure 1.

To compare the cutouts, we use the Structural Similarity Index (SSIM) [20,21], which can take values between −1 and 1, with 1 indicating perfect similarity between images, and −1 indicating completely dissimilar images. In our case, we consider a heuristic threshold of 0.8, considering detections valid when their SSIM value is below this threshold. Figure 2 illustrates an example of various comparison outcomes, highlighting the occurrence of both false negatives and false positives. False negatives are defined as objects that, while not part of the background, are misclassified as such due to prolonged inactivity. Conversely, false positives refer to background objects that are mistakenly identified as belonging to distinct object classes.
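The thresholding step can be sketched as follows. This is a minimal illustration, not the implementation used in the study: it computes a simplified single-window SSIM over the whole cutout, whereas standard SSIM averages the index over local windows. The function names and the constants K1 = 0.01 and K2 = 0.03 (the usual defaults from the SSIM literature) are assumptions.

```python
from statistics import fmean

SSIM_THRESHOLD = 0.8  # heuristic threshold reported in the text


def global_ssim(a, b, dynamic_range=255.0):
    """Single-window SSIM between two equally sized grayscale cutouts.

    `a` and `b` are lists of rows of pixel intensities. Simplification:
    the usual SSIM averages this index over local sliding windows.
    """
    xs = [p for row in a for p in row]
    ys = [p for row in b for p in row]
    if len(xs) != len(ys) or not xs:
        raise ValueError("cutouts must be non-empty and of equal size")
    mu_x, mu_y = fmean(xs), fmean(ys)
    var_x = fmean([(x - mu_x) ** 2 for x in xs])
    var_y = fmean([(y - mu_y) ** 2 for y in ys])
    cov = fmean([(x - mu_x) * (y - mu_y) for x, y in zip(xs, ys)])
    c1 = (0.01 * dynamic_range) ** 2  # stabilizes the luminance term
    c2 = (0.03 * dynamic_range) ** 2  # stabilizes the contrast/structure term
    numerator = (2 * mu_x * mu_y + c1) * (2 * cov + c2)
    denominator = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return numerator / denominator


def is_dynamic(frame_cutout, background_cutout):
    """A detection is kept when its cutout differs enough from the background."""
    return global_ssim(frame_cutout, background_cutout) < SSIM_THRESHOLD
```

A cutout identical to the background scores 1 and is discarded, while a cutout whose structure differs strongly from the background falls below 0.8 and is kept as a dynamic detection.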

The result of this process is stored in a ‘.csv’ file for each video segment. This file will store information regarding the detection timestamp, the predicted class, prediction confidence, as well as the bounding box position and segmentation mask. No other video information is saved, avoiding any privacy concerns with the data.

3.4. Generating the Time Series

From the results of the video processing, we generate two time series corresponding to the number of detections and the percentage of heatmap saturation, which represents the image occupancy within a given interval.

3.4.1. Detection Series

To generate the series corresponding to the number of detections, for each ‘.csv’ file resulting from the video processing, detections were grouped by time interval and the maximum value was obtained within these groups. This value was then used in the time series to refer to that interval. Choosing the maximum over other statistics, such as mean or median, helps correct potential false negatives that may occur. In certain detection intervals, some people may not be detected due to either the precision of the YOLO model or the SSIM threshold; however, detections may occur throughout the interval.
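One possible reading of this aggregation is sketched below: detections are first counted per frame timestamp, and the maximum per-frame count within each 15 min interval becomes the value of the series. The input layout (one timestamp per validated detection) and the function names are assumptions made for illustration.

```python
from collections import Counter
from datetime import datetime


def interval_start(ts, minutes=15):
    """Floor a timestamp to the start of its interval (minutes must divide 60)."""
    return ts.replace(minute=ts.minute - ts.minute % minutes,
                      second=0, microsecond=0)


def detection_series(detection_timestamps, minutes=15):
    """Map each interval to the maximum per-frame detection count.

    `detection_timestamps` holds one timestamp per validated detection, so a
    frame showing three people contributes three identical timestamps.
    """
    per_frame = Counter(detection_timestamps)
    series = {}
    for ts, count in per_frame.items():
        key = interval_start(ts, minutes)
        series[key] = max(series.get(key, 0), count)
    return dict(sorted(series.items()))
```

Taking the maximum means that a single frame in which everyone is detected is enough to set the interval value, which is what makes the statistic tolerant of missed detections in other frames.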

We used 50% of the available data to extract statistics, aiming to extend the series. We group this partition by day of the week, hour, and minute (w, h, m), obtaining both the median (Figure 3) and the interquartile range (IQR) (Figure 4) for each of the groups.
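The grouping can be illustrated with the following sketch, which assumes the series is held as a mapping from timestamp to value. The quantile method ("inclusive") and the handling of groups with a single observation are assumptions; the source does not specify them.

```python
from collections import defaultdict
from datetime import datetime
from statistics import median, quantiles


def weekly_profile(series):
    """Median and IQR of a {timestamp: value} series per (weekday, hour, minute)."""
    groups = defaultdict(list)
    for ts, value in series.items():
        groups[(ts.weekday(), ts.hour, ts.minute)].append(value)
    profile = {}
    for key, values in groups.items():
        if len(values) >= 2:
            q1, _, q3 = quantiles(values, n=4, method="inclusive")
            iqr = q3 - q1
        else:
            iqr = 0.0  # a single observation carries no spread information
        profile[key] = (median(values), iqr)
    return profile
```

With 15 min intervals this yields 672 groups per week (7 days × 96 intervals), each summarizing the same slot across the weeks in the partition.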

For modeling the artificial series, a Gumbel distribution [22] (Equation (1)) was employed. This distribution allows for modeling the distribution of maximum or minimum values.

Gumbel(x | μ, β) = e^{-e^{-\frac{x - μ}{β}}}    (1)

We generate the values corresponding to 8 weeks with a frequency of 15 min, where μ is the median of the corresponding (w, h, m), and β is the quartile deviation (IQR / 2) of the corresponding (w, h, m) (see Figure 5).
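As a sketch of how such values could be drawn, Equation (1), which has the form of a cumulative distribution function, can be inverted: setting u = Gumbel(x | μ, β) with u uniform in (0, 1) gives x = μ − β ln(−ln u). The code below is an illustration under assumptions: the `profile` mapping from (w, h, m) to (median, IQR), the function names, and the fallback to the median when the IQR is zero are all hypothetical, and no rounding or clipping of the generated counts is applied.

```python
import math
import random


def gumbel_cdf(x, mu, beta):
    """Equation (1): cumulative distribution function of the Gumbel distribution."""
    return math.exp(-math.exp(-(x - mu) / beta))


def gumbel_sample(mu, beta, rng):
    """Inverse-transform draw: solve u = CDF(x) for x, with u uniform in (0, 1)."""
    u = rng.random()
    while u == 0.0:  # avoid log(0)
        u = rng.random()
    return mu - beta * math.log(-math.log(u))


def synthesize(profile, keys, seed=0):
    """Generate one synthetic value per (w, h, m) key, in the order given."""
    rng = random.Random(seed)
    values = []
    for key in keys:
        med, iqr = profile[key]
        beta = iqr / 2  # quartile deviation used as the scale parameter
        values.append(gumbel_sample(med, beta, rng) if beta > 0 else med)
    return values
```

Eight weeks at a 15 min frequency correspond to 8 × 672 = 5376 keys passed to `synthesize`.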

To evaluate the consistency of the expanded series with actual observational data, the Kolmogorov–Smirnov test [23] was employed. The Kolmogorov–Smirnov test is a nonparametric method designed to assess the equality of continuous, one-dimensional probability distributions, enabling verification of whether a sample derives from a specific reference distribution or if two samples share a common distribution. With a p-value of 0.1782 obtained, and at a 95% confidence level, the results suggest that the null hypothesis (that the data originate from the same distribution) cannot be rejected.
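For illustration, a two-sample version of the test can be sketched as below. Whether the study used the one-sample or the two-sample variant is not stated, so the two-sample choice is an assumption, as are the function name and the asymptotic p-value approximation with its small-sample correction. In practice a library routine such as SciPy's `ks_2samp` would normally be preferred.

```python
import math
from bisect import bisect_right


def ks_two_sample(a, b, terms=100):
    """Return (D, approximate p-value) for the two-sample KS test."""
    xs, ys = sorted(a), sorted(b)
    n, m = len(xs), len(ys)
    if n == 0 or m == 0:
        raise ValueError("both samples must be non-empty")
    # D is the largest gap between the two empirical CDFs.
    d = max(
        abs(bisect_right(xs, v) / n - bisect_right(ys, v) / m)
        for v in xs + ys
    )
    # Asymptotic Kolmogorov distribution with a small-sample correction.
    n_eff = n * m / (n + m)
    lam = (math.sqrt(n_eff) + 0.12 + 0.11 / math.sqrt(n_eff)) * d
    if lam < 0.2:
        return d, 1.0  # the tail probability is indistinguishable from 1 here
    p = 2 * sum(
        (-1) ** (j - 1) * math.exp(-2 * (j * lam) ** 2)
        for j in range(1, terms + 1)
    )
    return d, min(1.0, max(0.0, p))
```

At a 95% confidence level, the null hypothesis of a common distribution is rejected only when the returned p-value is below 0.05.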

3.4.2. Heatmap Saturation Percentage Series

To generate the series corresponding to the heatmap, for each ‘.csv’ file resulting from the video processing, we use the segmentation masks of each detection.

To obtain the grayscale heatmap, given a ‘.csv’ file, we accumulate the segmentation masks of the different detections into a matrix initialized with zeros, of the same size as the original image. Once accumulated, we normalize the data so that, for a point in the map to saturate to white, it must have been occupied throughout the entire interval.

Next, from these heatmaps, we generate the heatmap series. Each heatmap represents a point within the series, with this value being the sum of the values in the image. The theoretical maximum value for a heatmap is equal to |columns| × |rows| × 255, i.e., when it is completely saturated.
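The accumulation and normalization can be sketched as follows. The sketch assumes that the masks of all detections in a frame are first merged into one set of occupied pixels per frame, so that a pixel counts at most once per frame; this merging, the set-based mask representation, and the function names are assumptions for illustration.

```python
def heatmap(frame_masks, rows, cols):
    """Accumulate per-frame occupancy masks into a 0-255 grayscale heatmap.

    `frame_masks` has one entry per frame of the interval; each entry is a
    set of (row, col) pixels covered by any detection in that frame. A pixel
    reaches 255 (white) only if it is occupied in every frame.
    """
    acc = [[0] * cols for _ in range(rows)]
    for mask in frame_masks:
        for r, c in mask:
            acc[r][c] += 1
    n_frames = len(frame_masks)
    if n_frames == 0:
        return acc
    return [[round(255 * v / n_frames) for v in row] for row in acc]


def saturation(hm):
    """Sum of the heatmap divided by its theoretical maximum rows * cols * 255."""
    rows, cols = len(hm), len(hm[0])
    return sum(map(sum, hm)) / (rows * cols * 255)
```

For a 2 × 2 image observed over four frames, a pixel occupied in all four frames maps to 255 and a pixel occupied in one frame maps to 64, giving a saturation of 319 / 1020.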

We used 50% of the available data to extract statistics, aiming to extend the size of the series. We group this partition by day of the week, hour, and minute, obtaining both the median (Figure 6) and the standard deviation (Figure 7) for each.

For modeling the artificial series, we use the Laplace distribution [24] (Equation (2)). The choice of this distribution over others, such as the normal distribution, is due to its characteristics. Its sharper peak and faster decay allow us to control the amount of noise applied to the generated series.

Laplace(x | μ, β) = \frac{1}{2β} e^{-\frac{|x - μ|}{β}}    (2)

Using a Laplace distribution, we generate the values corresponding to 8 weeks with a frequency of 15 min, where μ is the median of the corresponding (w, h, m), and β is the standard deviation of the corresponding (w, h, m).
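A draw from the Laplace distribution can be sketched with inverse-transform sampling: for u uniform in (−1/2, 1/2), x = μ − β sgn(u) ln(1 − 2|u|). The function names are hypothetical and the sketch is illustrative rather than the generator used in the study.

```python
import math
import random


def laplace_pdf(x, mu, beta):
    """Probability density of the Laplace distribution with location mu, scale beta."""
    return math.exp(-abs(x - mu) / beta) / (2 * beta)


def laplace_sample(mu, beta, rng):
    """Inverse-transform draw from Laplace(mu, beta)."""
    u = rng.random() - 0.5  # uniform in [-0.5, 0.5)
    while u == -0.5:        # avoid log(0)
        u = rng.random() - 0.5
    return mu - beta * math.copysign(1.0, u) * math.log(1 - 2 * abs(u))
```

A Laplace variable with scale β has standard deviation β√2, so using the group standard deviation directly as β produces synthetic values somewhat more dispersed than the observed ones.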

Due to the high values of the series, we normalize it using the previously mentioned theoretical maximum value, thus representing the percentage of heatmap saturation (see Figure 8).

To evaluate the consistency of the expanded series with actual observational data, the Kolmogorov–Smirnov test is employed. With a p-value of 0.3839 obtained, and at a 95% confidence level, the results suggest that the null hypothesis (that the data originate from the same distribution) cannot be rejected.

3.5. Methodology

In Figure 9, the methodology employed throughout the project can be observed. Part of this diagram has already been discussed in Section 3.3 and Section 3.4, consisting of data preprocessing, while the remaining part is explained below. The central idea of this latter part is to apply anomaly detection techniques to the preprocessed data. Before applying these techniques, the series has been decomposed into its trend, seasonality, and residual components.

3.5.1. STL Decomposition

STL (Seasonal-Trend decomposition using LOESS) [25] is a statistical method used to decompose a time series into three components: seasonal, trend, and residual. This technique is robust, capable of handling various forms of seasonality, including nonlinear patterns, and can be modified to account for outliers. The application of STL results in the decomposition of the time series into its trend, seasonality, and residual components. The algorithm allows for the adjustment of parameters such as the periodicity and seasonality of the series. For this study, a periodicity of 1 day and a seasonality of 1 week were specified.
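To make the three components concrete, the sketch below performs a classical additive decomposition. It is deliberately not STL: the LOESS smoothers are replaced by a centered moving average for the trend and by per-phase means for the seasonal component, and there is no robustness to outliers. The function name and the `period` argument (number of samples per seasonal cycle) are assumptions. Library implementations of STL, such as the `STL` class in statsmodels, would be used in practice.

```python
from statistics import fmean


def decompose(series, period):
    """Additive decomposition: series[i] = trend[i] + seasonal[i] + residual[i]."""
    n = len(series)
    if period < 1 or n < period:
        raise ValueError("need at least one full period of data")
    half = period // 2
    # Trend: moving average centered on each point, window shrunk at the edges.
    trend = [
        fmean(series[max(0, i - half): min(n, i + half + 1)])
        for i in range(n)
    ]
    detrended = [x - t for x, t in zip(series, trend)]
    # Seasonal: mean detrended value at each phase of the cycle, centered on zero.
    phase_means = [fmean(detrended[p::period]) for p in range(period)]
    offset = fmean(phase_means)
    seasonal = [phase_means[i % period] - offset for i in range(n)]
    # Residual: whatever trend and seasonality do not explain.
    residual = [x - t - s for x, t, s in zip(series, trend, seasonal)]
    return trend, seasonal, residual
```

As in the figures of the study, the seasonal component produced this way repeats identically in every cycle, while the trend is free to vary over time.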

3.5.2. Collective Anomalies—Trend Threshold

One of the types of anomalies we seek to identify are collective anomalies, i.e., those that individually do not represent an anomaly but do so when considered as a sequence.

To achieve this, a threshold is set on the trend calculated in the series decomposition. For both series, this threshold δ is defined as δ = \tilde{x} ± σ, where \tilde{x} is the median of the original series and σ is its standard deviation. We consider a value anomalous only if it exceeds the defined upper threshold.
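The rule can be sketched in a few lines. The use of the population standard deviation and the function name are assumptions; the source does not say which estimator was used.

```python
from statistics import median, pstdev


def collective_anomalies(series, trend):
    """Indices where the trend exceeds median(series) + std(series)."""
    upper = median(series) + pstdev(series)
    return [i for i, t in enumerate(trend) if t > upper]
```

Runs of consecutive indices returned by this function correspond to the anomalous sequences discussed in the results, since the trend changes slowly and therefore crosses the threshold for extended periods rather than at isolated points.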

3.5.3. Point Anomalies—SESD (Seasonal ESD)

Another type of anomaly we seek to identify are point anomalies, i.e., points that show a significant deviation from the rest of the data.

For this, we employ the Seasonal Extreme Studentized Deviate (SESD) algorithm [26], which applies the Extreme Studentized Deviate (ESD) algorithm [27] to the residual obtained from the STL decomposition of the series; in our case, the previously calculated decomposition is reused. ESD is a statistical test used to detect multiple outliers in a univariate dataset. Unlike other tests, such as Grubbs’ test [28], which can only detect a single outlier at a time, ESD can identify multiple outliers simultaneously.

When selecting point anomalies, we discard those that have been identified as collective anomalies in the previous section. Point anomalies are chosen in descending order of residual value, with the most relevant being those with a high residual value.
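The generalized ESD step can be sketched as below, applied to a residual series. Two simplifications are assumed and should be read as such: the Student t quantile in the critical value is approximated by a normal quantile, which is reasonable only for long series, and the test statistic uses the mean and sample standard deviation (some seasonal variants use the median and the median absolute deviation instead). The function name and parameters are hypothetical.

```python
import math
from statistics import NormalDist, fmean, stdev


def generalized_esd(values, max_outliers, alpha=0.05):
    """Indices flagged by the generalized ESD test, most extreme first."""
    remaining = list(enumerate(values))
    n = len(values)
    candidates, n_outliers = [], 0
    for i in range(1, max_outliers + 1):
        if len(remaining) < 3:
            break
        xs = [v for _, v in remaining]
        mean, sd = fmean(xs), stdev(xs)
        if sd == 0:
            break
        # Test statistic: largest studentized deviation among remaining points.
        pos = max(range(len(xs)), key=lambda j: abs(xs[j] - mean))
        r_i = abs(xs[pos] - mean) / sd
        candidates.append(remaining.pop(pos)[0])
        # Critical value; a normal quantile stands in for Student's t (large n).
        p = 1 - alpha / (2 * (n - i + 1))
        t = NormalDist().inv_cdf(p)
        lam = (n - i) * t / math.sqrt((n - i - 1 + t * t) * (n - i + 1))
        if r_i > lam:
            n_outliers = i  # the largest i with r_i > lam gives the outlier count
    return candidates[:n_outliers]
```

Because candidates are removed in order of decreasing deviation, the returned list is already sorted from the most to the least extreme residual; indices that fall inside collective-anomaly intervals can then be filtered out before selecting the point anomalies to inspect.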

4. Results

This section presents the key findings derived from the data analysis and modeling techniques applied in this study. Section 4.1 presents the results of the decomposition for each time series into their seasonal, trend, and residual components. Section 4.2 identifies periods of abnormal behavior in the trend component, highlighting collective anomalies. Lastly, Section 4.3 focuses on detecting point anomalies using the Seasonal ESD method.

4.1. STL Decomposition

4.1.1. Detection Series

Regarding the detection series, in Figure 10, we can observe its STL decomposition. The seasonal component shows no variation over time, and the trend is flexible, with a noticeable change in trend present in the last week.

4.1.2. Heatmap Saturation Percentage Series

Regarding the saturation percentage series of the heatmap, in Figure 11, we can observe its STL decomposition. Similarly to the previous series, the seasonal component also shows no variation over time, and the trend is flexible, with the change in trend in the last week being more evident in this case.

4.2. Collective Anomalies—Trend Threshold

4.2.1. Detection Series

In Figure 12, two sequences of anomalous points have been detected in the series corresponding to the number of detections. These sequences are between October 7th and 9th and October 12th and 15th, representing festive periods in the Comunitat Valenciana and nationally, respectively. Due to the tourist interest in the city of Morella combined with the holiday period, we can assume a higher influx of people, as we can observe in this figure.

4.2.2. Heatmap Saturation Percentage Series

In Figure 13, corresponding to the saturation percentage series of the heatmap, the anomalous sequences detected in the detection series are found again, which coincide with the mentioned festive periods. However, in this series, another sequence corresponding to October 4th appears. This sequence can be considered anomalous due to the detection of false positives in the early hours of this day.

4.2.3. Comparison with PySAD

To facilitate comparison of results, the PySAD framework is employed. PySAD [29] (https://github.com/selimfirat/pysad (accessed on 24 November 2024)) is an open source Python framework designed for anomaly detection in univariate and multivariate streaming data, offering tools for experiment design and state-of-the-art models for supervised, semi-supervised, and unsupervised learning. Built on well-established frameworks like PyOD [30] and scikit-learn [31], PySAD enables advanced anomaly detection capabilities.

For collective anomaly detection, the experimental setup specifies a data window size and sliding window size of 672 intervals (1 week) and an averaging window of 96 intervals (1 day). The models utilized include MCD [32] for the Detection Series and OCSVM [33] for the Heatmap Series.

Table 1 presents a comparison of the PySAD framework and the proposed method in detecting collective anomalies across the Detection Series and the Heatmap Series. It displays the number of anomalies detected by each method for each series, along with a Cohen’s Kappa Score as a metric for evaluating the level of agreement between the two methods for each series. The results indicate that both methods achieve moderate to substantial agreement in anomaly detection; however, the degree of alignment varies between the series, with higher agreement observed in the Detection Series compared to the Heatmap Series.
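For reference, Cohen's Kappa between two detectors can be computed from their per-interval labels as κ = (p_o − p_e) / (1 − p_e), where p_o is the observed agreement and p_e the agreement expected by chance. The sketch below assumes binary labels (1 for anomalous, 0 for normal); the function name and the convention of returning 1.0 when both detectors assign the same single label to every interval are assumptions.

```python
def cohen_kappa(a, b):
    """Cohen's kappa for two equally long sequences of binary labels (0/1)."""
    if len(a) != len(b) or not a:
        raise ValueError("label sequences must be non-empty and equally long")
    n = len(a)
    p_o = sum(x == y for x, y in zip(a, b)) / n  # observed agreement
    pa, pb = sum(a) / n, sum(b) / n              # share of intervals flagged by each
    p_e = pa * pb + (1 - pa) * (1 - pb)          # agreement expected by chance
    if p_e == 1.0:
        return 1.0  # convention: both detectors give every interval the same label
    return (p_o - p_e) / (1 - p_e)
```

Because anomalies are rare, raw agreement between two detectors is high almost by construction; Kappa corrects for this by discounting the agreement that would arise by chance.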

4.3. Point Anomalies—SESD (Seasonal ESD)

4.3.1. Detection Series

In Figure 14, we can observe the point anomalies detected in the detection series after applying the ESD algorithm to the STL decomposition of the series. Although the intervals identified as collective anomalies are not considered when selecting point anomalies, the algorithm also detects anomalies within these intervals, reinforcing the validity of the methodology used.

To validate the proposed point anomalies, the anomaly with the highest residual is checked against the available previous or subsequent instances. In Figure 15, we can observe the capture corresponding to the anomalous value of ‘11 October 2023 10:15:00’ compared to captures from 1 and 3 weeks earlier. In these, a clear change in the number of people present in the images can be observed. Checks have been carried out at more time instances and with different anomalies, but for the sake of article length, they are omitted here.

4.3.2. Heatmap Saturation Percentage Series

In Figure 16, we can observe the point anomalies detected in the heatmap saturation percentage series after applying the ESD algorithm to the STL decomposition of the series. Although the intervals identified as collective anomalies are not considered when selecting point anomalies, the algorithm also detects anomalies within these intervals, reinforcing the validity of the methodology used.

To validate the proposed point anomalies, the anomaly with the highest residual is checked against the available previous or subsequent instances. In Figure 17, we can observe the capture corresponding to the anomalous value of ‘2 October 2023 10:45:00’ compared to captures from the previous and subsequent day. The difference in saturation is evident, obtaining a higher saturation value (0.016843) in the one considered anomalous. Checks have been carried out at more time instances and with different anomalies, but for the sake of article length, they are omitted here.

4.3.3. Comparison with PySAD

To facilitate result comparison, the PySAD framework is employed.

For point anomaly detection, the experimental setup specifies a data window size and sliding window size of 672 intervals (1 week) and an averaging window of 1 interval (15 min). The models utilized include MCD for the Detection Series and OCSVM for the Heatmap Series.

Table 2 presents a comparison of the number of point anomalies detected in the Detection and Heatmap Series using two methods: PySAD and the proposed approach. It also provides a Cohen’s Kappa Score for each series. Low Cohen’s Kappa Scores indicate a notable discrepancy between PySAD and the proposed method in identifying point anomalies, with the proposed approach detecting a significantly higher number of anomalies in both data series. This discrepancy suggests that the two methods may be capturing different types or intensities of anomalous patterns within the data.

5. Discussion

The results obtained reveal that the utilization of time series decomposition through STL has been effective in identifying seasonal patterns and flexible trends. Specifically, a notable shift in trend is observed around 9 October and 12 October, coinciding with regional and national festivities, respectively. This observation reinforces the usefulness of decomposition in capturing seasonal events and abrupt changes in the data. Additionally, by analyzing the residual of the decomposition in Figure 10 and Figure 11, points that are distant from the rest are identified, indicative of the presence of anomalous values. The detection of collective anomalies through the application of a threshold to the trend, as shown in Figure 12 and Figure 13, confirms the presence of anomalous periods in both series, these being the festive periods mentioned previously. On the other hand, when using the SESD algorithm to detect point anomalies (Figure 14 and Figure 16), the validity of the collective anomaly detection method is confirmed, as the intervals previously identified as anomalous also stand out. This analysis further suggests that other points detected as anomalies indicate significant changes across different time intervals, as shown in Figure 15 and Figure 17. These findings reinforce the effectiveness of the approach used for anomaly detection in the studied time series. These results enable the detection of anomalies in images, such as identifying an unexpected increase in the presence of individuals during periods that would typically be expected to remain nearly empty.

6. Conclusions

This study investigates the use of anomaly detection techniques on open video data from the Comunitat Valenciana. Despite limitations in data availability due to external factors, two time series were created and augmented using probability distributions. Collective anomalies were identified by applying statistical thresholds to the trend of each series, while point anomalies were detected using the SESD algorithm. These results were validated by comparing the most prominent anomalies across different time intervals. The findings show that anomaly detection techniques can effectively identify unusual patterns in temporal data, highlighting their value as tools for analyzing open video data in complex environments. The methodology prioritizes individual privacy by implementing measures that ensure anonymity, avoiding both trajectory tracking and person identification. Future work envisions an edge-computing approach in which privacy is further reinforced by limiting video access solely to the camera itself, so that researchers handle only processed data rather than raw footage.

Author Contributions

Methodology, A.J.S.-L., J.V.-F. and A.D.-R.; Investigation, A.D.-R. and F.M.; Data curation, A.D.-R. and A.J.S.-L.; Writing—original draft, A.D.-R. and R.M.-B.; Writing—review and editing, F.M. and J.V.-F.; Supervision, F.M., R.M.-B., A.J.S.-L. and J.V.-F.; Project administration, J.V.-F. and A.J.S.-L. All authors have read and agreed to the published version of the manuscript.

Funding

Grant PID2021-127946OB-I00 funded by MCIN/AEI/10.13039/501100011033 and by “ERDF A way of making Europe”.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data presented in this study are available on request from the corresponding author due to privacy restrictions.

Conflicts of Interest

The authors declare no conflict of interest.

Figure 1. Original and background frames shown before and after applying CLAHE processing and grayscale transformation.

Figure 2. SSIM comparison results. (a) Detections provided by YOLO. (b) Cutouts from the background frame corresponding to examples of a false positive, a false negative, and a true positive, respectively. (c) Cutouts from the original frame corresponding to examples of a false positive, a false negative, and a true positive, respectively.

Figure 3. Weekly distribution of median detection series. X-axis in intervals of 15 min.

Figure 4. Weekly distribution of IQR detection series. X-axis in intervals of 15 min.

Figure 5. Augmented detection series.

Figure 6. Weekly distribution of the median of the heatmap series. X-axis in intervals of 15 min.

Figure 7. Weekly distribution of the standard deviation of the heatmap series. X-axis in intervals of 15 min.

Figure 8. Augmented heatmap saturation percentage series.

Figure 9. Diagram of the employed methodology.

Figure 10. STL decomposition of the detection series.

Figure 11. STL decomposition of the heatmap saturation percentage series.

Figure 12. Trend threshold in detection series.

Figure 13. Trend threshold in heatmap saturation percentage series.

Figure 14. Plot of detection residual with point anomalies.

Figure 15. Justification of anomalies in detection series. (a) 11 October 2023 10:15:00 (Anomaly). (b) 4 October 2023 10:15:00 (Previous week). (c) 20 September 2023 10:15:00 (3 weeks earlier).

Figure 16. Plot of heatmap saturation percentage residual with point anomalies.

Figure 17. Justification of anomalies in heatmap series. (a) 1 October 2023 10:45:00 (Previous day) [0.001601]. (b) 2 October 2023 10:45:00 (Anomaly) [0.012045]. (c) 3 October 2023 10:45:00 (Next day) [0.008394].

Table 1. Number of collective anomalies detected in both data series by PySAD and the proposed approach, along with a comparison based on Cohen’s Kappa Score.

	PySAD	Trend Threshold + SESD (Ours)	Cohen’s Kappa Score
Detection Series	617	545	0.7445
Heatmap Series	964	582	0.4726

Table 2. Number of punctual anomalies detected in both data series by PySAD and the proposed approach, along with a comparison based on Cohen’s Kappa Score.

	PySAD	Trend Threshold + SESD (Ours)	Cohen’s Kappa Score
Detection Series	14	545	0.0287
Heatmap Series	19	582	0.0289

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
