Abstract
In this paper, a subsequence time-series clustering algorithm is proposed to identify strongly coupled aftershock sequences and Poissonian background activity in earthquake catalogs of active regions. The proposed method considers the inter-event time statistics between successive pairs of events to characterize the nature of temporal sequences, while simultaneously observing their relevance to earthquake epicenters and magnitude information. The approach categorizes a long earthquake time series into finite meaningful temporal sequences and then applies the clustering mechanism to the selected sequences. It is built on two phases: (1) a Gaussian kernel-based density estimation for finding the optimal subsequences of a given earthquake time series, and (2) an inter-event time (\(\varDelta t\)) and distance-based observation of each subsequence to check for the presence of highly correlated aftershock sequences (hot-spots) in it. The existence of aftershocks is determined based on the coefficient of variation (COV). A sliding temporal window on \(\varDelta t\), together with the earthquake magnitude M, is applied to the selected subsequences to filter out time-correlated events and obtain meaningful, time-stationary Poissonian subsequences. The proposed approach is applied to the regional Sumatra-Andaman (2000–2021) and worldwide ISC-GEM (2000–2016) earthquake catalogs. Simulation results indicate that the meaningful subsequences (background events) can be modeled by a homogeneous Poisson process, after achieving a linear cumulative rate and a time-independent \(\lambda\) in the exponential distribution of \(\varDelta t\). The relations \(COV_{a}(T)>COV_{o}(T)> (COV_{b}(T)\approx 1)\) and \(COV_{a}(d)>COV_{o}(d)>COV_{b}(d)\) are achieved for both studied catalogs. Comparative analysis justifies the competitive performance of the proposed approach against state-of-the-art approaches and recently introduced methods.
1 Introduction
A time series \(X=\{x_{t}|t=1,2,\dots n \}\) is a chronologically ordered sequence of values recorded over time. In most real-world applications, it is necessary to store data over long time intervals in the form of a time series. Its analysis helps to extract meaningful statistical information, find the underlying causes of trends, and identify hidden temporal patterns. Picoli et al. [20] reported a classification method for monitoring agricultural land in Brazil from 2001 to 2016 using Moderate Resolution Imaging Spectroradiometer (MODIS) time-series data. Dad et al. [5] analyzed the climate variability and trends of change in precipitation and temperature on monthly, seasonal, and annual scales for the Kashmir Himalaya between 1980 and 2017. Qi et al. [21] analyzed the daily counts of COVID-19 cases in 30 Chinese provinces, reported negative associations of temperature and humidity with COVID-19, and suggested that countries and regions with low temperature and humidity should pay more attention. Bakker and Schaars [2] introduced a time-series model for solving groundwater flow problems rather than using regular groundwater models; the model works with time series of heads measured in an observation well and helps to answer many groundwater queries. Recently, Khan et al. [13, 14] proposed two novel deep-learning-based hybrid architectures, 'AB-Net' and 'CL-Net', to forecast renewable energy generation, and batteries' state of health and power consumption, respectively. The first uses an auto-encoder and bidirectional long short-term memory (BiLSTM) to intelligently match power generation with consumption for efficient energy management, whereas the second relies on convolutional long short-term memory (ConvLSTM) and long short-term memory (LSTM) to prevent power shortage and oversupply through precise power-consumption forecasts.
Similarly, time-series analysis is effectively applied in marketing [34], IoT [37], seismic signal processing [4], flood detection [3], and many other diversified applications [9, 29].
Nowadays, clustering in time-series analysis (known as time-series clustering, TSC) is an important tool when sequential data with millions of rows are difficult to analyze visually for unusual hidden trends [7, 11, 23]. Time-series analyses of earthquake data are widely applied to characterize the main features of regional seismicity and to provide useful insights into earthquake dynamics in terms of self-similarity, self-organization, patterns, finite scaling, and scale-free characteristics [12, 24]. Marsan et al. [16] analyzed earthquake time series to monitor changes in fault loading rates by comparing the data with an earthquake triggering model; they used inter-earthquake temporal statistics to estimate time-varying forcing rates, with recovery in terms of duration and intensity. Moustra et al. [18] developed an artificial neural network for earthquake prediction utilizing time-series magnitude data or seismic electric signals. Michas and Vallianatos [17] reported a stochastic model with memory effects to reproduce the temporal scaling characteristics of regional seismicity. Vogel et al. [33] analyzed earthquake time series with an information-theoretic approach to observe mutability effects in the time interval between consecutive quakes above a predetermined magnitude. Kundu et al. [15] reported a method for determining correlations in earthquake time series using complex network analysis, considering each seismic event as a node.
Due to the space-time clustering behavior of earthquakes in terms of foreshock-aftershock (AF) activities, it is necessary to know when (temporally) and where (spatially) these trends occur in a long earthquake time series. This problem is known as seismicity declustering, where the aim is to isolate independent earthquakes (mainshocks/backgrounds) from dependent earthquakes (foreshocks and aftershocks) in the overall earthquake catalog of a region. The early declustering method of [8] involved deterministic space and time windows for different magnitude cutoffs to identify clustered AFs and BGs from earthquake catalogs. Appropriate window sizes are very difficult to select; they vary from case to case and in most cases overestimate the aftershock population. Several alternative window sizes were reported in later studies (van Stiphout et al. [25]; Uhrhammer [28]). Events within these windows are considered clustered AFs and the remaining events are treated as BGs. Reasenberg [22] proposed a cluster-based approach that pairs earthquakes into clusters according to the extent of their spatial and temporal interaction zones; these zones are decided by the stress distribution near the mainshock for the spatial bound and by Omori's law for the temporal bound. Both methods were developed mostly for California and depend heavily on parameters that need to be optimized for better results. Later, some probabilistic declustering methods were proposed, but most of them are model-dependent (Epidemic-Type Aftershock Sequence (ETAS) model) with a wide range of assumptions and techniques [1, 35, 38, 39]. Recently, Zaliapin and Ben-Zion [36] reported a declustering mechanism based on nearest-neighbor proximity (a distance metric) that describes the link between event pairs in the space-time-magnitude domain for declustering different benchmark earthquake catalogs.
The determination of distance metrics in this method is time-consuming, especially when smaller-magnitude events are included, and the approach is highly dependent on the size of the earthquake catalog. Vijay and Nanda also reported several statistical and swarm-intelligence-based declustering models for spatio-temporal seismicity analysis of different seismically active regions [30,31,32].
Due to the limitations of the state-of-the-art methods and recent approaches, this research work proposes the "earthquake subsequence time series clustering (ES-TSC)" method for categorizing long earthquake time series to generate meaningful declustering results. The aim is to keep the earthquake declustering algorithm simple while improving its usability and applicability to various earthquake-prone regions, and to minimize user-dependent threshold or parameter tuning.
The main contributions of this work are the following:
-
The temporal density of the earthquake time series is estimated with a Gaussian kernel to obtain optimal sub-sequences, thus improving the quality of the clustering approach through the formation of smaller distance metrics.
-
The coefficient of variation (COV) is determined to check whether the clustering mechanism is necessary for each sub-sequence, which further reduces the computational time of the algorithm.
-
A sliding temporal window is used on the selected sub-sequences to filter the correlated aftershock events.
-
This proposed method is applied to Sumatra-Andaman (regional) and ISC-GEM (global) earthquake catalogs to obtain the aftershock and background seismicity in the region.
-
Background seismicity is found to be time-stationary with an exponential inter-event time distribution.
The rest of the paper is organized as follows: Sect. 2 describes the detailed procedure of the proposed earthquake sub-sequence time series clustering (ES-TSC). Section 3 presents concise information about the earthquake catalogs used in the paper. Results and discussion are carried out in Sect. 4. The comparative analysis is presented in Sect. 5. Section 6 highlights the significant aspects of the proposed work with future scope.
2 Earthquake sub-sequence time series clustering (ES-TSC)
The detailed step-wise procedure of the proposed ES-TSC is outlined below:
Step 1 An earthquake catalog \({\mathbf {E}}_{N \times D}\) contains primarily five potential feature vectors: occurrence time (DD:MM:YYYY-HH:MN:SS), location (longitude \(\theta\) and latitude \(\phi\)), magnitude, and depth (h) in km,
where \(D=5\) is the number of feature vectors (attributes), each of length N (the number of events). The catalog \({\mathbf {E}}_{N \times D}\) provides the earthquake time series \({\mathbf {T}}\) of length N, an ordered sequence of real-valued earthquake temporal data.
Then, an earthquake subsequence of length n belonging to the time series \({\mathbf {T}}\) is represented by
\[T=(t_{k},t_{k+1},\dots ,t_{k+n-1}),\qquad 1\le k\le N-n+1\]
A meaningful subsequence is an ordered sequence of earthquake events that omits some events without changing the order of the remaining events.
Step 2 A temporal density of the earthquake subsequence time series T of length n is estimated as
\[\rho _{n}(t)=\frac{1}{nh}\sum _{i=1}^{n}K\left( \frac{t-t_{i}}{h}\right)\]
where \(K(\cdot )\) is the kernel function, a smooth symmetric function, and \(h > 0\) is the smoothing bandwidth, which controls the amount of smoothing. The kernel function smooths each time event \(t_{i}\) into a small density bump and then sums all these small bumps to obtain the final temporal density estimate. In this paper, the Gaussian function is selected as the kernel:
\[K(u)=\frac{1}{\sqrt{2\pi }}\,e^{-u^{2}/2}\]
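As a concrete illustration of Step 2, the density estimate can be sketched in a few lines of NumPy. This is a minimal sketch, not the authors' implementation; the function name, grid, bandwidth, and synthetic event times are illustrative:

```python
import numpy as np

def gaussian_kde_1d(event_times, grid, h):
    """Temporal density estimate rho_n(t): each event time t_i contributes a
    Gaussian bump of bandwidth h; the bumps are summed and scaled by 1/(n*h)."""
    t = np.asarray(event_times, dtype=float)
    u = (grid[:, None] - t[None, :]) / h                 # (grid points, events)
    bumps = np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)   # Gaussian kernel K(u)
    return bumps.sum(axis=1) / (t.size * h)

# Synthetic example: two bursts of activity yield a bimodal density.
rng = np.random.default_rng(0)
times = np.concatenate([rng.normal(10.0, 1.0, 200), rng.normal(50.0, 1.0, 200)])
grid = np.linspace(0.0, 60.0, 601)
rho = gaussian_kde_1d(times, grid, h=1.0)
```

The bandwidth h controls the smoothing: too small a value yields spurious density peaks, while too large a value merges distinct bursts of activity into one mode.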
Step 3 The \(\rho _{n}(t)\) has multimodal characteristics due to the variation in the temporal density of events, which leads to identifying the values and time positions of the local maxima (peaks) and minima w.r.t. the change in density level, as follows:
where the vectors \(V_{min}=(v^{1}_{min},v^{2}_{min}, \dots v^{g+1}_{min})\) and \(P_{min}= (p^{1}_{min},p^{2}_{min}, \dots p^{g+1}_{min})\) represent the density values and time positions of the minima present after estimating the density (shown in Fig. 1 with small triangles). Similarly, \(V_{max}=(v^{1}_{max},v^{2}_{max}, \dots v^{g}_{max})\) and \(P_{max}=(p^{1}_{max},p^{2}_{max}, \dots p^{g}_{max})\) are the density values and time positions of the local maxima present in the estimated density (shown by red filled stars in Fig. 1).
Step 4 According to the time locations of the successive minima \(P_{min}\), shown in Fig. 1 by the small triangles, the earthquake time series T is divided into successive subsequences \(G_{j}\), where each \(j^{th}\) subsequence has length \(n_{j}\) and g represents the number of obtained subsequences (Fig. 1).
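Steps 3–4 amount to locating the interior local minima of the estimated density and cutting the event sequence at those time positions. A minimal sketch (the helper name and the toy density below are illustrative, not taken from the paper):

```python
import numpy as np

def split_at_minima(event_times, grid, density):
    """Cut the sorted event times into subsequences G_1..G_g at the time
    positions P_min of the interior local minima of the density estimate."""
    d = np.asarray(density, dtype=float)
    i = np.arange(1, d.size - 1)
    is_min = (d[i] < d[i - 1]) & (d[i] < d[i + 1])     # strict interior minima
    cut_times = grid[i[is_min]]                        # P_min: times of minima
    t = np.sort(np.asarray(event_times, dtype=float))
    labels = np.searchsorted(cut_times, t)             # subsequence index per event
    return [t[labels == j] for j in range(cut_times.size + 1)]

# Toy bimodal density with one interior minimum at t = 4.
grid = np.arange(9.0)
density = np.array([0.0, 1.0, 2.0, 1.0, 0.1, 1.0, 2.0, 1.0, 0.0])
subsequences = split_at_minima([1.0, 2.0, 3.0, 5.0, 6.0, 7.0], grid, density)
```

With g interior minima this produces g + 1 contiguous subsequences, matching the \(V_{min}/P_{min}\) bookkeeping of Step 3.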
Step 5 After obtaining each subsequence \(G_{j}\) of length \(n_{j}\) from the long time series \({\mathbf {T}}\), some events are removed from the selected \(j^{th}\) subsequence \(G_{j}\) to form meaningful subsequences according to the following criteria:
-
Criterion 1: A subsequence is considered meaningful if it follows a homogeneous Poisson process (HPP) in time. Such subsequences have a uniform (average) arrival rate with a Poisson distribution and are comprised of random regular (background, BG) earthquake events in a region.
-
Criterion 2: The subsequences may also contain correlated patterns in time (clustered foreshocks-aftershocks), generated by the occurrence of a relevant mainshock (high-magnitude event). The removal of these patterns is necessary to fulfill Criterion 1. These events belong to a non-homogeneous Poisson process, whose average arrival rate varies with time. They form the hot spots and are more hazardous compared to BGs.
Step 6 A parameter called the coefficient of variation (COV(T)) in the time domain is used to justify the meaningfulness of each subsequence \(G_{j}\) and to satisfy the criteria mentioned in Step 5. It is the standard deviation normalized by the mean of the inter-event times (distances) of the successive events (sequence) of a given time interval (coordinates). The inter-event time (\(\varDelta t\)) is defined as
\[\varDelta t_{k}=t_{k+1}-t_{k},\qquad k=1,2,\dots ,n_{j}-1\]
then COV(T) is calculated as follows:
\[COV(T)=\frac{\sqrt{E(\varDelta t^{2})-E(\varDelta t)^{2}}}{E(\varDelta t)}\]
where \(E(\cdot )\) represents the average (mean) of the given quantity. The concepts of inter-event time and distance are shown in Fig. 2.
The COV(T) discriminates important characteristics of the subsequence (in the time domain) in three different ways: (1) If the subsequence is periodic, then \(\varDelta t\) is constant and COV(T) should be zero. (2) If the sequence follows a Poisson distribution, then \(\varDelta t\) has exponential behavior and COV(T) should be around 1 (Criterion 1); such events are said to be random in time (uniform arrival rate). (3) If \(COV(T)> 1\), then the process follows a power-law distribution with non-homogeneous Poisson characteristics (Criterion 2).
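The three COV(T) regimes can be checked numerically. In this sketch (synthetic event times, not catalog data), a strictly periodic sequence gives COV ≈ 0 and a homogeneous Poisson process gives COV ≈ 1:

```python
import numpy as np

def cov_t(event_times):
    """COV(T): standard deviation of inter-event times normalized by their mean.
    ~0 for periodic, ~1 for Poissonian, >1 for temporally clustered sequences."""
    dt = np.diff(np.sort(np.asarray(event_times, dtype=float)))
    return dt.std() / dt.mean()

rng = np.random.default_rng(1)
periodic = np.arange(0.0, 100.0, 1.0)              # constant dt
poisson = np.cumsum(rng.exponential(1.0, 5000))    # HPP arrival times
```

A clustered sequence, e.g. a burst of very short \(\varDelta t\) embedded in a regular background, pushes COV(T) above 1.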
Here, COV(T) is determined for each subsequence to find the temporal characteristics of \(G_{j}\) as mentioned above. The subsequences selected for the next phase are those that contain time-correlated events, as indicated by COV(T). If a subsequence has COV(T) nearly equal to 1, it is not selected for subsequence clustering in phase II, because it contains no time-correlated events. Otherwise, the time-correlated events in the subsequence are filtered out to obtain a meaningful subsequence (uniform arrival rate, independent random sequence). In the next phase, the objective is to identify and remove events from the selected subsequence so that \(COV(T) \approx 1\) (Criterion 1 for meaningful subsequences). The filtered-out sequences are strongly correlated in time and follow Criterion 2 with \(COV(T)>1\).
Step 7 For the next phase of subsequence clustering, the normalized inter-event times \(\varDelta t\) between successive events \(\in G_{j}\) and the magnitude information are used with a sliding-window approach. An overlapping window of ten events is chosen on \(\varDelta t\), and the magnitudes are examined for the presence of at least one high-intensity quake (\(\ge 7\)). If the normalized average \(\varDelta t\) is less than a pre-defined threshold (i.e., short \(\varDelta t\)), the events are removed from the selected subsequence; otherwise, they remain part of the subsequence. The step-by-step procedure is outlined in Algorithm 1.
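The window logic of Step 7 can be sketched as follows. The ten-event overlapping window, the normalized-\(\varDelta t\) threshold, and the M ≥ 7 mainshock check come from the text; the function name and the demo threshold value are assumptions, and the full procedure is given in Algorithm 1:

```python
import numpy as np

def filter_aftershocks(times, mags, win=10, dt_thresh=0.1, m_main=7.0):
    """Slide an overlapping window of `win` inter-event times along the
    subsequence; when the window's mean normalized dt is short AND the window
    contains an event with M >= m_main, flag those events as correlated AFs."""
    t = np.asarray(times, dtype=float)
    m = np.asarray(mags, dtype=float)
    dt_norm = np.diff(t) / np.diff(t).max()       # normalized inter-event times
    keep = np.ones(t.size, dtype=bool)
    for k in range(dt_norm.size - win + 1):       # overlapping (stride-1) windows
        if dt_norm[k:k + win].mean() < dt_thresh and m[k:k + win + 1].max() >= m_main:
            keep[k + 1:k + win + 1] = False       # remove the tightly spaced events
    return t[keep], m[keep]

# Demo: regular background every 10 days, plus a burst of 10 events
# spaced 0.1 days apart after a M8 mainshock at t = 50.
bg = np.arange(0.0, 101.0, 10.0)
afs = 50.0 + 0.1 * np.arange(1, 11)
t_all = np.sort(np.concatenate([bg, afs]))
m_all = np.where(t_all == 50.0, 8.0, 5.0)
t_bg, m_bg = filter_aftershocks(t_all, m_all)
```

In this sketch the mainshock itself is kept with the background, while the burst of short-\(\varDelta t\) events that follows it is removed.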
3 Earthquake catalog used in the simulation analysis
The proposed method is employed on the regional earthquake data of Sumatra-Andaman downloaded from the Northern California Earthquake Data Center [19] and on the ISC-GEM global instrumental catalog (version 7.0) obtained from the International Seismological Centre [10]. The parameters and their ranges are highlighted in Table 1. For Sumatra-Andaman, the magnitude of completeness \(M_c\) is taken as 4.5, determined by fitting the magnitude data to the Gutenberg-Richter relation. The details of both catalogs are as follows:
1.
Sumatra-Andaman The Sumatra-Andaman region is highly susceptible to earthquakes followed by tsunamis, especially from the year 2002 onwards, which have caused enormous loss of life, infrastructure damage, and long-lasting wounds to civilization. The region drew worldwide public attention when one of the most powerful and destructive earthquakes, the 2004 Boxing Day earthquake in the Indian Ocean, occurred with a magnitude of around 9.3. This event, termed the Sumatra-Andaman earthquake, was the second largest ever recorded on a seismograph. It ruptured the greatest fault length of any recorded earthquake and triggered a series of tsunamis (up to 30 m high along the northern coast of Sumatra), killing up to 280,000 people in 14 countries and even displacing the earth's north pole by 2.5 cm. Most of the events in this catalog are of shallow depth. The distribution of earthquake epicenters, along with the events' depths (indicated by different colors), is shown in Fig. 3a. The region is highly vulnerable to seismic activity, with a large number of high-intensity quakes, as shown in Fig. 3c; these trigger many events that produce a hike in the seismic rate at that time (see the red line in Fig. 3c).
2.
Global ISC-GEM instrumental catalog This catalog was publicly released by the International Seismological Centre on 9 April 2020 [10]. The catalog has been refined and rebuilt by adding new earthquakes and improving the locations and/or magnitudes from previous work [6, 26, 27]. It provides detailed information on earthquakes that occurred worldwide in the period 1904–2016, with a total of 39,400 earthquake events, all of magnitude greater than 4.9 Mw. In this study, earthquakes that occurred in the period 2000–2016 are considered; their epicenter distribution is shown in Fig. 3b, with depth variation indicated by different colors. A slight change in seismicity rate at the time of large earthquakes is evident in Fig. 3d.
4 Results and analysis
The obtained results, their analysis, and their significance are presented in the following sub-sections:
4.1 Density estimation and sub-sequence formation
A temporal density is estimated from the time-series earthquake data of duration T according to the method described in Sect. 2 (as shown in Fig. 4) for both catalogs. From this temporal density estimate, information about the local minima \(P_{min}\) and maxima \(P_{max}\) is extracted, as shown by black circles and red squares in Fig. 4a and b for the Sumatra-Andaman and ISC-GEM worldwide catalogs respectively. The highest density peak, due to the Sumatra-Andaman earthquake and tsunami (Dec 2004 in the Indian Ocean, magnitude around 9.3), is observed in Fig. 4a.
According to the time positions of \(P_{min}\) obtained from Fig. 4a and b, each time series is divided into eight subsequences, shown in different colors. Each subsequence \(G_{j}\), where \(j=1\dots 8\), has at least one high-magnitude event, as observed in Fig. 4c and d. Tables 2 and 3 highlight the properties of each sub-sequence for the Sumatra-Andaman and ISC-GEM earthquake data respectively. Both tables indicate the risk factor associated with each subsequence through its duration, number of events, \(COV_{o}(T)\) value, and count of events with magnitude \(\ge 7\). \(COV_{o}(T)\) quantifies the presence of time-correlated events in each subsequence. The \(COV_{o}(T)\) value and the temporal density peak of each subsequence show a strong linear correlation, as evident from the tables and figures. The largest \(COV_{o}(T)\) occurs in \(G_{3}\) (which also has the highest density peak), indicating the presence of a large number of correlated events triggered by the mainshock that occurred in the Indian Ocean in 2004. Similarly, subsequence \(G_{6}\) in Table 3 has the highest \(COV_{o}(T)\), with a high density peak among all sub-sequences in Fig. 4b.
4.2 IET (\(\varDelta t\)) and IED (\(\varDelta d\)) based observation
The inter-event times \(\varDelta t\) and inter-event distances \(\varDelta d\) between two successive earthquakes are the simplest way to find the time and space correlation with little computation. The inter-event distance \(\varDelta d\) (in km) is determined from the great-circle (haversine) distance between successive epicenters:
\[\varDelta d_{k}=2R_{E}\arcsin \sqrt{\sin ^{2}\left( \frac{\phi _{k+1}-\phi _{k}}{2}\right) +\cos \phi _{k}\,\cos \phi _{k+1}\,\sin ^{2}\left( \frac{\theta _{k+1}-\theta _{k}}{2}\right) }\]
The coordinates, longitude (\(\theta\)) and latitude (\(\phi\)), are taken in radians, and \(R_{E} = 6371\) km is the approximate radius of the earth. This definition of \(\varDelta d\) (in km) is based on epicenters and assumes a spherical surface. Both \(\varDelta t\) and \(\varDelta d\) avoid the frequent need for large distance matrices of size \(N \times N\), where N is the number of events (large in most cases), for similarity measures. The \(\varDelta t\) and \(\varDelta d~\epsilon ~G_{j}\) for both catalogs are shown in Fig. 5 (Sumatra-Andaman) and Fig. 6 (ISC-GEM). From the figures, it is observed that some events \(\in G_{j}\) occurred within a very short interval of time (small \(\varDelta t\) in days), as indicated by the red arrows. It is also evident that the same events have short \(\varDelta d\), as seen by comparing the two sub-figures in Fig. 5a–h. This observation reveals the consecutive triggering of events around the occurrence time of a large event, and hence the generation of primary and secondary AFs on the same fault and a hike in the seismicity rate. Figures 5 and 6 also reveal that small and/or large dips occur at the same \(k^{th}\) position in \(\varDelta t\) and \(\varDelta d\) for almost all subsequences (see the red arrow positions in the figures). This resembles the space-time correlation of the events called AFs, triggered by a mainshock; for example, \(G_{3}\) in Fig. 5c has very small \(\varDelta t\) (and \(\varDelta d\)) over a long sample duration due to the occurrence of the Indian Ocean tsunami in 2004. This strong space-time correlation, observed through \(\varDelta t\) and \(\varDelta d\) for both catalogs in Figs. 5 and 6, is filtered out to obtain the meaningful uncorrelated subsequences in the next phase.
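The spherical-surface epicentral distance described above can be computed with the haversine formula. A minimal NumPy sketch (the function name is illustrative):

```python
import numpy as np

def inter_event_distance_km(lon_deg, lat_deg, R_E=6371.0):
    """Great-circle distance (km) between successive epicenters, assuming a
    spherical Earth of radius R_E; longitudes/latitudes are given in degrees."""
    th = np.radians(np.asarray(lon_deg, dtype=float))   # longitude theta
    ph = np.radians(np.asarray(lat_deg, dtype=float))   # latitude  phi
    dth, dph = np.diff(th), np.diff(ph)
    a = np.sin(dph / 2.0)**2 + np.cos(ph[:-1]) * np.cos(ph[1:]) * np.sin(dth / 2.0)**2
    return 2.0 * R_E * np.arcsin(np.sqrt(a))

# One degree of longitude along the equator is roughly 111.2 km.
d = inter_event_distance_km([95.0, 96.0], [0.0, 0.0])
```

Working on successive pairs keeps the cost linear in the number of events, in contrast to building a full \(N \times N\) distance matrix.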
It is evident from the IET-IED plots and the corresponding COV(T) of each subsequence that they contain correlated primary and secondary AFs (short \(\varDelta t\) and \(\varDelta d\) with high COV values) along with independent random BGs. So, in the second phase, a meaningful subsequence \(G_j\) is obtained by filtering those correlated primary and secondary AFs from the subsequence according to the procedure presented in Algorithm 1.
4.3 Results after applying the sliding window on \(\varDelta t\) with mainshock magnitude
A finite-length sliding window is used to obtain the meaningful sub-sequence by filtering the correlated events. If the events in a window have short IETs (and IEDs) and the interval contains at least one large event (based on magnitude), those events are filtered out of the given subsequence; otherwise, the temporal window slides further over the remaining part. This procedure yields meaningful subsequences comprised of regularly occurring random BG events, with the time-correlated events filtered out. The filtered events are considered primary and secondary AFs according to their characteristics in the space-time-magnitude domain.
The obtained event summaries after applying the proposed method are presented in Tables 4 and 5 for the two catalogs respectively. In both tables, the results of all eight subsequences are reported in terms of the filtered AF population, the meaningful subsequences comprised of BG events, \(COV_{a}(T)\) for AFs, \(COV_{b}(T)\) for BGs, and the number of AF clusters. In Table 4, the AF population filtered from a given sub-sequence indicates the associated risk in that zone and time duration. The order of risk (hazard) from the AF % in Table 4 is \(G_{3}>G_{1}>G_{4}>G_{6}>G_{5}>G_{7}>G_{2}>G_{8}\). Similarly, in Table 5, the AF % and risk order is \(G_{6}>G_{7}>G_{4}>G_{5}>G_{8}>G_{1}>G_{3}>G_{2}\).
Furthermore, the results obtained from the second phase are presented and analyzed in detail through the following plots and statistics:
4.3.1 \(\lambda\) plot
Strain accumulation and the consecutive triggering activity of events are studied by analyzing the change in seismicity rate. The long-term temporal relevance of events with magnitude is interpreted from this seismic rate analysis. Seismicity rate variation is illustrated by counting the number of events arriving in a predefined time interval, as shown by the \(\lambda\) plots in Figs. 7 and 8 for Sumatra-Andaman and ISC-GEM respectively.
The \(\lambda\) plots for the original sub-sequences \(\in G_{j}\) (in black), the filtered AFs (in blue), and the BGs (in red) after both phases of ES-TSC are shown in Figs. 7 and 8 for the Sumatra-Andaman and ISC-GEM subsequences respectively. From Figs. 7 and 8, it is evident that the \(\lambda\) rate of BGs is almost constant w.r.t. time; there is no significant change in the arrival rate of BGs (see the red line in each subplot). On the other hand, when a mainshock (large event) occurs, a significant change in the seismic rate is observed as a sudden hike (vertical jump) due to the post-release of seismic energy in the form of AFs (see the black peaks in Figs. 7 and 8), which then decays with time. Those AFs are filtered out of each sub-sequence and their \(\lambda\) rate is shown by the blue line. The time-varying \(\lambda\) rate of the filtered AFs follows a similar pattern to the original sub-sequence in both figures (black and blue lines).
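A \(\lambda\) plot is simply the event count per fixed-length time window. A sketch of how such a rate curve can be computed (the bin width and names are illustrative):

```python
import numpy as np

def seismicity_rate(event_times, bin_width=30.0):
    """Lambda plot: events arriving per `bin_width` units of time,
    returned as (window start times, rate in events per unit time)."""
    t = np.asarray(event_times, dtype=float)
    edges = np.arange(t.min(), t.max() + bin_width, bin_width)
    counts, _ = np.histogram(t, bins=edges)
    return edges[:-1], counts / bin_width

# A stationary sequence (one event per day) gives a flat rate of 1.0.
starts, rate = seismicity_rate(np.arange(0.0, 300.0))
```

A flat curve corresponds to the stationary BG behavior (red lines in Figs. 7 and 8), while a sudden jump followed by decay corresponds to an AF sequence.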
4.3.2 Probability density function of \(\varDelta t\)
The statistical analysis of inter-event times between earthquakes plays an important role in the modeling of seismicity and in seismic hazard assessment. In particular, the inter-event time (\(\varDelta t\)) distribution of earthquakes distinguishes clustered AFs from non-clustered BGs around the occurrence of the mainshock. A sequence of independent, non-clustered, randomly occurring earthquake events (BGs) is generally described by a Poisson process with constant intensity \(\lambda\) (the expected number of events per unit time). This is a homogeneous Poisson process (HPP) with probability \(P(n,\lambda , t)\) of having n events in the time interval [0, t], given by
\[P(n,\lambda ,t)=\frac{(\lambda t)^{n}}{n!}\,e^{-\lambda t}\]
with the corresponding probability density function (PDF) of \(\varDelta t\) (IET) between successive pairs of events:
\[f(\varDelta t)=\lambda \,e^{-\lambda \varDelta t}\]
Apart from negligible deviations, all histograms show the exponential distribution of \(\varDelta t\) that follows the definition of the HPP, as shown in Figs. 9 and 10 for both catalogs. In other words, successively occurring BGs are not causally related to each other and have a time-independent mean arrival rate \(\lambda\) (uniform in nature), whereas mainshock-triggered earthquakes follow a non-homogeneous Poisson process in time and are attributed to clustering in the time domain. The histogram of \(\varDelta t\) for each meaningful sub-sequence is fitted exponentially, as shown in red.
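Criterion 1 can be verified numerically: for exponentially distributed \(\varDelta t\) the maximum-likelihood estimate of \(\lambda\) is the reciprocal of the mean IET, and the empirical density should match \(f(\varDelta t)=\lambda e^{-\lambda \varDelta t}\). A synthetic check (simulated IETs, not catalog data; bin choices are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
dt = rng.exponential(scale=2.0, size=20000)   # synthetic HPP inter-event times

lam = 1.0 / dt.mean()                         # MLE of the rate lambda
# Compare the empirical density of dt with the model f(dt) = lam * exp(-lam*dt).
hist, edges = np.histogram(dt, bins=50, range=(0.0, 10.0), density=True)
centers = 0.5 * (edges[:-1] + edges[1:])
model = lam * np.exp(-lam * centers)
```

For a clustered subsequence, the empirical histogram would instead show an excess of very short \(\varDelta t\) over the exponential model, which is the signature filtered out by the second phase.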
The complete behavior of \(\varDelta t\) for the overall events, the total obtained AFs, and the BGs is shown in terms of the probability density function of \(\varDelta t\) in Figs. 11 and 12 on a logarithmic scale. It is observed that the overall events' characteristics in the time domain are bimodal (see Figs. 11a and 12a), which is evidence of two components of seismicity. After applying the proposed ES-TSC method, these two components are separated into AFs with short \(\varDelta t\) and BGs with a homogeneous Poisson distribution (constant \(\lambda\) rate).
4.3.3 Coefficient of variation (COV(T), COV(d))
In Tables 2, 3, 4 and 5, the \(\varDelta t\)- and \(\varDelta d\)-based COV is determined to observe the characteristics of events in the spatio-temporal domain for the overall, meaningful (BG), and filtered (correlated-AF) sub-sequences. A comparative analysis is carried out on the obtained values (highlighted in Tables 2, 3, 4, 5) for the Sumatra-Andaman and ISC-GEM catalogs separately. For both catalogs, it is observed that \(COV_b(T)\) (time domain) and \(COV_{b}(d)\) (spatial domain) of each meaningful subsequence are smaller than \(COV_{o}(T)\) and \(COV_{o}(d)\) of the original subsequences. The \(COV_b(T)\) values are close to 1, which justifies the HPP of Criterion 1, whereas for the remaining (removed) subsequences \(COV_a(T)\) and \(COV_a(d)\) are slightly higher than \(COV_o\) (see Tables 2, 3, 4, 5). The relations \(COV_{a}(T)> COV_{o}(T)>COV_{b}(T)\) and \(COV_{a}(d)> COV_{o}(d)>COV_{b}(d)\) reveal that the obtained patterns follow the characteristics of BGs (homogeneous Poisson process: independent, random events) and AFs (clustered patterns in the spatio-temporal domain).
4.3.4 Space-time plot and hot-spot identification
The space-time plot shows the distribution of events along one of the coordinates and time. Here, longitude vs. time is taken to observe the characteristics of both types of events. The longitude-versus-time plot of the AFs filtered from the original sub-sequences is shown in Fig. 13a and c for Sumatra-Andaman and ISC-GEM respectively. Both figures confirm that the filtered AFs are triggered nearby in space and time, i.e., in clustered form, and that their frequency of occurrence decays with time, whereas the BGs show a smooth, non-clustered character, as evident from Fig. 13b and d.
The density distribution of earthquake epicenters (longitude vs. latitude) of BGs and AFs for the Sumatra-Andaman and ISC-GEM catalogs is shown in Figs. 14 and 15 respectively. From the figures, it is evident that the BGs are a smooth representation of seismicity, which helps to identify the fault-plane boundaries (see Figs. 14b and 15b), whereas the AFs are highly dense, hazardous, and clustered in the spatial domain, as observed in Figs. 14a and 15a. The distribution of AFs in the spatial domain reveals the hot-spots (shown with a color map in Figs. 14a and 15a for both catalogs), which are useful for short- and long-term hazard assessment.
5 Comparison of the proposed method with state-of-art approaches
The approach reported by Gardner and Knopoff [8], based on magnitude-dependent space-time windows, and Reasenberg's cluster-based algorithm [22] have been the most widely used paradigms of earthquake declustering for decades. The proposed approach is first compared with these two state-of-the-art methods, along with the alternate space-time window sizes suggested by Uhrhammer [28]. From Table 6, the Gardner-Knopoff declustering approach and Uhrhammer's window both overestimate the aftershock (AF) population, which leads to a low COV value for backgrounds (\(COV_b\)): a majority of background events are classified as aftershocks by both methods with default parameter values. The execution times of both methods are similar but greater than that of the proposed method (in minutes). In Reasenberg's declustering approach, most events are classified as backgrounds (BGs), which seems better compared to the G-K and Uhrhammer methods as judged by their respective COV values, but the optimum COV is still not obtained.
Here, the performance of the proposed method is also compared with the recently introduced algorithms of Vijay and Nanda [30], which are based on the K-means clustering approach. This method has a very short execution time for declustering, but the results are highly dependent on the initial selection of the cluster centroids. The BGs obtained by this method are fair, but they are not justified in terms of \(COV_b(T)\) (not close to 1). The results of the recently reported declustering algorithm by Zaliapin and Ben-Zion [36] are also summarized in Table 6, after setting the optimum parameter values. Its classification results are almost identical to those of the proposed method in terms of the numbers of AFs and BGs for both catalogs, but its execution time is the main concern: it is the highest among all the reported methods due to the determination of the 3D distance metric. Thus, the comparative analysis indicates that the proposed ES-TSC is a competitive approach for categorizing the events of an overall earthquake catalog into AFs and BGs.
6 Conclusion
This research work has presented an earthquake time-series analysis with declustering capability that discriminates the clustered AFs sequences from the regular background events in catalogs. The meaningful subsequences comprise random background events only and thus follow the characteristics of a homogeneous Poisson process in the time domain. The inter-event time statistics and COV(T), computed in a sliding temporal window, prove useful for pointing out when and where to filter the sequence. The filtered AFs population in each subsequence explicitly indicates the risk factor in terms of its spatio-temporal extent. Based on the graphical representations and statistical evidence, the proposed approach provides a competitive way to solve the seismicity declustering problem in less computation time. In future work, larger earthquake catalogs will be used for experimental analysis of the declustering outcomes. The performance of the proposed approach also indicates that subsequence time-series clustering remains valuable for time-series data mining, capable of discovering informative and salient subsequence features in diverse fields of research.
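As a rough illustration of the sliding-window idea summarized above (not the authors' exact filter), the following Python sketch scans COV(T) over windows of a fixed number of events; windows whose COV rises well above 1 localize candidate aftershock bursts inside an otherwise Poissonian sequence. The window length, step size, and synthetic catalog are hypothetical choices made for the example.

```python
import numpy as np

def sliding_cov(times, win=100, step=10):
    """COV of inter-event times in a sliding window of `win` events.
    Returns (window-center times, COV values); COV >> 1 flags a
    candidate aftershock burst to filter out of the subsequence."""
    t = np.sort(times)
    centers, covs = [], []
    for s in range(0, len(t) - win, step):
        dt = np.diff(t[s:s + win])
        centers.append(t[s + win // 2])
        covs.append(dt.std() / dt.mean())
    return np.array(centers), np.array(covs)

rng = np.random.default_rng(3)
bg = np.cumsum(rng.exponential(1.0, 2000))              # stationary background
burst = bg[1000] + rng.exponential(0.001, 80).cumsum()  # injected aftershock burst
mixed = np.sort(np.concatenate([bg, burst]))
```

On the pure background the window COVs hover around 1; on the mixed sequence the windows overlapping the burst spike well above it, giving the "when" of the filtering step, while the epicenter information supplies the "where".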
Data availability
The earthquake catalogs that support the findings of this research work are made available by the International Seismological Centre (2022), ISC-GEM Earthquake Catalogue, https://doi.org/10.31905/d808b825, and the Northern California Earthquake Data Center (NCEDC), https://doi.org/10.7932/NCEDC.
References
Aden-Antoniów F, Frank W, Seydoux L (2022) An adaptable random forest model for the declustering of earthquake catalogs. J Geophys Res Solid Earth 127(2):e2021JB023254
Bakker M, Schaars F (2019) Solving groundwater flow problems with time series analysis: you may not even need another model. Groundwater 57(6):826–833
Chitra P, Rajasekaran UM, et al (2022) Time-series analysis and flood prediction using a deep learning approach. In: 2022 international conference on wireless communications signal processing and networking (WiSPNET), IEEE, pp 139–142
Corsaro S, Angelis PLD, Fiore U, Marino Z, Perla F, Pietroluongo M (2021) Wavelets in multi-scale time series analysis: an application to seismic data. Dynamics of disasters. Springer, Berlin, pp 93–100
Dad JM, Muslim M, Rashid I, Reshi ZA (2021) Time series analysis of climate variability and trends in Kashmir Himalaya. Ecol Ind 126:107690
Di Giacomo D, Engdahl ER, Storchak DA (2018) The ISC-GEM earthquake catalogue (1904–2014): status after the extension project. Earth Syst Sci Data 10:1877–1899. https://doi.org/10.5194/essd-10-1877-2018
D’Urso P, De Giovanni L, Massari R, D’Ecclesia RL, Maharaj EA (2020) Cepstral-based clustering of financial time series. Expert Syst Appl, p 113705
Gardner J, Knopoff L (1974) Is the sequence of earthquakes in Southern California, with aftershocks removed, Poissonian? Bull Seismol Soc Am 64(5):1363–1367
Gupta A, Gupta HP, Biswas B, Dutta T (2020) Approaches and applications of early classification of time series: a review. IEEE Trans Artif Intell 1(1):47–61
ISC (2020) International Seismological Centre, ISC-GEM earthquake catalogue. https://doi.org/10.31905/d808b825
Javed A, Lee BS, Rizzo DM (2020) A benchmark study on time series clustering. Mach Learn Appl 1:100001
Kagan YY, Jackson DD (1991) Long-term earthquake clustering. Geophys J Int 104(1):117–133
Khan N, Haq IU, Ullah FUM, Khan SU, Lee MY (2021) CL-Net: ConvLSTM-based hybrid architecture for batteries’ state of health and power consumption forecasting. Mathematics 9(24):3326
Khan N, Ullah FUM, Haq IU, Khan SU, Lee MY, Baik SW (2021) AB-Net: a novel deep learning assisted framework for renewable energy generation forecasting. Mathematics 9(19):2456
Kundu S, Opris A, Yukutake Y, Hatano T (2020) Extracting correlations in earthquake time series using complex network analysis. arXiv preprint arXiv:2004.05415
Marsan D, Prono E, Helmstetter A (2013) Monitoring aseismic forcing in fault zones using earthquake time series. Bull Seismol Soc Am 103(1):169–179
Michas G, Vallianatos F (2018) Stochastic modeling of nonstationary earthquake time series with long-term clustering effects. Phys Rev E 98(4):042107
Moustra M, Avraamides M, Christodoulou C (2011) Artificial neural networks for earthquake prediction using time series magnitude data or seismic electric signals. Expert Syst Appl 38(12):15032–15039
NCEDC (2021) Northern California Earthquake Data Center. UC Berkeley Seismological Laboratory Dataset. https://doi.org/10.7932/NCEDC
Picoli MCA, Camara G, Sanches I, Simões R, Carvalho A, Maciel A, Coutinho A, Esquerdo J, Antunes J, Begotti RA et al (2018) Big earth observation time series analysis for monitoring Brazilian agriculture. ISPRS J Photogramm Remote Sens 145:328–339
Qi H, Xiao S, Shi R, Ward MP, Chen Y, Tu W, Su Q, Wang W, Wang X, Zhang Z (2020) COVID-19 transmission in mainland China is associated with temperature and humidity: a time-series analysis. Sci Total Environ 728:138778
Reasenberg P (1985) Second-order moment of central California seismicity, 1969–1982. J Geophys Res Solid Earth 90(B7):5479–5495
Ruiz L, Pegalajar M, Arcucci R, Molina-Solana M (2020) A time-series clustering methodology for knowledge extraction in energy consumption data. Expert Syst Appl 160:113731
Sarlis NV, Skordas ES, Varotsos PA (2018) Natural time analysis of seismic time series. Complexity of seismic time series. Elsevier, Amsterdam, pp 199–235
van Stiphout T, Zhuang J, Marsan D (2012) Seismicity declustering. Commun Online Res Stat Seism Anal 10:1–25
Storchak DA et al (2013) Public release of the ISC-GEM global instrumental earthquake catalogue (1900–2009). Seism Res Lett 84(5):810–815. https://doi.org/10.1785/0220130034
Storchak DA et al (2015) The ISC-GEM global instrumental earthquake catalogue (1900–2009): introduction. Phys Earth Planet Inter 239:48–63. https://doi.org/10.1016/j.pepi.2014.06.009
Uhrhammer R (1986) Characteristics of northern and central California seismicity. Earthq Notes 57(1):21
Ullah FUM, Khan N, Hussain T, Lee MY, Baik SW (2021) Diving deep into short-term electricity load forecasting: comparative analysis and a novel framework. Mathematics 9(6):611
Vijay RK, Nanda SJ (2017) Tetra-stage cluster identification model to analyse the seismic activities of Japan, Himalaya and Taiwan. IET Signal Proc 12(1):95–103
Vijay RK, Nanda SJ (2019) A quantum grey wolf optimizer based declustering model for analysis of earthquake catalogs in an ergodic framework. J Comput Sci 36:101019
Vijay RK, Nanda SJ (2019) Shared nearest neighborhood intensity based declustering model for analysis of spatio-temporal seismicity. IEEE J Select Topics Appl Earth Observ Remote Sens 12(5):1619–1627
Vogel E, Saravia G, Pastén D, Muñoz V (2017) Time-series analysis of earthquake sequences by means of information recognizer. Tectonophysics 712:723–728
Wang W, Yildirim G (2022) Applied time-series analysis in marketing. Handbook of Market Research. Springer, Berlin, pp 469–513
Wu Z (2010) A hidden Markov model for earthquake declustering. J Geophys Res Solid Earth 115(B3)
Zaliapin I, Ben-Zion Y (2020) Earthquake declustering using the nearest-neighbor approach in space-time-magnitude domain. J Geophys Res Solid Earth 125(4):e2018JB017120
Zhu B, Hou X, Liu S, Ma W, Dong M, Wen H, Wei Q, Du S, Zhang Y (2021) IoT equipment monitoring system based on C5.0 decision tree and time-series analysis. IEEE Access 10:36637–36648
Zhuang J, Ogata Y, Vere-Jones D (2002) Stochastic declustering of space-time earthquake occurrences. J Am Stat Assoc 97(458):369–380
Zhuang J, Chang CP, Ogata Y, Chen YI (2005) A study on the background and clustering seismicity in the Taiwan region by using point process models. J Geophys Res Solid Earth 110(B5)
Funding
There is no funding support for this research work.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Vijay, R.K., Nanda, S.J. Earthquake pattern analysis using subsequence time series clustering. Pattern Anal Applic 26, 19–37 (2023). https://doi.org/10.1007/s10044-022-01092-1