Abstract
The dependence structure of extreme events of multivariate nature plays a special role for risk management applications, in particular in hydrology (flood risk). In a high dimensional context (\(d>50\)), a natural first step is dimension reduction. Analyzing the tails of a dataset requires specific approaches: earlier works have proposed a definition of sparsity adapted for extremes, together with an algorithm detecting such a pattern under strong sparsity assumptions. Given a dataset that exhibits no clear sparsity pattern we propose a clustering algorithm allowing to group together the features that are ‘dependent at extreme level’, i.e.,that are likely to take extreme values simultaneously. To bypass the computational issues that arise when it comes to dealing with possibly \(O(2^d)\) subsets of features, our algorithm exploits the graphical structure stemming from the definition of the clusters, similarly to the Apriori algorithm, which reduces drastically the number of subsets to be screened. Results on simulated and real data show that our method allows a fast recovery of a meaningful summary of the dependence structure of extremes.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Similar content being viewed by others
References
Agrawal, R., Gehrke, J., Gunopulos, D., Raghavan, P.: Automatic subspace clustering of high dimensional data. DMKD 11(1), 5–33 (2005)
Agrawal, R., Srikant, R., et al.: Fast algorithms for mining association rules. In: Proceedings of 20th International Conference on Very Large Data Bases, VLDB, vol. 1215, pp. 487–499 (1994)
Beirlant, J., Goegebeur, Y., Segers, J., Teugels, J.: Statistics of extremes: theory and applications. Wiley, Hoboken (2006)
Boldi, M.O., Davison, A.: A mixture model for multivariate extremes. JRSS-B 69(2), 217–229 (2007)
Bron, C., Kerbosch, J.: Algorithm 457: finding all cliques of an undirected graph. Commun. ACM 16(9), 575–577 (1973)
Chautru, E.: Dimension reduction in multivariate extreme value analysis. Electron. J. Stat. 9(1), 383–418 (2015)
Clifton, D.A., Hugueny, S., Tarassenko, L.: Novelty detection with multivariate extreme value statistics. J. Sig. Process. Syst. 65(3), 371–389 (2011)
Coles, S.: An Introduction to Statistical Modeling of Extreme Values. Springer Series in Statistics. Springer, London (2001)
Coles, S., Tawn, J.: Modeling extreme multivariate events. JRSS-B 53, 377–392 (1991)
Cooley, D., Davis, R., Naveau, P.: The pairwise beta distribution: a flexible parametric multivariate model for extremes. JMVA 101(9), 2103–2117 (2010)
Einmahl, J.H., Segers, J.: Maximum empirical likelihood estimation of the spectral measure of an extreme-value distribution. Ann. Stat. 37, 2953–2989 (2009)
Fougeres, A.L., De Haan, L., Mercadier, C., et al.: Bias correction in multivariate extremes. Ann. Stat. 43(2), 903–934 (2015)
Fougeres, A.L., Mercadier, C., Nolan, J.P.: Dense classes of multivariate extreme value distributions. J. Multivar. Anal. 116, 109–129 (2013)
Giuntoli, I., Renard, B., Vidal, J.P., Bard, A.: Low flows in france and their relationship to large-scale climate indices. J. Hydro. 482, 105–118 (2013)
Goix, N., Sabourin, A., Clémençon, S.: Learning the dependence structure of rare events: a non-asymptotic study. In: Proceedings of the 28th COLT (2015)
Goix, N., Sabourin, A., Clémençon, S.: Sparsity in multivariate extremes with applications to anomaly detection. arXiv preprint arXiv:1507.05899 (2015)
Goix, N., Sabourin, A., Clémençon, S.: Sparse representation of multivariate extremes with applications to anomaly ranking. In: Proceedings of the 19th AISTAT conference, pp. 287–295 (2016)
Guillotte, S., Perron, F., Segers, J.: Non-parametric Bayesian inference on bivariate extremes. JRSS-B 73(3), 377–406 (2011)
Gunopulos, D., Khardon, R., Mannila, H., Saluja, S., Toivonen, H., Sharma, R.S.: Discovering all most specific sentences. ACM Trans. Database Syst. 28(2), 140–174 (2003)
Katz, R.W., Parlange, M.B., Naveau, P.: Statistics of extremes in hydrology. Adv. Water Resour. 25(8), 1287–1304 (2002)
Lee, H.-J., Roberts, S.J.: On-line novelty detection using the Kalman filter and extreme value theory. In: 19th International Conference on Pattern Recognition, ICPR 2008, pp. 1–4. IEEE (2008)
Qi, Y.: Almost sure convergence of the stable tail empirical dependence function in multivariate extreme statistics. Acta Math. Applicatae Sin. (English Ser.) 13(2), 167–175 (1997)
Resnick, S.I.: Extreme Values, Regular Variation and Point Processes. Springer, Heidelberg (2013)
Sabourin, A., Naveau, P.: Bayesian Dirichlet mixture model for multivariate extremes: a re-parametrization. CSDA 71, 542–567 (2014)
Sabourin, A., Naveau, P., Fougeres, A.L.: Bayesian model averaging for multivariate extremes. Extremes 16(3), 325 (2013)
Stephenson, A.: Simulating multivariate extreme value distributions of logistic type. Extremes 6(1), 49–59 (2003)
Tawn, J.A.: Modelling multivariate extreme value distributions. Biometrika 77(2), 245–253 (1990)
Tomita, E., Tanaka, A., Takahashi, H.: The worst-case time complexity for generating all maximal cliques and computational experiments. Theoret. Comput. Sci. 363(1), 28–42 (2006)
Xie, Y., Philip, S.Y.: Max-clique: a top-down graph-based approach to frequent pattern mining. In: 2010 IEEE International Conference Data Mining, pp. 1139–1144. IEEE (2010)
Acknowledgments
Part of this work has been funded by the the ‘LabEx Mathématiques Hadamard’ (LMH) project, by the ‘AGREED’ project from the PEPS JCJC program (INS2I, CNRS) and by the chair ‘Machine Learning for Big Data’ from Télécom ParisTech. The authors would like to thank Benjamin Renard for interesting discussions about the hydrological use case and for sharing the data.
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
A Appendix: Proof of Lemma 1
A Appendix: Proof of Lemma 1
Step 1. As a first step we show that \(\mathcal {M}\subset \mathbb {M}, \text {i.e.,} \mu (\mathcal {C}_\alpha )>0 \Rightarrow \mu (\varGamma _\alpha )>0. \)
Proof
Write \(\mathcal {C}_\alpha = \bigcup _{\epsilon >0,\epsilon \in \mathbb {Q}} R_{\alpha ,\epsilon }\), where \(R_{\alpha ,\epsilon } = \{x \in \mathbb {R}_+^d:\; \Vert x\Vert _{\infty }\ge 1; \quad x_j > \epsilon ~ (j\in \alpha ); \quad x_i = 0 ~ (i\notin \alpha ) \}.\) Assume \(\mu (\mathcal {C}_\alpha )>0\). Since \(\mu (\mathcal {C}_{\alpha } )<\infty \), by the monotonous limit property of the measure \(\mu \), we have \(\mu (\mathcal {C}_\alpha ) = \lim _{\epsilon \rightarrow 0} \mu (R_{\alpha ,\epsilon })\). Also, from the definitions, \(R_{\alpha ,\epsilon }\subset \epsilon \varGamma _{\alpha }\). Thus,
Step 2. We now prove the reverse inclusion for maximal elements of \(\mathbb {M}\), i.e.,
Proof
Consider, for \( i \notin \alpha \), the set \( \;\Delta _{i, \epsilon } = \varGamma _{\alpha }\cap \{ x \in \mathbb {R}_+^d : \quad x_i > \epsilon \}, \) so that \( \varGamma _{\alpha } = \big \{ \bigcup _{\begin{array}{c} i \in \{1,\ldots ,d\}\setminus \alpha \\ \epsilon \in \mathbb {Q} \cap (0,1) \end{array}} \Delta _{i, \epsilon }\big \} \cup R_{\alpha , 1}. \) Thus,
To prove (13), it is enough to show that
Indeed if (15) is true, and if \(\alpha \in \mathbb {M}\), then (14) implies that \(\mu (R_{\alpha ,1})>0\), and the result follows from the inclusion \(R_{\alpha ,1}\subset \mathcal {C}_\alpha \). We show (15) by contradiction. If \(\mu (\Delta _{i,\epsilon })>0\) for some \(i\notin \alpha \), then
thus \(\mu (\varGamma _{\alpha \cup \{i\}})>0\), which contradicts the maximality of \(\alpha \) in \(\mathbb {M}\).
Step 3. From (13), if \(\alpha \text { is maximal in } \mathbb {M}\) then \( \alpha \in \mathcal {M}\). Now if \(\alpha \) is maximal in \(\mathbb {M}\) but not in \(\mathcal {M}\), there exists \(\beta \supsetneq \alpha \) in \(\mathcal {M}\). Thus from Step 1, \(\beta \in \mathbb {M}\), a contradiction. Hence \(\alpha \) is also maximal in \(\mathcal {M}\). Conversely, if \(\alpha \) is maximal in \(\mathcal {M}\) then (Step 1) \(\alpha \in \mathbb {M}\). If \(\alpha \) was not maximal in \(\mathbb {M}\), there would exist \(\beta \supsetneq \alpha \) maximal in \(\mathbb {M}\), and from (13), \(\beta \in \mathcal {M}\), contradicting the maximality of \(\alpha \) in \(\mathcal {M}\).
Rights and permissions
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Chiapino, M., Sabourin, A. (2017). Feature Clustering for Extreme Events Analysis, with Application to Extreme Stream-Flow Data. In: Appice, A., Ceci, M., Loglisci, C., Masciari, E., Raś, Z. (eds) New Frontiers in Mining Complex Patterns. NFMCP 2016. Lecture Notes in Computer Science(), vol 10312. Springer, Cham. https://doi.org/10.1007/978-3-319-61461-8_9
Download citation
DOI: https://doi.org/10.1007/978-3-319-61461-8_9
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-61460-1
Online ISBN: 978-3-319-61461-8
eBook Packages: Computer ScienceComputer Science (R0)