Feature Clustering for Extreme Events Analysis, with Application to Extreme Stream-Flow Data

Maël Chiapino & Anne Sabourin

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10312)

Included in the following conference series:

International Workshop on New Frontiers in Mining Complex Patterns


Abstract

The dependence structure of extreme events of multivariate nature plays a special role in risk management applications, in particular in hydrology (flood risk). In a high-dimensional context ($d>50$), a natural first step is dimension reduction. Analyzing the tails of a dataset requires specific approaches: earlier works have proposed a definition of sparsity adapted to extremes, together with an algorithm detecting such a pattern under strong sparsity assumptions. Given a dataset that exhibits no clear sparsity pattern, we propose a clustering algorithm that groups together the features that are ‘dependent at extreme level’, i.e., that are likely to take extreme values simultaneously. To bypass the computational issues that arise when dealing with possibly $O(2^d)$ subsets of features, our algorithm exploits the graphical structure stemming from the definition of the clusters, similarly to the Apriori algorithm, which drastically reduces the number of subsets to be screened. Results on simulated and real data show that our method allows a fast recovery of a meaningful summary of the dependence structure of extremes.
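The Apriori-style idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm: the function names, the rank-based threshold `k`, and the joint-exceedance count criterion `min_count` are illustrative assumptions, chosen only because a count of joint exceedances can never increase when a feature is added to a subset, which is what makes level-wise pruning valid.

```python
from itertools import combinations

import numpy as np


def exceedance_matrix(X, k):
    """Entry (i, j) is True when X[i, j] is among the k largest values of column j."""
    n = X.shape[0]
    # ranks[i, j] is 0 for the smallest value of column j and n - 1 for the largest
    ranks = X.argsort(axis=0).argsort(axis=0)
    return ranks >= n - k


def joint_count(E, subset):
    """Number of rows in which every feature of `subset` is extreme."""
    return int(E[:, sorted(subset)].all(axis=1).sum())


def apriori_extreme_subsets(E, min_count):
    """Level-wise search for subsets that are jointly extreme in >= min_count rows.

    Hypothetical criterion (an assumption of this sketch, not the paper's).
    """
    d = E.shape[1]
    # Level 2: screen all pairs of features
    current = [frozenset(p) for p in combinations(range(d), 2)
               if joint_count(E, p) >= min_count]
    levels = []
    while current:
        levels.append(current)
        kept = set(current)
        size = len(current[0]) + 1
        # Candidates of the next size are unions of two subsets kept at this level
        candidates = {a | b for a in current for b in current if len(a | b) == size}
        current = [c for c in candidates
                   # Apriori pruning: every subset with one feature removed must be kept
                   if all(c - {j} in kept for j in c)
                   and joint_count(E, c) >= min_count]
    return levels


def maximal_subsets(levels):
    """Subsets found at any level that are not strictly contained in another one."""
    all_sets = [s for level in levels for s in level]
    return [s for s in all_sets if not any(s < t for t in all_sets)]
```

Only candidates whose smaller subsets all passed the screening are ever counted, so the number of subsets examined stays far below $2^d$ whenever few groups of features are jointly extreme.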


Acknowledgments

Part of this work has been funded by the ‘LabEx Mathématiques Hadamard’ (LMH) project, by the ‘AGREED’ project from the PEPS JCJC program (INS2I, CNRS) and by the chair ‘Machine Learning for Big Data’ from Télécom ParisTech. The authors would like to thank Benjamin Renard for interesting discussions about the hydrological use case and for sharing the data.

Author information

Authors and Affiliations

LTCI, Télécom ParisTech, Université Paris-Saclay, Paris, France
Maël Chiapino & Anne Sabourin

Corresponding author

Correspondence to Maël Chiapino.

Editor information

Editors and Affiliations

Università degli Studi di Bari Aldo Moro, Bari, Italy
Annalisa Appice
Università degli Studi di Bari Aldo Moro, Bari, Italy
Michelangelo Ceci
Università degli Studi di Bari Aldo Moro, Bari, Italy
Corrado Loglisci
ICAR-CNR, Rende, Italy
Elio Masciari
University of North Carolina, Charlotte, North Carolina, USA
Zbigniew W. Raś

A Appendix: Proof of Lemma 1

Step 1. As a first step we show that $\mathcal {M}\subset \mathbb {M}, \text {i.e.,} \mu (\mathcal {C}_\alpha )>0 \Rightarrow \mu (\varGamma _\alpha )>0. $

Proof

Write $\mathcal {C}_\alpha = \bigcup _{\epsilon >0,\epsilon \in \mathbb {Q}} R_{\alpha ,\epsilon }$, where $R_{\alpha ,\epsilon } = \{x \in \mathbb {R}_+^d:\; \Vert x\Vert _{\infty }\ge 1; \quad x_j > \epsilon ~ (j\in \alpha ); \quad x_i = 0 ~ (i\notin \alpha ) \}.$ Assume $\mu (\mathcal {C}_\alpha )>0$. The sets $R_{\alpha ,\epsilon }$ increase as $\epsilon $ decreases to $0$ and $\mu (\mathcal {C}_{\alpha } )<\infty $, so by the monotone limit property of the measure $\mu $ (continuity from below), we have $\mu (\mathcal {C}_\alpha ) = \lim _{\epsilon \rightarrow 0} \mu (R_{\alpha ,\epsilon })$. Also, from the definitions, $R_{\alpha ,\epsilon }\subset \epsilon \varGamma _{\alpha }$. Thus,

$$\begin{aligned} \mu (\mathcal {C}_\alpha )>0&\Rightarrow \exists \epsilon \in (0,1) : \mu (R_{\alpha ,\epsilon })>0 \qquad \Rightarrow \mu (\epsilon \varGamma _{\alpha })>0 \\&\Rightarrow \rho _\alpha = \mu (\varGamma _{\alpha }) = \epsilon \mu (\epsilon \varGamma _{\alpha })>0. \end{aligned}$$
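The last equality in this chain is an instance of the homogeneity of $\mu $. The main text is not reproduced here, so the following statement is an assumption based on the usual standardization of the margins; it is consistent with the identity $\mu (\varGamma _{\alpha }) = \epsilon \mu (\epsilon \varGamma _{\alpha })$ used above. For any $t>0$ and any measurable set $A$ bounded away from the origin,

$$\begin{aligned} \mu (tA) = t^{-1}\mu (A), \qquad \text {so that, with } t=\epsilon \text { and } A=\varGamma _{\alpha }:\quad \mu (\epsilon \varGamma _{\alpha }) = \epsilon ^{-1}\mu (\varGamma _{\alpha }) \;\Longleftrightarrow \; \mu (\varGamma _{\alpha }) = \epsilon \, \mu (\epsilon \varGamma _{\alpha }). \end{aligned}$$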

Step 2. We now prove the reverse inclusion for maximal elements of $\mathbb {M}$, i.e.,

$$\begin{aligned} \alpha \text { is maximal in } \mathbb {M} \quad \Rightarrow \alpha \in \mathcal {M}. \end{aligned} \tag{13}$$

Proof

Consider, for $ i \notin \alpha $, the set $ \;\Delta _{i, \epsilon } = \varGamma _{\alpha }\cap \{ x \in \mathbb {R}_+^d : \quad x_i > \epsilon \}, $ so that $ \varGamma _{\alpha } = \big \{ \bigcup _{\begin{array}{c} i \in \{1,\ldots ,d\}\setminus \alpha \\ \epsilon \in \mathbb {Q} \cap (0,1) \end{array}} \Delta _{i, \epsilon }\big \} \cup R_{\alpha , 1}. $ Thus,

$$\begin{aligned} \alpha \in \mathbb {M} ~~\Rightarrow ~~\mu (\varGamma _{\alpha } )>0 ~~\Rightarrow ~~\Big (\exists i\notin \alpha ,\; \exists \epsilon \in \mathbb {Q}\cap (0,1) : \mu (\Delta _{i,\epsilon })>0 ~~\text { or }~~ \mu ( R_{\alpha ,1} )>0\Big ) \end{aligned} \tag{14}$$
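As a worked illustration of this decomposition, take $d=2$ and $\alpha =\{1\}$, and assume that $\varGamma _{\alpha } = \{x\in \mathbb {R}_+^d : x_j>1 ~ (j\in \alpha )\}$; this definition belongs to the main text and is only inferred here from the inclusion $R_{\alpha ,\epsilon }\subset \epsilon \varGamma _{\alpha }$ used in Step 1. The three sets then read

$$\begin{aligned} \varGamma _{\{1\}}&= \{x\in \mathbb {R}_+^2 : x_1>1\}, \\ \Delta _{2,\epsilon }&= \{x\in \mathbb {R}_+^2 : x_1>1,\; x_2>\epsilon \}, \\ R_{\{1\},1}&= \{x\in \mathbb {R}_+^2 : x_1>1,\; x_2=0\}, \end{aligned}$$

where the constraint $\Vert x\Vert _{\infty }\ge 1$ in $R_{\{1\},1}$ is automatically satisfied because $x_1>1$. The union of the $\Delta _{2,\epsilon }$ over $\epsilon \in \mathbb {Q}\cap (0,1)$ is $\{x_1>1,\; x_2>0\}$, and adding $R_{\{1\},1}$ recovers $\varGamma _{\{1\}}$.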

To prove (13), it is enough to show that

$$\begin{aligned} \alpha \text { is maximal in } \mathbb {M}\quad \Rightarrow \quad \text { for all } i\notin \alpha \text { and } \epsilon \in \mathbb {Q}\cap (0,1), \;\mu (\Delta _{i,\epsilon }) =0 . \end{aligned} \tag{15}$$

Indeed, if (15) is true and if $\alpha $ is maximal in $\mathbb {M}$, then (14) implies that $\mu (R_{\alpha ,1})>0$, and the result follows from the inclusion $R_{\alpha ,1}\subset \mathcal {C}_\alpha $. We show (15) by contradiction. If $\mu (\Delta _{i,\epsilon })>0$ for some $i\notin \alpha $ and some $\epsilon \in (0,1)$, then, since $1/\epsilon >1$,

$$\frac{1}{\epsilon } \Delta _{i,\epsilon } = \left( \frac{1}{\epsilon }\,\varGamma _{\alpha }\right) \cap \{x\in \mathbb {R}_+^d : x_i>1 \}\subset \varGamma _{\alpha \cup \{i\}},$$

thus, by homogeneity of $\mu $, $\mu (\varGamma _{\alpha \cup \{i\}})\ge \mu (\epsilon ^{-1}\Delta _{i,\epsilon }) = \epsilon \, \mu (\Delta _{i,\epsilon })>0$, i.e., $\alpha \cup \{i\}\in \mathbb {M}$, which contradicts the maximality of $\alpha $ in $\mathbb {M}$.

Step 3. From (13), if $\alpha \text { is maximal in } \mathbb {M}$ then $ \alpha \in \mathcal {M}$. Now if $\alpha $ is maximal in $\mathbb {M}$ but not in $\mathcal {M}$, there exists $\beta \supsetneq \alpha $ in $\mathcal {M}$. Thus from Step 1, $\beta \in \mathbb {M}$, which contradicts the maximality of $\alpha $ in $\mathbb {M}$. Hence $\alpha $ is also maximal in $\mathcal {M}$. Conversely, if $\alpha $ is maximal in $\mathcal {M}$ then (Step 1) $\alpha \in \mathbb {M}$. If $\alpha $ were not maximal in $\mathbb {M}$, there would exist $\beta \supsetneq \alpha $ maximal in $\mathbb {M}$, and from (13), $\beta \in \mathcal {M}$, contradicting the maximality of $\alpha $ in $\mathcal {M}$.
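Step 3 is a purely set-theoretic argument: if one finite family is included in another and contains all of its maximal elements, then the two families have the same maximal elements. The sketch below checks this on a toy example; the families `big` and `small` are hypothetical stand-ins for $\mathbb {M}$ and $\mathcal {M}$ and are not taken from the paper.

```python
def maximal_elements(family):
    """Members of `family` that are not strictly contained in another member."""
    family = set(family)
    return {a for a in family if not any(a < b for b in family)}


# Hypothetical toy families of feature subsets (illustration only).
# `big` plays the role of \mathbb{M}, `small` the role of \mathcal{M}.
big = {frozenset(s) for s in [{1}, {2}, {3}, {4}, {1, 2}, {3, 4}]}
small = {frozenset(s) for s in [{2}, {1, 2}, {3, 4}]}

# Hypotheses: inclusion (Step 1) and maximal elements of `big` lie in `small` (Step 2).
hypotheses_hold = small <= big and maximal_elements(big) <= small

# Conclusion of Step 3: both families share the same maximal elements.
same_maximal = maximal_elements(small) == maximal_elements(big)
```

Here both `hypotheses_hold` and `same_maximal` evaluate to `True`, with common maximal elements $\{1,2\}$ and $\{3,4\}$.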

About this paper

Cite this paper

Chiapino, M., Sabourin, A. (2017). Feature Clustering for Extreme Events Analysis, with Application to Extreme Stream-Flow Data. In: Appice, A., Ceci, M., Loglisci, C., Masciari, E., Raś, Z. (eds) New Frontiers in Mining Complex Patterns. NFMCP 2016. Lecture Notes in Computer Science(), vol 10312. Springer, Cham. https://doi.org/10.1007/978-3-319-61461-8_9

DOI: https://doi.org/10.1007/978-3-319-61461-8_9
Published: 02 July 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-61460-1
Online ISBN: 978-3-319-61461-8
eBook Packages: Computer Science (R0)
