Abstract
Existing adaptive kernel density estimators (KDEs) and kernel regressions (KRs) often employ a data-independent kernel, such as the Gaussian kernel. They require an additional mechanism to adapt the kernel bandwidth locally to a given dataset in order to produce better estimates, but this comes at a high computational cost. In this paper, we show that adaptive KDEs and KRs can be derived directly from Isolation Kernel with constant-time complexity for each estimation. The resultant estimators, called IKDE and IKR, are the first KDE and KR that are both fast and adaptive. We demonstrate the superior efficiency and efficacy of IKDE and IKR in anomaly detection and regression tasks, respectively.
Notes
This is despite the fact that all adapted bandwidths can be found in the preprocessing step using the second approach.
We identify a typo in the original equation \(\sigma _x = \min (mean_{y \in kNN(x)} \parallel x-y \parallel , \varepsilon )\) in paper [13]: the min function should be the max function; otherwise, most points would have the same bandwidth \(\varepsilon \).
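For concreteness, a minimal sketch of the corrected rule (with max) is given below, assuming Euclidean distance, a brute-force nearest-neighbour search and that the query point x is not a row of X; the helper name adaptive_bandwidth is illustrative and is not code from [13]:

```python
import numpy as np

def adaptive_bandwidth(X, x, k, eps):
    """Per-point bandwidth with the corrected max: the mean Euclidean distance
    from x to its k nearest neighbours in X, floored at eps."""
    d = np.linalg.norm(X - x, axis=1)   # distances from x to every point in X
    knn_mean = np.sort(d)[:k].mean()    # assumes x itself is not a row of X
    return max(knn_mean, eps)
```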
References
Terrell GR, Scott DW (1992) Variable kernel density estimation. Ann Stat 20(3):1236–1265
Rosenblatt M (1956) Remarks on some nonparametric estimates of a density function. Ann Math Stat 27(3):832–837
Gentleman WM, Sande G (1966) Fast Fourier transforms—for fun and profit. In: Proceedings of AFIPS fall joint computer conference, pp 563–578
Scott DW (1981) Using computer-binned data for density estimation. In: Proceedings of the 13th symposium on the interface of computer science and statistics. Springer, New York, pp 292–294
Silverman BW (1982) Algorithm as 176: kernel density estimation using the Fast Fourier Transform. J Roy Stat Soc: Ser C (Appl Stat) 31(1):93–99
Wand MP (1994) Fast computation of multivariate kernel estimators. J Comput Graph Stat 3(4):433–445
O’Brien TA, Kashinath K, Cavanaugh NR, Collins WD, O’Brien JP (2016) A fast and objective multidimensional kernel density estimation method: fastKDE. Comput Stat Data Anal 101:148–160
Scott DW (2015) Multivariate density estimation: theory, practice, and visualization, 2nd edn. John Wiley & Sons Inc, Hoboken
Williams CKI, Seeger M (2001) Using the Nyström method to speed up kernel machines. In: Leen TK, Dietterich TG, Tresp V (eds) Advances in neural information processing systems, vol 13, pp 682–688
Luo C, Shrivastava A (2018) Arrays of (locality-sensitive) count estimators (ACE): high-speed anomaly detection via cache lookups. In: Proceedings of the world wide web conference, pp 1439–1448
Coleman B, Shrivastava A (2020) Sub-linear RACE sketches for approximate kernel density estimation on streaming data. In: Proceedings of the world wide web conference, pp 1739–1749
Backurs A, Indyk P, Wagner T (2019) Space and time efficient kernel density estimation in high dimensions. In: Advances in neural information processing systems, vol 32, pp 15799–15808
Schubert E, Zimek A, Kriegel H-P (2014) Generalized outlier detection with flexible kernel density estimates. In: Proceedings of the SIAM conference on data mining, pp 542–550
Ting KM, Zhu Y, Zhou Z-H (2018) Isolation kernel and its effect on SVM. In: Proceedings of the 24th ACM SIGKDD international conference on knowledge discovery and data mining. ACM, pp 2329–2337
Watson GS (1964) Smooth regression analysis. Sankhyā: Indian J Stat Ser A 26(4):359–372
Sain SR (2002) Multivariate locally adaptive density estimation. Comput Stat Data Anal 39:165–186
Zhang L, Lin J, Karim R (2018) Adaptive kernel density-based anomaly detection for nonlinear systems. Knowl Based Syst 139:50–63
Phillips JM, Tai WM (2020) Near-optimal coresets of kernel density estimates. Discrete Comput Geom 63:867–887
Chen Y, Welling M, Smola A (2010) Super-samples from kernel herding. In: Proceedings of the 26th conference on uncertainty in artificial intelligence, pp 109–116
Cortes EC, Scott C (2017) Sparse approximation of a kernel mean. IEEE Trans Signal Process 65(5):1310–1323
Siminelakis P, Rong K, Bailis P, Charikar M, Levis P (2019) Rehashing kernel evaluation in high dimensions. In: Proceedings of the 36th international conference on machine learning, pp 5789–5798
Charikar MS (2002) Similarity estimation techniques from rounding algorithms. In: Proceedings of the 34th ACM symposium on theory of computing, pp 380–388
Datar M, Immorlica N, Indyk P, Mirrokni VS (2004) Locality-sensitive hashing scheme based on p-stable distributions. In: Proceedings of the 20th annual symposium on computational geometry, pp 253–262
Charikar M, Siminelakis P (2017) Hashing-based-estimators for kernel density in high dimensions. In: 2017 IEEE 58th annual symposium on foundations of computer science (FOCS). IEEE, pp 1032–1043
Qin X, Ting KM, Zhu Y, Lee VCS (2019) Nearest-neighbour-induced isolation similarity and its impact on density-based clustering. In: Proceedings of the thirty-third AAAI conference on artificial intelligence, pp 4755–4762
Aurenhammer F (1991) Voronoi diagrams—a survey of a fundamental geometric data structure. ACM Comput Surv 23(3):345–405
Alishahi K, Sharifitabar M (2008) Volume degeneracy of the typical cell and the chord length distribution for Poisson-Voronoi tessellations in high dimensions. Adv Appl Probab 40(4):919–938
Calka P, Chenavier N (2014) Extreme values for characteristic radii of a Poisson-Voronoi tessellation. Extremes 17(3):359–385
Silverman BW (1986) Density estimation for statistics and data analysis. Chapman and Hall, London
Ting KM, Xu B-C, Washio T, Zhou Z-H (2020) Isolation distributional kernel: a new tool for kernel based anomaly detection. In: Proceedings of the 26th ACM SIGKDD international conference on knowledge discovery and data mining, pp 198–206
Kim J, Scott CD (2012) Robust kernel density estimation. J Mach Learn Res 13(82):2529–2565
Musco C, Musco C (2017) Recursive sampling for the Nyström method. In: Advances in neural information processing systems, pp 3833–3845
Schölkopf B, Platt JC, Shawe-Taylor JC, Smola AJ, Williamson RC (2001) Estimating the support of a high-dimensional distribution. Neural Comput 13(7):1443–1471
Liu FT, Ting KM, Zhou Z-H (2008) Isolation Forest. In: Proceedings of the IEEE international conference on data mining, pp 413–422
Breunig MM, Kriegel H-P, Ng RT, Sander J (2000) LOF: identifying density-based local outliers. In: Proceedings of the ACM SIGMOD international conference on management of data, pp 93–104
Nadaraya EA (1964) On estimating regression. Theory Probab Appl 9(1):141–142
Noda K (1976) Estimation of a regression function by the Parzen kernel-type density estimators. Ann Inst Stat Math 28(1):221–234
Tosatto S, Akrour N, Peters J (2020) An upper bound of the bias of Nadaraya–Watson kernel regression under Lipschitz assumptions. Stats 4(1):1–17
Dua D, Graff C (2017) UCI machine learning repository. http://archive.ics.uci.edu/ml
Andoni A, Razenshteyn I (2015) Optimal data-dependent hashing for approximate near neighbors. In: Proceedings of the forty-seventh annual ACM symposium on theory of computing, pp 793–801
Andoni A, Laarhoven T, Razenshteyn I, Waingarten E (2017) Optimal hashing-based time-space trade-offs for approximate near neighbors. In: Proceedings of the twenty-eighth annual ACM-SIAM symposium on discrete algorithms, pp 47–66
Lever G, Diethe T, Shawe-Taylor J (2012) Data dependent kernels in nearly-linear time. In: Proceedings of the fourteenth international conference on artificial intelligence and statistics, pp 685–693
Ionescu C, Popa A, Sminchisescu C (2017) Large-scale data-dependent kernel approximation. In: Proceedings of the 20th international conference on artificial intelligence and statistics, pp 19–27
Muandet K, Fukumizu K, Sriperumbudur B, Schölkopf B (2017) Kernel mean embedding of distributions: a review and beyond. Found Trends Mach Learn 10(1–2):1–141
Ting KM, Washio T, Wells JR, Zhang H (2021) Isolation kernel density estimation. In: Proceedings of IEEE international conference on data mining, pp 619–628
Acknowledgements
A preliminary version of this paper was presented at the 2021 IEEE International Conference on Data Mining [45]; it reported the work on kernel density estimation only. Kai Ming Ting was supported by the Natural Science Foundation of China (62076120).
Appendices
Appendix A: Experimental settings
The experiments ran on a Linux CPU machine: an AMD 16-core CPU, with each core running at 2 GHz, and 32 GB RAM. The IK feature mapping ran on a machine with GPU: 2 x GTX 1080 Ti with 3584 CUDA cores (1.6 GHz) and 12 GB graphics memory, and CPU: i9-7900X 3.30 GHz processor (20 cores) with 64 GB RAM. Only the feature mapping of IK was conducted on the GPU; the other processes in IKDE were run on the CPU.
We have used the following codes:
* Isolation Kernel: https://github.com/IsolationKernel/Codes/tree/main/IDK/IKDE to build IKDE;
* LOF: https://github.com/scikit-learn/scikit-learn/blob/15a949460/sklearn/neighbors/_lof.py;
* Other codes are from the respective authors.
The reported results for randomized algorithms, IKDE, RACE and HBE are averaged over five trials. The results of the deterministic algorithms, e.g., OKDE and KDEOS, are obtained from a single trial.
All datasets are normalized to [0, 1] in the preprocessing.
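As an illustration, the per-feature min-max scaling assumed here can be sketched as follows (column-wise over a data matrix X; the function name is illustrative):

```python
import numpy as np

def minmax_normalize(X):
    """Scale each feature (column) of X to [0, 1]; constant columns map to 0."""
    X = np.asarray(X, dtype=float)
    lo, hi = X.min(axis=0), X.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)  # avoid division by zero
    return (X - lo) / span
```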
Sources of datasets used for anomaly detection are:
* https://lapad-web.icmc.usp.br/repositories/outlier-evaluation/DAMI,
* https://odds.cs.stonybrook.edu,
* https://sites.google.com/site/gspangsite,
* https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/.
The parameter search ranges are shown in Table 6.
Appendix B: Proof of Theorem 1
Proof
Since every \(y \in Y\) is drawn as \(y \sim {\mathcal {P}}_Y\), we derive the following expectations.
By the Lipschitz continuity of \(\rho _Y(y)\) and since \(x,y \in \theta [z]\), the following relation holds in \(\theta [z]\).
These expressions give the following bias and variance of \({\bar{f}}_{\psi }(x \vert Y)\).
The second line follows from (5) and the homogeneity of \(\theta [z]\), the fourth line follows from (B1), and the last two lines follow from the Cauchy–Schwarz inequality and (5).
The inequality from the third to the fifth lines follows from (B1). The last two lines follow from the Cauchy–Schwarz inequality, (5) and Proposition 1.
By Proposition 2, \({\textit{Bias}}_\psi [{\bar{f}}_{\psi }(x \vert Y)] \rightarrow 0\) if \(\psi \rightarrow \infty \). Accordingly, both \({\textit{Bias}}_\psi [{\bar{f}}_{\psi }(x \vert Y)]\) and \({\textit{Var}}_\psi [{\bar{f}}_{\psi }(x \vert Y)]\) converge to 0 if \( n \rightarrow \infty \), \(\psi \propto n^\alpha \) and \(0<\alpha <1\). \(\square \)
Appendix C: Analysis of consistency of IKR \(_V\)
The Nadaraya–Watson-type kernel regression for one-dimensional data space is known to be consistent [37] if its kernel function \(\kappa (x,y)\) is finite, absolutely integrable and normalized and satisfies
$$\begin{aligned} \lim _{\vert x-y \vert \rightarrow \infty } \vert x-y \vert \, \kappa (x,y) = 0, \end{aligned}$$ (C2)
and its kernel bandwidth h depends on the size n of the dataset W as
$$\begin{aligned} \lim _{n \rightarrow \infty } h(n) = 0 \end{aligned}$$ (C3)
and
$$\begin{aligned} \lim _{n \rightarrow \infty } n\, h(n) = \infty . \end{aligned}$$ (C4)
This consistency of the kernel regression holds for both fixed and variable kernel bandwidths as long as h satisfies the last two conditions. All of GKR, GKR\(_N\) and GKROS satisfy these conditions, and thus they are consistent.
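For example, a Gaussian kernel with a commonly used bandwidth of the form \(h(n) = c\, n^{-1/5}\), \(c > 0\) (an illustrative choice rather than one prescribed in this paper) satisfies all three conditions:
$$\begin{aligned} \lim _{\vert x-y \vert \rightarrow \infty } \vert x-y \vert \, e^{-\frac{(x-y)^2}{2h^2}} = 0, \quad \lim _{n \rightarrow \infty } c\, n^{-1/5} = 0, \quad \lim _{n \rightarrow \infty } n \cdot c\, n^{-1/5} = \lim _{n \rightarrow \infty } c\, n^{4/5} = \infty . \end{aligned}$$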
The Voronoi diagram-based Isolation Kernel Regression (IKR\(_V\)) \(\kappa _\psi (x,y \vert U)\), shown in Eq. (4), is finite and absolutely integrable and can be normalized. When we choose \(\psi \propto n^\alpha \) (\(0<\alpha <1\)) as stated in Theorem 1, the upper bound of a Voronoi partition's half-size, i.e., \(R_{{\textit{max}}}(\psi )\), on the one-dimensional data space is probabilistically bounded for large n. That is, for large \(\psi \), Eqs. (6) and (7) give
Equation (2) means \(\kappa _\psi (x,y \vert U) = P(x,y \in \theta [z] \vert \theta [z] \in H)\). Thus, \(\kappa _\psi (x,y \vert U)\) is upper bounded by setting \(\epsilon = \vert x-y \vert / 2\) as
Accordingly, the following formula holds.
By applying L’Hospital’s rule three times, the r.h.s. is known to converge to zero in the limit \(\vert x -y \vert \rightarrow \infty \). Thus, the Isolation Kernel also satisfies Eq. (C2).
From Proposition 2, the size of a Voronoi partition, which corresponds to h(n) and is upper bounded by \(2R_{{\textit{max}}}(\psi )\), converges to zero as \(n \rightarrow \infty \). Thus, the condition of Eq. (C3) holds.
Furthermore, let \(R_{{\textit{min}}}(\psi )\) be the minimum inscribed radius of a Voronoi partition \(\theta [z] \in H\) generated in a homogeneous Poisson point process of intensity \(\psi / \vert {\mathcal {X}}_U \vert \). Calka and Chenavier [28] showed that
holds in the limit \(\psi \rightarrow \infty \) for one-dimensional space, where \(c_3\) is a positive constant. This fact implies
in the limit \(\psi \rightarrow \infty \). If we set \(\psi \propto n^\alpha \) (\(0<\alpha <1/2\)) and let \(n \rightarrow \infty \), then \(n R_{{\textit{min}}}(\psi ) \ge \delta \) holds for any large \(\delta \) with probability one. Thus, \(n R_{{\textit{min}}}(\psi ) \rightarrow \infty \) holds with probability one as \(n \rightarrow \infty \). This fact implies Eq. (C4) because h(n) is lower bounded by \(2R_{{\textit{min}}}(\psi )\).
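To illustrate this behaviour, a minimal simulation sketch is given below (assuming \({\mathcal {X}}_U = [0,1]\), so the intensity is simply \(\psi \); all names and the choice of \(\psi \) values are illustrative): it shows how the largest and smallest Voronoi half-widths shrink as \(\psi \) grows.

```python
import numpy as np

rng = np.random.default_rng(0)

def voronoi_half_widths(psi, rng):
    """Half-widths of the 1-D Voronoi cells induced by a homogeneous
    Poisson point process of intensity psi on the unit interval [0, 1]."""
    n_points = max(rng.poisson(psi), 2)            # ensure at least two cells
    z = np.sort(rng.uniform(0.0, 1.0, n_points))
    # In 1-D, Voronoi boundaries are the midpoints of consecutive points,
    # plus the interval ends 0 and 1.
    bounds = np.concatenate(([0.0], (z[:-1] + z[1:]) / 2.0, [1.0]))
    return np.diff(bounds) / 2.0

for psi in (16, 128, 1024):
    half = voronoi_half_widths(psi, rng)
    print(f"psi={psi:5d}  max half-width={half.max():.4f}  min half-width={half.min():.6f}")
```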
The above analysis shows that the consistency of IKR\(_V\) in one-dimensional data space is as good as that of the Gaussian kernel-based regression. The hypersphere-based IKR\(_S\) may also have a similar characteristic.
Current analyses on the behavior of the Nadaraya–Watson-type kernel regression for high-dimensional data space are limited in scope. For example, a recent analysis [38] derives an upper bound of the regression bias only. We conjecture that the consistency of kernel regression in high-dimensional data space is similar to the one-dimensional case.