Abstract
Existing adaptive kernel density estimators (KDEs) and kernel regressions (KRs) often employ a data-independent kernel, such as the Gaussian kernel. They require an additional mechanism to adapt the kernel bandwidth locally to a given dataset in order to produce better estimates, but this comes at a high computational cost. In this paper, we show that adaptive KDEs and KRs can be derived directly from Isolation Kernel with constant-time complexity for each estimation. The resultant estimators, called IKDE and IKR, are the first KDE and KR that are both fast and adaptive. We demonstrate the superior efficiency and efficacy of IKDE and IKR in anomaly detection and regression tasks, respectively.
Notes
This is despite the fact that all adapted bandwidths can be found in the preprocessing step using the second approach.
We identify a typo in the original equation \(\sigma _x = \min (mean_{y \in kNN(x)} \parallel x-y \parallel , \varepsilon )\) in paper [13]: the min function should be the max function; otherwise, most points would have the same bandwidth \(\varepsilon \).
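For concreteness, a minimal sketch of the corrected rule (with max) is given below, assuming Euclidean distance, a brute-force nearest-neighbour search and that the query point x is not a row of X; the helper name adaptive_bandwidth is illustrative and is not code from [13]:

```python
import numpy as np

def adaptive_bandwidth(X, x, k, eps):
    """Per-point bandwidth with the corrected max: the mean Euclidean distance
    from x to its k nearest neighbours in X, floored at eps."""
    d = np.linalg.norm(X - x, axis=1)   # distances from x to every point in X
    knn_mean = np.sort(d)[:k].mean()    # assumes x itself is not a row of X
    return max(knn_mean, eps)
```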
References
Terrell GR, Scott DW (1992) Variable kernel density estimation. Ann Stat 20(3):1236–1265
Rosenblatt M (1956) Remarks on some nonparametric estimates of a density function. Ann Math Stat 27(3):832–837
Gentleman WM, Sande G (1966) Fast Fourier transforms—for fun and profit. In: Proceedings of AFIPS fall joint computer conference, pp 563–578
Scott DW (1981) Using computer-binned data for density estimation. In: Proceedings of the 13th symposium on the interface of computer science and statistics. Springer, New York, pp 292–294
Silverman BW (1982) Algorithm as 176: kernel density estimation using the Fast Fourier Transform. J Roy Stat Soc: Ser C (Appl Stat) 31(1):93–99
Wand MP (1994) Fast computation of multivariate kernel estimators. J Comput Graph Stat 3(4):433–445
O’Brien TA, Kashinath K, Cavanaugh NR, Collins WD, O’Brien JP (2016) A fast and objective multidimensional kernel density estimation method: fastKDE. Comput Stat Data Anal 101:148–160
Scott DW (2015) Multivariate density estimation: theory, practice, and visualization, 2nd edn. John Wiley & Sons Inc, Hoboken
Williams CKI, Seeger M (2001) Using the Nyström method to speed up kernel machines. In: Leen TK, Dietterich TG, Tresp V (eds) Advances in neural information processing systems, vol 13, pp 682–688
Luo C, Shrivastava A (2018) Arrays of (locality-sensitive) count estimators (ACE): high-speed anomaly detection via cache lookups. In: Proceedings of the world wide web conference, pp 1439–1448
Coleman B, Shrivastava A (2020) Sub-linear RACE sketches for approximate kernel density estimation on streaming data. In: Proceedings of the world wide web conference, pp 1739–1749
Backurs A, Indyk P, Wagner T (2019) Space and time efficient kernel density estimation in high dimensions. In: Advances in neural information processing systems, vol 32, pp 15799–15808
Schubert E, Zimek A, Kriegel H-P (2014) Generalized outlier detection with flexible kernel density estimates. In: Proceedings of the SIAM conference on data mining, pp 542–550
Ting KM, Zhu Y, Zhou Z-H (2018) Isolation kernel and its effect on SVM. In: Proceedings of the 24th ACM SIGKDD international conference on knowledge discovery and data mining. ACM, pp 2329–2337
Watson GS (1964) Smooth regression analysis. Sankhyā: Indian J Stat Ser A 26(4):359–372
Sain SR (2002) Multivariate locally adaptive density estimation. Comput Stat Data Anal 39:165–186
Zhang L, Lin J, Karim R (2018) Adaptive kernel density-based anomaly detection for nonlinear systems. Knowl Based Syst 139:50–63
Phillips JM, Tai WM (2020) Near-optimal coresets of kernel density estimates. Discrete Comput Geom 63:867–887
Chen Y, Welling M, Smola A (2010) Super-samples from kernel herding. In: Proceedings of the 26th conference on uncertainty in artificial intelligence, pp 109–116
Cortes EC, Scott C (2017) Sparse approximation of a kernel mean. IEEE Trans Signal Process 65(5):1310–1323
Siminelakis P, Rong K, Bailis P, Charikar M, Levis P (2019) Rehashing kernel evaluation in high dimensions. In: Proceedings of the 36th international conference on machine learning, pp 5789–5798
Charikar MS (2002) Similarity estimation techniques from rounding algorithms. In: Proceedings of the 34th ACM symposium on theory of computing, pp 380–388
Datar M, Immorlica N, Indyk P, Mirrokni VS (2004) Locality-sensitive hashing scheme based on p-stable distributions. In: Proceedings of the 20th annual symposium on computational geometry, pp 253–262
Charikar M, Siminelakis P (2017) Hashing-based-estimators for kernel density in high dimensions. In: 2017 IEEE 58th annual symposium on foundations of computer science (FOCS). IEEE, pp 1032–1043
Qin X, Ting KM, Zhu Y, Lee VCS (2019) Nearest-neighbour-induced isolation similarity and its impact on density-based clustering. In: Proceedings of the thirty-third AAAI conference on artificial intelligence, pp 4755–4762
Aurenhammer F (1991) Voronoi diagrams—a survey of a fundamental geometric data structure. ACM Comput Surv 23(3):345–405
Alishahi K, Sharifitabar M (2008) Volume degeneracy of the typical cell and the chord length distribution for Poisson-Voronoi tessellations in high dimensions. Adv Appl Probab 40(4):919–938
Calka P, Chenavier N (2014) Extreme values for characteristic radii of a Poisson-Voronoi tessellation. Extremes 17(3):359–385
Silverman BW (1986) Density estimation for statistics and data analysis. Chapman and Hall, London
Ting KM, Xu B-C, Washio T, Zhou Z-H (2020) Isolation distributional kernel: a new tool for kernel based anomaly detection. In: Proceedings of the 26th ACM SIGKDD international conference on knowledge discovery and data mining, pp 198–206
Kim J, Scott CD (2012) Robust kernel density estimation. J Mach Learn Res 13(82):2529–2565
Musco C, Musco C (2017) Recursive sampling for the Nyström method. In: Advances in neural information processing systems, pp 3833–3845
Schölkopf B, Platt JC, Shawe-Taylor JC, Smola AJ, Williamson RC (2001) Estimating the support of a high-dimensional distribution. Neural Comput 13(7):1443–1471
Liu FT, Ting KM, Zhou Z-H (2008) Isolation Forest. In: Proceedings of the IEEE international conference on data mining, pp 413–422
Breunig MM, Kriegel H-P, Ng RT, Sander J (2000) LOF: identifying density-based local outliers. In: Proceedings of the ACM SIGMOD international conference on management of data, pp 93–104
Nadaraya EA (1964) On estimating regression. Theory Probab Appl 9(1):141–142
Noda K (1976) Estimation of a regression function by the Parzen kernel-type density estimators. Ann Inst Stat Math 28(1):221–234
Tosatto S, Akrour N, Peters J (2020) An upper bound of the bias of Nadaraya–Watson kernel regression under Lipschitz assumptions. Stats 4(1):1–17
Dua D, Graff C (2017) UCI machine learning repository. http://archive.ics.uci.edu/ml
Andoni A, Razenshteyn I (2015) Optimal data-dependent hashing for approximate near neighbors. In: Proceedings of the forty-seventh annual ACM symposium on theory of computing, pp 793–801
Andoni A, Laarhoven T, Razenshteyn I, Waingarten E (2017) Optimal hashing-based time-space trade-offs for approximate near neighbors. In: Proceedings of the twenty-eighth annual ACM-SIAM symposium on discrete algorithms, pp 47–66
Lever G, Diethe T, Shawe-Taylor J (2012) Data dependent kernels in nearly-linear time. In: Proceedings of the fourteenth international conference on artificial intelligence and statistics, pp 685–693
Ionescu C, Popa A, Sminchisescu C (2017) Large-scale data-dependent kernel approximation. In: Proceedings of the 20th international conference on artificial intelligence and statistics, pp 19–27
Muandet K, Fukumizu K, Sriperumbudur B, Schölkopf B (2017) Kernel mean embedding of distributions: a review and beyond. Found Trends Mach Learn 10(1–2):1–141
Ting KM, Washio T, Wells JR, Zhang H (2021) Isolation kernel density estimation. In: Proceedings of IEEE international conference on data mining, pp 619–628
Acknowledgements
A preliminary version of this paper was presented at the 2021 IEEE International Conference on Data Mining [45]; it reported the work on kernel density estimation only. Kai Ming Ting was supported by the Natural Science Foundation of China (62076120).
Appendices
Appendix A: Experimental settings
The experiments ran on a Linux CPU machine: an AMD 16-core CPU, with each core running at 2 GHz, and 32 GB RAM. The IK feature mapping ran on a machine with GPU: 2 x GTX 1080 Ti with 3584 CUDA cores (1.6 GHz) and 12 GB graphics memory, and CPU: i9-7900X 3.30 GHz processor (20 cores) with 64 GB RAM. Only the feature mapping of IK was conducted on the GPU; the other processes in IKDE were run on the CPU.
We have used the following codes:
* Isolation Kernel: https://github.com/IsolationKernel/Codes/tree/main/IDK/IKDE to build IKDE;
* LOF: https://github.com/scikit-learn/scikit-learn/blob/15a949460/sklearn/neighbors/_lof.py;
* Other codes are from the respective authors.
The reported results for randomized algorithms, IKDE, RACE and HBE are averaged over five trials. The results of the deterministic algorithms, e.g., OKDE and KDEOS, are obtained from a single trial.
All datasets are normalized to [0, 1] in the preprocessing.
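As an illustration, the per-feature min-max scaling assumed here can be sketched as follows (column-wise over a data matrix X; the function name is illustrative):

```python
import numpy as np

def minmax_normalize(X):
    """Scale each feature (column) of X to [0, 1]; constant columns map to 0."""
    X = np.asarray(X, dtype=float)
    lo, hi = X.min(axis=0), X.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)  # avoid division by zero
    return (X - lo) / span
```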
Sources of datasets used for anomaly detection are:
* https://lapad-web.icmc.usp.br/repositories/outlier-evaluation/DAMI,
* https://odds.cs.stonybrook.edu,
* https://sites.google.com/site/gspangsite,
* https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/.
The parameter search ranges are shown in Table 6.
Appendix B: Proof of Theorem 1
Proof
Since every \(y \in Y\) is drawn as \(y \sim {\mathcal {P}}_Y\), we derive the following expectations.
By the Lipschitz continuity of \(\rho _Y(y)\) and since \(x,y \in \theta [z]\), the following relation holds in \(\theta [z]\).
These expressions give the following bias and variance of \({\bar{f}}_{\psi }(x \vert Y)\).
The second line follows from (5) and the homogeneity of \(\theta [z]\), the fourth line follows from (B1), and the last two lines follow from the Cauchy–Schwarz inequality and (5).
The inequality from the third to the fifth lines follows from (B1). The last two lines follow from the Cauchy–Schwarz inequality, (5) and Proposition 1.
By Proposition 2, \({\textit{Bias}}_\psi [{\bar{f}}_{\psi }(x \vert Y)] \rightarrow 0\) if \(\psi \rightarrow \infty \). Accordingly, both \({\textit{Bias}}_\psi [{\bar{f}}_{\psi }(x \vert Y)]\) and \({\textit{Var}}_\psi [{\bar{f}}_{\psi }(x \vert Y)]\) converge to 0 if \( n \rightarrow \infty \), \(\psi \propto n^\alpha \) and \(0<\alpha <1\). \(\square \)
Appendix C: Analysis of consistency of IKR \(_V\)
The Nadaraya–Watson-type kernel regression for one-dimensional data space is known to be consistent [37] if its kernel function \(\kappa (x,y)\) is finite, absolutely integrable and normalized and satisfies
$$\begin{aligned} \lim _{\vert x-y \vert \rightarrow \infty } \vert x-y \vert \, \kappa (x,y) = 0, \end{aligned}$$ (C2)
and its kernel bandwidth h depends on the size n of the dataset W as
$$\begin{aligned} \lim _{n \rightarrow \infty } h(n) = 0 \end{aligned}$$ (C3)
and
$$\begin{aligned} \lim _{n \rightarrow \infty } n\, h(n) = \infty . \end{aligned}$$ (C4)
This consistency of the kernel regression holds for both fixed and variable kernel bandwidths as long as h satisfies the last two conditions. All of GKR, GKR\(_N\) and GKROS satisfy these conditions, and thus they are consistent.
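For example, a Gaussian kernel with a commonly used bandwidth of the form \(h(n) = c\, n^{-1/5}\), \(c > 0\) (an illustrative choice rather than one prescribed in this paper) satisfies all three conditions:
$$\begin{aligned} \lim _{\vert x-y \vert \rightarrow \infty } \vert x-y \vert \, e^{-\frac{(x-y)^2}{2h^2}} = 0, \quad \lim _{n \rightarrow \infty } c\, n^{-1/5} = 0, \quad \lim _{n \rightarrow \infty } n \cdot c\, n^{-1/5} = \lim _{n \rightarrow \infty } c\, n^{4/5} = \infty . \end{aligned}$$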
The Voronoi diagram-based Isolation Kernel Regression (IKR\(_V\)) \(\kappa _\psi (x,y \vert U)\), shown in Eq. (4), is finite and absolutely integrable and can be normalized. When we choose \(\psi \propto n^\alpha \) (\(0<\alpha <1\)) as stated in Theorem 1, the upper bound of a Voronoi partition's half-size, i.e., \(R_{{\textit{max}}}(\psi )\), on the one-dimensional data space is probabilistically bounded for large n. That is, for large \(\psi \), Eqs. (6) and (7) give
Equation (2) means \(\kappa _\psi (x,y \vert U) = P(x,y \in \theta [z] \vert \theta [z] \in H)\). Thus, \(\kappa _\psi (x,y \vert U)\) is upper bounded by setting \(\epsilon = \vert x-y \vert / 2\) as
Accordingly, the following formula holds.
By applying L’Hospital’s rule three times, the r.h.s. is known to converge to zero in the limit \(\vert x -y \vert \rightarrow \infty \). Thus, the Isolation Kernel also satisfies Eq. (C2).
From Proposition 2, the size of a Voronoi partition, which corresponds to h(n) and is upper bounded by \(2R_{{\textit{max}}}(\psi )\), converges to zero as \(n \rightarrow \infty \). Thus, the condition of Eq. (C3) holds.
Furthermore, let \(R_{{\textit{min}}}(\psi )\) be the minimum inscribed radius of a Voronoi partition \(\theta [z] \in H\) generated in a homogeneous Poisson point process of intensity \(\psi / \vert {\mathcal {X}}_U \vert \). Calka and Chenavier [28] showed that
holds in the limit \(\psi \rightarrow \infty \) for one-dimensional space, where \(c_3\) is a positive constant. This fact implies
in the limit \(\psi \rightarrow \infty \). If we set \(\psi \propto n^\alpha \) (\(0<\alpha <1/2\)) and let \(n \rightarrow \infty \), then \(n R_{{\textit{min}}}(\psi ) \ge \delta \) holds for any large \(\delta \) with probability one. Thus, \(n R_{{\textit{min}}}(\psi ) \rightarrow \infty \) holds with probability one as \(n \rightarrow \infty \). This fact implies Eq. (C4) because h(n) is lower bounded by \(2R_{{\textit{min}}}(\psi )\).
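To illustrate this behaviour, a minimal simulation sketch is given below (assuming \({\mathcal {X}}_U = [0,1]\), so the intensity is simply \(\psi \); all names and the choice of \(\psi \) values are illustrative): it shows how the largest and smallest Voronoi half-widths shrink as \(\psi \) grows.

```python
import numpy as np

rng = np.random.default_rng(0)

def voronoi_half_widths(psi, rng):
    """Half-widths of the 1-D Voronoi cells induced by a homogeneous
    Poisson point process of intensity psi on the unit interval [0, 1]."""
    n_points = max(rng.poisson(psi), 2)            # ensure at least two cells
    z = np.sort(rng.uniform(0.0, 1.0, n_points))
    # In 1-D, Voronoi boundaries are the midpoints of consecutive points,
    # plus the interval ends 0 and 1.
    bounds = np.concatenate(([0.0], (z[:-1] + z[1:]) / 2.0, [1.0]))
    return np.diff(bounds) / 2.0

for psi in (16, 128, 1024):
    half = voronoi_half_widths(psi, rng)
    print(f"psi={psi:5d}  max half-width={half.max():.4f}  min half-width={half.min():.6f}")
```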
The above analysis shows that the consistency of IKR\(_V\) in one-dimensional data space is as good as that of the Gaussian kernel-based regression. The hypersphere-based IKR\(_S\) may also have a similar characteristic.
Current analyses on the behavior of the Nadaraya–Watson-type kernel regression for high-dimensional data space are limited in scope. For example, a recent analysis [38] derives an upper bound of the regression bias only. We conjecture that the consistency of kernel regression in high-dimensional data space is similar to the one-dimensional case.