Model-based exception mining for object-relational data

453 Accesses
3 Citations
4 Altmetric
Explore all metrics

Abstract

This paper develops model-based exception mining and outlier detection for the case of object-relational data. Object-relational data represent a complex heterogeneous network, which comprises objects of different types, links among these objects, also of different types, and attributes of these links. We follow the well-established exceptional model mining (EMM) framework, which has been previously applied for subgroup discovery in propositional data; our novel contribution is to develop EMM for relational data. EMM leverages machine learning models for exception mining: An object is exceptional to the extent that a model learned for the object data differs from a model learned for the general population. In relational data, EMM can therefore be used for detecting single outlier or exceptional objects. We combine EMM with state-of-the-art statistical-relational model discovery methods for constructing a graphical model (Bayesian network), that compactly represents probabilistic associations in the data. We investigate several outlierness metrics, based on the learned object-relational model, that quantify the extent to which the association pattern of a potential outlier object deviates from that of the whole population. Our method is validated on synthetic data sets and on real-world data sets about soccer and hockey matches, IMDb movies and mutagenic compounds. Compared to baseline methods, the EMM approach achieved the best detection accuracy when combined with a novel outlinerness metric. An empirical evaluation on soccer and movie data shows a strong correlation between our novel outlierness metric and success metrics: Individuals that our metric marks out as unusual tend to have unusual success.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Subscribe and save

Springer+ Basic

£29.99 /Month

Get 10 units per month
Download Article/Chapter or eBook
1 Unit = 1 Article or 1 Chapter
Cancel anytime

Buy Now

Price includes VAT (United Kingdom)

Instant access to the full article PDF.

Institutional subscriptions

Fig. 4

Outlying property detection with numerical attributes

Article 29 March 2016

Outliers Detection in Multi-label Datasets

A decomposition of the outlier detection problem into a set of supervised learning problems

Article 20 June 2015

Discover the latest articles, news and stories from top researchers in related subjects.

Artificial Intelligence

Notes

The conditions under which the score of Eq. (2) sums to 1 over all relational data sets are discussed by Schulte (2011). Briefly, if the first-order Bayesian Network is viewed as a template for an unrolled ground network, then (1) the ground network cannot contain cycles, and (2) each ground network cannot have multiple parent instantiations; see also (Heckerman et al. 2007).
We are indebted to Daniel Lowd for this point.

References

Achtert E, Kriegel H-P, Schubert E, Zimek A (2013) Interactive data mining with 3D-parallel coordinate trees. In: Proceedings ACM special interest group on management of data, New York, NY, USA, pp 1009–1012. https://doi.org/10.1145/2463676.2463696
Aggarwal CC (2013) Outlier analysis. Springer, New York. ISBN 9781461463955. http://books.google.ca/books?id=900CkgEACAAJ
Akoglu L, McGlohon M, Faloutsos C (2010) Oddball: spotting anomalies in weighted graphs. In: Proceedings Pacific-Asia conference on knowledge discovery and data mining, pp 410–421. https://doi.org/10.1007/978-3-642-13672-6_40
Akoglu L, Tong H, Koutra D (2015) Graph based anomaly detection and description: a survey. Data Min Knowl Discov 29(3):626–688
MathSciNet Google Scholar
Albert J, Glickman ME, Swartz TB, Koning RH (2017) Handbook of statistical methods and analyses in sports. CRC Press, Boca Raton
Google Scholar
Anderson G, Pfahringer B (2008) Exploiting propositionalization based on random relational rules for semi-supervised learning. In: Proceedings Pacific-Asia conference on knowledge discovery and data mining, pp 494–502. https://doi.org/10.1007/978-3-540-68125-0_43
Angiulli F, Greco G, Palopoli L (2004) Outlier detection by logic programming. ACM Trans Comput Logic 9(1(7)):7
MathSciNet MATH Google Scholar
Beirlant J, Györfi L, Lugosi G (1994) On the asymptotic normality of the L1-and L2-errors in histogram density estimation. Can J Stat 22(3):309–318
MATH Google Scholar
Beirlant J, Devroye L, Györfi L, Vajda I (2001) Large deviations of divergence measures on partitions. J Stat Plan Inference 93(1–2):1–16
MathSciNet MATH Google Scholar
Breunig M, Kriegel H-P, Ng RT, Sander J (2000) LOF: identifying density-based local outliers. In: Proceedings ACM special interest group on management of data, pp 93–104. https://doi.org/10.1145/342009.335388
Cansado A, Soto A (2008) Unsupervised anomaly detection in large databases using Bayesian networks. Appl Artif Intell 22:309–330. https://doi.org/10.1080/08839510801972801 ISSN 0883-9514
Article Google Scholar
de Campos L (2006) A scoring function for learning Bayesian networks based on mutual information and conditional independence tests. J Mach Learn Res 7:2149–2187
MathSciNet MATH Google Scholar
Domingos P, Lowd D (2009) Markov logic: an interface layer for artificial intelligence. Morgan and Claypool Publishers, San Francisco
MATH Google Scholar
Domingos P, Richardson M (2007) Markov logic: a unifying framework for statistical relational learning. In: Getoor L, Taskar B (eds) Introduction to statistical relational learning. MIT Press, Cambridge
Google Scholar
Duivesteijn W, Feelders AJ, Knobbe A (2016) Exceptional model mining. Data Min Knowl Discov 30(1):47–98
MathSciNet MATH Google Scholar
Fawcett T (2006) An introduction to ROC analysis. Pattern Recognit Lett 27(8):861–874. https://doi.org/10.1016/j.patrec.2005.10.010
Article MathSciNet Google Scholar
Fisher RA (1921) On the probable error of a coefficient of correlation deduced from a small sample. Metron 1:3–32
Google Scholar
Gao J, Liang F, Fan W, Wang C, Sun Y, Han J (2010) On community outliers and their efficient detection in information networks. In: Proceedings ACM special interest group on knowledge discovery and data mining, New York, NY, USA, pp 813–822. ACM. ISBN 978-1-4503-0055-1. https://doi.org/10.1145/1835804.1835907
Garcia-del Barrio P, Pujol F (2004) Pay and performance in the Spanish soccer league: who gets the expected monopsony rents? Faculty Working Papers 05/04, School of Economics and Business Administration, University of Navarra, March 2004. https://ideas.repec.org/p/una/unccee/wp0504.html
Getoor L (2001) Learning statistical models from relational data. PhD thesis, Department of Computer Science, Stanford University
Getoor L, Taskar B (2007) Introduction to statistical relational learning. MIT Press, Cambridge
MATH Google Scholar
Hall S, Szymanski S, Zimbalist AS (2002) Testing causality between team performance and payroll: the cases of Major League Baseball and English Soccer. J Sports Econ 3(2):149–168
Google Scholar
Halpern JY (1990) An analysis of first-order logics of probability. Artif Intell 46(3):311–350. https://doi.org/10.1016/0004-3702(90)90019-V
Article MathSciNet MATH Google Scholar
Heckerman D, Meek C, Koller D (2007) Probabilistic entity-relationship models, PRMs, and plate models. In: Getoor L, Taskar B (eds) Introduction to statistical relational learning. MIT Press, Cambridge
Google Scholar
Horváth T, Alexin Z, Gyimóthy T, Wrobel S (1999) Application of different learning methods to Hungarian part-of-speech tagging. In: Dzeroski S, Flach P (eds) Inductive logic programming: 9th international workshop. ILP-99 Bled. Springer, Berlin, pp 128–139
Horváth T, Wrobel S, Bohnebeck U (2001) Relational instance-based learning with lists and terms. Mach Learn 43(1):53–80 ISSN 1573-0565
MATH Google Scholar
Khosravi H, Man T, Hu J, Gao E, Mar R, Schulte O (2019) Factorbase code. https://github.com/sfu-ml-lab/FactorBase. Accessed 15 Nov 2016
Khot T, Natarajan S, Shavlik JW (2014) Relational one-class classification: a non-parametric approach. In: Proceedings association for the advancement of artificial intelligence, Quebec City, Quebec, Canada, pp 2453–2459. http://www.aaai.org/ocs/index.php/AAAI/AAAI14/paper/view/8578. Accessed 10 Dec 2017
Kimmig A, Mihalkova L, Getoor L (2014) Lifted graphical models: a survey. Mach Learn 99(1):1–45
MathSciNet MATH Google Scholar
Kirsten M, Wrobel S, Horváth T (2001) Distance-based approaches to relational learning and clustering. In: Dzeroski S, Lavrac N (eds) Relational data mining. Springer, Berlin, pp 213–232
Google Scholar
Knobbe AJ (2006) Multi-relational data mining, vol 145. IOS Press, Amsterdam
MATH Google Scholar
Koh JLY, Lee ML, Hsu W, Ang WT (2008) Correlation-based attribute outlier detection in XML. In: Proceedings international council for open and distance education, Cancun, Mexico. IEEE, pp 1522–1524. http://ieeexplore.ieee.org/xpl/mostRecentIssue.jsp?punumber=4492792
Koller D, Pfeffer A (1997) Object-oriented Bayesian networks. In: Geiger D, Shenoy PP (eds) Proceedings uncertainty in artificial intelligence. Morgan Kaufmann, Burlington, pp 302–313. arXiv:1302.1554
Kramer S, Lavrac N, Flach P (2000) Propositionalization approaches to relational data mining. In: Dzeroski S (ed) Relational data mining. Springer, Berlin, pp 262–286
Kuzelka O, Zeleznỳ F (2008) Hifi: tractable propositionalization through hierarchical feature construction. In: Late breaking papers, inductive logic programming, p 69
Liu G, Schulte O (2018) Deep reinforcement learning in ice hockey for context-aware player evaluation. In: Proceedings international joint conference on artificial intelligence. International Joint Conferences on Artificial Intelligence Organization, pp 3442–3448. https://doi.org/10.24963/ijcai.2018/478
Maervoet J, Vens C, Berghe GV, Blockeel H, Causmaecker PD (2012) Outlier detection in relational data: a case study in geographical information systems. Expert Syst Appl 39(5):4718–4728. https://doi.org/10.1016/j.eswa.2011.09.125 ISSN 0957-4174
Article Google Scholar
Müller E, Assent I, Iglesias P, Mülle Y, Böhm K (2012) Outlier ranking via subspace analysis in multiple views of the data. In: Proceedings international conference on data mining (ICDM), pp 529–538
Nickel M, Murphy K, Tresp V, Gabrilovich E (2016) A review of relational machine learning for knowledge graphs. Proc IEEE 104(1):11–33. https://doi.org/10.1109/JPROC.2015.2483592
Article Google Scholar
Nielsen F, Nock R (2014) On the chi square and higher-order Chi distances for approximating f-divergences. IEEE Signal Process Lett 21(1):10–13
Google Scholar
Novak PK, Lavrač N, Webb GI (2009) Supervised descriptive rule discovery: a unifying survey of contrast set, emerging pattern and subgroup mining. J Mach Learn Res 10:377–403
MATH Google Scholar
Pearl J (1988) Probabilistic reasoning in intelligent systems. Morgan Kaufmann, Burlington
MATH Google Scholar
Peralta V (2007) Extraction and integration of MovieLens and IMDb. Technical report. Alternative Project Delivery Methods
Perovsek M, Vavpetic A, Cestnik B, Lavrac N (2013) A wordification approach to relational data mining. In: Proceedings DS, lecture notes in computer science, pp 141–154. Springer, Singapore. https://doi.org/10.1007/978-3-642-40897-7_10
Poole D (2003) First-order probabilistic inference. In: Proceedings international joint conference on artificial intelligence, pp 985–991
Ramaswamy S, Rastogi R, Shim K (2000) Efficient algorithms for mining outliers from large data sets. In: Proceedings ACM special interest group on management of data, pp 427–438. https://doi.org/10.1145/342009.335437
Riahi F, Schulte O (2015a) Model-based outlier detection for object-relational data. In: Proceedings symposium series on computational intelligence. IEEE, pp 1590–1598. https://doi.org/10.1109/SSCI.2015.224
Riahi F, Schulte O (2015b) Codes and datasets. ftp://ftp.fas.sfu.ca/pub/cs/oschulte/CodesAndDatasets/. Accessed 15 Nov 2016
Riahi F, Schulte O (2016) Propositionalization for unsupervised outlier detection in multi-relational data. In: Proceedings international conference of the Florida artificial intelligence, Key Largo, Florida, USA, pp 448–453. http://www.aaai.org/ocs/index.php/FLAIRS/FLAIRS16/paper/view/12786. Accessed 2 Jan 2017
Riedel S, Yao L, McCallum A, Marlin BM (2013) Relation extraction with matrix factorization and universal schemas. In: Proceedings annual conference of the North American Chapter of the Association for Computational Linguistics, Westin Peachtree Plaza Hotel, Atlanta, Georgia, USA, pp 74–84. http://aclweb.org/anthology/N/N13/N13-1008.pdf
Routley K, Schulte O (2015) A Markov game model for valuing player actions in ice hockey. In: Proceedings uncertainty in artificial intelligence, pp 782–791
Sarawagi S, Agrawal R, Megiddo N (1998) Discovery-driven exploration of OLAP data cubes. In: Proceedings extending database technology, Valencia, Spain, pp 168–182. Springer, Berlin. https://doi.org/10.1007/BFb0100984
Schulte O (2011) A tractable pseudo-likelihood function for Bayesian networks applied to relational data. In: Proceedings society for industrial and applied mathematics, pp 462–473. https://doi.org/10.1137/1.9781611972818.40
Schulte O, Gholami S (2017) Locally consistent Bayesian network scores for multi-relational data. In: Proceedings international joint conference on artificial intelligence, Melbourne, Australia, pp 2693–2700. https://doi.org/10.24963/ijcai.2017/375
Schulte O, Khosravi H (2012) Learning graphical models for relational data via lattice search. Mach Learn 88(3):331–368
MathSciNet Google Scholar
Schulte O, Routley K (2014) Aggregating predictions versus aggregating features for relational classification. In: Proceedings center for information-development management, Orlando, FL, USA, pp 121–128. IEEE. https://doi.org/10.1109/CIDM.2014.7008657
Schulte O, Khosravi H, Kirkpatrick A, Gao T, Zhu Y (2014) Modelling relational statistics with Bayesian networks. Mach Learn 94:105–125. https://doi.org/10.1007/s10994-013-5362-7
Article MathSciNet MATH Google Scholar
Sing T, Sander O, Beerenwinkel N, Lengauer T (2012) ROCR: visualizing the performance of scoring classifiers. http://cran.r-project.org/package=ROCR. Accessed 15 Nov 2016
Sun Y, Han J, Zhao P (2009) Rankclus: integrating clustering with ranking for heterogeneous information network analysis. In: Proceedings extending database technology, New York, NY, USA, pp 565–576. ACM
Tang G, Bailey J, Pei J, Dong G (2013) Mining multidimensional contextual outliers from categorical relational data. In: Proceedings scientific and statistical database management conference, pp 1171–1192. https://doi.org/10.3233/IDA-150764
Tuffery S (2011) Data mining and statistics for decision making. Wiley series in computational statistics. http://ca.wiley.com/WileyCDA/WileyTitle/productCd-0470688297.html. Accessed 15 Nov 2016
Wang DZ, Michelakis E, Garofalakis M, Hellerstein JM (2008) BayesStore: managing large, uncertain data repositories with probabilistic graphical models. In: Proceedings very large data bases. VLDB Endowment, pp 340–351. https://doi.org/10.14778/1453856.1453896. http://www.vldb.org/pvldb/1/1453896.pdf. Accessed 15 Nov 2016
Xiang R, Neville J (2011) Relational learning with one network: an asymptotic analysis. In: Proceedings artificial intelligence and statistics, pp 779–788. http://proceedings.mlr.press/v15/xiang11a/xiang11a.pdf. Accessed 15 Nov 2016

Download references

Acknowledgements

This work was supported by a Discovery Grant from the Natural Sciences and Engineering Research Council of Canada (Grant No. 217331). We are indebted for helpful discussions to the participants at the 2018 Workshop on Statistical Relational AI. The action editor and reviewers for the Data Mining and Knowledge Discovery Journal provided extensive helpful comments. We thank Peter Flach for clarifying the connection with exceptional model mining.

Author information

Authors and Affiliations

Simon Fraser University, Burnaby, Canada
Fatemeh Riahi & Oliver Schulte

Authors

Fatemeh Riahi
View author publications
You can also search for this author in PubMed Google Scholar
Oliver Schulte
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Fatemeh Riahi.

Additional information

Responsible editor: Kristian Kersting, Johannes Fürnkranz.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This work was supported by a Discovery Grant from the Natural Sciences and Engineering Research Council of Canada.

Appendix: Proofs

We prove the results about f-divergences from Sect. 6. We first review the known result that twice differentiable f-divergences are approximately scaled $\chi ^{2}$-divergences. Second we prove our new result that our new ${ ELD}$ metric is approximately total variation distance (plus a quadratic term with small impact).

Assume that the generatorfis twice differentiable forf-divergence$I_{f}$. Then$I_{f}(P_{1}||P_{2}) \approx \frac{f''(1)}{2} \chi ^{2}(P_{1}||P_{2}) = \frac{f''(1)}{2} \sum _{i=1}^{m} \frac{(P_{2}(v_{i})-P_{1}(v_{i}))^{2}}{P_{1}(v_{i})}$.

Proof

Truncating the Taylor series for f at $i=2$, the f-divergence expression (9) becomes

$$\begin{aligned}&\sum _{i=1}^{m} P_{1}(v_{i}) \left[ f'(1) \left( \frac{P_{2}(v_{i})}{P_{1}(v_{i})}-1\right) + \frac{f''(1)}{2} \left( \frac{P_{2}(v_{i})}{P_{1}(v_{i})}-1\right) ^{2}\right] \\&\quad = \sum _{i=1}^{m} P_{1}(v_{i}) \left[ f'(1) \left( \frac{P_{2}(v_{i})-P_{1}(v_{i})}{P_{1}(v_{i})}\right) + \frac{f''(1)}{2} \left( \frac{P_{2}(v_{i})-P_{1}(v_{i})}{P_{1}(v_{i})}\right) ^{2}\right] \\&\quad = f'(1) \left[ \sum _{i=1}^{m} P_{2}(v_{i}) - \sum _{i=1}^{m} P_{1}(v_{i})\right] + \sum _{i=1}^{m} \frac{f''(1)}{2} \frac{(P_{2}(v_{i})-P_{1}(v_{i}))^{2}}{P_{1}(v_{i})} \\&\quad = 0 + \frac{f''(1)}{2} \chi ^{2}(P_{1}||P_{2}) \end{aligned}$$

$\square $

The first-order Taylor series approximation for ${ ELD}$ is total variation distance:

$$\begin{aligned} { ELD}(P_{1}||P_{2}) \approx TVD. \end{aligned}$$

Proof

Truncating the Taylor series for $f=-\ln (u)$ at $i=1$, the |f|-divergence expression (11) becomes

$$\begin{aligned}&\sum _{i:f(\frac{P_{2}(v_{i})}{P_{1}(v_{i})})>0} P_{1}(v_{i}) f'(1) \left( \frac{P_{2}(v_{i})}{P_{1}(v_{i})}-1\right) - \sum _{i:f(\frac{P_{2}(v_{i})}{P_{1}(v_{i})})<0} P_{1}(v_{i}) f'(1) \left( \frac{P_{2}(v_{i})}{P_{1}(v_{i})}-1\right) \nonumber \\&\quad = \sum _{i:f(\frac{P_{2}(v_{i})}{P_{1}(v_{i})})>0} P_{1}(v_{i}) f'(1) \left( \frac{P_{2}(v_{i})-P_{1}(v_{i})}{P_{1}(v_{i})}\right) \nonumber \\&\qquad - \sum _{i:f(\frac{P_{2}(v_{i})}{P_{1}(v_{i})})<0} P_{1}(v_{i}) f'(1) \left( \frac{P_{2}(v_{i})-P_{1}(v_{i})}{P_{1}(v_{i})}\right) \nonumber \\&\quad = - \sum _{i:P_{2}(v_{i})<P_{1}(v_{i})} P_{2}(v_{i})-P_{1}(v_{i}) + \sum _{i:P_{2}(v_{i})>P_{1}(v_{i})} P_{2}(v_{i})-P_{1}(v_{i})\nonumber \\&\quad = \sum _{i} |P_{2}(v_{i})-P_{1}(v_{i}) | \end{aligned}$$

(12)

Equation (12) holds for $f = -\ln $ because then $f\left( \frac{P_{2}(v_{i})}{P_{1}(v_{i})}\right) >0$ if and only if $P_{2}(v_{i}) < P_{1}(v_{i})$ and also $f'(1) = -1$. $\square $

Rights and permissions

Reprints and permissions

About this article

Cite this article

Riahi, F., Schulte, O. Model-based exception mining for object-relational data. Data Min Knowl Disc 34, 681–722 (2020). https://doi.org/10.1007/s10618-020-00677-w

Download citation

Received: 01 March 2016
Accepted: 31 January 2020
Published: 19 February 2020
Issue Date: May 2020
DOI: https://doi.org/10.1007/s10618-020-00677-w

Abstract

Access this article

Subscribe and save

Buy Now

Similar content being viewed by others

Outlying property detection with numerical attributes

Outliers Detection in Multi-label Datasets

A decomposition of the outlier detection problem into a set of supervised learning problems

Notes

References

Acknowledgements

Author information

Authors and Affiliations

Corresponding author

Additional information

Publisher's Note

Appendix: Proofs

Proof

Proof

Rights and permissions

About this article

Cite this article

Keywords

Subscribe and save

Buy Now

Navigation

Model-based exception mining for object-relational data

Abstract

Access this article

Subscribe and save

Buy Now

Similar content being viewed by others

Outlying property detection with numerical attributes

Outliers Detection in Multi-label Datasets

A decomposition of the outlier detection problem into a set of supervised learning problems

Explore related subjects

Notes

References

Acknowledgements

Author information

Authors and Affiliations

Corresponding author

Additional information

Publisher's Note

Appendix: Proofs

Appendix: Proofs

Proof

Proof

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Subscribe and save

Buy Now

Search

Navigation