Abstract
This paper develops model-based exception mining and outlier detection for the case of object-relational data. Object-relational data represent a complex heterogeneous network, which comprises objects of different types, links among these objects, also of different types, and attributes of these links. We follow the well-established exceptional model mining (EMM) framework, which has been previously applied for subgroup discovery in propositional data; our novel contribution is to develop EMM for relational data. EMM leverages machine learning models for exception mining: An object is exceptional to the extent that a model learned for the object data differs from a model learned for the general population. In relational data, EMM can therefore be used for detecting single outlier or exceptional objects. We combine EMM with state-of-the-art statistical-relational model discovery methods for constructing a graphical model (Bayesian network), that compactly represents probabilistic associations in the data. We investigate several outlierness metrics, based on the learned object-relational model, that quantify the extent to which the association pattern of a potential outlier object deviates from that of the whole population. Our method is validated on synthetic data sets and on real-world data sets about soccer and hockey matches, IMDb movies and mutagenic compounds. Compared to baseline methods, the EMM approach achieved the best detection accuracy when combined with a novel outlinerness metric. An empirical evaluation on soccer and movie data shows a strong correlation between our novel outlierness metric and success metrics: Individuals that our metric marks out as unusual tend to have unusual success.
Similar content being viewed by others
Explore related subjects
Discover the latest articles, news and stories from top researchers in related subjects.Notes
The conditions under which the score of Eq. (2) sums to 1 over all relational data sets are discussed by Schulte (2011). Briefly, if the first-order Bayesian Network is viewed as a template for an unrolled ground network, then (1) the ground network cannot contain cycles, and (2) each ground network cannot have multiple parent instantiations; see also (Heckerman et al. 2007).
We are indebted to Daniel Lowd for this point.
References
Achtert E, Kriegel H-P, Schubert E, Zimek A (2013) Interactive data mining with 3D-parallel coordinate trees. In: Proceedings ACM special interest group on management of data, New York, NY, USA, pp 1009–1012. https://doi.org/10.1145/2463676.2463696
Aggarwal CC (2013) Outlier analysis. Springer, New York. ISBN 9781461463955. http://books.google.ca/books?id=900CkgEACAAJ
Akoglu L, McGlohon M, Faloutsos C (2010) Oddball: spotting anomalies in weighted graphs. In: Proceedings Pacific-Asia conference on knowledge discovery and data mining, pp 410–421. https://doi.org/10.1007/978-3-642-13672-6_40
Akoglu L, Tong H, Koutra D (2015) Graph based anomaly detection and description: a survey. Data Min Knowl Discov 29(3):626–688
Albert J, Glickman ME, Swartz TB, Koning RH (2017) Handbook of statistical methods and analyses in sports. CRC Press, Boca Raton
Anderson G, Pfahringer B (2008) Exploiting propositionalization based on random relational rules for semi-supervised learning. In: Proceedings Pacific-Asia conference on knowledge discovery and data mining, pp 494–502. https://doi.org/10.1007/978-3-540-68125-0_43
Angiulli F, Greco G, Palopoli L (2004) Outlier detection by logic programming. ACM Trans Comput Logic 9(1(7)):7
Beirlant J, Györfi L, Lugosi G (1994) On the asymptotic normality of the L1-and L2-errors in histogram density estimation. Can J Stat 22(3):309–318
Beirlant J, Devroye L, Györfi L, Vajda I (2001) Large deviations of divergence measures on partitions. J Stat Plan Inference 93(1–2):1–16
Breunig M, Kriegel H-P, Ng RT, Sander J (2000) LOF: identifying density-based local outliers. In: Proceedings ACM special interest group on management of data, pp 93–104. https://doi.org/10.1145/342009.335388
Cansado A, Soto A (2008) Unsupervised anomaly detection in large databases using Bayesian networks. Appl Artif Intell 22:309–330. https://doi.org/10.1080/08839510801972801 ISSN 0883-9514
de Campos L (2006) A scoring function for learning Bayesian networks based on mutual information and conditional independence tests. J Mach Learn Res 7:2149–2187
Domingos P, Lowd D (2009) Markov logic: an interface layer for artificial intelligence. Morgan and Claypool Publishers, San Francisco
Domingos P, Richardson M (2007) Markov logic: a unifying framework for statistical relational learning. In: Getoor L, Taskar B (eds) Introduction to statistical relational learning. MIT Press, Cambridge
Duivesteijn W, Feelders AJ, Knobbe A (2016) Exceptional model mining. Data Min Knowl Discov 30(1):47–98
Fawcett T (2006) An introduction to ROC analysis. Pattern Recognit Lett 27(8):861–874. https://doi.org/10.1016/j.patrec.2005.10.010
Fisher RA (1921) On the probable error of a coefficient of correlation deduced from a small sample. Metron 1:3–32
Gao J, Liang F, Fan W, Wang C, Sun Y, Han J (2010) On community outliers and their efficient detection in information networks. In: Proceedings ACM special interest group on knowledge discovery and data mining, New York, NY, USA, pp 813–822. ACM. ISBN 978-1-4503-0055-1. https://doi.org/10.1145/1835804.1835907
Garcia-del Barrio P, Pujol F (2004) Pay and performance in the Spanish soccer league: who gets the expected monopsony rents? Faculty Working Papers 05/04, School of Economics and Business Administration, University of Navarra, March 2004. https://ideas.repec.org/p/una/unccee/wp0504.html
Getoor L (2001) Learning statistical models from relational data. PhD thesis, Department of Computer Science, Stanford University
Getoor L, Taskar B (2007) Introduction to statistical relational learning. MIT Press, Cambridge
Hall S, Szymanski S, Zimbalist AS (2002) Testing causality between team performance and payroll: the cases of Major League Baseball and English Soccer. J Sports Econ 3(2):149–168
Halpern JY (1990) An analysis of first-order logics of probability. Artif Intell 46(3):311–350. https://doi.org/10.1016/0004-3702(90)90019-V
Heckerman D, Meek C, Koller D (2007) Probabilistic entity-relationship models, PRMs, and plate models. In: Getoor L, Taskar B (eds) Introduction to statistical relational learning. MIT Press, Cambridge
Horváth T, Alexin Z, Gyimóthy T, Wrobel S (1999) Application of different learning methods to Hungarian part-of-speech tagging. In: Dzeroski S, Flach P (eds) Inductive logic programming: 9th international workshop. ILP-99 Bled. Springer, Berlin, pp 128–139
Horváth T, Wrobel S, Bohnebeck U (2001) Relational instance-based learning with lists and terms. Mach Learn 43(1):53–80 ISSN 1573-0565
Khosravi H, Man T, Hu J, Gao E, Mar R, Schulte O (2019) Factorbase code. https://github.com/sfu-ml-lab/FactorBase. Accessed 15 Nov 2016
Khot T, Natarajan S, Shavlik JW (2014) Relational one-class classification: a non-parametric approach. In: Proceedings association for the advancement of artificial intelligence, Quebec City, Quebec, Canada, pp 2453–2459. http://www.aaai.org/ocs/index.php/AAAI/AAAI14/paper/view/8578. Accessed 10 Dec 2017
Kimmig A, Mihalkova L, Getoor L (2014) Lifted graphical models: a survey. Mach Learn 99(1):1–45
Kirsten M, Wrobel S, Horváth T (2001) Distance-based approaches to relational learning and clustering. In: Dzeroski S, Lavrac N (eds) Relational data mining. Springer, Berlin, pp 213–232
Knobbe AJ (2006) Multi-relational data mining, vol 145. IOS Press, Amsterdam
Koh JLY, Lee ML, Hsu W, Ang WT (2008) Correlation-based attribute outlier detection in XML. In: Proceedings international council for open and distance education, Cancun, Mexico. IEEE, pp 1522–1524. http://ieeexplore.ieee.org/xpl/mostRecentIssue.jsp?punumber=4492792
Koller D, Pfeffer A (1997) Object-oriented Bayesian networks. In: Geiger D, Shenoy PP (eds) Proceedings uncertainty in artificial intelligence. Morgan Kaufmann, Burlington, pp 302–313. arXiv:1302.1554
Kramer S, Lavrac N, Flach P (2000) Propositionalization approaches to relational data mining. In: Dzeroski S (ed) Relational data mining. Springer, Berlin, pp 262–286
Kuzelka O, Zeleznỳ F (2008) Hifi: tractable propositionalization through hierarchical feature construction. In: Late breaking papers, inductive logic programming, p 69
Liu G, Schulte O (2018) Deep reinforcement learning in ice hockey for context-aware player evaluation. In: Proceedings international joint conference on artificial intelligence. International Joint Conferences on Artificial Intelligence Organization, pp 3442–3448. https://doi.org/10.24963/ijcai.2018/478
Maervoet J, Vens C, Berghe GV, Blockeel H, Causmaecker PD (2012) Outlier detection in relational data: a case study in geographical information systems. Expert Syst Appl 39(5):4718–4728. https://doi.org/10.1016/j.eswa.2011.09.125 ISSN 0957-4174
Müller E, Assent I, Iglesias P, Mülle Y, Böhm K (2012) Outlier ranking via subspace analysis in multiple views of the data. In: Proceedings international conference on data mining (ICDM), pp 529–538
Nickel M, Murphy K, Tresp V, Gabrilovich E (2016) A review of relational machine learning for knowledge graphs. Proc IEEE 104(1):11–33. https://doi.org/10.1109/JPROC.2015.2483592
Nielsen F, Nock R (2014) On the chi square and higher-order Chi distances for approximating f-divergences. IEEE Signal Process Lett 21(1):10–13
Novak PK, Lavrač N, Webb GI (2009) Supervised descriptive rule discovery: a unifying survey of contrast set, emerging pattern and subgroup mining. J Mach Learn Res 10:377–403
Pearl J (1988) Probabilistic reasoning in intelligent systems. Morgan Kaufmann, Burlington
Peralta V (2007) Extraction and integration of MovieLens and IMDb. Technical report. Alternative Project Delivery Methods
Perovsek M, Vavpetic A, Cestnik B, Lavrac N (2013) A wordification approach to relational data mining. In: Proceedings DS, lecture notes in computer science, pp 141–154. Springer, Singapore. https://doi.org/10.1007/978-3-642-40897-7_10
Poole D (2003) First-order probabilistic inference. In: Proceedings international joint conference on artificial intelligence, pp 985–991
Ramaswamy S, Rastogi R, Shim K (2000) Efficient algorithms for mining outliers from large data sets. In: Proceedings ACM special interest group on management of data, pp 427–438. https://doi.org/10.1145/342009.335437
Riahi F, Schulte O (2015a) Model-based outlier detection for object-relational data. In: Proceedings symposium series on computational intelligence. IEEE, pp 1590–1598. https://doi.org/10.1109/SSCI.2015.224
Riahi F, Schulte O (2015b) Codes and datasets. ftp://ftp.fas.sfu.ca/pub/cs/oschulte/CodesAndDatasets/. Accessed 15 Nov 2016
Riahi F, Schulte O (2016) Propositionalization for unsupervised outlier detection in multi-relational data. In: Proceedings international conference of the Florida artificial intelligence, Key Largo, Florida, USA, pp 448–453. http://www.aaai.org/ocs/index.php/FLAIRS/FLAIRS16/paper/view/12786. Accessed 2 Jan 2017
Riedel S, Yao L, McCallum A, Marlin BM (2013) Relation extraction with matrix factorization and universal schemas. In: Proceedings annual conference of the North American Chapter of the Association for Computational Linguistics, Westin Peachtree Plaza Hotel, Atlanta, Georgia, USA, pp 74–84. http://aclweb.org/anthology/N/N13/N13-1008.pdf
Routley K, Schulte O (2015) A Markov game model for valuing player actions in ice hockey. In: Proceedings uncertainty in artificial intelligence, pp 782–791
Sarawagi S, Agrawal R, Megiddo N (1998) Discovery-driven exploration of OLAP data cubes. In: Proceedings extending database technology, Valencia, Spain, pp 168–182. Springer, Berlin. https://doi.org/10.1007/BFb0100984
Schulte O (2011) A tractable pseudo-likelihood function for Bayesian networks applied to relational data. In: Proceedings society for industrial and applied mathematics, pp 462–473. https://doi.org/10.1137/1.9781611972818.40
Schulte O, Gholami S (2017) Locally consistent Bayesian network scores for multi-relational data. In: Proceedings international joint conference on artificial intelligence, Melbourne, Australia, pp 2693–2700. https://doi.org/10.24963/ijcai.2017/375
Schulte O, Khosravi H (2012) Learning graphical models for relational data via lattice search. Mach Learn 88(3):331–368
Schulte O, Routley K (2014) Aggregating predictions versus aggregating features for relational classification. In: Proceedings center for information-development management, Orlando, FL, USA, pp 121–128. IEEE. https://doi.org/10.1109/CIDM.2014.7008657
Schulte O, Khosravi H, Kirkpatrick A, Gao T, Zhu Y (2014) Modelling relational statistics with Bayesian networks. Mach Learn 94:105–125. https://doi.org/10.1007/s10994-013-5362-7
Sing T, Sander O, Beerenwinkel N, Lengauer T (2012) ROCR: visualizing the performance of scoring classifiers. http://cran.r-project.org/package=ROCR. Accessed 15 Nov 2016
Sun Y, Han J, Zhao P (2009) Rankclus: integrating clustering with ranking for heterogeneous information network analysis. In: Proceedings extending database technology, New York, NY, USA, pp 565–576. ACM
Tang G, Bailey J, Pei J, Dong G (2013) Mining multidimensional contextual outliers from categorical relational data. In: Proceedings scientific and statistical database management conference, pp 1171–1192. https://doi.org/10.3233/IDA-150764
Tuffery S (2011) Data mining and statistics for decision making. Wiley series in computational statistics. http://ca.wiley.com/WileyCDA/WileyTitle/productCd-0470688297.html. Accessed 15 Nov 2016
Wang DZ, Michelakis E, Garofalakis M, Hellerstein JM (2008) BayesStore: managing large, uncertain data repositories with probabilistic graphical models. In: Proceedings very large data bases. VLDB Endowment, pp 340–351. https://doi.org/10.14778/1453856.1453896. http://www.vldb.org/pvldb/1/1453896.pdf. Accessed 15 Nov 2016
Xiang R, Neville J (2011) Relational learning with one network: an asymptotic analysis. In: Proceedings artificial intelligence and statistics, pp 779–788. http://proceedings.mlr.press/v15/xiang11a/xiang11a.pdf. Accessed 15 Nov 2016
Acknowledgements
This work was supported by a Discovery Grant from the Natural Sciences and Engineering Research Council of Canada (Grant No. 217331). We are indebted for helpful discussions to the participants at the 2018 Workshop on Statistical Relational AI. The action editor and reviewers for the Data Mining and Knowledge Discovery Journal provided extensive helpful comments. We thank Peter Flach for clarifying the connection with exceptional model mining.
Author information
Authors and Affiliations
Corresponding author
Additional information
Responsible editor: Kristian Kersting, Johannes Fürnkranz.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
This work was supported by a Discovery Grant from the Natural Sciences and Engineering Research Council of Canada.
Appendix: Proofs
Appendix: Proofs
We prove the results about f-divergences from Sect. 6. We first review the known result that twice differentiable f-divergences are approximately scaled \(\chi ^{2}\)-divergences. Second we prove our new result that our new \({ ELD}\) metric is approximately total variation distance (plus a quadratic term with small impact).
Assume that the generatorfis twice differentiable forf-divergence\(I_{f}\). Then\(I_{f}(P_{1}||P_{2}) \approx \frac{f''(1)}{2} \chi ^{2}(P_{1}||P_{2}) = \frac{f''(1)}{2} \sum _{i=1}^{m} \frac{(P_{2}(v_{i})-P_{1}(v_{i}))^{2}}{P_{1}(v_{i})}\).
Proof
Truncating the Taylor series for f at \(i=2\), the f-divergence expression (9) becomes
\(\square \)
The first-order Taylor series approximation for \({ ELD}\) is total variation distance:
Proof
Truncating the Taylor series for \(f=-\ln (u)\) at \(i=1\), the |f|-divergence expression (11) becomes
Equation (12) holds for \(f = -\ln \) because then \(f\left( \frac{P_{2}(v_{i})}{P_{1}(v_{i})}\right) >0\) if and only if \(P_{2}(v_{i}) < P_{1}(v_{i})\) and also \(f'(1) = -1\). \(\square \)
Rights and permissions
About this article
Cite this article
Riahi, F., Schulte, O. Model-based exception mining for object-relational data. Data Min Knowl Disc 34, 681–722 (2020). https://doi.org/10.1007/s10618-020-00677-w
Received:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s10618-020-00677-w