[go: up one dir, main page]
More Web Proxy on the site http://driver.im/ Skip to main content
Log in

Model-based exception mining for object-relational data

  • Published:
Data Mining and Knowledge Discovery Aims and scope Submit manuscript

Abstract

This paper develops model-based exception mining and outlier detection for the case of object-relational data. Object-relational data represent a complex heterogeneous network, which comprises objects of different types, links among these objects, also of different types, and attributes of these links. We follow the well-established exceptional model mining (EMM) framework, which has been previously applied for subgroup discovery in propositional data; our novel contribution is to develop EMM for relational data. EMM leverages machine learning models for exception mining: An object is exceptional to the extent that a model learned for the object data differs from a model learned for the general population. In relational data, EMM can therefore be used for detecting single outlier or exceptional objects. We combine EMM with state-of-the-art statistical-relational model discovery methods for constructing a graphical model (Bayesian network), that compactly represents probabilistic associations in the data. We investigate several outlierness metrics, based on the learned object-relational model, that quantify the extent to which the association pattern of a potential outlier object deviates from that of the whole population. Our method is validated on synthetic data sets and on real-world data sets about soccer and hockey matches, IMDb movies and mutagenic compounds. Compared to baseline methods, the EMM approach achieved the best detection accuracy when combined with a novel outlinerness metric. An empirical evaluation on soccer and movie data shows a strong correlation between our novel outlierness metric and success metrics: Individuals that our metric marks out as unusual tend to have unusual success.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Subscribe and save

Springer+ Basic
£29.99 /Month
  • Get 10 units per month
  • Download Article/Chapter or eBook
  • 1 Unit = 1 Article or 1 Chapter
  • Cancel anytime
Subscribe now

Buy Now

Price includes VAT (United Kingdom)

Instant access to the full article PDF.

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6
Fig. 7
Fig. 8
Fig. 9
Fig. 10
Fig. 11
Fig. 12

Similar content being viewed by others

Explore related subjects

Discover the latest articles, news and stories from top researchers in related subjects.

Notes

  1. The conditions under which the score of Eq. (2) sums to 1 over all relational data sets are discussed by Schulte (2011). Briefly, if the first-order Bayesian Network is viewed as a template for an unrolled ground network, then (1) the ground network cannot contain cycles, and (2) each ground network cannot have multiple parent instantiations; see also (Heckerman et al. 2007).

  2. We are indebted to Daniel Lowd for this point.

References

  • Achtert E, Kriegel H-P, Schubert E, Zimek A (2013) Interactive data mining with 3D-parallel coordinate trees. In: Proceedings ACM special interest group on management of data, New York, NY, USA, pp 1009–1012. https://doi.org/10.1145/2463676.2463696

  • Aggarwal CC (2013) Outlier analysis. Springer, New York. ISBN 9781461463955. http://books.google.ca/books?id=900CkgEACAAJ

  • Akoglu L, McGlohon M, Faloutsos C (2010) Oddball: spotting anomalies in weighted graphs. In: Proceedings Pacific-Asia conference on knowledge discovery and data mining, pp 410–421. https://doi.org/10.1007/978-3-642-13672-6_40

  • Akoglu L, Tong H, Koutra D (2015) Graph based anomaly detection and description: a survey. Data Min Knowl Discov 29(3):626–688

    MathSciNet  Google Scholar 

  • Albert J, Glickman ME, Swartz TB, Koning RH (2017) Handbook of statistical methods and analyses in sports. CRC Press, Boca Raton

    Google Scholar 

  • Anderson G, Pfahringer B (2008) Exploiting propositionalization based on random relational rules for semi-supervised learning. In: Proceedings Pacific-Asia conference on knowledge discovery and data mining, pp 494–502. https://doi.org/10.1007/978-3-540-68125-0_43

  • Angiulli F, Greco G, Palopoli L (2004) Outlier detection by logic programming. ACM Trans Comput Logic 9(1(7)):7

    MathSciNet  MATH  Google Scholar 

  • Beirlant J, Györfi L, Lugosi G (1994) On the asymptotic normality of the L1-and L2-errors in histogram density estimation. Can J Stat 22(3):309–318

    MATH  Google Scholar 

  • Beirlant J, Devroye L, Györfi L, Vajda I (2001) Large deviations of divergence measures on partitions. J Stat Plan Inference 93(1–2):1–16

    MathSciNet  MATH  Google Scholar 

  • Breunig M, Kriegel H-P, Ng RT, Sander J (2000) LOF: identifying density-based local outliers. In: Proceedings ACM special interest group on management of data, pp 93–104. https://doi.org/10.1145/342009.335388

  • Cansado A, Soto A (2008) Unsupervised anomaly detection in large databases using Bayesian networks. Appl Artif Intell 22:309–330. https://doi.org/10.1080/08839510801972801 ISSN 0883-9514

    Article  Google Scholar 

  • de Campos L (2006) A scoring function for learning Bayesian networks based on mutual information and conditional independence tests. J Mach Learn Res 7:2149–2187

    MathSciNet  MATH  Google Scholar 

  • Domingos P, Lowd D (2009) Markov logic: an interface layer for artificial intelligence. Morgan and Claypool Publishers, San Francisco

    MATH  Google Scholar 

  • Domingos P, Richardson M (2007) Markov logic: a unifying framework for statistical relational learning. In: Getoor L, Taskar B (eds) Introduction to statistical relational learning. MIT Press, Cambridge

    Google Scholar 

  • Duivesteijn W, Feelders AJ, Knobbe A (2016) Exceptional model mining. Data Min Knowl Discov 30(1):47–98

    MathSciNet  MATH  Google Scholar 

  • Fawcett T (2006) An introduction to ROC analysis. Pattern Recognit Lett 27(8):861–874. https://doi.org/10.1016/j.patrec.2005.10.010

    Article  MathSciNet  Google Scholar 

  • Fisher RA (1921) On the probable error of a coefficient of correlation deduced from a small sample. Metron 1:3–32

    Google Scholar 

  • Gao J, Liang F, Fan W, Wang C, Sun Y, Han J (2010) On community outliers and their efficient detection in information networks. In: Proceedings ACM special interest group on knowledge discovery and data mining, New York, NY, USA, pp 813–822. ACM. ISBN 978-1-4503-0055-1. https://doi.org/10.1145/1835804.1835907

  • Garcia-del Barrio P, Pujol F (2004) Pay and performance in the Spanish soccer league: who gets the expected monopsony rents? Faculty Working Papers 05/04, School of Economics and Business Administration, University of Navarra, March 2004. https://ideas.repec.org/p/una/unccee/wp0504.html

  • Getoor L (2001) Learning statistical models from relational data. PhD thesis, Department of Computer Science, Stanford University

  • Getoor L, Taskar B (2007) Introduction to statistical relational learning. MIT Press, Cambridge

    MATH  Google Scholar 

  • Hall S, Szymanski S, Zimbalist AS (2002) Testing causality between team performance and payroll: the cases of Major League Baseball and English Soccer. J Sports Econ 3(2):149–168

    Google Scholar 

  • Halpern JY (1990) An analysis of first-order logics of probability. Artif Intell 46(3):311–350. https://doi.org/10.1016/0004-3702(90)90019-V

    Article  MathSciNet  MATH  Google Scholar 

  • Heckerman D, Meek C, Koller D (2007) Probabilistic entity-relationship models, PRMs, and plate models. In: Getoor L, Taskar B (eds) Introduction to statistical relational learning. MIT Press, Cambridge

    Google Scholar 

  • Horváth T, Alexin Z, Gyimóthy T, Wrobel S (1999) Application of different learning methods to Hungarian part-of-speech tagging. In: Dzeroski S, Flach P (eds) Inductive logic programming: 9th international workshop. ILP-99 Bled. Springer, Berlin, pp 128–139

  • Horváth T, Wrobel S, Bohnebeck U (2001) Relational instance-based learning with lists and terms. Mach Learn 43(1):53–80 ISSN 1573-0565

    MATH  Google Scholar 

  • Khosravi H, Man T, Hu J, Gao E, Mar R, Schulte O (2019) Factorbase code. https://github.com/sfu-ml-lab/FactorBase. Accessed 15 Nov 2016

  • Khot T, Natarajan S, Shavlik JW (2014) Relational one-class classification: a non-parametric approach. In: Proceedings association for the advancement of artificial intelligence, Quebec City, Quebec, Canada, pp 2453–2459. http://www.aaai.org/ocs/index.php/AAAI/AAAI14/paper/view/8578. Accessed 10 Dec 2017

  • Kimmig A, Mihalkova L, Getoor L (2014) Lifted graphical models: a survey. Mach Learn 99(1):1–45

    MathSciNet  MATH  Google Scholar 

  • Kirsten M, Wrobel S, Horváth T (2001) Distance-based approaches to relational learning and clustering. In: Dzeroski S, Lavrac N (eds) Relational data mining. Springer, Berlin, pp 213–232

    Google Scholar 

  • Knobbe AJ (2006) Multi-relational data mining, vol 145. IOS Press, Amsterdam

    MATH  Google Scholar 

  • Koh JLY, Lee ML, Hsu W, Ang WT (2008) Correlation-based attribute outlier detection in XML. In: Proceedings international council for open and distance education, Cancun, Mexico. IEEE, pp 1522–1524. http://ieeexplore.ieee.org/xpl/mostRecentIssue.jsp?punumber=4492792

  • Koller D, Pfeffer A (1997) Object-oriented Bayesian networks. In: Geiger D, Shenoy PP (eds) Proceedings uncertainty in artificial intelligence. Morgan Kaufmann, Burlington, pp 302–313. arXiv:1302.1554

  • Kramer S, Lavrac N, Flach P (2000) Propositionalization approaches to relational data mining. In: Dzeroski S (ed) Relational data mining. Springer, Berlin, pp 262–286

  • Kuzelka O, Zeleznỳ F (2008) Hifi: tractable propositionalization through hierarchical feature construction. In: Late breaking papers, inductive logic programming, p 69

  • Liu G, Schulte O (2018) Deep reinforcement learning in ice hockey for context-aware player evaluation. In: Proceedings international joint conference on artificial intelligence. International Joint Conferences on Artificial Intelligence Organization, pp 3442–3448. https://doi.org/10.24963/ijcai.2018/478

  • Maervoet J, Vens C, Berghe GV, Blockeel H, Causmaecker PD (2012) Outlier detection in relational data: a case study in geographical information systems. Expert Syst Appl 39(5):4718–4728. https://doi.org/10.1016/j.eswa.2011.09.125 ISSN 0957-4174

    Article  Google Scholar 

  • Müller E, Assent I, Iglesias P, Mülle Y, Böhm K (2012) Outlier ranking via subspace analysis in multiple views of the data. In: Proceedings international conference on data mining (ICDM), pp 529–538

  • Nickel M, Murphy K, Tresp V, Gabrilovich E (2016) A review of relational machine learning for knowledge graphs. Proc IEEE 104(1):11–33. https://doi.org/10.1109/JPROC.2015.2483592

    Article  Google Scholar 

  • Nielsen F, Nock R (2014) On the chi square and higher-order Chi distances for approximating f-divergences. IEEE Signal Process Lett 21(1):10–13

    Google Scholar 

  • Novak PK, Lavrač N, Webb GI (2009) Supervised descriptive rule discovery: a unifying survey of contrast set, emerging pattern and subgroup mining. J Mach Learn Res 10:377–403

    MATH  Google Scholar 

  • Pearl J (1988) Probabilistic reasoning in intelligent systems. Morgan Kaufmann, Burlington

    MATH  Google Scholar 

  • Peralta V (2007) Extraction and integration of MovieLens and IMDb. Technical report. Alternative Project Delivery Methods

  • Perovsek M, Vavpetic A, Cestnik B, Lavrac N (2013) A wordification approach to relational data mining. In: Proceedings DS, lecture notes in computer science, pp 141–154. Springer, Singapore. https://doi.org/10.1007/978-3-642-40897-7_10

  • Poole D (2003) First-order probabilistic inference. In: Proceedings international joint conference on artificial intelligence, pp 985–991

  • Ramaswamy S, Rastogi R, Shim K (2000) Efficient algorithms for mining outliers from large data sets. In: Proceedings ACM special interest group on management of data, pp 427–438. https://doi.org/10.1145/342009.335437

  • Riahi F, Schulte O (2015a) Model-based outlier detection for object-relational data. In: Proceedings symposium series on computational intelligence. IEEE, pp 1590–1598. https://doi.org/10.1109/SSCI.2015.224

  • Riahi F, Schulte O (2015b) Codes and datasets. ftp://ftp.fas.sfu.ca/pub/cs/oschulte/CodesAndDatasets/. Accessed 15 Nov 2016

  • Riahi F, Schulte O (2016) Propositionalization for unsupervised outlier detection in multi-relational data. In: Proceedings international conference of the Florida artificial intelligence, Key Largo, Florida, USA, pp 448–453. http://www.aaai.org/ocs/index.php/FLAIRS/FLAIRS16/paper/view/12786. Accessed 2 Jan 2017

  • Riedel S, Yao L, McCallum A, Marlin BM (2013) Relation extraction with matrix factorization and universal schemas. In: Proceedings annual conference of the North American Chapter of the Association for Computational Linguistics, Westin Peachtree Plaza Hotel, Atlanta, Georgia, USA, pp 74–84. http://aclweb.org/anthology/N/N13/N13-1008.pdf

  • Routley K, Schulte O (2015) A Markov game model for valuing player actions in ice hockey. In: Proceedings uncertainty in artificial intelligence, pp 782–791

  • Sarawagi S, Agrawal R, Megiddo N (1998) Discovery-driven exploration of OLAP data cubes. In: Proceedings extending database technology, Valencia, Spain, pp 168–182. Springer, Berlin. https://doi.org/10.1007/BFb0100984

  • Schulte O (2011) A tractable pseudo-likelihood function for Bayesian networks applied to relational data. In: Proceedings society for industrial and applied mathematics, pp 462–473. https://doi.org/10.1137/1.9781611972818.40

  • Schulte O, Gholami S (2017) Locally consistent Bayesian network scores for multi-relational data. In: Proceedings international joint conference on artificial intelligence, Melbourne, Australia, pp 2693–2700. https://doi.org/10.24963/ijcai.2017/375

  • Schulte O, Khosravi H (2012) Learning graphical models for relational data via lattice search. Mach Learn 88(3):331–368

    MathSciNet  Google Scholar 

  • Schulte O, Routley K (2014) Aggregating predictions versus aggregating features for relational classification. In: Proceedings center for information-development management, Orlando, FL, USA, pp 121–128. IEEE. https://doi.org/10.1109/CIDM.2014.7008657

  • Schulte O, Khosravi H, Kirkpatrick A, Gao T, Zhu Y (2014) Modelling relational statistics with Bayesian networks. Mach Learn 94:105–125. https://doi.org/10.1007/s10994-013-5362-7

    Article  MathSciNet  MATH  Google Scholar 

  • Sing T, Sander O, Beerenwinkel N, Lengauer T (2012) ROCR: visualizing the performance of scoring classifiers. http://cran.r-project.org/package=ROCR. Accessed 15 Nov 2016

  • Sun Y, Han J, Zhao P (2009) Rankclus: integrating clustering with ranking for heterogeneous information network analysis. In: Proceedings extending database technology, New York, NY, USA, pp 565–576. ACM

  • Tang G, Bailey J, Pei J, Dong G (2013) Mining multidimensional contextual outliers from categorical relational data. In: Proceedings scientific and statistical database management conference, pp 1171–1192. https://doi.org/10.3233/IDA-150764

  • Tuffery S (2011) Data mining and statistics for decision making. Wiley series in computational statistics. http://ca.wiley.com/WileyCDA/WileyTitle/productCd-0470688297.html. Accessed 15 Nov 2016

  • Wang DZ, Michelakis E, Garofalakis M, Hellerstein JM (2008) BayesStore: managing large, uncertain data repositories with probabilistic graphical models. In: Proceedings very large data bases. VLDB Endowment, pp 340–351. https://doi.org/10.14778/1453856.1453896. http://www.vldb.org/pvldb/1/1453896.pdf. Accessed 15 Nov 2016

  • Xiang R, Neville J (2011) Relational learning with one network: an asymptotic analysis. In: Proceedings artificial intelligence and statistics, pp 779–788. http://proceedings.mlr.press/v15/xiang11a/xiang11a.pdf. Accessed 15 Nov 2016

Download references

Acknowledgements

This work was supported by a Discovery Grant from the Natural Sciences and Engineering Research Council of Canada (Grant No. 217331). We are indebted for helpful discussions to the participants at the 2018 Workshop on Statistical Relational AI. The action editor and reviewers for the Data Mining and Knowledge Discovery Journal provided extensive helpful comments. We thank Peter Flach for clarifying the connection with exceptional model mining.

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Fatemeh Riahi.

Additional information

Responsible editor: Kristian Kersting, Johannes Fürnkranz.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This work was supported by a Discovery Grant from the Natural Sciences and Engineering Research Council of Canada.

Appendix: Proofs

Appendix: Proofs

We prove the results about f-divergences from Sect. 6. We first review the known result that twice differentiable f-divergences are approximately scaled \(\chi ^{2}\)-divergences. Second we prove our new result that our new \({ ELD}\) metric is approximately total variation distance (plus a quadratic term with small impact).

Assume that the generatorfis twice differentiable forf-divergence\(I_{f}\). Then\(I_{f}(P_{1}||P_{2}) \approx \frac{f''(1)}{2} \chi ^{2}(P_{1}||P_{2}) = \frac{f''(1)}{2} \sum _{i=1}^{m} \frac{(P_{2}(v_{i})-P_{1}(v_{i}))^{2}}{P_{1}(v_{i})}\).

Proof

Truncating the Taylor series for f at \(i=2\), the f-divergence expression (9) becomes

$$\begin{aligned}&\sum _{i=1}^{m} P_{1}(v_{i}) \left[ f'(1) \left( \frac{P_{2}(v_{i})}{P_{1}(v_{i})}-1\right) + \frac{f''(1)}{2} \left( \frac{P_{2}(v_{i})}{P_{1}(v_{i})}-1\right) ^{2}\right] \\&\quad = \sum _{i=1}^{m} P_{1}(v_{i}) \left[ f'(1) \left( \frac{P_{2}(v_{i})-P_{1}(v_{i})}{P_{1}(v_{i})}\right) + \frac{f''(1)}{2} \left( \frac{P_{2}(v_{i})-P_{1}(v_{i})}{P_{1}(v_{i})}\right) ^{2}\right] \\&\quad = f'(1) \left[ \sum _{i=1}^{m} P_{2}(v_{i}) - \sum _{i=1}^{m} P_{1}(v_{i})\right] + \sum _{i=1}^{m} \frac{f''(1)}{2} \frac{(P_{2}(v_{i})-P_{1}(v_{i}))^{2}}{P_{1}(v_{i})} \\&\quad = 0 + \frac{f''(1)}{2} \chi ^{2}(P_{1}||P_{2}) \end{aligned}$$

\(\square \)

The first-order Taylor series approximation for \({ ELD}\) is total variation distance:

$$\begin{aligned} { ELD}(P_{1}||P_{2}) \approx TVD. \end{aligned}$$

Proof

Truncating the Taylor series for \(f=-\ln (u)\) at \(i=1\), the |f|-divergence expression (11) becomes

$$\begin{aligned}&\sum _{i:f(\frac{P_{2}(v_{i})}{P_{1}(v_{i})})>0} P_{1}(v_{i}) f'(1) \left( \frac{P_{2}(v_{i})}{P_{1}(v_{i})}-1\right) - \sum _{i:f(\frac{P_{2}(v_{i})}{P_{1}(v_{i})})<0} P_{1}(v_{i}) f'(1) \left( \frac{P_{2}(v_{i})}{P_{1}(v_{i})}-1\right) \nonumber \\&\quad = \sum _{i:f(\frac{P_{2}(v_{i})}{P_{1}(v_{i})})>0} P_{1}(v_{i}) f'(1) \left( \frac{P_{2}(v_{i})-P_{1}(v_{i})}{P_{1}(v_{i})}\right) \nonumber \\&\qquad - \sum _{i:f(\frac{P_{2}(v_{i})}{P_{1}(v_{i})})<0} P_{1}(v_{i}) f'(1) \left( \frac{P_{2}(v_{i})-P_{1}(v_{i})}{P_{1}(v_{i})}\right) \nonumber \\&\quad = - \sum _{i:P_{2}(v_{i})<P_{1}(v_{i})} P_{2}(v_{i})-P_{1}(v_{i}) + \sum _{i:P_{2}(v_{i})>P_{1}(v_{i})} P_{2}(v_{i})-P_{1}(v_{i})\nonumber \\&\quad = \sum _{i} |P_{2}(v_{i})-P_{1}(v_{i}) | \end{aligned}$$
(12)

Equation (12) holds for \(f = -\ln \) because then \(f\left( \frac{P_{2}(v_{i})}{P_{1}(v_{i})}\right) >0\) if and only if \(P_{2}(v_{i}) < P_{1}(v_{i})\) and also \(f'(1) = -1\). \(\square \)

Rights and permissions

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Riahi, F., Schulte, O. Model-based exception mining for object-relational data. Data Min Knowl Disc 34, 681–722 (2020). https://doi.org/10.1007/s10618-020-00677-w

Download citation

  • Received:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s10618-020-00677-w

Keywords

Navigation