Abstract
Outlier detection is one of the most important data analytics tasks and is used in numerous applications and domains. It is the identification of rare items, events or observations which raise suspicions by differing significantly from the majority of the data. The accuracy of the outlier detection depends on sufficient data. However, the underlying data is distributed across different organizations. If outlier detection is done locally, the results obtained are not as accurate as when outlier detection is done collaboratively over the combined data. Unfortunately, competitive advantage, privacy concerns and regulations, and issues surrounding data sovereignty and jurisdiction prevent many organizations from openly sharing their data. In this paper, we address precisely this issue. We present new and efficient protocols for privacy preserving outlier detection to find outliers from arbitrarily partitioned categorical data. Our protocols fall in the two-server model where data owners distribute their private data among two non-colluding servers who detects on the joint data using secure two-party computation (2PC). Our method is based on Local Distance-based Outlier Factor (LDOF) using the relative location of an object to its neighbours to determine the degree to which the object deviates from its neighbourhood. We provide the privacy guarantee by using secure multiparty computation techniques. We implement our system in C++ on real data. Our experiments validate that our protocols are both effective and efficient.
Supported by the National Key Research and Development Program of China under Grant 2016YFB0800601, and the Key Program of NSFC-Tongyong Union Foundation under Grant U1636209.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Similar content being viewed by others
References
Vaidya, J., Clifton, C., Kantarcioglu, M.: Tools for privacy preserving distributed data mining. ACM SIGKDD Explor. Newsl. 4, 28–34 (2002)
Clifton, C., Kantarcioglu, M.: Privacy-preserving distributed mining of association rules on horizontally partitioned data. IEEE Trans. Knowl. Data Eng. 16, 1026–1037 (2004)
Clifton, C., Vaidya, J.: Privacy preserving association rule mining in vertically partitioned data. In: 8th ACMSIGKDD International Conference on Knowledge Discovery and Data Mining (KDD02), pp. 639–644 (2002)
Wright, R., Jagannathan, G.: Privacy-preserving distributed k-means clustering over arbitrarily partitioned data. In: 11th ACM SIGKDD International Conference on Knowledge Discovery in Data Mining (KDD05), pp 593–599 (2005)
Ghosh, J., Merugu, S.: Privacy-preserving distributed clustering using generative models. In: 3rd IEEE International Conference on Data Mining (ICDM03), pp. 211–218 (2003)
Clifton, C., Vaidya, J.: Privacy-preserving k-means clustering over vertically partitioned data. In: 9th ACMSIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 2003), pp 206–215 (2003)
Srikant, R., Agrawal, R.: Privacy-preserving data mining. In: ACMSIGMOD International Conference on Management of Data (SIGMOD 2000), pp. 439–450 (2000)
Pinkas, B., Lindell, Y.: Privacy preserving data mining. In: 20th Annual International Cryptology Conference (CRYPTO 2000), pp. 36–54 (2000)
Clifton, C., Vaidya, J.: Privacy preserving naive Bayes classifier for vertically partitioned data. In: SIAM International Conference on Data Mining (SDM 2004), pp. 522–526 (2004)
Wang, S., Zhao, W., Zhang, N.: A new scheme on privacy-preserving data classification. In: 11th ACM-SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 2005), pp. 374–383 (2005)
Aggarwal, C.C.: Outlier Analysis. Springer-Verlag, New York (2013). https://doi.org/10.1007/978-1-4614-6396-2
Rastogi, R., Shim, K., Ramaswame, S.: Efficient algorithms formining outliers from large data sets. In: ACMSIGMOD International Conference on Management of Data (SIGMOD 2000), pp. 427–438 (2000)
Atallah, M., Du, W.: Privacy-preserving cooperative statistical analysis. In: 17th Annual Computer Security Applications Conference (ACSAC 2001), pp. 102–110 (2001)
Syverson, P., Goldschlag, D., Reed, M.: Onion routing. Commun. ACM 42, 39–41 (1999)
Vaidya, J., Clifton, C.: Privacy-preserving outlier detection. In: 4th IEEE International Conference on Data Mining, pp. 233–240 (2004)
Asif, H., Talukdar, T., Vaidya, J.: Differentially private outlier detection in a collaborative environment. Int. J. Coop. Inf. Syst. 27(03), 1850005 (2018)
Nikolaenko, V., Ioannidis, S., Weinsberg, U., Joye, M., Taft, N., Boneh, D.: Privacy-preserving matrix factorization. In: ACM SIGSAC Conference on Computer Communications Security, pp. 801–812 (2013)
Nikolaenko, V., Weinsberg, U., Ioannidis, S., Joye, M., Boneh, D., Taft, N.: Privacy-preserving ridge regression on hundreds of millions of records. In: IEEE Symposium on Security and Privacy, pp. 334–348 (2013)
Zhang, K., Hutter, M., Jin, H.: A new local distance-based outlier detection approach for scattered real-world data. In: Theeramunkong, T., Kijsirikul, B., Cercone, N., Ho, T.-B. (eds.) PAKDD 2009. LNCS (LNAI), vol. 5476, pp. 813–822. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-642-01307-2_84
Lindell, Y., Pinkas, B.: A proof of security of Yaos protocol for two-party computation. J. Cryptology 22, 161–188 (2009)
Ishai, Y., Kilian, J., Nissim, K., Petrank, E.: Extending oblivious transfers efficiently. In: Boneh, D. (ed.) CRYPTO 2003. LNCS, vol. 2729, pp. 145–161. Springer, Heidelberg (2003). https://doi.org/10.1007/978-3-540-45146-4_9
Schneider, T., Asharov, G., Lindell, Y., Zohner, M.: More efficient oblivious transfer and extensions for faster secure computation. In: ACM CCS 2013, pp. 535–548 (2013)
Nielsen, J.B., Nordholt, P.S., Orlandi, C., Burra, S.S.: A new approach to practical active-secure two-party computation. In: Safavi-Naini, R., Canetti, R. (eds.) CRYPTO 2012. LNCS, vol. 7417, pp. 681–700. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-32009-5_40
Demmler, D., Schneider, T., Zohner, M.: ABY-a framework for efficient mixed-protocol secure two-party computation. In: NDSS (2015)
Nisan, N., Pinkas, B., Sella, Y., Malkhi, D., et al. : Fairplay a secure two-party computation system. In: 13th Conference on USENIX Security Symposium, vol. 13, pp. 20–20 (2004)
Beaver, D.: Efficient multiparty protocols using circuit randomization. In: Feigenbaum, J. (ed.) CRYPTO 1991. LNCS, vol. 576, pp. 420–432. Springer, Heidelberg (1992). https://doi.org/10.1007/3-540-46766-1_34
Kamara, S., Mohassel, P., Raykova, M.: Outsourcing multiparty computation. IACR Cryptology ePrint Archive 272 (2011)
Canetti, R.: Universally composable security: a new paradigm for cryptographic protocols. In: 42nd IEEE Symposium on Foundations of Computer Science (FOCS 2001), pp. 136–145 (2001)
Zhang, Y., Mohassel, P.: SecureML: a system for scalable privacy-preserving machine learning. In: IEEE Symposium on Security and Privacy (SP), pp. 19–38 (2017)
GMP library. https://gmplib.org
NTL library. http://www.shoup.net/ntl
Acknowledgments
This work is supported by the National Key Research and Development Program of China under Grant 2016YFB0800601, and the Key Program of NSFC-Tongyong Union Foundation under Grant U1636209.
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Wei, Z., Pei, Q., Liu, X., Ma, L. (2019). Efficient Privacy Preserving Cross-Datasets Collaborative Outlier Detection. In: Vaidya, J., Zhang, X., Li, J. (eds) Cyberspace Safety and Security. CSS 2019. Lecture Notes in Computer Science(), vol 11983. Springer, Cham. https://doi.org/10.1007/978-3-030-37352-8_31
Download citation
DOI: https://doi.org/10.1007/978-3-030-37352-8_31
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-37351-1
Online ISBN: 978-3-030-37352-8
eBook Packages: Computer ScienceComputer Science (R0)