[go: up one dir, main page]
More Web Proxy on the site http://driver.im/ Skip to main content
Log in

Parallel outlier detection on uncertain data for GPUs

  • Published:
Distributed and Parallel Databases Aims and scope Submit manuscript

Abstract

Outlier detection, also known as anomaly detection, is a common data mining task in identifying data points that are outside expected patterns in a given dataset. It has useful applications such as network intrusion, system faults, and fraudulent activity. In addition, real world data are uncertain in nature and they may be represented as uncertain data. In this paper, we propose an improved parallel algorithm for outlier detection on uncertain data using density sampling and develop an implementation running on both GPUs and multi-core CPUs, using the OpenCL framework. Our main focus is on GPUs, as they are a cost effective massively parallel floating point processor that is suitable for many data mining applications. Our implementation exploits some key features in GPUs, and is significantly different from a traditional CPU implementation. We first present an improved uncertain outlier detection algorithm. Then, we demonstrate two parallel micro-clustering implementations. The performance and detection quality comparisons demonstrate the benefits of the improved algorithm and parallel implementation on GPUs.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Subscribe and save

Springer+ Basic
£29.99 /Month
  • Get 10 units per month
  • Download Article/Chapter or eBook
  • 1 Unit = 1 Article or 1 Chapter
  • Cancel anytime
Subscribe now

Buy Now

Price includes VAT (United Kingdom)

Instant access to the full article PDF.

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6
Fig. 7
Fig. 8
Fig. 9
Fig. 10
Fig. 11
Fig. 12
Fig. 13
Fig. 14
Fig. 15
Fig. 16
Fig. 17
Fig. 18

Similar content being viewed by others

References

  1. Acklam, P.J.: An algorithm for computing the inverse normal cumulative distribution function. Tech. Rep. (2003)

  2. Advanced Micro Devices Inc: AMD accelerated parallel processing opencl programming guide

  3. Aggarwal, C.C. (ed.): Managing and Mining Uncertain Data. Springer, New York, NY (2009)

  4. Aggarwal, C.C., Yu, P.S.: Outlier detection with uncertain data. In: Proceedings of the SIAM International Conference on Data Mining 2008 (2008)

  5. Aggarwal, C.C., Yu, P.S.: A survey of uncertain data algorithms and applications. IEEE TKDE 21(5), 609–623 (2009)

    Google Scholar 

  6. Alshawabkeh, M., Jang, B., Kaeli, D.: Accelerating the local outlier factor algorithm on a GPU for intrusion detection systems. In: Proceedings of the 3rd Workshop on General-Purpose Computation on Graphics Processing Units (2010)

  7. Azmandian, F., Yilmazer, A., Dy, J.G., Aslam, J.A., Kaeli, D.R.: GPU-accelerated feature selection for outlier detection using the local kernel density ratio. In: Proceedings of the 12th IEEE ICDM (2012)

  8. Bastke, S., Deml, M., Schmidt, S.: Combining statistical network data, probabilistic neural networks and the computational power of GPUs for anomaly detection in computer networks. In: 1st Workshop on Intelligent Security (Security and Artificial Intelligence) (2009)

  9. Bolton, R.J., Hand, D.J.: Statistical fraud detection: a review. Stat. Sci. 17(3), 235–255 (2002)

    Article  MathSciNet  MATH  Google Scholar 

  10. Breunig, M.M., Kriegel, H.P., Ng, R.T., Sander, J.: LOF: identifying density-based local outliers. In: Proceedings of SIGMOD 2000 (2000)

  11. Chau, M., Cheng, R., Kao, B., Ng, J.: Uncertain data mining: an example in clustering location data. In: Proceedings of the 10th PAKDD (2006)

  12. Denoeux, T.: Maximum likelihood estimation from uncertain data in the belief function framework. IEEE TKDE 25(1), 119–130 (2013)

    Google Scholar 

  13. Ester, M., Kriegel, H.P., Sander, J., Xu, X.: A density-based algorithm for discovering clusters in large spatial databases with noise. In: Proceedings of the 2nd International Conference on Knowledge Discovery and Data Mining (1996)

  14. Angiulli, F., Basta, S., Pizzuti, C.: Distance-based detection of outliers. IEEE TKDE 18(2), 145–160 (2006)

    Google Scholar 

  15. Fang, W., Lau, K.K., Lu, M., Xiao, X., Lam, C.K., Yang, P.Y., He, B., Luo, Q., Sander, P.V., Yang, K.: Parallel data mining on graphics processors. Tech. Rep., Hong Kong University of Science and Technology (2008)

  16. He, B., Govindaraju, N.K., Luo, Q., Smith, B.: Efficient gather and scatter operations on graphics processors. In: Proceedings of the ACM/IEEE Conference on Supercomputing (2007)

  17. Hung, E., Cheung, D.W.: Parallel mining of outliers in large database. Distrib. Parallel Databases 12(1), 5–26 (2002)

    Article  Google Scholar 

  18. Kao, B., Lee, S.D., Cheung, D.W., Ho, W.S., Chan, K.F.: Clustering uncertain data using voronoi diagrams. In: Proceedings of the 8th IEEE ICDM (2008)

  19. Khronos Group: OpenCL. http://www.khronos.org/opencl/ (2011). Accessed 9 October 2012

  20. Knorr, E.M., Ng, R.T.: Algorithms for mining distance-based outliers in large datasets. In: Proceedings of VLDB 1998 (1998)

  21. Knorr, E.M., Ng, R.T.: Finding intensional knowledge of distance-based outliers. In: Proceedings of VLDB 1999, pp. 211–222 (1999)

  22. Kriegel, H.P., Pfeifle, M.: Density-based clustering of uncertain data. In: Proceedings of the 11th ACM SIGKDD (2005)

  23. Kriegel, H.P., Pfeifle, M.: Heirarchical density-based clustering of uncertain data. In: Proceedings of the 5th IEEE ICDM (2005)

  24. Krulis, M., Skopal, T., Lokoc, J., Beecks, C.: Combining CPU and GPU architectures for fast similarity search. Distrib. Parallel Databases 30, 179–207 (2012)

    Article  MATH  Google Scholar 

  25. Lan, Z., Zheng, Z., Li, Y.: Toward automated anomaly identification in large-scale systems. IEEE TPDS 21(2), 174–187 (2010)

    Google Scholar 

  26. Lozano, E., Acuna, E.: Parallel algorithms for distance-based and density-based outliers. In: Proceedings of the 5th IEEE ICDM (2005)

  27. Lu, M., Tan, Y., Bai, G., Luo, Q.: High-performance short sequence alignment with GPU acceleration. Distrib. Parallel Databases 30, 385–399 (2012)

    Article  Google Scholar 

  28. Marsaglia, G.: Xorshift RNGs. J. Stat. Softw. 8(14), 1–6 (2003)

    Google Scholar 

  29. Matsumoto, T., Hung, E.: Accelerating outlier detection with uncertain data using graphics processors. In: Advances in Knowledge Discovery and Data Mining, vol. LNCS 7302, pp. 169–180 (2012)

  30. Micikevicius, P.: Analysis-driven optimization. In: GPU Technology Conference 2010 (2010)

  31. Murakami, T., Kasahara, R., Saito, T.: An implementation and its evaluation of password cracking tool parallelized on GPGPU. In: Proceedings of the 2010 International Symposium on Communications and Information Technologies (2010)

  32. Ngai, W.K., Kao, B., Chui, C.K., Cheng, R., Chau, M., Yip, K.Y.: Efficient clustering of uncertain data. In: Proceedings of the 6th IEEE ICDM (2006)

  33. NVIDIA Corporation: CUDA. http://www.nvidia.com/object/cuda_home_new.html (2011). Accessed 9 October 2012

  34. Reif, M., Goldstein, M., Stahl, A.: Anomaly detection by combining decision trees and parametric densities. In: 19th International Conference on Pattern Recognition 2008 (2008)

  35. Ramaswamy, S., Rastogi, R., Shim, K.: Efficient algorithms for mining outliers from large data sets. In: Proceedings of the 2000 ACM SIGMOD (2000)

  36. Sequeria, K., Zaki, M.: ADMIT: Anomaly-based data mining for intrusions. In: Proceedings of the 8th ACM SIGKDD (2002)

  37. Tang, J., Chen, Z., Fu, A.W., Cheung, D.W.: Capabilities of outlier detection schemes in large datasets, framework and methodologies. Knowl. Inf. Syst. 11(1), 45–84 (2006)

    Article  Google Scholar 

  38. Tarabalka, Y., Haavardsholm, T.V., Kaasen, I., Skauli, T.: Real-time anomaly detection in hyperspectral images using multivariate normal mixture models and GPU processing. J. Real-Time Image Process. 4(3), 287–300 (2009)

    Article  Google Scholar 

  39. Wang, L., Cheung, D.W.L., Cheng, R., Lee, S.D., Yang, X.S.: Efficient mining of frequent item sets on large uncertain databases. IEEE TKDE 24(12), 2170–2183 (2012)

    Google Scholar 

  40. Zhanchun, G., Yuying, L.: Improving the collaborative filtering recommender system by using GPU. In: Proceedings of 2012 International Conference on Cyber-Enabled Distributed Computing and Knowledge Discovery (2012)

  41. Zhang, Y., Lin, X., Tao, Y., Zhang, W., Wang, H.: Efficient computation of range aggregates against uncertain location-based queries. IEEE TKDE 24(7), 1244–1258 (2012)

    Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Takazumi Matsumoto.

Rights and permissions

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Matsumoto, T., Hung, E. & Yiu, M.L. Parallel outlier detection on uncertain data for GPUs. Distrib Parallel Databases 33, 417–447 (2015). https://doi.org/10.1007/s10619-014-7155-9

Download citation

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s10619-014-7155-9

Keywords

Navigation