An Effective Method for Identifying Unknown Unknowns with Noisy Oracle

  • Conference paper
Case-Based Reasoning Research and Development (ICCBR 2018)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 11156)

Abstract

Unknown unknowns (UUs) are erroneous predictions that a model makes with high confidence. Identifying UUs is important for understanding the limitations of predictive models. Several proposed solutions identify UUs effectively, but all of them assume a perfect Oracle that returns the correct labels of the UUs. This assumption is impractical, since no perfect Oracle exists in the real world: even experts make mistakes when labelling UUs. Such errors can have serious consequences, because fake UUs mislead existing algorithms and reduce their performance. In this paper, we identify the impact of a noisy Oracle and propose a UU identification algorithm that adapts to the noisy-Oracle setting. Experimental results demonstrate the effectiveness of the proposed method.
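
To make the notion of a fake UU concrete, the following is a minimal illustrative sketch; it is not the algorithm proposed in the paper. It assumes that a UU is a prediction whose confidence reaches a threshold `tau` and whose label disagrees with a reference label, and that the noisy Oracle flips the true label to a different one with a fixed probability. The function names, the threshold `tau`, the uniform label-flip noise model, and the toy data are all assumptions introduced here for illustration.

```python
import random

# Toy data (hypothetical): (predicted label, model confidence, true label).
TOY_INSTANCES = [
    ("cat", 0.90, "cat"),  # confident and correct
    ("dog", 0.95, "cat"),  # confident and wrong: a genuine UU
    ("cat", 0.50, "dog"),  # wrong but low confidence: not a UU
]
LABELS = ["cat", "dog"]


def is_unknown_unknown(predicted, confidence, reference_label, tau):
    """True if the prediction is confident (>= tau) yet disagrees with the reference label."""
    return confidence >= tau and predicted != reference_label


def noisy_oracle(true_label, labels, error_rate, rng):
    """Assumed noise model: with probability error_rate, return a wrong label chosen uniformly."""
    if rng.random() < error_rate:
        return rng.choice([label for label in labels if label != true_label])
    return true_label


def count_fake_uus(instances, labels, tau, error_rate, seed=0):
    """Count instances flagged as UUs only because the oracle answered incorrectly."""
    rng = random.Random(seed)
    fake = 0
    for predicted, confidence, true_label in instances:
        answer = noisy_oracle(true_label, labels, error_rate, rng)
        flagged = is_unknown_unknown(predicted, confidence, answer, tau)
        genuine = is_unknown_unknown(predicted, confidence, true_label, tau)
        if flagged and not genuine:
            fake += 1
    return fake
```

Calling `count_fake_uus(TOY_INSTANCES, LABELS, tau, error_rate=0.0)` models a perfect Oracle and therefore reports no fake UUs for any `tau`. With a positive error rate, confident and correct predictions can be flagged as UUs, which is the failure mode the abstract describes.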

Notes

  1. http://aiweb.cs.washington.edu/ai/unkunk18/.

  2. https://www.kaggle.com/c/dogs-vs-cats/data.

  3. http://scikit-learn.org/.

  4. The threshold \(\tau\) is a parameter worth discussing, since different thresholds construct different search spaces. However, we tried several candidate values, such as 0.70 and 0.75, in our experiments and the results were basically consistent, so we use the value from previous works [3, 14] without further discussion.

  5. https://en.wikipedia.org/wiki/Elbow_method_(clustering).

References

  1. Amodei, D., Olah, C., Steinhardt, J., Christiano, P., Schulman, J., Mané, D.: Concrete problems in AI safety (2016)

  2. Attenberg, J., Ipeirotis, P., Provost, F.: Beat the machine: challenging humans to find a predictive model’s unknown unknowns. J. Data Inf. Qual. (JDIQ) 6(1), 1 (2015)

  3. Bansal, G., Weld, D.S.: A coverage-based utility model for identifying unknown unknowns. In: AAAI (2018)

  4. Blitzer, J., Dredze, M., Pereira, F.: Biographies, bollywood, boom-boxes and blenders: domain adaptation for sentiment classification. In: ACL, pp. 187–205 (2007)

  5. Chandola, V., Banerjee, A., Kumar, V.: Outlier detection: a survey. ACM Comput. Surv. (2007)

  6. Chandola, V., Banerjee, A., Kumar, V.: Anomaly detection: a survey. ACM Comput. Surv. (CSUR) 41(3), 15 (2009)

  7. Elkan, C.: The foundations of cost-sensitive learning. In: IJCAI, vol. 17, pp. 973–978. Lawrence Erlbaum Associates Ltd. (2001)

  8. Glorot, X., Bordes, A., Bengio, Y.: Domain adaptation for large-scale sentiment classification: a deep learning approach. In: ICML, pp. 513–520 (2011)

  9. Han, J., Pei, J., Kamber, M.: Data Mining: Concepts and Techniques. Elsevier, Burlington (2011)

  10. Hinton, G., et al.: Deep neural networks for acoustic modeling in speech recognition: the shared views of four research groups. IEEE Signal Process. Mag. 29(6), 82–97 (2012)

  11. Hsueh, P.Y., Melville, P., Sindhwani, V.: Data quality from crowdsourcing: a study of annotation selection criteria. In: NAACL HLT Workshop on Active Learning and NLP, pp. 27–35. Association for Computational Linguistics (2009)

  12. Hu, R., Delany, S., MacNamee, B.: Sampling with confidence: using K-NN confidence measures in active learning. In: ICCBR, p. 50 (2009)

  13. Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: NIPS, pp. 1097–1105 (2012)

  14. Lakkaraju, H., Kamar, E., Caruana, R., Horvitz, E.: Identifying unknown unknowns in the open world: representations and policies for guided exploration. In: AAAI, pp. 2124–2132 (2017)

  15. Lewis, D.D., Gale, W.A.: A sequential algorithm for training text classifiers. In: Croft, B.W., van Rijsbergen, C.J. (eds.) SIGIR, pp. 3–12. Springer, New York (1994). https://doi.org/10.1007/978-1-4471-2099-5_1

  16. McAuley, J., Pandey, R., Leskovec, J.: Inferring networks of substitutable and complementary products. In: KDD, pp. 785–794. ACM (2015)

  17. Mikolov, T., Chen, K., Corrado, G., Dean, J.: Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781 (2013)

  18. Pan, S.J., Yang, Q.: A survey on transfer learning. TKDE 22(10), 1345–1359 (2010)

  19. Pang, B., Lee, L.: A sentimental education: sentiment analysis using subjectivity summarization based on minimum cuts. In: ACL, p. 271. Association for Computational Linguistics (2004)

  20. Pang, B., Lee, L.: Seeing stars: exploiting class relationships for sentiment categorization with respect to rating scales. In: ACL, pp. 115–124. Association for Computational Linguistics (2005)

  21. Ribeiro, M.T., Singh, S., Guestrin, C.: Why should I trust you? Explaining the predictions of any classifier. In: KDD, pp. 1135–1144. ACM (2016)

  22. Sani, S., Wiratunga, N., Massie, S., Cooper, K.: kNN sampling for personalised human activity recognition. In: Aha, D.W., Lieber, J. (eds.) ICCBR 2017. LNCS (LNAI), vol. 10339, pp. 330–344. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-61030-6_23

  23. Settles, B.: Active learning literature survey. University of Wisconsin, Madison 52(55–66), 11 (2010)

  24. Settles, B., Craven, M., Ray, S.: Multiple-instance active learning. In: NIPS, pp. 1289–1296 (2008)

  25. Seung, H.S., Opper, M., Sompolinsky, H.: Query by committee. In: COLT, pp. 287–294. ACM (1992)

  26. Sheng, V.S., Provost, F., Ipeirotis, P.G.: Get another label? Improving data quality and data mining using multiple, noisy labelers. In: KDD, pp. 614–622. ACM (2008)

  27. Shimodaira, H.: Improving predictive inference under covariate shift by weighting the log-likelihood function. J. Stat. Plan. Inference 90(2), 227–244 (2000)

  28. Silver, D., et al.: Mastering the game of Go with deep neural networks and tree search. Nature 529(7587), 484 (2016)

  29. Sutton, R.S., Barto, A.G.: Reinforcement Learning: An Introduction, vol. 1. MIT Press, Cambridge (1998)

  30. Vinyals, O., Toshev, A., Bengio, S., Erhan, D.: Show and tell: lessons learned from the 2015 mscoco image captioning challenge. IEEE Trans. Pattern Anal. Mach. Intell. 39(4), 652–663 (2017)

  31. Zhang, T., Oles, F.: The value of unlabeled data for classification problems. In: ICML, pp. 1191–1198. Citeseer (2000)

  32. Zhu, X., Lafferty, J., Ghahramani, Z.: Combining active learning and semi-supervised learning using gaussian fields and harmonic functions. In: ICML 2003 Workshop on the Continuum From Labeled to Unlabeled Data in Machine Learning and Data Mining, vol. 3 (2003)

Acknowledgments

We thank all reviewers who provided thoughtful and constructive comments on this paper. This research is funded by the National Key R&D Program of China (No. 2017YFC0803700), the National Natural Science Foundation of China (No. 61773167), the Shanghai Municipal Commission of Economy and Informatization (No. 170513), and the Open Research Fund of Shanghai Key Laboratory of Multidimensional Information Processing, East China Normal University. The computation was performed at the Supercomputer Center of ECNU.

Author information

Corresponding author

Correspondence to Jing Yang.

Copyright information

© 2018 Springer Nature Switzerland AG

About this paper

Cite this paper

Zheng, B., Lin, X., Xiao, Y., Yang, J., He, L. (2018). An Effective Method for Identifying Unknown Unknowns with Noisy Oracle. In: Cox, M., Funk, P., Begum, S. (eds) Case-Based Reasoning Research and Development. ICCBR 2018. Lecture Notes in Computer Science, vol 11156. Springer, Cham. https://doi.org/10.1007/978-3-030-01081-2_32

  • DOI: https://doi.org/10.1007/978-3-030-01081-2_32

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-01080-5

  • Online ISBN: 978-3-030-01081-2

  • eBook Packages: Computer Science (R0)
