
Corrective feedback and persistent learning for information extraction

Published: 01 October 2006

Abstract

To successfully embed statistical machine learning models in real-world applications, two post-deployment capabilities must be provided: (1) the ability to solicit user corrections and (2) the ability to update the model from these corrections. We refer to the former capability as corrective feedback and the latter as persistent learning. While these capabilities have a natural implementation for simple classification tasks such as spam filtering, we argue that a more careful design is required for structured classification tasks. One example of a structured classification task is information extraction, in which raw text is analyzed to automatically populate a database. In this work, we augment a probabilistic information extraction system with corrective feedback and persistent learning components to assist the user in building, correcting, and updating the extraction model. We describe methods of guiding the user to incorrect predictions, suggesting the most informative fields to correct, and incorporating corrections into the inference algorithm. We also present an active learning framework that minimizes not only how many examples a user must label, but also how difficult each example is to label. We empirically validate each of the technical components in simulation and quantify the user effort saved. We conclude that more efficient corrective feedback mechanisms lead to more effective persistent learning.
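The abstract's idea of "incorporating corrections into the inference algorithm" can be sketched with a toy constrained Viterbi decoder: a user-corrected label is clamped during decoding, and re-inference lets the correction propagate to neighboring predictions. The states, transition scores, emission scores, and tokens below are invented for illustration only; the paper's actual system uses a trained linear-chain conditional random field.

```python
# Minimal sketch of constrained Viterbi decoding. All scores are hypothetical.

NEG_INF = float("-inf")
STATES = ["NAME", "OTHER"]

# Toy log transition scores: staying in the same state is cheap.
TRANS = {
    ("NAME", "NAME"): 0.0,
    ("NAME", "OTHER"): -1.0,
    ("OTHER", "NAME"): -1.0,
    ("OTHER", "OTHER"): 0.0,
}

def emit(token, state):
    """Toy emission score: capitalized tokens weakly prefer NAME."""
    if token[0].isupper():
        return -0.4 if state == "NAME" else -1.1
    return -2.0 if state == "NAME" else -0.2

def viterbi(tokens, constraints=None):
    """Decode the best label sequence; `constraints` maps a token position
    to a user-corrected label, which is clamped during decoding."""
    constraints = constraints or {}
    delta = [{} for _ in tokens]  # delta[i][s]: best log-score ending in s at i
    back = [{} for _ in tokens]   # back-pointers for path recovery
    for s in STATES:
        allowed = constraints.get(0, s) == s
        delta[0][s] = emit(tokens[0], s) if allowed else NEG_INF
    for i in range(1, len(tokens)):
        for s in STATES:
            best_prev = max(STATES, key=lambda p: delta[i - 1][p] + TRANS[(p, s)])
            score = delta[i - 1][best_prev] + TRANS[(best_prev, s)] + emit(tokens[i], s)
            # Clamp: a corrected position admits only the corrected label.
            delta[i][s] = score if constraints.get(i, s) == s else NEG_INF
            back[i][s] = best_prev
    last = max(STATES, key=lambda s: delta[-1][s])
    path = [last]
    for i in range(len(tokens) - 1, 0, -1):
        path.append(back[i][path[-1]])
    return list(reversed(path))

# Unconstrained decoding labels both capitalized tokens NAME; clamping the
# first token to OTHER flips its neighbor as well.
print(viterbi(["New", "York"]))                # ['NAME', 'NAME']
print(viterbi(["New", "York"], {0: "OTHER"}))  # ['OTHER', 'OTHER']
```

Because the correction is folded into inference rather than applied as a post-hoc overwrite, a single corrected field can repair several predictions at once, which is one way a corrective feedback mechanism reduces total user effort.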




Published In

Artificial Intelligence  Volume 170, Issue 14-15
October, 2006
43 pages

Publisher

Elsevier Science Publishers Ltd.

United Kingdom


Author Tags

  1. Active learning
  2. Graphical models
  3. Information extraction



Cited By

  • (2020) Using long short‐term memory neural networks to analyze SEC 13D filings. International Journal of Intelligent Systems in Accounting and Finance Management 26(4), 153–163. doi:10.1002/isaf.1464
  • (2019) AnchorViz. ACM Transactions on Interactive Intelligent Systems 10(1), 1–38. doi:10.1145/3241379
  • (2017) A Probabilistically Integrated System for Crowd-Assisted Text Labeling and Extraction. Journal of Data and Information Quality 8(2), 1–23. doi:10.1145/3012003
  • (2014) Effective balancing error and user effort in interactive handwriting recognition. Pattern Recognition Letters 37, 135–142. doi:10.1016/j.patrec.2013.03.010
  • (2014) Combining human analysis and machine data mining to obtain credible data relations. Information Sciences 288, 254–278. doi:10.1016/j.ins.2014.08.014
  • (2014) Eliciting good teaching from humans for machine learners. Artificial Intelligence 217, 198–215. doi:10.1016/j.artint.2014.08.005
  • (2012) End-user interactions with intelligent and autonomous systems. CHI '12 Extended Abstracts on Human Factors in Computing Systems, 2755–2758. doi:10.1145/2212776.2212713
  • (2012) Continuous user feedback learning for data capture from business documents. Proceedings of the 7th International Conference on Hybrid Artificial Intelligent Systems, Part II, 538–549. doi:10.1007/978-3-642-28931-6_51
  • (2011) Video annotation and tracking with active learning. Proceedings of the 25th International Conference on Neural Information Processing Systems, 28–36. doi:10.5555/2986459.2986463
  • (2011) Exploring the corporate ecosystem with a semi-supervised entity graph. Proceedings of the 20th ACM International Conference on Information and Knowledge Management, 1857–1866. doi:10.1145/2063576.2063844
