DOI: 10.5555/1367832.1367873
research-article

Active learning for e-rulemaking: public comment categorization

Published: 18 May 2008

Abstract

We address the e-rulemaking problem of reducing the manual labor required to analyze public comment sets. In current and previous work, text categorization techniques have been used to speed up the comment analysis phase of e-rulemaking by automatically classifying sentences according to the rule-specific issues [2] or general topics [7, 8] that they address. Manually annotated data, however, is still required to train the supervised inductive learning algorithms that perform the categorization. This paper therefore investigates active learning methods for public comment categorization: we develop two new, general-purpose active learning techniques that selectively sample from the available training data for human labeling when building the sentence-level classifiers used in public comment categorization. Using an e-rulemaking corpus developed for our purposes [2], we compare our methods to the well-known query by committee (QBC) active learning algorithm [5] and to a baseline that randomly selects instances for labeling in each round of active learning. We show that our methods outperform both the random selection active learner and the QBC variation by a statistically significant margin, requiring far fewer training examples to reach the same levels of accuracy on a held-out test set. This provides promising evidence that automated text categorization methods might be used effectively to support public comment analysis.
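To make the two comparison strategies named in the abstract concrete, here is a minimal Python sketch of one selection round for a random baseline and for query by committee. It does not reproduce the paper's two new techniques, which the abstract does not describe. Everything in it is an assumption made for illustration: the function names (`vote_entropy`, `select_qbc`, `select_random`, `make_threshold`), the use of vote entropy as the disagreement measure, and the toy one-dimensional "classifiers" standing in for sentence-level models.

```python
import math
import random
from collections import Counter


def vote_entropy(votes):
    """Committee disagreement on one instance; 0 means a unanimous vote."""
    counts = Counter(votes)
    total = len(votes)
    return -sum((c / total) * math.log(c / total) for c in counts.values())


def select_qbc(pool, committee, batch_size):
    """Return indices of the pool items the committee disagrees on most."""
    scored = [
        (vote_entropy([member(x) for member in committee]), i)
        for i, x in enumerate(pool)
    ]
    # Highest disagreement first; ties are broken by pool order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [i for _, i in scored[:batch_size]]


def select_random(pool, batch_size, rng):
    """Baseline: return indices drawn uniformly at random, no repeats."""
    return rng.sample(range(len(pool)), batch_size)


def make_threshold(t):
    """Hypothetical stand-in for a trained classifier: label is x > t."""
    return lambda x: x > t


# Toy unlabeled pool and a three-member committee (illustrative only).
pool = [0.1, 0.4, 0.6, 0.9]
committee = [make_threshold(t) for t in (0.3, 0.5, 0.7)]

to_label_qbc = select_qbc(pool, committee, batch_size=2)
to_label_random = select_random(pool, batch_size=2, rng=random.Random(0))
```

In a full active learning loop, the selected indices would be sent to a human annotator, the newly labeled instances added to the training set, and the committee retrained before the next round. In this toy pool the committee is unanimous on 0.1 and 0.9, so QBC requests labels only for the two items near the decision boundary.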

References

[1] K. Brinker. Incorporating diversity in active learning with support vector machines. In Proceedings of ICML-03, 20th International Conference on Machine Learning. Morgan Kaufmann Publishers, San Francisco, US, 2003.
[2] C. Cardie, C. Farina, M. Rawding, A. Aijaz, and S. Purpura. A study in rule-specific issue categorization for e-rulemaking. In Proceedings of the 9th Annual International Conference on Digital Government Research, 2008.
[3] C. Coglianese. Weak democracy, strong information: The role of information technology in the rulemaking process. In V. Mayer-Schoenberger and D. Lazer, editors, Electronic Government to Information Government: Governing in the 21st Century, 2007.
[4] D. Cohn, L. Atlas, and R. Ladner. Improving generalization with active learning. Machine Learning, 15(2):201–221, 1994.
[5] Y. Freund, H. S. Seung, E. Shamir, and N. Tishby. Selective sampling using the query by committee algorithm. Machine Learning, 28:133–168, 1997.
[6] C. Kerwin. The state of rulemaking in the federal government. Technical report, Transcript Panel 1, 2005.
[7] N. Kwon and E. Hovy. Information acquisition using multiple classifications. In Proceedings of the Fourth International Conference on Knowledge Capture (K-CAP 2007), 2007.
[8] N. Kwon, E. Hovy, and S. Shulman. Multidimensional text analysis for eRulemaking. In Proceedings of the 7th Annual International Conference on Digital Government Research, 2006.
[9] D. D. Lewis and J. Catlett. Heterogeneous uncertainty sampling for supervised learning. In Proceedings of the Eleventh International Conference on Machine Learning, pages 148–156, Rutgers University, New Brunswick, NJ, 1994. Morgan Kaufmann.
[10] P. Melville and R. Mooney. Diverse ensembles for active learning. In Proceedings of ICML-04, 21st International Conference on Machine Learning. Morgan Kaufmann Publishers, San Francisco, US, 2004.
[11] I. Muslea, S. Minton, and C. Knoblock. Selective sampling with redundant views. In Proceedings of the Seventeenth National Conference on Artificial Intelligence, pages 621–626, 2000.
[12] K. Papineni. Why inverse document frequency? In Proceedings of the North American Association for Computational Linguistics, NAACL, pages 25–32, 2001.
[13] M. F. Porter. An algorithm for suffix stripping. Program, 14(3):130–137, 1980.
[14] S. Purpura and D. Hillard. Automated classification of congressional legislation. In Proceedings of the 7th Annual International Conference on Digital Government Research, 2006.
[15] G. Salton and M. McGill. Introduction to Modern Information Retrieval. McGraw-Hill, New York, 1983.
[16] B. Schölkopf and A. J. Smola. Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond (Adaptive Computation and Machine Learning). MIT Press, Cambridge, MA, 2002.
[17] H. S. Seung, M. Opper, and H. Sompolinsky. Query by committee. In Computational Learning Theory, pages 287–294, 1992.
[18] S. Shulman. Perverse incentives: The case against mass e-mail campaigns. In Proceedings of the Annual Meeting of the American Political Science Association, 2008.
[19] P. Strauss, T. Rakoff, and C. Farina. Administrative Law. 10th edition, 2003.
[20] V. N. Vapnik. The Nature of Statistical Learning Theory. Springer, 1995.
[21] H. Yang and J. Callan. Near-duplicate detection for eRulemaking. In Proceedings of the Fifth National Conference on Digital Government Research, 2005.
[22] H. Yang and J. Callan. Near-duplicate detection by instance-level constrained clustering. In Proceedings of the Twenty-Ninth Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, 2006.

Cited By

  • (2018) E-Rulemaking. International Journal of Technology and Human Interaction, 14:2 (35-53). DOI: 10.4018/IJTHI.2018040103. Online publication date: 1-Apr-2018
  • (2015) Introducing textual analysis tools for policy informatics. Proceedings of the 16th Annual International Conference on Digital Government Research (10-19). DOI: 10.1145/2757401.2757421. Online publication date: 27-May-2015

Published In

ACM Other conferences
dg.o '08: Proceedings of the 2008 international conference on Digital government research
May 2008
488 pages
ISBN: 9781605580999

Sponsors

  • Routledge
  • Springer
  • Elsevier
  • Cefrio
  • NCDG: National Center for Digital Government

Publisher

Digital Government Society of North America

Author Tags

  1. active learning
  2. e-rulemaking
  3. machine learning
  4. public comment
  5. text categorization

Qualifiers

  • Research-article

Conference

dg.o '08
Sponsor:
  • NCDG
dg.o '08: Digital government research
May 18 - 21, 2008
Montreal, Canada

Acceptance Rates

Overall Acceptance Rate 150 of 271 submissions, 55%
