research-article

Active learning for e-rulemaking: public comment categorization

Authors:

Stephen Purpura, Jesse Simons

dg.o '08: Proceedings of the 2008 international conference on Digital government research

Pages 234 - 243

Published: 18 May 2008

Abstract

We address the e-rulemaking problem of reducing the manual labor required to analyze public comment sets. In current and previous work, text categorization techniques have been used to speed up the comment analysis phase of e-rulemaking by automatically classifying sentences according to the rule-specific issues [2] or general topics [7, 8] that they address. Manually annotated data, however, is still required to train the supervised inductive learning algorithms that perform the categorization. This paper therefore investigates active learning methods for public comment categorization: we develop two new, general-purpose active learning techniques that selectively sample from the available training data for human labeling when building the sentence-level classifiers used in public comment categorization. Using an e-rulemaking corpus developed for our purposes [2], we compare our methods to the well-known query by committee (QBC) active learning algorithm [5] and to a baseline that randomly selects instances for labeling in each round of active learning. We show that our methods outperform both the random selection active learner and the QBC variation by a statistically significant margin, requiring many fewer training examples to reach the same levels of accuracy on a held-out test set. This provides promising evidence that automated text categorization methods might be used effectively to support public comment analysis.
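
The two comparison strategies named in the abstract, random selection and query by committee, can be illustrated with a minimal sketch. It is not the implementation used in the paper, and it does not attempt the two new techniques, which the abstract does not describe. The function names, the toy keyword classifiers, and the use of vote entropy as the disagreement measure are all assumptions made for illustration.

```python
import math
import random
from collections import Counter
from typing import Callable, List, Sequence

# Hypothetical type: a committee member maps a sentence to a category label.
Classifier = Callable[[str], str]


def vote_entropy(votes: Sequence[str]) -> float:
    """Disagreement among committee votes; 0.0 means the vote is unanimous."""
    counts = Counter(votes)
    total = len(votes)
    return sum(-(c / total) * math.log(c / total) for c in counts.values())


def select_random(pool: List[str], k: int, rng: random.Random) -> List[str]:
    """Baseline: pick k unlabeled sentences uniformly at random."""
    return rng.sample(pool, min(k, len(pool)))


def select_qbc(pool: List[str], committee: Sequence[Classifier], k: int) -> List[str]:
    """QBC-style selection: pick the k sentences the committee disagrees on most."""
    scored = [(vote_entropy([clf(s) for clf in committee]), s) for s in pool]
    # Sort by descending disagreement; the sort is stable, so ties keep pool order.
    scored.sort(key=lambda pair: -pair[0])
    return [s for _, s in scored[:k]]


# Toy committee (assumed): three keyword rules standing in for trained classifiers.
committee = [
    lambda s: "cost" if "cost" in s else "other",
    lambda s: "cost" if "cost" in s or "price" in s else "other",
    lambda s: "cost" if "cost" in s or "fee" in s else "other",
]

# Toy unlabeled pool (assumed): sentences from hypothetical public comments.
pool = ["the cost is too high", "the price went up", "I support this rule"]

# The committee splits only on the second sentence, so QBC sends it for labeling.
print(select_qbc(pool, committee, 1))
print(select_random(pool, 1, random.Random(0)))
```

In each round of active learning, the selected sentences would be labeled by a human annotator, added to the training set, and the classifiers retrained before the next selection. The random baseline ignores the current models entirely, which is why a disagreement-driven selector can reach a given accuracy with fewer labeled examples.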

References

[1]

K. Brinker. Incorporating diversity in active learning with support vector machines. In Proceedings of ICML-03, 20th International Conference on Machine Learning. Morgan Kaufmann Publishers, San Francisco, US, 2003.

[2]

Claire Cardie, Cynthia Farina, Matt Rawding, Adil Aijaz, and Stephen Purpura. A Study in Rule-Specific Issue Categorization for e-Rulemaking. In Proceedings of the 9th Annual International Conference on Digital Government Research, 2008.

[3]

C. Coglianese. Weak democracy, strong information: The role of information technology in the rulemaking process. In V. Mayer-Schoenberger and D. Lazer, editors, Electronic Government to Information Government: Governing in the 21st Century, 2007.

[4]

D. Cohn, L. Atlas, and R. Ladner. Improving generalization with active learning. Machine Learning, 15(2):201--221, 1994.

[5]

Y. Freund, H. S. Seung, E. Shamir, and N. Tishby. Selective sampling using the query by committee algorithm. Machine Learning, 28:133--168, 1997.

[6]

C. Kerwin. The state of rulemaking in the federal government. Technical report, Transcript Panel 1, 2005.

[7]

N. Kwon and E. Hovy. Information acquisition using multiple classifications. In Proceedings of the Fourth International Conference on Knowledge Capture (K-CAP 2007), 2007.

[8]

N. Kwon, E. Hovy, and S. Shulman. Multidimensional text analysis for erulemaking. In Proceedings of the 7th Annual International Conference on Digital Government Research, 2006.

[9]

D. D. Lewis and J. Catlett. Heterogeneous Uncertainty Sampling for Supervised Learning. In Proceedings of the Eleventh International Conference on Machine Learning, pages 148--156, Rutgers University, New Brunswick, NJ, 1994. Morgan Kaufmann.

[10]

P. Melville and R. Mooney. Diverse ensembles for active learning. In Proceedings of ICML-04, 21st International Conference on Machine Learning. Morgan Kaufmann Publishers, San Francisco, US, 2004.

[11]

I. Muslea, S. Minton, and C. Knoblock. Selective sampling with redundant views. In Proceedings of the Seventeenth National Conference on Artificial Intelligence, pages 621--626, 2000.

[12]

K. Papineni. Why inverse document frequency? In Proceedings of the North American Association for Computational Linguistics, NAACL, pages 25--32, 2001.

[13]

M. F. Porter. An algorithm for suffix stripping. Program, 14(3):130--137, 1980.

[14]

S. Purpura and D. Hillard. Automated Classification of Congressional Legislation. In Proceedings of the 7th Annual International Conference on Digital Government Research, 2006.

[15]

G. Salton and M. McGill. Introduction to Modern Information Retrieval. McGraw-Hill, New York, 1983.

[16]

B. Schölkopf and A. J. Smola. Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond (Adaptive Computation and Machine Learning). MIT Press, Cambridge, MA, 2002.

[17]

H. S. Seung, M. Opper, and H. Sompolinsky. Query by committee. In Computational Learning Theory, pages 287--294, 1992.

[18]

S. Shulman. Perverse incentives: The case against mass e-mail campaigns. In Proceedings of the Annual Meeting of the American Political Science Association, 2008.

[19]

P. Strauss, T. Rakoff, and C. Farina. Administrative Law. 10th edition, 2003.

[20]

V. N. Vapnik. The Nature of Statistical Learning Theory. Springer, 1995.

[21]

H. Yang and J. Callan. Near-duplicate detection for erulemaking. In Proceedings of the Fifth National Conference on Digital Government Research, 2005.

[22]

H. Yang and J. Callan. Near-duplicate detection by instance-level constrained clustering. In Proceedings of the Twenty-Ninth Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, 2006.

Cited By

Carvalho N, Lourenço R (2018). E-Rulemaking. International Journal of Technology and Human Interaction, 14:2 (35-53). Online publication date: 1-Apr-2018. https://dl.acm.org/doi/10.4018/IJTHI.2018040103

Hagen L, Harrison T, Uzuner Ö, Fake T, Lamanna D, Kotfila C, Mossberger K, Helbig N, Zhang J, Kim Y (2015). Introducing textual analysis tools for policy informatics. Proceedings of the 16th Annual International Conference on Digital Government Research (10-19). Online publication date: 27-May-2015. https://dl.acm.org/doi/10.1145/2757401.2757421

Information

Published In

dg.o '08: Proceedings of the 2008 international conference on Digital government research

May 2008

488 pages

ISBN: 9781605580999

Conference Chairs: Monique Charbonneau (CEFRIO), Lester Diamond (US Social Security Administration), Stuart Shulman (University of Pittsburgh)

Program Chairs: Soon Ae Chun, Marijn Janssen, J. Ramon Gil-Garcia

Sponsors

Routledge
Springer
Elsevier
Cefrio
NCDG: National Center for Digital Government

Publisher

Digital Government Society of North America

Conference

dg.o '08

Sponsor: NCDG

dg.o '08: Digital government research

May 18 - 21, 2008

Montreal, Canada

Acceptance Rates

Overall Acceptance Rate 150 of 271 submissions, 55%

Article Metrics

Total Citations: 2
Total Downloads: 160
Downloads (Last 12 months): 0
Downloads (Last 6 weeks): 0

Reflects downloads up to 05 Mar 2025