Abstract
This paper proposes an automatic code classification system for Korean census data that uses information retrieval and memory-based learning techniques. The purpose of the system is to convert natural language responses on survey questionnaires into the corresponding numeric codes, according to the standard code book from the Census Bureau. The system was trained by memory-based learning and tested on 46,762 industry records and 36,286 occupation records. It was evaluated using 10-fold cross-validation. In the experiments, the proposed system achieved production rates of 99.10% and 92.88% for level 2 and level 5 codes, respectively.
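The general idea of combining information retrieval with memory-based learning can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes whitespace tokenization (the paper works on Korean text, which would need morphological analysis), TF-IDF term weighting, cosine similarity, and a similarity-weighted k-nearest-neighbor vote. The class name `MemoryBasedCoder`, the helper `cross_validate`, the English responses, and the codes `X1` to `X3` are all hypothetical placeholders, not entries from any real code book.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: whitespace tokenization stands in for real index-term extraction.
    return text.lower().split()


class MemoryBasedCoder:
    """Stores every training example and codes a new response by its nearest neighbors."""

    def __init__(self, k=3):
        self.k = k
        self.idf = {}
        self.examples = []  # list of (normalized TF-IDF vector, code)

    def _vectorize(self, text):
        counts = Counter(tokenize(text))
        # Terms never seen in training get weight 0.
        vec = {t: c * self.idf.get(t, 0.0) for t, c in counts.items()}
        norm = math.sqrt(sum(w * w for w in vec.values()))
        return {t: w / norm for t, w in vec.items()} if norm else {}

    def fit(self, responses, codes):
        n = len(responses)
        df = Counter(t for r in responses for t in set(tokenize(r)))
        # Smoothed inverse document frequency.
        self.idf = {t: math.log((1 + n) / (1 + d)) + 1.0 for t, d in df.items()}
        self.examples = [(self._vectorize(r), c) for r, c in zip(responses, codes)]
        return self

    def predict(self, response):
        q = self._vectorize(response)
        # Cosine similarity between the query and every stored example.
        scored = sorted(
            ((sum(w * v.get(t, 0.0) for t, w in q.items()), c) for v, c in self.examples),
            key=lambda pair: pair[0],
            reverse=True,
        )[: self.k]
        votes = Counter()
        for sim, code in scored:
            votes[code] += sim
        if not votes or max(votes.values()) == 0:
            return None  # nothing similar in memory: leave the record uncoded
        return votes.most_common(1)[0][0]


def cross_validate(responses, codes, folds=10, k=3):
    """Fraction of records coded correctly under a simple interleaved k-fold split."""
    pairs = list(zip(responses, codes))
    correct = 0
    for f in range(folds):
        train = [p for i, p in enumerate(pairs) if i % folds != f]
        test = [p for i, p in enumerate(pairs) if i % folds == f]
        if not train or not test:
            continue
        model = MemoryBasedCoder(k).fit([r for r, _ in train], [c for _, c in train])
        correct += sum(model.predict(r) == c for r, c in test)
    return correct / len(pairs)


# Invented toy data; the codes are placeholders, not real classification codes.
responses = [
    "rice farming", "vegetable farming",
    "car parts manufacturing", "steel manufacturing",
    "elementary school teaching", "high school teaching",
]
codes = ["X1", "X1", "X2", "X2", "X3", "X3"]

coder = MemoryBasedCoder(k=3).fit(responses, codes)
farming_code = coder.predict("rice and vegetable farming")
unknown_code = coder.predict("software development")
toy_accuracy = cross_validate(responses, codes, folds=3)
```

In this sketch a response that shares no terms with any stored example returns `None`, so it would be left for manual coding rather than assigned a guessed code.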
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Lim, H.S., Lee, W.K.H., Kim, H.C., Jeong, S.Y., Yu, H.C. (2005). An Automatic Code Classification System by Using Memory-Based Learning and Information Retrieval Technique. In: Lee, G.G., Yamada, A., Meng, H., Myaeng, S.H. (eds) Information Retrieval Technology. AIRS 2005. Lecture Notes in Computer Science, vol 3689. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11562382_53
DOI: https://doi.org/10.1007/11562382_53
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-29186-2
Online ISBN: 978-3-540-32001-2
eBook Packages: Computer Science, Computer Science (R0)