Abstract
Due to the growing volume of data provided by news sources and to user-specific information needs, many news personalization systems have recently been proposed. These systems often process news data automatically into information, relying on underlying knowledge bases that contain concepts and their relations for specific domains. Information extraction rules are frequently used for this purpose, yet they are usually constructed manually. Because it is difficult to maintain a good balance between precision and recall with a manual approach, we present a genetic programming-based approach for automatically learning semantic information extraction rules that extract events from (financial) news. Our evaluation results show that, compared to information extraction rules constructed by expert users, our rules yield a 27% higher F1-measure after the same amount of rule construction time.
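For readers unfamiliar with the metric, the F1-measure mentioned in the abstract is the harmonic mean of precision and recall. The following is a minimal sketch of the standard definition, not the authors' evaluation code; the function name `f1_score` and the event counts in the usage lines are hypothetical and chosen only for illustration.

```python
def f1_score(true_positives: int, false_positives: int, false_negatives: int) -> float:
    """Return the F1-measure (harmonic mean of precision and recall).

    Returns 0.0 when precision or recall is undefined or both are zero.
    """
    predicted = true_positives + false_positives   # events the rules extracted
    actual = true_positives + false_negatives      # events in the annotated news
    if predicted == 0 or actual == 0:
        return 0.0
    precision = true_positives / predicted
    recall = true_positives / actual
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts: a rule set that finds 8 correct events,
# extracts 2 spurious ones, and misses 2 annotated ones.
balanced = f1_score(true_positives=8, false_positives=2, false_negatives=2)

# Hypothetical counts: perfect recall but only half of the extractions are correct.
imprecise = f1_score(true_positives=5, false_positives=5, false_negatives=0)
```

Because the harmonic mean is pulled toward the lower of the two values, a rule set cannot score well by maximizing recall alone, which is why the abstract frames rule construction as a balance between precision and recall.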
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
IJntema, W., Hogenboom, F., Frasincar, F., Vandic, D. (2014). A Genetic Programming Approach for Learning Semantic Information Extraction Rules from News. In: Benatallah, B., Bestavros, A., Manolopoulos, Y., Vakali, A., Zhang, Y. (eds) Web Information Systems Engineering – WISE 2014. WISE 2014. Lecture Notes in Computer Science, vol 8786. Springer, Cham. https://doi.org/10.1007/978-3-319-11749-2_32
DOI: https://doi.org/10.1007/978-3-319-11749-2_32
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-11748-5
Online ISBN: 978-3-319-11749-2
eBook Packages: Computer Science, Computer Science (R0)