Abstract
Detecting the location entities mentioned in Twitter messages is useful in text mining for business, marketing or defence applications. Therefore, techniques for extracting location entities from the textual content of tweets are needed. In this work, we approach this task in a manner similar to the Named Entity Recognition (NER) task restricted to locations, but we address a deeper task: classifying the detected locations into names of cities, provinces/states, and countries. Our approach is novel and consists of two stages. In the first stage, we train Conditional Random Fields (CRF) models with various sets of features; we collected and annotated our own dataset for training and testing. In the second stage, we resolve cases where more than one place has the same name. We propose a set of heuristics for choosing the correct physical location in these cases. We report good evaluation results for both tasks.
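The second stage can be illustrated with a minimal sketch. The abstract does not list the actual heuristics, so the two rules below (prefer a candidate whose state or country is also mentioned in the tweet, otherwise fall back to the most populous candidate) are assumptions chosen for illustration only. The `Place` type, the `GAZETTEER` list, the `disambiguate` function, and the rough population figures are likewise hypothetical and not taken from the paper.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Place:
    name: str
    state: str
    country: str
    population: int


# Hypothetical toy gazetteer; populations are rough illustrative figures.
GAZETTEER = [
    Place("London", "England", "United Kingdom", 8_000_000),
    Place("London", "Ontario", "Canada", 400_000),
    Place("Paris", "Ile-de-France", "France", 2_000_000),
    Place("Paris", "Texas", "United States", 25_000),
]


def disambiguate(city: str, context_locations: List[str]) -> Optional[Place]:
    """Pick one gazetteer entry for an ambiguous city name.

    `context_locations` holds the other location names detected in the
    same tweet (for example, by the first-stage tagger).
    """
    candidates = [p for p in GAZETTEER if p.name.lower() == city.lower()]
    if not candidates:
        return None

    context = {c.lower() for c in context_locations}

    # Assumed heuristic 1: prefer candidates whose state or country
    # is also mentioned in the tweet.
    supported = [
        p for p in candidates
        if p.state.lower() in context or p.country.lower() in context
    ]
    pool = supported or candidates

    # Assumed heuristic 2: otherwise fall back to the most populous candidate.
    return max(pool, key=lambda p: p.population)
```

For a tweet that mentions both "London" and "Ontario", this sketch selects the Canadian city; with no supporting context it defaults to the larger London in the United Kingdom.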
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Inkpen, D., Liu, J., Farzindar, A., Kazemi, F., Ghazi, D. (2015). Detecting and Disambiguating Locations Mentioned in Twitter Messages. In: Gelbukh, A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2015. Lecture Notes in Computer Science, vol 9042. Springer, Cham. https://doi.org/10.1007/978-3-319-18117-2_24
DOI: https://doi.org/10.1007/978-3-319-18117-2_24
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-18116-5
Online ISBN: 978-3-319-18117-2
eBook Packages: Computer Science (R0)