A Coarse-to-Fine Model for Geolocating Chinese Addresses
Figure 1. Polysemy in Chinese.
Figure 2. The overall architecture of CFM.
Figure 3. GeoSOT subdivision model.
Figure 4. The distribution of the Chinese address lengths.
Figure 5. Performance in distinguishing polysemy.
Figure 6. Comparison of the geolocation accuracy under different input/output lengths.
Abstract
1. Introduction
2. Related Work
2.1. Textual Geolocation Prediction
2.2. Neural Language Modeling
3. Methodology
3.1. Problem Statement
3.2. Overall Architecture
3.3. GeoSOT Subdivision Scheme
3.4. Representing Chinese Textual Addresses
3.4.1. Input Processing
3.4.2. Feature Extraction
3.5. Coarse-to-Fine Location Prediction
4. Results and Discussion
4.1. Experiment Settings
4.2. Visualizing the Performance in Polysemy Recognition
4.3. Comparison in Geolocation Prediction
4.4. Ablation Study
4.5. Discussion
5. Conclusions
Author Contributions
Funding
Conflicts of Interest
References
Level | Grid Size | Level | Grid Size | Level | Grid Size | Level | Grid Size |
---|---|---|---|---|---|---|---|
1 | - | 9 | 128 km | 17 | 512 m | 25 | 2 m |
2 | - | 10 | 64 km | 18 | 256 m | 26 | 1 m |
3 | - | 11 | 32 km | 19 | 128 m | 27 | 0.5 m |
4 | - | 12 | 16 km | 20 | 64 m | 28 | 25 cm |
5 | - | 13 | 8 km | 21 | 32 m | 29 | 12.5 cm |
6 | 1024 km | 14 | 4 km | 22 | 16 m | 30 | 6.2 cm |
7 | 512 km | 15 | 2 km | 23 | 8 m | 31 | 3.1 cm |
8 | 256 km | 16 | 1 km | 24 | 4 m | 32 | 1.5 cm |
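
The grid sizes in the table above halve with each level, which can be sketched in a few lines. This is only an illustration of the pattern in the listed values, not the authors' implementation: the function name `nominal_grid_size_m` is hypothetical, and treating 1 km as the step between 2 km and 512 m follows the table's own rounding. For levels 30 to 32 the table truncates the exact halves (6.25 cm, 3.125 cm, 1.5625 cm) to one decimal place.

```python
def nominal_grid_size_m(level: int) -> float:
    """Nominal grid size in metres for a subdivision level, following the table.

    Levels 1-5 have no size listed in the table, so they are rejected here.
    """
    if not 6 <= level <= 32:
        raise ValueError("the table lists grid sizes only for levels 6 to 32")
    if level <= 16:
        # Kilometre rows: 1024 km at level 6 down to 1 km at level 16.
        return float(2 ** (16 - level)) * 1000.0
    # Metre and centimetre rows: 512 m at level 17, 1 m at level 26,
    # then fractions of a metre down to level 32.
    return 2.0 ** (26 - level)


for level in (6, 16, 17, 26, 28, 32):
    print(level, nominal_grid_size_m(level))
```
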
Address (In Chinese) | Address (In English) | Lon. | Lat. | GeoSOT Code (L17) |
---|---|---|---|---|
北京市海淀区民族园路2号 大润发超市 | RT-MART, Minzuyuan Road No.2, Haidian District, Beijing | 116.391121 | 39.982151 | 30232031113311211 |
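
The level-17 code in the example row has 17 digits, each in the range 0 to 3. Assuming each digit selects one of four child cells at the next level (a quadtree-style reading that this excerpt does not spell out), every prefix of the code names a coarser enclosing cell. The sketch below lists those prefixes from coarse to fine; the helper name `code_prefixes` is hypothetical.

```python
from typing import List


def code_prefixes(code: str) -> List[str]:
    """Return all prefixes of a quaternary grid code, coarsest first.

    Assumption: digit i refines the cell chosen by digits 0..i-1.
    """
    if not code or any(ch not in "0123" for ch in code):
        raise ValueError("expected a non-empty string of digits 0-3")
    return [code[:n] for n in range(1, len(code) + 1)]


example = "30232031113311211"  # GeoSOT code (L17) from the table row above
prefixes = code_prefixes(example)
print(len(prefixes))   # one prefix per level
print(prefixes[:3])    # the three coarsest cells
```
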
Methods | XGBoost-Regression | XGBoost-Classification | Ours-768 | Ours-1024 | Ours-2048 |
---|---|---|---|---|---|
Mean (m) | 1170.4 | 723.8 | 612.3 | 583.2 | 552.6 |
Median (m) | 1065.0 | 591.5 | 497.7 | 460.0 | 423.2 |
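
The two rows above summarize prediction error in metres. A minimal sketch of how such summaries could be computed follows, assuming the error is the great-circle distance between predicted and true coordinates and that the second row is the median; neither detail is stated in this excerpt. The Earth radius constant, the function names, and the sample coordinates are illustrative assumptions.

```python
import math
import statistics

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius; an assumption, not from the source


def haversine_m(lon1: float, lat1: float, lon2: float, lat2: float) -> float:
    """Great-circle distance in metres between two (lon, lat) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def error_summary(predicted, actual):
    """Return (mean, median) distance error in metres over paired (lon, lat) tuples."""
    errors = [haversine_m(*p, *a) for p, a in zip(predicted, actual)]
    return statistics.mean(errors), statistics.median(errors)


# Hypothetical predictions and ground truth, for illustration only.
predicted = [(116.3900, 39.9800), (116.4000, 39.9900), (116.3950, 39.9850)]
actual = [(116.3911, 39.9822), (116.4010, 39.9895), (116.3950, 39.9850)]
mean_err, median_err = error_summary(predicted, actual)
print(f"mean = {mean_err:.1f} m, median = {median_err:.1f} m")
```
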
Number of Encoders | Training Time | Training Loss | Evaluation Loss | Accuracy |
---|---|---|---|---|
6 | 3 h, 21 min, 30 s | 0.036 | 0.038 | 96.10% |
8 | 5 h, 15 min, 12 s | 0.032 | 0.030 | 96.23% |
10 | 7 h, 11 min, 37 s | 0.029 | 0.031 | 96.37% |
12 | 8 h, 59 min, 42 s | 0.028 | 0.029 | 96.41% |
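
To read the trade-off in the table above, it can help to put training time and accuracy on a common footing. The sketch below converts each duration to seconds and reports time and accuracy relative to the 6-encoder row. The values are copied from the table; the helper `duration_seconds` and the choice of the 6-encoder row as baseline are illustrative assumptions.

```python
import re


def duration_seconds(text: str) -> int:
    """Parse a duration such as '3 h, 21 m, 30 s' (or '21 min') into seconds."""
    match = re.fullmatch(r"\s*(\d+)\s*h,\s*(\d+)\s*m(?:in)?,\s*(\d+)\s*s\s*", text)
    if match is None:
        raise ValueError(f"unrecognized duration: {text!r}")
    hours, minutes, seconds = (int(g) for g in match.groups())
    return hours * 3600 + minutes * 60 + seconds


# Number of encoders -> (training time, accuracy in percent), copied from the table.
rows = {
    6: ("3 h, 21 m, 30 s", 96.10),
    8: ("5 h, 15 m, 12 s", 96.23),
    10: ("7 h, 11 m, 37 s", 96.37),
    12: ("8 h, 59 m, 42 s", 96.41),
}

base_time = duration_seconds(rows[6][0])
base_acc = rows[6][1]
for encoders, (time_text, accuracy) in rows.items():
    ratio = duration_seconds(time_text) / base_time
    gain = accuracy - base_acc
    print(f"{encoders} encoders: {ratio:.2f}x training time, {gain:+.2f} accuracy points")
```
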
© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
Qian, C.; Yi, C.; Cheng, C.; Pu, G.; Liu, J. A Coarse-to-Fine Model for Geolocating Chinese Addresses. ISPRS Int. J. Geo-Inf. 2020, 9, 698. https://doi.org/10.3390/ijgi9120698