GeoBERT: Pre-Training Geospatial Representation Learning on Point-of-Interest
Figure 1. The overall framework of this study consists of three parts. The left part shows how we construct pre-training corpora based on POIs and Geohash grids. The middle part shows the model structure, which is based on the BERT architecture: E represents the input embedding, Trm a transformer block, and T an output token. The right part is the fine-tuning module, for which we design five practical downstream tasks. The grid embedding learned by GeoBERT can be used directly for fine-tuning or combined with other features.
Figure 2. The left part shows the area of Shanghai, which covers 404,697 level-7 Geohash grids in total. The right part is a slice covering 20 grids. Each level-7 grid is represented by a unique Geohash string of length 7, and all smaller grids belonging to the same larger grid of the upper level share the same prefix. As shown in the figure, the Geohashes of the 4 grids in the lower right corner share the six-character prefix “wtw3w3” because they all belong to the same larger level-6 Geohash grid. The same phenomenon can be observed in all four corners.
Figure 3. Distribution of POI sequence length over grids. The POI sequence length is the number of POIs in a level-7 Geohash grid. The maximum length is set to 64, which covers 97.33% of grids.
Figure 4. Constructing a POI sequence by distance ordering from the center point.
Figure 5. The POI number dataset of Shanghai.
Figure 6. The working/living area dataset of Shanghai, where yellow refers to living areas and red to working areas.
Figure 7. The passenger flow dataset of Shanghai, where red refers to high-density areas.
Figure 8. The house price dataset of Shanghai, where red refers to higher house prices.
Figure 9. The process of building the bank recommendation dataset.
Figure 10. Results of the classification tasks. Word2vec and GloVe still achieve good results. GeoBERT leads Word2vec by 4.49% in Accuracy on the store site recommendation task and by 1.51% on the working/living area prediction task.
Figure 11. Self-attention mechanism in GeoBERT. GeoBERT consists of 12 layers with 12 heads in each layer. Each row represents a layer (from 0 to 11), and each block represents a head (from 0 to 11). The lines between pairs of tokens in a head show the self-attention weights between them; the darker the color, the greater the weight between the two tokens. Different layers are shown in different colors, and heads in the same layer share a color. Better viewed in color: (a) GeoBERT-ShortestPath; (b) GeoBERT-CenterDistance; (c) GeoBERT-RandomPath.
Figure 12. The pattern of attention to the next token, illustrated on the shortest-path POI sequence example in Table 16. Note that the index starts at 0. Most tokens direct heavy attention weight to the subsequent token. However, this pattern is not absolute, since some tokens are directed to other tokens. Colors on the top identify the corresponding attention head(s), while the depth of color reflects the attention score: (a) attention weights for all tokens in Layer 1, Head 10; (b) attention weights for the selected token “Teahouse”.
Figure 13. The pattern of attention to the previous token. In the example of Layer 0, Head 2, most tokens show clear attention weight directed to the previous token. There are exceptions, such as the token “Teahouse”, which still attends closely to the next token “Real Estate”: (a) attention weights for all tokens in Layer 0, Head 2; (b) attention weights for the selected token “Store”.
Figure 14. Pattern of long-distance dependencies. “Real Estate” is directed to itself and to “Express” in (a). However, the attention is also dispersed over many different tokens, as can be seen in (b). The color in the right sequence indicates the corresponding head, with yellow for Head 1 and green for Head 2: (a) attention weights for the selected token “Real Estate” in Layer 6, Head 1; (b) attention weights for the selected token “Real Estate” in Layer 6, Heads 1 (orange) and 2 (green).
Figure 15. Attention Layers 1 and 2 in GeoBERT-ShortestPath. Pairs of crossed lines between adjacent tokens show that GeoBERT has learned the position information between adjacent tokens. (a) Attention Layer 1; (b) Attention Layer 2.
Figure 16. Attention Layers 1 and 2 in GeoBERT-CenterDistance. Pairs of crossed lines between adjacent tokens can be clearly observed, and the conclusion is similar to that for the shortest path. Moreover, most of these signs occur in the shallow attention layers, basically from Layer 0 to Layer 2. Thus, we believe that in the shallow attention layers, GeoBERT learns the position information among POIs: (a) Attention Layer 1; (b) Attention Layer 2.
Figure 17. Attention Layers 1 and 2 in GeoBERT-RandomPath. Unlike the two methods above, no obvious signs are observed, so we conclude that GeoBERT-RandomPath acquires no position information. This is reasonable since all POIs are ordered randomly: (a) Attention Layer 1; (b) Attention Layer 2.
Figure 18. Two specific attention heads in GeoBERT-ShortestPath. In both figures, the tokens “Mall” and “Hotel” connect strongly with all other tokens. We define such tokens as the “anchor POIs” of a grid. Anchor POIs play essential roles in a grid and can, to some extent, represent certain attributes of the whole grid: (a) Layer 9, Head 9; (b) Layer 10, Head 8.
Figure 19. Attention Layer 10 (with all heads) of the three models. Although “Mall” and “Hotel” occupy different positions in the different corpora, they are successfully recognized by the models. As mentioned, the core ability of GeoBERT is to identify the most significant tokens in a grid and capture co-occurrence. These phenomena only appear in the deep attention layers, basically from Layer 9 to Layer 11 (layer index starts at 0).
Figure 20. Attention mechanism for additional Case 1 in GeoBERT-ShortestPath. “CVS” is the abbreviation for convenience store. In (a), “Guesthouse” receives attention weights from all other tokens. In (b), there are two “Bus Station” tokens in the grid, and both attract the most attention; moreover, the weights for the first “Bus Station” are higher. This difference confirms that the sequence order plays a role to some extent: (a) Layer 10, Head 7; (b) Layer 10, Head 8.
Figure 21. Attention mechanism for additional Case 2 in GeoBERT-ShortestPath. The phenomenon is evident, and the two heads above each identify an anchor POI, namely “Supermarket” and “Industry Park”: (a) Layer 9, Head 9; (b) Layer 10, Head 3.
Abstract
1. Introduction
- To the best of our knowledge, this study introduces GeoBERT, the first large-scale pre-trained geospatial representation learning model. Through self-supervised learning, we pre-train GeoBERT on about 17 million POIs from the top 30 Chinese cities by GDP.
- We propose five practical downstream tasks for geospatial representation learning and validate them on GeoBERT, substantially expanding the scope of current research. These tasks provide practical guidance for real business activities.
- Extensive experiments show that, with only simple fine-tuning, GeoBERT outperforms the NLP methods previously used in this field, demonstrating that pre-training on large-scale urban data is more effective for extracting geospatial information.
- GeoBERT is highly extensible. The grid embeddings learned by GeoBERT can serve as base representations of grids and be concatenated with additional features to improve performance.
- From the perspective of the attention mechanism, we compare several ways of constructing POI sequences and examine what GeoBERT actually learns from large-scale POI data, aspects neglected by previous research.
2. Related Work
2.1. Geospatial Representation Learning
2.2. Pre-Trained Models
3. Materials and Methods
3.1. Overall Framework
- Build Training Corpus: We collect about 17 million POIs in 30 cities in China and set the level-7 Geohash grid as the basic unit. Taking POI types as tokens and grids as sentences, we build three pre-training corpora based on the “shortest path”, “center distance”, and “random path” methods.
- Pre-Train GeoBERT: Using the BERT structure, we pre-train GeoBERT by masking a percentage of the tokens and then predicting them.
- Fine-tune GeoBERT: GeoBERT is fine-tuned to address five downstream urban tasks. It is worth mentioning that GeoBERT can be used alone or combined with additional features.
3.2. Data and Preprocessing
3.3. Basic Geographic Unit
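To make the level-7 grid unit concrete, the snippet below (our own illustration, assuming the third-party pygeohash package and an arbitrary Shanghai coordinate, neither of which comes from the paper) encodes a point into its level-7 cell and checks the shared-prefix property described in Figure 2.

```python
# Illustration of the level-7 Geohash grid unit using the pygeohash package.
import pygeohash as pgh

lat, lon = 31.2304, 121.4737          # an arbitrary coordinate in Shanghai

cell7 = pgh.encode(lat, lon, precision=7)   # 7-character id of the ~153 m x 153 m cell
cell6 = pgh.encode(lat, lon, precision=6)   # id of the enclosing level-6 cell

# Every level-7 cell inside a level-6 cell shares that cell's 6-character prefix
# (the "wtw3w3" example in Figure 2).
assert cell7.startswith(cell6)
print(cell6, cell7)
```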
3.4. Build Training Corpus
3.4.1. Shortest Path
Algorithm 1: Shortest Path for POIs in a Grid.
Algorithm 2: Query POI.
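Since the pseudocode of Algorithms 1 and 2 is not reproduced above, the following is only a minimal sketch of one way to realize a shortest-path-style ordering: a greedy nearest-neighbor walk over a grid's POIs starting from the grid center. The record layout (`type`, `lonlat`) is our own assumption.

```python
# Minimal sketch (not the paper's Algorithm 1): order the POIs of one grid by a
# greedy nearest-neighbour walk starting from the grid centre, a common
# approximation of a shortest visiting path.
import math

def _dist(a, b):
    # Planar distance on (lon, lat); adequate inside a level-7 cell (~153 m).
    return math.hypot(a[0] - b[0], a[1] - b[1])

def shortest_path_sequence(pois, center):
    """pois: list of dicts {'type': str, 'lonlat': (lon, lat)} for one grid.
    Returns the POI types ordered along the greedy nearest-neighbour path."""
    remaining = list(pois)
    sequence, current = [], center
    while remaining:
        nxt = min(remaining, key=lambda p: _dist(p['lonlat'], current))
        remaining.remove(nxt)
        sequence.append(nxt['type'])
        current = nxt['lonlat']
    return sequence
```

Each grid then contributes one token sequence whose tokens are POI types, matching the "POI types as tokens, grids as sentences" setup described in Section 3.1.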
3.4.2. Center Distance Path
3.4.3. Random Path
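For comparison, the remaining two orderings can be sketched in the same illustrative style (again our own code, not the authors' implementation); the POI record layout matches the sketch in Section 3.4.1.

```python
# Illustrative sketches of the center distance and random path orderings.
# POIs are dicts {'type': str, 'lonlat': (lon, lat)}, as in the previous sketch.
import math
import random

def center_distance_sequence(pois, center):
    # Order POI types by distance from the grid center point (cf. Figure 4).
    cx, cy = center
    return [p['type'] for p in
            sorted(pois, key=lambda p: math.hypot(p['lonlat'][0] - cx,
                                                  p['lonlat'][1] - cy))]

def random_path_sequence(pois, seed=None):
    # Discard spatial order entirely and shuffle the POIs uniformly at random.
    rng = random.Random(seed)
    shuffled = list(pois)
    rng.shuffle(shuffled)
    return [p['type'] for p in shuffled]
```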
3.5. Pre-Training GeoBERT
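As a concrete, hedged illustration of this pre-training step, the sketch below uses the HuggingFace transformers implementation of BERT. The 12-layer/12-head size follows Figure 11 and the 64-POI cap follows Figure 3; the vocabulary file `poi_vocab.txt`, the toy corpus, and all other hyperparameters are assumptions for illustration, not the authors' settings.

```python
# Minimal masked-POI pre-training sketch with HuggingFace transformers.
from transformers import (BertConfig, BertForMaskedLM, BertTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

# Word-level vocabulary of POI types plus the BERT special tokens (hypothetical file).
tokenizer = BertTokenizer("poi_vocab.txt", do_lower_case=False)

config = BertConfig(
    vocab_size=tokenizer.vocab_size,
    num_hidden_layers=12,            # 12 layers x 12 heads, as in Figure 11
    num_attention_heads=12,
    max_position_embeddings=128,     # POI sequences are capped at 64 tokens (Figure 3)
)
model = BertForMaskedLM(config)

# One "sentence" of POI-type tokens per grid (cf. Table 16); toy examples here.
corpus = ["Teahouse RealEstate Store Restaurant", "Mall Hotel Office KTV"]
train_dataset = [tokenizer(text) for text in corpus]

# Randomly mask POI tokens and train the model to recover them; the ablation
# in Section 5.2 varies this masking ratio between 15% and 70%.
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=True, mlm_probability=0.15)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="geobert-pretrain"),
    data_collator=collator,
    train_dataset=train_dataset,
)
trainer.train()
```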
3.6. Fine-Tuning GeoBERT
3.6.1. POI Number Prediction
3.6.2. Work/Living Area Prediction
3.6.3. Passenger Flow Prediction
3.6.4. House Price Prediction
3.6.5. Store Site Recommendation
- First, the data for just one bank brand in one city are too scarce, so we use other similar large chain joint-stock banks for data augmentation. Specifically, we select nine other large joint-stock bank brands similar to the target brand.
- Second, the grids containing normally operating banks of the selected brands are taken as positive samples.
- Third, we build the negative samples, which consist of two parts. The first part is the banks of the target brand that have already closed. The second part is the non-bank POIs within 500 m of the normally operating banks of the target brand. A rough sketch of these labelling rules is given after this list.
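The following is a rough sketch of the three labelling rules above (our own code, not the authors'); the field names `brand`, `status`, `lonlat`, and `geohash7` are hypothetical, and distances are measured with the haversine formula.

```python
# Rough sketch of the store-site dataset construction described above.
import math

def haversine_m(a, b):
    # Great-circle distance in metres between two (lon, lat) points.
    lon1, lat1, lon2, lat2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371000 * math.asin(math.sqrt(h))

def build_samples(pois, bank_brands):
    """pois: dicts with 'type', 'brand', 'status', 'lonlat', 'geohash7' (hypothetical schema)."""
    open_banks = [p for p in pois if p['brand'] in bank_brands and p['status'] == 'open']
    closed_banks = [p for p in pois if p['brand'] in bank_brands and p['status'] == 'closed']
    positives = {p['geohash7'] for p in open_banks}        # grids with an operating bank
    negatives = {p['geohash7'] for p in closed_banks}      # grids of banks that closed
    for p in pois:                                          # grids of non-bank POIs near open banks
        if p['type'] != 'bank' and any(haversine_m(p['lonlat'], b['lonlat']) <= 500
                                       for b in open_banks):
            negatives.add(p['geohash7'])
    return positives, negatives - positives                 # drop overlaps (our simplification)
```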
- Passenger flow: Two kinds of features are used to measure the level of passenger flow in a region. First, we calculate the passenger flow of each grid in each hour t of the day. Second, we aggregate the eight grids around the current grid into a block to measure the surrounding environment; a block consists of 9 (3 × 3) adjacent grids. Block features follow the same calculation principle as grid features but over the larger area (9 grids), see Equation (1), so their formulas are omitted for the remaining features.
- Diversity: We calculate the number of POIs of different categories in each grid to reflect the diversification and heterogeneity of the environment, as shown in Equation (2), which involves the number of POIs of category c in the grid, the total number of POIs, and the set of all POI categories. The diversity of a block is calculated in the same way.
- Competitiveness: Stores of the same category in a region form a competitive relationship and influence each other. We define the competitiveness feature in Equation (3), based on the total number of stores of the same category as the target store in the area around candidate location j and on the number of same-category stores, excluding the target stores, in the grid.
- Traffic convenience: This reflects accessibility to different means of transportation (e.g., bus station, subway station, ferry station), as shown in Equation (4), which counts the stations of each transportation type in Grid i.
- Residence: This reflects the surrounding residential conditions; specifically, the number of residential buildings of different grades, for example, ordinary residences, high-grade residences, and villas. Illustrative computations of some of these features are sketched after this list.
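Equations (1)–(4) are not reproduced in this extract, so the sketch below gives one plausible reading of the diversity, competitiveness, and traffic-convenience descriptions above rather than the authors' exact formulas; the `grid_pois` structure, the entropy form of diversity, and the category strings are our assumptions.

```python
# Illustrative grid-level feature computations for the store-site task.
# `grid_pois` maps a level-7 geohash string to the list of POI categories inside it.
import math
from collections import Counter

def diversity(grid_pois, gid):
    # Entropy over the category proportions n_c / n in grid `gid`
    # (one common way to combine the quantities named in the Diversity description).
    counts = Counter(grid_pois[gid])
    total = sum(counts.values())
    return -sum((n / total) * math.log(n / total) for n in counts.values())

def competitiveness(grid_pois, gid, target_category):
    # Number of stores of the same category as the target store in grid `gid`.
    return Counter(grid_pois[gid])[target_category]

def traffic_convenience(grid_pois, gid,
                        transport_types=('bus station', 'subway station', 'ferry station')):
    # Total number of stations of the listed transportation types in grid `gid`.
    return sum(1 for c in grid_pois[gid] if c in transport_types)
```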
- Concat: We directly concatenate the output of the transformer layer in GeoBERT with all additional features before the final classifier layer. There is no additional preprocessing.
- MLP: We first pass the additional features through an MLP (Multilayer Perceptron) layer and then concatenate the result with the transformer output before the final classifier layer.
- Weighted: We set a learnable weight matrix W over the dimensions of the additional features and then sum the weighted features with the transformer outputs before the final classifier layer.
- Gating: We perform a gated summation of the transformer outputs and the additional features before the final classifier layer, controlled by a hyperparameter and an activation function R. Detailed information can be found in [43]. A PyTorch sketch of these four methods follows this list.
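The sketch below illustrates, in plain PyTorch, one way the four fusion methods could look; the dimensions, the projection of the additional features, and the exact gate are our assumptions (the paper's gating follows [43]), so this is a sketch rather than the authors' implementation.

```python
# Illustrative fusion of the GeoBERT grid embedding h with additional features x.
import torch
import torch.nn as nn

class Fusion(nn.Module):
    def __init__(self, h_dim=768, x_dim=32, mode="mlp"):
        super().__init__()
        self.mode = mode
        self.mlp = nn.Sequential(nn.Linear(x_dim, h_dim), nn.ReLU())   # "MLP"
        self.W = nn.Parameter(torch.ones(x_dim))                        # "Weighted"
        self.proj = nn.Linear(x_dim, h_dim)                             # maps x to h's dimension
        self.gate = nn.Linear(h_dim + x_dim, h_dim)                     # "Gating"

    def forward(self, h, x):
        if self.mode == "concat":        # plain concatenation, no preprocessing
            return torch.cat([h, x], dim=-1)
        if self.mode == "mlp":           # MLP over x, then concatenate
            return torch.cat([h, self.mlp(x)], dim=-1)
        if self.mode == "weighted":      # per-dimension learnable weights, then sum with h
            return h + self.proj(self.W * x)
        # "gating": one common form of gated summation (cf. [43])
        g = torch.sigmoid(self.gate(torch.cat([h, x], dim=-1)))
        return h + g * self.proj(x)

# Example: fuse a batch of grid embeddings with 32-dimensional extra features.
h = torch.randn(4, 768)
x = torch.randn(4, 32)
fused = Fusion(mode="gating")(h, x)      # (4, 768); the two concat modes yield wider vectors
```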
4. Experiments and Results
4.1. Baseline and Setup
4.1.1. Baseline
- GloVe (2021): Proposed by Zhang et al. [5]. Following the original paper, we set the POI-type vector dimension to 70, the window size to 10, and the number of epochs to 10. After training the POI-type embeddings, we use LightGBM for the downstream tasks, as sketched below.
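The following is a minimal sketch of this baseline pipeline (our own illustration with toy data): each grid is represented by the mean of the 70-dimensional vectors of its POI types, and LightGBM is fitted on top; the toy target below simply counts POIs and stands in for any of the downstream labels.

```python
# Baseline sketch: averaged POI-type embeddings per grid + LightGBM.
import numpy as np
import lightgbm as lgb

def grid_vector(poi_types, type_vectors, dim=70):
    # Mean of the pre-trained 70-dimensional POI-type vectors found in the grid.
    vecs = [type_vectors[t] for t in poi_types if t in type_vectors]
    return np.mean(vecs, axis=0) if vecs else np.zeros(dim)

# Toy embeddings and toy grids purely for illustration.
rng = np.random.default_rng(0)
type_vectors = {t: rng.normal(size=70) for t in ("Restaurant", "Mall", "Hotel", "Office")}
grids = [rng.choice(list(type_vectors), size=rng.integers(1, 6)).tolist() for _ in range(200)]

X = np.stack([grid_vector(g, type_vectors) for g in grids])
y = np.array([len(g) for g in grids], dtype=float)   # toy regression target (POI count)

model = lgb.LGBMRegressor(n_estimators=100).fit(X, y)
print(model.predict(X[:5]))
```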
4.1.2. Setup
4.2. Downstream Task Results
- POI Number Prediction: The POI number prediction results in Table 9 show that GeoBERT significantly outperformed Word2vec and GloVe on all three training corpora. Overall, Word2vec performed better than GloVe, while GeoBERT pre-trained on the shortest path corpus achieved the best result (0.1790 in MSE and 0.1343 in MAE).
- Working/Living Area Prediction: The results of working and living area prediction are shown in Table 10. The GeoBERT series outperformed the Word2vec and GloVe series, and GeoBERT pre-trained on the random path corpus obtained the best results, with an Accuracy of 0.7739 and an F1-score of 0.7719. However, the differences both between and within groups were small: the best model, GeoBERT-RandomPath, improved on the worst model by 4.91% in Accuracy and 8.83% in F1-score.
- Passenger Flow Prediction: As shown in Table 11, GeoBERT's results on passenger flow prediction far exceeded those of Word2vec and GloVe. GeoBERT pre-trained on the shortest path corpus obtained the best results, with 0.1446 in MSE and 0.1809 in MAE. The differences among the three corpora were not significant.
- House Price Prediction: As shown in Table 12, GeoBERT significantly outperformed the other two models, and the performance gaps between the three models were substantial. GeoBERT pre-trained on the shortest path corpus obtained the best result.
- Store Site Recommendation: The results of store site recommendation with POIs only are shown in Table 13. GeoBERT achieved better performance overall, and GeoBERT pre-trained on the center distance corpus obtained the best result, with 0.8359 in Accuracy and 0.7922 in F1-score; the differences between the three GeoBERT models are small. The results with additional features are shown in Table 14. With additional features, both Accuracy and F1-score improved in all cases. Among the combination methods, MLP performed best, increasing Accuracy by 2.45% and F1-score by 3.04%. These two experiments illustrate the following points:
- GeoBERT obtains good grid embeddings from POI data that can be used directly for store site recommendation.
- GeoBERT is extensible and can be used jointly with additional features; performance improves further when data from more dimensions are provided.
GeoBERT was pre-trained solely on POI data, which can be seen as static urban information, whereas additional features, such as hourly passenger flow, provide dynamic urban information. In practice, however, additional features such as user profiles and passenger flow are hard to access and often subject to privacy restrictions, while POIs are not. Therefore, GeoBERT alone already achieves strong results, which demonstrates its practical value. In summary, POI data are the most readily available urban data and carry rich geospatial information, while other features, especially user consumption and travel behavior, add information from further dimensions. GeoBERT is effective on its own, and additional features make it better for more specific tasks.
5. Discussion
5.1. Result on Downstream Tasks
5.2. Ablation Study
5.3. What Does GeoBERT Actually Learn?—Part 1: Distilling Common Patterns
5.3.1. Pattern 1: Attention to Next Token
5.3.2. Pattern 2: Attention to Previous Token
5.3.3. Pattern 3: Long-Distance Dependencies
5.4. What Does GeoBERT Actually Learn?—Part 2: Deeper Insights
5.4.1. Question 1: What Is the Difference between the Three POI Sequence Construction Methods?
5.4.2. Question 2: Why Do the Three Models Have Similar Effects on All Five Downstream Tasks?
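The attention analyses in Sections 5.3 and 5.4 (Figures 11–21) can in principle be reproduced by exporting GeoBERT's attention weights and rendering them with the visualization tool of [47]; the sketch below assumes a HuggingFace-format checkpoint at the hypothetical path "geobert-shortestpath" and the vocabulary file from the earlier pre-training sketch.

```python
# Sketch: extract attention maps from a pre-trained GeoBERT checkpoint and
# visualise them with bertviz [47]; paths and the example sequence are illustrative.
import torch
from transformers import BertForMaskedLM, BertTokenizer
from bertviz import head_view

tokenizer = BertTokenizer("poi_vocab.txt", do_lower_case=False)
model = BertForMaskedLM.from_pretrained("geobert-shortestpath", output_attentions=True)

sequence = "Teahouse RealEstate Store Restaurant Massage Express"   # one grid's POI types
inputs = tokenizer(sequence, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
head_view(outputs.attentions, tokens)   # 12 layers x 12 heads per layer, as in Figure 11
```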
6. Conclusions and Future Work
6.1. Conclusions
- The shortest path and center distance contain the position information among POIs in a grid, while the random path method does not.
- GeoBERT learns the position information in the shallow attention layers. In deep attention layers, GeoBERT captures co-occurrence among POIs and identifies the most important POIs, called the anchor POIs in a grid.
- The sequential relationship between POIs does not play an important role. What matters is the co-occurrence among POIs and the specific anchor POIs learned in deep attention layers.
6.2. Future Work
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Yao, Z.; Fu, Y.; Liu, B.; Hu, W.; Xiong, H. Representing urban functions through zone embedding with human mobility patterns. In Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence (IJCAI-18), Stockholm, Sweden, 13–19 July 2018. [Google Scholar]
- Huang, C.; Zhang, J.; Zheng, Y.; Chawla, N.V. DeepCrime: Attentive hierarchical recurrent networks for crime prediction. In Proceedings of the 27th ACM International Conference on Information and Knowledge Management, Turin, Italy, 22–26 October 2018; pp. 1423–1432. [Google Scholar]
- Yao, Y.; Li, X.; Liu, X.; Liu, P.; Liang, Z.; Zhang, J.; Mai, K. Sensing spatial distribution of urban land use by integrating points-of-interest and Google Word2Vec model. Int. J. Geogr. Inf. Sci. 2017, 31, 825–848. [Google Scholar] [CrossRef]
- Niu, H.; Silva, E.A. Delineating urban functional use from points of interest data with neural network embedding: A case study in Greater London. Comput. Environ. Urban Syst. 2021, 88, 101651. [Google Scholar] [CrossRef]
- Zhang, C.; Xu, L.; Yan, Z.; Wu, S. A glove-based poi type embedding model for extracting and identifying urban functional regions. ISPRS Int. J. Geo-Inf. 2021, 10, 372. [Google Scholar] [CrossRef]
- Yan, B.; Janowicz, K.; Mai, G.; Gao, S. From itdl to place2vec: Reasoning about place type similarity and relatedness by learning embeddings from augmented spatial contexts. In Proceedings of the 25th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, Redondo Beach, CA, USA, 7–10 November 2017; pp. 1–10. [Google Scholar]
- Devlin, J.; Chang, M.W.; Lee, K.; Toutanova, K. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. arXiv 2018, arXiv:1810.04805. [Google Scholar]
- Radford, A.; Narasimhan, K.; Salimans, T.; Sutskever, I. Improving Language Understanding by Generative Pre-Training. 2018. Available online: https://www.cs.ubc.ca/~amuham01/LING530/papers/radford2018improving.pdf (accessed on 23 November 2022).
- Mai, G.; Janowicz, K.; Hu, Y.; Gao, S.; Yan, B.; Zhu, R.; Cai, L.; Lao, N. A review of location encoding for GeoAI: Methods and applications. Int. J. Geogr. Inf. Sci. 2022, 36, 639–673. [Google Scholar] [CrossRef]
- Yuan, J.; Zheng, Y.; Xie, X. Discovering regions of different functions in a city using human mobility and POIs. In Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Beijing, China, 12–16 August 2012; pp. 186–194. [Google Scholar]
- Gao, S.; Janowicz, K.; Couclelis, H. Extracting urban functional regions from points of interest and human activities on location-based social networks. Trans. GIS 2017, 21, 446–467. [Google Scholar] [CrossRef]
- Mikolov, T.; Sutskever, I.; Chen, K.; Corrado, G.S.; Dean, J. Distributed representations of words and phrases and their compositionality. Adv. Neural Inf. Process. Syst. 2013, 26. [Google Scholar]
- Feng, S.; Cong, G.; An, B.; Chee, Y.M. Poi2vec: Geographical latent representation for predicting future visitors. In Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, San Francisco, CA, USA, 4–9 February 2017. [Google Scholar]
- Xiang, M. Region2vec: An Approach for Urban Land Use Detection by Fusing Multiple Features. In Proceedings of the 2020 6th International Conference on Computing and Artificial Intelligence, Tianjin, China, 17–20 October 2020; pp. 13–18. [Google Scholar] [CrossRef]
- Zhu, M.; Wei, C.; Xia, J.; Ma, Y.; Zhang, Y. Location2vec: A Situation-Aware Representation for Visual Exploration of Urban Locations. IEEE Trans. Intell. Transp. Syst. 2019, 20, 3981–3990. [Google Scholar] [CrossRef]
- Sun, Z.; Jiao, H.; Wu, H.; Peng, Z.; Liu, L. Block2vec: An Approach for Identifying Urban Functional Regions by Integrating Sentence Embedding Model and Points of Interest. ISPRS Int. J. Geo-Inf. 2021, 10, 339. [Google Scholar] [CrossRef]
- Zhang, J.; Li, X.; Yao, Y.; Hong, Y.; He, J.; Jiang, Z.; Sun, J. The Traj2Vec model to quantify residents’ spatial trajectories and estimate the proportions of urban land-use types. Int. J. Geogr. Inf. Sci. 2021, 35, 193–211. [Google Scholar] [CrossRef]
- Shoji, Y.; Takahashi, K.; Dürst, M.J.; Yamamoto, Y.; Ohshima, H. Location2vec: Generating distributed representation of location by using geo-tagged microblog posts. In Proceedings of the International Conference on Social Informatics, Saint-Petersburg, Russia, 25–28 September 2018; pp. 261–270. [Google Scholar]
- Zhang, Y.; Li, Q.; Tu, W.; Mai, K.; Yao, Y.; Chen, Y. Functional urban land use recognition integrating multi-source geospatial data and cross-correlations. Comput. Environ. Urban Syst. 2019, 78, 101374. [Google Scholar] [CrossRef]
- McKenzie, G.; Adams, B. A data-driven approach to exploring similarities of tourist attractions through online reviews. J. Locat. Based Serv. 2018, 12, 94–118. [Google Scholar] [CrossRef]
- Zhang, Y.; Zheng, X.; Helbich, M.; Chen, N.; Chen, Z. City2vec: Urban knowledge discovery based on population mobile network. Sustain. Cities Soc. 2022, 85, 104000. [Google Scholar] [CrossRef]
- Huang, W.; Cui, L.; Chen, M.; Zhang, D.; Yao, Y. Estimating urban functional distributions with semantics preserved POI embedding. Int. J. Geogr. Inf. Sci. 2022, 36, 1–26. [Google Scholar] [CrossRef]
- Liu, Y.; Ott, M.; Goyal, N.; Du, J.; Joshi, M.; Chen, D.; Levy, O.; Lewis, M.; Zettlemoyer, L.; Stoyanov, V. Roberta: A robustly optimized bert pretraining approach. arXiv 2019, arXiv:1907.11692. [Google Scholar]
- Yang, Z.; Dai, Z.; Yang, Y.; Carbonell, J.; Salakhutdinov, R.R.; Le, Q.V. Xlnet: Generalized autoregressive pretraining for language understanding. Adv. Neural Inf. Process. Syst. 2019, 32. [Google Scholar]
- Lan, Z.; Chen, M.; Goodman, S.; Gimpel, K.; Sharma, P.; Soricut, R. Albert: A lite bert for self-supervised learning of language representations. arXiv 2019, arXiv:1909.11942. [Google Scholar]
- Brown, T.; Mann, B.; Ryder, N.; Subbiah, M.; Kaplan, J.D.; Dhariwal, P.; Neelakantan, A.; Shyam, P.; Sastry, G.; Askell, A.; et al. Language models are few-shot learners. Adv. Neural Inf. Process. Syst. 2020, 33, 1877–1901. [Google Scholar]
- Han, X.; Zhang, Z.; Ding, N.; Gu, Y.; Liu, X.; Huo, Y.; Qiu, J.; Yao, Y.; Zhang, A.; Zhang, L.; et al. Pre-trained models: Past, present and future. AI Open 2021, 2, 225–250. [Google Scholar] [CrossRef]
- Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is all you need. Adv. Neural Inf. Process. Syst. 2017, 30, 6000–6010. [Google Scholar]
- Dosovitskiy, A.; Beyer, L.; Kolesnikov, A.; Weissenborn, D.; Zhai, X.; Unterthiner, T.; Dehghani, M.; Minderer, M.; Heigold, G.; Gelly, S.; et al. An image is worth 16x16 words: Transformers for image recognition at scale. arXiv 2020, arXiv:2010.11929. [Google Scholar]
- He, K.; Chen, X.; Xie, S.; Li, Y.; Dollár, P.; Girshick, R. Masked autoencoders are scalable vision learners. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 19–20 June 2022; pp. 16000–16009. [Google Scholar]
- Bao, H.; Dong, L.; Wei, F. Beit: Bert pre-training of image transformers. arXiv 2021, arXiv:2106.08254. [Google Scholar]
- Alsentzer, E.; Murphy, J.R.; Boag, W.; Weng, W.H.; Jin, D.; Naumann, T.; McDermott, M. Publicly available clinical BERT embeddings. arXiv 2019, arXiv:1904.03323. [Google Scholar]
- Huang, K.; Altosaar, J.; Ranganath, R. Clinicalbert: Modeling clinical notes and predicting hospital readmission. arXiv 2019, arXiv:1904.05342. [Google Scholar]
- Fang, X.; Liu, L.; Lei, J.; He, D.; Zhang, S.; Zhou, J.; Wang, F.; Wu, H.; Wang, H. Geometry-enhanced molecular representation learning for property prediction. Nat. Mach. Intell. 2022, 4, 127–134. [Google Scholar] [CrossRef]
- Gu, Y.; Tinn, R.; Cheng, H.; Lucas, M.; Usuyama, N.; Liu, X.; Naumann, T.; Gao, J.; Poon, H. Domain-specific language model pretraining for biomedical natural language processing. ACM Trans. Comput. Healthc. 2021, 3, 1–23. [Google Scholar] [CrossRef]
- Lee, J.; Yoon, W.; Kim, S.; Kim, D.; Kim, S.; So, C.H.; Kang, J. BioBERT: A pre-trained biomedical language representation model for biomedical text mining. Bioinformatics 2020, 36, 1234–1240. [Google Scholar] [CrossRef]
- Beltagy, I.; Lo, K.; Cohan, A. SciBERT: A pretrained language model for scientific text. arXiv 2019, arXiv:1903.10676. [Google Scholar]
- Liu, X.; Yin, D.; Zhang, X.; Su, K.; Wu, K.; Yang, H.; Tang, J. Oag-bert: Pre-train heterogeneous entity-augmented academic language models. arXiv 2021, arXiv:2103.02410. [Google Scholar]
- Huang, J.; Wang, H.; Sun, Y.; Shi, Y.; Huang, Z.; Zhuo, A.; Feng, S. ERNIE-GeoL: A Geography-and-Language Pre-trained Model and its Applications in Baidu Maps. In Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, Washington, DC, USA, 14–18 August 2022; pp. 3029–3039. [Google Scholar]
- Zhou, J.; Gou, S.; Hu, R.; Zhang, D.; Xu, J.; Jiang, A.; Li, Y.; Xiong, H. A collaborative learning framework to tag refinement for points of interest. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, San Diego, CA, USA, 15–18 August 2019; pp. 1752–1761. [Google Scholar]
- Zhu, Y.; Kiros, R.; Zemel, R.; Salakhutdinov, R.; Urtasun, R.; Torralba, A.; Fidler, S. Aligning books and movies: Towards story-like visual explanations by watching movies and reading books. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 7–13 December 2015; pp. 19–27. [Google Scholar]
- Lu, W.; Tao, C.; Li, H.; Qi, J.; Li, Y. A unified deep learning framework for urban functional zone extraction based on multi-source heterogeneous data. Remote Sens. Environ. 2022, 270, 112830. [Google Scholar] [CrossRef]
- Rahman, W.; Hasan, M.K.; Lee, S.; Zadeh, A.; Mao, C.; Morency, L.P.; Hoque, E. Integrating multimodal information in large pretrained transformers. NIH Public Access 2020, 2020, 2359. [Google Scholar]
- Ke, G.; Meng, Q.; Finley, T.; Wang, T.; Chen, W.; Ma, W.; Ye, Q.; Liu, T.Y. Lightgbm: A highly efficient gradient boosting decision tree. Adv. Neural Inf. Process. Syst. 2017, 30. [Google Scholar]
- Vig, J. A Multiscale Visualization of Attention in the Transformer Model. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics: System Demonstrations, Florence, Italy, 28 July–2 August 2019; pp. 37–42. [Google Scholar] [CrossRef]
POI Type | Number | Proportions |
---|---|---|
accommodation | 930,307 | 5.62% |
enterprise and business | 2,513,793 | 15.18% |
restaurant | 2,498,107 | 15.09% |
shopping | 3,603,615 | 21.76% |
transportation | 1,385,916 | 8.37% |
life services | 2,515,165 | 15.19% |
sport and leisure | 865,208 | 5.23% |
science and education | 725,916 | 4.38% |
health and medical | 683,560 | 4.13% |
government | 664,317 | 4.01% |
public facilities | 171,141 | 1.03% |
total | 16,557,045 | 100.00% |
Geohash Length (Level) | Cell Length | Cell Width |
---|---|---|
1 | ≤5000 km | ≤5000 km |
2 | ≤1250 km | ≤625 km |
3 | ≤156 km | ≤156 km |
4 | ≤39.1 km | ≤19.5 km |
5 | ≤4.89 km | ≤4.89 km |
6 | ≤1.22 km | ≤0.61 km |
7 | ≤153 m | ≤153 m |
8 | ≤19.1 m | ≤19.1 m |
Count | Mean | Std | Min | 25% | 50% | 75% | Max |
---|---|---|---|---|---|---|---|
61,521 | 21.97 | 35.15 | 3 | 4 | 9 | 25 | 883 |
Living Area | Working Area | Total |
---|---|---|
34,049 | 28,156 | 62,205 |
Count | Mean | Std | Min | 25% | 50% | 75% | Max |
---|---|---|---|---|---|---|---|
1,262,380 | 955.16 | 2951.65 | 1 | 13 | 88 | 580 | 649,795 |
Count | Mean | Std | Min | 25% | 50% | 75% | Max |
---|---|---|---|---|---|---|---|
412,662 | 624,873 | 54,706 | 3742 | 29,415 | 48,574 | 84,546 | 361,633 |
Positive Samples (Suitable for Opening a Store) | Negative Samples (Unsuitable) | Total
---|---|---
701 | 1249 | 1950
Notations | Description
---|---
 | a level-7 Geohash grid
 | the block of 9 (3 × 3) adjacent grids with the current grid in the center
 | the number of POIs in a grid
 | the final embedding after merging the GeoBERT output and additional features
h | the output of the last transformer layer in GeoBERT
x | additional features
W | a weight matrix for additional features
MLP | abbreviation for Multilayer Perceptron
 | concatenation operation
R | activation function
 | additional parameters used only in the Gating method
Model | MSE | MAE |
---|---|---|
GeoBERT-CenterDistance | 0.1932 | 0.1492 |
GeoBERT-ShortestPath | 0.1790 | 0.1343 |
GeoBERT-RandomPath | 0.1994 | 0.1383 |
Word2Vec-CenterDistance | 0.2503 | 0.2354 |
Word2Vec-ShortestPath | 0.2474 | 0.2330 |
Word2Vec-RandomPath | 0.2659 | 0.2385 |
GloVe-CenterDistance | 0.3824 | 0.3150 |
GloVe-ShortestPath | 0.3958 | 0.3147 |
GloVe-RandomPath | 0.4082 | 0.3225 |
Model | Accuracy | F1-Score |
---|---|---|
GeoBERT-CenterDistance | 0.7736 | 0.7712 |
GeoBERT-ShortestPath | 0.7729 | 0.7677 |
GeoBERT-RandomPath | 0.7739 | 0.7719 |
Word2Vec-CenterDistance | 0.7642 | 0.7359 |
Word2Vec-ShortestPath | 0.7626 | 0.7337 |
Word2Vec-RandomPath | 0.7638 | 0.7344 |
GloVe-CenterDistance | 0.7398 | 0.7093 |
GloVe-ShortestPath | 0.7454 | 0.7144 |
GloVe-RandomPath | 0.7377 | 0.7101 |
Model | MSE | MAE |
---|---|---|
GeoBERT-CenterDistance | 0.1491 | 0.1825 |
GeoBERT-ShortestPath | 0.1446 | 0.1809 |
GeoBERT-RandomPath | 0.1557 | 0.1901 |
Word2Vec-CenterDistance | 0.2563 | 0.2916 |
Word2Vec-ShortestPath | 0.2567 | 0.2920 |
Word2Vec-RandomPath | 0.2569 | 0.2913 |
GloVe-CenterDistance | 0.3825 | 0.3703 |
GloVe-ShortestPath | 0.3772 | 0.3651 |
GloVe-RandomPath | 0.3865 | 0.3700 |
Model | MSE | MAE |
---|---|---|
GeoBERT-CenterDistance | 0.0574 | 0.1578 |
GeoBERT-ShortestPath | 0.0556 | 0.1559 |
GeoBERT-RandomPath | 0.0674 | 0.1724 |
Word2Vec-CenterDistance | 0.3192 | 0.4177 |
Word2Vec-ShortestPath | 0.3190 | 0.4182 |
Word2Vec-RandomPath | 0.3227 | 0.4188 |
GloVe-CenterDistance | 0.4889 | 0.5079 |
GloVe-ShortestPath | 0.4935 | 0.5101 |
GloVe-RandomPath | 0.4945 | 0.5098 |
Model | Accuracy | F1-Score |
---|---|---|
GeoBERT-CenterDistance | 0.8359 | 0.7922 |
GeoBERT-ShortestPath | 0.8256 | 0.7777 |
GeoBERT-RandomPath | 0.8358 | 0.7908 |
Word2Vec-CenterDistance | 0.7846 | 0.7042 |
Word2Vec-ShortestPath | 0.8000 | 0.7254 |
Word2Vec-RandomPath | 0.7821 | 0.6931 |
GloVe-CenterDistance | 0.6744 | 0.5171 |
GloVe-ShortestPath | 0.6923 | 0.5455 |
GloVe-RandomPath | 0.6799 | 0.5039 |
Concat Method | Accuracy | F1-Score |
---|---|---|
MLP | 0.8564 (+2.45%) | 0.8163 (+3.04%) |
Gating | 0.8538 (+2.14%) | 0.8119 (+2.49%) |
Weighted | 0.8436 (+0.92%) | 0.8103 (+2.28%) |
Concat | 0.8435 (+0.91%) | 0.8000 (+0.98%) |
POIs Only | 0.8359 (+0.00%) | 0.7922 (+0.00%) |
Mask Ratio | Center Distance MSE | Center Distance MAE | Shortest Path MSE | Shortest Path MAE | Random Sequence MSE | Random Sequence MAE
---|---|---|---|---|---|---
15% | 0.1677 | 0.2065 | 0.1697 | 0.2109 | 0.1713 | 0.2090 |
30% | 0.1665 | 0.2066 | 0.1652 | 0.2045 | 0.1718 | 0.2114 |
50% | 0.1726 | 0.2108 | 0.1715 | 0.2095 | 0.1716 | 0.2111 |
70% | 0.1754 | 0.2132 | 0.1653 | 0.2015 | 0.1763 | 0.2166 |
Shortest Path |
---|
‘[CLS]’, ‘Teahouse’, ‘Real Estate’, ‘Store’, ‘Restaurant’, ‘Massage’, ‘Express’, ‘Construction’, ‘Chinese Food’, ‘teahouse’, ‘Chinese Food’, ‘Restaurant’, ‘Park’, ‘Mall Store’, ‘Chinese Food’, ‘KTV’, ‘Office’, ‘Office’, ‘Restaurant’, ‘KTV’, ‘Hotel’, ‘Furniture’, ‘Furniture’, ‘[SEP]’ |
Center Distance Path |
‘[CLS]’, ‘Furniture’, ‘Hotel’, ‘Furniture’, ‘KTV’, ‘KTV’, ‘Restaurant’, ‘Teahouse’, ‘Chinese Food’, ‘Chinese Food’, ‘Restaurant’, ‘Chinese Food’, ‘Mall’, ‘Office’, ‘Park’, ‘Construction’, ‘Office’, ‘Express’, ‘Massage’, ‘Restaurant’, ‘Store’, ‘Real Estate’, ‘Teahouse’, ‘[SEP]’ |
Random Path |
‘[CLS]’, ‘Real Estate’, ‘teahouse’, ‘Office’, ‘Hotel’, ‘Teahouse’, ‘Restaurant’, ‘Express’, ‘Massage’, ‘Store’, ‘Office’, ‘Chinese Food’, ‘Furniture’, ‘Park’, ‘Chinese Food’, ‘Chinese Food’, ‘Construction’, ‘Restaurant’, ‘Restaurant’, ‘KTV’, ‘Mall’, ‘Furniture’, ‘KTV’, ‘[SEP]’ |
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).