Automatic Extraction and Cluster Analysis of Natural Disaster Metadata Based on the Unified Metadata Framework
Figure 1. Technical flow chart of this study.
Figure 2. UML structure of disaster metadata.
Figure 3. Schematic diagram of metadata extraction stored in a structured form.
Figure 4. Training losses and F1 scores with different learning rates and batch sizes on the development set. (a) Training losses with lr of 1 × 10−3, 1 × 10−4, and 3 × 10−4; (b) F1 scores with lr of 1 × 10−3, 1 × 10−4, and 3 × 10−4; (c) training losses with batch_size of 16, 32, and 64; (d) F1 scores with batch_size of 16, 32, and 64.
Figure 5. Comparison of the extraction effects of five disaster types before and after UIE model tuning. (a) Flood metadata; (b) earthquake metadata; (c) landslide metadata; (d) mudslide metadata; (e) avalanche metadata.
Figure 6. Consistency, completeness, and accuracy evaluation results at the data table dimension after import into the database.
Figure 7. Consistency, completeness, and accuracy evaluation results at the data item dimension of each data table after import into the database. (a) DSTi: Dataset Title; DSID: Dataset Identifier ID; DSla: Dataset language; CS: Character set; DSTy: Dataset type; DF: Data format; Keyw: Keywords; SH: Subject headings; DC: Data category; CI: Coverage information; CN: Contact unit; DT: Disaster type; DP: Disaster process; DSCT: Dataset creation time; DSLUT: Dataset last update time. (b) LN: Layer name; I/R CD: Image/raster content description. (c) DQR: Data quality report; DL: Data log. (d) SR: Security restrictions; LR: Legal restrictions. (e) DR: Distributor resource; TM: Transmission method; Fo: Format. (f) MDLa: Metadata language; MDCT: Metadata creation time; MDLUT: Metadata latest update time; MDSN: Metadata standard name; MDSV: Metadata standard version; MDCI: Metadata contact information. (g) TR: Time reference; CR: Coordinate reference; SRBGI: Spatial referencing based on geographical identifiers. (h) Ci: Citation; RN: Responsible unit; Co: Contact; OR: Online resources. (i) EEI: Extended element information; EOR: Extended online resources. (j) De: Description; GC: Geographic coverage; VRI: Vertical range information; TR: Time range.
Figure 8. Determining the optimal K value based on various metrics. (a) SSE; (b) Calinski–Harabasz; (c) Davies–Bouldin.
Figure 9. Word2vec–Kmeans cluster analysis results, taking flood data as an example.
Abstract
1. Introduction
- (1) We construct a unified metadata model framework for natural disasters based on core metadata and complete metadata.
- (2) UIE and Python parsing libraries are used to automatically extract disaster metadata stored in unstructured and structured forms, and corresponding constraint rules are formulated to establish an evaluation system across the three dimensions of consistency, completeness, and accuracy.
- (3) The Word2vec-Kmeans clustering algorithm is applied to cluster the extraction results.
2. Research Design
2.1. Experimental Setup and Data Pre-Processing
- (1) Clean the original text data: remove special characters and other meaningless characters to reduce the impact of noise on model training, collapse consecutive repeated characters and redundant whitespace to simplify the text structure and speed up analysis, and detect and delete duplicate records or duplicate passages to ensure the uniqueness of the data.
- (2) Segment the cleaned text into word sequences, convert them into machine-readable form, and filter out common stop words. For several datasets, segmentation by jieba is compared with the Natural Language Processing and Information Retrieval (NLPIR) toolkit. As Table 2 shows, the precision of jieba is 65.54%, while the precision and recall of NLPIR are considerably lower. When processing disaster metadata text, jieba segmentation therefore performs better, with much higher precision than NLPIR and relatively fewer cases of text loss.
- (3) After tokenization, perform lemmatization and part-of-speech tagging. Use the Doccano tool to annotate the entities, relationships, and other information in each text, and convert the data into the JSON format accepted by the model. A minimal pre-processing sketch follows this list.
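To make these pre-processing steps concrete, below is a minimal sketch (assuming jieba for segmentation; the stop-word list and sample records are purely illustrative, not the study's actual resources):

```python
import re
import json

import jieba

# Purely illustrative stop-word list; in practice it would be loaded from a file.
STOP_WORDS = {"的", "了", "和", "是", "在"}

def clean_text(text: str) -> str:
    """Remove special symbols, collapse repeated characters, and normalise whitespace."""
    text = re.sub(r"[^\w\u4e00-\u9fff\s.,;:()\-]", "", text)  # drop special/meaningless characters
    text = re.sub(r"(.)\1{2,}", r"\1", text)                  # collapse runs of the same character
    return re.sub(r"\s+", " ", text).strip()                  # normalise whitespace

def tokenize(text: str) -> list:
    """Segment cleaned Chinese text with jieba and filter out stop words."""
    return [w for w in jieba.lcut(text) if w.strip() and w not in STOP_WORDS]

raw_records = [
    "洪水发生时间：2020年7月，数据格式：SHP！！！",
    "洪水发生时间：2020年7月，数据格式：SHP！！！",  # duplicate record
]
cleaned = list(dict.fromkeys(clean_text(r) for r in raw_records))  # de-duplicate while keeping order
tokens = [tokenize(r) for r in cleaned]
print(json.dumps(tokens, ensure_ascii=False))
```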
2.2. Unified Disaster Metadata Model Construction
2.3. Disaster Metadata Extraction
2.3.1. UIE-Based Extraction of Disaster Metadata Stored in Unstructured Form
- Model forward propagation: Input the text of the given metadata into the trained UIE model for forward propagation. During the forward propagation process, the model processes the text and generates various predictions, including entity boundaries (start and end positions), entity types, and relationships between entities.
- Output layer: The model’s output layer typically includes a softmax function. For each predicted category, the softmax function normalizes the predicted scores, converting them into probability form. This ensures that the probabilities of all categories sum up to 1.
- Probability calculation: For each prediction, the softmax function in the output layer produces a probability value indicating how confident the model is that the prediction is correct. A sketch of this inference step is given below.
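As an illustration of this inference step, the sketch below uses PaddleNLP's Taskflow interface to a UIE model; the schema entries, sample sentence, and example output are assumptions for illustration rather than the authors' configuration (a fine-tuned checkpoint could be loaded via the task_path argument):

```python
# Prompt-based extraction with a UIE model via PaddleNLP's Taskflow (illustrative only).
from paddlenlp import Taskflow

schema = ["灾害类型", "发生时间", "地理位置", "数据格式"]  # assumed target entity types
ie = Taskflow("information_extraction", schema=schema)      # add task_path="./checkpoint" for a fine-tuned model

text = "2020年7月河南省郑州市发生洪涝灾害，数据格式为SHP。"
results = ie(text)
# Each hit carries the matched span, its character offsets, and a probability from the
# model's output layer, e.g.:
# [{'灾害类型': [{'text': '洪涝灾害', 'start': ..., 'end': ..., 'probability': 0.97}], ...}]
for item in results:
    for label, spans in item.items():
        for span in spans:
            print(label, span["text"], round(span["probability"], 3))
```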
2.3.2. Analysis of Disaster Metadata Stored in a Structured Form
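As a hedged illustration of parsing metadata stored in structured form, the sketch below uses pandas to pull dataset-level fields from a tabular file; the function name, field mapping, and file name are hypothetical and not the authors' implementation.

```python
# Minimal sketch (assumed workflow): collect dataset-level metadata from a structured file.
import os
from pathlib import Path

import pandas as pd  # reading .xlsx additionally requires openpyxl

def describe_tabular_file(path: str) -> dict:
    """Gather basic metadata fields from a CSV/XLS/XLSX file."""
    p = Path(path)
    df = pd.read_excel(p) if p.suffix.lower() in {".xls", ".xlsx"} else pd.read_csv(p)
    return {
        "Dataset title": p.stem,
        "Data format": p.suffix.lstrip(".").upper(),
        "Data size (bytes)": os.path.getsize(p),
        "Field names": list(df.columns),
        "Record count": len(df),
        "Dataset last update time": pd.Timestamp(os.path.getmtime(p), unit="s").isoformat(),
    }

# print(describe_tabular_file("flood_events_2021.xlsx"))  # hypothetical file name
```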
2.4. Quality Assessment of Disaster Metadata
- (1) Completeness assessment
- (2) Consistency assessment
- (3) Accuracy assessment
- (4) Model accuracy assessment (the standard precision/recall/F1 definitions are recalled below)
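The model accuracy assessment uses the standard precision (P), recall (R), and F1 metrics reported in the results tables; for reference, with TP, FP, and FN denoting true positives, false positives, and false negatives:

```latex
P = \frac{TP}{TP + FP}, \qquad
R = \frac{TP}{TP + FN}, \qquad
F_1 = \frac{2 \cdot P \cdot R}{P + R}
```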
2.5. Cluster Analysis of Disaster Metadata
3. Results
3.1. UIE Model Optimization
3.2. Disaster Metadata Extraction and Quality Assessment
3.2.1. Results of Disaster Metadata Extraction
3.2.2. Quality Assessment of Extraction Results
3.3. Disaster Metadata Cluster
4. Discussion
5. Conclusions
- (1) The UIE and Python parsing libraries were utilized to automatically extract disaster metadata stored in structured and unstructured forms. The experimental results show that the extraction performance of UIE for five types of natural disasters (floods, earthquakes, landslides, mudslides, and avalanches) improved by more than 50% when the learning rate was set to 0.0001 and the batch size to 32, which achieved the best extraction results for disaster metadata.
- (2) Under the three dimensions of consistency, completeness, and accuracy, the metadata standards and the unified disaster metadata model framework designed in this study showed good applicability to the five natural disaster types (floods, earthquakes, landslides, mudslides, and avalanches) at both the data table dimension and the data item dimension. Furthermore, completeness scored slightly better than consistency and accuracy.
- (3) The Word2vec model and the K-means algorithm were combined to cluster the metadata of the flood dataset; the records grouped into five main themes: contact information, location information, time information, format information, and content information. At a 90% confidence level for the centroids, the clustering results covered most of the information. Within groups, samples in each cluster were highly similar, indicating low internal dissimilarity and a relatively concentrated distribution of the text clusters. Between groups, differences were significant at the 0.01 level compared with within-group variation, although some clusters overlapped slightly. Overall, the clustering effect was good. A minimal sketch of this clustering pipeline follows.
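For readers who want to reproduce the general idea, the sketch below outlines a Word2vec-Kmeans pipeline of the kind described above, using gensim and scikit-learn; the toy records, vector size, and K range are illustrative assumptions, with K = 5 matching the number of themes reported in this study.

```python
# Minimal sketch (assumed pipeline, not the authors' code): embed tokenised metadata
# records with Word2vec, choose K with SSE / Calinski–Harabasz / Davies–Bouldin, then cluster.
import numpy as np
from gensim.models import Word2Vec
from sklearn.cluster import KMeans
from sklearn.metrics import calinski_harabasz_score, davies_bouldin_score

# Toy tokenised metadata records (illustrative only).
base = [["flood", "shp", "henan", "2020"], ["flood", "csv", "contact", "email"],
        ["earthquake", "tiff", "sichuan", "2019"], ["landslide", "doc", "yunnan", "2018"],
        ["flood", "time", "range", "2021"], ["mudslide", "xlsx", "gansu", "2017"]]
docs = [d for d in base for _ in range(8)]

w2v = Word2Vec(sentences=docs, vector_size=100, window=5, min_count=1, epochs=50)

def doc_vector(tokens):
    """Represent a record as the mean of its word vectors."""
    return np.mean([w2v.wv[t] for t in tokens if t in w2v.wv], axis=0)

X = np.vstack([doc_vector(d) for d in docs])

for k in range(2, 7):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    print(k, km.inertia_,                                  # SSE
          calinski_harabasz_score(X, km.labels_),
          davies_bouldin_score(X, km.labels_))

labels = KMeans(n_clusters=5, n_init=10, random_state=0).fit_predict(X)  # K = 5 as in the study
```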
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Shi, K.; Peng, X.; Lu, H.; Zhu, Y.; Niu, Z. Application of Social Sensors in Natural Disasters Emergency Management: A Review. IEEE Trans. Comput. Soc. Syst. 2023, 10, 3143–3158. [Google Scholar] [CrossRef]
- Ji, S.H.; Satish, N.; Li, S.; Dubey, P.K. Parallelizing Word2Vec in Shared and Distributed Memory. IEEE Trans. Parallel Distrib. Syst. 2019, 30, 2090–2100. [Google Scholar] [CrossRef]
- Liao, Y.; Li, B.; Lv, X.; Cheng, C. Method of Multi-type Disaster Data Organization and Management Based on GeoSOT. Geogr. Geo-Inf. Sci. 2013, 29, 36–40. [Google Scholar]
- Jony, R.I.; Woodley, A.; Perrin, D. Flood Detection in Social Media Images using Visual Features and Metadata. In Proceedings of the 2019 Digital Image Computing: Techniques and Applications (DICTA), Perth, WA, Australia, 2–4 December 2019. [Google Scholar] [CrossRef]
- Tian, Y.; Li, W. GeoAI for Knowledge Graph Construction: Identifying Causality Between Cascading Events to Support Environmental Resilience Research. arXiv 2022, arXiv:2211.06011. [Google Scholar]
- Molina, D.E.; Datcu, M. Data mining and knowledge discovery tools for exploiting big earth observation data. In Proceedings of the 36th International Symposium on Remote Sensing of the Environment (ISRSE), Berlin, Germany, 11–15 May 2015; pp. 627–633. [Google Scholar]
- Eichler, R.; Giebler, C.; Gröger, C.; Schwarz, H.; Mitschang, B. Modeling metadata in data lakes-A generic model. Data Knowl. Eng. 2021, 136, 101931. [Google Scholar] [CrossRef]
- Wang, S.; Wang, J.; Zhan, Q.; Zhang, L.C.; Yao, X.C.; Li, G.Q. A unified representation method for interdisciplinary spatial earth data. Big Earth Data 2023, 7, 136–155. [Google Scholar] [CrossRef]
- Chen, Z.G.; Yang, Y.P. Semantic relatedness algorithm for keyword sets of geographic metadata. Cartogr. Geogr. Inf. Sci. 2020, 47, 125–140. [Google Scholar] [CrossRef]
- Ke, C.; Jiahong, W.; Lizhong, Y. Design and construction of natural disaster metadata standards. Geomat. Spat. Inf. Technol. 2013, 36, 4–8. [Google Scholar]
- Babaie, H.A.; Babaei, A. Developing the earthquake markup language and database with UML and XML schema. Comput. Geosci. 2005, 31, 1175–1200. [Google Scholar] [CrossRef]
- Yu, E.; Acharya, P.; Jaramillo, J.; Kientz, S.; Thomas, V.; Hauksson, E. The Station Information System (SIS): A Centralized Repository for Populating, Managing, and Distributing Metadata of the Advanced National Seismic System Stations. Seismol. Res. Lett. 2018, 89, 47–55. [Google Scholar] [CrossRef]
- Hong, J.H.; Shi, Y.T. Integration of Heterogeneous Sensor Systems for Disaster Responses in Smart Cities: Flooding as an Example. ISPRS Int. J. Geo-Inf. 2023, 12, 279. [Google Scholar] [CrossRef]
- Xiang, Z.R.; Demir, I. Flood Markup Language-A standards-based exchange language for flood risk communication. Environ. Modell. Softw. 2022, 152, 105397. [Google Scholar] [CrossRef]
- Di, L.P.; Shao, Y.Z.; Kang, L.J. Implementation of Geospatial Data Provenance in a Web Service Workflow Environment with ISO 19115 and ISO 19115-2 Lineage Model. IEEE Trans. Geosci. Remote Sens. 2013, 51, 5082–5089. [Google Scholar] [CrossRef]
- Goncharov, M.V.; Kolosov, K.A. The principles of extended metadata formation in RNPLS&T’s Single Open Information Archive. Nauchnye Tek. Bibl. 2023, 1, 84–98. [Google Scholar] [CrossRef]
- Wu, Y.; Liu, F.G.; Zheng, L.L.; Wu, X.J.; Lai, C.Q. CSR-SVM: Compositional semantic representation for intelligent identification of engineering change documents based on SVM. Adv. Eng. Inform. 2023, 57, 15. [Google Scholar] [CrossRef]
- Al-Fuqaha’a, S.; Al-Madi, N.; Hammo, B. A robust classification approach to enhance clinic identification from Arabic health text. Neural Comput. Appl. 2024, 36, 7161–7185. [Google Scholar] [CrossRef]
- Yan, D.C.; Li, G.Q.; Li, X.Q.; Zhang, H.; Lei, H.; Lu, K.X.; Cheng, M.H.; Zhu, F.X. An Improved Faster R-CNN Method to Detect Tailings Ponds from High-Resolution Remote Sensing Images. Remote Sens. 2021, 13, 18. [Google Scholar] [CrossRef]
- Luo, J.; Du, J.; Nie, B.; Xiong, W.; Liu, L.; He, J. TCM text relationship extraction model based on bidirectional LSTM and GBDT. Appl. Res. Comput. 2019, 36, 3744–3747. [Google Scholar]
- Islam, M.S.; Kabir, M.N.; Ab Ghani, N.; Zamli, K.Z.; Zulkifli, N.S.A.; Rahman, M.M.; Moni, M.A. Challenges and future in deep learning for sentiment analysis: A comprehensive review and a proposed novel hybrid approach. Artif. Intell. Rev. 2024, 57, 79. [Google Scholar] [CrossRef]
- Skondras, P.; Zotos, N.; Lagios, D.; Zervas, P.; Giotopoulos, K.C.; Tzimas, G. Deep Learning Approaches for Big Data-Driven Metadata Extraction in Online Job Postings. Information 2023, 14, 19. [Google Scholar] [CrossRef]
- Qiao, B.; Zou, Z.Y.; Huang, Y.; Fang, K.; Zhu, X.H.; Chen, Y.M. A joint model for entity and relation extraction based on BERT. Neural Comput. Appl. 2022, 34, 3471–3481. [Google Scholar] [CrossRef]
- Lu, Y.J.; Liu, Q.; Dai, D.; Xiao, X.Y.; Lin, H.Y.; Han, X.P.; Sun, L.; Wu, H. Unified Structure Generation for Universal Information Extraction. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Acl 2022), Dublin, Ireland, 22–27 May 2022; Volume 1, pp. 5755–5772. [Google Scholar]
- Jie, Z.; Suwen, L.; Junhui, L.; Lifan, G.; Haifeng, Z.; Feng, C. Interpretable Sentiment Analysis Based on UIE. J. Chin. Inf. Process. 2023, 37, 151–157. [Google Scholar]
- ChinaGEOSS Data Portal. Available online: https://www.chinageoss.cn/datasharing (accessed on 4 January 2024).
- GB/T 24888-2010; Technical Requirements of Data Share for Emergency Command in Earthquake Occurrence Site. Standard Press of China: Beijing, China, 2010.
- Dublin Core. Dublin Core™ Metadata Element Set, Version 1.1. Available online: https://www.dublincore.org/specifications/dublin-core/dces/ (accessed on 4 January 2024).
- DB/T 41-2011; Earthquake Data Metadata. China Earthquake Administration: Beijing, China, 2011.
- ISO19115; Geographic Information—Metadata. ISO: Geneva, Switzerland, 2014.
- GB/T 19710-2005; Geographic information—Metadata. Standard Press of China: Beijing, China, 2005.
- Bergstra, J.; Bengio, Y. Random Search for Hyper-Parameter Optimization. J. Mach. Learn. Res. 2012, 13, 281–305. [Google Scholar]
- Breuel, T.M. The Effects of Hyperparameters on SGD Training of Neural Networks. arXiv 2015, arXiv:1508.02788. [Google Scholar]
- Wang, R.Y.; Strong, D.M. Beyond Accuracy: What Data Quality Means to Data Consumers. J. Manag. Inf. Syst. 1996, 12, 5–33. [Google Scholar] [CrossRef]
- Reiche, K.J.; Höfig, E. Implementation of Metadata Quality Metrics and Application on Public Government Data. In Proceedings of the IEEE 37th Annual Computer Software and Applications Conference (COMPSAC), Kyoto, Japan, 22–26 July 2013; pp. 236–241. [Google Scholar]
- Nogueras-Iso, J.; Lacasta, J.; Ureña-Cámara, M.A.; Ariza-López, F.J. Quality of Metadata in Open Data Portals. IEEE Access 2021, 9, 60364–60382. [Google Scholar] [CrossRef]
- Kuzma, M.; Moscicka, A. Metadata evaluation criteria in respect to archival maps description A systematic literature review. Electron. Libr. 2020, 38, 1–27. [Google Scholar] [CrossRef]
- Powers, D.M.W. Evaluation: From precision, recall and F-measure to ROC, informedness, markedness and correlation. arXiv 2011, arXiv:2010.16061. [Google Scholar]
- Rong, X. word2vec Parameter Learning Explained. arXiv 2014, arXiv:1411.2738. [Google Scholar]
- Ma, L.; Zhang, Y.Q. Using Word2Vec to Process Big Text Data. In Proceedings of the IEEE International Conference on Big Data, Santa Clara, CA, USA, 29 October–1 November 2015; pp. 2895–2897. [Google Scholar]
- Fesseha, A.; Xiong, S.W.; Emiru, E.D.; Diallo, M.; Dahou, A. Text Classification Based on Convolutional Neural Networks and Word Embedding for Low-Resource Languages: Tigrinya. Information 2021, 12, 17. [Google Scholar] [CrossRef]
- Dimitriadis, G.; Neto, J.P.; Kampff, A.R. t-SNE Visualization of Large-Scale Neural Recordings. Neural Comput. 2018, 30, 1750–1774. [Google Scholar] [CrossRef] [PubMed]
- Atzberger, D.; Cech, T.; Trapp, M.; Richter, R.; Scheibel, W.; Dollner, J.; Schreck, T. Large-Scale Evaluation of Topic Models and Dimensionality Reduction Methods for 2D Text Spatialization. IEEE Trans. Vis. Comput. Graph. 2024, 30, 902–912. [Google Scholar] [CrossRef] [PubMed]
- Hu, C.X.; Wu, T.; Liu, S.Q.; Liu, C.S.; Ma, T.; Yang, F. Joint unsupervised contrastive learning and robust GMM for text clustering. Inf. Process. Manag. 2024, 61, 17. [Google Scholar] [CrossRef]
- Xu, Q.; Gu, H.; Ji, S.W. Text clustering based on pre-trained models and autoencoders. Front. Comput. Neurosci. 2024, 17, 13. [Google Scholar] [CrossRef] [PubMed]
- González, F.; Torres-Ruiz, M.; Rivera-Torruco, G.; Chonona-Hernández, L.; Quintero, R. A Natural-Language-Processing-Based Method for the Clustering and Analysis of Movie Reviews and Classification by Genre. Mathematics 2023, 11, 26. [Google Scholar] [CrossRef]
- Liu, X.D.; Tian, Y.Z.; Zhang, X.Q.; Wan, Z.Y. Identification of Urban Functional Regions in Chengdu Based on Taxi Trajectory Time Series Data. ISPRS Int. J. Geo-Inf. 2020, 9, 19. [Google Scholar] [CrossRef]
- Cao, Q.; Wang, S.; Chen, Z.; Li, G.; Li, J. The Method of Extracting Names of Geo-science Data based on Regular Expressions. J. Geo-Inf. Sci. 2023, 25, 1601–1610. [Google Scholar]
- Evans, M.T.C.; Latifi, M.; Ahsan, M.; Haider, J. Leveraging Semantic Text Analysis to Improve the Performance of Transformer-Based Relation Extraction. Information 2024, 15, 91. [Google Scholar] [CrossRef]
- Bartoli, A.; De Lorenzo, A.; Medvet, E.; Tarlao, F. Inference of Regular Expressions for Text Extraction from Examples. IEEE Trans. Knowl. Data Eng. 2016, 28, 1217–1230. [Google Scholar] [CrossRef]
- Fagin, R.; Kimelfeld, B.; Reiss, F.; Vansummeren, S. Document Spanners: A Formal Approach to Information Extraction. J. ACM 2015, 62, 51. [Google Scholar] [CrossRef]
- Gong, Y.; Mao, L.; Li, C.L. Few-shot Learning for Named Entity Recognition Based on BERT and Two-level Model Fusion. Data Intell. 2021, 3, 568–577. [Google Scholar] [CrossRef]
- Bello, A.; Ng, S.C.; Leung, M.F. A BERT Framework to Sentiment Analysis of Tweets. Sensors 2023, 23, 506. [Google Scholar] [CrossRef] [PubMed]
Disaster Type | Time | Data Format | Data Size | Data Source | Counts of Events |
---|---|---|---|---|---|
Flood | 2013–2021 | XLS/XLSX, SHP, TIFF, DOC/DOCX, CSV, TXT | 5.25 GB | ChinaGEOSS Data Portal (chinageoss.cn) [26] | 42 |
Earthquake | 1900–2022 | | 713 MB | | 76 |
Landslide | 1995–2022 | | 58 GB | | 56 |
Mudslide | 2005–2022 | | 429 MB | | 47 |
Avalanche | 2006–2019 | | 854 MB | | 47 |
The Name of the Word Tokenizer | P | R | F1 |
---|---|---|---|
NLPIR | 0.298 | 0.380 | 0.334 |
Jieba | 0.655 | 0.731 | 0.691 |
Natural Disaster Comprehensive Metadata | Individual Natural Disaster Metadata | Related Information Field Metadata | |
---|---|---|---|
EM-DAT | Earthquake eML | GB/T 24888-2010 [27] | Dublin Core Metadata Standard (DC1.1) [28] |
EU-MEDIN RDF Schema | TWML | DB/T 41-2011 [29] | ISO19115 [30] |
FGDC Content Standards for Digital Geospatial Metadata | Metadata Standard for Seismic Mitigation and Disaster Prevention Planning | Geological Disaster Emergency Information Resource Metadata Standard | Core Metadata Standard for Earth System Science Data Sharing |
Geoscience Australia Metadata | General Metadata Standard for Emergency Field | Debris Flow Disaster Emergency Metadata Standard | GB/T 19710-2005/ISO 19115:2003, MOD [31] |
Metadata Standard for Natural Disasters | Core Metadata for Earthquake Data Resources | Core Metadata for Geological Disaster Monitoring Dataset | |
CWML |
Structural Elements | Description of Sub-Elements or Corresponding Structural Elements | Abbreviations |
---|---|---|
Metadata identification information | The basic information needed to uniquely identify data resources | MDID info |
Content information | Information describing the content of the dataset | Cont info |
Data-quality information | Evaluation information regarding the quality of the dataset | DQ info |
Restriction information | Information containing restrictions on access and use of resources | Restr info |
Distribution information | Description of the distributor of the dataset and methods for obtaining data, providing reference material names and dates, as well as responsible unit names, duties, contacts, and other information | Distr info |
Metadata reference information | Contains descriptions of the metadata standards themselves, including metadata standard names, versions, etc. | MDRef info |
Reference system information | Provides spatial reference system and temporal reference system information | RS info |
Extended information | Provides extension information for implementation when specialized standards need to be established and the required metadata elements or entities are not present in this standard | Ext info |
Citation and responsible party information | Provides information about responsible units and individuals related to the data, as well as materials, datasets, models, or literature used for referencing or referring to the dataset | CRP info |
Coverage information | Defines and describes metadata for the spatial and temporal coverage of resources | Cov info |
Name/Role Name | Constraint/Condition | Name/Role Name | Constraint/Condition |
---|---|---|---|
Use restrictions | O | Dataset keywords | M |
System unique identifier ID | M | Dataset identifier/ID | C |
Dataset type | M | Dataset creation time | M |
Dataset title | M | Dataset contact information | M |
Dataset thumbnail | O | Dataset character set | O |
Dataset summary | M | Dataset access restrictions | O |
Dataset subject category | M | Data-quality report | C |
Dataset security restriction level | M | Data log | C |
Dataset language | M | Data format | M |
Parameter | Description | Experimental Setting |
---|---|---|
Learning rate | The step size used to update model parameters at each training iteration; values typically lie between 0 and 1. | Tested at 1 × 10−3, 1 × 10−4, and 3 × 10−4 |
Batch_size | Number of samples processed per parameter update (batch size) | Tested at batch sizes of 16, 32, and 64 |
num_epochs | Number of training epochs | Setting the maximum iteration rounds to 400 |
Model | Model selection: program performs model fine-tuning based on the selected model | UIE-base |
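The following sketch only illustrates how the candidate settings in this table could be searched; finetune_and_evaluate is a hypothetical placeholder, not a PaddleNLP API.

```python
# Illustrative grid search over the learning rates and batch sizes listed above.
from itertools import product

search_space = list(product([1e-3, 1e-4, 3e-4], [16, 32, 64]))

def finetune_and_evaluate(learning_rate: float, batch_size: int, num_epochs: int = 400) -> float:
    """Hypothetical stand-in: fine-tune UIE-base with these settings and return the dev-set F1."""
    ...  # replace with the actual fine-tuning and evaluation routine
    return 0.0

scores = {(lr, bs): finetune_and_evaluate(lr, bs) for lr, bs in search_space}
best_lr, best_bs = max(scores, key=scores.get)
# The paper reports the best extraction results at lr = 1e-4 with batch_size = 32.
```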
Completeness Constraint Rules | Content |
---|---|
Primary key constraint rule | Primary key attribute values must exist and be unique. |
Composite primary key constraint rule | A primary key composed of two or more fields must exist and be unique. |
Not null constraint rule | Values must exist and cannot be null (non-primary key). |
Unique constraint rule | Values must be unique and cannot have duplicates (non-primary key). |
Continuity constraint rule | Values must be continuous. |
Candidate key constraint rule | Values must exist and be unique (non-primary key). |
Consistency Constraint Rules | Content |
---|---|
Foreign key constraint rule | The values of the foreign key attribute column in the relation table must be consistent with the attribute values of the associated primary key. That is, the values of the foreign key attribute column must be referenced by the primary key. |
Equality consistency constraint rule | Values must be calculated based on one or more attribute columns in one or more relation tables. |
Logical consistency constraint rule | Values must have a logical relationship with one or more attribute columns in one or more relation tables. |
Existence consistency constraint rule | Values must have a matching relationship with another attribute column. |
Accuracy Constraint Rules | Content |
---|---|
Data-type constraint rule | All value types must satisfy the data type defined under the attribute column. |
Length constraint rule | String lengths must meet the given length constraint. |
Precision constraint rule | Floating-point values must satisfy the given precision constraint. |
Data format rule | Values must satisfy the given data format. |
Value range rule | Values must be within the given value range. |
Fixed-value constraint rule | Values must be in the given set. |
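As a minimal illustration of how the constraint rules above can be checked programmatically, the sketch below applies a primary-key (completeness), reference-set (consistency), and data-format (accuracy) rule with pandas; the table, columns, and reference values are hypothetical.

```python
import pandas as pd

# Hypothetical metadata table; column names and reference values are examples only.
md = pd.DataFrame({
    "dataset_id":    ["FL-001", "FL-002", "FL-002", None],
    "dataset_title": ["Flood A", "Flood B", "Flood B", "Flood C"],
    "data_format":   ["SHP", "CSV", "CSV", "JPG"],
    "creation_time": ["2020-07-01", "2021-08-15", "2021-08-15", "2021/09/01"],
})

# Completeness: primary-key constraint rule (values must exist and be unique).
pk_ok = md["dataset_id"].notna().all() and md["dataset_id"].is_unique

# Consistency: existence/foreign-key style check against an allowed reference set.
allowed_formats = {"SHP", "CSV", "XLSX", "TIFF", "DOC", "TXT"}
fk_ok = md["data_format"].isin(allowed_formats).all()

# Accuracy: data-format rule, e.g. creation time must follow YYYY-MM-DD.
date_ok = md["creation_time"].astype(str).str.fullmatch(r"\d{4}-\d{2}-\d{2}")

print(f"primary key complete/unique: {pk_ok}")
print(f"data format consistent:      {fk_ok}")
print(f"share of valid dates:        {date_ok.mean():.2f}")
```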
 | CSS | ESS | F-Value | p-Value |
---|---|---|---|---|
x | 73.60 | 0.10 | 714.40 | 4.83 × 10−168 |
y | 48.79 | 0.08 | 603.90 | 1.14 × 10−156 |
Methods | P | R | F1 |
---|---|---|---|
Regular expressions | 0.655 | 0.731 | 0.691 |
BERT | 0.802 | 0.811 | 0.806 |
UIE | 0.842 | 0.794 | 0.778 |
Self-trained UIE | 0.926 | 0.911 | 0.918 |
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Wang, Z.; Shi, X.; Yang, H.; Yu, B.; Cai, Y. Automatic Extraction and Cluster Analysis of Natural Disaster Metadata Based on the Unified Metadata Framework. ISPRS Int. J. Geo-Inf. 2024, 13, 201. https://doi.org/10.3390/ijgi13060201