CN118113788B

CN118113788B - GIS data processing method and system based on multi-source data fusion

Info

Publication number: CN118113788B
Application number: CN202410005144.XA
Authority: CN
Inventors: 史宏垒; 刘鲁光; 刘伟; 袁刚; 冷寿明; 晋松
Original assignee: Xinjiang Duyi Huanqiu Technology Co ltd
Current assignee: Xinjiang Duyi Huanqiu Technology Co ltd
Priority date: 2024-01-03
Filing date: 2024-01-03
Publication date: 2024-10-29
Anticipated expiration: 2044-01-03
Also published as: CN118113788A

Abstract

The present invention relates to the field of geographic information system (GIS) data processing, and in particular to a GIS data processing method and system based on multi-source data fusion. The method computes the similarity of spatial attribute information between an unsampled position and each collection area and derives a spatial correlation factor from it. It then analyzes the isolation factor of each collection area and obtains a bias factor from the relative distances between the collection areas and the unsampled position. The spatial correlation factor, the isolation factor and the bias factor are combined into an interpolation weight for each collection area relative to the unsampled position. The unsampled position is interpolated with these highly informative weights to obtain its elevation data, and the multi-source data is then fused based on the interpolated GIS data. By obtaining reliable interpolation weights, the invention produces accurate interpolation results at unsampled positions and enables a fused display of multi-source data from complete GIS data.

Description

GIS data processing method and system based on multi-source data fusion

Technical Field

The present invention relates to the technical field of geographic information system (GIS) data processing, and in particular to a GIS data processing method and system based on multi-source data fusion.

Background Art

A geographic information system (GIS) is a technology for capturing, storing, retrieving and displaying geographic information. When a GIS is built for an urban environment, the collected GIS data contains spatial attribute information such as elevation data and digital line graph data. For a multi-source fused display, the data of each dimension is visualized separately, and the visualized layers are then fused and displayed on a single display platform. However, when elevation data is collected across a city, equipment problems and municipal regulations leave some locations without elevation data. These unsampled locations must be interpolated, and the interpolated data visualized, before the multi-source fused display can be produced.

In the prior art, elevation data in GIS data is interpolated with weights built from the distance between an unsampled location and the collection areas that already contain data. In a real urban environment, however, the distribution of buildings gives the terrain complex features and many different building types. Interpolation weights built from distance alone therefore yield results with poor reference value, and an accurate fused visualization cannot be produced.

Summary of the Invention

To solve the technical problem that the prior art cannot obtain reasonable interpolation weights, which makes interpolation results unreliable and in turn degrades the multi-source fused display of GIS data, the present invention provides a GIS data processing method and system based on multi-source data fusion. The technical solution adopted is as follows:

The present invention provides a GIS data processing method based on multi-source data fusion, the method comprising:

acquiring GIS data within a city, wherein the GIS data includes spatial attribute information for the entire city and elevation data within preset collection areas, and the city further contains unsampled positions;

obtaining a spatial correlation factor between each collection area and the unsampled position according to the similarity of spatial attribute information between a preset neighborhood of the unsampled position and the collection area; obtaining an isolation factor of each collection area according to the positional distribution of the collection areas and the similarity of their spatial attribute information; obtaining the relative distance between the unsampled position and each collection area, and obtaining a bias factor between each collection area and the unsampled position according to the differences in relative distance between each collection area and the other collection areas and the positional distribution of each collection area relative to the unsampled position;

obtaining an interpolation weight of each collection area relative to the unsampled position according to the isolation factor, the bias factor and the spatial correlation factor; and, based on the interpolation weights, interpolating the unsampled position from the elevation data in the collection areas to obtain the elevation data at the unsampled position;

performing a multi-source fused data display based on the interpolated GIS data.

Further, the size of the smallest collection area is used as the size of the neighborhood, and the neighborhood is constructed centered on the unsampled position.

Further, the spatial attribute information includes digital line graph (DLG) data and building facility type text vectors.

Further, the spatial correlation factor is obtained by:

obtaining a first cosine similarity between the building facility type text vectors of the preset neighborhood of the unsampled position and of the collection area, and a second cosine similarity between their DLG data; and taking the sum of the first and second cosine similarities as the spatial correlation factor.

Further, the isolation factor is obtained by the following formula:

where g_n is the isolation factor of the nth collection area, N is the number of collection areas, j indexes the collection areas other than the nth one, e is the natural constant, l_nj is the distance between the nth and the jth collection areas, and a_nj is the spatial correlation factor between the nth and the jth collection areas.

Further, the bias factor is obtained by the following formula:

where G_n is the bias factor of the nth collection area; softmax() is the normalized exponential function; s_n is the area of the nth collection area; N is the number of collection areas; j indexes the collection areas other than the nth one; l_n is the relative distance between the nth collection area and the unsampled position; l_j is the relative distance between the jth collection area and the unsampled position; θ_nj is the angle between the two rays drawn from the unsampled position (as the vertex) to the center points of the nth and the jth collection areas; sin() is the sine function; and s_j is the area of the jth collection area.

Further, obtaining the interpolation weight of each collection area relative to the unsampled position according to the isolation factor, the bias factor and the spatial correlation factor comprises:

multiplying the isolation factor, the bias factor and the spatial correlation factor, and normalizing the products to obtain the interpolation weight of each collection area relative to the unsampled position.

Further, after the interpolation weights are obtained, the method further comprises:

obtaining a distance weight of each collection area relative to the unsampled position according to the relative distance between the unsampled position and each collection area, and updating the interpolation weight to the average of the distance weight and the interpolation weight.

Further, the unsampled position is interpolated with inverse distance weighting (IDW) to obtain the elevation data at the unsampled position.

The present invention further provides a GIS data processing system based on multi-source data fusion, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the steps of any one of the above GIS data processing methods based on multi-source data fusion.

The present invention has the following beneficial effects:

The embodiments of the present invention exploit a characteristic of GIS data: it carries rich spatial attribute information, and that information is rarely missing or uncollectable. The correlation between an unsampled position and a collection area is therefore analyzed through the similarity of their spatial attribute information, which yields the spatial correlation factor. Distance alone is not enough to characterize this correlation; the positional distribution within the city also matters. If the unsampled position and a collection area both lie in isolated parts of the city, and the unsampled position leans toward that collection area, the data of that collection area is highly informative for the unsampled position. The isolation factor of each collection area and the bias factor between each collection area and the unsampled position are therefore analyzed and combined with the spatial correlation factor to obtain reliable interpolation weights. Interpolating the unsampled position with these weights yields accurate elevation data and fills in unsampled or missing values. The completed data is then visualized and combined with the data of other dimensions to produce an effective multi-source fused display.

BRIEF DESCRIPTION OF THE DRAWINGS

To illustrate the technical solutions and advantages of the embodiments of the present invention or of the prior art more clearly, the drawings used in describing the embodiments or the prior art are briefly introduced below. The drawings described below show only some embodiments of the present invention; a person of ordinary skill in the art can derive other drawings from them without creative effort.

FIG. 1 is a flow chart of a GIS data processing method based on multi-source data fusion according to an embodiment of the present invention.

DETAILED DESCRIPTION

To further explain the technical means adopted by the present invention to achieve its intended purpose and the effects of those means, the specific implementation, structure, features and effects of the GIS data processing method and system based on multi-source data fusion proposed by the present invention are described in detail below with reference to the accompanying drawings and preferred embodiments. In the following description, "one embodiment" and "another embodiment" do not necessarily refer to the same embodiment. Furthermore, particular features, structures or characteristics of one or more embodiments may be combined in any suitable manner.

Unless otherwise defined, all technical and scientific terms used herein have the meanings commonly understood by a person skilled in the art to which the present invention belongs.

The specific solution of the GIS data processing method and system based on multi-source data fusion provided by the present invention is described below with reference to the accompanying drawings.

Embodiment of the GIS data processing method based on multi-source data fusion:

Referring to FIG. 1, which shows a flow chart of a GIS data processing method based on multi-source data fusion according to an embodiment of the present invention, the method includes:

Step S1: Acquire GIS data within the city.

GIS data for the city can be obtained from the geographic information system. The GIS data contains spatial attribute information for the entire city and elevation data within preset collection areas. Collection areas must be preset because some important areas, such as government facilities, cannot be surveyed when elevation data is collected across a city; the areas where elevation data may be collected are therefore defined in advance. Each collection area has explicit latitude and longitude data, and each latitude/longitude pair corresponds to one sampling position, so a collection area consists of multiple sampling positions. An elevation collection device, such as a drone, measures each sampling position to obtain its elevation, so the elevation data of a collection area is the set of elevation values at its sampling positions.

In one embodiment of the present invention, the spacing between sampling positions within a collection area may be set to 1 meter.

During transmission of the collected elevation data, equipment problems may cause the data for a particular sampling position to be lost. In the final result such a position is treated as not sampled, i.e., it becomes an unsampled position. Positions where no sampling was ever performed are also unsampled positions. All such positions must be interpolated to fill in their data, so that every position has both spatial attribute information and elevation data in the subsequent multi-source fused display.
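Detecting both kinds of unsampled position amounts to a scan for grid cells without a usable elevation value. The sketch below assumes, beyond what the text states, that sampling positions are indexed as (row, column) cells on a regular grid (for example the 1 m spacing mentioned above) and that a value lost in transmission is stored as `None`; the helper name `find_unsampled` is hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# (row, col) index of a sampling position on an assumed regular grid
Position = Tuple[int, int]


def find_unsampled(rows: int, cols: int,
                   elevations: Dict[Position, Optional[float]]) -> List[Position]:
    """Return the grid positions that have no usable elevation value.

    A position is treated as unsampled if it was never measured (it is absent
    from `elevations`) or if its measurement was lost in transmission (its
    value is None).
    """
    missing: List[Position] = []
    for r in range(rows):
        for c in range(cols):
            if elevations.get((r, c)) is None:
                missing.append((r, c))
    return missing


if __name__ == "__main__":
    grid = {(0, 0): 12.5, (0, 1): None, (1, 0): 13.0}
    print(find_unsampled(2, 2, grid))  # [(0, 1), (1, 1)]
```

Here (0, 1) stands for a measurement lost in transmission and (1, 1) for a position that was never sampled; both end up in the same list for interpolation.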

Preferably, in one embodiment of the present invention, the spatial attribute information includes digital line graph (DLG) data and building facility type text vectors. DLG data is the data of the digital line graph map in a GIS: a vector dataset that stores the basic geographic features of an existing topographic map in layers. DLG data contains both spatial information, such as the vector geometry of rivers and roads, and attribute information, such as road grade, speed limit and width. Building type text, such as descriptions like "residential building" or "office building", can be obtained from a known city map and then converted into vectors with a language model. The methods for obtaining DLG data and building facility type text vectors are well known to those skilled in the art and are not elaborated here.

Step S2: Obtain a spatial correlation factor between each collection area and the unsampled position according to the similarity of spatial attribute information between the preset neighborhood of the unsampled position and the collection area; obtain an isolation factor of each collection area according to the positional distribution of the collection areas and the similarity of their spatial attribute information; obtain the relative distance between the unsampled position and each collection area, and obtain a bias factor between each collection area and the unsampled position according to the differences in relative distance between each collection area and the other collection areas and the positional distribution of each collection area relative to the unsampled position.

In the prior art, interpolation weights built only from the distance between a collection area and the unsampled position have limited reference value, so the interpolation results are inaccurate. The embodiment of the present invention therefore first analyzes the spatial attribute information. Because an unsampled position is a single point, it carries far less spatial attribute information than a collection area. To characterize the correlation between the unsampled position and a collection area accurately, a preset neighborhood centered on the unsampled position is constructed, and the similarity of spatial attribute information between this neighborhood and each collection area is analyzed to obtain the spatial correlation factor between each collection area and the unsampled position. The larger the spatial correlation factor, the closer the spatial attribute information of the collection area is to that of the unsampled position, and the more the interpolation weight of that collection area should be increased in the subsequent interpolation so that an accurate result is obtained.

Preferably, in one embodiment of the present invention, the neighborhood of the unsampled position should not be too large: information in an overly large neighborhood correlates poorly with the unsampled position and reduces its reference value. An overly small neighborhood likewise contains too little information and also reduces its reference value. The collection areas are therefore used as a reference: the size of the smallest collection area is used as the size of the neighborhood, and the neighborhood is constructed centered on the unsampled position.

If the collection areas have regular shapes, such as circles or squares, the neighborhood can be constructed directly from their dimensions. If the collection areas are irregular, the dimensions of the minimum enclosing circle or minimum bounding rectangle of the smallest collection area can be used to construct a circular or rectangular neighborhood; details are not repeated here.
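A minimal sketch of the rectangular neighborhood construction, assuming each collection area is given as a list of planar sampling coordinates. For simplicity it uses the axis-aligned bounding rectangle as a stand-in for the minimum bounding rectangle and picks the smallest collection area by bounding-rectangle area rather than by true area; both simplifications, and the function names, are assumptions for illustration.

```python
from typing import List, Tuple

Point = Tuple[float, float]
Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def bounding_size(points: List[Point]) -> Tuple[float, float]:
    """Width and height of the axis-aligned bounding rectangle of a point set."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return max(xs) - min(xs), max(ys) - min(ys)


def build_neighborhood(unsampled: Point, areas: List[List[Point]]) -> Rect:
    """Rectangle sized like the smallest collection area, centred on `unsampled`."""
    sizes = [bounding_size(area) for area in areas]
    # Smallest collection area, judged here by bounding-rectangle area.
    width, height = min(sizes, key=lambda s: s[0] * s[1])
    x, y = unsampled
    return (x - width / 2, y - height / 2, x + width / 2, y + height / 2)


if __name__ == "__main__":
    areas = [
        [(0, 0), (4, 0), (4, 2), (0, 2)],   # 4 x 2 bounding rectangle
        [(10, 10), (12, 10), (12, 11)],     # 2 x 1 bounding rectangle
    ]
    print(build_neighborhood((5.0, 5.0), areas))  # (4.0, 4.5, 6.0, 5.5)
```

A rotated minimum-area rectangle or a minimum enclosing circle would follow the text more closely for irregular areas; the axis-aligned box merely keeps the sketch dependency-free.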

Preferably, in one embodiment of the present invention, the spatial attribute information has two dimensions, namely building facility type text vectors and DLG data, so their similarities are computed separately: a first cosine similarity between the building facility type text vectors of the preset neighborhood of the unsampled position and of the collection area, and a second cosine similarity between their DLG data. The sum of the first and second cosine similarities is used as the spatial correlation factor. In one embodiment of the present invention, because cosine similarity ranges over [-1, 1], each computed cosine similarity is normalized to simplify subsequent computation. Because there are multiple DLG features, the cosine similarity is computed for each pair of DLG features and then averaged to obtain the second cosine similarity. The spatial correlation factor is expressed as:

where a_n is the spatial correlation factor between the nth collection area and the unsampled position; softmax() is the normalized exponential function; cosθ_n is the first cosine similarity between the nth collection area and the unsampled position; M is the number of DLG cosine similarities between the nth collection area and the neighborhood of the unsampled position, i.e., the cosine similarity is computed between each DLG feature in the neighborhood of the unsampled position and each DLG feature in the nth collection area, and M is the count of these pairs; and h_nm is the mth of these DLG cosine similarities, whose average is the second cosine similarity.
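The spatial correlation factor a_n can be sketched as follows. The text names softmax() as the normalization but the formula itself is not reproduced here, so this sketch substitutes a linear rescaling of each cosine similarity from [-1, 1] to [0, 1]; it also assumes that every DLG feature has already been encoded as a fixed-length numeric vector. The function names are hypothetical.

```python
import math
from itertools import product
from typing import List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def to_unit_range(c: float) -> float:
    """Assumed normalisation: map a cosine similarity from [-1, 1] to [0, 1]."""
    return (c + 1.0) / 2.0


def spatial_correlation(nbhd_text: Sequence[float], area_text: Sequence[float],
                        nbhd_dlg: List[Sequence[float]],
                        area_dlg: List[Sequence[float]]) -> float:
    """a_n = first cosine similarity + mean of the M pairwise DLG similarities."""
    first = to_unit_range(cosine(nbhd_text, area_text))
    # Every DLG feature in the neighbourhood against every DLG feature in the
    # collection area, giving the M similarities h_nm.
    h = [to_unit_range(cosine(p, q)) for p, q in product(nbhd_dlg, area_dlg)]
    second = sum(h) / len(h)
    return first + second


if __name__ == "__main__":
    print(spatial_correlation([1, 0], [1, 0], [[1, 0]], [[0, 1], [1, 0]]))  # 1.75
```

With both terms in [0, 1], a_n falls in [0, 2]; identical building-type vectors contribute 1.0, and the two DLG pairs (orthogonal, then identical) average to 0.75.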

Within a city, urban planning places different areas in different distributions: some areas are concentrated, while others are isolated. To interpolate an unsampled position accurately, collection areas with a similar positional distribution should receive larger interpolation weights. Interpolation weights derived only from the spatial correlation factor of the spatial attribute information are therefore insufficient; the distribution of the collection areas and the unsampled position within the city must also be considered. Moreover, because collection areas are preset, a single contiguous area may, for some reason, have been split into two collection areas. Evaluating isolation solely from the distances between collection areas would then lead to misjudgment, so the isolation factor of each collection area is obtained from the positional distribution combined with the similarity of spatial attribute information.

Preferably, in one embodiment of the present invention, the isolation factor is obtained by the following formula:

In the isolation factor formula, the term in l_nj normalizes the distance: an exponential function with the natural constant as its base first maps l_nj to a negatively correlated value in (0, 1], and subtracting that value from 1 gives a normalized result that increases with l_nj. The larger the distances between collection areas, the more remote and isolated the nth collection area is relative to the collection areas as a whole, the more distinctive its elevation information may be, and the larger its isolation factor. The spatial correlation factor is also mapped negatively: a larger spatial correlation factor means the two collection areas have similar spatial attribute information and may belong to the same type of area, so the isolation factor is smaller.
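Because the isolation factor formula itself is not reproduced in this text, the sketch below uses one assumed form that matches the behavior described above: for each other collection area j it multiplies the distance term 1 - e^(-l_nj), which grows with distance, by e^(-a_nj), which shrinks as spatial similarity grows, and averages over the N - 1 other areas. Treat the combination and the averaging as assumptions, not as the patent's formula; `isolation_factors` is a hypothetical name.

```python
import math
from typing import List


def isolation_factors(dist: List[List[float]], corr: List[List[float]]) -> List[float]:
    """Assumed isolation factor g_n for each of N collection areas.

    dist[n][j] is the distance l_nj between areas n and j; corr[n][j] is their
    spatial correlation factor a_nj. Diagonal entries are ignored.
    """
    n_areas = len(dist)
    factors = []
    for n in range(n_areas):
        terms = [
            (1.0 - math.exp(-dist[n][j])) * math.exp(-corr[n][j])
            for j in range(n_areas)
            if j != n
        ]
        factors.append(sum(terms) / len(terms))
    return factors


if __name__ == "__main__":
    # Area 0 lies far from areas 1 and 2, which lie close to each other.
    dist = [[0, 10, 10], [10, 0, 1], [10, 1, 0]]
    corr = [[0, 0.5, 0.5], [0.5, 0, 0.5], [0.5, 0.5, 0]]
    g = isolation_factors(dist, corr)
    print(g[0] > g[1])  # True: the remote area is more isolated
```

Any other monotone pair of mappings (for example 1 - a_nj/2 for the similarity term) would satisfy the same qualitative description; the exponential was chosen only to mirror the distance term.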

In the embodiments of the present invention, distances may be real-world distances between objects, measured for example in kilometers, or distances between two locations on a map, measured for example in pixels. Because the embodiments apply normalization at several steps, the choice of unit has no dimensional effect, so it is not restricted here.

Once the isolation factor of each collection area is obtained, the isolation of each collection area within the city has been characterized. If the unsampled position leans toward a particular collection area, the data in that collection area is more informative for the unsampled position than the data in other collection areas. The bias factor between the unsampled position and each collection area is therefore analyzed. First, the relative distance between the unsampled position and each collection area is obtained. If a collection area's relative distance differs markedly from those of the other collection areas and the area is close to the unsampled position, the unsampled position leans toward that collection area within the city as a whole. The bias factor between each collection area and the unsampled position is therefore obtained from the differences in relative distance between each collection area and the other collection areas, together with the positional distribution of each collection area relative to the unsampled position.

Preferably, in one embodiment of the present invention, the bias factor is obtained by the following formula:

在偏向因子的获取公式中，引入了采集区域的面积信息，对于第n个采集区域而言，其面积越大说明包含的信息越丰富，测量出来的数据更完整，因此将面积作为公式中的一项权重，同理第j个其他采集区域的面积作为分析两个采集区域相对情况之间的权重。利用构表示第n个采集区域与所述未采样位置之间的相对距离越小，则说明未采样位置更偏向于该采集区域，G_n越大。通过构建射线并将夹角θ_nj作为表征第n个采集区域、第j个其他采集区域和未采样位置之间的位置关系，该夹角越大说明以未采样位置为中心第n个采集区域和第j个其他采集区域之间呈对立分布，则两个采集区域之间的相对距离差异权重也应越大，即为两个采集区域之间的相对距离差异的权重之一，因为两条射线的夹角为小于等于180°的角，为了使得夹角与权重呈正相关，则将θ_nj除以2后再获得正弦函数值，获得对应权重。需要说明的是，和softmax(s_j)均为(l_n-l_j)的权重，(l_n-l_j)越大说明第n个采集区域与第j个其他采集区域之间的相对距离差异越大，第n个采集区域越孤立。结合各个参数之间的关系，获得的G_n越大，表示在整个城市范围中，未采样位置越偏向于第n个采集区域，第n个采集区域中的数据相对于未采样位置而言参考性更强。In the formula for obtaining the bias factor, the area information of the collection area is introduced. For the nth collection area, the larger its area is, the richer the information it contains and the more complete the measured data is. Therefore, the area is used as a weight in the formula. Similarly, the area of the jth other collection area is used as the weight for analyzing the relative situation between two collection areas. The smaller the relative distance between the nth acquisition area and the unsampled position, the more the unsampled position is biased towards the acquisition area, and the larger G _n is. By constructing rays and taking the angle θ _nj as the positional relationship between the nth acquisition area, the jth other acquisition area and the unsampled position, the larger the angle is, the greater the distribution of the nth acquisition area and the jth other acquisition area with the unsampled position as the center is in opposition, and the relative distance difference weight between the two acquisition areas should also be greater, that is, is one of the weights of the relative distance difference between the two acquisition areas. Since the angle between the two rays is less than or equal to 180°, in order to make the angle positively correlated with the weight, θ _nj is divided by 2 to obtain the sine function value and the corresponding weight. 
Note that sin(θ_nj/2) and softmax(s_j) both weight the term (l_n - l_j). The larger (l_n - l_j), the larger the difference between the relative distances of the nth collection area and the jth other collection area, and the more isolated the nth collection area. Taking all parameters together, the larger G_n, the more the unsampled position is biased towards the nth collection area within the whole city range, and the more useful the data in the nth collection area are as a reference for the unsampled position.
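The angular weight and the area weight described above can be computed directly from the region geometry. The Python sketch below assumes planar map coordinates and already-known center points; it computes θ_nj as the angle between the two rays at the unsampled position, the weight sin(θ_nj/2), and a softmax over collection-area areas. Because the full bias-factor formula is not reproduced in this text, the sketch stops at these components and does not assemble G_n. All function names are hypothetical.

```python
import math


def ray_angle(vertex, p1, p2):
    """Angle in radians, within [0, pi], between the rays vertex->p1 and vertex->p2."""
    v1 = (p1[0] - vertex[0], p1[1] - vertex[1])
    v2 = (p2[0] - vertex[0], p2[1] - vertex[1])
    dot = v1[0] * v2[0] + v1[1] * v2[1]
    cross = v1[0] * v2[1] - v1[1] * v2[0]
    # atan2(|cross|, dot) is numerically stabler than acos(dot / (|v1| |v2|)).
    return math.atan2(abs(cross), dot)


def opposition_weight(unsampled, center_n, center_j):
    """sin(theta_nj / 2): 0 when both centers lie on the same ray, 1 when opposite."""
    theta = ray_angle(unsampled, center_n, center_j)
    return math.sin(theta / 2.0)


def softmax(values):
    """Normalized exponential; subtracting the max avoids overflow."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


# Unsampled position at the origin, region n to the east, region j to the west.
print(opposition_weight((0.0, 0.0), (3.0, 0.0), (-2.0, 0.0)))  # 1.0
```

Applied to raw areas in square meters, softmax would saturate to an almost one-hot vector, so in practice the areas may need rescaling first; the text does not specify this step.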

Note that the relative distance can be obtained as the Euclidean distance on the map between the unsampled position and the center point of the collection area. Obtaining the center point of a collection area is a basic mathematical operation and is not described further here.
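As a minimal sketch of this distance, assume each collection area is a simple polygon in planar map coordinates and take its center point to be the area centroid given by the shoelace formula; the text does not say which definition of center point is used, and the function names are hypothetical.

```python
import math


def polygon_centroid(vertices):
    """Area centroid of a simple polygon [(x, y), ...] via the shoelace formula."""
    area2 = 0.0  # twice the signed area
    cx = cy = 0.0
    n = len(vertices)
    for i in range(n):
        x0, y0 = vertices[i]
        x1, y1 = vertices[(i + 1) % n]  # wrap around to close the polygon
        cross = x0 * y1 - x1 * y0
        area2 += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    if area2 == 0:
        raise ValueError("degenerate polygon with zero area")
    # Centroid = sum / (6 * A) and 6 * A = 3 * area2.
    return (cx / (3.0 * area2), cy / (3.0 * area2))


def relative_distance(unsampled, region_vertices):
    """Euclidean distance from the unsampled position to the region's centroid."""
    cx, cy = polygon_centroid(region_vertices)
    return math.hypot(unsampled[0] - cx, unsampled[1] - cy)


square = [(0.0, 0.0), (2.0, 0.0), (2.0, 2.0), (0.0, 2.0)]
print(relative_distance((4.0, 5.0), square))  # 5.0
```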

Step S3: Obtain the interpolation weight of each collection area relative to the unsampled position from the isolation factor, the bias factor and the spatial correlation factor; based on these weights, interpolate the data at the unsampled position from the elevation data in the collection areas to obtain the elevation data at the unsampled position.

Combining the isolation factor, the bias factor and the spatial correlation factor yields the interpolation weight of each collection area relative to the unsampled position: the closer the spatial attribute information of a collection area is to that of the unsampled position, the more isolated the collection area, and the more the unsampled position is biased towards it, the larger that collection area's interpolation weight. Interpolating from the elevation data in the collection areas with these weights gives the elevation data at the unsampled position. At this point, the elevation dimension of the GIS data has been completed by interpolation.

Preferably, in one embodiment of the present invention, the isolation factor, the bias factor and the spatial correlation factor are multiplied, and the product is normalized to obtain the interpolation weight of each collection area relative to the unsampled position. Note that every normalization in this embodiment maps values with the softmax function; other embodiments may use other normalization methods, such as min-max normalization, which are neither limited nor described further here.
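A minimal sketch of this step, assuming the three factors have already been computed per collection area and are passed as parallel lists; the factor values in the example are invented for illustration, and the function names are hypothetical.

```python
import math


def softmax(values):
    """Softmax normalization, the mapping named in this embodiment."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def interpolation_weights(isolation, bias, correlation):
    """Multiply the three per-area factors, then normalize the products with softmax."""
    if not (len(isolation) == len(bias) == len(correlation)):
        raise ValueError("one factor of each kind is needed per collection area")
    products = [a * b * c for a, b, c in zip(isolation, bias, correlation)]
    return softmax(products)


# Hypothetical factor values for three collection areas.
w = interpolation_weights([1.2, 0.8, 1.0], [0.9, 0.4, 0.6], [1.7, 1.1, 0.5])
```

Softmax guarantees positive weights that sum to 1 even if some products are zero or negative, which min-max normalization alone would not.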

Preferably, in one embodiment of the present invention, to make the interpolation weights more accurate, the following step is performed after the interpolation weights are obtained:

From the relative distance between each unsampled position and each collection area, the distance weight of each collection area relative to the unsampled position is obtained, and the average of the distance weight and the interpolation weight replaces the interpolation weight. The distance weight can be regarded as the distance-based weight used in traditional interpolation algorithms; obtaining it is a technique well known to those skilled in the art. In one embodiment of the present invention, the relative distance is mapped through a negatively correlated function and normalized so that the distance weights of all collection areas sum to 1. Because the distance weights and the original interpolation weights are both normalized, their average also sums to 1, which simplifies the subsequent interpolation.
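The text leaves the negatively correlated mapping open. The sketch below assumes normalized inverse distance (1 / l), the weight of classic inverse distance weighting with power 1, and then averages it with the interpolation weights; the function names are hypothetical.

```python
def distance_weights(distances):
    """Normalized inverse-distance weights (assumed mapping: 1 / l)."""
    if any(d <= 0 for d in distances):
        raise ValueError("relative distances must be positive")
    inv = [1.0 / d for d in distances]
    total = sum(inv)
    return [v / total for v in inv]


def update_weights(interp_weights, distances):
    """Average the interpolation weights with the distance weights."""
    dist_w = distance_weights(distances)
    if len(dist_w) != len(interp_weights):
        raise ValueError("one weight and one distance per collection area")
    # The mean of two vectors that each sum to 1 also sums to 1.
    return [(a + b) / 2.0 for a, b in zip(interp_weights, dist_w)]


print(update_weights([0.5, 0.5], [1.0, 3.0]))  # approximately [0.625, 0.375]
```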

In one embodiment of the present invention, the interpolation uses an inverse distance weighted (IDW) interpolation model; its detailed steps are well known to those skilled in the art and are not described here.
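For reference, classic IDW estimates the value at a target point as a weighted average of sample values, with weights proportional to d^-p. The text does not state how the IDW model consumes the factor-based weights; one plausible reading, assumed in this sketch, is that the updated interpolation weights replace the pure inverse-distance weights in that weighted average. The sketch also assumes each collection area is represented by a single elevation value (for example, the mean of its elevation samples). Names are hypothetical.

```python
def idw_weights(distances, power=2.0):
    """Classic IDW weights: proportional to d ** -power, normalized to sum to 1.

    A zero distance means the target coincides with a sample, which then gets all the weight.
    """
    for i, d in enumerate(distances):
        if d == 0:
            return [1.0 if k == i else 0.0 for k in range(len(distances))]
    inv = [d ** -power for d in distances]
    total = sum(inv)
    return [v / total for v in inv]


def weighted_elevation(weights, elevations):
    """Weighted average of the representative elevations of the collection areas."""
    total = sum(weights)
    return sum(w * z for w, z in zip(weights, elevations)) / total


# Pure IDW with power 2 versus the same formula fed with factor-based weights.
idw_estimate = weighted_elevation(idw_weights([1.0, 2.0]), [100.0, 200.0])
factor_estimate = weighted_elevation([0.625, 0.375], [100.0, 200.0])
```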

Step S4: Perform multi-source data fusion display based on the interpolated GIS data.

The interpolated GIS data contain complete elevation data. The data in each dimension are visualized, and the visualization results are displayed on the same display platform, completing the multi-source data fusion display. On this platform, users can view the multi-source data for every location in the city at once.
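The text does not describe the data model behind the display platform. As an illustration only, the sketch below assumes each source is a layer mapping a grid cell (row, col) to a value and joins the layers into one record per cell, so that every data dimension available at a location can be read together; the layer names and function name are hypothetical.

```python
from collections import defaultdict


def fuse_layers(layers):
    """Join {layer_name: {(row, col): value}} into {(row, col): {layer_name: value}}."""
    fused = defaultdict(dict)
    for name, grid in layers.items():
        for cell, value in grid.items():
            fused[cell][name] = value
    return dict(fused)


fused = fuse_layers({
    "elevation": {(0, 0): 12.5, (0, 1): 13.0},       # interpolated, so complete
    "facility_type": {(0, 0): "residential", (0, 1): "park"},
})
print(fused[(0, 1)])  # {'elevation': 13.0, 'facility_type': 'park'}
```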

In summary, the embodiment of the present invention obtains the similarity of spatial attribute information between the unsampled position and each collection area to determine the spatial correlation factor, analyzes the isolation factor of each collection area, and obtains the bias factor from the relative distances between the collection areas and the unsampled position. Combining the spatial correlation factor, the isolation factor and the bias factor yields an interpolation weight for each collection area relative to the unsampled position with strong reference value. Interpolating the unsampled position with these weights gives its elevation data, and multi-source data fusion is then performed on the interpolated GIS data. By obtaining interpolation weights with strong reference value, the present invention obtains accurate interpolation results at unsampled positions and, from the complete GIS data, realizes multi-source data fusion display.

Embodiment of the GIS data processing system based on multi-source data fusion:

The present invention further provides a GIS data processing system based on multi-source data fusion, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor. When executing the computer program, the processor implements the steps of any one of the above GIS data processing methods based on multi-source data fusion.

Embodiment of a GIS data processing method for elevation data interpolation:

In the prior art, elevation data in GIS data are interpolated with weights constructed from the distance between the unsampled position and the collection areas where data already exist. In a real urban environment, however, the distribution of buildings gives the terrain complex features and includes many building types, so interpolation weights built from distance alone yield results of poor reference value. To solve this problem, the present invention proposes a GIS data processing method for elevation data interpolation:

Since the specific implementation of steps S1-S3 is described in detail in the above embodiment of the GIS data processing method based on multi-source data fusion, it is not repeated here.

The embodiment of the present invention builds on the characteristics of GIS data: GIS data contain rich spatial attribute information, and this information is rarely missing or impossible to collect. The correlation between the unsampled position and each collection area is therefore analyzed through the similarity of spatial attribute information, yielding the spatial correlation factor. Moreover, analyzing this correlation requires considering not only distance but also how locations are distributed within the city: if the unsampled position and a collection area are both isolated within the city, and the unsampled position is biased towards that collection area, the data of that collection area are highly valuable as a reference for the unsampled position. The isolation factor of each collection area and the bias factor between each collection area and the unsampled position are therefore analyzed and combined with the spatial correlation factor to obtain interpolation weights with strong reference value. Interpolating the data at the unsampled position with these weights then yields accurate elevation data.

It should be noted that the order of the above embodiments is for description only and does not indicate their relative merits. The processes depicted in the accompanying drawings do not necessarily require the particular or sequential order shown to achieve the desired results. In some implementations, multitasking and parallel processing are also possible or may be advantageous.

The embodiments in this specification are described progressively; for the same or similar parts, the embodiments may refer to one another, and each embodiment focuses on its differences from the others.

Claims

1. A GIS data processing method based on multi-source data fusion, the method comprising:

Acquiring GIS data within a city range, the GIS data comprising spatial attribute information for the whole city range and elevation data within preset acquisition regions, the city range further comprising an unsampled position;

Obtaining a spatial correlation factor between each acquisition region and the unsampled position according to the similarity of spatial attribute information between a preset neighborhood range of the unsampled position and the acquisition region; obtaining an isolation factor of each acquisition region according to the position distribution among the acquisition regions and the similarity of their spatial attribute information; obtaining the relative distance between the unsampled position and each acquisition region, and obtaining a bias factor between each acquisition region and the unsampled position according to the differences in relative distance between each acquisition region and the other acquisition regions and the position distribution between each acquisition region and the unsampled position;

Obtaining an interpolation weight of each acquisition region relative to the unsampled position according to the isolation factor, the bias factor and the spatial correlation factor; and, based on the interpolation weights, interpolating the data at the unsampled position from the elevation data in the acquisition regions to obtain the elevation data at the unsampled position;

Performing multi-source data fusion display based on the interpolated GIS data;

The spatial attribute information comprises digital line graph (DLG) data and a building facility type text vector;

the method for acquiring the spatial correlation factor comprises the following steps:

Acquiring a first cosine similarity between the building facility type text vectors of the preset neighborhood range of the unsampled position and of the acquisition region, and a second cosine similarity between their digital line graph data; taking the sum of the first cosine similarity and the second cosine similarity as the spatial correlation factor;

The isolation factor is obtained by the following formula:

where N is the number of acquisition regions; j is the index of the acquisition regions other than the nth acquisition region; e is the natural constant; and the remaining two terms are, respectively, the distance between the nth acquisition region and the jth other acquisition region and the spatial correlation factor between the nth acquisition region and the jth other acquisition region;

the bias factor is obtained by the following formula:

where G_n is the bias factor of the nth acquisition region; softmax is the normalized exponential function; s_n is the area of the nth acquisition region; N is the number of acquisition regions; j is the index of the acquisition regions other than the nth acquisition region; l_n is the relative distance between the nth acquisition region and the unsampled position; l_j is the relative distance between the jth other acquisition region and the unsampled position; θ_nj is the angle between the two rays drawn from the unsampled position, as vertex, through the center point of the nth acquisition region and the center point of the jth other acquisition region; sin is the sine function; and s_j is the area of the jth other acquisition region.

2. The GIS data processing method based on multi-source data fusion according to claim 1, wherein the size of the acquisition region with the smallest area is used as the size of the neighborhood range, and the neighborhood range is constructed centered on the unsampled position.

3. The GIS data processing method based on multi-source data fusion according to claim 1, wherein obtaining the interpolation weight of each acquisition region relative to the unsampled position according to the isolation factor, the bias factor and the spatial correlation factor comprises:

multiplying the isolation factor, the bias factor and the spatial correlation factor, and normalizing the product to obtain the interpolation weight of each acquisition region relative to the unsampled position.

4. The GIS data processing method based on multi-source data fusion according to claim 1, wherein, after the interpolation weights are obtained, the method further comprises:

obtaining the distance weight of each acquisition region relative to the unsampled position according to the relative distance between each unsampled position and each acquisition region, and updating the interpolation weight by taking the average of the distance weight and the interpolation weight as the new interpolation weight.

5. The GIS data processing method based on multi-source data fusion according to claim 1, wherein the data at the unsampled position are interpolated by an inverse distance weighted interpolation method to obtain the elevation data at the unsampled position.

6. A GIS data processing system based on multi-source data fusion, comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that, when executing the computer program, the processor implements the steps of the GIS data processing method based on multi-source data fusion according to any one of claims 1 to 5.
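Claim 1 defines the spatial correlation factor as the sum of two cosine similarities between the neighborhood of the unsampled position and an acquisition region. A minimal sketch, assuming the building facility type text vectors are already numeric (for example, per-type counts) and that the digital line graph data have been summarized as a numeric feature vector; neither encoding is specified in the text, and the function names are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length numeric vectors (0.0 for a zero vector)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def spatial_correlation_factor(type_nbhd, type_region, dlg_nbhd, dlg_region):
    """First (facility type) plus second (DLG) cosine similarity, as in claim 1."""
    return cosine_similarity(type_nbhd, type_region) + cosine_similarity(dlg_nbhd, dlg_region)


# Hypothetical vectors: facility-type counts and DLG feature summaries.
factor = spatial_correlation_factor([3, 0, 1], [3, 0, 1], [5.0, 2.0], [5.0, 2.0])
```

With non-negative vectors each similarity lies in [0, 1], so the factor lies in [0, 2] and reaches 2 when both descriptions match exactly.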