CN116662468A

CN116662468A - Urban functional area identification method and system based on geographic object space mode characteristics

Info

Publication number: CN116662468A
Application number: CN202310608898.XA
Authority: CN
Inventors: 眭海刚; 杜卓童; 周启鸣; 史玮玥; 葛亮
Original assignee: Wuhan University WHU
Current assignee: Wuhan University WHU
Priority date: 2023-05-24
Filing date: 2023-05-24
Publication date: 2023-08-29

Abstract

The invention discloses an urban functional area identification method and system based on geographic object space mode characteristics. First, open source big data is standardized, and a series of natural and man-made separating elements is used to divide the urban space into research units, namely blocks. Second, based on the urban surface elements, point of interest (POI) data and medium-resolution remote sensing images within each block, the spatial distribution and semantic attributes in the multimodal data are mined to obtain a socioeconomic feature vector and a spatial pattern feature vector for each block. Finally, a convolutional neural network is designed to automatically identify and classify the spatial semantics of urban functional areas from the obtained spatial pattern feature vectors and socioeconomic feature vectors. Based on open source big data, the invention intelligently mines multidimensional features of various geographic objects such as urban surface elements and POI data, and realizes high-precision, fine-grained automatic identification and classification of the spatial semantics of urban functional areas.

Description

Urban functional area identification method and system based on geographic object space mode characteristics

Technical Field

The invention belongs to the technical field of remote sensing applications, and particularly relates to an urban functional area identification method and system based on geographic object space mode characteristics.

Background

An urban functional area is a geographic combination that is spatially composed of various landscape elements and semantically abstracted into a single urban function; it is regarded as the basic unit of urban management and planning. The spatial pattern of urban functional areas affects the efficiency of urban life and is closely related to problems such as traffic congestion, the separation of residential ("sleeping") districts, and urban air pollution. Urban functional area detection is an effective way to understand urban space and the interaction between human activity and the environment. The driving forces of rapid urbanization and the characteristics of the resulting functional areas have been widely discussed. Traditionally, urban areas were delineated mainly from field surveys and field observation. With the wide use of high-resolution satellite imagery (Landsat, SPOT, QuickBird), high-precision urban land use maps can be extracted by remote sensing, so that the visual characteristics of urban ground objects can be described effectively and regions segmented accordingly. However, existing urban functional area recognition results based on remote sensing land cover still differ considerably from the actual living space when applied. Furthermore, research on the dynamic change of urban land use classes and on complex suburban transition areas is far from sufficient, and providing high-precision, fine-grained, and timely urban functional area identification remains a great challenge.

Remote sensing images contain abundant visual and spatial information about ground objects, and urban functional area identification should not attend only to object-oriented recognition of surface elements. Only when the spatial arrangement and layout of the ground objects captured in the image, together with the spatial aggregation patterns of geographic objects, are taken as key recognition factors can approaches that classify urban functional scenes solely from the visual features of ground objects be effectively improved. Meanwhile, with the rapid development of artificial intelligence methods such as machine learning for high-resolution remote sensing image recognition, machine learning models can effectively quantify the spatial pattern features of ground object elements and integrate image, spatial, and semantic information, making accurately mapped and promptly updated urban functional area planning possible.

Although urban functional area detection methods based on geographic objects in remote sensing images have achieved some success in academia and industry, global urbanization is accelerating: cities are expanding rapidly, populations keep growing, and urban functional areas change continuously to meet people's increasing social and economic demands. Current methods that extract only ground object targets and spatial details are limited to functional area identification in a single street view environment or in certain specific scenes. They cannot intelligently detect and classify urban land use in complex geographic scenes, and cannot provide an automatically updated urban functional area planning map for sustainable urban development. Therefore, timely and accurate urban functional area planning has become a major need of smart city construction; it is a basis for further capturing urban human behavior patterns, and an important basis for traffic control, energy recovery, emergency resource allocation and similar tasks in urban management.

Disclosure of Invention

In view of the shortcomings of the prior art, the invention provides a typical urban functional area identification method and system based on geographic object space mode characteristics. Based on open source big data, it intelligently mines multidimensional features of various geographic objects such as urban surface elements and point of interest (POI) data, and can automatically identify and classify the spatial semantics of urban functional areas with high precision and fine granularity.

The invention provides an urban functional area identification method based on geographic object space mode characteristics, which comprises the following steps:

Step 1, preprocessing open source geospatial vector data, and dividing the urban space to obtain research units, namely blocks;

Step 1.1, preprocessing the open source geospatial vector data so that the data meet the data use specifications;

Step 1.2, merging multi-lane line data in the open source geospatial vector data with ArcGIS software, and connecting broken lines in the multi-lane line data based on the topology inspection result, so as to simplify the road line data;

Step 1.3, dividing the urban space using road data at the main road level and river data excluding wetlands, to obtain urban blocks;

Step 2, converting the spatial distribution attributes and positional relations of POI objects in the blocks into a Block-POI spatial semantic corpus, obtaining high-dimensional continuous dense vectors of the different POI types in each block, and obtaining the socioeconomic feature vector of each block through weighted averaging;

Step 2.1, regarding a Block in a research area as a document, regarding POIs in the Block as words, and constructing a Block-POI space semantic corpus;

Step 2.2, obtaining high-dimensional continuous dense vectors of the different POI types in each block by using a Word2Vec model;

Step 2.3, improving the argument of the logarithm in the TF-IDF algorithm, weighting the POI word vectors with the improved TF-IDF algorithm, calculating the actual contribution of the different POI types in the Block-POI corpus, and mining the scale and usage characteristics of the POI functions in each block;

Step 3, taking the block as a unit, calculating element-level, neighborhood-level and overall spatial distribution structure indexes of the surface elements inside the block based on vector data, calculating land cover remote sensing application indexes and deep semantic image features formed by the aggregation of the various elements in the block based on medium-resolution remote sensing images, and combining the vector-based and image-based results to form the spatial pattern feature vector of each block;

Step 3.1, taking the block as a unit and based on the vector data of the internal surface elements, calculating indexes based on the element characteristics of the geographic objects themselves, indexes based on the neighborhood characteristics of the surface elements, and indexes based on the overall distribution characteristics of the surface elements in the block;

Step 3.2, using the minimum circumscribed rectangle of the blocks obtained by the division in the step 1 as a mask, extracting the corresponding earth surface range in the medium-resolution remote sensing image, and calculating three earth surface coverage remote sensing application indexes of each block;

Step 3.3, dividing the remote sensing image of the research area into input images of a given size, and extracting deep semantic image features with an existing deep convolutional autoencoder model, MegNet;

Step 3.4, performing Z-Score normalization on the element-level, neighborhood-level and overall spatial distribution structure indexes calculated in step 3.1 and on the land cover remote sensing application indexes calculated in step 3.2;

Step 4, based on the socioeconomic feature vector obtained in step 2 and the spatial pattern feature vector obtained in step 3, automatically identifying and classifying the spatial semantics of urban functional areas with a convolutional neural network.

Moreover, the preprocessing operations in step 1.1 include: (1) converting the spatial reference of the data; (2) performing topology inspection on the geographic element data; and (3) supplementing semantic attributes for POI data that lack key attributes.

The specific operations involved in step 2.1 are as follows: (1) calculating the Euclidean distance between all POI point pairs in the block, and selecting the pair with the greatest distance as the two endpoints of the word sequence in the corpus, namely $\langle P_{start}, P_{end} \rangle$; (2) taking the block as a unit, abstracting the POIs in it as graph nodes, constructing an unweighted, undirected Voronoi graph, calculating the shortest path between $P_{start}$ and $P_{end}$ with the Dijkstra algorithm, and recording the POI words in order to construct a "block document" containing POI spatial context information; (3) imitating the document construction process, constructing point pairs based on block centers and calculating the Euclidean distances between block point pairs in the research area to obtain the block documents; (4) constructing the Block-POI spatial semantic corpus from (2) and (3).

Furthermore, in step 2.2, the probability vector of each POI type in the block is obtained by training the CBOW model of the Word2Vec neural network. The objective function $L$ and the probability distribution $p(w_t \mid \mathrm{Context}(w_t))$ are calculated as follows:

$$\mathrm{Context}(w_t) = w_{t-c}, \ldots, w_{t-1}, w_{t+1}, \ldots, w_{t+c} \tag{1}$$

where $w_t$ is the current word, $c$ is the context window size, $\mathrm{Context}(w_t)$ is the context input vector of the current word $w_t$, $T$ is the corpus size, $w_i$ is an arbitrary word in the corpus, and $E(w_t, \mathrm{Context}(w_t))$ is the energy function of the Word2Vec model, calculated as follows:

$$E(w_i, w_j) = -\bigl(v(w_i) \cdot v(w_j)\bigr) \tag{3}$$

where $E(w_i, w_j)$ is the negative of the inner product of the vectors of $w_i$ and its context word $w_j$, and $v(w_i)$, $v(w_j)$ denote word vectors.
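Formula (2), which links the objective function to the energy function, can be read as follows under the usual energy-based statement of CBOW. This is a reconstruction from the definitions above rather than the patent's verbatim formula; in particular, the normalizing sum is assumed to run over the words $w_i$ of the corpus, and the logarithm is assumed to be natural:

$$L = \sum_{t=1}^{T} \log p\bigl(w_t \mid \mathrm{Context}(w_t)\bigr), \qquad p\bigl(w_t \mid \mathrm{Context}(w_t)\bigr) = \frac{\exp\bigl(-E(w_t, \mathrm{Context}(w_t))\bigr)}{\sum_{w_i} \exp\bigl(-E(w_i, \mathrm{Context}(w_t))\bigr)}$$

With $E$ defined as a negative inner product, a word whose vector aligns with its context has low energy and therefore high probability.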

In step 2.3, firstly, the TF value of the POI semantic function type is determined, and the specific calculation mode is as follows:

In the formula, $TF_{i,j}$ is the TF value of the $i$-th POI type in the $j$-th block, $n_{i,j}$ is the number of times the $i$-th POI type appears in the $j$-th block, and $\sum_k n_{k,j}$ is the total number of occurrences of all POI types in the $j$-th block.
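Written out from these definitions, the term frequency is the share of the $i$-th POI type among all POI occurrences in the $j$-th block:

$$TF_{i,j} = \frac{n_{i,j}}{\sum_k n_{k,j}}$$

For example, a block with 3 restaurants and 1 bank gives a TF of $3/4$ for the restaurant type.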

To address the non-uniform distribution between classes, an improved IDF is used to determine the IDF value of each POI semantic function type, calculated as follows:

In the formula, $IDF_i$ is the IDF value of the $i$-th POI type, $N$ is the total number of blocks, $N_m$ is the number of blocks of the $m$-th class, $d_{im}$ is the number of blocks of the $m$-th class that contain the $i$-th POI type, and $d_{ic,\,c \neq m}$ is the number of blocks of the other classes that contain the $i$-th POI type. The specific calculation formula is as follows:

where $w_i$ is the $i$-th POI type, $d_j$ is the $j$-th block, $c_m$ is the $m$-th class of block, and $\langle w_i \in d_j \rangle \cap \langle d_j \in c_m \rangle$ is the condition that decides whether a block is counted.
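The two counts defined above can be sketched in a few lines of Python. The function name, the list-of-lists block representation and the toy labels are illustrative assumptions; only the counting condition (the block contains the POI type, and the block belongs or does not belong to class $m$) comes from the text.

```python
def class_block_counts(blocks, labels, poi_type, target_class):
    """Return (d_im, d_ic) for one POI type and one block class.

    d_im: blocks of `target_class` that contain `poi_type`.
    d_ic: blocks of every other class that contain `poi_type`.
    """
    d_im = 0
    d_ic = 0
    for pois, label in zip(blocks, labels):
        if poi_type in pois:
            if label == target_class:
                d_im += 1
            else:
                d_ic += 1
    return d_im, d_ic


# Toy data: four blocks with hypothetical POI types and class labels.
blocks = [["bank", "shop"], ["bank"], ["school"], ["bank", "school"]]
labels = ["commercial", "commercial", "residential", "residential"]
counts = class_block_counts(blocks, labels, "bank", "commercial")
```

Here `counts` holds the pair that feeds the improved IDF for the POI type "bank" and the class "commercial".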

To address the non-uniform distribution within classes, the class frequency parameter $CF_i$ is combined with the parameters $TF_{i,j}$ and $IDF_i$ calculated above. The improved TF-IDF algorithm is shown in formula (8):

where $TF\text{-}IDF_i$ is the weight of the $i$-th POI type, $d_{im}$ is the number of blocks of the $m$-th class in the data set that contain the $i$-th POI type, $N_m$ is the number of blocks of the $m$-th class, $N$ is the total number of blocks, and $d_{ic,\,c \neq m}$ is the number of blocks of the other classes that contain the $i$-th POI type.

To prevent the argument of the logarithm or the denominator from being zero when calculating $IDF_i$, formula (8) is smoothed, as shown in formula (9):

Based on the POI semantic type feature vectors and type weights obtained in steps 2.2 and 2.3, the socioeconomic feature of each block is calculated as follows:

In the formula, $v_{SE}(Block)$ denotes the socioeconomic feature vector $v_{Social\text{-}Economic}(Block)$ of a block, $v(POI\_Category(i))$ is the vector representation of the different POI types within the block obtained in step 2.2, and $weight(POI\_Category(i))$ is the weight of the different POI types given by the improved TF-IDF algorithm.
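A minimal sketch of the weighted averaging step follows. It assumes the weights are normalized to sum to one and uses plain TF as a stand-in weight, because the improved TF-IDF formula is not reproduced in the text; the optional `idf` mapping is a hypothetical hook for whatever IDF-like factor is available, and the two-dimensional embeddings are toy values.

```python
from collections import Counter


def term_frequencies(block_pois):
    """TF of each POI type: its share among all POI occurrences in one block."""
    counts = Counter(block_pois)
    total = sum(counts.values())
    return {poi_type: n / total for poi_type, n in counts.items()}


def socioeconomic_vector(block_pois, embeddings, idf=None):
    """Weighted average of the POI-type embeddings of one block.

    `idf` is a hypothetical mapping from POI type to an IDF-like factor;
    when omitted every factor is 1.0 and the weights reduce to plain TF.
    """
    tf = term_frequencies(block_pois)
    weights = {t: tf[t] * (idf or {}).get(t, 1.0) for t in tf}
    norm = sum(weights.values())
    dim = len(next(iter(embeddings.values())))
    vector = [0.0] * dim
    for poi_type, weight in weights.items():
        for k, value in enumerate(embeddings[poi_type]):
            vector[k] += weight / norm * value
    return vector


# Toy 2-dimensional embeddings; the embodiment uses 256 dimensions.
embeddings = {"restaurant": [1.0, 0.0], "bank": [0.0, 1.0]}
pois = ["restaurant", "restaurant", "restaurant", "bank"]
v_se = socioeconomic_vector(pois, embeddings)
```

With three restaurants and one bank, the block vector sits three quarters of the way toward the restaurant embedding, which is the intended effect of weighting by contribution.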

In addition, in step 3.1, following the principle of information entropy, the area distribution entropy, perimeter distribution entropy, mean radius distribution entropy, minimum bounding rectangle direction distribution entropy, compactness distribution entropy, fractal dimension distribution entropy and concavity distribution entropy of the building objects in the block are calculated at the element level; the density distribution entropy of the building objects in the block is calculated at the neighborhood level; and the overall aggregation structure characteristics of the geographic objects in the block are calculated. Together these indexes measure the diversity of building geometry and distribution attributes within the block. The specific calculations are shown in formulas (11) to (30):

Area distribution entropy:

where $H_1(d)$ is the area distribution entropy of the building geographic objects within the block, $A_i$ is the area of the $i$-th building, $S$ is the total building area, and $n$ is the number of building geographic objects.
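Under the standard Shannon form of information entropy, and assuming a natural logarithm, the area distribution entropy described by these symbols would read:

$$H_1(d) = -\sum_{i=1}^{n} \frac{A_i}{S} \log \frac{A_i}{S}, \qquad S = \sum_{i=1}^{n} A_i$$

This is a sketch consistent with the definitions, not the verbatim formula. It is zero when a single building holds all the area and reaches its maximum $\log n$ when all buildings have equal area.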

Perimeter distribution entropy:

where $H_2(d)$ is the perimeter distribution entropy of the building geographic objects within the block, $P_i$ is the perimeter of the $i$-th building, $L$ is the total building perimeter, and $n$ is the number of building geographic objects.

Mean radius distribution entropy:

where $H_3(d)$ is the mean radius distribution entropy of the building geographic objects within the block, $R_i$ is the mean radius of the $i$-th building, $R$ is the sum of the mean radii of all buildings, and $n$ is the number of building geographic objects. The mean radius of a building object is calculated as the average distance from all boundary points of its contour to the center point:

where $d_j$ is the distance from boundary point $j$ to the center point, and $M$ is the number of boundary points.
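Written out from these definitions, the mean radius of the $i$-th building is the arithmetic mean of its boundary-to-center distances:

$$R_i = \frac{1}{M} \sum_{j=1}^{M} d_j$$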

Minimum bounding rectangle direction distribution entropy: the main direction of the minimum bounding rectangle of a building is defined as the angle between its long axis and the horizontal axis. The main direction $\theta$ of each building object is calculated first and the number of main direction classes is determined; the number of buildings falling in each main direction class interval within the block is then counted to analyze the differences in building direction distribution within the block, and the direction statistics are substituted into the following formula:

where $H_4(d)$ is the direction distribution entropy at the object level of the building instances in the building group, $S_i$ is the number of buildings in the $i$-th direction class, $N$ is the total number of buildings, and $N_4$ is the number of main direction classes.
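The binning and counting described above can be sketched as follows. The sketch assumes the main directions have already been measured in degrees, that they are folded into the half-open range from 0 to 180 degrees, and that the classes are equal-width bins; the default of 8 classes and the natural logarithm are illustrative choices, not values from the text.

```python
from math import log


def direction_entropy(angles_deg, n_classes=8):
    """Shannon entropy of building main directions over equal-width classes.

    Angles are folded into [0, 180) because a rectangle's long axis has no
    head or tail; `n_classes` plays the role of N_4 in the text.
    """
    counts = [0] * n_classes
    width = 180.0 / n_classes
    for angle in angles_deg:
        index = int((angle % 180.0) // width)
        counts[min(index, n_classes - 1)] += 1  # guard against rounding to 180.0
    total = len(angles_deg)
    # Empty classes contribute nothing to the entropy.
    return -sum(c / total * log(c / total) for c in counts if c)


aligned = direction_entropy([12.0, 14.0, 11.0, 13.0])        # one shared direction
mixed = direction_entropy([10.0, 100.0], n_classes=2)        # two distinct classes
```

A block of uniformly oriented buildings yields zero entropy, while evenly mixed orientations push the value toward the logarithm of the number of classes.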

Compactness distribution entropy:

where $H_5(d)$ is the compactness distribution entropy of the building geographic objects within the block, $S\_Com_i$ is the compactness of the $i$-th building, $S\_Com$ is the total building compactness, and $n$ is the number of building geographic objects. The compactness of a building object is a quadratic relationship between its area and perimeter:

where $A_i$ is the area of the $i$-th building and $P_i$ is the perimeter of the $i$-th building.

Fractal dimension distribution entropy:

where $H_6(d)$ is the fractal dimension distribution entropy of the building geographic objects within the block, $S\_Frac_i$ is the fractal dimension of the $i$-th building, $S\_Frac$ is the total building fractal dimension, and $n$ is the number of building geographic objects.

The fractal dimension of a building object is a logarithmic relationship between its area and perimeter:

Concavity distribution entropy:

where $H_7(d)$ is the concavity distribution entropy of the building geographic objects within the block, $S\_Con_i$ is the concavity of the $i$-th building, $S\_Con$ is the total building concavity, and $n$ is the number of building geographic objects. The concavity of a building object is the ratio of the area of the object to the area of its convex hull:

where $A_i$ is the area of the $i$-th building, and the denominator is the area of the convex hull of the building object.

Density distribution entropy:

where $H_8(d)$ is the density distribution entropy of the building geographic objects within the block, $S\_den_i$ is the density of the $i$-th building, $S\_den$ is the total building density, and $n$ is the number of building geographic objects. The density of a building object is the ratio of the area of the object to the area of its corresponding Voronoi polygon:

where $A_i$ is the area of the $i$-th building, and the denominator is the area of the Voronoi polygon corresponding to the building.
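The element-level descriptors and their distribution entropies can be sketched in plain Python. Several points are assumptions: building footprints are simple polygons given as vertex lists, compactness takes the common form $4\pi A / P^2$ (the text only says it is quadratic in area and perimeter), and the entropy uses the natural logarithm. Concavity follows the text (object area over convex hull area).

```python
from math import hypot, log, pi


def area_perimeter(polygon):
    """Shoelace area and perimeter of a simple polygon given as a list of (x, y)."""
    twice_area, perimeter = 0.0, 0.0
    for (x1, y1), (x2, y2) in zip(polygon, polygon[1:] + polygon[:1]):
        twice_area += x1 * y2 - x2 * y1
        perimeter += hypot(x2 - x1, y2 - y1)
    return abs(twice_area) / 2.0, perimeter


def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) < 3:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    def half_hull(sequence):
        hull = []
        for p in sequence:
            while len(hull) >= 2 and cross(hull[-2], hull[-1], p) <= 0:
                hull.pop()
            hull.append(p)
        return hull[:-1]  # the last point starts the other half

    return half_hull(pts) + half_hull(pts[::-1])


def compactness(polygon):
    """Assumed form 4*pi*A / P**2, which equals 1.0 for a circle."""
    area, perimeter = area_perimeter(polygon)
    return 4.0 * pi * area / perimeter ** 2


def concavity(polygon):
    """Area of the object divided by the area of its convex hull."""
    area, _ = area_perimeter(polygon)
    hull_area, _ = area_perimeter(convex_hull(polygon))
    return area / hull_area


def distribution_entropy(values):
    """-sum(p_i * ln p_i) with p_i = value_i / sum(values); zero shares are skipped."""
    total = sum(values)
    return -sum(v / total * log(v / total) for v in values if v > 0)


# Two toy footprints: a square and an L-shaped building.
square = [(0, 0), (2, 0), (2, 2), (0, 2)]
l_shape = [(0, 0), (2, 0), (2, 1), (1, 1), (1, 2), (0, 2)]
buildings = [square, l_shape]

area_entropy = distribution_entropy([area_perimeter(b)[0] for b in buildings])
concavity_entropy = distribution_entropy([concavity(b) for b in buildings])
```

The same `distribution_entropy` helper applies to any per-building attribute (perimeter, mean radius, density), so each entropy index differs only in the descriptor fed into it.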

The overall aggregation structure characteristics of the geographic objects in the block are calculated, including the average height of the buildings in the block and six land proportion indexes, as shown in formulas (24) to (30):

The average height of the buildings in the block is calculated as follows:

In the formula, $h_{i,b}$ is the height of the $i$-th building in the block, and $N_b$ is the number of building objects in the block.
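Written out from these definitions, with $\bar{h}$ used here as an assumed symbol for the average height, the calculation is the arithmetic mean of the building heights:

$$\bar{h} = \frac{1}{N_b} \sum_{i=1}^{N_b} h_{i,b}$$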

The six land proportion indexes are calculated as follows:

where $ULR_1$ is the first land proportion index, $A_{i,g}$ is the area of the $i$-th green space object in the block, $A_{i,w}$ is the area of the $i$-th water body object in the block, $A_{block}$ is the area of the block, and $N_g$ and $N_w$ are the numbers of green space and water body objects in the block, respectively.

where $ULR_2$ is the second land proportion index, $N_{POI\_facilities}$ is the number of urban facility POIs in the block, and the remaining term is the total area of all building objects in the block.

where $ULR_3$ is the third land proportion index, $N_{POI\_facilities}$ is the number of urban facility POIs in the block, $A_{block}$ is the area of the block, $A_{i,g}$ is the area of the $i$-th green space object in the block, $A_{i,w}$ is the area of the $i$-th water body object in the block, $A_{i,b}$ is the area of the $i$-th building object in the block, $N_g$ and $N_w$ are the numbers of green space and water body objects in the block, and $N_b$ is the number of building objects in the block.

where $ULR_4$ is the fourth land proportion index, $L_{inner\_roads}$ is the length of an internal road object of the block, $N_r$ is the number of internal roads in the block, and $A_{i,b}$ is the area of the $i$-th building object in the block.

where $ULR_5$ is the fifth land proportion index, $L_{inner\_roads}$ is the length of an internal road object of the block, $N_r$ is the number of internal roads in the block, $A_{block}$ is the area of the block, $A_{i,g}$ is the area of the $i$-th green space object in the block, $A_{i,w}$ is the area of the $i$-th water body object in the block, $A_{i,b}$ is the area of the $i$-th building object in the block, $N_g$ and $N_w$ are the numbers of green space and water body objects in the block, and $N_b$ is the number of building objects in the block.

where $ULR_6$ is the sixth land proportion index, $E_{entities}$ is the sum of the boundary lengths of the buildings, green space, water bodies, road objects and other areas of the block, and $A_{block}$ is the area of the block.

In step 3.2, the three land cover remote sensing application indexes are calculated as shown in formulas (31) to (33):

where NDVI is the normalized difference vegetation index, NIR is the near-infrared band and Red is the red band.

where NDBI is the normalized difference built-up index, SAVI is the soil-adjusted vegetation index, SWIR is the short-wave infrared band, NIR is the near-infrared band, Red is the red band, and $L$ is the soil adjustment factor.

where NDWI is the normalized difference water index, Green is the green band and NIR is the near-infrared band.
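The indexes named above have widely used textbook definitions, sketched below. These are assumed to match formulas (31) to (33) in spirit; how the patent combines NDBI with SAVI is not recoverable from the text, so the two are kept as separate functions. The soil adjustment factor of 0.5 is a common default, and the per-block averaging helper and its pixel tuples are illustrative.

```python
def normalized_difference(band_a, band_b):
    """(a - b) / (a + b); returns 0.0 when the denominator is zero."""
    total = band_a + band_b
    return (band_a - band_b) / total if total else 0.0


def ndvi(nir, red):
    """Normalized difference vegetation index."""
    return normalized_difference(nir, red)


def ndbi(swir, nir):
    """Normalized difference built-up index."""
    return normalized_difference(swir, nir)


def ndwi(green, nir):
    """Normalized difference water index."""
    return normalized_difference(green, nir)


def savi(nir, red, soil_factor=0.5):
    """Soil-adjusted vegetation index; soil_factor is L in the text."""
    return (nir - red) * (1.0 + soil_factor) / (nir + red + soil_factor)


def block_mean_index(index_fn, pixels):
    """Average an index over the pixels extracted for one block.

    Each pixel is a tuple of band reflectances in the order index_fn expects.
    """
    values = [index_fn(*pixel) for pixel in pixels]
    return sum(values) / len(values)


# Toy (NIR, Red) reflectances for the pixels inside one block mask.
pixels_nir_red = [(0.5, 0.1), (0.4, 0.2)]
block_ndvi = block_mean_index(ndvi, pixels_nir_red)
```

In practice the pixel lists would come from masking the medium-resolution image with the minimum bounding rectangle of each block, as step 3.2 describes.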

The convolutional neural network constructed in step 4 comprises a functional area category feature extraction module and a functional area category feature classification module. The feature extraction module obtains high-dimensional functional area category features by combining convolution layers and pooling layers; it comprises four convolution layers and three pooling layers. Each convolution layer is followed by an activation function layer that uses the PyTorch ReLU() activation function, and the pooling layers use the PyTorch MaxPool2d() function. The feature classification module processes the high-dimensional features obtained by the feature extraction module to predict the functional area type; it consists of a flattening layer, two fully connected layers and a classification output layer. The flattening layer uses the PyTorch flatten() function, the fully connected layers use the Linear() function, the output layer uses the log_softmax() function, and the output dimension is the total number of labeled functional area types.

The POI semantic spatial distribution feature vector of each block obtained in step 2 is aggregated with the element-level, neighborhood-level and overall spatial pattern feature vectors, the land cover remote sensing application indexes and the deep semantic image features of the geographic objects in the block obtained from vector and image data in steps 3.1 to 3.3, giving each block a 530-dimensional feature vector description.
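The normalization of step 3.4 and the aggregation above can be sketched as follows. The split of the 530 dimensions is an assumption: 256 socioeconomic dimensions (the Word2Vec size used in the embodiment), 18 hand-crafted indexes (8 entropies, the average height, 6 land proportion indexes and 3 remote sensing indexes, counted from the text), and a hypothetical 256 deep image dimensions chosen so that the total is 530. The function names are illustrative.

```python
from statistics import mean, pstdev


def z_score_columns(rows):
    """Z-Score normalize each column of a list of equal-length rows.

    A constant column has zero deviation; it is divided by 1.0 instead,
    which leaves it centred at 0.
    """
    columns = list(zip(*rows))
    means = [mean(column) for column in columns]
    stds = [pstdev(column) or 1.0 for column in columns]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]


def assemble_block_vector(socioeconomic, structural_z, image_features):
    """Concatenate the three feature groups into one block descriptor."""
    return list(socioeconomic) + list(structural_z) + list(image_features)


# Two blocks, two structural indexes each (toy values).
normalized = z_score_columns([[1.0, 10.0], [3.0, 10.0]])

# Dimension check under the assumed 256 + 18 + 256 split.
descriptor = assemble_block_vector([0.0] * 256, [0.0] * 18, [0.0] * 256)
```

Normalizing only the hand-crafted indexes keeps their very different units (areas, lengths, ratios) from dominating the learned embedding dimensions.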

The block data set is divided into a training set, a validation set and a test set; the network input is the 530-dimensional feature vector description of a block, and the network output is the category of the urban functional block. The network optimizer uses the Adam algorithm, with the learning rate and the decay rate set to 3e-5 and 0.95 respectively. With an encoder, fully connected layer and Softmax classifier as the main framework, the encoder parameters are taken as initial values, and automatic identification and classification of urban functional areas is realized through supervised learning of the parameters.

The invention also provides an urban functional area identification system based on geographic object space mode characteristics, which is used to implement the urban functional area identification method based on geographic object space mode characteristics described above.

Further, the system includes a processor and a memory; the memory stores program instructions, and the processor invokes the instructions stored in the memory to execute the urban functional area identification method based on geographic object space mode characteristics described above.

Compared with the prior art, the invention has the following advantages:

1) A new framework for recognizing the spatial semantics of urban functions is established. It integrates multi-source data to mine the multidimensional geographic features of urban blocks (attribute semantics, spatial patterns and image coverage), quantitatively describes typical urban functions, and couples geographic space with geographic relations;

2) In view of the complexity of urban functional space and of the relationships among its internal elements, spatial metrics are constructed that describe various geographic objects (including buildings, roads, green space, water bodies and POIs) and the interactions between them, providing an effective and sufficient method for representing the spatial semantics of urban blocks;

3) Based on the Voronoi geometric model and with the urban block as the basic unit, a Block-POI spatial semantic corpus is established, and a deep learning language model converts the spatial distribution attributes and positional relations of the POI geographic objects in a block into dense feature vectors, supporting the semantic expression of functional space.

Drawings

Fig. 1 is a general flow chart of an urban functional area identification method according to an embodiment of the present invention.

Fig. 2 is a diagram of an urban functional area identification and classification neural network according to an embodiment of the present invention.

Detailed Description

The invention provides a typical urban functional area identification method and system based on geographic object space mode characteristics. The technical scheme of the invention is further described below with reference to the accompanying drawings, taking open source OpenStreetMap (OSM) data as an example.

Example 1

As shown in Fig. 1, the invention provides a typical urban functional area identification method based on geographic object space mode characteristics, comprising the following steps:

Step 1, dividing the urban space with a series of natural and man-made separating elements (such as rivers and roads) to obtain research units, namely blocks.

Step 1.1, preprocessing the open source geospatial vector data so that the data meet the data use specifications.

(1) Data spatial reference conversion, including conversion between WGS84, the mobile internet geodetic coordinate system and the CGCS2000 coordinate system; (2) topology inspection of geographic element data such as building polygon data and road line data; (3) supplementing semantic attributes for POI data that lack key attributes.

Step 1.2, merging the OpenStreetMap (OSM) multi-lane line data with ArcGIS software, and connecting broken lines in the multi-lane line data based on the topology inspection result, so as to simplify the OSM road line data.

Step 1.3, dividing the urban space using road data at the main road level and river data excluding wetlands, to obtain urban blocks.

Step 2, converting the spatial distribution attributes and positional relations of the POI objects in the blocks into a Block-POI spatial semantic corpus by means of the Voronoi diagram, obtaining high-dimensional continuous dense vectors of the different POI types in each block with a Word2Vec model, and obtaining the socioeconomic feature vector of each block through weighted averaging.

Step 2.1, by analogy with natural language processing, regarding a block in the research area as a document and the POIs in the block as words, and constructing a Block-POI spatial semantic corpus. The specific operation steps are as follows:

(1) calculating the Euclidean distance between all POI point pairs in the block, and selecting the pair with the greatest distance as the two endpoints of the word sequence in the corpus, namely $\langle P_{start}, P_{end} \rangle$; (2) taking the block as a unit, abstracting the POIs in it as graph nodes, constructing an unweighted, undirected Voronoi graph, calculating the shortest path between $P_{start}$ and $P_{end}$ with the Dijkstra algorithm, and recording the POI words in order to construct a "block document" containing POI spatial context information; (3) imitating the document construction process, constructing point pairs based on block centers and calculating the Euclidean distances between block point pairs in the research area to obtain the block documents; (4) constructing the Block-POI spatial semantic corpus from (2) and (3).
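Operations (1) and (2) can be sketched as follows. The sketch assumes the Voronoi neighbor relation between POIs has already been computed and is passed in as an adjacency mapping (a hypothetical input format); because the graph is unweighted, every edge is given a cost of 1 in Dijkstra's algorithm. The POI coordinates and categories are toy values.

```python
import heapq
from itertools import combinations
from math import dist


def farthest_pair(points):
    """Indices of the two POIs with the largest Euclidean distance."""
    return max(combinations(range(len(points)), 2),
               key=lambda pair: dist(points[pair[0]], points[pair[1]]))


def shortest_path(adjacency, start, end):
    """Dijkstra on an unweighted, undirected graph (every edge costs 1)."""
    best = {start: 0}
    previous = {}
    queue = [(0, start)]
    while queue:
        cost, node = heapq.heappop(queue)
        if node == end:
            break
        if cost > best[node]:
            continue  # stale queue entry
        for neighbour in adjacency[node]:
            new_cost = cost + 1
            if new_cost < best.get(neighbour, float("inf")):
                best[neighbour] = new_cost
                previous[neighbour] = node
                heapq.heappush(queue, (new_cost, neighbour))
    if end not in best:
        return []  # the two endpoints are not connected
    path = [end]
    while path[-1] != start:
        path.append(previous[path[-1]])
    return path[::-1]


def block_document(points, categories, adjacency):
    """Ordered POI 'words' along the shortest path between the farthest POIs."""
    start, end = farthest_pair(points)
    return [categories[i] for i in shortest_path(adjacency, start, end)]


# Toy block: five POIs and an assumed Voronoi neighbour graph.
points = [(0, 0), (1, 0), (2, 0), (3, 0), (1, 1)]
categories = ["restaurant", "bank", "school", "hospital", "shop"]
adjacency = {0: [1, 4], 1: [0, 2, 4], 2: [1, 3], 3: [2], 4: [0, 1]}
document = block_document(points, categories, adjacency)
```

The resulting word order preserves which POI types sit next to each other in space, which is what gives the Word2Vec context window a spatial meaning.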

Step 2.2, obtaining high-dimensional continuous dense vectors of the different POI types in each block by using a Word2Vec model.

The Word2Vec neural network is a distributed text representation method with two models, Skip-gram and CBOW. Because the invention aims to predict functional types from contextual spatial distribution information, that is, to "predict the central word from the surrounding words", the CBOW model is chosen for training to obtain the probability vector of each POI type in the block. The objective function $L$ and the probability distribution $p(w_t \mid \mathrm{Context}(w_t))$ are calculated as follows:

$$\mathrm{Context}(w_t) = w_{t-c}, \ldots, w_{t-1}, w_{t+1}, \ldots, w_{t+c} \tag{1}$$

where $w_t$ is the current word, $c$ is the context window size, set to 5 in this embodiment, $\mathrm{Context}(w_t)$ is the context input vector of the current word $w_t$, $T$ is the corpus size, $w_i$ is an arbitrary word in the corpus, and $E(w_t, \mathrm{Context}(w_t))$ is the energy function of the Word2Vec model, calculated as follows:

$$E(w_i, w_j) = -\bigl(v(w_i) \cdot v(w_j)\bigr) \tag{3}$$

In this embodiment, the dimension parameter of the output word vector of the CBOW model is set to 256, and the iteration number parameter is set to 20.

Step 2.3, because the distribution of POI geographic objects is uneven under the influence of economic activity, the geographic environment and other factors, an improved TF-IDF algorithm is introduced to weight the POI word vectors, calculate the actual contribution of the different POI types in the Block-POI corpus, and mine the scale and usage characteristics of the POI functions in each block.

First, the TF value of each POI semantic function type is determined, calculated as follows:

where $\mathrm{IDF}_i$ is the IDF value of the $i$-th POI type, $N$ is the total number of blocks, $N_m$ is the number of blocks of class $m$, $d_{im}$ is the number of class-$m$ blocks that contain the $i$-th POI type, and $d_{ic,\,c \neq m}$ is the number of blocks of the other classes that contain the $i$-th POI type, calculated as follows:

where $w_i$ is the $i$-th POI type, $d_j$ is the $j$-th block, $c_m$ is the $m$-th block class, and $\langle w_i \in d_j \rangle \cap \langle d_j \in c_m \rangle$ is the counting condition.

where $\mathrm{TF\text{-}IDF}_i$ is the weight of the $i$-th POI type, $d_{im}$ is the number of class-$m$ blocks in the dataset that contain the $i$-th POI type, $N_m$ is the number of blocks of class $m$, $N$ is the total number of blocks, and $d_{ic,\,c \neq m}$ is the number of blocks of the other classes that contain the $i$-th POI type.
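The TF term can be sketched as follows. Claim 5 describes it through the number of times the $i$-th POI type appears in the $j$-th block and the total number of POI occurrences in that block; the sketch assumes TF is the ratio of the two. The improved IDF and class-frequency terms are not sketched because their formulas are not reproduced in this text. The function name and sample data are illustrative.

```python
from collections import Counter


def term_frequencies(block_pois):
    """TF of each POI type in one block: occurrences divided by all occurrences."""
    counts = Counter(block_pois)
    total = sum(counts.values())
    return {poi_type: n / total for poi_type, n in counts.items()}


# Hypothetical block containing four POIs of three types.
tf = term_frequencies(["shop", "shop", "school", "bank"])
```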

where $v_{SE}(\mathrm{Block})$ is the socioeconomic feature vector $v_{Social\text{-}Economic}(\mathrm{Block})$ of a block, $v(\mathrm{POI\_Category}(i))$ is the vector representation of the $i$-th POI type within the block obtained in step 2.2, and $\mathrm{weight}(\mathrm{POI\_Category}(i))$ is the weight of that POI type obtained by the improved TF-IDF algorithm.
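One plausible reading of this aggregation is a weighted sum of the POI type vectors, sketched below. The exact formula is not reproduced in this text, so the unnormalised weighted sum is an assumption (a weighted mean would be an equally reasonable reading); the function name and sample values are hypothetical.

```python
def socioeconomic_vector(poi_vectors, weights):
    """Weighted sum of POI type vectors for one block (assumed aggregation)."""
    dim = len(next(iter(poi_vectors.values())))
    result = [0.0] * dim
    for poi_type, vec in poi_vectors.items():
        w = weights[poi_type]  # weight from the improved TF-IDF step
        for k in range(dim):
            result[k] += w * vec[k]
    return result


# Toy 2-dimensional example; the embodiment uses 256-dimensional vectors.
v_se = socioeconomic_vector(
    {"shop": [1.0, 0.0], "school": [0.0, 2.0]},
    {"shop": 0.5, "school": 0.25},
)
```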

Step 3: taking the block as a unit, calculate element-level, neighborhood-level and overall spatial distribution structure indexes of the internal surface elements (buildings, roads, green spaces and water bodies) from vector data, and calculate the land-cover remote sensing indexes and deep image semantic features of the aggregated elements in the block from the medium-resolution remote sensing image; the two are combined to form the spatial pattern feature vector of each block.

Step 3.1: taking the block as a unit, and based on the vector data of the internal surface elements (buildings, roads, green spaces and water bodies), calculate indexes based on the element characteristics of the geographic objects themselves, indexes based on the neighborhood characteristics of the surface elements, and indexes based on the overall distribution characteristics of the surface elements in the block.

Combining the information entropy principle, the area, perimeter, equal radius, minimum bounding rectangle direction, compactness, fractal dimension and concavity distribution entropies of the building objects in a block are calculated at the element level, and the density distribution entropy of the building objects is calculated at the neighborhood level. Together these describe the overall aggregation structure of the geographic objects in the block and measure the diversity of building geometry and distribution attributes within it. The calculations are given in formulas (11) to (30):

Area distribution entropy:

Perimeter distribution entropy:

Equal radius distribution entropy:

Compactness distribution entropy:

where $H_5(d)$ is the compactness distribution entropy of the building objects in the block, $S\_Com_i$ is the compactness of the $i$-th building, $S\_Com$ is the total building compactness, and $n$ is the number of building objects. The compactness of a building object is a quadratic relation between its area and perimeter:

Fractal dimension distribution entropy:

Concavity distribution entropy:

where $A_i$ is the area of the $i$-th building and $A_{ch_i}$ is the area of the convex hull of that building object.

Density distribution entropy:

where $A_i$ is the area of the $i$-th building and $A_{voronoi\_polygon_i}$ is the area of the Voronoi polygon corresponding to that building.
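The distribution entropies listed above share one pattern: each building's value is divided by the block total, and an entropy is taken over the resulting shares. The sketch below is a hedged illustration of that pattern. Since the formula images are not reproduced in this text, it assumes the Shannon form with the natural logarithm; the function name and sample areas are hypothetical.

```python
from math import log


def distribution_entropy(values):
    """Entropy of the shares value_i / total (assumed Shannon form, natural log)."""
    total = sum(values)
    shares = [v / total for v in values if v > 0]  # zero shares contribute nothing
    return -sum(p * log(p) for p in shares)


# Hypothetical building areas (square metres) in two blocks.
uniform_block = distribution_entropy([100.0, 100.0, 100.0, 100.0])
skewed_block = distribution_entropy([370.0, 10.0, 10.0, 10.0])
```

Under this reading, a block of similar buildings gives a higher entropy than a block dominated by one large building, which is the diversity signal the text describes.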

The average building height in the block is calculated as follows:

The six land proportion indexes are calculated as follows:

where $ULR_3$ is the third land-use index, $N_{POI\_facilities}$ is the number of urban facility POIs in the block, $A_{block}$ is the area of the block, $A_{i,g}$ is the area of the $i$-th green space object in the block, $A_{i,w}$ is the area of the $i$-th water body object in the block, $A_{i,b}$ is the area of the $i$-th building object in the block, $N_g$ and $N_w$ are the numbers of green space and water body objects in the block, and $N_b$ is the number of building objects in the block.

where $ULR_4$ is the fourth land-use index, $L_{inner\_roads}$ is the length of an internal road object of the block, $N_r$ is the number of internal roads in the block, and $A_{i,b}$ is the area of the $i$-th building object in the block.

Step 3.2: using the minimum bounding rectangle of each block obtained by the segmentation in step 1 as a mask, extract the corresponding surface extent from the medium-resolution remote sensing image and calculate three land-cover remote sensing indexes for each block, as shown in formulas (31) to (33):

where NDBI is the normalized difference built-up index, SAVI is the soil-adjusted vegetation index, SWIR is the short-wave infrared band, NIR is the near-infrared band, Red is the red band, and $L$ is the soil adjustment factor, set to 0.5 in this embodiment.
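The three indexes can be sketched per pixel as below. Formulas (31) to (33) are not reproduced in this text, so the sketch uses the standard published definitions of NDVI, NDBI and SAVI, which is an assumption about what the formulas contain; the function names and reflectance values are illustrative.

```python
def ndvi(nir, red):
    """Normalized difference vegetation index."""
    return (nir - red) / (nir + red)


def ndbi(swir, nir):
    """Normalized difference built-up index."""
    return (swir - nir) / (swir + nir)


def savi(nir, red, soil_factor=0.5):
    """Soil-adjusted vegetation index; soil_factor is L (0.5 in this embodiment)."""
    return (nir - red) * (1 + soil_factor) / (nir + red + soil_factor)


# Hypothetical surface reflectances for one vegetated pixel.
pixel = {"nir": 0.5, "red": 0.1, "swir": 0.3}
indexes = (ndvi(pixel["nir"], pixel["red"]),
           ndbi(pixel["swir"], pixel["nir"]),
           savi(pixel["nir"], pixel["red"]))
```

A block-level value would then be an aggregate, for example the mean, of the per-pixel values inside the block mask; the text does not state which aggregate is used.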

Step 3.3: divide the remote sensing image of the study area into input images of 512 × 512 pixels, and learn image features with an existing deep convolutional autoencoder model, MegNet.

The autoencoder consists of an encoder and a decoder. The input image also serves as the label image, and the model is optimised iteratively with a loss function that measures the similarity between the decoded output and the original input. The trained MegNet autoencoder therefore has good image feature representation ability and can be used as a feature extractor to obtain the deep image semantic features of each block in the study area; these features have 256 dimensions (the channel count of the encoder's last convolutional layer).

Step 3.4: apply Z-Score normalization to the element-level, neighborhood-level and overall spatial distribution structure indexes calculated in step 3.1 and the land-cover remote sensing indexes calculated in step 3.2.

To prevent the differing scales of the indexes from affecting subsequent analysis, Z-Score normalization is applied to the calculated indexes, which ensures their invariance to translation, scaling and rotation; the mean and standard deviation of each index are computed over all instance objects in the MegNet autoencoder training set. Because the convolutional network model outputs prediction probabilities whose values lie in [0, 1], the deep image semantic features extracted by the MegNet encoder do not need to be normalized.
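The normalization can be sketched as below, with the statistics fitted on training data only and then reused for other blocks. The population standard deviation is an assumption, since the text does not say whether the population or sample form is used; the function names and sample values are hypothetical.

```python
from statistics import mean, pstdev


def fit_zscore(train_values):
    """Mean and (population) standard deviation of one index on the training set."""
    return mean(train_values), pstdev(train_values)


def apply_zscore(values, mu, sigma):
    """Standardise values with statistics fitted on the training set."""
    return [(v - mu) / sigma for v in values]


# Hypothetical values of one index over the training blocks.
mu, sigma = fit_zscore([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
normalised = apply_zscore([9.0, 5.0], mu, sigma)
```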

Step 4: based on the socioeconomic feature vector obtained in step 2 and the spatial pattern feature vector obtained in step 3, construct a convolutional neural network to automatically identify and classify the spatial semantics of urban functional areas.

The POI semantic spatial distribution feature vector obtained in step 2 and the element-level, neighborhood-level and overall spatial pattern feature vectors, land-cover remote sensing indexes and deep image semantic features of the geographic objects in each block obtained from vector and image data in steps 3.1 to 3.3 are aggregated with the concat function of pytorch, giving each block a 530-dimensional feature vector description (256 + 8 + 7 + 3 + 256 = 530).
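The dimension bookkeeping can be checked with a small sketch. Plain list concatenation stands in here for the pytorch concat function, and the argument names are one reading of the 256 + 8 + 7 + 3 + 256 breakdown (eight entropy indexes, average height plus six land proportion indexes, three remote sensing indexes); that grouping is an assumption, not something the text states explicitly.

```python
EXPECTED_DIM = 530


def block_feature_vector(socioeconomic, entropy_idx, height_land_idx,
                         rs_idx, image_feats):
    """Concatenate the five feature groups of one block into a single vector."""
    vector = [*socioeconomic, *entropy_idx, *height_land_idx, *rs_idx, *image_feats]
    if len(vector) != EXPECTED_DIM:
        raise ValueError(f"expected {EXPECTED_DIM} dimensions, got {len(vector)}")
    return vector


# Placeholder values with the group sizes given in the text.
features = block_feature_vector([0.0] * 256, [0.0] * 8, [0.0] * 7,
                                [0.0] * 3, [0.0] * 256)
```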

The constructed convolutional neural network comprises a functional area category feature extraction module and a functional area category feature classification module; the structure of the feature extraction module is shown in Figure 2. The feature extraction module obtains high-dimensional functional area category features by combining convolution and pooling layers, and comprises four convolution layers (Conv layer1, Conv layer2, Conv layer3 and Conv layer4) and three pooling layers (Pooling layer1, Pooling layer2 and Pooling layer3). Each convolution layer is followed by an activation layer using the ReLU() activation function of pytorch, and the pooling layers use the MaxPool2d() function of pytorch. The feature classification module processes the high-dimensional features from the feature extraction module to predict the functional area category, and consists of a flattening layer, two fully connected layers (FC layer1 and FC layer2) and a classification output layer. The flattening layer uses the flatten() function of pytorch, the fully connected layers use the Linear() function, and the output layer uses the log_softmax() function; the output dimension is the total number of labelled functional area categories.
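To make the role of the output layer concrete, the sketch below reimplements log-softmax in plain Python and picks the most likely category. It illustrates only what the log_softmax() output represents and is not the network itself; the layer sizes, kernel sizes and category list are not given in the text, so the example logits are hypothetical.

```python
from math import exp, log


def log_softmax(logits):
    """Numerically stable log-softmax over one vector of class scores."""
    m = max(logits)
    log_sum = m + log(sum(exp(x - m) for x in logits))
    return [x - log_sum for x in logits]


def predict_class(logits):
    """Index of the category with the highest log-probability."""
    scores = log_softmax(logits)
    return max(range(len(scores)), key=scores.__getitem__)


# Hypothetical scores from the last fully connected layer for three categories.
log_probs = log_softmax([0.1, 2.5, -1.0])
predicted = predict_class([0.1, 2.5, -1.0])
```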

The block data set is divided into a training set, a validation set and a test set in a 6:2:2 ratio. The network input is the 530-dimensional feature vector description of a block, and the output is the urban functional area category of the block. The network optimizer is the Adam algorithm, with the learning rate and decay rate set to 3×10^-5 and 0.95 respectively. The main framework is an encoder, fully connected layer and Softmax classifier; the encoder parameters are used as initial values, and supervised learning of the parameters achieves automatic identification and classification of urban functional areas.
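The 6:2:2 split can be sketched as follows. The text does not say whether the split is random or stratified by category, so the seeded random shuffle is an assumption; the function name and block identifiers are hypothetical.

```python
import random


def split_blocks(block_ids, seed=0):
    """Shuffle block identifiers and split them 6:2:2 into train/val/test."""
    ids = list(block_ids)
    random.Random(seed).shuffle(ids)  # seeded so the split is reproducible
    n_train = len(ids) * 6 // 10
    n_val = len(ids) * 2 // 10
    return ids[:n_train], ids[n_train:n_train + n_val], ids[n_train + n_val:]


train_ids, val_ids, test_ids = split_blocks(range(100))
```

Integer arithmetic is used for the subset sizes so that any remainder goes to the test set instead of being lost to floating-point rounding.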

Example two

Based on the same inventive concept, the invention also provides an urban functional area identification system based on geographic object spatial pattern features, comprising a processor and a memory, wherein the memory stores program instructions and the processor invokes the program instructions stored in the memory to execute the urban functional area identification method based on geographic object spatial pattern features described above.

In particular, those skilled in the art may implement the method of the technical solution of the present invention as an automated process using computer software technology. System apparatus implementing the method, such as a computer-readable storage medium storing the corresponding computer program, or a computer device that runs the corresponding computer program, also falls within the protection scope of the present invention.

The specific embodiments described herein are offered by way of example only to illustrate the spirit of the invention. Those skilled in the art may make various modifications or additions to the described embodiments or substitutions thereof without departing from the spirit of the invention or exceeding the scope of the invention as defined in the accompanying claims.

Claims

1. A city function area identification method based on geographic object space mode features is characterized by comprising the following steps:

2. The urban function area recognition method based on the geographic object space pattern features according to claim 1, wherein: the step 1 comprises the following steps:

the preprocessing operation comprises the following steps: (1) data space reference conversion; (2) topology inspection is carried out on the geographic element data; (3) carrying out semantic attribute supplementation on the data with the POI missing key attributes;

3. The urban function area recognition method based on the geographic object space pattern features according to claim 1, wherein: the specific operation involved in step 2.1 is as follows: (1) calculate the Euclidean distance between all POI point pairs in the block, and select the pair with the greatest distance as the two endpoints of the word sequence in the corpus; (2) taking the block as a unit, abstract the POIs in it as graph nodes, construct an unweighted, undirected Voronoi graph, compute the shortest path between the two endpoints with the Dijkstra algorithm, and record the POI words in path order to construct a "block document" containing POI spatial context information; (3) following the same document construction process, construct point pairs based on block centers and calculate the Euclidean distances between block point pairs in the study area to obtain the block-level document; the documents from (2) and (3) together form the Block-POI spatial semantic corpus.

4. The urban function area recognition method based on the geographic object space pattern features according to claim 1, wherein: in step 2.2 the CBOW model of the Word2Vec neural network is trained to obtain the probability vector of each POI type in the block, and its objective function $L$ and probability distribution $p(w_t \mid \mathrm{Context}(w_t))$ are calculated as follows:

$\mathrm{Context}(w_t) = w_{t-c}, \ldots, w_{t-1}, w_{t+1}, \ldots, w_{t+c}$ (1)

where $w_t$ is the current word, $c$ is the context window size, $\mathrm{Context}(w_t)$ is the context input vector of the current word $w_t$, $T$ is the corpus size, $w_i$ is an arbitrary word in the corpus, and $E(w_t, \mathrm{Context}(w_t))$ is the energy function of the Word2Vec model, calculated as follows:

$E(w_i, w_j) = -(v(w_i) \cdot v(w_j))$ (3)

where $E(w_i, w_j)$ is computed from the inner product of the vectors of the word $w_i$ and its context word $w_j$, and $v(w_i)$ and $v(w_j)$ are the word vectors.

5. The urban function area recognition method based on the geographic object space pattern features according to claim 1, wherein: in step 2.3, firstly, the TF value of the semantic function type of the POI is determined, and the specific calculation mode is as follows:

where $TF_{i,j}$ is the TF value of the $i$-th POI type in the $j$-th block, $n_{i,j}$ is the number of times the $i$-th POI type appears in the $j$-th block, and $\sum_k n_{k,j}$ is the total number of occurrences of all POI types in the $j$-th block;

where $\mathrm{IDF}_i$ is the IDF value of the $i$-th POI type, $N$ is the total number of blocks, $N_m$ is the number of blocks of class $m$, $d_{im}$ is the number of class-$m$ blocks that contain the $i$-th POI type, and $d_{ic,\,c \neq m}$ is the number of blocks of the other classes that contain the $i$-th POI type, calculated as follows:

where $w_i$ is the $i$-th POI type, $d_j$ is the $j$-th block, $c_m$ is the $m$-th block class, and $\langle w_i \in d_j \rangle \cap \langle d_j \in c_m \rangle$ is the counting condition;

to address uneven class distribution, the class frequency parameter $CF_i$ is combined with the parameters $TF_{i,j}$ and $\mathrm{IDF}_i$ calculated above, and the improved TF-IDF algorithm is shown in formula (8):

where $\mathrm{TF\text{-}IDF}_i$ is the weight of the $i$-th POI type, $d_{im}$ is the number of class-$m$ blocks in the dataset that contain the $i$-th POI type, $N_m$ is the number of blocks of class $m$, $N$ is the total number of blocks, and $d_{ic,\,c \neq m}$ is the number of blocks of the other classes that contain the $i$-th POI type;

based on the POI semantic type feature vectors and type weights obtained in steps 2.2 and 2.3, the socioeconomic feature of each block is calculated as follows:

where $v_{SE}(\mathrm{Block})$ is the socioeconomic feature vector $v_{Social\text{-}Economic}(\mathrm{Block})$ of a block, $v(\mathrm{POI\_Category}(i))$ is the vector representation of the $i$-th POI type within the block obtained in step 2.2, and $\mathrm{weight}(\mathrm{POI\_Category}(i))$ is the weight of that POI type obtained by the improved TF-IDF algorithm.

6. The urban function area recognition method based on the geographic object space pattern features according to claim 1, wherein: the step 3 comprises the following steps:

7. The urban function area recognition method based on the geographic object space pattern features according to claim 6, wherein: in step 3.1, combining the information entropy principle, the area, perimeter, equal radius, minimum bounding rectangle direction, compactness, fractal dimension and concavity distribution entropies of the building objects in a block are calculated at the element level, and the density distribution entropy of the building objects is calculated at the neighborhood level, so as to measure the diversity of building geometry and distribution attributes in the block; the calculations are given in formulas (11) to (30):

Area distribution entropy:

where $H_1(d)$ is the area distribution entropy of the building objects in the block, $A_i$ is the area of the $i$-th building, $S$ is the total building area, and $n$ is the number of building objects;

perimeter distribution entropy:

where $H_2(d)$ is the perimeter distribution entropy of the building objects in the block, $P_i$ is the perimeter of the $i$-th building, $L$ is the total building perimeter, and $n$ is the number of building objects;

equal radius distribution entropy:

where $d_j$ is the distance from boundary point $j$ to the center point and $M$ is the number of boundary points;

where $H_4(d)$ is the direction distribution entropy of the building instance objects in the building group, $S_i$ is the number of buildings in the $i$-th direction, $N$ is the total number of buildings, and $N_4$ is the number of main direction categories;

compactness distribution entropy:

where $H_5(d)$ is the compactness distribution entropy of the building objects in the block, $S\_Com_i$ is the compactness of the $i$-th building, $S\_Com$ is the total building compactness, and $n$ is the number of building objects; the compactness of a building object is a quadratic relation between its area and perimeter:

where $A_i$ is the area of the $i$-th building and $P_i$ is the perimeter of the $i$-th building;

fractal dimension distribution entropy:

where $H_6(d)$ is the fractal dimension distribution entropy of the building objects in the block, $S\_Frac_i$ is the fractal dimension of the $i$-th building, $S\_Frac$ is the total building fractal dimension, and $n$ is the number of building objects; the fractal dimension of a building object is a logarithmic relation between its area and perimeter:

concavity distribution entropy:

where $H_7(d)$ is the concavity distribution entropy of the building objects in the block, $S\_Con_i$ is the concavity of the $i$-th building, $S\_Con$ is the total building concavity, and $n$ is the number of building objects; the concavity of a building object is the area ratio of the object to its convex hull:

where $A_i$ is the area of the $i$-th building and $A_{ch_i}$ is the area of the convex hull of that building object;

density distribution entropy:

where $H_8(d)$ is the density distribution entropy of the building objects in the block, $S\_den_i$ is the density of the $i$-th building, $S\_den$ is the total building density, and $n$ is the number of building objects; the density of a building object is the ratio of its area to the area of its corresponding Voronoi polygon:

where $A_i$ is the area of the $i$-th building and $A_{voronoi\_polygon_i}$ is the area of the Voronoi polygon corresponding to that building;

the average building height in the block is calculated as follows:

where $h_{i,b}$ is the height of the $i$-th building in the block and $N_b$ is the number of building objects in the block;

the six land proportion indexes are calculated as follows:

where $ULR_1$ is the first land-use index, $A_{i,g}$ is the area of the $i$-th green space object in the block, $A_{i,w}$ is the area of the $i$-th water body object in the block, $A_{block}$ is the area of the block, and $N_g$ and $N_w$ are the numbers of green space and water body objects in the block;

where $ULR_2$ is the second land-use index, $N_{POI\_facilities}$ is the number of urban facility POIs in the block, and $\sum_{i=1}^{N_b} A_{i,b}$ is the total area of all building objects in the block;

where $ULR_3$ is the third land-use index, $N_{POI\_facilities}$ is the number of urban facility POIs in the block, $A_{block}$ is the area of the block, $A_{i,g}$ is the area of the $i$-th green space object in the block, $A_{i,w}$ is the area of the $i$-th water body object in the block, $A_{i,b}$ is the area of the $i$-th building object in the block, $N_g$ and $N_w$ are the numbers of green space and water body objects in the block, and $N_b$ is the number of building objects in the block;

where $ULR_4$ is the fourth land-use index, $L_{inner\_roads}$ is the length of an internal road object of the block, $N_r$ is the number of internal roads in the block, and $A_{i,b}$ is the area of the $i$-th building object in the block;

where $ULR_5$ is the fifth land-use index, $L_{inner\_roads}$ is the length of an internal road object of the block, $N_r$ is the number of internal roads in the block, $A_{block}$ is the area of the block, $A_{i,g}$ is the area of the $i$-th green space object in the block, $A_{i,w}$ is the area of the $i$-th water body object in the block, $A_{i,b}$ is the area of the $i$-th building object in the block, $N_g$ and $N_w$ are the numbers of green space and water body objects in the block, and $N_b$ is the number of building objects in the block;

8. The urban function area recognition method based on the geographic object space pattern features according to claim 6, wherein: in the step 3.2, three calculation modes of the surface coverage remote sensing application indexes are shown in formulas (31) to (33):

where NDVI is the normalized difference vegetation index, NIR is the near-infrared band, and Red is the red band;

where NDBI is the normalized difference built-up index, SAVI is the soil-adjusted vegetation index, SWIR is the short-wave infrared band, NIR is the near-infrared band, Red is the red band, and $L$ is the soil adjustment factor;

9. The urban function area recognition method based on the geographic object space pattern features according to claim 1, wherein: the convolutional neural network constructed in step 4 comprises a functional area category feature extraction module and a functional area category feature classification module; the feature extraction module obtains high-dimensional functional area category features by combining convolution and pooling layers, and comprises four convolution layers and three pooling layers; each convolution layer is followed by an activation layer using the ReLU() activation function of pytorch, and the pooling layers use the MaxPool2d() function of pytorch; the feature classification module processes the high-dimensional features from the feature extraction module to predict the functional area category, and consists of a flattening layer, two fully connected layers and a classification output layer; the flattening layer uses the flatten() function of pytorch, the fully connected layers use the Linear() function, and the output layer uses the log_softmax() function, the output dimension being the total number of labelled functional area categories; the POI semantic spatial distribution feature vector of each block obtained in step 2 and the element-level, neighborhood-level and overall spatial pattern feature vectors, land-cover remote sensing indexes and deep image semantic features of the geographic objects in each block obtained from vector and image data in steps 3.1 to 3.3 are aggregated, giving each block a 530-dimensional feature vector description; the block data set is divided into a training set, a validation set and a test set; the network input is the 530-dimensional feature vector description of a block and the output is the urban functional area category of the block; the network optimizer is the Adam algorithm, with the learning rate and decay rate set to 3e-5 and 0.95 respectively; the main framework is an encoder, fully connected layer and Softmax classifier, the encoder parameters are used as initial values, and supervised learning of the parameters achieves automatic identification and classification of urban functional areas.

10. A geographic object space pattern feature based urban functional area identification system as claimed in claim 1 wherein: comprising a processor and a memory for storing program instructions, the processor being adapted to invoke the program instructions in the memory to perform a method for urban functional area identification based on geographic object space pattern features according to any of claims 1-9.