CN111651502A

CN111651502A - An urban functional area identification method based on multi-subspace model

Info

Publication number: CN111651502A
Application number: CN202010484901.8A
Authority: CN
Inventors: 朱佳玮; 陶超; 李海峰; 肖俊
Original assignee: Central South University
Current assignee: Central South University
Priority date: 2020-06-01
Filing date: 2020-06-01
Publication date: 2020-09-11
Anticipated expiration: 2040-06-01
Also published as: CN111651502B

Abstract

The invention discloses a method for identifying urban functional areas based on a multi-subspace model, comprising the following steps: acquiring taxi trajectory data and check-in data in a research area; constructing a time series feature matrix C over the partitioned geographic units based on visit purpose; inputting the time series feature matrix C to a sparse subspace clustering algorithm to compute the correspondence between geographic units and urban functional areas; and obtaining the salient feature locations of each functional area in order to identify its main function. The method uses the human activity information provided by geographic big data and, by relying on a multi-subspace model, overcomes defects of the prior art and identifies urban functional areas more accurately. It also analyzes the uniqueness and abundance of each functional area from the geometric properties of the subspaces, providing fine-grained quantitative indicators for the management and development of urban functional areas.

Description

An urban functional area identification method based on multi-subspace model

Technical Field

The invention belongs to the technical field of geospatial information identification, relates to methods for identifying urban geographic information, and in particular relates to a method for identifying urban functional areas based on a multi-subspace model.

Background

Urban spatial structure is a core research topic of urban geoinformatics and a concentrated reflection of the human-land relationship: urban space is shaped by human activity and in turn influences human production and activity, at scales ranging from urban planning and site selection down to travel and place recommendation. In the analysis of urban spatial structure, the distribution of urban functional areas is the result, expressed in geographic space, of many interacting factors.

There are many ways to analyze urban functional areas, such as social surveys, but collecting the data is time-consuming and laborious, the analysis can be strongly affected by subjective factors, and, most importantly, such methods cannot directly reflect the key factor of urban development: human activity. With the rapid development of mobile communication, the Internet and satellite positioning, mobile devices with positioning capability generate a series of electronic footprints. These footprints are real records of urban residents' activities and make it possible to explore urban functional areas from the perspective of human activity. Existing methods use social media check-in data, mobile phone data and taxi trajectory data to detect urban functional areas.

The models used to analyze such data, however, are still imperfect. The general procedure is as follows. First, when processing geospatial big data, the time series features of human activity are mapped onto manually delineated geographic units, so that each geographic unit can be expressed as a vector and the information is stored in a high-dimensional vector space. Then a feature representation of the geographic units is obtained with methods such as singular value decomposition, latent semantic analysis or latent Dirichlet allocation. Finally, the geographic units are clustered by the similarity of their feature representations; each cluster represents one functional area, which yields the distribution of urban functional areas. These models have the following shortcomings.

First, during feature representation some algorithms make strict assumptions about the features, for example that the samples share a single set of features or follow the same distribution. Because feature representation reduces the samples from a high-dimensional space to one low-dimensional subspace, these algorithms can be called single-subspace algorithms. Their strict assumptions make feature patterns easy to obtain, and clustering by the relationship between samples and features gives the distribution of functional areas. However, sample information that carries little weight is marginalized after feature representation, which makes the clustering inaccurate. Moreover, functional areas differ in their features, so a single set of features cannot describe every functional area concisely and accurately. When the data are large and the feature patterns complex, the assumptions of a single-subspace model limit feature mining, so these models cannot handle more complex data.

Second, these models ignore the geometric meaning of the vector space. The geometric properties of the subspaces are related to the characteristics of urban functional areas, and the prior art has not explored or considered this relationship.

Summary of the Invention

In view of this, the purpose of the invention is to provide a method for identifying urban functional areas based on a multi-subspace model. The invention uses the human activity information provided by geographic big data and, with a model based on multiple subspaces, overcomes the defects of the prior art and identifies urban functional areas more accurately.

This purpose is achieved by a method for identifying urban functional areas based on a multi-subspace model, comprising the following steps:

Step 1: obtain the taxi trajectory data and check-in data in the research area;

Step 2: construct a time series feature matrix C over the partitioned geographic units, based on visit purpose;

Step 3: input the time series feature matrix C to the sparse subspace clustering algorithm, and compute the correspondence between geographic units and urban functional areas;

Step 4: obtain the salient feature locations of each functional area, and then identify the main function of each functional area.

Specifically, the construction of the time series feature matrix C in Step 2 includes the following steps:

Step 201: divide the research area to obtain N geographic units;

Step 202: preprocess the taxi trajectory data, remove abnormal points, extract the end point and arrival time of each trip, and map each end point to a geographic unit to obtain the visit records of the geographic units;

Step 203: match the check-in data records with the visit records of the geographic units, and classify the purpose of each visit;

Step 204: construct a time series feature matrix C with M rows and N columns, which represents the dynamics of human activity carried by the geographic units over a period of time, where $M = T \times D$, T is the number of time periods and D is the number of visit-purpose categories. Each column of C gives, for one geographic unit, the number of people visiting that unit for each purpose in each time period.
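The counting in Step 204 can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function name `build_feature_matrix`, the tuple layout of a visit record, and the row ordering (`row = purpose * T + slot`) are all assumptions, since the text only fixes the shape $M = T \times D$ by N.

```python
from collections import Counter


def build_feature_matrix(visits, n_units, n_slots, n_purposes):
    """Return C as a list of M rows and N columns, with M = n_slots * n_purposes.

    visits: iterable of (unit_index, slot_index, purpose_index) tuples,
            one tuple per classified visit (hypothetical record layout).
    Row layout (an assumption): row = purpose_index * n_slots + slot_index.
    """
    counts = Counter(visits)
    n_rows = n_slots * n_purposes
    C = [[0] * n_units for _ in range(n_rows)]
    for (unit, slot, purpose), n in counts.items():
        C[purpose * n_slots + slot][unit] += n
    return C


# Four toy visits to three geographic units, 24 hourly slots, 6 purposes.
visits = [(0, 8, 2), (0, 8, 2), (1, 19, 3), (2, 23, 0)]
C = build_feature_matrix(visits, n_units=3, n_slots=24, n_purposes=6)
```

With the purpose-major row layout assumed here, one column of C can later be reshaped directly into a D-row, T-column activity pattern.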

Specifically, the sparse subspace clustering algorithm in Step 3 includes the following steps:

Step 301: solve for the coefficient matrix Z, of size $N \times N$, which must minimize the $\ell_1$ norm subject to the following constraints:

$$\min_{Z} \|Z\|_1 \quad \text{s.t.} \quad CZ = C,\ Z_{ii} = 0$$

where $\|\cdot\|_1$ denotes the $\ell_1$ norm. Minimizing the $\ell_1$ norm makes the coefficient matrix Z sparse, which forces the time series feature of each geographic unit to be represented only by a linear combination of the time series features of other geographic units in the same subspace;

Step 302: use the coefficient matrix to build the similarity matrix of the data, $W = |Z| + |Z|^{T}$, of size $N \times N$. Each entry of W is the similarity in time series features between the geographic units at the corresponding indices;

The similarity matrix W is block diagonal: only the blocks on the main diagonal are non-zero, and all other blocks are zero matrices. Each non-zero block corresponds to one subspace. A subspace contains geographic units whose time series features are very similar, while geographic units in different subspaces differ strongly in their time series features, so the subspaces are the urban functional areas to be detected;

Step 303: compute the number of subspaces from the normalized Laplacian matrix L of the similarity matrix W, $L = I - D^{-1/2} W D^{-1/2}$, where I is the identity matrix and D here denotes the diagonal degree matrix with entries $D_{jj} = \sum_i W_{ij}$ (not the number of visit-purpose categories). Sort the eigenvalues of L in ascending order and compute the difference $\lambda_{k+1} - \lambda_k$ of each pair of adjacent eigenvalues. The k with the largest difference is the number of subspaces, that is, the number of urban functional areas to be detected;

Step 304: apply K-means clustering to the similarity matrix W with the number of clusters set to the k obtained in Step 303. This gives the correspondence between geographic units and k classes, that is, k urban functional areas, and completes the detection of urban functional areas.
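Steps 302 and 303 can be sketched with NumPy as below. The function names are hypothetical, the guard against zero-degree units is an added assumption, and the K-means call of Step 304 is left out; the sketch only shows how W, the normalized Laplacian and the eigengap estimate of k fit together.

```python
import numpy as np


def similarity_matrix(Z):
    """Step 302: W = |Z| + |Z|^T."""
    A = np.abs(Z)
    return A + A.T


def normalized_laplacian(W):
    """Step 303: L = I - D^{-1/2} W D^{-1/2}, with D_jj = sum_i W_ij."""
    d = W.sum(axis=0).astype(float)
    d_inv_sqrt = np.zeros_like(d)
    nonzero = d > 0  # isolated units keep a zero scaling factor (assumption)
    d_inv_sqrt[nonzero] = 1.0 / np.sqrt(d[nonzero])
    return np.eye(W.shape[0]) - (d_inv_sqrt[:, None] * W) * d_inv_sqrt[None, :]


def estimate_num_subspaces(W):
    """Return the k that maximizes lambda_{k+1} - lambda_k (eigenvalues ascending)."""
    eigvals = np.sort(np.linalg.eigvalsh(normalized_laplacian(W)))
    gaps = np.diff(eigvals)          # gaps[j] = lambda_{j+2} - lambda_{j+1}, 1-indexed
    return int(np.argmax(gaps)) + 1  # convert the 0-based gap index to k
```

For a perfectly block-diagonal W, L has one zero eigenvalue per block, so the largest gap falls right after the zero eigenvalues and k equals the number of blocks.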

Specifically, obtaining the salient feature locations of each functional area in Step 4 includes: using the correspondence from Step 304, extract from the similarity matrix W generated in Step 302 the subspace matrices $S_1, \ldots, S_i, \ldots, S_k$ corresponding to the urban functional areas, and perform principal component analysis on each. The resulting eigenvectors $[e_1, e_2, \ldots, e_p, \ldots, e_M]_i$ are called the feature locations of $S_i$. The first r eigenvectors $[e_1, e_2, \ldots, e_r]_i$, whose cumulative share of the eigenvalues exceeds 90%, are the salient feature locations of $S_i$.

Specifically, identifying the main function of each functional area includes: reshape each salient feature location of each functional area into a matrix with D rows and T columns, in which each row shows how the activity level for one of the D purposes changes over the T time periods at that feature location. This gives the main activity pattern of the functional area. The functional area is then labeled with the most active function in its main activity pattern, which completes the identification of urban functional areas.
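The selection of salient feature locations and the reshaping into a D-by-T pattern might look like the sketch below. Several choices are assumptions rather than statements of the patent: the subspace matrix is taken to be the M-by-n matrix of feature vectors of the units assigned to one functional area, PCA is done on the mean-centred covariance, rows are ordered purpose-major, and "most active" is read as the largest summed absolute loading. All function names are hypothetical.

```python
import numpy as np


def salient_components(S, threshold=0.90):
    """Return the leading eigenvectors whose cumulative eigenvalue share exceeds threshold.

    S: M x n matrix; each column is the feature vector of one geographic unit.
    """
    X = S - S.mean(axis=1, keepdims=True)          # centre every feature (row)
    cov = X @ X.T / max(S.shape[1] - 1, 1)
    eigvals, eigvecs = np.linalg.eigh(cov)         # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]
    eigvals = np.clip(eigvals[order], 0.0, None)   # remove tiny negative round-off
    eigvecs = eigvecs[:, order]
    share = np.cumsum(eigvals) / eigvals.sum()
    r = min(int(np.searchsorted(share, threshold, side="right")) + 1, len(eigvals))
    return eigvecs[:, :r], eigvals[:r]


def dominant_purpose(component, n_purposes, n_slots, labels):
    """Reshape one component to D x T and return the label of the most active row."""
    pattern = np.abs(component).reshape(n_purposes, n_slots)  # sign of e_p is arbitrary
    return labels[int(np.argmax(pattern.sum(axis=1)))], pattern
```

Taking absolute values before comparing rows is one way to handle the arbitrary sign of an eigenvector; another reading would compare signed loadings.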

Further, the method for identifying urban functional areas also includes the following steps:

Step 5: calculate the similarity between functional areas;

Step 6: calculate the uniqueness of each functional area, and rank the functional areas by uniqueness.

Specifically, the similarity of functional areas is calculated from the principal angles between the corresponding subspaces. For any two functional areas with corresponding subspaces $S_k$ and $S_l$, the similarity $\mathrm{aff}(S_k, S_l)$ is given by the following formula:

[formula for $\mathrm{aff}(S_k, S_l)$ not reproduced in the source text]

where $\cos\theta_{kl}^{(i)}$ is the i-th largest singular value of $(U^{k})^{T} U^{l}$, $U^{k}$ and $U^{l}$ are orthonormal bases of $S_k$ and $S_l$ respectively, $\theta_{kl}^{(i)}$ is the i-th principal angle between the subspaces, and $d_k \wedge d_l$ denotes the smaller of the dimensions $d_k$ and $d_l$ of $S_k$ and $S_l$;
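The quantities named above can be computed as in the following sketch. Because the closed form of the similarity is not reproduced in the text, the `affinity` function assumes a form common in the subspace clustering literature, the root of the summed squared cosines of the principal angles, optionally divided by $\sqrt{d_k \wedge d_l}$. That form, the function names and the rank tolerance are assumptions.

```python
import numpy as np


def orthonormal_basis(S, tol=1e-10):
    """Orthonormal basis of the column space of S, taken from its left singular vectors."""
    U, s, _ = np.linalg.svd(S, full_matrices=False)
    rank = int(np.sum(s > tol * s.max()))
    return U[:, :rank]


def principal_cosines(Uk, Ul):
    """Cosines of the principal angles: singular values of Uk^T Ul, in descending order."""
    s = np.linalg.svd(Uk.T @ Ul, compute_uv=False)
    return np.clip(s, 0.0, 1.0)


def affinity(Uk, Ul, normalize=True):
    """Assumed form: sqrt(sum_i cos^2 theta_i), optionally / sqrt(min(d_k, d_l))."""
    cos = principal_cosines(Uk, Ul)
    value = np.sqrt(np.sum(cos ** 2))
    if normalize:
        value /= np.sqrt(min(Uk.shape[1], Ul.shape[1]))
    return float(value)
```

With the normalized variant, identical subspaces score 1 and mutually orthogonal subspaces score 0, which makes values comparable across subspaces of different dimension.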

Specifically, the uniqueness of a functional area is inversely related to its similarity: if the similarity between subspaces is high, the functions of the corresponding functional areas are very similar and the uniqueness of those functional areas is low. The uniqueness of each functional area $S_i$ is calculated as follows:

[formula for the uniqueness of $S_i$ not reproduced in the source text]

where k is the total number of functional areas and $S_{-i}$ denotes the functional areas other than $S_i$.

Step 7: calculate the abundance of each functional area, and rank the functional areas by abundance.

Specifically, the abundance of a functional area is related to the reconstruction error of its salient feature locations, and is calculated as follows:

[formula for the abundance of $S_i$ not reproduced in the source text]

where $C(S_i)$ is the matrix formed by the original vectors belonging to subspace $S_i$, the reconstructed matrix is built from the salient feature locations of $S_i$, and $\|\cdot\|_F$ denotes the Frobenius norm of a matrix.
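A reconstruction error of this kind can be sketched as below. The exact abundance formula is not reproduced in the text, so the sketch makes two assumptions: the reconstruction is the projection of $C(S_i)$ onto its r leading left singular vectors (standing in for the salient feature locations), and the error may optionally be divided by $\|C(S_i)\|_F$. The function name and the `relative` flag are hypothetical.

```python
import numpy as np


def reconstruction_error(C_i, r, relative=True):
    """Frobenius-norm error of a rank-r reconstruction of C_i.

    C_i: M x n matrix of the original feature vectors of one functional area.
    r:   number of leading components kept (the salient ones).
    """
    U, s, Vt = np.linalg.svd(C_i, full_matrices=False)
    C_hat = (U[:, :r] * s[:r]) @ Vt[:r, :]          # rank-r reconstruction
    err = np.linalg.norm(C_i - C_hat, ord="fro")
    if relative:
        err /= np.linalg.norm(C_i, ord="fro")       # assumed normalization
    return float(err)
```

A larger value means the kept components explain less of the area's activity, which matches the reading that more feature locations are needed to describe it.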

The method proposes a model based on multiple subspaces, which holds that urban functional areas have multiple sets of features. When the spatiotemporal activity information of the geographic units is expressed as vectors, the vector samples lie in a high-dimensional space formed by a union of subspaces. Geographic units in the same subspace carry similar human activity dynamics and can be clustered into one functional area. Urban functional areas are thus identified by finding the subspaces, and the uniqueness and abundance of each functional area are analyzed from the geometric properties of the subspaces, providing fine-grained quantitative indicators for the management and development of urban functional areas.

Description of Drawings

Fig. 1 is a schematic flowchart of the method of the invention;

Fig. 2 is a schematic flowchart of an embodiment of the method;

Fig. 3 is the similarity matrix obtained with the sparse subspace clustering method in the embodiment;

Fig. 4 shows the urban functional areas detected in the embodiment;

Fig. 5 shows the functional activity levels of the salient feature locations of each functional area in the embodiment;

Fig. 6 shows the functional area similarity calculated in the embodiment;

Fig. 7 shows the functional area uniqueness and abundance calculated in the embodiment.

Detailed Description

The invention is further described below with reference to the embodiment and the accompanying drawings, without limiting the invention in any way. Any transformation or substitution based on the teaching of the invention falls within its scope of protection.

As shown in Fig. 1, a method for identifying urban functional areas based on a multi-subspace model includes Steps 1 to 4 as set out in the Summary of the Invention, with the sparse subspace clustering of Step 3 again solving for Z under the constraints $CZ = C$, $Z_{ii} = 0$.

Specifically, obtaining the salient feature locations of each functional area in Step 4 includes: using the correspondence from Step 304, extract from the similarity matrix W generated in Step 302 the subspace matrices $S_1, \ldots, S_i, \ldots, S_k$ corresponding to the urban functional areas, and perform principal component analysis on each. The resulting eigenvectors $[e_1, e_2, \ldots, e_p, \ldots, e_M]_i$ are called the feature locations of $S_i$. The first r eigenvectors $[e_1, e_2, \ldots, e_r]_i$, whose cumulative share of the eigenvalues exceeds 90%, are the salient feature locations of $S_i$.

Specifically, identifying the main function of each functional area includes: reshape each salient feature location of each functional area into a matrix with D rows and T columns, in which each row shows how the activity level for one of the D purposes changes over the T time periods at that feature location. This gives the main activity pattern of the functional area, and the most active function in the main activity pattern is taken as the main function of the area.

Specifically, the similarity of functional areas is calculated from the principal angles between the subspaces. The similarity of any two functional areas $S_k$ and $S_l$ is given by the following formula:

[formula for the similarity of $S_k$ and $S_l$ not reproduced in the source text]

where $\cos\theta_{kl}^{(i)}$ is the i-th largest singular value of $(U^{k})^{T} U^{l}$, $U^{k}$ and $U^{l}$ are orthonormal bases of $S_k$ and $S_l$ respectively, $\theta_{kl}^{(i)}$ is the i-th principal angle between the subspaces, and $d_k \wedge d_l$ denotes the smaller of the dimensions $d_k$ and $d_l$ of $S_k$ and $S_l$.

In the abundance formula, $C(S_i)$ is the matrix formed by the original vectors belonging to subspace $S_i$, and the reconstructed matrix is built from the salient feature locations of $S_i$.

The reconstruction error is the difference between the original subspace matrix and its reconstruction from the salient feature locations. A larger reconstruction error indicates that, beyond the dominant salient feature locations, more feature locations are needed to describe the dynamics of the functional area. Abundance thus reflects how rich the activity patterns of people in an area are, and the functional development that can support those patterns.

Following the flow shown in Fig. 2, the experiment carried out with the invention includes the following steps.

(1) Data processing

Step 1.1: select the main urban area of Shanghai as the research area and divide it into a grid of 500 m × 500 m cells. After removing water-body cells, 3166 geographic units are obtained.

Step 1.2: preprocess the GPS trajectory data from 6600 taxis in Shanghai, remove abnormal points, extract the end point and arrival time of each trip, and map each end point to a geographic unit, obtaining 7,852,724 visit records.

Step 1.3: match the check-in data records with the visit records of the geographic units and classify the purpose of each visit. There are six types of visit purpose: home, transportation, work, dining, entertainment and other (visits to parks, museums, libraries and similar places).

Step 1.4: divide the day by hour into 24 time periods and count the number of visits to each geographic unit for each of the 6 purposes in each of the 24 time periods, obtaining a time series feature matrix C with 144 rows and 3166 columns.
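The mapping used in the data-processing steps above, from a trip end point to a 500 m grid cell and from an arrival time to an hourly slot, can be sketched as follows. The sketch assumes end points are already in a projected coordinate system measured in metres and that cells are numbered row by row; the function names, the origin parameters and the numbering scheme are hypothetical, and points outside the grid are not handled.

```python
import math


def cell_index(x_m, y_m, x_min, y_min, n_cols, cell_size=500.0):
    """Map a projected point (metres) to a row-major grid cell index."""
    col = int(math.floor((x_m - x_min) / cell_size))
    row = int(math.floor((y_m - y_min) / cell_size))
    return row * n_cols + col


def hour_slot(seconds_since_midnight):
    """Map an arrival time to one of the 24 hourly time periods."""
    return int(seconds_since_midnight // 3600) % 24


# 24 hourly slots times 6 visit purposes gives the 144 rows of C.
n_rows = 24 * 6
```

In practice, cells removed as water bodies would need a lookup from the raw cell index to the retained geographic unit index.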

(2) Identification of urban functional areas

Step 2.1: input the time series feature matrix C to the sparse subspace clustering algorithm to obtain the similarity matrix W. Fig. 3 visualizes the similarity matrix, which reveals the similarity between geographic units; non-zero similarity values are drawn in black. Five diagonal blocks are visible, and this structure indicates that the number of urban functional areas is 5.

Step 2.2: compute the number of subspaces from the normalized Laplacian matrix L of W, $L = I - D^{-1/2} W D^{-1/2}$, where I is the identity matrix and D is the diagonal degree matrix with entries $D_{jj} = \sum_i W_{ij}$. Sorting the eigenvalues of L in ascending order and computing the difference $\lambda_{k+1} - \lambda_k$ of each pair of adjacent eigenvalues, the largest difference occurs at k = 5, so the number of subspaces (urban functional areas) is 5, consistent with the reading of Fig. 3 in Step 2.1.

Step 2.3: accordingly, apply K-means clustering to W with the number of clusters set to 5, completing the detection of urban functional areas and yielding functional areas 1, 2, 3, 4 and 5. The clustering result is visualized on the map in Fig. 4, where the central area is mainly covered by functional area 5.

Step 2.4: the main function of a functional area is determined by its salient activity features. To determine the actual function of each detected functional area, principal component analysis is performed on the subspace matrix of each functional area to obtain its feature locations. In functional areas 1, 2, 3 and 4 the first 5 eigenvalues account for more than 90% of the total, while in functional area 5 the first 5 eigenvalues account for less than 90%. Therefore the basis vectors of the first 5 eigenvalues are used as the salient feature locations of functional areas 1 to 4, and the basis vectors of the first 10 eigenvalues are used as the salient feature locations of functional area 5.

Step 2.5: reshape each salient feature location of each functional area into a matrix with 6 rows and 24 columns, so that each row shows how the activity level for one purpose changes over the 24 hours: home (H), transportation (Tr), work (W), dining (D), entertainment (E) and other (O, visits to parks, museums, libraries and similar places). The salient feature locations of all functional areas are shown in Fig. 5. The figure shows that home activity (H) is the most active in the salient feature locations of functional area 1, dining (D) ranks second, and entertainment (E) is also prominent, so functional area 1 can be regarded as a residential area developed together with dining and entertainment facilities. Similarly, transportation activity (Tr) stands out in functional area 2, which is a transportation hub. The main activity in functional area 3 is work (W), so it is a work area. For functional area 5, the first 10 salient feature locations were considered; dining (D) and entertainment (E) are active, so it is regarded as a commercial area. Functional area 4 corresponds to other functional areas such as parks, museums and gas stations.
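The coefficient matrix Z that feeds the similarity matrix in Step 2.1 has to come from some solver, which the text does not name. One common route, shown here purely as an assumption, replaces the equality constraint $CZ = C$ with a least-squares penalty (a Lasso form) and solves it by iterative soft-thresholding (ISTA), keeping the diagonal of Z at zero. The function names, `lam` and `n_iter` are hypothetical tuning choices.

```python
import numpy as np


def soft_threshold(X, t):
    """Proximal operator of t * ||X||_1, applied element-wise."""
    return np.sign(X) * np.maximum(np.abs(X) - t, 0.0)


def sparse_self_expression(C, lam=0.01, n_iter=500):
    """Approximately minimise 0.5 * ||C - C Z||_F^2 + lam * ||Z||_1 with Z_ii = 0.

    This is a relaxation of the equality-constrained l1 problem, not the exact program.
    """
    n = C.shape[1]
    Z = np.zeros((n, n))
    step = 1.0 / (np.linalg.norm(C, 2) ** 2)   # 1 / Lipschitz constant of the gradient
    for _ in range(n_iter):
        grad = C.T @ (C @ Z - C)
        Z = soft_threshold(Z - step * grad, lam * step)
        np.fill_diagonal(Z, 0.0)               # a unit may not represent itself
    return Z


# Six toy units: three in span(e1, e2) and three in span(e3, e4).
C_toy = np.array([
    [1.0, 0.0, 1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 1.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 1.0, 0.0, 1.0],
    [0.0, 0.0, 0.0, 0.0, 1.0, 1.0],
])
Z_toy = sparse_self_expression(C_toy)
```

On this toy input the cross-subspace entries of `Z_toy` stay at zero, so $|Z| + |Z|^{T}$ is block diagonal with two blocks. For the 3166 units of the embodiment a dedicated solver would likely be preferable to plain ISTA.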

(3) Analysis of urban functional areas

Step 3.1: compute the proximity of the subspaces, that is, the similarity of the functional areas, from the principal angles between the subspaces; see Fig. 6. The similarity of a functional area to itself is not computed and is set to 0. In Fig. 6 the residential and commercial areas have the highest similarity, because residential areas are likely to have dining and entertainment facilities, and the location of the commercial area in Fig. 3 is itself mixed with a large number of residential areas.

Step 3.2: compute the uniqueness of the functional areas from their similarity; the results are shown in Fig. 7. The uniqueness values are high overall, indicating that the functional areas of the research area differ markedly. The residential and commercial areas have lower uniqueness, which is consistent with the result of Step 3.1.

Step 3.3: compute the abundance of the functional areas; the results are shown in Fig. 7. The "other" functional area (areas providing other services) has the largest reconstruction error, which means its activity patterns are the most complex: it contains many kinds of facilities whose dynamic activity patterns differ widely. The residential and commercial areas have the smallest reconstruction errors, because they concentrate on living and on dining and entertainment respectively, and their dynamic activity patterns are comparatively uniform.

As the summary and the embodiment show, in order to solve the problems of the prior art, the invention proposes a model based on multiple subspaces, which holds that urban functional areas have multiple sets of features. When the spatiotemporal activity information of the geographic units is expressed as vectors, the vector samples lie in a high-dimensional space formed by a union of subspaces. Geographic units in the same subspace carry similar human activity dynamics and can be clustered into one functional area. Urban functional areas are thus identified by finding the subspaces, and the uniqueness and abundance of each functional area are analyzed from the geometric properties of the subspaces, providing fine-grained quantitative indicators for the management and development of urban functional areas.

Claims

1. A method for identifying urban functional areas based on a multi-subspace model, characterized by comprising the following steps:

Step 1: Obtain the taxi trajectory data and check-in data in the research area;

Step 2, constructing a partition-oriented time series feature matrix C based on the visiting purpose;

Step 3, input the time series feature matrix C to the sparse subspace clustering algorithm, and calculate the corresponding relationship between the geographic unit and the urban functional area;

Step 4, obtain the salient feature locations of each functional area, and then identify the main function of each functional area;

Wherein, the construction process of the time series feature matrix C described in step 2 includes the following steps:

Step 201, dividing the research area to obtain N geographic units;

Step 202, preprocessing the taxi trajectory data, removing abnormal points, extracting the end point and arrival time of each trip, and mapping the end point with the geographic unit to obtain the visit record of the geographic unit;

Step 203, matching the check-in data record with the visit record of the geographic unit, and classifying the purpose of each visit;

Step 204, constructing a time series feature matrix C with M rows and N columns, which represents the dynamics of human activity carried by the geographic units over a period of time, where $M = T \times D$, T is the number of time periods and D is the number of visit-purpose categories, and each column of C gives, for one geographic unit, the number of people visiting that unit for each purpose in each time period;

The sparse subspace clustering algorithm described in step 3 includes the following steps:

Step 301, solve for the coefficient matrix Z, of size $N \times N$, which must minimize the $\ell_1$ norm subject to the following constraints:

$$\min_{Z} \|Z\|_1 \quad \text{s.t.} \quad CZ = C,\ Z_{ii} = 0$$

where $\|\cdot\|_1$ denotes the $\ell_1$ norm;

Step 302, use the coefficient matrix to build the similarity matrix of the data, $W = |Z| + |Z|^{T}$, of size $N \times N$, each entry of W being the similarity in time series features between the geographic units at the corresponding indices;

Step 303, compute the number of subspaces from the normalized Laplacian matrix L of the similarity matrix W, $L = I - D^{-1/2} W D^{-1/2}$, where I is the identity matrix and D is the diagonal degree matrix with entries $D_{jj} = \sum_i W_{ij}$; sort the eigenvalues of L in ascending order and compute the difference $\lambda_{k+1} - \lambda_k$ of each pair of adjacent eigenvalues, the k with the largest difference being the number of subspaces, that is, the number of urban functional areas to be detected;

Step 304, apply K-means clustering to the similarity matrix W with the number of clusters set to the k obtained in Step 303, obtaining the correspondence between geographic units and k classes, that is, k urban functional areas, and completing the detection of urban functional areas.

2. The method for identifying urban functional areas according to claim 1, wherein obtaining the salient feature locations of each functional area in Step 4 comprises: using the correspondence from Step 304, extracting from the similarity matrix W generated in Step 302 the subspace matrices $S_1, \ldots, S_i, \ldots, S_k$ corresponding to the urban functional areas, and performing principal component analysis; the resulting eigenvectors $[e_1, e_2, \ldots, e_p, \ldots, e_M]_i$ are called the feature locations of $S_i$, and the first r eigenvectors $[e_1, e_2, \ldots, e_r]_i$, whose cumulative share of the eigenvalues exceeds 90%, are the salient feature locations of $S_i$;

Identifying the main function of each functional area comprises: reshaping each salient feature location of each functional area into a matrix with D rows and T columns, in which each row shows how the activity level for one of the D purposes changes over the T time periods at that feature location, obtaining the main activity pattern of the functional area, and labeling the functional area with the most active function in the main activity pattern to complete the identification of urban functional areas.

3. The method for identifying urban functional areas according to claim 1 or 2, wherein the method for identifying urban functional areas further comprises the following steps:

Step 5, calculate the similarity of each functional area;

Step 6: Calculate the uniqueness of each functional area, and sort each functional area according to the uniqueness;

The similarity of functional areas is calculated from the principal angles between the corresponding subspaces; for any two functional areas with corresponding subspaces $S_k$ and $S_l$, the similarity $\mathrm{aff}(S_k, S_l)$ is given by the following formula:

[formula for $\mathrm{aff}(S_k, S_l)$ not reproduced in the source text]

where $\cos\theta_{kl}^{(i)}$ is the i-th largest singular value of $(U^{k})^{T} U^{l}$, $U^{k}$ and $U^{l}$ are orthonormal bases of $S_k$ and $S_l$ respectively, $\theta_{kl}^{(i)}$ is the i-th principal angle between the subspaces, and $d_k \wedge d_l$ denotes the smaller of the dimensions $d_k$ and $d_l$ of $S_k$ and $S_l$;

The uniqueness of a functional area is inversely related to its similarity: if the similarity between subspaces is high, the functions of the corresponding functional areas are very similar and the uniqueness of those functional areas is low; the uniqueness of each functional area $S_i$ is calculated as follows:

[formula for the uniqueness of $S_i$ not reproduced in the source text]

where k is the total number of functional areas and $S_{-i}$ denotes the functional areas other than $S_i$.

4. The method for identifying urban functional areas according to claim 3, wherein the method for identifying urban functional areas further comprises the following steps:

Step 7: Calculate the abundance of each functional area, and sort each functional area according to the abundance;

The abundance of a functional area is related to the reconstruction error of its salient feature locations, and is calculated as follows:

[formula for the abundance of $S_i$ not reproduced in the source text]

where $C(S_i)$ is the matrix formed by the original vectors belonging to subspace $S_i$,