Article

A Robust Automatic Method to Extract Building Facade Maps from 3D Point Cloud Data

1 School of Civil Engineering and Geomatics, Southwest Petroleum University, Chengdu 610500, China
2 State Key Laboratory of Geohazard Prevention and Geoenvironment Protection, Chengdu University of Technology, Chengdu 610059, China
3 College of Earth Science, Chengdu University of Technology, Chengdu 610059, China
4 Faculty of Geosciences and Environmental Engineering, Southwest Jiaotong University, Chengdu 611756, China
* Author to whom correspondence should be addressed.
These authors contributed equally to this work.
Remote Sens. 2022, 14(16), 3848; https://doi.org/10.3390/rs14163848
Submission received: 7 July 2022 / Revised: 27 July 2022 / Accepted: 6 August 2022 / Published: 9 August 2022
(This article belongs to the Section Urban Remote Sensing)
Figure 1. Processes of building facade extraction.
Figure 2. Transformation of three points from the Cartesian coordinate system (x, y, z) into Hough space (θ, φ, ρ): (a) a surface corresponds to a point in the Cartesian coordinate system; and (b) three surfaces correspond to three points, and the black diamond intersection of the surfaces represents the plane spanning the three points.
Figure 3. Schematic of peak fuzziness: (a) Cartesian coordinate system points; 2D points with noise (blue) bounce around true points (orange), and the orange line indicates the true line (y = −x − 1); (b) transforming the true points to the Hough parameter space, where one line indicates one point in the Cartesian coordinate system and the red dot indicates the intersection of all lines; and (c) Hough parameter space with noise; transforming the points with noise to the Hough parameter space, where one line indicates one point in the Cartesian coordinate system and the red box shows the peak fuzziness.
Figure 4. Schematic of the 3D high-pass filtering convolution kernel: the size of the convolution kernel is 5 × 5; its center pixel (green) is 1/2; the others are determined according to the distance from the pixel to the center based on the inverse distance weighted method; the sum of the pixels of the entire convolution kernel is 1.
Figure 5. Different facades are identified as one: (a) two facades whose spatial locations are adjacent to each other are considered as one facade but are not coplanar; and (b) similar facades of different buildings are incorrectly considered as one facade.
Figure 6. Processes of building facade map extraction.
Figure 7. Architecture of Faster R-CNN. The cyan, yellow, green, and purple parallelograms represent the convolutional layer, pooling layer, relu layer, and full connection layer, respectively. P × Q and M × N represent the height and width of the image. "cls_prob" represents the bounding box's probability of various classes.
Figure 8. Raw experimental data for the IQmulus & TerraMobilita Contest dataset: (a) the entire IQmulus & TerraMobilita Contest dataset, where the red dashed box shows the extent of the experimental data area; (b,c) the experimental data in 3D view and 2D view, respectively; and (d) sample areas with misalignment in the point cloud data.
Figure 9. Facade extractions from the IQmulus & TerraMobilita Contest dataset: (a–c) the results in 3D view extracted by the proposed method, GIR method, and VCIR method, respectively; and (d–f) the results in 2D view extracted by the proposed method, GIR method, and VCIR method, respectively. Different colors were used for different facades and gray for nonfacade point clouds. Roman numeral I was used to number the facades extracted by the proposed method and Roman numeral II was used for the VCIR method.
Figure 10. Violin plots and bar plot for the errors of facade extraction for the IQmulus & TerraMobilita Contest dataset: (a) violin plot of the distances between points and facades, where the shape of the violin displays the probability density distribution of the data, the black bar depicts the interquartile range, the inner line branching from it shows the 95% confidence interval, and the white dot marks the median; and (b) bar plot of the total facade errors, where the vertical bars show the mean of the data and the error line shows the 95% confidence interval.
Figure 11. Facade map extraction with the IQmulus & TerraMobilita Contest dataset: the number in the upper left corner corresponds to the facade number in Figure 9; the red, blue, and green lines represent the window, door, and building boundaries, respectively; and the background image shows the single-band feature image, where the darker the image pixel color, the greater the number of points contained in the corresponding planar grid.
Figure 12. Box plot of the accuracy of window extraction for the IQmulus & TerraMobilita Contest dataset: (a) the accuracy with a minimum IoU of 0.5; and (b) the accuracy with a minimum IoU of 0.85. The different colored boxes represent different precision indicators. The upper and lower quartiles of the data are shown by the box's upper and lower boundaries, respectively, and the median is shown by the inner horizontal line. The whiskers extending from the ends of the boxes represent values beyond the upper and lower quartiles, and outliers are represented by black dots.
Figure 13. Raw experimental data for the Semantic3D.Net Benchmark dataset: (a,b) the processed point cloud data of the "domfountain" scene in 2D view and 3D view, respectively; and (c,d) the processed point cloud data of the "marketsquarefeldkirch" scene in 2D view and 3D view, respectively.
Figure 14. Facade extraction with the Semantic3D.Net Benchmark dataset: (a–c) results of the "domfountain" scene in 2D view extracted by the proposed method, GIR method, and VCIR method, respectively; (d–f) results of the "domfountain" scene in 3D view extracted by the three methods, respectively; (g–i) results of the "marketsquarefeldkirch" scene in 2D view extracted by the three methods, respectively; and (j–l) results of the "marketsquarefeldkirch" scene in 3D view extracted by the three methods, respectively. Different colors were used for different facades and gray was used for nonfacade point clouds. Capital English letters were used to label these facades.
Figure 15. Violin plots and bar plots for the errors of facade extraction for the Semantic3D.Net Benchmark dataset: (a,c) violin plots for the distances between points and facades for the "domfountain" and "marketplacefeldkirch" scenes, respectively, where the shape of the violin describes the probability density distribution of the data, the black bar describes the interquartile range, the inner line branching from it shows the 95% confidence interval, and the white dot marks the median; and (b,d) bar plots for the total facade errors for the "domfountain" and "marketplacefeldkirch" scenes, respectively, where the vertical bars show the mean of the data and the error line shows the 95% confidence interval.
Figure 16. Facade map extraction with the Semantic3D.Net Benchmark dataset: (a–d) the facade map extraction results for the "domfountain" scene for facades A1, A3, A4, and A8, respectively; and (e–g) the facade map extraction results for the "marketplacefeldkirch" scene for facades C2, C3, and C5, respectively. The red, blue, and green lines represent the window, door, and building boundaries, respectively; the background image shows the single-band feature image, where the darker the image pixel color, the greater the number of points contained in the corresponding planar grid.
Figure 17. Box plot of the accuracy of window extraction with the Semantic3D.Net Benchmark dataset: (a,c) the accuracy with a minimum IoU of 0.5 for the "domfountain" and "marketplacefeldkirch" scenes, respectively; and (b,d) the accuracy with a minimum IoU of 0.85 for the "domfountain" and "marketplacefeldkirch" scenes, respectively. The different colored plots represent different precision indicators. The shape of the violin displays the probability density distribution of the data; the black bar depicts the interquartile range; the inner line branching from it shows the 95% confidence interval; and the white dot marks the median.
Figure 18. Details of planes extracted by the VCIR method (a,c) and details of facades extracted by the proposed method (b,d): (a,b) the details of plane II4 extracted by the VCIR method corresponding to facades I10 and I11 extracted by the proposed method; and (c,d) the details of plane II1 extracted by the VCIR method corresponding to facade I4 extracted by the proposed method.
Figure 19. Details of facade map extraction: (a,c,e) results of the facade map extraction for facades I11, C3, and C5, respectively; and (b,d,f) results of zooming in on the misaligned areas.

Abstract

Extracting facade maps from 3D point clouds is a fast and economical way to describe a building's surface structure. Existing methods lack efficiency, robustness, and accuracy, and depend on many additional features such as point cloud reflectivity and color. This paper proposes a robust and automatic method to extract building facade maps. First, an improved 3D Hough transform is proposed by adding a shift vote strategy and 3D convolution of the accumulator to improve computational efficiency and reduce peak fuzziness and dependence on the step size selection. These modifications make the extraction of potential planes fast and accurate. Second, coplane and vertical plane constraints are introduced to eliminate pseudoplanes and nonbuilding facades. Then, we propose a strategy to refine the potential facades and to accurately calibrate and divide adjacent facade boundaries by clustering the refined facade point clouds. This process solves the problem of adjoining surfaces being merged into one surface by traditional methods. Finally, the extracted facade point clouds are converted into feature images. Doors, windows, and building edges are accurately extracted via deep learning and digital image processing techniques, which together achieve accurate extraction of building facade maps. The proposed method was tested on MLS and TLS point cloud datasets collected from different cities with different building styles. Experimental results confirm that the proposed method decreases the computational burden, improves efficiency, and achieves accurate differentiation of adjacent facade boundaries with higher accuracy than the traditional method, verifying its robustness. Additionally, the proposed method uses only point cloud geometry information, effectively reducing data requirements and acquisition costs.

1. Introduction

Buildings form the dominant artificial objects in urban scenes. The requirements for accurate building geometries and three-dimensional (3D) building models are growing in tandem with the expansion of urban planning, smart city construction, and building information modeling (BIM). How to efficiently and accurately obtain these data and the information required for 3D modeling is a key issue [1]. Building facade maps represent the geometric features of building surfaces, such as the edges of windows, doors, and other vital structures. Facade maps can directly serve urban renewal, urban planning, etc., while providing a flexible and straightforward approach to retrieving large-scale building models [1,2]. Laser scanning provides a quick and accurate method to gather 3D point cloud data (PCD) from 3D objects [3]. Thus, a method is needed to extract the required geometric features from 3D PCD accurately and robustly.
Multiple methods to extract facade maps from 3D PCD have been proposed, and direct or indirect extraction is the most common approach [1]. In direct extraction methods, facade maps are obtained directly from raw or processed 3D PCD by computing geometric information. Given that 3D PCD is stored in an unstructured form, building facade maps are typically extracted by random sample consensus (RANSAC) [4], region growing [5], or semantic feature-based approaches [1,6,7]. These algorithms are typically efficient and concise but only apply to specific situations and rely on good data quality. Slicing-based methods are another commonly used family of direct extraction methods; they can effectively extract facade maps using hole and edge detection and are easy to use [8,9,10]. However, they are strongly affected by occlusion. In contrast, indirect extraction, which includes segmentation and feature extraction, is a more prevalent approach [11]. Segmentation separates a group of points into several single surfaces or regions. Building segmentation, which separates the various sides of a building, including walls and roofs, from one another, is typically a precursor of feature extraction. Fuzzy clustering [12,13], the 3D Hough transform (HT) [14,15,16], RANSAC [17], and other methods are often used for building segmentation. The fuzzy clustering method has high complexity, and its results depend on the initialization parameters. The RANSAC method has higher accuracy and is less affected by noise but can only match one instance at a time and typically achieves multiple-instance acquisition via iterative elimination [17,18]. Thus, its results are strongly influenced by the algorithm parameters and convergence conditions, making them unstable. The 3D HT method can extract multiple instances from point cloud data at once, but the step size limits its accuracy. With the rise of deep learning, several models suited to 3D PCD have been proposed and have achieved outstanding results in point cloud segmentation [19,20,21]. However, these models are complex, have strict hardware requirements, and generalize poorly across scenarios and even across datasets. Feature extraction involves extracting architectural features (e.g., doors and windows) from the segmented parts. Commonly used methods include a priori semantic features, slicing, region growing, etc. [11,22], which typically achieve good results only when data quality is high, showing low robustness and generalizability.
In general, existing methods typically achieve good results in ideal environments [4], but in practice, due to occlusion, noise, and the uneven density of PCD, they still have marked limitations: (1) Strict data requirements. Some methods (e.g., clustering-based approaches and improved region growing methods [22,23]) rely on a variety of feature information, which makes it difficult to handle PCD containing only coordinate information and raises the hardware requirements and cost of PCD acquisition. Many algorithms also lack robustness and perform poorly when data quality is poor. (2) High manual work requirements. It is challenging to automatically extract building facade maps directly from unordered PCD. (3) Low transferability. Deep-learning-based point cloud segmentation methods are highly automated but have low transferability due to the unstructured nature of point cloud data and unstable data quality; one method or model can work well on specific data but perform poorly on others.
To address these limitations, we propose a new method for automatic and robust building facade map extraction. This method can extract building facade geometry and the edges of windows and doors based only on the coordinate information of the 3D PCD without the assistance of other feature information (e.g., laser intensity, color). The IQmulus & TerraMobilita Contest and Semantic3D.Net Benchmark datasets were used to test the proposed method. The primary contributions of this study are as follows:
(1) A new method is presented to extract building facades from 3D PCD. First, an improved 3D HT algorithm is proposed by adding shift vote and 3D convolution to the 3D HT, which improves the accuracy and efficiency of potential facade extraction. Then, the improved 3D HT and RANSAC are combined to achieve potential facade refinement. Thus, the facade extraction’s accuracy and robustness markedly increase compared with the conventional 3D HT and RANSAC. Additionally, the improved 3D HT method is more robust and has a lower data dependency than the deep-learning-based point cloud segmentation methods, which are data-driven.
(2) A facade boundary calibration method is proposed. Planes in a mathematical sense without a definite range are transformed into real building facades with definite boundaries using a density-based clustering method. This method can distinguish facades from other objects and different facades in proximity, improving the extraction accuracy of building facades and avoiding different facades being mistakenly merged into one.
(3) A new way to extract building facade maps from feature images is proposed. The Faster R-CNN model, a classical deep-learning-based image object detection model, is introduced to extract the door and window edges from the feature images. This method achieves better results despite poor data quality (e.g., presence of occlusion, noise, uneven density) compared to traditional geometry-based methods [24].

2. Methodology

The proposed method includes two steps: building facade extraction and building facade map extraction. A 3D PCD is imported as input; the equation of each facade and its corresponding extent are obtained by building facade extraction, and point cloud division is implemented to accurately identify the different building facades. Based on these facade data, the building facade map is obtained by building facade map extraction.
Building facade extraction includes three steps: (1) potential plane acquisition; (2) facade constraint; and (3) facade precise extraction. Building facade map extraction also includes three steps: (1) feature image generation; (2) door and window detection; and (3) building boundary extraction. More details are described in the following subsections.

2.1. Building Facade Extraction

To overcome the common defects of point cloud plane segmentation methods (i.e., strong sensitivity to noise, uneven density, occlusion, and the absence of clear planar boundaries), a new method with higher robustness than traditional methods is proposed to extract building facades. The proposed method's workflow is shown in Figure 1 and consists of three primary steps: potential plane acquisition, facade constraints, and facade precise extraction.

2.1.1. Improved 3D HT for Potential Plane Acquisition

Potential plane acquisition includes two steps: point cloud data preprocessing and plane equation extraction based on improved 3D HT. The purpose of preprocessing is to remove marked nonfacade point clouds and reduce the computation of subsequent algorithms. The improved 3D HT is primarily used for efficient and accurate potential plane acquisition.
(1) Point cloud data preprocessing.
Because point cloud data typically contain many ground points, and their density is typically high, these ground points must be removed first. In addition, the point cloud is panned to the origin of the coordinate system, and voxel downsampling is performed to reduce the computation volume. Eventually, statistical outlier removal is performed on the downsampled data to remove the point cloud noise.
(2) Improved 3D Hough transform.
This study aims to find a robust building facade extraction method to reduce data dependence. Thus, model-driven methods are more applicable than deep-learning-based point cloud segmentation methods, which are data-driven. For plane detection in 3D PCD, Borrmann et al. proposed the 3D HT based on the Generalized Hough Transform (GHT) [25,26], which is a common model-driven method for point cloud plane detection. This method maps all planes that may pass through a point p_i to a surface in the Hough parameter space, with each point on the surface corresponding to a plane in the Cartesian coordinate system (Figure 2a). Multiple parametric surfaces form one or more intersections in the parameter space (Figure 2b). We thus count the number of surfaces passing through each intersection, and the plane corresponding to the highest cumulative count is the desired plane.
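To make the voting step concrete, the following minimal Python sketch accumulates votes using the common (θ, φ, ρ) plane parametrization ρ = x cos θ sin φ + y sin θ sin φ + z cos φ; the parameter ranges, step sizes, and array layout are illustrative and not the authors' exact implementation (which adds the shift vote and accumulator convolution described next).

```python
import numpy as np

def hough_vote(points, s_theta=np.deg2rad(1.0), s_phi=np.deg2rad(1.0),
               s_rho=0.4, rho_max=100.0):
    """Minimal 3D Hough voting: every point votes for all (theta, phi) cells,
    with rho = x*cos(theta)*sin(phi) + y*sin(theta)*sin(phi) + z*cos(phi)."""
    thetas = np.arange(0.0, 2 * np.pi, s_theta)
    phis = np.arange(0.0, np.pi, s_phi)
    n_rho = int(2 * rho_max / s_rho) + 1
    acc = np.zeros((len(thetas), len(phis), n_rho), dtype=np.int32)
    # Unit direction vectors for every (theta, phi) cell, shape (T, P, 3).
    dirs = np.stack([np.outer(np.cos(thetas), np.sin(phis)),
                     np.outer(np.sin(thetas), np.sin(phis)),
                     np.outer(np.ones_like(thetas), np.cos(phis))], axis=-1)
    for p in points:
        rho = dirs @ p                                    # signed distances, shape (T, P)
        k = np.round((rho + rho_max) / s_rho).astype(int)
        valid = (k >= 0) & (k < n_rho)
        t_idx, p_idx = np.nonzero(valid)
        acc[t_idx, p_idx, k[valid]] += 1                  # one vote per (theta, phi) cell
    return acc, thetas, phis
```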
For the 3D HT, the greatest challenge is choosing a step size. The discretization step sizes s_θ, s_φ, and s_ρ strongly affect plane extraction. Smaller discretization steps typically yield higher accuracies, but each doubling of the angular resolution (i.e., halving s_θ and s_φ) roughly quadruples the algorithm's computation and memory overhead, which is particularly important with large PCD. Thus, we propose a shift vote strategy. In the discretization of the plane parameters, ρ is discretized into the set Q, and θ and φ are discretized into the sets M = {0, s_θ, 2s_θ, …, 2π} and N = {0, s_φ, 2s_φ, …, 2π}, respectively. Next, copies of M and N with each element offset by s_θ/2 and s_φ/2, respectively, are created as:
M′ = {s_θ/2, s_θ + s_θ/2, 2s_θ + s_θ/2, …, 2π + s_θ/2}
N′ = {s_φ/2, s_φ + s_φ/2, 2s_φ + s_φ/2, …, 2π + s_φ/2}
Thus, accumulators can be created as follows:
A = M × N × Q = {(θ_j, φ_j, ρ_{ij}) | θ_j ∈ M, φ_j ∈ N, ρ_{ij} ∈ Q}
A′ = M′ × N′ × Q = {(θ_j, φ_j, ρ_{ij}) | θ_j ∈ M′, φ_j ∈ N′, ρ_{ij} ∈ Q}
A and A′ are voted on, and the candidate plane sets S(θ, φ, ρ) and S′(θ, φ, ρ) satisfying the conditions are obtained. The union S ∪ S′ is then taken as the final candidate plane set. Thus, the angular resolution is doubled while the computation only doubles and the memory overhead stays the same, achieving a balance of precision and efficiency. The optimal s_θ and s_φ are both 1°, which achieves a good balance of precision and efficiency. The setting of s_ρ depends on the input data's extent and the available memory. The recommended range for s_ρ is 0.2–1 m: if s_ρ is larger than 1 m, the plane detection precision may be too low; if s_ρ is smaller than 0.2 m, the marginal effect is marked, and the required memory increases without a significant improvement in precision.
Another major challenge is peak fuzziness, a prevalent issue with the Hough transform. Considering the 2D HT as an example, a point in the Cartesian coordinate system corresponds to a curve in the Hough space. Theoretically, the parameter curves corresponding to points on the same line should intersect at one point. Due to step settings, data noise, etc., these curves typically do not intersect exactly (Figure 3; i.e., peak fuzziness), which complicates extraction; this problem also exists in the 3D HT. Therefore, we propose performing 3D high-pass filtering on the accumulator of the 3D HT to remove its low-frequency part and weaken the effect of peak fuzziness. The convolution kernel of the 3D high-pass filtering is shown in Figure 4. The center cell of the convolution kernel is 1/2, and the other cells are determined according to the inverse distance weighted method, with the whole kernel summing to 1. Finally, all potential planes are obtained by performing peak detection on the filtered accumulator.
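As a concrete illustration of the filtering step, the sketch below builds a kernel exactly as the caption of Figure 4 describes (center cell 1/2, remaining cells inverse-distance weighted so that the kernel sums to 1) and convolves it with the Hough accumulator; the 5 × 5 × 5 size and the use of SciPy are assumptions, not the authors' implementation.

```python
import numpy as np
from scipy.ndimage import convolve

def build_idw_kernel(size=5):
    """Kernel as described in the caption of Figure 4 (assumed 5x5x5 for the 3D
    accumulator): centre cell = 1/2, remaining cells inverse-distance weighted,
    whole kernel sums to 1."""
    assert size % 2 == 1
    c = size // 2
    zz, yy, xx = np.mgrid[:size, :size, :size]
    dist = np.sqrt((xx - c) ** 2 + (yy - c) ** 2 + (zz - c) ** 2)
    weights = np.zeros_like(dist, dtype=float)
    mask = dist > 0
    weights[mask] = 1.0 / dist[mask]             # inverse distance weights
    weights[mask] *= 0.5 / weights[mask].sum()   # off-centre cells share the other 1/2
    weights[c, c, c] = 0.5                       # centre cell fixed at 1/2
    return weights                               # sums to 1 by construction

# Example: filter the accumulator before peak detection.
# filtered = convolve(accumulator, build_idw_kernel(), mode="nearest")
```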

2.1.2. Facade Constraints

Based on data quality, algorithm parameter settings, etc., the roughly extracted planes inevitably contain many pseudoplanes and nonbuilding planes. We introduce facade constraints to remove these planes and obtain the real building facades. This strategy includes coplane constraint and vertical plane constraint.
Coplane constraint. The purpose of the coplane constraint is to eliminate the pseudoplanes caused by excessive point cloud density, inappropriate threshold settings, and peak fuzziness. It is primarily determined by three parameters: the plane dihedral angle, the plane distance [27], and the common point ratio. If planes p_1 and p_2 satisfy:
((arccos(n_1 · n_2 / (|n_1| |n_2|)) ≤ α_th) ∧ (max(|r_12 · n_1|, |r_12 · n_2|) ≤ Δd_th)) ∨ (ComProp(p_1, p_2) ≥ cp_th)
they are regarded as coplanar and merged, where r_12 is the vector between the feet of the perpendiculars r_1 and r_2 dropped from the origin onto planes p_1 and p_2; n_1 and n_2 are the normal vectors of planes p_1 and p_2, respectively; ComProp is the operator estimating the proportion of common points between the two planes relative to the plane with fewer points; and α_th, Δd_th, and cp_th are the thresholds for the plane dihedral angle, plane distance, and common point proportion, with suggested values of 5°, 1 m, and 70%, respectively.
Vertical plane constraint. The improved 3D HT extracts all potential planes in the PCD, which contain both building facades and other planes. Because the building facade should be vertical, we eliminate other planes by constraining the vertical angle of each plane after the coplane constraint:
arccos(m · n / (|m| |n|)) ≥ α_v,th
where m and n are the normal vectors of the current plane and the vertical plane, respectively, and α_v,th is the threshold for the vertical plane constraint; 75° can be used in most scenarios. A plane that does not meet this constraint is discarded. After the coplane constraint and the vertical plane constraint, the potential facades are obtained.
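The two constraints can be expressed compactly in code. The sketch below follows the reconstructed coplane and vertical conditions above; the helper names, the OR-combination of the coplane criteria, and the use of the vertical direction (0, 0, 1) as the reference vector are my reading of the text rather than a verified implementation.

```python
import numpy as np

def angle_deg(u, v):
    """Angle between two vectors in degrees."""
    cosang = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    return np.degrees(np.arccos(np.clip(cosang, -1.0, 1.0)))

def is_coplane(n1, d1, n2, d2, common_ratio,
               alpha_th=5.0, dist_th=1.0, cp_th=0.70):
    """Coplane test for planes n1.x + d1 = 0 and n2.x + d2 = 0. `common_ratio`
    is the share of common points relative to the smaller plane (ComProp)."""
    ang = angle_deg(n1, n2)                         # plane dihedral angle
    f1 = -d1 * n1 / np.dot(n1, n1)                  # foot of perpendicular from origin to p1
    f2 = -d2 * n2 / np.dot(n2, n2)                  # foot of perpendicular from origin to p2
    r12 = f2 - f1
    dist = max(abs(np.dot(r12, n1 / np.linalg.norm(n1))),
               abs(np.dot(r12, n2 / np.linalg.norm(n2))))
    return (ang <= alpha_th and dist <= dist_th) or common_ratio >= cp_th

def is_vertical(m, alpha_v_th=75.0, up=np.array([0.0, 0.0, 1.0])):
    """Vertical plane constraint: keep planes whose normal is nearly horizontal,
    i.e. the angle between the normal and the assumed vertical direction is large."""
    return angle_deg(m, up) >= alpha_v_th
```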

2.1.3. Precise Extraction of Facade

The result after the facade constraint is still an infinitely extended plane in the mathematical sense. Figure 5a shows that two noncoplanar facades are regarded as one tilted plane, and Figure 5b shows that similar facades of different buildings are considered to be one plane. This situation does not meet the requirements of facade extraction and reduces the accuracy of the facades. Therefore, these point clouds must be separated to obtain a clear extent for each facade. In addition, nonplanar point clouds near a plane (e.g., trees, street lights, vehicles, and other object point clouds) are easily mistaken for plane point clouds; thus, another role of facade precise extraction is to remove these nonplanar point clouds as much as possible. Accordingly, facade precise extraction includes three parts: facade refinement, facade boundary calibration, and the facade constraint. The facade constraint method is described in Section 2.1.2; this subsection introduces facade refinement and facade boundary calibration.
Facade refinement. The improved 3D HT enhances facade extraction accuracy, but its accuracy is still affected by the parameter settings. The RANSAC method is insensitive to noise, is not limited by a step size, and can fit a high-accuracy facade to a large number of points. Therefore, we propose a facade refinement strategy to further improve the quality of the facade data resulting from the improved 3D HT. The overall process includes: (1) obtaining the plane equation of the potential facade corresponding to each point cloud cluster by RANSAC and acquiring the new potential facade; and (2) removing the coplanes and pseudoplanes by the facade constraint to obtain the refined building facade. We focus on facade extraction, and a building facade is typically vertical (i.e., a plane with C = 0 in the general plane equation Ax + By + Cz + D = 0). Therefore, we set C = 0 when extracting the plane to improve the facade extraction accuracy.
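Because setting C = 0 reduces the plane Ax + By + Cz + D = 0 to a line Ax + By + D = 0 in the xy projection, the refinement step can be sketched as a basic RANSAC line fit over the projected points; the threshold and iteration count below are illustrative, not the authors' values.

```python
import numpy as np

def ransac_vertical_plane(points, dist_th=0.05, n_iter=1000, rng=None):
    """Fit a vertical plane A*x + B*y + D = 0 (C fixed to 0) to a 3D point set
    with a basic RANSAC loop; dist_th and n_iter are illustrative values."""
    rng = np.random.default_rng(rng)
    xy = points[:, :2]
    best_model, best_inliers = None, np.zeros(len(xy), dtype=bool)
    for _ in range(n_iter):
        i, j = rng.choice(len(xy), size=2, replace=False)
        p, q = xy[i], xy[j]
        direction = q - p
        norm = np.linalg.norm(direction)
        if norm < 1e-9:
            continue
        a, b = direction[1] / norm, -direction[0] / norm   # unit normal of the 2D line
        d = -(a * p[0] + b * p[1])
        dist = np.abs(xy @ np.array([a, b]) + d)           # point-to-line distances
        inliers = dist <= dist_th
        if inliers.sum() > best_inliers.sum():
            best_model, best_inliers = (a, b, 0.0, d), inliers
    return best_model, best_inliers                        # (A, B, C, D) and inlier mask
```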
Facade boundary calibration. An investigation of the point cloud characteristics shows that the point density within each building facade is much higher than that in the gaps between facades, which matches the assumption of density-based clustering. Therefore, a density-based clustering method is used for facade boundary calibration. The hierarchical density-based spatial clustering of applications with noise (HDBSCAN) method [28] is a commonly used clustering method and can cluster large-scale data robustly and efficiently. Therefore, the HDBSCAN method is used to cluster the point clouds after facade refinement. After clustering, the RANSAC method is applied to extract the facade equation and the corresponding facade point cloud from each point cloud cluster. The bounding boxes of the facade point clouds are taken as the facade boundaries. By applying the facade constraint described in Section 2.1.2 to the results of the facade boundary calibration, the bounded building facades and their corresponding point cloud data are obtained.
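A minimal sketch of the boundary calibration step is shown below, assuming scikit-learn's HDBSCAN implementation (scikit-learn ≥ 1.3; the standalone hdbscan package works analogously); the min_cluster_size value is illustrative.

```python
import numpy as np
from sklearn.cluster import HDBSCAN   # scikit-learn >= 1.3

def calibrate_facade_boundaries(points, min_cluster_size=100):
    """Split the refined point cloud of one mathematical plane into separate
    bounded facades by density clustering; the bounding box of each cluster is
    taken as the facade boundary (parameter value is illustrative)."""
    labels = HDBSCAN(min_cluster_size=min_cluster_size).fit_predict(points)
    facades = []
    for lbl in set(labels) - {-1}:                       # -1 marks noise points
        cluster = points[labels == lbl]
        bbox = (cluster.min(axis=0), cluster.max(axis=0))
        facades.append((cluster, bbox))
    return facades
```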

2.2. Building Facade Map Extraction

After obtaining the bounded building facades and their corresponding point cloud data, it is still difficult to extract facade maps from the disordered and unstructured point cloud data with varying qualities. Building facade maps can be divided into two parts: door and window boundaries and building boundaries. With the advancement of deep learning in recent years, the precision and speed of image object detection have markedly improved [29,30,31,32,33,34], achieving better performance than traditional methods. The extraction of doors and windows is a form of object detection; thus, it is possible to use deep learning methods to extract window and door boundaries. Because building boundaries are typically large and complex in shape, it is difficult to extract them effectively using existing deep learning methods; thus, we chose to use digital image processing technology (e.g., image enhancement, filtering, edge detection, morphological processing, and connected domain analysis) to extract them according to the boundary features. By combining the extracted door and window boundaries with the building boundaries, the required building facade map is obtained. The proposed approach includes three steps: feature image generation, door and window detection, and building boundary extraction. The workflow of this process is shown in Figure 6.
Feature image generation. The building facade point cloud set B(x, y, z) obtained from building facade extraction is converted into a point set B′(x′, y′, z′) with the corresponding facade as the reference coordinate system by the following equations:
α = |arctan(A/B)|
x′ = x cos α + y sin α
y′ = z
z′ = y cos α − x sin α
where A and B are the general plane equation coefficients of the corresponding building facade. The coordinate z′ of each point relative to the plane is then discarded, and the 2D plane point cloud B′(x′, y′) is acquired by projecting the 3D point cloud onto the corresponding facade. A grid is created using a specified edge length; the suggested length is less than 0.05 m to ensure edge extraction precision. Then, the 2D point cloud is divided by the grid. To rasterize the point cloud into a single-band 2D image, the number of points within each grid cell is used as the pixel value. To enhance the image, histogram equalization, bandpass filtering after histogram equalization, and Prewitt edge detection are performed on the single-band 2D image. These processes weaken the effects of uneven density within a point cloud and density differences between different point cloud datasets, improving the proposed method's robustness. Zero-valued pixels in the single-band feature image are masked before image enhancement to avoid their effect, particularly on histogram equalization and bandpass filtering. The resulting bands of the three operations are synthesized into a 3-band feature image, and the histogram equalization result is taken as the single-band feature image. These two feature images are the final required feature images.
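The sketch below illustrates the projection and rasterization described above, assuming scikit-image for the enhancement steps; the Difference-of-Gaussians band stands in for the bandpass filter, and the cell size and filter parameters are illustrative rather than the authors' exact settings.

```python
import numpy as np
from skimage import exposure, filters   # assumed library choice for the enhancement bands

def facade_feature_images(points, A, B, cell=0.02):
    """Project facade points into the facade frame, rasterize the point count per
    cell x cell grid into a single-band image, and derive three enhancement bands."""
    alpha = abs(np.arctan(A / B))                 # assumes B != 0, as in arctan(A/B)
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    u = np.cos(alpha) * x + np.sin(alpha) * y     # x' along the facade
    v = z                                         # y' is the elevation
    # z' (distance to the facade plane) is discarded before rasterization.
    nx = max(1, int(np.ceil((u.max() - u.min()) / cell)))
    ny = max(1, int(np.ceil((v.max() - v.min()) / cell)))
    counts, _, _ = np.histogram2d(v, u, bins=(ny, nx))
    single_band = counts[::-1]                    # flip so higher elevation is up
    eq = exposure.equalize_hist(single_band, mask=single_band > 0)
    band = filters.difference_of_gaussians(eq, low_sigma=1, high_sigma=10)
    edges = filters.prewitt(eq)
    three_band = np.dstack([eq, band, edges])     # simplified 3-band feature image
    return single_band, three_band
```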
Door and window detection. In this study, we use a deep learning model to extract building doors, windows, and their boundaries from the 3-band feature images. Considering the characteristics of building door and window extraction, which requires high accuracy but not real-time processing, the Faster R-CNN model [33] is used because of its high accuracy in image object detection. The model architecture is shown in Figure 7. The most important feature of this model is its Region Proposal Network, which generates candidate regions from the feature maps after the convolution operation. This achieves higher detection speed while ensuring higher accuracy compared with Selective Search, Edge Boxes, and other methods. Because there is only one target at the same position of the same facade, non-maximum suppression is applied to the detection results, and only the result with the highest confidence is retained at each position.
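A hedged sketch of this detection stage using torchvision's Faster R-CNN implementation (torchvision ≥ 0.13) is given below; the class count, score threshold, and class-agnostic NMS are assumptions for illustration, not the authors' exact configuration.

```python
import torch
import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor

def build_detector(num_classes=3):
    """Faster R-CNN with a ResNet-50 FPN backbone; num_classes = background + window + door."""
    model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
    in_feat = model.roi_heads.box_predictor.cls_score.in_features
    model.roi_heads.box_predictor = FastRCNNPredictor(in_feat, num_classes)
    return model

@torch.no_grad()
def detect(model, image, score_th=0.5, iou_th=0.3):
    """Run detection on one 3-band feature image (H, W, 3 float tensor in [0, 1]) and
    keep only the highest-confidence box at each location via NMS."""
    model.eval()
    out = model([image.permute(2, 0, 1)])[0]
    keep = out["scores"] >= score_th
    boxes, scores, labels = out["boxes"][keep], out["scores"][keep], out["labels"][keep]
    keep = torchvision.ops.nms(boxes, scores, iou_th)
    return boxes[keep], scores[keep], labels[keep]
```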
Building boundary extraction. Building boundaries are determined using digital image processing and optimization on the single-band feature images. The specific processes are: (1) performing a closing operation on the feature images to reduce noise and occlusion; (2) 8-adjacent connectivity domain detection; (3) vectorizing the maximum connectivity domain as the initial building boundary; (4) filling the void on the initial building boundary; and (5) boundary simplification and orthogonalization [35]. Finally, the optimized building boundaries are combined with the window and door boundaries to obtain the final building facade map.
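A minimal OpenCV sketch of steps (1)–(5) is shown below (the boundary orthogonalization of [35] is omitted); the kernel size and simplification tolerance are illustrative.

```python
import cv2
import numpy as np

def building_boundary(single_band, close_kernel=7, epsilon=2.0):
    """Extract the building outline from the single-band feature image:
    closing -> largest 8-connected component -> simplified boundary polygon."""
    img = (single_band > 0).astype(np.uint8) * 255
    kernel = np.ones((close_kernel, close_kernel), np.uint8)
    closed = cv2.morphologyEx(img, cv2.MORPH_CLOSE, kernel)     # reduce noise and small gaps
    n, labels, stats, _ = cv2.connectedComponentsWithStats(closed, connectivity=8)
    if n <= 1:
        return None
    largest = 1 + np.argmax(stats[1:, cv2.CC_STAT_AREA])        # skip background label 0
    mask = (labels == largest).astype(np.uint8)
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    contour = max(contours, key=cv2.contourArea)
    return cv2.approxPolyDP(contour, epsilon, True)             # simplified boundary polygon
```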

3. Experiments and Results

In this section, the proposed method was evaluated on the IQmulus & TerraMobilita Contest dataset [36] and the Semantic3D.Net Benchmark dataset [37], which were obtained by mobile laser scanning (MLS) and terrestrial laser scanning (TLS), respectively. To verify the validity of the proposed method, the General Iterative RANSAC (GIR) and Vertical Constrained Iterative RANSAC (VCIR) methods were also used to extract the facade based on point cloud data. Section 3.1 introduces the results on the IQmulus & TerraMobilita Contest dataset, and Section 3.2 introduces the results on the Semantic3D dataset.

3.1. Results of the IQmulus & TerraMobilita Contest Dataset

The IQmulus & TerraMobilita Contest dataset [36] is 3D MLS data collected in Paris (France), consists of 300 million points, and covers approximately 10 km of streets within a square km of the 6th district of Paris. Most streets are covered in this square kilometer area; thus, the dataset is representative of this part of Paris. Due to the technology limitations of current laser scanning techniques, there are some problems with this dataset, such as uneven density, noise, occlusion, etc. In addition, there are stitching misalignments in some areas (Figure 8d). This dataset is often used for outdoor point cloud segmentation, from which fast and accurate extraction of building facade maps remains a challenge. A subset of this dataset was selected as the experimental data, and the data range is shown in the red box in Figure 8.
Some preprocessing procedures were performed prior to facade extraction. First, the IQmulus & TerraMobilita Contest dataset is divided into nine data files and cannot be used directly. Thus, the nine files were merged into one file. Next, the point cloud data outside the subexperimental area were discarded, and the attributes other than coordinates (e.g., reflectivity and echo times) were removed to reduce the amount of data. Subsequently, these data were preprocessed as described in Section 2.1.1. The point cloud was voxel-downsampled with a 5 cm voxel side length, and the processed data consisted of 23 million points. Figure 8b,c show the processed point cloud data in 3D and 2D views, respectively.
Then, building facades were extracted from the preprocessed point cloud data by the method in Section 2.1, and the discretization step s_ρ of the improved 3D HT was set to 0.4 m. From the experimental data, thirteen planes were extracted using the proposed method. Seven and five planes were obtained using the GIR and VCIR methods, respectively. The extraction results of the three methods are shown in Figure 9. To distinguish these planes easily, we used Roman numerals I and II to number the planes extracted by the proposed and VCIR methods, respectively. Figure 9b shows that the GIR method cannot be applied to this scenario at all; it fails to extract the correct building facades, and the planes it extracts are all horizontal. Both the VCIR method and the proposed method can extract most facades from the experimental data. Therefore, we only compared the proposed method and VCIR method results.
To quantitatively evaluate the effects of these two methods, the mean absolute error (MAE), mean square error (MSE), and root mean square error (RMSE) were used to assess the facade extraction errors. For the single facade errors (SFE), the three accuracy indices were defined as:
MAE_i = (1/n) Σ_{j=1}^{n} |A_i x_{ij} + B_i y_{ij} + C_i z_{ij} + D_i| / √(A_i² + B_i² + C_i²)
MSE_i = (1/n) Σ_{j=1}^{n} (A_i x_{ij} + B_i y_{ij} + C_i z_{ij} + D_i)² / (A_i² + B_i² + C_i²)
RMSE_i = √(MSE_i)
where A, B, C, and D are the plane general equation coefficients of the building facade, and n is the number of points in the corresponding facade. Additionally, to evaluate the overall facade extraction errors (OFEE), the means of each facade error (MEFE) and overall facade error (OFE) were defined as:
MEFE = (1/m) Σ_{i=1}^{m} SFE_i
OFE = Σ_{i=1}^{m} (SFE_i · m_i) / Σ_{i=1}^{m} m_i
where m is the number of facades and m_i is the number of points in the i-th facade. The SFEs, MEFEs, and OFEs of the two methods were calculated, and the results are shown in Table 1. Table 1 shows that the overall MAE and MSE of the proposed facade extraction method are 0.314 m and 0.194 m, respectively, which are only about half of the corresponding overall errors of 0.500 m and 0.403 m for the VCIR method. Among the planes extracted by the proposed method, the minimum MSE is only 0.085 m, and the average is only 0.271 m, while the corresponding values for the VCIR method are 0.208 m and 0.439 m.
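For reference, the sketch below computes these metrics as reconstructed above, assuming that m_i denotes the number of points in the i-th facade; it is a worked example, not the authors' evaluation code.

```python
import numpy as np

def facade_errors(planes, clouds):
    """Per-facade MAE/MSE/RMSE plus the overall MEFE (unweighted mean) and OFE
    (point-count weighted mean) for plane coefficients (A, B, C, D) and the
    corresponding facade point clouds."""
    per_facade, sizes = [], []
    for (A, B, C, D), pts in zip(planes, clouds):
        signed = (pts @ np.array([A, B, C]) + D) / np.sqrt(A * A + B * B + C * C)
        mae, mse = np.mean(np.abs(signed)), np.mean(signed ** 2)
        per_facade.append((mae, mse, np.sqrt(mse)))
        sizes.append(len(pts))
    per_facade, sizes = np.array(per_facade), np.array(sizes)
    mefe = per_facade.mean(axis=0)
    ofe = (per_facade * sizes[:, None]).sum(axis=0) / sizes.sum()
    return per_facade, mefe, ofe
```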
To evaluate the effectiveness of the two methods in more detail, violin plots and bar plots were used to show the distribution of the MEFE for the two methods (Figure 10). As shown in Figure 10a, the median and quartiles of the three errors of the proposed method are smaller than those of the VCIR method. The probability density distribution of the proposed method is also spindle-shaped, narrow at the top and wide at the bottom, and the error is low overall. In contrast, the probability density distribution of the VCIR method is gourd-shaped, and the error distribution is relatively uniform. The standard deviation and median value of the MAE extracted by the proposed method are 0.17 and 0.39, respectively, which are markedly lower than the corresponding 0.21 and 0.53 extracted by the VCIR method. As shown in Figure 10b, the average values of the three errors of the proposed method are smaller than those of the VCIR method, and the error distribution is more concentrated and stable. Thus, the proposed method outperforms the VCIR method in terms of facade extraction accuracy for the IQmulus & TerraMobilita Contest dataset.
Next, building facade map extraction was performed based on the building facades obtained by the proposed method. First, the feature images were generated as described in Section 2.2, and the edge length for feature image generation was set to 0.02 m. The windows and doors in the generated 3-band feature images were labeled using the ArcGIS Pro platform, and a total of 948 windows and 53 doors were obtained. Then, the feature images were sliced using these labeled windows and doors with a 256 × 256 slicing size and 50% overlap. Because windows and doors have typical horizontal and vertical characteristics, rotational transformations were not applied to the sliced results. A total of 2835 sets of samples were generated after slicing, and these samples were separated into a training set and a validation set at a 7:3 ratio. Due to the small number of samples on each facade, a testing set was not generated; instead, the entire sample of each facade was used for the final model accuracy evaluation. Next, the Faster R-CNN model was established and trained. The ResNet50 pretrained model was used as the backbone, the training batch size was 4, and the number of training epochs was 200. To determine the learning rate, the cyclical learning rates (CLR) method proposed by Smith [38] was used, which finds a well-suited learning rate without adding extra computation; the optimal learning rate was 1.096 × 10−4. Finally, the door and window boundaries were extracted from all building facade feature images using the trained model. The final building facade maps were obtained by combining the final door and window boundaries with the building boundaries obtained by the digital image processing method.
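A hedged sketch of such a training setup with torchvision and PyTorch's CyclicLR scheduler is shown below; the optimizer, momentum, weight decay, cycle length, and collate function are illustrative stand-ins for the authors' unspecified training pipeline.

```python
import torch
from torch.utils.data import DataLoader

def train(model, dataset, epochs=200, batch_size=4, base_lr=1e-5, max_lr=1.096e-4):
    """Illustrative training loop: SGD with a cyclical learning rate around the
    value reported in the text; the dataset is assumed to yield (image, target) pairs
    in torchvision detection format."""
    loader = DataLoader(dataset, batch_size=batch_size, shuffle=True,
                        collate_fn=lambda b: tuple(zip(*b)))
    params = [p for p in model.parameters() if p.requires_grad]
    opt = torch.optim.SGD(params, lr=base_lr, momentum=0.9, weight_decay=5e-4)
    sched = torch.optim.lr_scheduler.CyclicLR(opt, base_lr=base_lr, max_lr=max_lr,
                                              step_size_up=len(loader) * 2)
    model.train()
    for _ in range(epochs):
        for images, targets in loader:
            losses = model(list(images), list(targets))   # detection models return a loss dict
            loss = sum(losses.values())
            opt.zero_grad()
            loss.backward()
            opt.step()
            sched.step()
    return model
```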
Considering the three typical facades I3, I8, and I9 as examples, the facade map extraction results are shown in Figure 11. As shown in the figure, the vast majority of doors and windows were detected successfully, and the extracted door and window boundaries are primarily distributed horizontally and vertically, agreeing with the real door and window boundaries. The accuracy of building boundary extraction is marginally lower, while the overall trend matches the real trend, which means that the extraction effect of the proposed method meets the expectation.
Additionally, the results of window extraction are evaluated quantitatively. Precision, recall, F1 score, accuracy, confusion matrix, average precision (AP), intersection over union (IoU), and other indices are often used to evaluate results in the field of object detection. Because the window extraction in this paper is single-category object detection, the confusion matrix and other indicators for multicategory object detection are not applicable; therefore, the remaining four indicators are used for each facade. The extraction accuracy for each facade is shown in Table 2. The overall accuracy, recall, and F1 score of window rough extraction with min IoU set at 50% reached 0.982, 0.977, and 0.979, respectively, which means that the vast majority of windows can be roughly extracted correctly. For window precise extraction with min IoU set at 85%, the overall accuracy, recall, and F1 score reached 0.887, 0.882, and 0.884, respectively. The minimum F1 score and AP were 0.774 and 0.621, respectively, and the corresponding averages were 0.990 and 0.827, respectively, which means that most windows can obtain accurate edges.
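One common way to compute precision and recall at a minimum IoU is sketched below; the greedy one-to-one matching is an assumption, as the authors' exact evaluation protocol is not specified.

```python
import numpy as np

def box_iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def precision_recall(pred_boxes, gt_boxes, min_iou=0.5):
    """Greedy one-to-one matching of predictions to ground truth at a minimum IoU."""
    matched, tp = set(), 0
    for p in pred_boxes:
        ious = [(box_iou(p, g), k) for k, g in enumerate(gt_boxes) if k not in matched]
        if ious:
            best_iou, best_k = max(ious)
            if best_iou >= min_iou:
                matched.add(best_k)
                tp += 1
    precision = tp / len(pred_boxes) if pred_boxes else 0.0
    recall = tp / len(gt_boxes) if gt_boxes else 0.0
    return precision, recall
```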
To describe the extraction accuracy in more detail, box plots were drawn using each facade map extraction accuracy index (Figure 12). Figure 12a shows that, at a minimum IoU of 50%, the medians of the four accuracy indices all exceed 0.97, the lower quartiles all exceed 0.96, and the precision averages are close to 1.00. Figure 12b shows that, at a minimum IoU of 85%, the median of accuracy, recall, and the F1 score are all over 0.90, and the lower quartiles are all over 0.85. Thus, the proposed method can obtain a good result with the MLS dataset, producing a rough extraction of nearly all parts of the windows and accurate extraction of most of the windows.

3.2. Results of the Semantic3D.Net Benchmark Dataset

The Semantic3D.Net Benchmark dataset [37] is a 3D TLS point cloud dataset that was scanned statically with modern equipment and contains fine details. This dataset contains over four billion points and covers a range of diverse urban scenes, such as churches, streets, railroad tracks, squares, villages, soccer fields, castles, etc. In this paper, the "domfountain" and "marketsquarefeldkirch" subsets of this dataset, collected primarily at a cathedral and a market square, were used to evaluate the proposed method on TLS point cloud data. All attributes other than the coordinates were removed to reduce the amount of data. Then, these data were preprocessed as described in Section 2.1.1. The point cloud was voxel-downsampled with a 5 cm voxel side length, and the processed data of the two scenes contained 76 million and 38 million points, respectively. The processed point cloud data are shown in Figure 13.
Then, building facades were extracted from the processed point cloud data by the method in Section 2.1, and the discretization step s_ρ of the improved 3D HT was set to 0.4 m. The extraction results of the three methods are shown in Figure 14. For the "domfountain" scene, the proposed method and the VCIR method extracted nine and four planes, respectively. For the "marketsquarefeldkirch" scene, nine and four planes were extracted by the proposed method and the VCIR method, respectively. To distinguish these planes easily, we use the capital letters A, B, C, and D to number the planes extracted by the proposed and VCIR methods for the "domfountain" and "marketsquarefeldkirch" scenes, respectively. Figure 14 shows that the GIR method cannot be applied to this scenario at all, while the VCIR method and the proposed method can extract most facades from the experimental data. The results are thus similar to those for the IQmulus & TerraMobilita Contest dataset. Therefore, we only compare the proposed method and the VCIR method in the following.
Additionally, the MAE, MSE, and RMSE were used to evaluate the facade extraction errors quantitatively, and the results are shown in Table 3. In the "domfountain" scene, the overall MAE and MSE of the proposed facade extraction method are 0.335 m and 0.222 m, respectively; for the "marketplacefeldkirch" scene, they are 0.296 m and 0.198 m, respectively. The total errors of the two scenes are only half of the corresponding overall errors extracted by the VCIR method. To evaluate the effectiveness of the two methods on the TLS point cloud data in more detail, violin plots and bar plots were used to show the distribution of the MEFE for the two methods (Figure 15). As shown in Figure 15a,c, the median and quartiles of the three errors of the proposed method are smaller than those of the VCIR method, and the errors of the proposed method are typically small, while the errors of the VCIR method are evenly distributed. The standard deviation and median of the MAE for the "marketplacefeldkirch" scene extracted by the proposed method are 0.16 and 0.37, respectively, which are markedly lower than the corresponding values of 0.25 and 0.64 extracted by the VCIR method. As shown in Figure 15b,d, the average values of the three errors of the proposed method are smaller than those of the VCIR method, and the error distribution is more concentrated and stable. Thus, the proposed method outperforms the VCIR method in terms of facade extraction accuracy for the Semantic3D.Net Benchmark dataset.
Next, feature image generation and door and window sample labeling were performed in the same way as with the IQmulus & TerraMobilita Contest dataset. A total of 482 windows and 98 doors were obtained, and a total of 898 sets of samples were generated for the two scenes. Additionally, these samples are separated into a training set and a validation set at a 7:3 ratio. The testing set was not generated, while the entire sample of each facade was used for the final model accuracy evaluation. The Faster R-CNN model was created and trained in the same way as for the IQmulus & TerraMobilita Contest dataset. The door and window boundaries were then extracted from all building facade feature images using the trained model. The final building facade maps were obtained by combining the final door and window boundaries with the building boundaries obtained by the digital image processing method. The facade map extraction results are shown in Figure 16. Figure 16a,e show the facade maps of facades A1 and C2, respectively. The vast majority of windows have been detected successfully, and the extracted window boundaries are primarily distributed horizontally and vertically, which are in good agreement with the real boundaries.
Additionally, precision, recall, the F1 score, AP, and IoU were used to quantitatively evaluate the results of window extraction. The window extraction’s accuracy for each facade is shown in Table 4. The overall accuracy, recall, and the F1 score of window rough extraction with min IoU set to 50% for the “domfountain” scene reach 0.936, 0.970, and 0.953, respectively. The corresponding accuracy indices of window rough extraction for the “marketplacefeldkirch” scene reach 0.981, 0.984, and 0.983, respectively. These results indicate that the vast majority of windows can be roughly extracted correctly. For window precise extraction with min IoU set to 85%, the overall accuracy, recall, and F1 score for the “domfountain” scene reach 0.884, 0.916, and 0.900, respectively. The corresponding accuracy indices of window precise extraction for the “marketplacefeldkirch” scene reach 0.962, 0.965, and 0.964, respectively. These results indicate that most windows are given accurate edges.
To describe the extraction accuracy, violin plots were drawn using each facade map extraction accuracy index (Figure 17). Considering Table 4 and Figure 17a,b, the distribution of the window extraction accuracy for the "domfountain" scene is not uniform. The accuracy of the vast majority of facades is high, while the accuracy of facades A3, A4, and A8 is near zero. This occurs primarily because the window and door shapes on these facades differ from those on the other facades. For the window extraction accuracy for the "marketplacefeldkirch" scene, at a minimum IoU of 50%, the lower quartiles all exceed 0.95, and the medians of the four accuracy indices all reach 1.0 (Figure 17c). At a minimum IoU of 85%, the lower quartiles all exceed 0.92, and the medians of the four accuracy indices all exceed 0.94 (Figure 17d). The facades with relatively low accuracy are C3 and C5, which is primarily due to the misalignment of the point cloud data (Figure 16f,g). Thus, the proposed method can obtain good results with the TLS point cloud data, producing rough extractions of nearly all parts of the windows and accurate extractions of most windows.

4. Discussion

Building facade map extraction is an important research topic in point cloud information extraction, and many studies have proposed different extraction methods from different perspectives. For example, slicing methods can detect windows and doors of any shape [8,11]. Maas and Vosselman proposed two algorithms to extract building models based on triangular meshes and plane intersections [14]. All of these methods are elegant and can achieve good results on good datasets. However, due to the complexity of the real world and various problems in data acquisition and processing, the collected point cloud data inevitably contain many problems, such as occlusion and misalignment. These problems make it difficult to achieve good results with traditional methods. Deep-learning-based methods have powerful information extraction capabilities, and many deep-learning-based point cloud segmentation and information extraction models have been proposed [19,20,21,39]. However, there are still many challenges to obtain building facade maps directly from the 3D PCD using deep-learning-based methods. First, the 3D PCD is unstructured and large in volume, which increases the difficulty and complexity of information extraction. In addition, deep-learning-based methods of point cloud processing are highly data-dependent and difficult to adapt to data collected by different approaches for different cities, different scenes, and different densities and occlusions.
Considering these data problems, this paper proposes a novel building facade map extraction method to improve the quality of facade map extraction from point cloud data of poor quality. Specifically, the proposed method combines traditional model-driven methods (the 3D HT and RANSAC) with a data-driven deep-learning-based method. The model-driven methods are used to extract the facades from the 3D PCD and generate the feature images; thus, the unstructured 3D PCD is transformed into structured 2D feature images. These processes decrease the data dimension and complexity and reduce the variability between different datasets. Then, a deep-learning-based object detection method is used to obtain the window and door boundaries from the 2D feature images. This method can learn different cases of data problems such as occlusion and misalignment, improving robustness. In addition, for facade extraction, many traditional methods, including the VCIR method described in this paper, treat facades as mathematical planes that extend infinitely in space; thus, similar adjacent facades are inevitably identified as the same facade. Figure 9c,d and Table 5 show that one plane extracted by the VCIR method typically corresponds to multiple facades extracted by the proposed method. Thus, to obtain bounded facades, the proposed method performs facade boundary calibration based on the HDBSCAN algorithm after the initial facade extraction. Because the point cloud density inside each building facade is typically high, while the density in the gaps between facades is low, the HDBSCAN method, which is based on density clustering, can effectively distinguish adjacent facades. Thus, this method improves the accuracy of facade extraction and can remove the point clouds of nonfacades. Figure 18 shows the advantages of the proposed method in facade discrimination through partial enlargements. Different colors represent different facades, and gray point clouds represent nonfacade point clouds. Considering Table 5 and Figure 18c,d, the proposed method can correctly distinguish the three facades I10, I11, and I12, while the VCIR method incorrectly considers these three facades to be the same facade (facade II4). Figure 18a,b show that the maximum distances from the point clouds on facades I10 and I11 extracted by the proposed method to the corresponding facade are 0.93 m and 0.37 m, respectively. The error of the proposed facade extraction method is nearly half that of the VCIR method. In addition, the VCIR method regards the point clouds of other objects adjacent to the facade as facade point clouds; for example, tree point clouds are calibrated as facade II1's point clouds (i.e., the red dotted box in the lower-left corner of Figure 18c), while the proposed method can correctly distinguish them (Figure 18b).
As mentioned above, occlusion and misalignment are common quality problems in point cloud data that can strongly impact facade map extraction. To reduce their impact on the facade map extraction results, the Faster R-CNN model is used to implement window and door boundary extraction. This model has a good ability to detect object boundaries from feature images and was trained based on both normal and quality problem samples, enabling the final model to extract window and door boundaries with good accuracy from point cloud data containing some occlusion and misalignment problems. Compared with traditional facade map extraction methods such as slicing-based methods, the proposed method can manage data problems more effectively. Figure 19 shows the results of the point cloud data with misalignment. Although the misalignment problem is serious in some areas of the three facades, the proposed method can still obtain window boundaries with high quality.
In addition, to evaluate the robustness of the proposed method, we experimented with two different datasets: the IQmulus & TerraMobilita Contest dataset and the Semantic3D.Net Benchmark dataset. These datasets were collected with the MLS and TLS approaches, respectively, which differ in acquisition principle and produce point clouds with different characteristics, such as density and distribution. The two datasets also cover different cities and scenes, and even the building styles differ remarkably between them. The experimental results show that the proposed method achieves good results on both datasets, highlighting its data adaptability and robustness. The accuracy on the MLS dataset is better than that on the TLS dataset in both facade extraction and facade map extraction, which can be attributed to the acquisition principles of the two approaches. The MLS platform moves and scans continuously, acquiring data from more viewpoints than the TLS platform, which collects data at only a few stations; consequently, the MLS dataset suffers fewer occlusion problems. Collecting data at only a few stations also leads to large density variations within the TLS point cloud, whereas the MLS point cloud shows less variation in overall density because of its mobile scanning. Therefore, compared with TLS point cloud data, MLS point cloud data is more suitable for building facade map extraction.
Despite these successes, the proposed method still has some limitations. First, although the shift vote strategy improves the efficiency and memory consumption of the 3D HT method, the approach remains constrained by the shortcomings of the GHT method itself; more efficient HT algorithms, such as the Kernel-based Hough Transform [15], could replace the GHT method in the future. Second, although several techniques are combined in this study to improve the accuracy of facade map extraction, irregularly shaped windows and doors remain difficult to handle, because the Faster R-CNN model is primarily suited to rectangular objects. In particular, the TLS dataset contains more arches and irregular windows, which reduces its window and door boundary extraction accuracy. In the future, other object detection models, such as Mask R-CNN, could be used to improve the extraction accuracy of irregular window and door boundaries.
Due to the complexity of the real world and the various problems arising in the acquisition and processing of point cloud data, building facade map extraction from point cloud data still faces many challenges. Nevertheless, automatically extracting building facade maps from point cloud data retains considerable advantages in efficiency and cost over traditional manual measurements. The facade maps extracted by the proposed method can be used in production with only a small amount of manual correction and can support urban modeling and planning.

5. Conclusions

An automatic and robust method for accurately extracting building facade maps can be applied to old-city reconstruction and urban planning and may support the reconstruction of large-scale 3D building models. This paper proposes a new method to extract building facade maps automatically and robustly from 3D point cloud data. The proposed method consists of two steps: building facade extraction and building facade map extraction. In building facade extraction, we first improve the 3D HT algorithm, alleviating the peak fuzziness and the dependence on step size selection of the traditional 3D HT algorithm through shift voting and 3D convolution of the accumulator. We then combine additional algorithms, such as RANSAC and HDBSCAN, to differentiate adjacent facades and accurately extract facade point clouds. In building facade map extraction, we combine the Faster R-CNN deep learning object detection model with digital image processing techniques to extract facade maps robustly and precisely. Given 3D point cloud data as input, the proposed method automatically generates a facade map for each facade.
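As a concrete illustration of the accumulator filtering mentioned above (see also Figure 4), the following sketch builds an inverse-distance-weighted kernel whose center weight is 1/2 and whose weights sum to 1, and convolves it with a 3D Hough accumulator. The 5 × 5 × 5 kernel size, the use of SciPy, and the synthetic accumulator are assumptions for illustration, not the authors' implementation.
```python
# Minimal sketch of filtering the 3D Hough accumulator with an inverse-distance
# weighted kernel in the spirit of Figure 4: the centre weight is 1/2, the other
# weights are proportional to 1/distance from the centre, and the kernel sums to 1.
import numpy as np
from scipy.ndimage import convolve

def idw_kernel(size=5):
    c = size // 2
    zi, yi, xi = np.indices((size, size, size))
    dist = np.sqrt((xi - c) ** 2 + (yi - c) ** 2 + (zi - c) ** 2)
    weights = np.zeros_like(dist, dtype=float)
    mask = dist > 0
    weights[mask] = 1.0 / dist[mask]
    weights *= 0.5 / weights.sum()   # non-centre weights share the remaining 1/2
    weights[c, c, c] = 0.5           # centre weight fixed to 1/2; total sums to 1
    return weights

# accumulator: vote counts indexed by (theta, phi, rho) bins (synthetic here)
accumulator = np.random.poisson(2.0, size=(90, 180, 100)).astype(float)
filtered = convolve(accumulator, idw_kernel(), mode="nearest")
# peaks in `filtered` are then taken as candidate facade planes
```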
The proposed method was evaluated on the IQmulus & TerraMobilita Contest dataset and the Semantic3D.Net Benchmark dataset, which were acquired with the MLS and TLS approaches, respectively. For the IQmulus & TerraMobilita Contest dataset, the total MAE and MSE of the extracted building facades are less than 0.32 m and 0.2 m, respectively, only approximately half of the corresponding errors of the VCIR method (0.55 m and 0.44 m, respectively). The average MAE and MSE for a single facade are less than 0.41 m and 0.28 m, respectively, whereas the corresponding values for the VCIR method are 0.56 m and 0.44 m. For the Semantic3D.Net Benchmark dataset, the total MAE and MSE of the extracted building facades are less than 0.34 m and 0.23 m, again approximately half of the corresponding errors of the VCIR method (0.5 m and 0.42 m, respectively). The average MAE and MSE for a single facade are less than 0.41 m and 0.28 m, respectively, whereas the corresponding values for the VCIR method are 0.69 m and 0.64 m. These results indicate that facade extraction accuracy is markedly higher with the proposed method. For building facade map extraction on the IQmulus & TerraMobilita Contest dataset, the minimum and average AP50 of window boundary extraction reach 0.938 and 0.976, and the minimum and average AP85 reach 0.621 and 0.827. On the Semantic3D.Net Benchmark dataset, the average AP50 and AP85 of window boundary extraction exceed 0.86 and 0.77, respectively, meaning that windows are extracted correctly and an accurate facade map is obtained for most facades.
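For clarity, the sketch below shows how the reported error measures can be computed: MAE, MSE, and RMSE from point-to-plane distances for facade extraction, and the box IoU test (threshold 0.5 or 0.85) that underlies AP50 and AP85 for window extraction. The AP matching and averaging protocol itself is not reproduced; the function names and the synthetic inputs are illustrative assumptions.
```python
# Minimal sketch of the quantities behind the accuracy figures quoted above.
import numpy as np

def facade_errors(points, plane):
    """points: (N, 3); plane: (a, b, c, d) with ax + by + cz + d = 0."""
    a, b, c, d = plane
    dist = np.abs(points @ np.array([a, b, c]) + d) / np.linalg.norm([a, b, c])
    return dist.mean(), (dist ** 2).mean(), np.sqrt((dist ** 2).mean())  # MAE, MSE, RMSE

def box_iou(box_a, box_b):
    """Boxes as (xmin, ymin, xmax, ymax) in feature-image pixels."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area = lambda b: (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area(box_a) + area(box_b) - inter)

pts = np.random.default_rng(1).normal(size=(1000, 3)) * [5.0, 0.1, 3.0]
print(facade_errors(pts, (0.0, 1.0, 0.0, 0.0)))        # errors w.r.t. the plane y = 0
print(box_iou((0, 0, 10, 10), (2, 2, 10, 10)) >= 0.5)  # counts as a detection at IoU 0.5
```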
In this study, we present a new, robust method that automatically extracts accurate building facade point clouds and vectorized facade maps from point cloud data with uneven density, noise, and occlusion, without requiring auxiliary information such as point cloud intensity or color. The method’s robustness was validated on two datasets collected with different approaches, from different cities, and with different building styles, making it a useful contribution to point cloud information extraction and 3D building reconstruction. Although the shift vote strategy improves the efficiency of facade extraction, the method is still limited by the inefficiency of the GHT method itself. In the future, the efficiency of potential plane acquisition could be improved with higher-performance HT methods, such as the Kernel-based Hough Transform. Moreover, deep-learning-based image segmentation will be considered for identifying building edges to improve the accuracy of building edge extraction.

Author Contributions

Funding acquisition, B.Y. and X.D.; Methodology, J.H. and B.Y.; Supervision, X.D.; Validation, D.X., T.W., Y.H. and B.W.; Visualization, J.H. and B.Z.; Writing—original draft, J.H.; Writing—review & editing, B.Y., X.D. and K.D. All authors have read and agreed to the published version of the manuscript.

Funding

This research was jointly supported by the National Natural Science Foundation of China (grant numbers 41941019 and 42072306), the Youth Science Fund of the National Natural Science Foundation of China (grant number 41801399), the China Postdoctoral Science Foundation (grant number 2019M653476), and the Young Teachers “Passing Academic Barriers” Funding Program of Southwest Petroleum University (grant number 201599010140).

Acknowledgments

The authors thank the IQmulus & TerraMobilita Contest for freely providing the point cloud data. Constructive comments from the anonymous reviewers are also greatly appreciated.

Conflicts of Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

References

  1. Wang, Y.; Ma, Y.; Zhu, A.; Zhao, H.; Liao, L. Accurate Facade Feature Extraction Method for Buildings from Three-Dimensional Point Cloud Data Considering Structural Information. ISPRS J. Photogramm. Remote Sens. 2018, 139, 146–153. [Google Scholar] [CrossRef]
  2. Liang, X.; Fu, Z.; Sun, C.; Hu, Y. MHIBS-Net: Multiscale Hierarchical Network for Indoor Building Structure Point Clouds Semantic Segmentation. Int. J. Appl. Earth Obs. Geoinf. 2021, 102, 102449. [Google Scholar] [CrossRef]
  3. Wang, Q.; Kim, M.-K. Applications of 3D Point Cloud Data in the Construction Industry: A Fifteen-Year Review from 2004 to 2018. Adv. Eng. Inform. 2019, 39, 306–319. [Google Scholar] [CrossRef]
  4. Malihi, S.; Valadan Zoej, M.J.; Hahn, M. Large-Scale Accurate Reconstruction of Buildings Employing Point Clouds Generated from UAV Imagery. Remote Sens. 2018, 10, 1148. [Google Scholar] [CrossRef] [Green Version]
  5. Teboul, O.; Simon, L.; Koutsourakis, P.; Paragios, N. Segmentation of building facades using procedural shape priors. In Proceedings of the 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, San Francisco, CA, USA, 13–18 June 2010; pp. 3105–3112. [Google Scholar]
  6. Xie, L.; Zhu, Q.; Hu, H.; Wu, B.; Li, Y.; Zhang, Y.; Zhong, R. Hierarchical Regularization of Building Boundaries in Noisy Aerial Laser Scanning and Photogrammetric Point Clouds. Remote Sens. 2018, 10, 1996. [Google Scholar] [CrossRef] [Green Version]
  7. Xie, L.; Hu, H.; Zhu, Q.; Li, X.; Tang, S.; Li, Y.; Guo, R.; Zhang, Y.; Wang, W. Combined Rule-Based and Hypothesis-Based Method for Building Model Reconstruction from Photogrammetric Point Clouds. Remote Sens. 2021, 13, 1107. [Google Scholar] [CrossRef]
  8. Zhou, M.; Ma, L.; Li, Y.; Li, J. Extraction of building windows from mobile laser scanning point clouds. In Proceedings of the IGARSS 2018—2018 IEEE International Geoscience and Remote Sensing Symposium, Valencia, Spain, 22–27 July 2018; pp. 4304–4307. [Google Scholar]
  9. Hao, W.; Wang, Y.; Liang, W. Slice-Based Building Facade Reconstruction from 3D Point Clouds. Int. J. Remote Sens. 2018, 39, 6587–6606. [Google Scholar] [CrossRef]
  10. Li, J.; Xiong, B.; Biljecki, F.; Schrotter, G. A Sliding Window Method for Detecting Corners of Openings from Terrestrial LiDAr Data. Int. Arch. Photogramm. Remote Sens. Spat. Inf. Sci. 2018, 42, 97–103. [Google Scholar] [CrossRef] [Green Version]
  11. Zolanvari, S.I.; Laefer, D.F. Slicing Method for Curved Facade and Window Extraction from Point Clouds. ISPRS J. Photogramm. Remote Sens. 2016, 119, 334–346. [Google Scholar] [CrossRef]
  12. Biosca, J.M.; Lerma, J.L. Unsupervised Robust Planar Segmentation of Terrestrial Laser Scanner Point Clouds Based on Fuzzy Clustering Methods. ISPRS J. Photogramm. Remote Sens. 2008, 63, 84–98. [Google Scholar] [CrossRef]
  13. Dong, Z.; Yang, B.; Hu, P.; Scherer, S. An Efficient Global Energy Optimization Approach for Robust 3D Plane Segmentation of Point Clouds. ISPRS J. Photogramm. Remote Sens. 2018, 137, 112–133. [Google Scholar] [CrossRef]
  14. Maas, H.-G.; Vosselman, G. Two Algorithms for Extracting Building Models from Raw Laser Altimetry Data. ISPRS J. Photogramm. Remote Sens. 1999, 54, 153–163. [Google Scholar] [CrossRef]
  15. Limberger, F.A.; Oliveira, M.M. Real-Time Detection of Planar Regions in Unorganized Point Clouds. Pattern Recognit. 2015, 48, 2043–2053. [Google Scholar] [CrossRef] [Green Version]
  16. Xu, Y.; Ye, Z.; Huang, R.; Hoegner, L.; Stilla, U. Robust Segmentation and Localization of Structural Planes from Photogrammetric Point Clouds in Construction Sites. Autom. Constr. 2020, 117, 103206. [Google Scholar] [CrossRef]
  17. Adam, A.; Chatzilari, E.; Nikolopoulos, S.; Kompatsiaris, I. H-RANSAC: A Hybrid Point Cloud Segmentation Combining 2D and 3D Data. ISPRS Ann. Photogramm. Remote Sens. Spat. Inf. Sci. 2018, 4, 1–8. [Google Scholar] [CrossRef] [Green Version]
  18. Ebrahimi, A.; Czarnuch, S. Automatic Super-Surface Removal in Complex 3D Indoor Environments Using Iterative Region-Based RANSAC. Sensors 2021, 21, 3724. [Google Scholar] [CrossRef]
  19. Qi, C.R.; Su, H.; Mo, K.; Guibas, L.J. PointNet: Deep learning on point sets for 3D classification and segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2017), Honolulu, HI, USA, 21–26 July 2017; pp. 77–85. [Google Scholar]
  20. Lin, H.; Wu, S.; Chen, Y.; Li, W.; Luo, Z.; Guo, Y.; Wang, C.; Li, J. Semantic Segmentation of 3D Indoor LiDAR Point Clouds through Feature Pyramid Architecture Search. ISPRS J. Photogramm. Remote Sens. 2021, 177, 279–290. [Google Scholar] [CrossRef]
  21. Chen, Y.; Wu, R.; Yang, C.; Lin, Y. Urban Vegetation Segmentation Using Terrestrial LiDAR Point Clouds Based on Point Non-Local Means Network. Int. J. Appl. Earth Obs. Geoinf. 2021, 105, 102580. [Google Scholar] [CrossRef]
  22. Haghighatgou, N.; Daniel, S.; Badard, T. A Method for Automatic Identification of Openings in Buildings Facades Based on Mobile LiDAR Point Clouds for Assessing Impacts of Floodings. Int. J. Appl. Earth Obs. Geoinf. 2022, 108, 102757. [Google Scholar] [CrossRef]
  23. Alshawabkeh, Y. Linear Feature Extraction from Point Cloud Using Color Information. Herit. Sci. 2020, 8, 28. [Google Scholar] [CrossRef]
  24. Díaz-Vilariño, L.; Khoshelham, K.; Martínez-Sánchez, J.; Arias, P. 3D Modeling of Building Indoor Spaces and Closed Doors from Imagery and Point Clouds. Sensors 2015, 15, 3491–3512. [Google Scholar] [CrossRef] [Green Version]
  25. Duda, R.O.; Hart, P.E. Use of the Hough Transformation to Detect Lines and Curves in Pictures. Commun. ACM 1972, 15, 11–15. [Google Scholar] [CrossRef]
  26. Borrmann, D.; Elseberg, J.; Lingemann, K.; Nüchter, A. The 3D Hough Transform for Plane Detection in Point Clouds: A Review and a New Accumulator Design. 3D Res. 2011, 2, 3. [Google Scholar] [CrossRef]
  27. Li, N.; Ma, Y.; Yang, Y.; Gao, S. An Improved Method of Lee Refined Polarized Filter. Sci. Surv. Mapp. 2011, 36, 144–145+138. [Google Scholar]
  28. Campello, R.J.G.B.; Moulavi, D.; Sander, J. Density-based clustering based on hierarchical density estimates. In Advances in Knowledge Discovery and Data Mining; Springer: Berlin/Heidelberg, Germany, 2013; pp. 160–172. [Google Scholar]
  29. Girshick, R.; Donahue, J.; Darrell, T.; Malik, J. Rich feature hierarchies for accurate object detection and semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 23–28 June 2014; pp. 580–587. [Google Scholar]
  30. He, K.; Zhang, X.; Ren, S.; Sun, J. Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition. IEEE Trans. Pattern Anal. Mach. Intell. 2015, 37, 1904–1916. [Google Scholar] [CrossRef] [Green Version]
  31. Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.; Fu, C.-Y.; Berg, A.C. SSD: Single shot MultiBox detector. In Computer Vision—ECCV 2016; Lecture Notes in Computer Science; Springer: Cham, Switzerland, 2016; Volume 9905, pp. 21–37. [Google Scholar] [CrossRef] [Green Version]
  32. Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You only look once: Unified, real-time object detection. arXiv 2016, arXiv:1506.02640. [Google Scholar]
  33. Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards real-time object detection with region proposal networks. arXiv 2016, arXiv:1506.01497. [Google Scholar] [CrossRef] [Green Version]
  34. Ma, L.; Liu, Y.; Zhang, X.; Ye, Y.; Yin, G.; Johnson, B.A. Deep Learning in Remote Sensing Applications: A Meta-Analysis and Review. ISPRS J. Photogramm. Remote Sens. 2019, 152, 166–177. [Google Scholar] [CrossRef]
  35. Esri Automation of Map Generalization: The Cutting-Edge Technology. 1996. White Paper. Redlands, ESRI Inc. Available online: http://downloads.esri.com/support/whitepapers/ao_/mapgen.pdf (accessed on 12 June 2022).
  36. Vallet, B.; Brédif, M.; Serna, A.; Marcotegui, B.; Paparoditis, N. TerraMobilita/IQmulus Urban Point Cloud Analysis Benchmark. Comput. Graph. 2015, 49, 126–133. [Google Scholar] [CrossRef] [Green Version]
  37. Hackel, T.; Savinov, N.; Ladicky, L.; Wegner, J.D.; Schindler, K.; Pollefeys, M. Semantic3d. Net: A New Large-Scale Point Cloud Classification Benchmark. ISPRS Ann. Photogramm. Remote Sens. Spat. Inf. Sci. 2017, 41, 91–98. [Google Scholar] [CrossRef] [Green Version]
  38. Smith, L.N. Cyclical learning rates for training neural networks. In Proceedings of the 2017 IEEE Winter Conference on Applications of Computer Vision (WACV), Santa Rosa, CA, USA, 24–31 March 2017; pp. 464–472. [Google Scholar]
  39. Qi, C.R.; Yi, L.; Su, H.; Guibas, L.J. PointNet++: Deep hierarchical feature learning on point sets in a metric space. In Advances in Neural Information Processing Systems 30 (NeurIPS 2017); Guyon, I., Luxburg, U.V., Bengio, S., Wallach, H., Fergus, R., Vishwanathan, S., Garnett, R., Eds.; Neural Information Processing Systems (NIPS): La Jolla, CA, USA, 2017; Volume 30. [Google Scholar]
Figure 1. Processes of building facade extraction.
Figure 2. Transformation of three points from the Cartesian coordinate system ( x , y , z ) into Hough space ( θ , φ , ρ ) : (a) a surface corresponds to a point in the Cartesian coordinate system; and (b) three surfaces correspond to three points, and the black diamond intersection of the surfaces represents the plane spanning the three points.
Figure 3. Schematic of peak fuzziness: (a) Cartesian coordinate system points; 2D points with noise (blue) bounce around true points (orange), and the orange line indicates the true line (y = −x − 1); (b) transforming the true points to the Hough parameter space, one line indicates one point in the Cartesian coordinate system, and the red dot indicates the intersection of all lines; (c) Hough parameter space with noise; transforming the points with noise to the Hough parameter space, one line indicates one point in the Cartesian coordinate system, and the red box shows the peak fuzziness.
Figure 4. Schematic of the 3D high-pass filtering convolution kernel: the size of the convolution kernel is 5 × 5; its center pixel (green) is 1/2; others are determined according to the distance from the pixel to the center based on the inverse distance weighted method; the sum of the pixels of the entire convolution kernel is 1.
Figure 5. Different facades are identified as one: (a) two facades whose spatial locations are adjacent to each other are considered as one facade but are not coplanar; (b) similar facades of different buildings are incorrectly considered as one facade.
Figure 6. Processes of building facade map extraction.
Figure 7. Architecture of Faster R-CNN. The cyan, yellow, green, and purple parallelograms represent the convolutional layer, the pooling layer, the ReLU layer, and the fully connected layer, respectively. P × Q and M × N represent the height and width of the image. “cls_prob” represents the bounding box’s probability for each class.
Figure 8. Raw experimental data for the IQmulus & Terra Mobilita Contest dataset: (a) the entire IQmulus & Terra Mobilita Contest dataset; the red dashed box shows the extent of the experimental data area; (b,c) the experimental data in 3D view and 2D view, respectively; and (d) sample areas with misalignment in the point cloud data.
Figure 9. Facade extractions from the IQmulus & TerraMobilita Contest dataset: (ac) the results in 3D view extracted by the proposed method, GIR method, and VCIR method, respectively; and (df) the results in 2D view extracted by the proposed method, GIR method, and VCIR method, respectively. Different colors were used for different facades and gray for nonfacade point clouds. Roman numeral I was used to number the facades extracted by the proposed method and Roman numeral II was used for the VCIR method.
Figure 10. Violin plots and bar plot for the errors of facade extraction for the IQmulus & TerraMobilita Contest dataset: (a) violin plot of the distances between points and facade, where the shape of the violin displays the probability density distribution of the data; the black bar depicts the interquartile range, the 95% confidence interval is shown by the inner line branching from it, and the median is shown by a white dot; and (b) bar plot of the total facades error, where the vertical bars show the mean of the data and the error line shows the 95% confidence interval.
Figure 11. Facade map extraction with the IQmulus & TerraMobilita Contest dataset: the number in the upper left corner corresponds to the facade number in Figure 9; the red, blue, and green lines in the figure represent the window, door, and building boundaries, respectively; and the background image shows the single-band feature image, and the darker the image pixel color, the greater the number of points contained in the corresponding planar grid.
Figure 12. Box plot of the accuracy of window extraction for the IQmulus & TerraMobilita Contest dataset: (a) the accuracy with a min IoU of 0.5; and (b) the accuracy with a min IoU of 0.85. The different colored boxes represent different precision indicators. The upper and lower quartiles of the data are shown by the box’s upper and lower boundaries, respectively, and the median is shown by the inner horizontal line. The whiskers extending from the ends of the boxes are used to represent variables other than the upper and lower quartiles, and outliers are represented by black dots.
Figure 13. Raw experimental data for the Semantic3D.Net Benchmark dataset: (a,b) the processed point cloud data of the “domfountain” scene in 2D view and 3D view, respectively; and (c,d) the processed point cloud data of the “marketsquarefeldkirch” scene in 2D view and 3D view, respectively.
Figure 14. Facade extraction with the Semantic3D.Net Benchmark dataset: (ac) results of the “domfountain” scene in 2D view extracted by the proposed method, GIR method, and VCIR method, respectively; (df) results of the “domfountain” scene in 3D view extracted by the proposed method, GIR method, and VCIR method, respectively; (gi) results of the “marketsquarefeldkirch” scene in 2D view extracted by the proposed method, GIR method, and VCIR method, respectively; and (jl) results of the “marketsquarefeldkirch” scene in 3D view extracted by the proposed method, GIR method, and VCIR method, respectively. Different colors were used for different facades and gray was used for nonfacade point clouds. Capital English letters were used to label these facades.
Figure 15. Violin plots and bar plot for the errors of facade extraction for Semantic3D.Net Benchmark dataset: (a,c) violin plots for distances between points and facades for the “domfountain” and “marketplacefeldkirch” scenes, respectively, where the shape of the violin describes the data’s probability density distribution; the black bar describes the interquartile range; the 95% confidence interval is shown by the inner line branching from it; and the median is shown by a white dot; and (b,d) bar plots for total errors of the facades for the “domfountain” and “marketplacefeldkirch” scenes, respectively, where the vertical bars show the mean of the data, and the error line shows the 95% confidence interval.
Figure 16. Facade map extraction with the Semantic3D.Net Benchmark dataset: (ad) the facade map extraction results for the “domfountain” scene of facades A1, A3, A4, and A8, respectively; and (eg) the facade map extraction results for the “marketplacefeldkirch” scene of facades C2, C3, and C5, respectively. The red, blue, and green lines in the figure represent the window, door, and building boundaries, respectively; the background image shows the single-band feature image; and the darker the image pixel color is, the greater the number of points contained in the corresponding planar grid.
Figure 17. Box plot of the accuracy of window extraction with the Semantic3D.Net Benchmark dataset: (a,c) the accuracy with a minimum IoU of 0.5 of the “domfountain” and “marketplacefeldkirch” scenes, respectively; and (b,d) the accuracy with a minimum IoU of 0.85 of the “domfountain” and “marketplacefeldkirch” scenes, respectively. The different colored plots represent different precision indicators. The shape of the violin displays the data’s probability density distribution; the black bar depicts the interquartile range; the 95% confidence interval is shown by the inner line branching from it; and the median is shown as a white dot.
Figure 18. Details of planes extracted by the VCIR method (a,c) and the details of facades extracted by the proposed method (b,d): (a,b) the details of plane II4 extracted by the VCIR method corresponding to the facades I10 and I11 extracted by the proposed method; and (c,d) the details of plane II1 extracted by the VCIR method corresponding to the facade I4 extracted by the proposed method.
Figure 19. Details of facade map extraction: (a,c,e) results of the facade map extraction for facades I11, C3, and C5, respectively; and (b,d,f) results of zooming in on the misaligned area.
Table 1. Errors of facade extraction for the IQmulus & TerraMobilita Contest dataset.
Method          | Facade | MAE   | MSE   | RMSE
Proposed method | I0     | 0.427 | 0.253 | 0.503
                | I1     | 0.216 | 0.102 | 0.319
                | I2     | 0.186 | 0.085 | 0.291
                | I3     | 0.241 | 0.109 | 0.330
                | I4     | 0.321 | 0.172 | 0.415
                | I5     | 0.386 | 0.247 | 0.497
                | I6     | 0.597 | 0.426 | 0.653
                | I7     | 0.502 | 0.413 | 0.643
                | I8     | 0.244 | 0.097 | 0.312
                | I9     | 0.332 | 0.190 | 0.435
                | I10    | 0.761 | 0.696 | 0.835
                | I11    | 0.446 | 0.302 | 0.550
                | I12    | 0.585 | 0.425 | 0.652
                | MEFE   | 0.403 | 0.271 | 0.495
                | OFE    | 0.314 | 0.194 | 0.440
VCIR method     | II0    | 0.820 | 0.758 | 0.871
                | II1    | 0.357 | 0.208 | 0.456
                | II2    | 0.357 | 0.212 | 0.461
                | II3    | 0.529 | 0.363 | 0.602
                | II4    | 0.705 | 0.656 | 0.810
                | MEFE   | 0.554 | 0.439 | 0.640
                | OFE    | 0.500 | 0.403 | 0.634
Table 2. Accuracy of window extraction for the IQmulus & TerraMobilita Contest dataset.
Facade | Precision (IoU ≥ 0.5) | Recall (IoU ≥ 0.5) | F1 Score (IoU ≥ 0.5) | AP50 | Precision (IoU ≥ 0.85) | Recall (IoU ≥ 0.85) | F1 Score (IoU ≥ 0.85) | AP85
I0  | 1.000 | 1.000 | 1.000 | 1.000 | 0.933 | 0.933 | 0.933 | 0.871
I1  | 1.000 | 1.000 | 1.000 | 1.000 | 0.947 | 0.947 | 0.947 | 0.898
I2  | 1.000 | 0.974 | 0.987 | 0.974 | 0.921 | 0.897 | 0.909 | 0.827
I3  | 0.986 | 0.959 | 0.972 | 0.952 | 0.845 | 0.822 | 0.833 | 0.719
I4  | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000
I5  | 0.986 | 0.986 | 0.986 | 0.986 | 0.890 | 0.890 | 0.890 | 0.827
I6  | 0.912 | 0.981 | 0.945 | 0.945 | 0.842 | 0.906 | 0.873 | 0.805
I7  | 1.000 | 0.976 | 0.988 | 0.976 | 0.901 | 0.880 | 0.890 | 0.793
I8  | 0.964 | 1.000 | 0.981 | 0.964 | 0.891 | 0.925 | 0.907 | 0.839
I9  | 1.000 | 1.000 | 1.000 | 1.000 | 0.949 | 0.949 | 0.949 | 0.900
I10 | 1.000 | 0.938 | 0.968 | 0.938 | 0.959 | 0.899 | 0.928 | 0.862
I11 | 0.947 | 0.992 | 0.969 | 0.969 | 0.848 | 0.889 | 0.868 | 0.784
I12 | 1.000 | 0.987 | 0.994 | 0.987 | 0.779 | 0.769 | 0.774 | 0.621
Mean value of each facade | 0.984 | 0.984 | 0.984 | 0.976 | 0.901 | 0.900 | 0.900 | 0.827
All facades | 0.982 | 0.977 | 0.979 | - | 0.887 | 0.882 | 0.884 | -
Table 3. Errors of facade extraction for the Semantic3D.Net Benchmark dataset.
Scene Name           | Method          | Facade | MAE   | MSE   | RMSE
domfountain          | Proposed method | A0     | 0.204 | 0.074 | 0.272
                     |                 | A1     | 0.287 | 0.151 | 0.389
                     |                 | A2     | 0.598 | 0.572 | 0.756
                     |                 | A3     | 0.177 | 0.094 | 0.306
                     |                 | A4     | 0.346 | 0.185 | 0.430
                     |                 | A5     | 0.460 | 0.281 | 0.531
                     |                 | A6     | 0.299 | 0.131 | 0.363
                     |                 | A7     | 0.460 | 0.289 | 0.538
                     |                 | A8     | 0.793 | 0.736 | 0.858
                     |                 | MEFE   | 0.403 | 0.279 | 0.494
                     |                 | OFE    | 0.335 | 0.222 | 0.471
                     | VCIR method     | B0     | 0.979 | 1.129 | 1.063
                     |                 | B1     | 0.257 | 0.152 | 0.390
                     |                 | B2     | 0.505 | 0.344 | 0.587
                     |                 | B3     | 0.731 | 0.643 | 0.802
                     |                 | MEFE   | 0.618 | 0.567 | 0.710
                     |                 | OFE    | 0.494 | 0.418 | 0.647
marketplacefeldkirch | Proposed method | C0     | 0.462 | 0.307 | 0.554
                     |                 | C1     | 0.757 | 0.746 | 0.864
                     |                 | C2     | 0.243 | 0.144 | 0.379
                     |                 | C3     | 0.417 | 0.246 | 0.496
                     |                 | C4     | 0.366 | 0.236 | 0.486
                     |                 | C5     | 0.240 | 0.162 | 0.403
                     |                 | C6     | 0.267 | 0.195 | 0.442
                     |                 | C7     | 0.259 | 0.110 | 0.331
                     |                 | C8     | 0.460 | 0.279 | 0.528
                     |                 | MEFE   | 0.386 | 0.269 | 0.498
                     |                 | OFE    | 0.296 | 0.198 | 0.445
                     | VCIR method     | D0     | 0.457 | 0.306 | 0.553
                     |                 | D1     | 0.416 | 0.258 | 0.508
                     |                 | D2     | 1.019 | 1.165 | 1.079
                     |                 | D3     | 0.830 | 0.823 | 0.907
                     |                 | MEFE   | 0.681 | 0.638 | 0.762
                     |                 | OFE    | 0.464 | 0.385 | 0.621
Table 4. Accuracy of window extraction for the Semantic3D.Net Benchmark dataset.
Scene Name | Facade | Precision (IoU ≥ 0.5) | Recall (IoU ≥ 0.5) | F1 Score (IoU ≥ 0.5) | AP50 | Precision (IoU ≥ 0.85) | Recall (IoU ≥ 0.85) | F1 Score (IoU ≥ 0.85) | AP85
domfountain          | A0 | 1.000 | 1.000 | 1.000 | 1.000 | 0.909 | 0.909 | 0.909 | 0.826
                     | A1 | 0.981 | 1.000 | 0.991 | 0.981 | 0.981 | 1.000 | 0.991 | 0.981
                     | A2 | 0.958 | 0.958 | 0.958 | 0.918 | 0.875 | 0.875 | 0.875 | 0.766
                     | A3 | 1.000 | 0.667 | 0.800 | 0.667 | 0.000 | 0.000 | 0.000 | 0.000
                     | A4 | 0.167 | 0.500 | 0.250 | 0.100 | 0.000 | 0.000 | 0.000 | 0.000
                     | A5 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000
                     | A6 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000
                     | A7 | 1.000 | 1.000 | 1.000 | 1.000 | 0.952 | 0.952 | 0.952 | 0.907
                     | A8 | 0.333 | 0.500 | 0.400 | 0.200 | 0.000 | 0.000 | 0.000 | 0.000
                     | Mean value of each facade | 0.888 | 0.891 | 0.875 | 0.833 | 0.715 | 0.717 | 0.716 | 0.685
                     | All facades | 0.936 | 0.970 | 0.953 | - | 0.884 | 0.916 | 0.900 | -
marketplacefeldkirch | C0 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000
                     | C1 | 0.810 | 1.000 | 0.895 | 0.895 | 0.810 | 1.000 | 0.895 | 0.895
                     | C2 | 1.000 | 0.986 | 0.993 | 0.986 | 0.986 | 0.972 | 0.979 | 0.958
                     | C3 | 1.000 | 1.000 | 1.000 | 1.000 | 0.962 | 0.962 | 0.962 | 0.925
                     | C4 | 0.973 | 0.973 | 0.973 | 0.947 | 0.960 | 0.960 | 0.960 | 0.934
                     | C5 | 1.000 | 1.000 | 1.000 | 1.000 | 0.972 | 0.972 | 0.972 | 0.972
                     | C6 | 1.000 | 0.944 | 0.971 | 0.944 | 1.000 | 0.944 | 0.971 | 0.944
                     | C7 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000
                     | C8 | 1.000 | 1.000 | 1.000 | 1.000 | 0.917 | 0.917 | 0.917 | 0.840
                     | Mean value of each facade | 0.976 | 0.989 | 0.981 | 0.975 | 0.956 | 0.970 | 0.962 | 0.941
                     | All facades | 0.981 | 0.984 | 0.983 | - | 0.962 | 0.965 | 0.964 | -
Table 5. Comparison of the two methods’ extraction results with the IQmulus & TerraMobilita Contest dataset.
Results by VCIR Method | Results by Proposed Method
II0 | I7, I8, I9
II1 | I4, I5, I6
II2 | I2, I3
II3 | I0, I1
II4 | I10, I11, I12