1. Introduction
The continuous and rapid increase in car ownership is an important cause of air pollution and the energy crisis. Emissions from traditional vehicles harm the environment and human health, and vehicle emissions have become the main source of air pollution in most countries worldwide. Twenty-five percent of global CO2 emissions come from automobiles, and total greenhouse gas emissions are proportional to the number of automobiles. Additionally, traditional cars are strongly dependent on oil, and volatile crude oil prices and national energy security strategies have constrained the development of the auto industry. The development of electric vehicles can not only effectively address the emissions problem but also reduce fuel use, which is of great significance. The transformation of the automotive industry has already begun and is accelerating, especially in China. Sales of new energy vehicles in China were less than 20,000 in 2013, but in 2021, 3.521 million units were sold and 3.545 million units were produced, with China ranking first in the world in production and sales for several consecutive years. This helps to mitigate China’s pollution and energy problems.
The electric vehicle industry is transitioning from high-speed development to high-quality development, with higher performance testing requirements. Tests must be run under uniform conditions, which is why driving cycles need to be developed. Moreover, proper driving cycles support research on reducing energy consumption [1]. A driving cycle is a series of data points representing the speed of a vehicle versus time [2], and it is the basis of testing. Europe uses the New European Driving Cycle (NEDC), while JC15 is used in Japan, and FTP75 is adopted in the USA. These are the world’s three most widely used driving cycles [3,4]. However, traffic varies significantly from country to country, and NEDC, JC15, and FTP75 cannot represent local traffic conditions, which leads to inaccurate test results. Therefore, building a driving cycle suitable for the local area is of practical significance.
Constructing driving cycles has long been a focus of research in the automotive field. The micro-trips method is the most common approach [5]: the driving data are cut into micro-trips, which are then divided into several clusters. K-means is the most widely used clustering method for generating a driving cycle [6,7,8]. Zhang combined distance optimization with a density-based metric of the data set to construct a driving cycle with an improved K-means algorithm [9]. A global K-means clustering algorithm was used to construct the Zhengzhou passenger car driving cycles [10]. Yuan proposed an improved K-means text clustering algorithm based on density peaks that avoids the pitfalls of randomly picking initial cluster centers [11]. However, K-means clustering has significant defects that negatively affect the construction of a driving cycle. For instance, K-means is sensitive to outliers, which introduces uncertainty into the clustering results.
Additionally, it is difficult to determine the K value when the amount of data is large, and the clustering result is affected by the choice of the initial cluster centers. To solve these problems, this paper proposes a methodology for generating a driving cycle based on the hierarchical cluster method. The hierarchical cluster method is not sensitive to outliers, and the number of clusters can be determined according to actual needs, which means the clustering result is controllable with a faster and more effective process.
In this study, the data collection and preprocessing are introduced first. Then, the real-world data are divided into micro-trips, and the characteristic parameters are extracted. Principal component analysis is applied for dimensionality reduction, and then the micro-trips are divided into three categories using the hierarchical cluster method: low-speed, medium-speed, and high-speed. Next, the driving cycle is constructed. Finally, the validity of the constructed driving cycle and the feasibility of the proposed method are verified from the perspective of statistical analysis and economic simulation.
2. Data Collection and Preprocessing
Driving data are the basis of driving cycle development. To obtain driving data that reflect actual driving characteristics, this paper uses the autonomous driving method to collect the driving data of an electric vehicle. The driver drives the vehicle freely without a designated driving route, which reflects the real traffic situation to the greatest extent. The test system comprises an onboard data acquisition terminal with a sampling frequency of 1 Hz (Figure 1) and a data management platform. The onboard data acquisition terminal encodes the acquired information according to the unified data protocol and sends it to the online data management platform in real time through the GPRS network. The onboard data acquisition terminal has two data sources: the GPS signal and the onboard diagnostic system (OBD) signal.
Due to the complex electromagnetic environment in electric vehicles, signal noise is inevitable, so it is necessary to filter the data. In this study, wavelet domain denoising is adopted, which removes noise while preserving the information of the original data to the greatest extent. The db5 wavelet is selected to denoise and analyze the speed data; the decomposition level is 5, and the SURE threshold is used to eliminate the signal noise [12]. To make the comparison more intuitive, the denoised result is shown in Figure 2.
The micro-trip refers to the data period from the start of one idle to the start of the next idle, and it is the basic unit for constructing the driving cycle. In this study, a total of 1162 micro-trips are obtained after the division of the driving data.
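The division into micro-trips can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `split_micro_trips`, the `idle_threshold` parameter, and the sample speed trace are assumptions introduced here, and the trace is assumed to be sampled at 1 Hz.

```python
def split_micro_trips(speeds, idle_threshold=0.0):
    """Split a 1 Hz speed trace into micro-trips.

    A micro-trip runs from the start of one idle period to the start of
    the next idle period, so each segment is an idle phase followed by a
    driving phase. The threshold is an assumed parameter.
    """
    # An idle period starts where the sample is idle and the previous
    # sample (if there is one) is not.
    idle_starts = [
        i for i, v in enumerate(speeds)
        if v <= idle_threshold and (i == 0 or speeds[i - 1] > idle_threshold)
    ]
    # Consecutive idle starts delimit one micro-trip each. Data before the
    # first idle start and after the last one are incomplete and dropped.
    return [speeds[a:b] for a, b in zip(idle_starts, idle_starts[1:])]


# Hypothetical speed trace (arbitrary units), one sample per second.
speeds = [0, 0, 3, 6, 4, 0, 0, 0, 5, 9, 2, 0, 4]
trips = split_micro_trips(speeds)
```

In this sketch the trailing samples after the last idle start are discarded because they do not form a complete micro-trip; whether incomplete segments were kept in the study is not stated.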
To describe the micro-trips, 15 characteristic parameters are calculated [13], including the average speed, maximum speed, maximum acceleration, average acceleration, maximum deceleration, average deceleration, acceleration time, deceleration time, idle time, uniform time, total time, acceleration time ratio, deceleration time ratio, idle time ratio, and uniform time ratio. The characteristic parameters and their symbols are shown in Table 1.
The 15 characteristic parameters are defined as follows:
Average speed: the mean vehicle speed over the micro-trip, where $v_i$ is the vehicle speed at the $i$-th second.
Maximum speed: the maximum speed in the micro-trip.
Maximum acceleration: the maximum acceleration in the micro-trip.
Average acceleration: the mean acceleration over the acceleration points, where $a_i$ corresponds to the acceleration at the $i$-th second ($i = 1, \ldots, T$) and $T$ is the total time of the micro-trip.
Maximum deceleration: the maximum deceleration in the micro-trip.
Average deceleration: the mean deceleration over the deceleration points, where $a_i$ corresponds to the deceleration for each deceleration point.
Acceleration time: total time of the acceleration state.
Deceleration time: total time of the deceleration state.
Idle time: total time of the idle state.
Uniform time: total time of the uniform state.
Considering the slight speed changes during driving, the uniform, idle, acceleration, and deceleration states are defined as follows [4,14]:
- (1) Uniform: the continuous running state in which $|a| \le 0.15$ m/s² and $v \ne 0$.
- (2) Idle: the continuous state in which $v = 0$ and the engine is working.
- (3) Acceleration: the continuous running state with $a > 0.15$ m/s².
- (4) Deceleration: the continuous running state with $a < -0.15$ m/s².
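The four state definitions can be applied to a speed trace as in the sketch below. It assumes a 1 Hz trace in m/s and estimates acceleration as the forward difference between consecutive samples; the function names and the sample data are illustrative and do not come from the study.

```python
def classify_states(speeds, a_threshold=0.15):
    """Label each 1 s step of a speed trace (m/s) with a driving state."""
    states = []
    for v_prev, v_next in zip(speeds, speeds[1:]):
        a = v_next - v_prev  # forward difference over 1 s, in m/s^2 (assumed)
        if a > a_threshold:
            states.append("acceleration")
        elif a < -a_threshold:
            states.append("deceleration")
        elif v_prev == 0 and v_next == 0:
            states.append("idle")
        else:
            states.append("uniform")
    return states


def time_ratios(states):
    """Share of the total time spent in each of the four states."""
    names = ("acceleration", "deceleration", "idle", "uniform")
    return {name: states.count(name) / len(states) for name in names}


# Hypothetical micro-trip, one sample per second, speeds in m/s.
speeds = [0.0, 0.0, 1.0, 2.5, 2.6, 2.5, 1.0, 0.0, 0.0]
states = classify_states(speeds)
ratios = time_ratios(states)
```

The state times (acceleration time, deceleration time, idle time, uniform time) follow directly as the counts of each label, and the four time ratios are those counts divided by the total time.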
3. Principal Component Analysis
Some of the parameters are linearly correlated, so using all 15 characteristic parameters would introduce overlapping, redundant information and waste computing resources. Therefore, principal component analysis is used to reduce the dimensionality of the characteristic parameters, which reduces the complexity of data processing and improves computational efficiency.
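The brief process of principal component analysis is as follows:
- (1) Standardized processing.
The data matrix $A$ of the characteristic parameters is
$$A = (a_{ij})_{n \times m} = \begin{bmatrix} a_{11} & \cdots & a_{1m} \\ \vdots & \ddots & \vdots \\ a_{n1} & \cdots & a_{nm} \end{bmatrix}$$
where $n$ is the number of micro-trips and $m$ is the number of characteristic parameters. The normalized matrix $\tilde{A} = (\tilde{a}_{ij})_{n \times m}$ is
$$\tilde{a}_{ij} = \frac{a_{ij} - \mu_j}{s_j}$$
where
$$\mu_j = \frac{1}{n}\sum_{i=1}^{n} a_{ij}, \qquad s_j = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(a_{ij} - \mu_j\right)^2}$$
- (2) The sample correlation coefficient matrix $R$ is calculated.
$$R = (r_{jk})_{m \times m}, \qquad r_{jk} = \frac{1}{n-1}\sum_{i=1}^{n} \tilde{a}_{ij}\,\tilde{a}_{ik}$$
- (3) The eigenvalues and the eigenvectors are calculated.
The eigenvalues $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_m \ge 0$ of the correlation coefficient matrix $R$ and the corresponding eigenvectors $u_1, u_2, \ldots, u_m$, in which $u_j = (u_{1j}, u_{2j}, \ldots, u_{mj})^{T}$, are calculated, and the new parameters calculated with the eigenvectors can be expressed as
$$y_j = u_{1j}\tilde{x}_1 + u_{2j}\tilde{x}_2 + \cdots + u_{mj}\tilde{x}_m, \qquad j = 1, 2, \ldots, m$$
where $\tilde{x}_k$ is the original (standardized) component and $y_j$ is called the $j$-th principal component.
- (4) The $p$ principal components are selected and the contribution rate is calculated:
$$b_j = \frac{\lambda_j}{\sum_{k=1}^{m}\lambda_k}, \qquad \alpha_p = \frac{\sum_{j=1}^{p}\lambda_j}{\sum_{k=1}^{m}\lambda_k}$$
where $b_j$ is the contribution rate of the $j$-th principal component and $\alpha_p$ is the cumulative contribution rate of the first $p$ principal components.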
The larger the contribution rate, the more information about the original variables the principal component contains. The number of principal components is mainly determined by the cumulative contribution rate. In this study, principal components are retained until the cumulative contribution rate exceeds 80%, so that the comprehensive variables retain most of the information in the original variables. The cumulative contribution rates of the principal components extracted from the 15 characteristic parameters are shown in Table 2.
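The selection rule based on the cumulative contribution rate can be sketched as follows. The eigenvalues below are hypothetical and serve only to illustrate the arithmetic; they are not the values from Table 2, and the function name `select_components` is an assumption.

```python
def select_components(eigenvalues, threshold=0.80):
    """Return the smallest number of principal components whose cumulative
    contribution rate exceeds the threshold, with the individual rates."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    rates = [lam / total for lam in ordered]  # contribution rate of each component
    cumulative = 0.0
    for p, rate in enumerate(rates, start=1):
        cumulative += rate
        if cumulative > threshold:
            return p, rates, cumulative
    return len(ordered), rates, cumulative


# Hypothetical eigenvalues of a correlation matrix (illustrative only).
eigenvalues = [6.0, 4.0, 2.5, 1.5, 1.0]
p, rates, cumulative = select_components(eigenvalues)
```

With these illustrative values, the first two components explain about 66.7% of the variance, so a third component is needed to pass the 80% threshold.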
After the dimensionality reduction with principal component analysis, the cumulative contribution rate of the first three principal components reaches 82.96%, which means these three principal components can represent the original 15 characteristic parameters. The correlation coefficients between the first five principal components and the 15 characteristic parameters are shown in Table 3.
The greater the absolute value of the correlation coefficient, the better the principal component represents that parameter. Table 4 displays the parameters of each principal component. The first principal component mainly reflects the average speed, maximum speed, maximum acceleration, average acceleration, maximum deceleration, average deceleration, acceleration time ratio, and deceleration time ratio. The second principal component mainly reflects the acceleration time, deceleration time, uniform time, and total time. The third principal component mainly reflects the idle time, idle time ratio, and uniform time ratio.
4. Hierarchical Cluster Method
The hierarchical cluster method is not sensitive to noise in the data, and the number of clusters can be selected according to the actual needs of the research, which makes it suitable for building driving cycles for different areas. The principle of the hierarchical cluster method is to regard each sample as a separate class and, given a definition of the distance between classes, merge the two classes with the smallest distance into a new class. The distance between the new class and the other classes is then calculated, and the merging process is repeated. Each merge reduces the number of classes by one until the given termination condition is met.
The general steps for the hierarchical cluster method are listed below:
- (1) Each sample is set as an initial class.
- (2) The inter-class distance matrix is calculated, and the two classes with the closest distance are combined into a new class. The inter-class distance is calculated as follows:
$$D_{pq} = \frac{1}{n_p n_q}\sum_{i \in G_p}\sum_{j \in G_q} d_{ij}$$
In the formula: $D_{pq}$ is the inter-class distance, $n_p$ and $n_q$ are the number of samples in class $G_p$ and class $G_q$, respectively, and $d_{ij}$ is the distance between the $i$-th sample in class $G_p$ and the $j$-th sample in class $G_q$.
- (3) Step (2) is repeated.
- (4) The process is ended when the termination condition is met.
According to local traffic patterns, the traffic condition is generally divided into three categories: low speed (low average running speed, frequent acceleration and deceleration), medium speed (moderate average running speed, frequent acceleration and deceleration, normal deceleration), and high speed (high average running speed, large acceleration ratio). Therefore, the termination condition of the hierarchical cluster method in this study is that three classes remain. Finally, three clusters are obtained, and the number of micro-trips in each category is 572, 956, and 59, respectively.
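The merging procedure with a fixed number of final classes can be sketched as follows. This is a naive illustration that assumes Euclidean distance between samples and the average inter-class distance (the mean of all pairwise distances between two classes); the function names and the sample points are hypothetical, and the approach recomputes all distances at each step, so it is only practical for small data sets.

```python
import math


def average_linkage(cluster_a, cluster_b, points):
    """Mean pairwise Euclidean distance between the samples of two classes."""
    total = sum(
        math.dist(points[i], points[j]) for i in cluster_a for j in cluster_b
    )
    return total / (len(cluster_a) * len(cluster_b))


def hierarchical_cluster(points, n_clusters):
    """Merge the two closest classes until n_clusters classes remain."""
    clusters = [[i] for i in range(len(points))]  # each sample starts as a class
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = average_linkage(clusters[a], clusters[b], points)
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]  # merge class b into class a
        del clusters[b]
    return clusters


# Hypothetical samples in a two-dimensional principal component space.
points = [(0.0, 0.0), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9), (10.0, 0.0)]
clusters = hierarchical_cluster(points, 3)
```

Each returned class is a list of sample indices, so the size of each class gives the number of micro-trips assigned to it.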
5. Construction and Verification of Vehicle Driving Cycle
5.1. Construction of Vehicle Driving Cycles
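$S$ is set as the average relative error of the characteristic parameters of the constructed driving cycles and the real-world data:
$$S = \frac{1}{N}\sum_{k=1}^{N}\delta_k$$
In the expression, $\delta_k$ is the relative error of the $k$-th characteristic parameter, and $N$ is the number of characteristic parameters included in the calculation. The calculation formula is as follows:
$$\delta_k = \left|\frac{x_k - X_k}{X_k}\right| \times 100\%$$
In the formula: $x_k$ is the average value of the $k$-th characteristic parameter of all the micro-trips of the constructed driving cycle, and $X_k$ is the $k$-th characteristic parameter of the real-world data.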
The reasonable duration used for most urban driving cycles is about 1200 s [15], so according to the proportion of the number of samples in each category, 10 samples with the smallest $S$ values are selected and combined into a driving cycle. The values of the characteristic parameters of the 10 micro-trips are shown in Table 5, and the constructed driving cycle is shown in Figure 3. The acceleration time, deceleration time, idle time, uniform time, and total time of the real-world data are summed over all micro-trips, so these parameters are not included in the error calculation.
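The error measure and the selection of the closest micro-trips can be sketched as follows. The parameter names, the reference values, and the candidate micro-trips are hypothetical; only the idea of ranking candidates by their average relative error against the real-world values is taken from the text.

```python
def average_relative_error(candidate, reference):
    """Average relative error S over the parameters listed in `reference`."""
    errors = [
        abs((candidate[name] - ref) / ref) for name, ref in reference.items()
    ]
    return sum(errors) / len(errors)


def pick_closest(micro_trips, reference, count):
    """Return the `count` micro-trips with the smallest S values."""
    ranked = sorted(
        micro_trips, key=lambda trip: average_relative_error(trip, reference)
    )
    return ranked[:count]


# Hypothetical real-world parameter values and candidate micro-trips.
reference = {"avg_speed": 20.0, "max_speed": 40.0}
micro_trips = [
    {"avg_speed": 22.0, "max_speed": 44.0},
    {"avg_speed": 10.0, "max_speed": 20.0},
    {"avg_speed": 20.0, "max_speed": 42.0},
]
best = pick_closest(micro_trips, reference, 2)
```

In practice the selection would be run separately within each cluster, with the number of micro-trips taken from each cluster set in proportion to the cluster sizes.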
5.2. Economy Verification of Constructed Driving Cycle
Economy is one of the important performance indicators of electric vehicles, and the power consumption per 100 km is often used as an evaluation index. In addition, overall efficiency is a commonly used evaluation index. In this study, the constructed driving cycle is used as the test cycle for the economy simulation, and the NEDC is used for comparison. The simulation results are shown in Figure 4.
It can be seen from Figure 4 that, for both the power consumption per 100 km and the overall efficiency, the constructed driving cycle is closer to the actual driving data than the NEDC is. The relative errors of the constructed driving cycle are 0.23% and 3.6%, respectively, while those of the NEDC are 7.47% and 41.34%.
Furthermore, the motor’s efficiency at each second is also calculated, and the frequency distribution of the efficiency is compiled, as seen in Figure 5.
In Figure 5, the scale on the horizontal axis represents the range of efficiency; for instance, “0.1” represents the efficiency from 0.1 to 0.2. Figure 5 indicates that the efficiency distribution for the constructed driving cycle is closer to that of the real-world data in most cases. This is why the economy test result of the electric vehicle is more realistic under the constructed driving cycle.
5.3. Statistical Verification of Constructed Driving Cycle
The reason the electric vehicle performs more “accurately” in the constructed driving cycle than in the NEDC is the difference in the statistical characteristics of the driving cycles. The comparison of the characteristic parameters of the different driving cycles is shown in Table 6.
The calculation results indicate that the overall relative error between the characteristic parameters of the constructed driving cycle and the real-world data is 5.46% (<10%). In comparison, the overall relative error of the NEDC is 36.8%, which explains why the constructed driving cycle is more accurate.
The VA matrix reflects the joint statistical distribution of velocity and acceleration and is often used for building driving cycles. From the perspective of the VA matrix, the constructed driving cycle is also closer to the real-world data. The 3D plots of the VA probability distributions of the real-world data, the constructed driving cycle, and the NEDC are shown in Figure 6, Figure 7 and Figure 8.
Figure 8 indicates that the v-a distribution of the NEDC is simple and concentrated because the NEDC covers few speed and acceleration values, which is very different from the real-world data. In contrast, the VA matrix of the constructed driving cycle covers most of the values found in the real-world data. For the matrices of the constructed driving cycle and the real-world data, most of the accelerations from −1.4 m/s² to +1.2 m/s² appear in the low-speed zone. Additionally, the peak area distributions of the two matrices are highly similar, which means that the constructed driving cycle is more representative of the real traffic conditions.
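A VA probability matrix of the kind compared above can be computed as in the following sketch. It assumes a 1 Hz speed trace in m/s, a forward-difference acceleration, and arbitrary bin widths; the function name `va_matrix`, the bin sizes, and the sample trace are assumptions and not the settings used in the study.

```python
import math
from collections import Counter


def va_matrix(speeds, v_bin=2.0, a_bin=0.5):
    """Joint probability of (speed bin, acceleration bin) for a 1 Hz trace.

    Keys are (speed bin index, acceleration bin index); bin k covers the
    half-open interval [k * width, (k + 1) * width).
    """
    counts = Counter()
    for v_prev, v_next in zip(speeds, speeds[1:]):
        a = v_next - v_prev  # forward difference over 1 s, in m/s^2 (assumed)
        key = (math.floor(v_prev / v_bin), math.floor(a / a_bin))
        counts[key] += 1
    total = sum(counts.values())
    return {key: n / total for key, n in counts.items()}


# Hypothetical speed trace in m/s, one sample per second.
speeds = [0.0, 0.0, 1.0, 2.5, 2.5, 1.0, 0.0]
matrix = va_matrix(speeds)
```

Computing the matrix with the same bins for the real-world data, the constructed driving cycle, and a reference cycle allows the distributions to be compared cell by cell.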
6. Summary
This study proposes a method based on principal component analysis and the hierarchical cluster method for constructing electric vehicle driving cycles. First, the data collected with the onboard data acquisition terminal are denoised using wavelet denoising and divided into micro-trips. The characteristic parameters of the micro-trips are extracted. Principal component analysis is then applied to reduce the dimensions. Next, the hierarchical cluster method is used to classify the micro-trips into three categories, and 10 micro-trips are selected according to the proportion of each category and combined into the driving cycle. Finally, the generated driving cycle is verified through both the electric vehicle economy test and statistical analysis.
The economy test verification shows that the constructed driving cycle has a 0.23% error in power consumption per 100 km, a 3.6% error in overall efficiency, and a closer distribution in terms of the motor efficiency frequency, which indicates that the tested electric vehicle performs similarly under the constructed driving cycle and under real-world driving. The statistical analysis shows that the overall relative error between the characteristic parameters of the constructed driving cycle and the real-world data is 5.46%, and the constructed driving cycle has a similar VA matrix to the real-world data, both of which indicate that the constructed driving cycle can represent real-world data for related research. The verification results support the validity and feasibility of the methodology proposed in this study, and this study provides a new method for the development of driving cycles.