Article

Predicting Ride-Hailing Demand with Consideration of Social Equity: A Case Study of Chengdu

1 The Key Laboratory of Road and Traffic Engineering, Ministry of Education, College of Transportation Engineering, Tongji University, Shanghai 200070, China
2 Laboratory on Perception, Interactions, Behaviors, and Simulation of the Road and Street Users (PICS-L), COSYS Department, Gustave Eiffel University, 77420 Champs-sur-Marne, France
* Author to whom correspondence should be addressed.
Sustainability 2024, 16(22), 9772; https://doi.org/10.3390/su16229772
Submission received: 20 September 2024 / Revised: 23 October 2024 / Accepted: 31 October 2024 / Published: 8 November 2024

Abstract

In the realm of shared autonomous vehicle ride-sharing, precise demand prediction is vital for optimizing resource allocation, improving travel efficiency, and promoting sustainable transport solutions. However, existing studies tend to overlook the social attributes and demographic characteristics of different regions, resulting in disparities in prediction fairness between areas with plentiful and limited transportation resources. To achieve more accurate and fairer predictions, an innovative Social Graph Convolution Long Short-Term Memory (SGC-LSTM) framework is proposed, which incorporates demographic, spatial, and transportation accessibility information into multiple functional graphs, including functional similarity, population structure, and historical demand graphs. Furthermore, a Mean Percentage Error (MPE) indicator is incorporated into the loss function to balance prediction accuracy and fairness. The results show that prediction accuracy and fairness improve by at least 8.9% and 12.9%, respectively, compared with the baseline models. Additionally, the predictions for rush hours in both privileged and underprivileged regions are more precise and reasonable, supporting sustainable transport practices. The proposed framework effectively captures the demands of diverse social groups, thereby contributing to social equity and long-term sustainability in urban mobility.

1. Introduction

As an on-demand “point-to-point” service, ride-hailing has gradually become an indispensable part of urban transportation systems. By the end of 2022, Uber operated in over 900 cities across more than 60 countries, and Didi had extended its services to over 400 cities in China [1].
As autonomous driving technology matures, shared autonomous vehicles (SAVs) are expected to become one of the main enablers of the next generation of online taxi services. More accurate demand forecasting will help SAV providers allocate vehicles and plan routes rationally, maximize user satisfaction, and improve the efficiency and quality of service.
Transportation equity generally refers to a situation in which all people have equal access to urban transportation resources and services [2]. This includes, but is not limited to, the coverage of public transportation facilities, the burden of transportation costs, transportation safety, and access to transportation information. Urban transport provides people with the most basic and necessary mobility and promotes relatively equal opportunities for all social groups to access social resources. Prioritizing the development of urban transport is therefore an important way to advance transport equity: it ensures that all residents have equal access to basic transport services and rights, improves their quality of life and degree of social integration, and, at the same time, furthers social equity [3].
At present, rail transit, buses, and shared bicycles account for 82% of urban trips in Chengdu [4]. Ride-hailing, as a supplement to public transportation, has attracted a growing number of market players and drivers and carries considerable idle capacity. Although this can relieve some of the pressure on urban travel, it has also exacerbated the imbalance in the distribution of urban transportation resources, so some vulnerable groups face greater difficulties in traveling [5]. Therefore, accurately predicting ride-hailing demand while ensuring social fairness is of great significance for optimizing the allocation of transportation resources and improving travel efficiency [6].
In recent years, researchers in China and abroad have proposed a series of data-driven methods for short-term demand forecasting, but few studies have addressed algorithmic fairness in transportation research. The existing literature has two main limitations. First, current studies pay little attention to the spatio-temporal heterogeneity of demographic and functional attributes across regions, and their feature extraction is one-sided. Second, issues in learning from the available data samples often lead to underestimation of travel demand in disadvantaged regions, and existing studies evaluate model predictions mainly by the average accuracy over the whole area, neglecting inter-regional prediction differences and fairness.
To overcome these limitations, this paper proposes a new strategy that maintains high prediction accuracy while accounting for prediction fairness. The main contributions are as follows:
(1) By constructing multiple OD (Origin–Destination) graphs that encode the spatio-temporal relationships between regions and consider multiple socio-economic attributes, we developed the center distance graph, the functional similarity graph, and the demographic structure graph, which provide new perspectives and tools for studying regional structure and social characteristics.
(2) Using accessibility as a basis, we measure the POI (point of interest) accessibility of the population in each region to classify low accessibility and high accessibility populations and to identify differences in demand between the groups, so that the algorithm's results can be analyzed more comprehensively and accurately.
(3) A bias-mitigating regularization method is developed by adding an MPE term to the loss function. It helps to understand and mitigate algorithmic bias in shared mobility platforms and effectively narrows the gap in mean percentage prediction error between disadvantaged and advantaged groups.
The remaining contents of the paper are organized as follows. Section 2 reviews the existing research on social equity, ride-hailing demand prediction, and fairness. Section 3 describes the research methodology. Section 4 introduces the baseline models and experimental results. Section 5 provides conclusions and prospects.

2. Literature Review

2.1. Social Equity

Social equity, the cornerstone of harmonious social development, means that all members of society are treated fairly in many dimensions of life, such as employment, healthcare, and education, and is regarded as the highest ideal of a functioning society. Applied to geographic space, it becomes a principle of equity for spatial units such as regions within a country. Transportation gives people the ability to move through space and determines their ability to reach spatial resources and opportunities, so spatial equity is, to some extent, also a transportation issue.
Since the 1970s, spatial equity has gradually received widespread attention from social scholars. Pirie [7] pioneered the concept of “spatial justice” in the article “On Spatial Justice”. Among subsequent scholars of “spatial equity”, Le Grand and Biswas [8] argue that fairness means that everyone should have the same opportunity to make choices regardless of where they live, which is the basic meaning of spatial fairness. Truelove [9] emphasizes that equity should be reflected in the spatial dimension as a balanced distribution of a given public service among the population, ensuring that everyone has access to essential public services. Hay [10] closely links fairness with geographic distribution and access across space, points out the inequalities that populations face in accessing services, and warns that inequalities can still occur even when the population distribution and the supply of public services are matched.
Many past studies have treated the distribution of transportation services, such as vehicle ownership, bus and rail service coverage, and bus and rail facility capacity, as a spatial equity issue. The literature shows that accessibility is now widely used as an indicator of the social attributes of transportation when exploring the spatial equity of distribution. Accessibility generally refers to how easily people can reach a destination from a point in space, reflecting the potential opportunities for interaction in space. In this paper, we choose accessibility as the object of distribution for studying spatial equity; that is, our spatial equity analysis focuses on accessibility.

2.2. Demand Forecasting

Traffic demand prediction is one of the fundamental challenges in intelligent transportation systems. Many scholars have explored various short-term traffic demand prediction methods by combining different aspects of demand forecasting and relevant features. These mainly include statistical methods [11,12], traditional machine learning methods [13,14], and deep learning methods [15,16].
Early research on traffic demand prediction was largely based on time series analysis methods, such as the Holt–Winters (HW) model, the Fast Fourier Transform (FFT), the ARMA (Autoregressive Moving Average) model, seasonal and trend decomposition methods (e.g., STL), linear regression, and exponential smoothing state space models (e.g., ETS and TBATS).
Traditional machine learning prediction methods have also been widely applied. For instance, Sun et al. [17] modeled traffic flow on roads using Bayesian Networks. Zhang et al. [18] proposed a spatiotemporal clustering algorithm to predict demand hotspots and their intensity through an adaptive forecasting approach. Saadi et al. [19] introduced a new regression-based ride-hailing model considering various factors such as roads, fares, and weather.
However, existing machine learning models heavily rely on human-selected features and struggle to meet complex prediction demands. In recent years, facing unstable traffic conditions, complex road network settings, and vast real-world datasets, deep learning methods have been increasingly applied due to their powerful computational capabilities and representation learning from big data.
Several recent studies have combined different deep neural networks to model the semantic spatio-temporal relationships in travel demand. For example, Ke et al. [20] developed the Fusion Convolutional Long Short-Term Memory Network, which combines CNN and LSTM to capture spatial, temporal, and exogenous dependencies simultaneously. Similarly, to address the problem that using weakly correlated regions for prediction impairs performance, Yao et al. [21] proposed a local CNN approach within a Deep Multi-View Spatio-Temporal Network (DMVST-Net) model, which captures the complex spatio-temporal correlations of travel demand through temporal, spatial, and semantic views. Ke et al. [22] further improved the CNN by exploiting the unambiguous nearest-neighbor definition of hexagons and proposed a hexagon-based Convolutional Neural Network (H-CNN) to predict short-term supply–demand gaps in online ride-hailing services.
While CNNs excel in capturing local spatial correlations, they struggle with many real-world problems relying on non-Euclidean correlations. To address this, recent studies have introduced Graph Convolutional Neural Networks (GCNs) into demand prediction. Graph Convolutional Networks have been applied in various industries, including face recognition analysis [23], social network recommendation systems [24], human behavior prediction [25], and bioinformatics [26].
In the field of transportation, graph convolution networks have been widely used and improved. Xiong et al. [27] designed a novel network structure with link graph convolution and node graph convolution, yielding a deep learning framework that fuses line graph convolution networks and Kalman filtering to predict OD demand on closed highways; visualizing the converged weights improves model interpretability. To better propagate information and mitigate the vanishing gradient problem, Residual Multi-Graph Convolutional Networks (RMGC) were proposed [28]: multiple OD graphs are constructed first, and the dependencies between OD pairs are then modeled by an encoder–decoder structure.

2.3. Fairness in Machine Learning Algorithms

Fairness in machine learning is an emerging field that integrates fairness principles into model design to ensure that algorithmic predictions remain fair and non-discriminatory with respect to sensitive attributes such as race, gender, and religion [29]. While model accuracy is an indispensable consideration when evaluating machine learning algorithms, their potential social impacts cannot be overlooked.
Fairness has received wider attention in the areas of crime prediction, healthcare, and credit assessment. Chouldechova et al. [30] demonstrated bias in a reoffense prediction tool that provides decision makers with an assessment of the likelihood that a criminal defendant will recidivate. Rajkomar et al. [31] demonstrated that “protected groups” in the field of clinical care are more likely to be deprived of healthcare resources. Khandani et al. [32] used credit risk assessment as an example, discussing the tradeoffs between overall benefits and inequalities in individual benefits, as well as accuracy versus availability.
However, there is a dearth of literature examining algorithmic fairness issues in transportation research. Emerging modes of transportation, while changing urban mobility, can also exacerbate social inequities to some extent. These travel services rely on accurate demand forecasts, but the demand data on which these models are trained reflect biases around demographics, socioeconomic conditions, and entrenched geographic patterns.
A study by Zheng et al. [33] revealed for the first time the potential influence of factors such as race, income, and medical conditions on travel behavior prediction. Using the National Household Travel Survey and Chicago residents’ travel data to measure fairness in terms of equality of opportunity, they explored the presence of prediction bias with respect to specific factors such as the number of deep neural network layers, batch size, and weight initialization.
In addition to equality of opportunity, Yan and Howe [34] used group fairness to propose a fairness-aware demand prediction model called FairST, which modifies the loss function of a deep learning model and considers multiple sensitive attributes, effectively reducing the gap in per capita demand prediction between disadvantaged and advantaged regions.
In order to further improve the fairness of the prediction model, Zheng et al. [5] further developed a socially aware neural network called SA-Net. This network incorporates demographic and economic characteristics and significantly reduces the prediction error gap between different groups by the regularization method.

2.4. Summary

Despite the considerable advancements achieved by the previously mentioned machine learning techniques in capturing endogenous data relationships, improving model interpretability, and enhancing prediction accuracy, the majority of these methods overlook the inclusion of additional information pertaining to regional spatial attributes, such as land use distribution, population age demographics, and historical demand trends, among others, during the prediction process.
Furthermore, most methods evaluate models with absolute metrics and do not consider the sign of prediction errors, which can hide whether demand in a given area is overestimated or underestimated. In addition, existing methods pay little attention to the fairness of prediction results between disadvantaged and advantaged areas.

3. Methodology

3.1. Network Architecture

In this section, we propose a novel deep learning architecture for predicting short-term ride-hailing demand; its network structure is shown in Figure 1.
Figure 1 shows the model framework of this study. In Figure 1a, regional geo-economic data, four feature graphs, order demand data, and weather and temperature data are preprocessed as input. Figure 1b shows the training process of the main model, including the specific structure of the GCN and LSTM networks. Figure 1c covers the loss functions for prediction accuracy and prediction fairness and the final output demand data.
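To make the GCN component of Figure 1b more concrete, the sketch below applies one graph-convolution step to a toy three-region graph. It assumes the widely used symmetric-normalized propagation rule of Kipf and Welling, H' = ReLU(D^(-1/2)(A + I)D^(-1/2) H W); this section does not state the paper's exact layer definition or how the four graphs are fused, and the function names and toy weights are hypothetical. The LSTM that would consume these per-step outputs is omitted.

```python
import math

def normalize_adjacency(adj):
    """Return D^-1/2 (A + I) D^-1/2, the symmetric normalization with self-loops."""
    n = len(adj)
    # Self-loops let each region keep its own features during propagation.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    inv_sqrt_deg = [1.0 / math.sqrt(sum(row)) for row in a_hat]
    return [[inv_sqrt_deg[i] * a_hat[i][j] * inv_sqrt_deg[j] for j in range(n)]
            for i in range(n)]

def matmul(x, y):
    """Matrix product of two nested lists (rows x cols)."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y))) for j in range(len(y[0]))]
            for i in range(len(x))]

def gcn_layer(adj, features, weights):
    """One graph-convolution step H' = ReLU(A_norm @ H @ W)."""
    out = matmul(matmul(normalize_adjacency(adj), features), weights)
    return [[max(0.0, v) for v in row] for row in out]

# Hypothetical toy input: 3 regions in a chain (0-1, 1-2), one feature (hourly demand).
adj = [[0, 1, 0],
       [1, 0, 1],
       [0, 1, 0]]
demand = [[10.0], [20.0], [30.0]]
weights = [[1.0]]
out = gcn_layer(adj, demand, weights)
print(out)
```

Each region's output mixes its own demand with its neighbors' demand, weighted by degree, which is how adjacency (or any of the similarity graphs) injects spatial dependence before the temporal model.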

3.2. OD Graph

The foundation of graph convolutional networks lies in non-Euclidean spatial data structures [22]. The study area comprises 162 streets in downtown Chengdu, Sichuan Province, China. This study explores the non-Euclidean spatial relationships between regions from both geographic and semantic perspectives. Geographically, two adjacent regions can reasonably be assumed to generate strongly correlated demand. Semantically, regions with similar functions (such as commercial, entertainment, and residential areas) and similar population structures (densities across age groups) are generally believed to exhibit strong demand correlations. It is further assumed that regions with similar current demand will show similar demand trends in the future. The four graphs are illustrated in Figure 2.
Based on the aforementioned spatial attribute data and the actual situation in Chengdu, the graphs used in this study are as follows: (1) an adjacency matrix graph $G_n = (V, E, A_n)$, $A_n \in \mathbb{R}^{N \times N}$; (2) a functional similarity graph $G_f = (V, E, A_f)$, $A_f \in \mathbb{R}^{N \times N}$; (3) a population structure similarity graph $G_p = (V, E, A_p)$, $A_p \in \mathbb{R}^{N \times N}$; and (4) a historical demand similarity graph $G_d = (V, E, A_d)$, $A_d \in \mathbb{R}^{N \times N}$. Here, $V$ is the set of $N$ regions (nodes) and $E$ is the set of edges.

3.2.1. Adjacency Matrix Graph

Demand at two nearby locations is likely to be more similar. In this study, an adjacency matrix is defined to indicate whether two regions are adjacent:
$$A_n(i,j) = \begin{cases} 1, & \text{if region } i \text{ is adjacent to region } j \\ 0, & \text{otherwise} \end{cases}$$
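A minimal sketch of building $A_n$, assuming the pairs of streets that share a boundary have already been extracted (for example, with a GIS boundary-intersection test that is not shown). The function name and the four-street example are hypothetical, and leaving the diagonal at zero is an assumption, since the definition above does not treat a region as adjacent to itself.

```python
def adjacency_matrix(n_regions, neighbor_pairs):
    """Build A_n with A_n[i][j] = 1 if regions i and j are adjacent, else 0."""
    adj = [[0] * n_regions for _ in range(n_regions)]
    for i, j in neighbor_pairs:
        if i == j:
            continue  # assumption: a region is not counted as its own neighbor
        adj[i][j] = 1
        adj[j][i] = 1  # adjacency is symmetric
    return adj

# Hypothetical example: 4 streets where 0-1, 1-2 and 2-3 share a boundary.
A_n = adjacency_matrix(4, [(0, 1), (1, 2), (2, 3)])
for row in A_n:
    print(row)
```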

3.2.2. Functional Similarity Matrix Graph

Different areas of a city may have different functions or land use properties: some contain many shopping malls, some contain tourist attractions, and others are predominantly residential. This paper uses spatial attribute data for the 162 streets, including park, industrial, transportation facility, residential, commercial, and public land use, as well as POI density, all of which are highly correlated with land use types and travel mode choices. Let $F_i$ denote the functional data vector of region $i$; the functional similarity matrix is then constructed as follows:
$$A_f(i,j) = \frac{F_i \cdot F_j}{\lVert F_i \rVert \, \lVert F_j \rVert}$$
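The functional similarity is a cosine similarity between land-use vectors. The sketch below computes it for every pair of regions; the three-attribute vectors are hypothetical (the paper uses seven land-use and POI attributes, and whether they are scaled beforehand is not stated), and returning 0 for an all-zero vector is an assumption to avoid division by zero.

```python
import math

def cosine_similarity_matrix(features):
    """A_f[i][j] = (F_i . F_j) / (||F_i|| * ||F_j||) for every pair of regions."""
    norms = [math.sqrt(sum(v * v for v in f)) for f in features]
    n = len(features)
    a_f = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if norms[i] == 0 or norms[j] == 0:
                continue  # assumption: similarity is 0 for an all-zero feature vector
            dot = sum(a * b for a, b in zip(features[i], features[j]))
            a_f[i][j] = dot / (norms[i] * norms[j])
    return a_f

# Hypothetical land-use shares (park, residential, commercial) for three streets.
F = [[0.1, 0.6, 0.3],
     [0.1, 0.6, 0.3],
     [0.8, 0.1, 0.1]]
A_f = cosine_similarity_matrix(F)
```

Streets 0 and 1 have identical land-use profiles and get a similarity of 1, while the park-dominated street 2 is much less similar to either.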

3.2.3. Population Structure Similarity Graph

The population structure of each street also affects the volume and distribution of orders. In this study, the population of each region is divided into three age groups: below 14 years old, 14 to 64 years old, and 64 years old and above. Let $P_i$ denote the population structure density vector of region $i$; the population structure similarity matrix is then constructed as follows:
$$A_p(i,j) = \left[ (P_i - P_j)(P_i - P_j)^{\mathsf{T}} \right]^{-1}$$
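The population structure similarity is the inverse of the squared Euclidean distance between two regions' age-group density vectors, so closer profiles give larger weights. The formula is undefined when the two vectors are identical, so the sketch below makes two assumptions that the text does not specify: the diagonal is left at zero, and a small hypothetical constant `eps` is added to the distance. The densities and their units are also hypothetical.

```python
def population_similarity_matrix(densities, eps=1e-9):
    """A_p[i][j] = 1 / ((P_i - P_j)(P_i - P_j)^T), the inverse squared Euclidean distance."""
    n = len(densities)
    a_p = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue  # assumption: no self-edge, since the formula is undefined here
            sq_dist = sum((a - b) ** 2 for a, b in zip(densities[i], densities[j]))
            a_p[i][j] = 1.0 / (sq_dist + eps)  # eps guards against identical profiles
    return a_p

# Hypothetical densities (thousand residents per km^2) for ages below 14, 14-64, 64+.
P = [[1.0, 5.0, 1.0],
     [1.0, 4.0, 1.0],
     [0.5, 2.0, 0.5]]
A_p = population_similarity_matrix(P)
```

Because the weights are not bounded, their scale depends on the density units; normalizing the matrix before feeding it to a GCN would be a reasonable extra step, though it is not described here.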

3.2.4. Historical Demand Similarity Graph

Regions with similar historical demand trends are likely to share some common characteristics and can help each other in prediction [22]. Let $D_i$ denote the historical demand vector of region $i$ over the study period. The historical demand similarity matrix between regions is then calculated as follows:
$$A_d(i,j) = \frac{\operatorname{Cov}(D_i, D_j)}{\sqrt{\operatorname{Var}(D_i)\,\operatorname{Var}(D_j)}}$$
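This similarity is the Pearson correlation between two regions' historical demand series. A minimal sketch follows; the toy series are hypothetical, and setting the value to 0 for a constant series (whose variance is zero) is an assumption. The text does not say whether negative correlations are kept or clipped, so the sketch keeps them.

```python
import math

def pearson_similarity_matrix(demand_series):
    """A_d[i][j] = Cov(D_i, D_j) / sqrt(Var(D_i) * Var(D_j)) (Pearson correlation)."""
    centered = []
    for series in demand_series:
        mean = sum(series) / len(series)
        centered.append([v - mean for v in series])
    n = len(demand_series)
    a_d = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            # The 1/(T-1) factors of covariance and variance cancel, so sums suffice.
            cov = sum(a * b for a, b in zip(centered[i], centered[j]))
            var_i = sum(a * a for a in centered[i])
            var_j = sum(b * b for b in centered[j])
            if var_i > 0 and var_j > 0:  # assumption: 0 for constant series
                a_d[i][j] = cov / math.sqrt(var_i * var_j)
    return a_d

# Hypothetical hourly demand for three regions over four hours.
D = [[5, 10, 15, 20],
     [2, 4, 6, 8],
     [20, 15, 10, 5]]
A_d = pearson_similarity_matrix(D)
```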

3.3. Accessibility

Accessibility is a concept measuring the ease of moving from one place to another. In transportation network analysis and urban planning, accessibility serves as an important evaluation metric that helps to understand the connectivity between locations and the ease of moving between them [35].
Hansen first introduced the concept of accessibility in 1959, defining it as the probability of interaction between nodes in a transportation network. Accessibility disparities are differences in accessibility across geographical areas, population groups, and time periods. In this study, accessibility is measured as the total number of points of interest (POIs) reachable within a 30 min drive from the centroid of each area.
First, the centroids of the 162 streets are computed; the Gaode API is then used to obtain the number of POIs reachable within a 30 min drive from each centroid. Travel times from online maps are more accurate and closer to real time than travel times computed by models, and therefore better reflect traffic conditions.
Based on the computed results, the 162 areas are divided into 102 high accessibility areas and 60 low accessibility areas using the mean value as the threshold.
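The accessibility measure and the mean-threshold split can be sketched as follows. The sketch assumes the driving times from each centroid to each POI have already been retrieved (in the study, from the Gaode API, which is not called here), and the travel-time matrix is hypothetical. Placing areas exactly at the mean in the low group is an assumption, since the text does not specify how ties are handled.

```python
def poi_accessibility(travel_minutes, threshold_min=30.0):
    """Count POIs reachable from each centroid within threshold_min minutes of driving."""
    return [sum(1 for t in row if t <= threshold_min) for row in travel_minutes]

def split_by_mean(scores):
    """Label an area 'high' if its score exceeds the mean, else 'low' (ties go to 'low')."""
    mean = sum(scores) / len(scores)
    return ["high" if s > mean else "low" for s in scores]

# Hypothetical driving times (minutes) from 4 centroids to 5 POIs.
travel_minutes = [[12, 25, 31, 18, 40],
                  [35, 45, 50, 29, 60],
                  [10, 15, 20, 25, 30],
                  [33, 41, 28, 55, 70]]
scores = poi_accessibility(travel_minutes)
labels = split_by_mean(scores)
```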

3.4. Evaluation Metrics

The performance of various models is evaluated based on two types of indicators: accuracy indicators and fairness indicators.

3.4.1. Accuracy Indicators

The Mean Absolute Error (MAE) and Root Mean Square Error (RMSE) are used to assess the prediction accuracy of the models. They are defined as follows:
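$$\mathrm{MAE} = \frac{1}{TN} \sum_{t=1}^{T} \sum_{i=1}^{N} \left| y_t^i - \hat{y}_t^i \right|$$
$$\mathrm{RMSE} = \sqrt{\frac{1}{TN} \sum_{t=1}^{T} \sum_{i=1}^{N} \left( y_t^i - \hat{y}_t^i \right)^2}$$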
Although MAE and RMSE are widely used to measure the accuracy of model predictions, they both calculate the absolute difference between the true values and predicted values, without considering the direction of the error. The positive and negative directions of the difference between the true demand and predicted one have significantly different practical implications. In this case, the Mean Percentage Error (MPE) considering the direction of error is presented. The MPE is defined as follows:
$$\mathrm{MPE} = \frac{1}{TN} \sum_{t=1}^{T} \sum_{i=1}^{N} \frac{y_t^i - \hat{y}_t^i}{y_t^i}$$
In these equations, $y_t^i$ represents the true demand of region $i$ at time $t$; similarly, $\hat{y}_t^i$ represents the predicted demand of region $i$ at time $t$; $N$ denotes the total number of regions; and $T$ represents the total number of hours in the study period. A positive MPE indicates that demand is underestimated (i.e., the actual demand exceeds the predicted demand), while a negative MPE indicates that demand is overestimated.
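The three accuracy indicators can be computed directly from T x N demand matrices. The sketch below is illustrative, with hypothetical data; because MPE divides by the actual demand, it assumes that cells with zero actual demand are skipped, a choice the text does not specify.

```python
import math

def _pairs(y_true, y_pred):
    """Flatten T x N nested lists into (actual, predicted) pairs."""
    return [(a, p) for row_a, row_p in zip(y_true, y_pred) for a, p in zip(row_a, row_p)]

def mae(y_true, y_pred):
    """Mean absolute error over all hours and regions."""
    pairs = _pairs(y_true, y_pred)
    return sum(abs(a - p) for a, p in pairs) / len(pairs)

def rmse(y_true, y_pred):
    """Root mean square error over all hours and regions."""
    pairs = _pairs(y_true, y_pred)
    return math.sqrt(sum((a - p) ** 2 for a, p in pairs) / len(pairs))

def mpe(y_true, y_pred):
    """Signed mean percentage error; positive means demand is underestimated.

    Assumption: cells with zero actual demand are skipped because the formula divides by y.
    """
    pairs = [(a, p) for a, p in _pairs(y_true, y_pred) if a != 0]
    return sum((a - p) / a for a, p in pairs) / len(pairs)

# Hypothetical demand for T = 2 hours and N = 2 regions.
y_true = [[10.0, 20.0], [40.0, 50.0]]
y_pred = [[8.0, 22.0], [36.0, 50.0]]
print(mae(y_true, y_pred), rmse(y_true, y_pred), mpe(y_true, y_pred))
```

In this example MAE and RMSE only report the size of the errors, whereas the positive MPE (0.05) also shows that, on balance, demand is underestimated.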

3.4.2. Fairness Indicators

It is crucial to ensure that there is no systematic difference in MPE between disadvantaged and advantaged areas within the same scenario. We propose the MPE gap as a fairness metric, measuring the difference in MPE between two groups (e.g., low accessibility areas and high accessibility areas). The MPE gap is defined as
$$\text{MPE Gap} = \frac{1}{T|N_1|} \sum_{t=1}^{T} \sum_{i \in N_1} \frac{y_t^i - \hat{y}_t^i}{y_t^i} - \frac{1}{T|N_2|} \sum_{t=1}^{T} \sum_{i \in N_2} \frac{y_t^i - \hat{y}_t^i}{y_t^i}$$
Here, $N_1$ denotes the set of low accessibility areas and $N_2$ the set of high accessibility areas. Besides accessibility, other variables such as income and age could also be used as grouping criteria to investigate further univariate and even multivariate relationships in prediction.
To achieve fair predictions, it is desired that the absolute value of the MPE gap be as close to zero as possible, which is also the most important manifestation of “fairness” in this study.
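A minimal sketch of the MPE gap: the MPE of the low accessibility group minus the MPE of the high accessibility group, each averaged over all hours and the regions in that group. The region indices and demand values are hypothetical, and actual demand is assumed to be positive in every cell.

```python
def group_mpe(y_true, y_pred, regions):
    """MPE over all hours t and the regions listed in `regions` (y_true[t][i] assumed > 0)."""
    terms = [(y_true[t][i] - y_pred[t][i]) / y_true[t][i]
             for t in range(len(y_true)) for i in regions]
    return sum(terms) / len(terms)

def mpe_gap(y_true, y_pred, low_access, high_access):
    """MPE of low accessibility areas (N_1) minus MPE of high accessibility areas (N_2)."""
    return group_mpe(y_true, y_pred, low_access) - group_mpe(y_true, y_pred, high_access)

# Hypothetical data: 2 hours x 4 regions; regions 0 and 1 are low accessibility, 2 and 3 high.
y_true = [[10.0, 20.0, 50.0, 40.0],
          [10.0, 20.0, 50.0, 40.0]]
y_pred = [[8.0, 18.0, 52.0, 42.0],
          [9.0, 16.0, 51.0, 40.0]]
gap = mpe_gap(y_true, y_pred, low_access=[0, 1], high_access=[2, 3])
```

Here the low accessibility regions are underestimated (group MPE 0.15) while the high accessibility regions are slightly overestimated (group MPE -0.0275), giving a positive gap of 0.1775; a fair model would push this value toward zero.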

3.5. Loss Function Regularization

To ensure that the model training balances accuracy and fairness, this study defines a loss function that is a weighted sum of accuracy loss and fairness loss, defined as follows:
$$L = L_{\text{accuracy}} + \lambda L_{\text{fairness}}$$
The accuracy loss aims to minimize the MAE:
$$L_{\text{accuracy}} = \sum_{t=1}^{T} \sum_{i=1}^{N} \left| y_t^i - \hat{y}_t^i \right|$$
where $y_t^i$ and $\hat{y}_t^i$ represent the actual and predicted travel demand for region $i$ at time interval $t$, respectively; $T$ denotes the total number of time intervals; and $N$ represents the total number of regions. The fairness loss is defined as follows:
$$L_{\text{fairness}} = \left| \sum_{t=1}^{T} \sum_{i=1}^{N} \tilde{z}_i \, \frac{y_t^i - \hat{y}_t^i}{y_t^i} \right|, \quad \text{s.t. } \tilde{z}_i = \frac{z_i - \bar{z}}{\sigma_z}$$
where $\tilde{z}_i$ is the standardized value of the selected sensitive indicator for region $i$, and $\bar{z}$ and $\sigma_z$ are the mean and standard deviation of the original values, respectively. $L_{\text{fairness}}$ measures the covariance between the sensitive indicator and the percentage prediction error; incorporating it into the loss function therefore penalizes prediction errors that vary systematically with the sensitive attribute.
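The combined objective can be sketched as a plain function of the actual demand, the predicted demand, a per-region sensitive indicator, and the weight λ (written `lam`). Two choices here are assumptions: the fairness term is taken as the absolute value of the double sum, so that minimizing the loss pushes the covariance toward zero rather than toward large negative values, and the standard deviation is the population standard deviation. In training, the same expression would be written with a framework's tensor operations so that it can be differentiated; this sketch only evaluates it.

```python
import math

def standardize(values):
    """z~_i = (z_i - mean) / std (population std; assumes the indicator is not constant)."""
    mean = sum(values) / len(values)
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    return [(v - mean) / std for v in values]

def fairness_aware_loss(y_true, y_pred, sensitive, lam):
    """Return (L, L_accuracy, L_fairness) with L = L_accuracy + lam * L_fairness."""
    z = standardize(sensitive)
    l_acc = 0.0
    cov_sum = 0.0
    for t in range(len(y_true)):
        for i in range(len(y_true[t])):
            err = y_true[t][i] - y_pred[t][i]
            l_acc += abs(err)                      # summed absolute error
            cov_sum += z[i] * err / y_true[t][i]   # standardized indicator x percentage error
    l_fair = abs(cov_sum)                          # assumption: absolute value of the sum
    return l_acc + lam * l_fair, l_acc, l_fair

# Hypothetical: 1 hour, 2 regions; the sensitive indicator is an accessibility score.
loss, l_acc, l_fair = fairness_aware_loss([[10.0, 20.0]], [[8.0, 22.0]],
                                          sensitive=[1.0, 3.0], lam=20.0)
```

In the example, the low accessibility region is underestimated and the high accessibility region is overestimated, so the fairness term is non-zero and, with `lam = 20`, contributes more to the loss than the accuracy term does.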

4. Experiments and Results

4.1. Data Description

This study used five categories of data. The order dataset comprised over 6.1 million order records from 1 November 2016 to 30 November 2016, obtained from the Didi Gaia Data Open Program. The records cover 162 streets across 11 administrative districts of Chengdu, and each order contains the order number, vehicle ID, start and end times, and latitude and longitude coordinates. Spatial attribute data, covering demographic, land use, and socio-economic information, were also used. In the predictive modeling, temporal data (working days coded as 0 and non-working days as 1) were combined with weather data. Lastly, POI data were included, giving the names and geographic coordinates of establishments such as stores, restaurants, attractions, gas stations, and banks.
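Before any graph or demand computation, the order records have to be turned into an hourly demand matrix $y_t^i$. The sketch below assumes that each order has already been assigned to a street (e.g., by a point-in-polygon test on its pickup coordinates, not shown) and that orders are counted by start time; the function names are hypothetical. The working-day flag follows the 0/1 coding described above but, as a simplifying assumption, treats only Saturdays and Sundays as non-working days.

```python
from collections import Counter
from datetime import datetime

def hourly_demand_matrix(orders, street_ids, start, hours):
    """Count orders per (hour, street); returns an hours x len(street_ids) nested list."""
    col = {s: k for k, s in enumerate(street_ids)}
    counts = Counter()
    for street, ts in orders:
        t = int((ts - start).total_seconds() // 3600)  # hour index since `start`
        if 0 <= t < hours and street in col:
            counts[(t, col[street])] += 1
    return [[counts[(t, k)] for k in range(len(street_ids))] for t in range(hours)]

def day_type(ts):
    """0 for Monday to Friday, 1 otherwise (public holidays ignored in this sketch)."""
    return 0 if ts.weekday() < 5 else 1

# Hypothetical orders as (street id, order start time).
start = datetime(2016, 11, 1, 0, 0)
orders = [("A", datetime(2016, 11, 1, 7, 15)),
          ("A", datetime(2016, 11, 1, 7, 50)),
          ("B", datetime(2016, 11, 1, 8, 5)),
          ("A", datetime(2016, 11, 1, 9, 0))]
demand = hourly_demand_matrix(orders, ["A", "B"], start, hours=10)
```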

4.2. Spatial-Temporal Analysis

This study analyzes the order data in terms of both time and space for data visualization and characterization:
First, the number of orders is examined in the temporal dimension. Figure 3 shows the average hourly number of orders on working days and non-working days. The working-day curve shows a “triple-peak” pattern, with a morning peak (7:00–9:00), a midday peak (12:00–14:00), and an evening peak (17:00–19:00), which differs from the conventional “double-peak” pattern (morning and evening peaks). Within the study period, Chengdu thus shows a more diversified pattern, with an additional peak at 12:00–14:00.
Meanwhile, non-working days show no morning peak, but from noon onwards the number of orders is higher than on working days, and orders also increase in the early hours of the morning.
Figure 4 is a violin plot of order duration, which shows the distribution of the data, including the middle 50% (between the 25th and 75th percentiles) and the middle 95% (between the 2.5th and 97.5th percentiles). The figure shows that the average order duration is longer during the morning peak hours (8:00–10:00) and the evening peak hours (16:00–18:00). Overall, order durations are concentrated around 15 min.
Next, the inflow and outflow of orders are analyzed in the spatial dimension. Figure 5 shows heat maps of order origins and destinations. Origins are concentrated mainly to the northeast of the center of the study area, whereas destinations are more peripheral and more evenly distributed. Therefore, order flows within the study period are mainly center-to-center and center-to-periphery, with fewer periphery-to-center trips.

4.3. Baselines

In this study, four main types of benchmark models are used: the traditional statistical models MA and ARIMA, the deep learning network LSTM, and the GCN-LSTM model. According to the input data, the GCN-LSTM model is further divided into four variants (GCN-LSTM I, II, III, and IV): the input of model I is the adjacency matrix alone (i.e., a single graph); the input of model II is the fused graph (i.e., the feature multi-graph containing the adjacency matrix, population structure, functional similarity, and historical demand similarity); the input of model III is the adjacency matrix plus weather and time series data; and the input of model IV is the feature multi-graph plus weather and time series data.

4.4. Results

We compared our proposed algorithm (SGC-LSTM) with the baseline models in terms of both accuracy and fairness. The results show that the baselines neglect fairness issues such as the systematic underestimation of travel demand in disadvantaged regions, whereas our algorithm achieves better results in both accuracy and fairness. The accuracy and fairness of the different models are compared in Table 1.

4.4.1. Prediction Accuracy

Table 1 reports the MAE, RMSE, and MPE of all models, as well as the MPE values for high accessibility and low accessibility areas and the MPE gap. The results indicate that the proposed SGC-LSTM model achieves the lowest overall MAE, RMSE, and MPE gap on the test set among all models.
In terms of overall MAE and RMSE, the SGC-LSTM and GCN-LSTM networks significantly outperform the other models, indicating that models which capture temporal and spatial dependencies simultaneously have strong predictive capability.
For the GCN-LSTM network, using multiple graphs instead of a single graph (i.e., the adjacency matrix alone) improves both MAE and RMSE, and adding weather sequences to the demand data further improves the predictions. In SGC-LSTM, the multiple graphs built on top of the adjacency matrix, which encode population structure, land use attributes, and historical trends, capture inter-regional characteristics better. Together with temporal features and loss function regularization, this reduces MAE from 9.671 to 8.809 and RMSE from 21.705 to 20.703 compared with the best-performing GCN-LSTM variant (model IV), representing improvements of 8.9% and 5.6%, respectively. The fitted curves are shown in Figure 6 (left: morning peak hours; right: evening peak hours).

4.4.2. Prediction Fairness

The improved prediction fairness of SGC-LSTM is evident in two respects. First, compared with the baseline models, our model alleviates the common tendency to overestimate demand in high accessibility regions and underestimate it in low accessibility regions.
Secondly, our results indicate that the proposed bias mitigation strategy can narrow the MPE gap between the disadvantaged group and the privileged group without compromising overall prediction accuracy, thereby achieving fairer predictions.
In comparison to other baseline models, the SGC-LSTM network exhibits better performance in terms of MPE and MPE gap indicators in high and low accessibility areas due to the regularization of the MPE indicator in the loss function.
Compared to the best-performing model IV in SGC-LSTM and GCN-LSTM, the overall average MPE gap decreases from 0.031 to 0.027, representing a 12.9% improvement.
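The fairness metrics can be made concrete with a short sketch. It assumes that MPE is computed as mean((y − ŷ)/y), so that positive values indicate underestimation (the sign convention used in the discussion of Figure 7), and that the MPE gap is |MPE(High) − MPE(Low)|, which reproduces every gap in Table 1. Splitting regions at the median accessibility score is purely an illustrative assumption, and the function names and toy data are hypothetical:

```python
from statistics import mean, median

def high_access_mask(access_scores):
    """Hypothetical split: regions at or above the median accessibility score are 'high'."""
    threshold = median(access_scores)
    return [s >= threshold for s in access_scores]

def mpe(actual, predicted):
    """Mean percentage error as a fraction, mean((y - y_hat) / y).

    Assumed sign convention: positive = under-prediction, negative = over-prediction.
    Zero-demand observations are skipped to avoid division by zero.
    """
    return mean((y - p) / y for y, p in zip(actual, predicted) if y != 0)

def mpe_gap(actual, predicted, is_high_access):
    """Absolute difference between the MPE of high- and low-accessibility regions."""
    high = [(y, p) for y, p, h in zip(actual, predicted, is_high_access) if h]
    low = [(y, p) for y, p, h in zip(actual, predicted, is_high_access) if not h]
    mpe_high = mpe([y for y, _ in high], [p for _, p in high])
    mpe_low = mpe([y for y, _ in low], [p for _, p in low])
    return abs(mpe_high - mpe_low)

# Toy data for four regions (made-up numbers, not from the study)
actual = [10, 20, 5, 8]
predicted = [12, 18, 4, 8]
is_high = high_access_mask([0.9, 0.8, 0.2, 0.1])
print(is_high)                                        # [True, True, False, False]
print(round(mpe_gap(actual, predicted, is_high), 3))  # 0.15

# Reproducing the MPE gaps and the 12.9% reduction from Table 1
gap_iv = abs(-0.016 - 0.015)     # GCN-LSTM IV: MPE(High) - MPE(Low)
gap_sgc = abs(0.006 - (-0.021))  # SGC-LSTM
print(round(gap_iv, 3), round(gap_sgc, 3))          # 0.031 0.027
print(round((gap_iv - gap_sgc) / gap_iv * 100, 1))  # 12.9
```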
Notably, the deep learning methods tend to overestimate demand in high-accessibility regions, whereas the traditional statistical methods underestimate it. This may be related to the more complex network structures and the additional auxiliary data used by the deep learning models, a question that merits further study.
To vary the weight given to fairness during training, we adjusted the bias mitigation weight λ in the loss function, increasing it from 0 to 10, 20, 30, and 40. The MPE and MPE gap results in Table 2 show that, as λ increases, the MPE values in high-accessibility areas fluctuate more. The model performs best in both prediction accuracy and fairness at λ = 20; at λ = 40, the MAE rises to 11.057 and the MPE gap to 0.302.
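The λ sweep can be illustrated with a hypothetical objective of the form MSE + λ·|MPE_high − MPE_low|. The text states only that an MPE term weighted by λ enters the loss, so this exact form is an assumption, as are the toy data. The snippet also checks, from the Table 2 values, that λ = 20 minimizes both MAE and the MPE gap:

```python
from statistics import mean

def mpe(pairs):
    """Mean percentage error as a fraction; positive = under-prediction (assumed convention)."""
    return mean((y - p) / y for y, p in pairs if y != 0)

def fairness_regularized_loss(actual, predicted, is_high_access, lam):
    """Hypothetical objective: MSE + lam * |MPE_high - MPE_low|."""
    mse = mean((y - p) ** 2 for y, p in zip(actual, predicted))
    high = [(y, p) for y, p, h in zip(actual, predicted, is_high_access) if h]
    low = [(y, p) for y, p, h in zip(actual, predicted, is_high_access) if not h]
    return mse + lam * abs(mpe(high) - mpe(low))

actual = [10, 20, 5, 8]  # toy demand, not from the study
predicted = [12, 18, 4, 8]
is_high = [True, True, False, False]
for lam in (0, 20):
    print(lam, round(fairness_regularized_loss(actual, predicted, is_high, lam), 3))
# 0 2.25
# 20 5.25

# (MAE, MPE gap) for each lambda, copied from Table 2
table2 = {10: (10.560, 0.171), 20: (8.809, 0.027), 30: (9.416, 0.058), 40: (11.057, 0.302)}
best_by_mae = min(table2, key=lambda k: table2[k][0])
best_by_gap = min(table2, key=lambda k: table2[k][1])
print(best_by_mae, best_by_gap)  # 20 20
```

In a gradient-based trainer the absolute value is not differentiable at zero; a squared gap is a common smooth alternative.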
Figure 7 visualizes the MPE values of the GCN-LSTM models and the proposed SGC-LSTM model in high- and low-accessibility areas. In high-accessibility areas, GCN-LSTM I, II, and III all overestimate demand. By contrast, GCN-LSTM IV and SGC-LSTM consistently underestimate demand in high-accessibility areas across all time periods (i.e., positive MPE values), possibly because loss-function regularization and the incorporation of multiple graphs “sacrifice” some high-accessibility demand so that demand in low-accessibility areas is better served. Nevertheless, SGC-LSTM outperforms GCN-LSTM throughout the day and reduces the degree of underestimation, especially during the morning (7 a.m.–9 a.m.) and evening (5 p.m.–7 p.m.) peaks.
In low-accessibility areas, the differences between the two algorithms are more pronounced. During the morning and evening peaks, GCN-LSTM IV underestimates demand, whereas SGC-LSTM, owing to the MPE regularization and the aforementioned “sacrifice” of some high-accessibility demand, avoids neglecting demand in low-accessibility areas and thereby improves the overall MPE values.
With accelerating urbanization, more peri-urban areas are being incorporated into the development strategies of central cities, and infrastructure and business support for low-accessibility areas will increase; as a result, ride-hailing demand is likely to diversify further. Whether trips run from peripheral areas to the center or within peripheral areas, ride-hailing platforms need to plan and deploy capacity in advance and pay attention to the latent demand created by population migration. Especially during the morning and evening peaks, platforms could introduce more attractive registration thresholds and more aggressive price adjustment mechanisms to help shape the future market model for driverless shared cars.
Overall, the proposed regularization method substantially reduces the prediction bias between disadvantaged and advantaged groups, and SGC-LSTM improves fairness while maintaining high prediction accuracy.

5. Conclusions

This paper proposes SGC-LSTM, a deep learning framework that considers spatial attributes and transportation accessibility. Its main features are as follows: (i) time series of temperature and rainfall are fed into the LSTM network as auxiliary inputs; (ii) socio-economic and geographic data are used to extract non-Euclidean semantic relationships, namely a functional similarity graph, a population structure graph, and a historical demand graph, which are then input into the GCN network; (iii) transportation accessibility serves as the criterion for classifying advantaged and disadvantaged groups; and (iv) the fairness index MPE is incorporated into the loss function to regularize the predictions, with the MPE gap measuring the prediction disparity between disadvantaged and advantaged areas. The results show the following:
  • Prediction accuracy (MAE) improves by at least 8.9%.
  • Prediction fairness (MPE gap) improves by at least 12.9%.
  • The proposed algorithm mitigates the underestimation of peak-hour demand in low-accessibility areas.
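Feature (ii) combines several semantic graphs, and the limitations paragraph notes that they are fused with equal weights. The sketch below assumes each graph is a cosine-similarity matrix over per-region feature vectors; the paper's exact construction may differ, and the feature values and function names are hypothetical:

```python
import math

def cosine(u, v):
    """Cosine similarity between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def similarity_graph(features):
    """Dense region-by-region similarity matrix built from per-region feature vectors."""
    n = len(features)
    return [[cosine(features[i], features[j]) for j in range(n)] for i in range(n)]

def fuse_equal(graphs):
    """Element-wise average of same-sized matrices, i.e. every graph gets weight 1/k."""
    k, n = len(graphs), len(graphs[0])
    return [[sum(g[i][j] for g in graphs) / k for j in range(n)] for i in range(n)]

# Hypothetical features for three regions (illustrative numbers only)
poi_counts = [[5, 1, 0], [4, 2, 0], [0, 0, 3]]                     # functional similarity
age_shares = [[0.3, 0.5, 0.2], [0.2, 0.6, 0.2], [0.1, 0.4, 0.5]]  # population structure
hourly_demand = [[10, 30, 12], [8, 25, 10], [2, 3, 9]]             # historical demand

fused = fuse_equal([similarity_graph(f) for f in (poi_counts, age_shares, hourly_demand)])
for row in fused:
    print([round(x, 2) for x in row])
```

Regions 1 and 2 have similar POI mixes, age structures, and demand profiles, so their fused edge weight is much larger than that between regions 1 and 3.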
This paper examines ride-hailing demand forecasting from the perspective of social fairness, focusing on the actual needs of disadvantaged groups while also improving forecasting accuracy in advantaged areas. The study is expected to help self-driving car platforms optimize the allocation of urban transportation resources, improve travel efficiency, and promote social fairness; if other criteria are used to divide the population, the framework could also help provide more convenient travel options for people of different ages and abilities. In addition, the proposed regularization method can be adopted in many short-term travel demand models and applied to a wider range of spatio-temporal forecasting tasks.
This study has some limitations. For example, when multiple graphs are fused, each graph is assigned an equal weight; these weights could instead be trained or calibrated to improve model performance. In addition, the accessibility calculation is relatively simple and could be refined using questionnaire data or isochrone-based measures.
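One way to relax the equal-weight assumption, sketched here as an assumption rather than the authors' plan, is to parameterize the graph weights with a softmax over logits: equal logits recover the current equal weighting, and training can move away from it. The logits are fixed numbers for illustration; in a real model they would be learned jointly with the network. The function names are hypothetical:

```python
import math

def softmax(logits):
    """Numerically stable softmax turning unconstrained logits into weights that sum to 1."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def fuse_weighted(graphs, logits):
    """Weighted sum of same-sized matrices; equal logits reproduce equal weights."""
    weights = softmax(logits)
    n = len(graphs[0])
    return [[sum(w * g[i][j] for w, g in zip(weights, graphs)) for j in range(n)]
            for i in range(n)]

identity = [[1.0, 0.0], [0.0, 1.0]]
ones = [[1.0, 1.0], [1.0, 1.0]]

print(fuse_weighted([identity, ones], [0.0, 0.0]))  # [[1.0, 0.5], [0.5, 1.0]]
print(softmax([0.0, math.log(3.0)]))                # weights close to 0.25 and 0.75
```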
For future work, we will explore the following aspects: (i) Enriching the construction of relationship graphs. This study used functional, demographic structure, and historical demand data; future work could also construct relationship graphs from land use properties and road network structures, which reflect the function, spatial structure, and planning characteristics of the city, to enrich the dimensionality of the relationship representation and improve the interpretability of the model. (ii) Examining other sensitive attributes: testing the transferability of the proposed method to other applications and cities, and exploring other indicators for dividing advantaged and disadvantaged areas according to regional characteristics, such as per capita annual income, housing prices, and the size of the elderly population, to further validate the proposed bias mitigation approach. (iii) Exploring further ways to mitigate inequity: smarter passenger-driver matching algorithms, dynamic pricing strategies, and well-designed incentive mechanisms could help narrow the fairness gap between regions.

Author Contributions

Conceptualization, X.C., M.T., D.G. and T.S.; Methodology, X.C. and M.T.; Formal analysis, X.C.; Writing—original draft, X.C., M.T., D.G. and T.S.; Writing—review & editing, M.T. and D.G.; Supervision, M.T. and D.G. All authors have read and agreed to the published version of the manuscript.

Funding

This study was supported by the Natural Science Foundation of China [Grant number: 52302441].

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data presented in this study are available on request from the corresponding author.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Tencent News. Available online: https://cloud.tencent.com/developer/news/1169217 (accessed on 15 May 2024).
  2. Bruzzone, F.; Cavallaro, F.; Nocera, S. The definition of equity in transport. Transp. Res. Procedia 2023, 69, 440–447. [Google Scholar] [CrossRef]
  3. The United Nations New Urban Agenda. Available online: https://www.un.org/zh/node/182272 (accessed on 15 May 2024).
  4. Chengdu Municipal Bureau of Transportation. Available online: https://jtys.chengdu.gov.cn/cdjtysWap/c148683/2024-04/24/content_4e8c6676aa55443794a98bca60306e9a.shtml (accessed on 15 May 2024).
  5. Zheng, Y.; Wang, Q.; Zhuang, D.; Wang, S.; Zhao, J. Fairness-enhancing deep learning for ride-hailing demand prediction. IEEE Open J. Intell. Transp. Syst. 2023, 551–569. [Google Scholar] [CrossRef]
  6. Guo, X.; Xu, H.; Zhuang, D.; Zheng, Y.; Zhao, J. Fairness-enhancing vehicle rebalancing in the ride-hailing system. arXiv 2023, arXiv:2401.00093. [Google Scholar]
  7. Pirie, G.H. On spatial justice. Environ. Plan. A 1983, 15, 465–473. [Google Scholar] [CrossRef]
  8. Le Grand, J. Equity and Choice: An Essay in Economics and Applied Philosophy; Routledge: London, UK, 1991. [Google Scholar]
  9. Truelove, M. Measurement of spatial equity. Environ. Plan. C Gov. Policy 1993, 11, 19–34. [Google Scholar] [CrossRef]
  10. Hay, A.M. Concepts of equity, fairness and justice in geographical studies. Trans. Inst. Br. Geogr. 1995, 20, 500–508. [Google Scholar] [CrossRef]
  11. Zhang, N.; Zhang, Y.; Lu, H. Seasonal autoregressive integrated moving average and support vector machine models: Prediction of short-term traffic flow on freeways. Transp. Res. Rec. 2011, 2215, 85–92. [Google Scholar] [CrossRef]
  12. Li, X.; Pan, G.; Wu, Z.; Qi, G.; Li, S.; Zhang, D.; Zhang, W.; Wang, Z. Prediction of urban human mobility using large-scale taxi traces and its applications. Front. Comput. Sci. 2012, 6, 111–121. [Google Scholar] [CrossRef]
  13. Xu, H.; Ying, J.; Wu, H.; Lin, F. Public bicycle traffic flow prediction based on a hybrid model. Appl. Math. Inf. Sci. 2013, 7, 667–674. [Google Scholar] [CrossRef]
  14. Li, Y.; Zheng, Y.; Zhang, H.; Chen, L. Traffic prediction in a bike-sharing system. In Proceedings of the 23rd SIGSPATIAL International Conference on Advances in Geographic Information Systems, Seattle, WA, USA, 3 November 2015; pp. 1–10. [Google Scholar]
  15. Guo, G.; Zhang, T. A residual spatio-temporal architecture for travel demand forecasting. Transp. Res. Part C Emerg. Technol. 2020, 115, 102639. [Google Scholar] [CrossRef]
  16. Li, C.; Bai, L.; Liu, W.; Yao, L.; Waller, S.T. A multi-task memory network with knowledge adaptation for multimodal demand forecasting. Transp. Res. Part C Emerg. Technol. 2021, 131, 103352. [Google Scholar] [CrossRef]
  17. Sun, S.; Zhang, C.; Yu, G. A Bayesian network approach to traffic flow forecasting. IEEE Trans. Intell. Transp. Syst. 2006, 7, 124–132. [Google Scholar] [CrossRef]
  18. Zhang, K.; Feng, Z.; Chen, S.; Huang, K.; Wang, G. A framework for passengers demand prediction and recommendation. In Proceedings of the 2016 IEEE International Conference on Services Computing (SCC), San Francisco, CA, USA, 27 June 2016; pp. 340–347. [Google Scholar]
  19. Saadi, I.; Wong, M.; Farooq, B.; Teller, J.; Cools, M. An investigation into machine learning approaches for forecasting spatio-temporal demand in ride-hailing service. arXiv 2017, arXiv:1703.02433. [Google Scholar]
  20. Ke, J.; Zheng, H.; Yang, H.; Chen, X.M. Short-term forecasting of passenger demand under on-demand ride services: A spatio-temporal deep learning approach. Transp. Res. Part C Emerg. Technol. 2017, 85, 591–608. [Google Scholar] [CrossRef]
  21. Yao, H.; Wu, F.; Ke, J.; Tang, X.; Jia, Y.; Lu, S.; Gong, P.; Ye, J.; Li, Z. Deep multi-view spatial-temporal network for taxi demand prediction. In Proceedings of the AAAI Conference on Artificial Intelligence, New York, NY, USA, 26 April 2018; Volume 32. [Google Scholar]
  22. Ke, J.; Qin, X.; Yang, H.; Zheng, Z.; Zhu, Z.; Ye, J. Predicting origin-destination ride-sourcing demand with a spatio-temporal encoder-decoder residual multi-graph convolutional network. Transp. Res. Part C Emerg. Technol. 2021, 122, 102858. [Google Scholar] [CrossRef]
  23. Shi, X.; Chai, X.; Xie, J.; Sun, T. Mc-gcn: A multi-scale contrastive graph convolutional network for unconstrained face recognition with image sets. IEEE Trans. Image Process. 2022, 31, 3046–3055. [Google Scholar] [CrossRef]
  24. Yang, Y.; Qi, Y.; Qi, S. Relation-consistency graph convolutional network for image super-resolution. Vis. Comput. 2024, 40, 619–635. [Google Scholar] [CrossRef]
  25. Wang, X.; Zhang, W.; Wang, C.; Gao, Y.; Liu, M. Dynamic dense graph convolutional network for skeleton-based human motion prediction. IEEE Trans. Image Process. 2023, 33, 1–15. [Google Scholar] [CrossRef]
  26. Zhang, Z.R.; Jiang, Z.R. GraphDPA: Predicting drug-pathway associations by graph convolutional networks. Comput. Biol. Chem. 2022, 99, 107719. [Google Scholar] [CrossRef]
  27. Xiong, X.; Ozbay, K.; Jin, L.; Feng, C. Dynamic origin–destination matrix prediction with line graph neural networks and kalman filter. Transp. Res. Rec. 2020, 2674, 491–503. [Google Scholar] [CrossRef]
  28. Ke, J.; Yang, H.; Zheng, H.; Chen, X.; Jia, Y.; Gong, P.; Ye, J. Hexagon-based convolutional neural network for supply-demand forecasting of ride-sourcing services. IEEE Trans. Intell. Transp. Syst. 2018, 20, 4160–4173. [Google Scholar] [CrossRef]
  29. Mehrabi, N.; Morstatter, F.; Saxena, N.; Lerman, K.; Galstyan, A. A survey on bias and fairness in machine learning. ACM Comput. Surv. (CSUR) 2021, 54, 1–35. [Google Scholar] [CrossRef]
  30. Chouldechova, A. Fair prediction with disparate impact: A study of bias in recidivism prediction instruments. Big Data 2017, 5, 153–163. [Google Scholar] [CrossRef] [PubMed]
  31. Rajkomar, A.; Hardt, M.; Howell, M.D.; Corrado, G.; Chin, M.H. Ensuring fairness in machine learning to advance health equity. Ann. Intern. Med. 2018, 169, 866–872. [Google Scholar] [CrossRef]
  32. Khandani, A.E.; Kim, A.J.; Lo, A.W. Consumer credit-risk models via machine-learning algorithms. J. Bank. Financ. 2010, 34, 2767–2787. [Google Scholar] [CrossRef]
  33. Zheng, Y.; Wang, S.; Zhao, J. Equality of opportunity in travel behavior prediction with deep neural networks and discrete choice models. Transp. Res. Part C Emerg. Technol. 2021, 132, 103410. [Google Scholar] [CrossRef]
  34. Yan, A.; Howe, B. Fairness-aware demand prediction for new mobility. Proc. AAAI Conf. Artif. Intell. 2020, 34, 1079–1087. [Google Scholar] [CrossRef]
  35. Ben-Akiva, M.; Lerman, S.R. Disaggregate travel and mobility-choice models and measures of accessibility. In Behavioural Travel Modelling; Routledge: London, UK, 2021; pp. 654–679. [Google Scholar]
Figure 1. Architecture of SGC-LSTM.
Figure 2. Construction of OD graphs.
Figure 3. Hour-by-hour orders on weekdays vs. weekends.
Figure 4. Order time duration violin chart.
Figure 5. Heat map of OD orders.
Figure 6. Fitted curves of SGC-LSTM for all areas in peak hours.
Figure 7. Comparison results of algorithms for high and low accessibility areas.
Table 1. Comparison of accuracy and fairness among models.
All values are averages of the evaluation indicators across all regions.

| Model | MAE | RMSE | MPE (Total) | MPE (Low) | MPE (High) | MPE Gap |
|---|---|---|---|---|---|---|
| MA | 27.720 | 32.050 | 4.073 | 2.444 | 5.032 | 2.588 |
| ARIMA | 10.690 | 14.656 | 0.756 | 0.285 | 1.033 | 0.748 |
| LSTM | 26.938 | 30.993 | −2.690 | −1.160 | −3.330 | 2.170 |
| GCN-LSTM I (single-graph) | 11.267 | 25.392 | −0.045 | 0.076 | −2.154 | 2.230 |
| GCN-LSTM II (multi-graph) | 10.602 | 24.155 | −0.067 | 0.565 | −0.038 | 0.603 |
| GCN-LSTM III (single+temporal) | 10.560 | 23.840 | −0.115 | 0.074 | −0.139 | 0.213 |
| GCN-LSTM IV (multi+temporal) | 9.671 | 21.705 | 0.007 | 0.015 | −0.016 | 0.031 |
| SGC-LSTM | 8.809 | 20.703 | 0.003 | −0.021 | 0.006 | 0.027 |
Table 2. Comparison of the regularization results of different weight parameters.
All values are averages of the evaluation indicators across all regions.

| Parameter λ | MAE | RMSE | MPE (Total) | MPE (Low) | MPE (High) | MPE Gap |
|---|---|---|---|---|---|---|
| SGC-LSTM (λ = 10) | 10.560 | 24.890 | −0.088 | −0.219 | −0.049 | 0.171 |
| SGC-LSTM (λ = 20) | 8.809 | 20.703 | 0.003 | 0.021 | −0.006 | 0.027 |
| SGC-LSTM (λ = 30) | 9.416 | 21.657 | −0.014 | −0.066 | −0.009 | 0.058 |
| SGC-LSTM (λ = 40) | 11.057 | 26.349 | −0.137 | −0.358 | −0.056 | 0.302 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Chen, X.; Tu, M.; Gruyer, D.; Shi, T. Predicting Ride-Hailing Demand with Consideration of Social Equity: A Case Study of Chengdu. Sustainability 2024, 16, 9772. https://doi.org/10.3390/su16229772

AMA Style

Chen X, Tu M, Gruyer D, Shi T. Predicting Ride-Hailing Demand with Consideration of Social Equity: A Case Study of Chengdu. Sustainability. 2024; 16(22):9772. https://doi.org/10.3390/su16229772

Chicago/Turabian Style

Chen, Xinran, Meiting Tu, Dominique Gruyer, and Tongtong Shi. 2024. "Predicting Ride-Hailing Demand with Consideration of Social Equity: A Case Study of Chengdu" Sustainability 16, no. 22: 9772. https://doi.org/10.3390/su16229772

APA Style

Chen, X., Tu, M., Gruyer, D., & Shi, T. (2024). Predicting Ride-Hailing Demand with Consideration of Social Equity: A Case Study of Chengdu. Sustainability, 16(22), 9772. https://doi.org/10.3390/su16229772

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.
