Order dispatching for an ultra-fast delivery service via deep reinforcement learning

Eray Mert Kavuk¹,³,
Ayse Tosun¹,
Mucahit Cevik²,
Aysun Bozanta²,
Sibel B. Sonuç³,
Mehmetcan Tutuncu³,
Bilgin Kosucu³ &
Ayse Basar²

Abstract

This paper proposes a real-life application of deep reinforcement learning to address the order dispatching problem of a Turkish ultra-fast delivery company, Getir. Before applying off-the-shelf reinforcement learning methods, we define the specific problem at Getir and one of the solutions the company has implemented. We discuss the novel aspects of Getir’s problem compared to the state-of-the-art order dispatching studies and highlight the limitations of Getir’s solution. The overall aim of the company is to deliver to as many customers as possible within 10 minutes. The orders arrive throughout the day, and centralized warehouses in the regions decide whether an incoming order should be served or canceled depending on their couriers’ shifts and status. We use Deep Q-networks to learn the actions of warehouses, i.e., accepting or canceling an order, directly from state dimensions using reinforcement learning. We design the networks with two different rewards. We conduct empirical analyses using real-life data provided by Getir to generate training samples and to assess the models’ performance during a selected 30-day period with a total of 9880 orders. The results indicate that our proposed models are able to generate policies that outperform the rule-based heuristic employed in practice.
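The accept-or-cancel decision described above can be illustrated with the two basic ingredients of Q-learning: an epsilon-greedy action choice and a one-step temporal-difference target. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation. The action encoding (`ACCEPT`, `CANCEL`), the function names, and the numeric values are hypothetical, and the Q-values are passed in as plain lists rather than produced by a trained network.

```python
import random
from typing import Sequence

# Hypothetical action encoding for a warehouse decision (not from the paper).
ACCEPT, CANCEL = 0, 1


def epsilon_greedy(q_values: Sequence[float], epsilon: float,
                   rng: random.Random) -> int:
    """Return a random action with probability epsilon, else the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


def td_target(reward: float, next_q_values: Sequence[float],
              gamma: float, done: bool) -> float:
    """One-step Q-learning target: r + gamma * max over next-state Q-values."""
    if done:
        # No bootstrapping from a terminal state.
        return reward
    return reward + gamma * max(next_q_values)


if __name__ == "__main__":
    rng = random.Random(0)
    # Illustrative Q-values for [ACCEPT, CANCEL] in some state.
    action = epsilon_greedy([0.1, 0.7], epsilon=0.0, rng=rng)
    print("chosen action:", "ACCEPT" if action == ACCEPT else "CANCEL")
    print("target:", td_target(1.0, [0.5, 2.0], gamma=0.9, done=False))
```

In a Deep Q-network, the lists of Q-values would come from a neural network evaluated on the current and next states, and the target would be used as the regression label for the chosen action.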

Notes

Due to privacy agreements with the company, we do not disclose the exact numbers for the queue size and delivery time limits in this paper.
https://kovan.itu.edu.tr/index.php/s/bG1VPCovocpnKyU

Acknowledgments

This research is partly funded by Getir Perakende Lojistik A.S., Istanbul, Turkey.

Author information

Authors and Affiliations

Faculty of Computer and Informatics Engineering, Istanbul Technical University, Istanbul, 34467, Turkey
Eray Mert Kavuk & Ayse Tosun
Data Science Lab, Mechanical Industrial Engineering Department, Ryerson University, Toronto, Ontario, M5B 2K3, Canada
Mucahit Cevik, Aysun Bozanta & Ayse Basar
Getir Perakende Lojistik A.S., Istanbul, 34337, Turkey
Eray Mert Kavuk, Sibel B. Sonuç, Mehmetcan Tutuncu & Bilgin Kosucu

Corresponding author

Correspondence to Eray Mert Kavuk.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Kavuk, E.M., Tosun, A., Cevik, M. et al. Order dispatching for an ultra-fast delivery service via deep reinforcement learning. Appl Intell 52, 4274–4299 (2022). https://doi.org/10.1007/s10489-021-02610-0

Accepted: 11 June 2021
Published: 20 July 2021
Issue Date: March 2022
DOI: https://doi.org/10.1007/s10489-021-02610-0