Abstract
This paper presents a vector HD-mapping algorithm that formulates mapping as a tracking task and uses a history of memory latents to ensure consistent reconstructions over time. Our method, MapTracker, accumulates a sensor stream into memory buffers of two latent representations: 1) raster latents in the bird's-eye-view (BEV) space and 2) vector latents over the road elements (i.e., pedestrian crossings, lane dividers, and road boundaries). The approach borrows the query-propagation paradigm from the tracking literature, which explicitly associates tracked road elements from the previous frame with the current one, while fusing a subset of memory latents selected with distance strides to further enhance temporal consistency. A vector latent is decoded to reconstruct the geometry of a road element. The paper further makes benchmark contributions by 1) improving the processing code for existing datasets to produce consistent ground truth with temporal alignments and 2) augmenting existing mAP metrics with consistency checks. MapTracker significantly outperforms existing methods on both the nuScenes and Argoverse 2 datasets, by over 8% and 19% on the conventional and the new consistency-aware metrics, respectively. The code and models are available on our project page: https://map-tracker.github.io.
J. Chen, Y. Wu and J. Tan—Equal contribution.
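To make the strided memory-fusion idea concrete, below is a minimal Python sketch of one ingredient: choosing which buffered memory latents to fuse by matching travelled distance against a set of distance strides. The buffer layout, the stride values (1, 5, 10, and 20 meters), and the function name are illustrative assumptions for this sketch, not the authors' implementation.

import numpy as np

def select_strided_memory(poses, strides=(1.0, 5.0, 10.0, 20.0)):
    """Pick memory-buffer indices whose travelled distance from the
    current frame best matches each target stride (in meters).

    poses: (N, 2) array of past ego positions; poses[-1] is the
    current frame. Returns sorted buffer indices, always including
    the most recent frame. (Hypothetical helper for illustration.)
    """
    poses = np.asarray(poses, dtype=np.float64)
    # Per-step travelled distances between consecutive buffered frames.
    steps = np.linalg.norm(np.diff(poses, axis=0), axis=1)  # (N-1,)
    # Cumulative distance from the current (last) frame, walking
    # backwards through the buffer: entry i is how far the ego has
    # travelled since frame i.
    dist_from_now = np.concatenate([[0.0], np.cumsum(steps[::-1])])[::-1]
    selected = {len(poses) - 1}  # always keep the latest latent
    for s in strides:
        idx = int(np.argmin(np.abs(dist_from_now - s)))
        selected.add(idx)
    return sorted(selected)

if __name__ == "__main__":
    # Example: a straight trajectory with one pose every 0.5 m of
    # travel; frames roughly 1, 5, 10, and 20 m behind the vehicle
    # are selected alongside the current frame.
    traj = np.stack([np.arange(0, 30, 0.5), np.zeros(60)], axis=1)
    print(select_strided_memory(traj))  # -> [19, 39, 49, 57, 59]

The selected latents would then be fused with the current frame's representation (e.g., via attention); indexing by distance rather than by time keeps the fused memory informative whether the vehicle is moving fast or crawling.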
Acknowledgements
This research is partially supported by NSERC Discovery Grants, NSERC Alliance Grants, and John R. Evans Leaders Fund (JELF). We thank the Digital Research Alliance of Canada and BC DRI Group for providing computational resources.
Copyright information
© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Chen, J., Wu, Y., Tan, J., Ma, H., Furukawa, Y. (2025). MapTracker: Tracking with Strided Memory Fusion for Consistent Vector HD Mapping. In: Leonardis, A., Ricci, E., Roth, S., Russakovsky, O., Sattler, T., Varol, G. (eds) Computer Vision – ECCV 2024. ECCV 2024. Lecture Notes in Computer Science, vol 15064. Springer, Cham. https://doi.org/10.1007/978-3-031-72658-3_6
DOI: https://doi.org/10.1007/978-3-031-72658-3_6
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-72657-6
Online ISBN: 978-3-031-72658-3
eBook Packages: Computer Science, Computer Science (R0)