
ESS: Learning Event-Based Semantic Segmentation from Still Images

  • Conference paper
Computer Vision – ECCV 2022 (ECCV 2022)

Abstract

Retrieving accurate semantic information in challenging high-dynamic-range (HDR) and high-speed conditions remains an open problem for image-based algorithms due to severe image degradations. Event cameras promise to address these challenges since they feature a much higher dynamic range and are resilient to motion blur. Nonetheless, semantic segmentation with event cameras is still in its infancy, chiefly due to the lack of high-quality, labeled datasets. In this work, we introduce ESS (Event-based Semantic Segmentation), which tackles this problem by directly transferring the semantic segmentation task from existing labeled image datasets to unlabeled events via unsupervised domain adaptation (UDA). Compared to existing UDA methods, our approach aligns recurrent, motion-invariant event embeddings with image embeddings. For this reason, our method neither requires video data nor per-pixel alignment between images and events and, crucially, does not need to hallucinate motion from still images. Additionally, we introduce DSEC-Semantic, the first large-scale event-based dataset with fine-grained labels. We show that, using image labels alone, ESS outperforms existing UDA approaches and, when combined with event labels, even outperforms state-of-the-art supervised approaches on both DDD17 and DSEC-Semantic. Finally, ESS is general-purpose: it unlocks the vast amount of existing labeled image datasets and paves the way for exciting research directions in fields previously inaccessible for event cameras.
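The core idea described above — encoding an event stream with a recurrent encoder and pulling the resulting embedding toward the embedding of a still image, so that a task head trained on labeled images can transfer to unlabeled events — can be illustrated with a toy NumPy sketch. All names, dimensions, the single-layer encoders, and the squared-distance surrogate below are illustrative assumptions, not the authors' architecture or loss.

```python
import numpy as np

rng = np.random.default_rng(0)
D_IN, D_EMB = 16, 8  # toy input and embedding sizes (assumed)

# Image encoder: one linear map followed by a nonlinearity.
W_img = rng.standard_normal((D_EMB, D_IN)) * 0.1

def encode_image(x):
    return np.tanh(W_img @ x)

# Recurrent event encoder: consumes a sequence of event tensors and
# carries a hidden state, so the final embedding summarizes the whole
# stream instead of requiring motion to be hallucinated from one frame.
W_evt = rng.standard_normal((D_EMB, D_IN)) * 0.1
W_h = rng.standard_normal((D_EMB, D_EMB)) * 0.1

def encode_events(event_seq):
    h = np.zeros(D_EMB)
    for e in event_seq:
        h = np.tanh(W_evt @ e + W_h @ h)
    return h

def alignment_loss(z_img, z_evt):
    # UDA surrogate: pull the two embeddings together (mean squared L2).
    return float(np.mean((z_img - z_evt) ** 2))

# Toy data: one image and a 5-step event sequence from the same scene.
x_img = rng.standard_normal(D_IN)
events = [rng.standard_normal(D_IN) for _ in range(5)]

z_img = encode_image(x_img)
z_evt = encode_events(events)
loss = alignment_loss(z_img, z_evt)
print(loss >= 0.0)
```

In a full training loop, minimizing such an alignment term jointly with a segmentation loss on the labeled image branch would push the event embeddings into the image feature space, which is the property that lets image-only labels supervise the event modality.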

Z. Sun and N. Messikommer—Equal contribution.


Notes

  1. For clarity, we omit the subscript i in what follows.


Acknowledgment

This work was supported by the National Centre of Competence in Research (NCCR) Robotics through the Swiss National Science Foundation (SNSF) and the European Research Council (ERC) under grant agreement No. 864042 (AGILEFLIGHT).

Author information


Corresponding author

Correspondence to Nico Messikommer.


Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (PDF 2997 KB)


Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Sun, Z., Messikommer, N., Gehrig, D., Scaramuzza, D. (2022). ESS: Learning Event-Based Semantic Segmentation from Still Images. In: Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T. (eds) Computer Vision – ECCV 2022. ECCV 2022. Lecture Notes in Computer Science, vol 13694. Springer, Cham. https://doi.org/10.1007/978-3-031-19830-4_20


  • DOI: https://doi.org/10.1007/978-3-031-19830-4_20

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-19829-8

  • Online ISBN: 978-3-031-19830-4

  • eBook Packages: Computer Science (R0)
