Optical Flow Estimation by Matching Time Surface with Event-Based Cameras
Figure 1. Overview of the proposed method. Left: real event data [13] captured by a DAVIS [14], plotted in x-y-t space; red and green dots indicate positive and negative events, respectively. Middle: time surface and shifted time surface (top and bottom) for each polarity (left and right). Event timestamps are color-coded (red for the most recent, blue for the oldest part of the time surface); the brackets mark the time window over which each time surface is formed. Right: the time surface is warped by the optical flow parameters, the matching cost is measured, and the cost is minimized.
Figure 2. Principle of time surface matching. (a) Time surface formed when a line segment along the Y direction moves in the X direction; timestamps are color-coded (red for the most recent, blue for the oldest). The non-transparent surface in (b) is the time surface shifted by Δt. The Surface Matching Loss evaluates the consistency of timestamps between the time surface and the shifted time surface.
Figure 3. One-dimensional time surface when the velocity changes over a period τ. The blue and red plots show the time surface and the shifted time surface, respectively. The green and magenta arrows indicate the motion vector in x-t space and the optical flow, respectively. The constant-velocity assumption is needed only over the very short interval Δt used to compute the optical flow.
Figure 4. Comparison of contrast maximization (left) and our surface matching approach (right). In contrast maximization, event alignment is measured implicitly by the contrast of the image of warped events (IWE). In our approach, alignment is evaluated by timestamp consistency with respect to the fixed previous time surface S. Green dots indicate events in x-t space.
Figure 5. (a) Rectangle scene and its event image when the rectangle translates at v = (2, -1)^T on the image plane. (b) Losses plotted as functions of the optical flow parameters v in R^2; each loss map is normalized by its maximum. The bottom rows show the brightness image, the IWE, the time surface (TS), and enlargements of the magenta patch when the losses are evaluated at (c) v = (2, 0)^T (pix/Δt) and (d) v = (1, 0)^T (pix/Δt). Magenta arrows show the gradient of the variance for the IWE and of the Surface Matching Loss for the TS.
Figure 6. Loss landscapes of the variance and the Surface Matching Loss for checkerboard, brick, and grass scenes, with the true optical flow set to v = (1, 0)^T (pix/Δt). The losses are evaluated within the magenta patch.
Figure 7. Qualitative results on the Multi-Vehicle Stereo Event Camera Dataset (MVSEC). Left to right: image, events, and the optical flow estimated by Reconstruction [9], Variance [10], and Surface Matching (ours). Events are accumulated over τ + Δt; red and green indicate the two polarities and yellow marks pixels where both occur. Optical flow is colored by direction according to the colormap at the lower right of each image.
Figure 8. Qualitative results on the HVGA Asynchronous Time-based Image Sensor (ATIS) Corner Dataset (HACD). Left to right: event data, positive time surface S+, negative time surface S-, and the optical flow predicted by Reconstruction [9], Variance [10], and Surface Matching (ours). Red and green indicate the two polarities and yellow marks pixels where both occur. Time surfaces are colored red for the most recent and blue for the oldest timestamps. Optical flow is colored by direction according to the colormap at the lower right of each image.
Figure 9. Qualitative results under shaking caused by the ground in an outdoor scene.
Figure 10. Average end-point error (top) and pitch rate from the inertial measurement unit (bottom) over part of the outdoor_day sequence. In the left part of the graph, where vertical vibration from the ground is smaller, our method has the smaller error.
Abstract
1. Introduction
- We propose a loss function that measures the timestamp consistency of the time surface for optical flow estimation with event-based cameras. The proposed loss makes it possible to estimate dense optical flow without explicitly reconstructing image intensity or using additional sensor information (a minimal sketch of this loss follows this list).
- By visualizing the loss landscape, we show that our loss is more stable with respect to texture than the variance used in the motion compensation framework. We also show that our method computes the gradient in the correct direction even around a line segment.
- We evaluate dense optical flow estimated by optimization with L1 smoothness regularization. Our method achieves higher accuracy than the conventional methods on various scenes from two publicly available datasets.
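To make the first contribution concrete, below is a minimal NumPy sketch of a per-polarity time surface and a timestamp-consistency cost between the time surface and a shifted time surface warped by a candidate flow. It is based only on the description above and the figure captions; the function names, the nearest-neighbour warp, and the squared-residual cost are our own simplifications, not the exact form of Equation (7).

```python
import numpy as np

def time_surface(events, shape, t_ref, tau):
    """Per-pixel timestamp of the most recent event in [t_ref - tau, t_ref].

    events: iterable of (x, y, t) tuples for a single polarity.
    Pixels with no event in the window keep the value t_ref - tau.
    """
    S = np.full(shape, t_ref - tau, dtype=np.float64)
    for x, y, t in events:
        if t_ref - tau <= t <= t_ref:
            S[y, x] = max(S[y, x], t)
    return S

def surface_matching_cost(S, S_shifted, flow, dt):
    """Timestamp consistency between S (at t_ref) and S_shifted (at t_ref + dt).

    flow: (H, W, 2) candidate flow in pixels per dt. If the flow is correct, the
    structure at pixel x in S reappears at x + flow in S_shifted with timestamps
    larger by roughly dt, so the residual below should be close to zero.
    """
    H, W = S.shape
    ys, xs = np.mgrid[0:H, 0:W]
    # nearest-neighbour sampling for brevity; a differentiable (bilinear) warp
    # would be needed for gradient-based optimization
    xw = np.clip(np.round(xs - flow[..., 0]).astype(int), 0, W - 1)
    yw = np.clip(np.round(ys - flow[..., 1]).astype(int), 0, H - 1)
    residual = S_shifted - S[yw, xw] - dt
    return np.mean(residual ** 2)
```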
2. Related Work
3. Methodology
3.1. Event Representation
3.2. Time Surface
3.3. Surface Matching Loss
3.4. Comparison with Contrast Maximization
4. Experiment
4.1. Datasets
- ESIM
- An Open Event Camera Simulator (ESIM) [28] can accurately and efficiently simulate an event-based camera, outputting a set of events together with ground-truth optical flow for any camera motion and scene.
- MVSEC
- The Multi-Vehicle Stereo Event Camera Dataset (MVSEC) [29] contains outdoor driving scenes, by day and by night, and indoor scenes captured from a flying drone. The event-based camera is an mDAVIS-346B with a resolution of 346 × 260, which can simultaneously capture conventional frames. The dataset provides ground-truth optical flow generated from LiDAR depth maps and pose information from the inertial measurement unit (IMU); a sketch of how flow can be derived from depth and camera motion is given after this list.
- HACD
- The HVGA ATIS Corner Dataset (HACD) [25] consists of recordings of planar patterns, originally built to evaluate corner detectors. The sequences were taken by an Asynchronous Time-based Image Sensor (ATIS) [30] with a resolution of 480 × 360. The dataset also provides the positions of markers at the four corners of the poster every 10 ms. From these, the homography of the plane can be computed and the ground-truth optical flow at any point on the poster can be obtained (see the sketch after this list).
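Both ground-truth pipelines above lend themselves to short sketches. For MVSEC, the flow induced by known camera motion and scene depth can be written with the classical motion-field model; the code below is a generic textbook version (camera-frame velocities and pinhole intrinsics fx, fy, cx, cy are assumed), not the dataset's published generation code.

```python
import numpy as np

def motion_field(depth, v_lin, v_ang, fx, fy, cx, cy):
    """Per-pixel image motion (pixels per unit time) from camera motion and depth.

    depth: (H, W) metric depth map (assumed strictly positive);
    v_lin, v_ang: camera linear and angular velocity in the camera frame.
    Textbook motion-field model, not MVSEC's actual generation pipeline.
    """
    H, W = depth.shape
    ys, xs = np.mgrid[0:H, 0:W]
    x = (xs - cx) / fx          # normalized image coordinates
    y = (ys - cy) / fy
    Tx, Ty, Tz = v_lin
    wx, wy, wz = v_ang
    Z = depth
    u = (x * Tz - Tx) / Z + wx * x * y - wy * (1 + x**2) + wz * y
    v = (y * Tz - Ty) / Z + wx * (1 + y**2) - wy * x * y - wz * x
    return np.stack([u * fx, v * fy], axis=-1)  # back to pixel units
```

For HACD, a homography fitted to the four tracked corners at two nearby timestamps maps any poster point from one image to the other, and the displacement divided by the time difference gives the ground-truth flow. A sketch using OpenCV (our reconstruction of the procedure, with assumed array shapes, not the dataset's official tooling):

```python
import numpy as np
import cv2

def gt_flow_from_corners(corners_t0, corners_t1, points_t0, dt):
    """Ground-truth flow (pixels per unit time) for image points on the planar poster.

    corners_t0, corners_t1: (4, 2) marker positions at two timestamps dt apart.
    points_t0: (N, 2) query points at the first timestamp, assumed on the poster.
    """
    # homography that maps the image at t0 to the image at t1, fitted from corners
    H01, _ = cv2.findHomography(corners_t0.astype(np.float32),
                                corners_t1.astype(np.float32))
    pts = points_t0.reshape(-1, 1, 2).astype(np.float32)
    points_t1 = cv2.perspectiveTransform(pts, H01).reshape(-1, 2)
    return (points_t1 - points_t0) / dt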
4.2. Loss Landscape
- Variance The variance loss given by Equation (11). Events are warped by an optical flow shared by all pixels and accumulated into an IWE whose variance is measured (see the sketch after this list). The duration of the events used and the time interval of the optical flow were set to match the conditions of our method.
- Surface Matching Loss The proposed loss given by Equation (7), computed from the difference between the time surface and the shifted time surface warped by the optical flow. Its sign is inverted to match the variance.
- Results and discussion
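The IWE-variance objective referenced above can be sketched as follows: all events are transported to a common reference time with a single candidate flow, accumulated into an image of warped events with bilinear voting, and the variance of that image is returned. This is a generic contrast-maximization sketch in the spirit of [10]; the exact accumulation and normalization of Equation (11) are not reproduced here.

```python
import numpy as np

def iwe_variance(events_xy, events_t, flow, t_ref, shape):
    """Contrast (variance) of the image of warped events for one global flow.

    events_xy: (N, 2) pixel coordinates; events_t: (N,) timestamps;
    flow: (2,) candidate flow in pixels per unit time, applied to all events.
    """
    H, W = shape
    # transport every event to the reference time along the candidate flow
    warped = events_xy + (t_ref - events_t)[:, None] * flow[None, :]
    iwe = np.zeros((H, W), dtype=np.float64)
    x0 = np.floor(warped[:, 0]).astype(int)
    y0 = np.floor(warped[:, 1]).astype(int)
    fx = warped[:, 0] - x0
    fy = warped[:, 1] - y0
    # bilinear voting into the four neighbouring pixels
    for dx, dy, w in ((0, 0, (1 - fx) * (1 - fy)), (1, 0, fx * (1 - fy)),
                      (0, 1, (1 - fx) * fy), (1, 1, fx * fy)):
        xs, ys = x0 + dx, y0 + dy
        ok = (xs >= 0) & (xs < W) & (ys >= 0) & (ys < H)
        np.add.at(iwe, (ys[ok], xs[ok]), w[ok])
    return iwe.var()
```

Sweeping `flow` over a grid of candidate values and plotting the returned variance (and, analogously, the surface-matching cost) reproduces the kind of loss landscapes visualized in this section.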
4.3. Dense Optical Flow Estimation
- Reconstruction The method that estimates optical flow simultaneously with intensity reconstruction [9]. A large number of optical flow and image parameters are optimized sequentially, under a temporal smoothness assumption, in a sliding window containing the events with a duration of 128.
- Variance The method that maximizes the variance of the IWE [10]. In [10], the optical flow parameters are shared within patches; to make the conditions uniform, dense optical flow estimation is performed by adding an L1 smoothness regularization, with events warped by the optical flow at each pixel. The loss function is optimized with the primal-dual algorithm, in the same way as the TV-L1 method [31]. The duration of the events used was set to match our method.
- Surface Matching Our proposed method, with the Surface Matching Loss combined with the same L1 smoothness regularization (a simplified form of the combined objective is sketched after this list).
- Qualitative results
- Quantitative results
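For the dense estimation, the data term and the L1 smoothness regularizer combine into a single objective over the per-pixel flow field. The sketch below writes that combined objective with a simple nearest-neighbour warp and a forward-difference smoothness term; the paper minimizes such a functional with the primal-dual algorithm [31], whereas this surrogate could in principle be handed to a derivative-free optimizer on a small patch, so it illustrates only the structure of the problem, not the actual solver.

```python
import numpy as np

def dense_flow_objective(flow, S, S_shifted, dt, lam=0.1):
    """Surface-matching data term plus L1 flow smoothness over a dense flow field.

    flow: (H, W, 2) candidate dense flow in pixels per dt. The data term compares
    each timestamp of the shifted time surface with the previous surface sampled
    at the flow-displaced position; the regularizer penalizes forward differences
    of both flow channels.
    """
    H, W = S.shape
    ys, xs = np.mgrid[0:H, 0:W]
    xw = np.clip(np.round(xs - flow[..., 0]).astype(int), 0, W - 1)
    yw = np.clip(np.round(ys - flow[..., 1]).astype(int), 0, H - 1)
    data = np.mean((S_shifted - S[yw, xw] - dt) ** 2)
    smooth = np.abs(np.diff(flow, axis=0)).sum() + np.abs(np.diff(flow, axis=1)).sum()
    return data + lam * smooth
```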
4.4. Study on Hyperparameters
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
1. Gallego, G.; Delbruck, T.; Orchard, G.; Bartolozzi, C.; Taba, B.; Censi, A.; Leutenegger, S.; Davison, A.; Conradt, J.; Daniilidis, K.; et al. Event-based Vision: A Survey. arXiv 2019, arXiv:1904.08405.
2. Kim, H.; Handa, A.; Benosman, R.; Ieng, S.H.; Davison, A. Simultaneous Mosaicing and Tracking with an Event Camera. In Proceedings of the British Machine Vision Conference (BMVC), Nottingham, UK, 1–5 September 2014.
3. Kim, H.; Leutenegger, S.; Davison, A.J. Real-time 3D reconstruction and 6-DoF tracking with an event camera. In Proceedings of the European Conference on Computer Vision (ECCV), Amsterdam, The Netherlands, 11–14 October 2016; pp. 349–364.
4. Rebecq, H.; Ranftl, R.; Koltun, V.; Scaramuzza, D. Events-to-Video: Bringing Modern Computer Vision to Event Cameras. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 16–20 June 2019; pp. 3857–3866.
5. Lucas, B.D.; Kanade, T. An Iterative Image Registration Technique with an Application to Stereo Vision. 1981. Available online: https://ri.cmu.edu/pub_files/pub3/lucas_bruce_d_1981_2/lucas_bruce_d_1981_2.pdf (accessed on 20 January 2021).
6. Horn, B.K.B.; Schunck, B.G. Determining Optical Flow. Artif. Intell. 1981, 17, 185–203.
7. Benosman, R.; Ieng, S.H.; Clercq, C.; Bartolozzi, C.; Srinivasan, M. Asynchronous frameless event-based optical flow. Neural Netw. 2012, 27, 32–37.
8. Brosch, T.; Tschechne, S.; Neumann, H. On event-based optical flow detection. Front. Neurosci. 2015.
9. Bardow, P.; Davison, A.J.; Leutenegger, S. Simultaneous Optical Flow and Intensity Estimation from an Event Camera. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 884–892.
10. Gallego, G.; Rebecq, H.; Scaramuzza, D. A Unifying Contrast Maximization Framework for Event Cameras, with Applications to Motion, Depth, and Optical Flow Estimation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–22 June 2018; pp. 3867–3876.
11. Zhu, A.Z.; Yuan, L.; Chaney, K.; Daniilidis, K. Unsupervised Event-based Learning of Optical Flow, Depth, and Egomotion. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 16–20 June 2019; pp. 989–997.
12. Gallego, G.; Gehrig, M.; Scaramuzza, D. Focus Is All You Need: Loss Functions for Event-Based Vision. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 16–20 June 2019; pp. 12272–12281.
13. Mueggler, E.; Rebecq, H.; Gallego, G.; Delbruck, T.; Scaramuzza, D. The Event-Camera Dataset and Simulator: Event-based Data for Pose Estimation, Visual Odometry, and SLAM. Int. J. Robot. Res. 2017, 36, 142–149.
14. Brandli, C.; Berner, R.; Yang, M.; Liu, S.C.; Delbruck, T. A 240 × 180 130 dB 3 μs latency global shutter spatiotemporal vision sensor. IEEE J. Solid State Circuits 2014, 49, 2333–2341.
15. Benosman, R.; Clercq, C.; Lagorce, X.; Sio-Hoi, I.; Bartolozzi, C. Event-Based Visual Flow. IEEE Trans. Neural Netw. Learn. Syst. 2014, 25, 407–417.
16. Rueckauer, B.; Delbruck, T. Evaluation of Event-Based Algorithms for Optical Flow with Ground-Truth from Inertial Measurement Sensor. Front. Neurosci. 2016, 10.
17. Zhu, A.; Yuan, L.; Chaney, K.; Daniilidis, K. EV-FlowNet: Self-Supervised Optical Flow Estimation for Event-based Cameras. Robot. Sci. Syst. 2018.
18. Ye, C.; Mitrokhin, A.; Fermüller, C.; Yorke, J.A.; Aloimonos, Y. Unsupervised Learning of Dense Optical Flow, Depth and Egomotion from Sparse Event Data. arXiv 2018, arXiv:1809.08625.
19. Gallego, G.; Scaramuzza, D. Accurate Angular Velocity Estimation With an Event Camera. IEEE Robot. Autom. Lett. 2017, 2, 632–639.
20. Zhu, A.Z.; Atanasov, N.; Daniilidis, K. Event-based feature tracking with probabilistic data association. In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), Singapore, 29 May–3 June 2017; pp. 4465–4470.
21. Stoffregen, T.; Kleeman, L. Simultaneous Optical Flow and Segmentation (SOFAS) using Dynamic Vision Sensor. In Proceedings of the Australasian Conference on Robotics and Automation (ACRA), Sydney, Australia, 11–13 December 2017; pp. 52–61.
22. Mitrokhin, A.; Fermuller, C.; Parameshwara, C.; Aloimonos, Y. Event-Based Moving Object Detection and Tracking. In Proceedings of the IEEE International Conference on Intelligent Robots and Systems (IROS), Madrid, Spain, 1–5 October 2018; pp. 1–9.
23. Stoffregen, T.; Kleeman, L. Event Cameras, Contrast Maximization and Reward Functions: An Analysis. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019.
24. Lagorce, X.; Orchard, G.; Galluppi, F.; Shi, B.E.; Benosman, R.B. HOTS: A Hierarchy of Event-Based Time-Surfaces for Pattern Recognition. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 1346–1359.
25. Manderscheid, J.; Sironi, A.; Bourdis, N.; Migliore, D.; Lepetit, V. Speed Invariant Time Surface for Learning to Detect Corner Points With Event-Based Cameras. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 16–20 June 2019; pp. 10237–10246.
26. Zach, C.; Pock, T.; Bischof, H. A Duality Based Approach for Realtime TV-L1 Optical Flow. Pattern Recognit. 2007, 214–223.
27. Sánchez Pérez, J.; Meinhardt-Llopis, E.; Facciolo, G. TV-L1 Optical Flow Estimation. Image Process. Line 2013, 3, 137–150.
28. Rebecq, H.; Gehrig, D.; Scaramuzza, D. ESIM: An Open Event Camera Simulator. Conf. Robot. Learn. 2018, 87, 969–982.
29. Zhu, A.Z.; Thakur, D.; Ozaslan, T.; Pfrommer, B.; Kumar, V.; Daniilidis, K. The Multi Vehicle Stereo Event Camera Dataset: An Event Camera Dataset for 3D Perception. IEEE Robot. Autom. Lett. 2018, 3, 2032–2039.
30. Posch, C.; Serrano-Gotarredona, T.; Linares-Barranco, B.; Delbruck, T. Retinomorphic event-based vision sensors: Bioinspired cameras with spiking output. Proc. IEEE 2014, 102, 1470–1484.
31. Chambolle, A.; Pock, T. A first-order primal-dual algorithm for convex problems with applications to imaging. J. Math. Imaging Vis. 2011, 40, 120–145.
Average end-point error for each sequence (lower is better):

| Method | Day 1 | Day 2 | Night 1 | Night 2 | Night 3 | Flying 1 | Flying 2 | Flying 3 | Guernica | Paris | Graffiti |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Reconstruction [9] | 0.267 | 0.307 | 0.283 | 0.313 | 0.365 | 0.348 | 0.525 | 0.468 | 1.99 | 2.79 | 1.91 |
| Variance [10] | 0.479 | 0.479 | 0.418 | 0.368 | 0.438 | 0.351 | 0.525 | 0.469 | 4.01 | 3.11 | 1.90 |
| Surface Matching | 0.257 | 0.350 | 0.334 | 0.363 | 0.356 | 0.278 | 0.422 | 0.377 | 1.50 | 2.30 | 1.36 |
Hyperparameter study: average end-point error for combinations of the shift interval Δt (rows) and the time-surface width τ (columns):

| Δt \ τ | 25 ms | 50 ms | 75 ms |
|---|---|---|---|
| 2.5 ms | 0.95 | 1.14 | - |
| 5.0 ms | 1.34 | 1.50 | 1.61 |
| 7.5 ms | - | 1.74 | 1.81 |
Nagata, J.; Sekikawa, Y.; Aoki, Y. Optical Flow Estimation by Matching Time Surface with Event-Based Cameras. Sensors 2021, 21, 1150. https://doi.org/10.3390/s21041150