A Dataset of Annotated Omnidirectional Videos for Distancing Applications
Figure 1. A scene acquired by a dual-lens 360° camera (Gear 360 by Samsung, Seoul, Korea) before (top) and after (bottom) stitching. The bottom image is the equirectangular projection of the spherical acquisition. “Quattro Canti”, Palermo.
Figure 2. Two frames from the videos of the CVIP360 dataset: outdoors (top) and indoors (bottom).
Figure 3. Projection of a circle onto an equirectangular image, when the circle lies on a plane parallel to the camera’s horizontal plane (top) and when the two planes are not parallel (bottom). Note that in the second case the points of the circle map to a sinusoid-like curve in the equirectangular image.
Figure 4. Relationship between the markers’ distance to the camera (in meters) and the corresponding y-coordinate in the equirectangular image, for different camera heights (colored curves), in outdoor (top) and indoor (bottom) videos. Points at distances of 1, 2, 3, … meters correspond to the markers’ locations; points between two markers are linearly interpolated. Points at “zero” distance refer to the pixels of the last row of the image (y = 2160).
Figure 5. Annotation of pedestrians in the equirectangular image (top) and bird’s-eye view of the same scene (bottom). For each pedestrian, the image shows the corresponding distance on the ground, d, and the pan angle, β.
Figure 6. Geometrical representation of a subject acquired by a 360° device: d is the distance between the camera and the subject, h_C is the camera height, and α is the angle between the line joining the camera center to the horizon and the segment joining the camera center to the point where the subject touches the ground.
Figure 7. How to compute α from an equirectangular image: α is proportional to the length of the segment joining the horizon to the lower bound of the bounding box, as shown in Equation (3).
Figure 8. Experimental results: mean absolute error versus distance to the camera, for outdoor (top) and indoor (bottom) videos. Note that the maximum measurable distance is 6 m in indoor videos and 10 m in outdoor videos. The graphs also report the overall values.
Figure 9. Experimental results: mean relative error versus distance to the camera, for outdoor (top) and indoor (bottom) videos. Note that the maximum measurable distance is 6 m in indoor videos and 10 m in outdoor videos. The graphs also report the overall values.
Figure 10. Experimental results: sensitivity of the MAE and MRE to the camera height. The x-axis is the percentage deviation between the camera height set by the user in Equation (2) and the true value.
Figure 11. The method is less accurate when pedestrians approach the borders of the equirectangular image, as in this example (top). The bird’s-eye view of the scene (bottom) shows the positions of all the persons in the camera reference system; for the person close to the borders, both the correct (ground truth, green) and the estimated (our method, black) positions are shown. In the other cases, the error is negligible with respect to the scale of the graph.
Figure 12. Some representative cases where the method cannot be applied: a person jumping high (top); a person going upstairs (bottom).
Abstract
1. Introduction
2. Omnidirectional Cameras
3. Related Work
3.1. 360° Videos Datasets
3.2. Depth Estimation
360° Datasets for Depth Estimation
4. CVIP360 Dataset
- We manually labeled all the pedestrians’ locations in the equirectangular frames by using bounding boxes.
- We assigned to each pixel of the equirectangular frames a “real” distance to the camera’s location, according to the methodology described below in this section. In practice, we annotated only the portion of the image below the horizon, i.e., the region of the ground where people walk (see the sketch after this list).
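As illustrated by Figure 4, the per-pixel distance annotation is built from markers placed at 1, 2, 3, … meters from the camera: each marker gives a known (image row, distance) pair, and rows between two markers are linearly interpolated. The snippet below is a minimal sketch of such a row-to-distance lookup; the marker rows are made-up example values and the function name is ours, not part of the dataset tooling.

```python
import numpy as np

# Hypothetical (row, distance) pairs for one video: each marker placed at
# 1, 2, 3, ... meters from the camera is observed at some image row y
# (frame height 2160 px; y = 2160 is the last row, i.e., "zero" distance).
marker_rows_px = np.array([2160, 1700, 1450, 1330, 1260, 1215, 1185])  # example values
marker_dist_m  = np.array([0.0,  1.0,  2.0,  3.0,  4.0,  5.0,  6.0])

def row_to_distance(y_px: float) -> float:
    """Map an image row (below the horizon) to a ground distance in meters
    by linear interpolation between consecutive markers, as in Figure 4."""
    # np.interp requires increasing x values, so pass the arrays reversed.
    return float(np.interp(y_px, marker_rows_px[::-1], marker_dist_m[::-1]))

# Example: distance associated with the bottom row of a pedestrian's bounding box.
print(row_to_distance(1500.0))   # between 1 m and 2 m for these sample values
```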
4.1. Distancing Annotation
5. Distance Estimation Algorithm
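As summarized in the captions of Figures 6 and 7, a subject standing on the ground is seen at an angle α below the horizon, and its ground distance d follows from the camera height h_C via the right-triangle relation d = h_C / tan(α), with α proportional to the pixel offset between the horizon row and the lower bound of the subject’s bounding box. The sketch below illustrates this computation under the assumption of an ideal equirectangular projection spanning 180° vertically; the exact proportionality constant and the numbering of Equations (2) and (3) are assumptions for illustration, not reproduced verbatim from the paper.

```python
import math

IMG_H = 2160  # equirectangular frame height in the CVIP360 videos (pixels)

def distance_from_bbox(y_bottom_px: float, y_horizon_px: float,
                       camera_height_m: float, img_h: int = IMG_H) -> float:
    """Estimate the ground distance (in meters) of a subject from a 360° camera.

    Assumes an ideal equirectangular projection spanning 180° vertically, so
    each pixel row corresponds to pi / img_h radians of elevation:
        alpha = pi * (y_bottom - y_horizon) / img_h      (cf. Equation (3))
        d     = camera_height / tan(alpha)               (cf. Equation (2))
    y_bottom_px is the lower bound of the subject's bounding box and
    y_horizon_px is the image row of the horizon.
    """
    alpha = math.pi * (y_bottom_px - y_horizon_px) / img_h
    if alpha <= 0.0:
        raise ValueError("The subject's feet must lie below the horizon row.")
    return camera_height_m / math.tan(alpha)

# Example: camera at 1.5 m, horizon at mid-height, feet 200 px below the horizon.
print(distance_from_bbox(y_bottom_px=1280.0, y_horizon_px=1080.0,
                         camera_height_m=1.5))   # ~5.0 m
```

Note that d scales linearly with h_C for a fixed α, which is consistent with the sensitivity analysis of Figure 10: a relative error on the camera height set by the user translates into a comparable relative error on the estimated distance.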
6. Results
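Figures 8 and 9 report the mean absolute error (MAE) and mean relative error (MRE) of the estimated distances as a function of the true distance to the camera (up to 6 m indoors and 10 m outdoors). The snippet below is a minimal sketch of how such per-distance error curves can be computed from paired ground-truth and estimated distances; grouping the samples by rounded ground-truth distance is our assumption about how the curves are binned.

```python
import numpy as np

def per_distance_errors(d_true, d_est, max_dist_m):
    """Compute MAE (meters) and MRE (%) grouped by integer ground-truth distance,
    e.g., up to 6 m for indoor videos and 10 m for outdoor videos."""
    d_true, d_est = np.asarray(d_true, float), np.asarray(d_est, float)
    abs_err = np.abs(d_est - d_true)
    rel_err = 100.0 * abs_err / d_true
    mae, mre = {}, {}
    for d in range(1, max_dist_m + 1):
        mask = np.rint(d_true) == d          # samples whose true distance rounds to d meters
        if mask.any():
            mae[d] = float(abs_err[mask].mean())
            mre[d] = float(rel_err[mask].mean())
    overall = (float(abs_err.mean()), float(rel_err.mean()))
    return mae, mre, overall

# Example with made-up measurements (indoor-style range, max 6 m):
gt  = [1.0, 2.0, 3.1, 4.0, 5.9]
est = [1.1, 1.9, 3.4, 4.3, 5.5]
print(per_distance_errors(gt, est, max_dist_m=6))
```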
Limitations and Strengths
- The lens distortion is at its maximum at the borders, so the assumption that the projection is spherical may not hold;
- Pixels near the borders of the equirectangular image are reconstructed by interpolation;
- After correcting the distortion to enforce the horizontality hypothesis, as explained in Section 4.1, a residual misalignment between the markers at the center and those at the borders results in different pixel positions (see the sketch after this list).
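Since the estimates degrade for pedestrians close to the left and right borders of the equirectangular frame (Figure 11), a simple practical mitigation is to flag such detections. The sketch below computes a pan angle β from the horizontal center of a bounding box, using one possible angular convention, and marks detections lying within a configurable margin of the borders; the margin value and function names are illustrative assumptions, not part of the original method.

```python
FRAME_W = 3840  # equirectangular frame width in the CVIP360 videos (pixels)

def pan_angle_deg(x_center_px: float, frame_w: int = FRAME_W) -> float:
    """Pan angle beta in degrees, mapping column 0 to -180° and the last
    column to +180°, with the image center at 0° (one possible convention)."""
    return (x_center_px / frame_w) * 360.0 - 180.0

def near_border(x_center_px: float, frame_w: int = FRAME_W,
                margin_frac: float = 0.05) -> bool:
    """Flag detections whose horizontal center lies within margin_frac of the
    left or right border, where the distance estimate is least reliable."""
    margin = margin_frac * frame_w
    return x_center_px < margin or x_center_px > frame_w - margin

# Example: a pedestrian detected near the right border of a 3840-px-wide frame.
x = 3800.0
print(pan_angle_deg(x), near_border(x))   # ~176.3°, True
```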
7. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Rhee, T.; Petikam, L.; Allen, B.; Chalmers, A. Mr360: Mixed reality rendering for 360 panoramic videos. IEEE Trans. Vis. Comput. Graph. 2017, 23, 1379–1388. [Google Scholar] [CrossRef] [PubMed]
- Teo, T.; Lawrence, L.; Lee, G.A.; Billinghurst, M.; Adcock, M. Mixed reality remote collaboration combining 360 video and 3d reconstruction. In Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems, Glasgow, UK, 4–9 May 2019; pp. 1–14. [Google Scholar]
- Gaspar, J.; Winters, N.; Santos-Victor, J. Vision-based navigation and environmental representations with an omnidirectional camera. IEEE Trans. Robot. Autom. 2000, 16, 890–898. [Google Scholar] [CrossRef]
- Rituerto, A.; Puig, L.; Guerrero, J.J. Visual slam with an omnidirectional camera. In Proceedings of the 2010 20th International Conference on Pattern Recognition, Istanbul, Turkey, 23–26 August 2010; pp. 348–351. [Google Scholar]
- Monteleone, V.; Lo Presti, L.; La Cascia, M. Pedestrian tracking in 360 video by virtual PTZ cameras. In Proceedings of the 2018 IEEE 4th International Forum on Research and Technology for Society and Industry (IEEE RTSI), Palermo, Italy, 10–13 September 2018; pp. 1–6. [Google Scholar]
- Monteleone, V.; Lo Presti, L.; La Cascia, M. Particle filtering for tracking in 360 degrees videos using virtual PTZ cameras. In Proceedings of the International Conference on Image Analysis and Processing, Trento, Italy, 9–13 September 2019; pp. 71–81. [Google Scholar]
- Lo Presti, L.; La Cascia, M. Deep Motion Model for Pedestrian Tracking in 360 Degrees Videos. In Proceedings of the International Conference on Image Analysis and Processing, Trento, Italy, 9–13 September 2019; pp. 36–47. [Google Scholar]
- Nayar, S.K. Catadioptric omnidirectional camera. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, San Juan, PR, USA, 17–19 June 1997; pp. 482–488. [Google Scholar]
- Abdelhamid, H.; Weng, D.; Chen, C.; Abdelkarim, H.; Mounir, Z.; Raouf, G. 360 degrees imaging systems design, implementation and evaluation. In Proceedings of the 2013 International Conference on Mechatronic Sciences, Electric Engineering and Computer (MEC), Shenyang, China, 20–22 December 2013; pp. 2034–2038. [Google Scholar]
- Corbillon, X.; Simon, G.; Devlic, A.; Chakareski, J. Viewport-adaptive navigable 360-degree video delivery. In Proceedings of the 2017 IEEE International Conference on Communications (ICC), Paris, France, 21–25 May 2017; pp. 1–7. [Google Scholar]
- Elwardy, M.; Zepernick, H.-J.; Sundstedt, V. Annotated 360-Degree Image and Video Databases: A Comprehensive Survey. In Proceedings of the 2019 13th International Conference on Signal Processing and Communication Systems (ICSPCS), Gold Coast, QLD, Australia, 16–18 December 2019. [Google Scholar]
- Coors, B.; Condurache, A.P.; Geiger, A. SphereNet: Learning spherical representations for detection and classification in omnidirectional images. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018. [Google Scholar]
- Delforouzi, A.; David, H.; Marcin, G. Deep learning for object tracking in 360 degree videos. In Proceedings of the International Conference on Computer Recognition Systems, Polanica Zdroj, Poland, 20–22 May 2019. [Google Scholar]
- Mi, T.-W.; Yang, M.-T. Comparison of tracking techniques on 360-degree videos. Appl. Sci. 2019, 9, 3336. [Google Scholar] [CrossRef] [Green Version]
- Liu, K.C.; Shen, Y.T.; Chen, L.G. Simple online and realtime tracking with spherical panoramic camera. In Proceedings of the 2018 IEEE International Conference on Consumer Electronics ICCE, Las Vegas, NV, USA, 12–14 January 2018. [Google Scholar]
- Yang, F. Using panoramic videos for multi-person localization and tracking in a 3d panoramic coordinate. In Proceedings of the ICASSP 2020—2020 IEEE International Conference on Acoustics, Speech and Signal Processing ICASSP, Barcelona, Spain, 4–8 May 2020. [Google Scholar]
- Chen, G.; St-Charles, P.-L.; Bouachir, W.; Joeisseint, T.; Bilodeau, G.-A.; Bergevin, R. Reproducible Evaluation of Pan-Tilt-Zoom Tracking, Technical Report. arXiv 2015, arXiv:1505.04502. [Google Scholar]
- Demiröz, B.E.; Ari, I.; Eroğlu, O.; Salah, A.A.; Akarun, L. Feature-based tracking on a multi-omnidirectional camera dataset. In Proceedings of the 2012 5th International Symposium on Communications, Control and Signal Processing, Rome, Italy, 2–4 May 2012. [Google Scholar]
- Yogamani, S.; Hughes, C.; Horgan, J.; Sistu, G.; Varley, P.; O’Dea, D.; Uricar, M.; Milz, S.; Simon, M.; Amende, K.; et al. Woodscape: A multi-task, multi-camera fisheye dataset for autonomous driving. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Korea, 27 October–2 November 2019. [Google Scholar]
- Adjabi, I.; Ouahabi, A.; Benzaoui, A.; Taleb-Ahmed, A. Past, present, and future of face recognition: A review. Electronics 2020, 9, 1188. [Google Scholar] [CrossRef]
- Lo Presti, L.; La Cascia, M. Boosting Hankel matrices for face emotion recognition and pain detection. Comput. Vis. Image Underst. 2017, 156, 19–33. [Google Scholar] [CrossRef]
- Lo Presti, L.; Morana, M.; La Cascia, M. A data association algorithm for people re-identification in photo sequences. In Proceedings of the 2010 IEEE International Symposium on Multimedia, Taichung, Taiwan, 13–15 December 2010; pp. 318–323. [Google Scholar]
- Li, Y.H.; Lo, I.C.; Chen, H.H. Deep Face Rectification for 360° Dual-Fisheye Cameras. IEEE Trans. Image Process. 2020, 30, 264–276. [Google Scholar] [CrossRef] [PubMed]
- Fu, J.; Alvar, S.R.; Bajic, I.; Vaughan, R. FDDB-360: Face detection in 360-degree fisheye images. In Proceedings of the 2019 IEEE Conference on Multimedia Information Processing and Retrieval (MIPR), San Jose, CA, USA, 28–30 March 2019; pp. 15–19. [Google Scholar]
- Lo Presti, L.; Sclaroff, S.; La Cascia, M. Object matching in distributed video surveillance systems by LDA-based appearance descriptors. In Proceedings of the International Conference on Image Analysis and Processing, Vietri sul Mare, Italy, 8–11 September 2009; pp. 547–557. [Google Scholar]
- Liu, Y.; Jiang, J.; Sun, J.; Bai, L.; Wang, Q. A Survey of Depth Estimation Based on Computer Vision. In Proceedings of the 2020 IEEE Fifth International Conference on Data Science in Cyberspace (DSC), Hong Kong, China, 27–30 July 2020; pp. 135–141. [Google Scholar]
- Laga, H.; Jospin, L.V.; Boussaid, F.; Bennamoun, M. A survey on deep learning techniques for stereo-based depth estimation. IEEE Trans. Pattern Anal. Mach. Intell. 2020. [Google Scholar] [CrossRef] [PubMed]
- Bhoi, A. Monocular depth estimation: A survey. arXiv 2019, arXiv:1901.09402. [Google Scholar]
- Zhao, C.; Sun, Q.; Zhang, C.; Tang, Y.; Qian, F. Monocular depth estimation based on deep learning: An overview. Sci. China Technol. Sci. 2020, 63, 1–16. [Google Scholar] [CrossRef]
- Liu, F.; Shen, C.; Lin, G.; Reid, I. Learning depth from single monocular images using deep convolutional neural fields. IEEE Trans. Pattern Anal. Mach. Intell. 2015, 38, 2024–2039. [Google Scholar] [CrossRef] [PubMed] [Green Version]
- Jiang, H.; Huang, R. High quality monocular depth estimation via a multi-scale network and a detail-preserving objective. In Proceedings of the 2019 IEEE International Conference on Image Processing ICIP, Taipei, Taiwan, 22–25 September 2019. [Google Scholar]
- Fu, H.; Gong, M.; Wang, C.; Batmanghelich, K.; Tao, D. Deep ordinal regression network for monocular depth estimation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018. [Google Scholar]
- Li, B.; Dai, Y.; He, M. Monocular depth estimation with hierarchical fusion of dilated cnns and soft-weighted-sum inference. Pattern Recognit. 2018, 83, 328–339. [Google Scholar] [CrossRef] [Green Version]
- Godard, C.; Mac Aodha, O.; Firman, M.; Brostow, G.J. Digging into self-supervised monocular depth estimation. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Korea, 27 October–2 November 2019; pp. 3828–3838. [Google Scholar]
- Wang, N.H.; Solarte, B.; Tsai, Y.H.; Chiu, W.C.; Sun, M. 360sd-net: 360 stereo depth estimation with learnable cost volume. In Proceedings of the 2020 IEEE International Conference on Robotics and Automation (ICRA), Paris, France, 31 May–31 August 2020; pp. 582–588. [Google Scholar]
- Jiang, H.; Sheng, Z.; Zhu, S.; Dong, Z.; Huang, R. Unifuse: Unidirectional fusion for 360 panorama depth estimation. IEEE Robot. Autom. Lett. 2021, 6, 1519–1526. [Google Scholar] [CrossRef]
- Wang, F.E.; Yeh, Y.H.; Sun, M.; Chiu, W.C.; Tsai, Y.H. Bifuse: Monocular 360 depth estimation via bi-projection fusion. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 462–471. [Google Scholar]
- Jin, L.; Xu, Y.; Zheng, J.; Zhang, J.; Tang, R.; Xu, S.; Gao, S. Geometric structure based and regularized depth estimation from 360 indoor imagery. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 889–898. [Google Scholar]
- Im, S.; Ha, H.; Rameau, F.; Jeon, H.G.; Choe, G.; Kweon, I.S. All-around depth from small motion with a spherical panoramic camera. In Proceedings of the European Conference on Computer Vision, Amsterdam, The Netherlands, 11–14 October 2016; pp. 156–172. [Google Scholar]
- Zioulis, N.; Karakottas, A.; Zarpalas, D.; Daras, P. Omnidepth: Dense depth estimation for indoors spherical panoramas. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 448–465. [Google Scholar]
- Zioulis, N.; Karakottas, A.; Zarpalas, D.; Alvarez, F.; Daras, P. Spherical view synthesis for self-supervised 360 depth estimation. In Proceedings of the 2019 International Conference on 3D Vision (3DV), Quebec City, QC, Canada, 16–19 September 2019; pp. 690–699. [Google Scholar]
- Chang, A.; Dai, A.; Funkhouser, T.; Halber, M.; Niessner, M.; Savva, M.; Zhang, Y. Matterport3d: Learning from rgb-d data in indoor environments. arXiv 2017, arXiv:1709.06158. [Google Scholar]
- Armeni, I.; Sax, S.; Zamir, A.R.; Savarese, S. Joint 2d-3d-semantic data for indoor scene understanding. arXiv 2017, arXiv:1702.01105. [Google Scholar]
- Handa, A.; Patraucean, V.; Stent, S.; Cipolla, R. Scenenet: An annotated generator for indoor scene understanding. In Proceedings of the 2016 IEEE International Conference on Robotics and Automation, Stockholm, Sweden, 16–21 May 2016; pp. 5737–5743. [Google Scholar]
- Song, S.; Yu, F.; Zeng, A.; Chang, A.X.; Savva, M.; Funkhouser, T. Semantic scene completion from a single depth image. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017. [Google Scholar]
- Wang, F.E.; Hu, H.N.; Cheng, H.T.; Lin, J.T.; Yang, S.T.; Shih, M.L.; Sun, M. Self-Supervised Learning of Depth and Camera Motion from 360° Videos. arXiv 2018, arXiv:1811.05304. [Google Scholar]
- Available online: https://drive.google.com/drive/folders/1LMPW3HJHtqMGcIFXFo74x3t0Kh2REuoS?usp=sharing (accessed on 20 August 2021).
- Available online: https://buy.garmin.com/en-US/US/p/562010/pn/010-01743-00 (accessed on 20 August 2021).
| Name | Year | N° Videos | N° Frames | N° Annotations | Resolution | Availability |
|---|---|---|---|---|---|---|
| Delforouzi et al. | 2019 | 14 | n/a | n/a | 720 × 250 | Private |
| Mi et al. | 2019 | 9 | 3601 | 3601 | 1920 × 1080 | Public |
| Liu et al. | 2018 | 12 | 4303 | n/a | 1280 × 720 | Private |
| Yang et al. | 2020 | n/a | 1800 | n/a | n/a | Private |
| Chen et al. | 2015 | 3 | 3182 | 16,782 | 640 × 480 | Public |
| CVIP360 (ours) | 2021 | 16 | 17,000 | 50,000 | 3840 × 2160 | Public |
| Name | N° 360° Images | Synthetic/Real | Environment | Pedestrians |
|---|---|---|---|---|
| Matterport3D | 10,800 | Real | Indoors | No |
| Stanford2D3D | 1413 | Real | Indoors | No |
| 3D60 | 23,524 | Real/Synthetic | Indoors | No |
| PanoSUNCG | 25,000 | Synthetic | Indoors | No |
| CVIP360 (ours) | 17,000 | Real | Indoors/Outdoors | Yes |
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).