Enhancing Robotic Perception through Synchronized Simulation and Physical Common-Sense Reasoning
Figure 1. An instance of the CORTEX architecture with a double connection to the robot's body and a simulator. The working memory is instantiated as a directed graph (see details of the graph in Figure 2).

Figure 2. Example DSR state. The arm controller writes camera readings as an attribute of the hand_camera node (top). Position estimations of the cubes (middle) are represented as RT edges from hand_camera, inserted by the scene-estimator agent. Virtual_RT edges (named V_RT in this figure) represent geometric transformations from the origin (world node, bottom) to every cube and are provided by the simulation-handler agent.

Figure 3. Upon encountering state (a), the robot uses RGBD information to derive estimation (b), which proposes poses where the cubes seem to hover above the table. After applying simulation physics, scene (c) is produced, rectifying the initial misperceptions.

Figure 4. Possible scenarios for recalibration. Shown in red are the cube positions reported by the perceptual system ($C^e$) and in white those previously inserted in the simulator ($C^s$). (a,b) represent cases of systematic error, where a correction of the camera position explains the discrepancy. (c) shows an unsystematic error, where some cubes moved, but not in a coordinated way. (d) presents the case where the cubes were moved in a coordinated way, which is indistinguishable for the system from cases (a,b).

Figure 5. Results of the recalibration after an undesired change in camera position. The blue line represents the mean across trials with the 95% confidence interval, showing how the system can find a low-error solution despite starting from highly erroneous guesses.

Figure 6. Values for translation (x, y, z) and rotation (roll, pitch, yaw) from the arm tip to the camera found across trials.

Figure 7. Overlapped arm positions used to evaluate the consistency of the estimates generated by the proposed camera positions in the calibration phase.

Figure 8. Self-calibration robustness. (a) Average error as a function of the number of cubes used for calibration. (b) Mean error as a function of the cube configuration used for calibration. The dotted line in both figures represents the error using the transformation before calibration.

Figure 9. Cups and balls experiment. (a) shows the initial configuration. When a human hand is detected for the first time, a new node is inserted and two spheres representing the fingertips are placed in the simulation (b). Grasp detection can be seen in (c,d), where the WM has a grasping edge from the hand to the box being manipulated. In (e), the box containing the cube is placed over the table's edge, and the simulator shows the cube falling to the ground. (f) shows the final state of the experiment.

Figure 10. Full-tower movement. The system keeps the final state well represented in (a) until the experimenter moves both towers out of place (b). These new positions trigger a surprise signal, since there is no match between perception and simulation (c). Consulting the working memory, the system tries to explain the situation by placing cubes four and five below cubes three and six, respectively (d). This modification results in a stable state, so it is taken as the explanation of the situation.

Figure 11. Correction of errors in the simulation. While the arm was placing one cube on top of the other (a), a positioning error causes the one below to shoot out (b). When this happens and the cube is no longer being grasped, the simulation shows it falling onto the table, but this position does not match the perception and triggers a surprise signal (c). Again, thanks to the semantic information, the cube is placed under the other and a stable configuration is reached (d).

Figure 12. Tower disassembly. The robot has already formed the first tower, placing cube three on top of cube four, but before it can go for the second, the experimenter places cube three back on the table (a). Perceiving cube three in this new location triggers the surprise signal, and since this cube was on top of four, the system places cube four underneath it to test whether that resolves the discrepancy (b). As can be seen in (c), it is now cube four that generates the surprise, since placing it underneath three puts its position in disagreement with the perceived one. In reaction, the system erases the edge that connects the two cubes, returning to a consistent state between the perception and the model (d).

Figure 13. Three-cube tower. The robot has already finished assembling the two towers (a), and the experimenter places cube six on top of cube three (b). Upon detecting this cube out of its expected position, and knowing its relation to cube five, the system attempts to explain the new position by placing cube five underneath (c). As in the previous test, the presence of cube five there disagrees with the perception; it is therefore positioned where it is perceived, and the edge that connects it with cube six is deleted (d). The state in which the simulation is left corresponds perfectly to reality, but the working memory does not fully capture the relationships between the cubes. In this state, the robot can pick up any of the cubes, since the positions it holds for them are correct, but it will not be able to correct the simulation after some particular perturbations (such as moving the whole three-cube tower as a unit).
Abstract
1. Introduction
2. The CORTEX Architecture
3. The Benefits of Embedded Simulation
4. Experimental Setup
- Arm controller: This agent operates in real time, reading the gripper pose from the Kinova arm and inserting it into the working memory as a spatial transformation originating from the arm base. Additionally, it injects the raw RGBD data stream acquired from the RealSense camera as an attribute of the hand_camera node.
- Scene estimator: Responsible for detecting AprilTags, this agent inserts model cubes into the working memory. Each cube is linked to the camera node through an RT edge carrying the estimated relative pose.
- Simulation handler: Tasked with bidirectional synchronization between the working memory and the simulation, this agent reads cube poses from the graph to update the simulation and publishes the simulator's poses back as Virtual_RT edges. These edges are kept out of the RT tree to avoid inducing loops and are treated as symbolic edges representing an opinion from the simulator. (A minimal sketch of these three interactions follows the list.)
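The sketch below illustrates how these three agents could touch a DSR-like working memory. The Graph and Edge classes and their methods are hypothetical stand-ins for the real DSR API; only the node and edge names (hand_camera, world, RT, Virtual_RT) follow Figure 2, and the poses are made up.

```python
# Hypothetical, minimal model of the DSR working memory; not the real API.
from dataclasses import dataclass, field

@dataclass
class Edge:
    kind: str                    # "RT", "Virtual_RT", or symbolic ("grasping")
    src: str                     # source node name
    dst: str                     # destination node name
    pose: list | None = None     # 6-DoF transform for geometric edges

@dataclass
class Graph:
    nodes: dict = field(default_factory=dict)
    edges: list = field(default_factory=list)

    def insert_edge(self, edge: Edge) -> None:
        self.edges.append(edge)

wm = Graph(nodes={"world": {}, "arm": {}, "hand_camera": {}})

# Arm controller: gripper pose as a transformation from the arm base,
# plus the raw RGBD stream as an attribute of the hand_camera node.
wm.insert_edge(Edge("RT", "arm", "hand_camera", pose=[0.1, 0.0, 0.4, 0, 0, 0]))
wm.nodes["hand_camera"]["rgbd"] = b"\x00..."   # placeholder for a raw frame

# Scene estimator: each detected AprilTag becomes a cube node linked to the
# camera by an RT edge carrying the estimated relative pose.
wm.nodes["cube_3"] = {}
wm.insert_edge(Edge("RT", "hand_camera", "cube_3", pose=[0.0, 0.05, 0.3, 0, 0, 0]))

# Simulation handler: the simulator's opinion is published as a Virtual_RT
# edge from world to the cube, kept outside the RT tree to avoid loops.
wm.insert_edge(Edge("Virtual_RT", "world", "cube_3", pose=[0.3, 0.1, 0.05, 0, 0, 0]))
```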
4.1. Model-Based Perception
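As Figure 3 shows, raw RGBD estimates can propose cubes hovering above the table, and stepping the embedded physics engine settles them into a consistent scene. Below is a minimal sketch of that settling step, using PyBullet as a stand-in for the CoppeliaSim simulator used in this work; the URDF models and the initial poses are illustrative.

```python
# Settle perceived-but-implausible cube poses with a physics step (cf. Figure 3).
# PyBullet is a stand-in for the CoppeliaSim simulator used in the paper.
import pybullet as p
import pybullet_data

p.connect(p.DIRECT)                                 # headless physics engine
p.setAdditionalSearchPath(pybullet_data.getDataPath())
p.setGravity(0, 0, -9.81)
p.loadURDF("plane.urdf")                            # stands in for the table

# Perceived poses: the tag detector reports cubes slightly hovering (metres).
estimated_positions = [(0.30, 0.00, 0.07), (0.30, 0.10, 0.06)]
cubes = [p.loadURDF("cube_small.urdf", basePosition=pos)
         for pos in estimated_positions]

# Let gravity act; hovering cubes drop onto the surface.
for _ in range(240):                                # ~1 s at the default 240 Hz
    p.stepSimulation()

# Read back the physically consistent poses (scene (c) in Figure 3).
for cube in cubes:
    position, orientation = p.getBasePositionAndOrientation(cube)
    print(position)
```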
4.2. Self Calibration
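Figures 5-7 summarize the recalibration: the system searches for the arm-tip-to-camera transform that minimizes the discrepancy between the cube positions reported by perception ($C^e$) and those stored in the simulator ($C^s$). The sketch below frames that search as a least-squares problem over the six pose parameters; the optimizer, the objective, and the example data are assumptions for illustration, not the implementation used in the paper.

```python
# Hedged sketch of the self-calibration search (cf. Figures 5-7).
import numpy as np
from scipy.optimize import minimize
from scipy.spatial.transform import Rotation

# Cube positions in camera coordinates (perceived, C^e) and in world
# coordinates (previously inserted in the simulator, C^s); values made up.
C_e = np.array([[0.10, 0.05, 0.40], [0.20, -0.05, 0.45], [0.00, 0.00, 0.50]])
C_s = np.array([[0.35, 0.10, 0.05], [0.42, 0.00, 0.05], [0.30, 0.02, 0.05]])
T_world_tip = np.eye(4)   # arm-tip pose in world frame, known from kinematics

def camera_error(params: np.ndarray) -> float:
    """Squared error between C^s and C^e mapped through the candidate
    tip-to-camera transform (x, y, z, roll, pitch, yaw)."""
    T_tip_cam = np.eye(4)
    T_tip_cam[:3, :3] = Rotation.from_euler("xyz", params[3:]).as_matrix()
    T_tip_cam[:3, 3] = params[:3]
    T_world_cam = T_world_tip @ T_tip_cam
    C_e_h = np.hstack([C_e, np.ones((len(C_e), 1))])   # homogeneous points
    predicted = (T_world_cam @ C_e_h.T).T[:, :3]
    return float(np.sum((predicted - C_s) ** 2))

# Start from a deliberately poor guess, as in the trials of Figure 5.
result = minimize(camera_error, x0=np.zeros(6), method="Nelder-Mead")
print(result.x)            # recovered (x, y, z, roll, pitch, yaw)
```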
Limitations
4.3. Occlusion
4.4. Expectation Violation during Plan Execution
4.4.1. Surprise Detection Agent
4.4.2. The Task
Algorithm 1 Handling of surprising cubes

Input: the set S of cubes that triggered a surprise signal

for each cube c in S do
    if the working memory holds an on-top-of edge from c to a cube u then
        place u under c
        simulation.update_position(u)
    else if the working memory holds an on-top-of edge from a cube u to c then
        dsr.delete(edge(u, c))
        simulation.update_position(c)
    else
        simulation.update_position(c)
    end if
end for
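The following is a hedged, runnable rendering of Algorithm 1, with the branch conditions inferred from the tower experiments below. The dsr and simulation interfaces, the edge name "on", and the cube size are illustrative assumptions, not the actual agent API.

```python
# Sketch of Algorithm 1; dsr and simulation are assumed interfaces.
CUBE_SIDE = 0.04   # cube edge length in metres (assumed value)

def handle_surprising_cubes(surprising, dsr, simulation):
    for c in surprising:
        below = dsr.edge_target(c, "on")    # cube that c is known to sit on
        above = dsr.edge_source(c, "on")    # cube known to sit on c
        if below is not None:
            # Explain the surprise as a coherent tower move: relocate the
            # supporting cube directly under c's newly perceived pose.
            x, y, z = dsr.perceived_position(c)
            simulation.update_position(below, (x, y, z - CUBE_SIDE))
        elif above is not None:
            # The tower was taken apart: drop the stale on-top-of relation
            # and accept c's perceived pose.
            dsr.delete_edge(above, c, "on")
            simulation.update_position(c, dsr.perceived_position(c))
        else:
            # No semantic relations involved; just resynchronize the pose.
            simulation.update_position(c, dsr.perceived_position(c))
```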
- Full-tower movement: It consists of moving a tower of two cubes and evaluating whether the system is able to recognize this perturbation and use the semantic information in the graph to correct it (the mismatch check that raises the corresponding surprise signal is sketched after this list). Figure 10 shows snapshots of the execution of this perturbation.
- Correction of errors in the simulation: Because a grasped cube does not respond to physics, when it collides with another cube it can impart a large force on it, since the second cube absorbs the entire reaction. If that second cube is not being perceived at that moment, the system cannot return it to its place and the model remains in an inconsistent state. An example of this situation is presented in Figure 11.
- Tower disassembly: The experimenter disassembles a tower in the middle of the plan, causing the robot to react, resynchronize, and re-plan. Figure 12 presents the execution of the test.
- Three-cube tower: As a final test, and as a way to explore the system's reaction to situations outside its planning, this experiment starts when the robot has already finished assembling the two towers (Figure 13a) and the experimenter places the cube from the top of one tower on top of the other.
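Common to all four perturbations is the mismatch check that raises the surprise signal: a cube becomes surprising when its perceived position and the one reported by the simulator through the Virtual_RT edge diverge. A minimal sketch follows; the 3 cm threshold and the data layout are illustrative assumptions.

```python
# Assumed surprise check: perception vs. simulator positions per cube.
import numpy as np

SURPRISE_THRESHOLD = 0.03   # metres; illustrative, not the paper's value

def surprising_cubes(perceived: dict, simulated: dict) -> list:
    """Return ids of cubes whose perceived and simulated positions disagree."""
    surprises = []
    for cube_id, p_pos in perceived.items():
        s_pos = simulated.get(cube_id)
        if s_pos is None:
            continue
        if np.linalg.norm(np.subtract(p_pos, s_pos)) > SURPRISE_THRESHOLD:
            surprises.append(cube_id)
    return surprises

# Example: the experimenter has moved cube three back onto the table.
perceived = {"cube_3": (0.10, 0.20, 0.02), "cube_4": (0.30, 0.00, 0.02)}
simulated = {"cube_3": (0.30, 0.00, 0.06), "cube_4": (0.30, 0.00, 0.02)}
print(surprising_cubes(perceived, simulated))    # -> ['cube_3']
```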
5. Conclusions and Future Work
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest