
Bridging the Sim-to-Real Gap in Collective Robotic Construction:
A Mixed Reality Reinforcement Learning Approach
15.10.2024 - Stuttgart, Germany
Integrative Technologies and Architectural Design Research (ITECH)
Master of Science
Thesis Project (Autonomous mobile robots)
ABSTRACT

The transition of collective robotic construction (CRC) from controlled laboratory settings to real-world applications faces a significant barrier: limited autonomy. Advances in learning-based approaches offer promising potential to allow robots to autonomously discover optimal construction strategies through cost-effective, scalable, and safe simulation training. Nonetheless, simulation environments cannot entirely capture the complexities of real-world conditions, leading to the phenomenon known as the "reality gap." To address these limitations, this research proposes a mixed-reality reinforcement learning workflow for augmenting simulation-based training with real-world observations, action executions, and environment state changes. To this end, a sim-real-sim feedback loop is established between the Unity training environment (ML-Agents Toolkit), the Robot Operating System, and the OptiTrack motion capture system. The methodology is demonstrated through a case study on the assembly of a bending-active structure by a team of Roaming Autonomous Distributed robots. The agents are trained to execute the isolated task of manipulating a rod into a target position, first in a simulation-only environment for 25,000 episodes, followed by 5,000 episodes in a mixed-reality training setup. In this setup, robots operate in the physical world, while certain environmental aspects, such as spatial constraints and objectives, remain virtual. Real-time tracked and monitored data provide observations that better reflect the real-world states of the robots and the rod. The effectiveness of the mixed-reality training is assessed by comparing its deployment performance to that of parallel simulation-only training with the same total number of episodes. The results obtained from 200 runs indicate that the mixed-reality-trained policy significantly outperforms its counterpart across several key metrics, including task completion time, success rate, and system stability. Notably, there is a substantial reduction in the number of immobilisation and rod-drop incidents. The findings confirm that the agents exhibit the emergence of error-recovery behaviours when confronted with previously unseen environmental states. Ultimately, the proposed workflow leverages the advantages of simulation training while incorporating essential real-world variability to diminish the reality gap and increase the feasibility of deploying CRC systems in on-site construction tasks.
Team Collaboration: Niki Kentroti, Rabih Koussa
#Linux, #ROS2, #Unity, #C++, #C#, #Python, #OptiTrack, #DigitalTwin, #ReinforcementLearning, #TensorFlow

Thesis Introduction

Case Study
To demonstrate the potential of the proposed methodology to bridge the reality gap in the context of CRC, a case study focusing on the collaborative aspect of an assembly process is developed. The methodology is envisioned to be implemented in a homogeneous, autonomous mobile robot system that learns to deploy and assemble a temporary, bending-active structure. The collaboration of two or more agents is required to achieve the transportation, manipulation (bending), and positioning of the material. In a parallel assembly process with multiple agents, bending-active structures present a dynamically changing environment, leading to uncertainties.

Workflow
The workflow establishes a Sim-Real-Sim feedback loop that integrates simulation, real-world execution, and reinforcement learning to enable autonomous robotic control. In the simulation phase, Unity's environment uses articulated-body physics to train reinforcement learning agents, which generate motor rotation commands transmitted to the robots via ROS2. The robots, differential-drive systems with Dynamixel motors, collaboratively manipulate a glass fiber rod in real-world tasks, while OptiTrack captures their positions and orientations, providing real-time feedback. This data is looped back into the simulation to refine the virtual model, minimizing the "reality gap" and improving multi-agent collaboration in dynamic environments.
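To make the loop concrete, the sketch below outlines a minimal ROS2 bridge node in Python (rclpy). The topic names, message types, and single-robot scope are illustrative assumptions, not the thesis implementation.

import rclpy
from rclpy.node import Node
from geometry_msgs.msg import PoseStamped
from std_msgs.msg import Float32MultiArray


class SimRealBridge(Node):
    """Relays OptiTrack poses back to the Unity trainer and forwards
    policy-issued wheel commands to the physical motors."""

    def __init__(self):
        super().__init__('sim_real_bridge')
        # Pose stream from the motion-capture driver (assumed topic name).
        self.create_subscription(
            PoseStamped, '/optitrack/robot_1/pose', self.on_pose, 10)
        # Wheel-velocity actions from the Unity ML-Agents policy (assumed topic name).
        self.create_subscription(
            Float32MultiArray, '/unity/robot_1/wheel_cmd', self.on_wheel_cmd, 10)
        # Commands forwarded to the Dynamixel motor controller (assumed topic name).
        self.motor_pub = self.create_publisher(
            Float32MultiArray, '/robot_1/motor_velocities', 10)
        # Tracked real-world state looped back into the simulation (assumed topic name).
        self.obs_pub = self.create_publisher(
            PoseStamped, '/unity/robot_1/observed_pose', 10)

    def on_pose(self, msg: PoseStamped):
        # Ground the virtual model in the tracked real-world pose.
        self.obs_pub.publish(msg)

    def on_wheel_cmd(self, msg: Float32MultiArray):
        # Execute the policy's left/right wheel velocities on the hardware.
        self.motor_pub.publish(msg)


def main():
    rclpy.init()
    rclpy.spin(SimRealBridge())
    rclpy.shutdown()


if __name__ == '__main__':
    main()

In this arrangement, a single node both grounds the simulation in tracked real-world states and executes policy actions on the hardware, closing the Sim-Real-Sim loop.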

Mixed-Reality Reinforcement Learning Framework
The framework is designed to train collaborative robots tasked with manipulating a bending-active structure in a dynamic environment. The state observations are collected from the real world, including parameters such as the relative position, angle, and distance to other agents, rod twist, angular and linear velocities, and wheel velocities, together with environmental constraints such as the position and angle relative to the target point, boundary walls, and restricted areas. These real-world states are used to inform the reinforcement learning process, which is conducted in a virtual environment. In the virtual simulation, agents are trained using a reward system that includes positive incentives for coordinated rotation, alignment, proximity to the target, efficient movement, and coordinated speed, as well as penalties for wall proximity, rod twist, and entering restricted areas. This mixed-reality approach enables the reinforcement learning framework to leverage the efficiency of virtual training while grounding the learning process in real-world observations and actions, ensuring precise coordination and adaptability for bending-active structure assembly.
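As an illustration of such reward shaping, the Python sketch below composes a per-step reward for a single agent from the terms listed above. All weights, thresholds, and functional forms are placeholder assumptions chosen for readability, not the values used in training.

import numpy as np


def step_reward(dist_to_target, angle_to_target, rod_twist,
                dist_to_wall, in_restricted_area,
                speed_self, speed_partner):
    """Illustrative shaped reward for one agent at one timestep."""
    reward = 0.0
    # Positive incentives: proximity and alignment to the target.
    reward += 0.5 * np.exp(-dist_to_target)            # closer is better
    reward += 0.3 * np.cos(angle_to_target)            # heading towards the target
    # Coordinated speed: reward matching the partner agent's velocity.
    reward += 0.2 * np.exp(-abs(speed_self - speed_partner))
    # Penalties: rod twist, wall proximity, restricted areas.
    reward -= 0.4 * abs(rod_twist)                     # excessive twisting of the rod
    if dist_to_wall < 0.2:                             # placeholder safety margin (m)
        reward -= 0.5
    if in_restricted_area:
        reward -= 1.0
    return reward

In practice, ML-Agents reward terms of this kind would be accumulated on the C# side via AddReward(); the Python form above is only for exposition.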

Experiments Overview
The process begins with a foundational simulation training phase of 25 million steps, where agents learn to manipulate a rod and collaboratively reach target positions in a virtual environment. After this initial phase, the workflow splits into two streams: one continues training in simulation only for an additional 5 million steps, while the other transitions to a mixed-reality training phase of the same duration, integrating real-world observations and environment interactions to address the reality gap. Following these training phases, the policies from both streams undergo a deployment phase, during which their performance is evaluated in real-world conditions. The method also allows for the integration of a curriculum training approach.
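For orientation, the two-stream schedule could be driven with the standard mlagents-learn CLI, invoked here from Python. Run IDs, config file names, and build paths are placeholders, and the mixed-reality stream assumes a Unity scene whose observations are sourced from ROS2/OptiTrack rather than from the physics engine.

import subprocess

# Phase 1: foundational simulation-only training (step budget set in the YAML config).
subprocess.run(['mlagents-learn', 'config/rod_assembly.yaml',
                '--run-id=sim_base', '--env=builds/rod_sim'], check=True)

# Phase 2a: simulation-only stream, continuing for an additional 5M steps.
subprocess.run(['mlagents-learn', 'config/rod_assembly_5m.yaml',
                '--run-id=sim_only', '--initialize-from=sim_base',
                '--env=builds/rod_sim'], check=True)

# Phase 2b: mixed-reality stream, initialised from the same checkpoint but
# trained against the live scene driving the physical robots (no --env flag:
# the trainer waits for the running Unity instance to connect).
subprocess.run(['mlagents-learn', 'config/rod_assembly_5m.yaml',
                '--run-id=mixed_reality', '--initialize-from=sim_base'],
               check=True)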


[Video panels — A: robot coordination at 10 million training steps; B: robot coordination at 30 million training steps]
Simulation Results
The first graph shows the "Cumulative Reward" over 30 million training steps, with a general upward trend indicating progressive learning. High fluctuations in reward values reflect continuous strategy exploration, while the more consistent improvement after 10 million steps suggests that the agents overcome early learning barriers. The second graph tracks "Targets Reached" and "Continuous Targets Reached," showing steady growth and a sharper rise in the later stages, highlighting improved coordination and accuracy. The side panels provide videos of the robots' progress: Panel A (10 million steps) shows intermediate coordination, while Panel B (30 million steps) demonstrates significant improvement in collaborative task execution. These results emphasize the agents' evolution from exploration to advanced collaboration.



Simulation-Only Training and Mixed-Reality Training Results Comparison
This diagram compares simulation-only and mixed-reality (MR) training for collaborative robots over 30 million steps, alongside deployment videos illustrating real-world performance. The top graph shows that while simulation-only training steadily improves, the MR stream, starting at 25 million steps, quickly recovers from the domain transition and matches simulation-only performance by the 30-million-step mark. The videos demonstrate deployment results. On the left, the simulation-only stream shows a significant sim-to-real gap, with the real robots (black) failing to lift the rod properly, highlighting poor alignment with the simulated movements (red). On the right, MR-trained robots adapt effectively, identifying targets and overcoming challenges such as lifting a dropped rod, showcasing MR training's success in bridging the reality gap and enabling real-world autonomy.

[Video timestamps: 2m 47s, 1m 40s, 33s, 54s]
Visual Observations of Training Behaviour in Mixed Reality
The results validate that the Mixed Reality (MR) stream outperforms the simulation-only stream during deployment, particularly in demonstrating the emergence of error-recovery behaviours in response to previously unseen environmental states. Initially, the agents experienced a performance drop due to the transition to the physical domain, where the complexities and uncertainties of real-world interactions presented new challenges. However, as training progressed, the agents developed adaptive strategies to lift the rod when it was dropped, a crucial ability for reaching designated targets and navigating efficiently within the environment.




Furthermore, two additional observations were made regarding the agents' recovery mechanisms: first, when one of a robot's wheels became immobilised on an uneven surface, and second, when a robot was flipped by the twisting rod. In both cases, the agents experienced a temporary loss of mobility but demonstrated the ability to self-correct, ultimately regaining movement even after being stuck for an extended period.