Abstract
Recent years have witnessed great advances in visual artificial intelligence (AI) research based on deep learning. To take advantage of deep learning, we need to collect large amounts of data across various environments and conditions. However, collecting such data is a time-consuming and labor-intensive task. Moreover, developing and testing visual AI algorithms for robots in the real world is expensive and, in some cases, dangerous. To address these challenges, in this thesis we investigate algorithms for designing a high-quality simulator for mobile robots. We aim to narrow the gap between simulation and reality, generate infinitely many photo-realistic color-and-depth image pairs from arbitrary locations, and allow algorithms developed and tested in simulation to be transferred to physical platforms without domain constraints.
To achieve these goals, we design a view synthesis module for our simulator that synthesizes free-viewpoint photo-realistic color-and-depth image pairs. Our approach combines depth refinement, adaptive view selection, and layered 3D warping to lower rendering complexity and improve the quality of the synthesized images. We also design controller, recorder, and visualizer modules for our simulator. These modules work together to provide a variety of data for training robotic tasks, including real-time camera poses, synthesized color-and-depth image pairs, and robot trajectories.
Based on our simulator, we build a 3D dataset for benchmarking 6D object pose estimation, which plays an important role in robotic grasping and manipulation research. The dataset consists of objects covering a variety of shapes, rigidities, sizes, weights, and textures. Because our simulator can seamlessly integrate robots with virtual scenes, we generate a large number of photo-realistic color-and-depth image pairs with ground-truth 6D poses for training data-driven pose estimation approaches. Our dataset is freely distributed to research groups worldwide through the Shape Retrieval Challenge (SHREC) benchmark on 6D pose estimation.
We conduct a variety of experiments to investigate the performance of the pose estimation approaches proposed in our benchmark, using several evaluation metrics. From these experiments we learn important lessons about current pose estimation algorithms, giving insight into where researchers should focus their attention to make progress on pose estimation. In addition, we propose a novel approach that further improves 6D object pose estimation by effectively computing hidden representations from color and depth images and then fusing them with a graph attention network that fully exploits the relationship between visual and geometric features.
Overall, we propose a 3D photo-realistic virtual environment simulator for developing vision-based algorithms for AI research. Experiments demonstrate that our simulator narrows the reality gap between the virtual environment and the real scene. Thus, computer vision algorithms developed in simulation, including depth estimation, object recognition, and 6D object pose estimation, can be transferred to the real world without domain adaptation.