
Fusion of Camera and Lidar Data for Large Scale Semantic Mapping

Current self-driving vehicles rely on detailed maps of the environment that contain exhaustive semantic information. This work presents a strategy that utilizes recent advancements in semantic segmentation of images and fuses the information extracted from the camera stream with the accurate depth measurements of a Lidar sensor in order to create large-scale, semantically labeled point clouds of the environment. We fuse the color and semantic data gathered from a surround-view camera system with the depth data gathered from a Lidar sensor. In our framework, each Lidar scan point is projected onto the camera stream to extract its color and semantic information, while at the same time a large-scale 3D map of the environment is generated by a Lidar-based SLAM algorithm. Although we employed a network that achieved state-of-the-art semantic segmentation results on the Cityscapes dataset (IoU score of 82.1%), the sole use of the extracted semantic information only achieved an IoU score of 38.9% on 105 manually labeled 5x5 m tiles from 5 different trial runs within the city of Sendai, Japan. To increase the performance, we reclassify the label of each point. For this, two different approaches were investigated: a random forest and SparseConvNet (a deep-learning approach). For both methods, we investigated how the inclusion of semantic labels from the camera stream affected the classification of the 3D point cloud. We show that a significant performance increase can be achieved by doing so: 25.4 percentage points for the random forest (40.0% without labels to 65.4% with labels) and 16.6 percentage points for the SparseConvNet (33.4% without labels to 50.8% with labels). Finally, we present practical examples of how semantically enriched maps can be employed for further tasks. In particular, we show how certain classes (e.g. cars and vegetation) can be removed from the point cloud in order to increase the visibility of other classes (e.g. roads and buildings), and how the data can be used to extract the trajectories of vehicles and pedestrians.
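As a minimal illustration of the class-removal use case mentioned above, the following Python sketch filters a semantically labeled point cloud by class ID. The array layout, class IDs, and file names are assumptions made for this example and are not taken from the paper.

```python
import numpy as np

# Hypothetical layout: one point per row -> x, y, z, r, g, b, class_id
# (the storage format actually used in the paper is not specified here).
CLASS_IDS = {"road": 0, "building": 1, "vegetation": 2, "car": 3, "pedestrian": 4}

def remove_classes(points: np.ndarray, class_names, class_column: int = 6) -> np.ndarray:
    """Return a copy of the point cloud with the given semantic classes removed."""
    drop_ids = np.array([CLASS_IDS[name] for name in class_names])
    keep_mask = ~np.isin(points[:, class_column], drop_ids)
    return points[keep_mask]

if __name__ == "__main__":
    cloud = np.load("semantic_map.npy")            # assumed N x 7 array
    static_map = remove_classes(cloud, ["car", "vegetation"])
    np.save("semantic_map_static.npy", static_map) # roads and buildings remain visible
```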

Processing pipeline of our method

In our approach, we fuse depth data acquired by a Lidar sensor with color and semantic information acquired by RGB cameras. We employ our autonomous electric vehicle (AEV), which is equipped with a 360° surround-view camera and a 360° Lidar. The surround-view camera is made up of 5 individual cameras. We segment the images using the DeepLabV3+ algorithm to predict the semantic labels; we chose DeepLabV3+ as it is one of the best-scoring algorithms on the Cityscapes dataset. We then project each point of the Lidar scan onto the time-wise closest camera image. As the sensors are not synchronized, a time difference of up to 50 ms can occur that has to be accounted for. From the projection we extract the color and semantic information for each scan point. In parallel, the Lidar scans are used for simultaneous localization and mapping (SLAM), for which we employed the LOAM algorithm. To decrease the size of the point cloud and to obtain a more balanced distribution of points, we downsampled the point cloud using an octree to one point per voxel with a side length of 10 cm. In the downsampling process we employ a simple majority vote for the semantic class information and calculate the mean value for the color information. To improve the semantic segmentation results, we reclassify the semantic label of each point, comparing a random forest and a deep-learning approach for this step. For the random forest, a feature vector consisting of a mixture of features based on the 3D shape measured by the Lidar and features based on the image and its semantic segmentation was employed. For the deep-learning approach, we extended the input of the network to include a fourth channel for the predicted semantic label alongside the three channels for the RGB colors.
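The core fusion step, projecting each Lidar point into the time-wise closest camera image to read off its color and semantic label, could look roughly like the sketch below. The calibration matrices, image variables, and the handling of invalid points are placeholders assumed for illustration; they do not reproduce the exact calibration or implementation used in the paper.

```python
import numpy as np

def project_points_to_image(points_lidar, T_cam_lidar, K, rgb_image, label_image):
    """Project Nx3 Lidar points into one camera and sample color + semantic label.

    points_lidar : (N, 3) points in the Lidar frame
    T_cam_lidar  : (4, 4) extrinsic transform from Lidar to camera frame (assumed known)
    K            : (3, 3) camera intrinsic matrix
    rgb_image    : (H, W, 3) color image closest in time to the scan
    label_image  : (H, W) per-pixel class IDs predicted by the segmentation network
    """
    n = points_lidar.shape[0]
    pts_h = np.hstack([points_lidar, np.ones((n, 1))])   # homogeneous coordinates
    pts_cam = (T_cam_lidar @ pts_h.T).T[:, :3]           # transform into camera frame

    in_front = pts_cam[:, 2] > 0.0                       # keep only points in front of the camera
    uvw = (K @ pts_cam.T).T
    uv = uvw[:, :2] / uvw[:, 2:3]                        # perspective division
    u = np.round(uv[:, 0]).astype(int)
    v = np.round(uv[:, 1]).astype(int)

    h, w = label_image.shape[:2]
    valid = in_front & (u >= 0) & (u < w) & (v >= 0) & (v < h)

    colors = np.zeros((n, 3), dtype=rgb_image.dtype)
    labels = np.full(n, -1, dtype=int)                   # -1 marks points outside the image
    colors[valid] = rgb_image[v[valid], u[valid]]
    labels[valid] = label_image[v[valid], u[valid]]
    return colors, labels, valid
```

Points that fall outside the image (or behind the camera) keep the placeholder label -1 here; in practice such points would be handled by one of the other cameras of the surround-view system.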

Results

Detection results for pedestrians (left figure) and vehicles (right figure) from a single trial run in the inner city of Sendai.

Detailed overview

A detailed description of the algorithm can be found in the paper presented at the IEEE Intelligent Transportation Systems Conference 2019.

T. Westfechtel, K. Ohno, R. Bezerra, S. Kojima, and S. Tadokoro, "Fusion of Camera and Lidar Data for Large Scale Semantic Mapping," in IEEE Intelligent Transportation Systems Conference (ITSC), 2019.

https://ieeexplore.ieee.org/document/8917107

Contact

Do you have any questions or comments? Please contact:

thomas@rm.is.tohoku.ac.jp

staff@rm.is.tohoku.ac.jp
