
VEDECOM is pleased to announce the defense of the PhD thesis of Li YU, entitled "Absolute localization by mono-camera for a vehicle in urban environment using Street View," on Friday 6 April 2018 at MINES ParisTech.

Patrick RIVES, INRIA Sophia Antipolis (Reviewer)
Paul CHECCHIN, Institut Pascal, Université Clermont Auvergne (Reviewer)
Samia BOUCHAFA, Université d'Évry-Val-d'Essonne (Examiner)
Fabien MOUTARDE, MINES ParisTech (Thesis supervisor)
Cyril JOLY, MINES ParisTech (Examiner)
Guillaume BRESSON, Institut VEDECOM (Examiner)

In work carried out at the Centre de Robotique and Institut VEDECOM, we studied robust visual urban localization systems for self-driving cars. Obtaining an accurate global pose from a monocular camera is difficult, and current monocular approaches are not yet suitable for autonomous cars. Rather than relying on Global Navigation Satellite Systems, Simultaneous Localization And Mapping, or data fusion techniques, we focused on fully leveraging Geographical Information Systems (GIS) to achieve a low-cost, robust, accurate and global urban localization based on a single camera and requiring no prior passage of an equipped vehicle.

Our first task was to build a robot-accessible online database from a dense public GIS, namely Google Maps, which has the advantage of worldwide coverage. We built a compact topometric representation of the dynamic urban environment by extracting four types of data from the GIS: topologies, geo-coordinates, panoramic Street Views, and the associated depth maps. We then proposed two localization methods to exploit this database: a computer vision approach based on handcrafted features, and a learning technique based on a convolutional neural network (convnet).
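The four extracted data types can be pictured as one record per Street View location. The sketch below is purely illustrative (the class and field names are hypothetical, not from the thesis): each node carries its topology (neighbor links), geo-coordinates, panorama, and depth map, and a coarse lookup returns the geographically closest node.

```python
# Hypothetical sketch of one node of the topometric database
# extracted from the GIS: topology, geo-coordinates, panoramic
# Street View, and its associated depth map.
from dataclasses import dataclass, field
from math import hypot

@dataclass
class StreetViewNode:
    node_id: str
    lat: float
    lon: float
    neighbors: list = field(default_factory=list)  # topology: linked node ids
    panorama: object = None                        # panoramic Street View image
    depth_map: object = None                       # per-pixel depth from the GIS

def nearest_node(nodes, lat, lon):
    """Coarse lookup: closest node under a planar approximation."""
    return min(nodes, key=lambda n: hypot(n.lat - lat, n.lon - lon))

nodes = [
    StreetViewNode("a", 48.8584, 2.2945, neighbors=["b"]),
    StreetViewNode("b", 48.8606, 2.3376, neighbors=["a"]),
]
print(nearest_node(nodes, 48.8590, 2.3000).node_id)  # "a"
```

A real system would of course use proper geodesic distances and store image data, but the record structure conveys what "topometric" means here: metric geotags embedded in a topological graph.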

In computer vision, extracting handcrafted features is a popular way to solve image-based positioning. We take advantage of the abundant data from Google Maps and of the topometric online data structure to build a coarse-to-fine positioning pipeline: a topological place recognition step followed by a metric pose estimation via graph optimization. The only inputs of this approach are an image sequence from a monocular camera and the database constructed from Google Maps; no frame-to-frame correspondences or odometry estimates are needed. The method was tested in an urban environment and demonstrates both sub-meter accuracy and robustness to viewpoint changes, illumination and occlusion. Because sparse Street View locations produce significant errors in the metric pose estimation phase, we refined this framework by synthesizing additional artificial Street Views to compensate for the sparsity of the original ones and improve precision.
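The coarse-to-fine idea can be sketched in a toy form. This is not the thesis implementation: stage 1 ranks database images by descriptor similarity (topological place recognition), and stage 2 stands in for the metric refinement with a similarity-weighted average of the top-k geotags, where the thesis instead uses a graph optimization over feature correspondences.

```python
# Toy two-stage localization: rank geotagged descriptors by cosine
# similarity, then fuse the top-k geotags (stand-in for the metric
# pose estimation stage). All data here is illustrative.
from math import sqrt

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (sqrt(sum(x * x for x in a)) * sqrt(sum(x * x for x in b)))

def localize(query_desc, database, k=2):
    # database: list of (descriptor, (lat, lon)) pairs
    ranked = sorted(database, key=lambda e: cosine(query_desc, e[0]),
                    reverse=True)[:k]              # stage 1: place recognition
    weights = [cosine(query_desc, d) for d, _ in ranked]
    lat = sum(w * p[0] for w, (_, p) in zip(weights, ranked)) / sum(weights)
    lon = sum(w * p[1] for w, (_, p) in zip(weights, ranked)) / sum(weights)
    return lat, lon                                # stage 2: metric estimate
```

The coarse stage keeps the search global and cheap; the fine stage only has to reason about a handful of nearby, geotagged views.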

However, this method has a high computational cost. Since the GIS offers a globally geotagged database, we were motivated to regress global localizations from convnet features in an end-to-end manner. The previously constructed online database was still insufficient for convnet training, so we augmented it by a factor of one thousand and used transfer learning to make our convnet regressor converge and perform well. In our tests, the regressor also gives a global localization of an input camera image in real time.
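The end-to-end regression idea, transferred features feeding a small trainable head that outputs a position, can be illustrated with a deliberately tiny stand-in. Everything below is hypothetical: a fixed ("transferred") feature map replaces the pretrained convnet layers, and a linear head is fitted to a 2-D position by stochastic gradient descent.

```python
# Toy end-to-end position regression: frozen features + trainable
# linear head, trained by SGD on squared error. Illustrative only.
import random

random.seed(0)

def features(img):
    # frozen feature extractor (stand-in for pretrained convnet layers)
    return [sum(img), sum(x * x for x in img), 1.0]  # last entry: bias

def predict(w, f):
    return [sum(wi * fi for wi, fi in zip(row, f)) for row in w]

# synthetic "images" whose 2-D ground-truth position is a linear
# function of the features, so the toy problem is exactly learnable
images = [[random.random() for _ in range(4)] for _ in range(50)]
data = [(img, [2.0 * sum(img), sum(img) - 1.0]) for img in images]

w = [[0.0] * 3, [0.0] * 3]   # trainable head: 2 outputs x 3 features
lr = 0.01
for _ in range(500):
    for img, target in data:
        f = features(img)
        pred = predict(w, f)
        for i in range(2):   # gradient step per output coordinate
            err = pred[i] - target[i]
            for j in range(3):
                w[i][j] -= lr * err * f[j]
```

In the thesis the feature extractor is a real convnet initialized by transfer learning and the "head" regresses a global pose, but the training loop has the same shape: only the head (and fine-tuned layers) move, which is what lets a modest augmented database suffice.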
