ECE 5554 Final Project

SuperPixel Stereo for Outdoor Mapping

This project investigates using superpixels in a stereo matching algorithm. The project I am currently funded to work on in the Unmanned Systems Lab is a UAV/UGV cooperation effort in which a stereo camera on the aircraft maps obstacles so that a route can be planned for the UGV. Right now we are using a CUDA-accelerated Semi-Global Matching algorithm [1]. While this runs relatively quickly (~3 fps) and produces somewhat reliable results, this project looks at using superpixels to improve matching speed, since we are mostly flying over simple, large objects (cars, buildings, etc.). The idea is that superpixels group together large regions of similar texture and color, and such regions are likely to share a similar disparity value. Thus we can calculate the disparity for the region as a whole instead of pixel by pixel, as standard stereo algorithms do. This simplification of the problem should lead to faster computation of the disparity.


To get some background for this project, several papers were reviewed. The first was the paper on the semi-global matching algorithm we currently use on the system [1]. It provided insight into a method that stops short of a complete global matching but still uses a "semi-global" matching to help with smoothness. It also gave some insight into how to parallelize a stereo algorithm, since the authors implemented it on a GPU.

Stereo Algorithm Basics [1]

The next paper was specifically about using normalized cross-correlation (NCC) for stereo matching [2]. NCC uses cross-correlation to measure the similarity of regions in the two images. The regions are normalized before the cross-correlation calculation to remove the effect of gain and bias. The NCC equation is presented below.
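For reference, one common zero-mean form of NCC can be written as follows. The notation is an assumption and may differ from the form used in [2]: I_L and I_R are the left and right images, W is the matching window, d is the candidate disparity, and the barred terms are the window means. Subtracting the means removes the bias, and dividing by the window norms removes the gain.

```latex
\[
\mathrm{NCC}(x, y, d) =
\frac{\sum_{(i,j) \in W} \bigl(I_L(x+i,\, y+j) - \bar{I}_L\bigr)\,\bigl(I_R(x+i-d,\, y+j) - \bar{I}_R\bigr)}
     {\sqrt{\sum_{(i,j) \in W} \bigl(I_L(x+i,\, y+j) - \bar{I}_L\bigr)^{2}\;
            \sum_{(i,j) \in W} \bigl(I_R(x+i-d,\, y+j) - \bar{I}_R\bigr)^{2}}}
\]
```

The score lies in [-1, 1], with 1 indicating a perfect match up to gain and bias.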

The final paper was "PMSC: PatchMatch-Based Superpixel Cut for Accurate Stereo Matching" [3]. This paper showed the state of the art for superpixel stereo. The authors used much more complex methods, including a label space and α-expansion graph cuts. It was useful as a source of more complex methods that could be added to the algorithm and as background on stereo before starting this project.


Technical Approach

In our system, calibration and rectification are already handled, so this project focuses on the processing that happens once we have a rectified image pair. The algorithm is presented in the pseudocode. First, SLIC is used to decompose the left image into superpixels. Next, we iterate through each superpixel. For each superpixel, we take the square region around it in the left image and the corresponding "scanrow" from the right image. The scanrow is the horizontal strip of the right image that covers the same rows as the superpixel in the left image; it is similar to a scanline but multiple pixels in height. We then compute the normalized cross-correlation of the superpixel patch with the scanrow. This results in a vector of scores, and its argmax gives the best-correlated location in the right image. This location is subtracted from the corresponding center of the superpixel to get the disparity, which is assigned to the whole superpixel.
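The per-superpixel matching step described above can be sketched roughly as follows. This is a minimal NumPy illustration, not the project's actual implementation: the function names `ncc_along_row` and `superpixel_disparity` are hypothetical, the label map is assumed to come from any superpixel segmentation (for example `skimage.segmentation.slic`), the "square region" is approximated by the superpixel's bounding box, and the zero-mean form of NCC is assumed.

```python
import numpy as np


def ncc_along_row(patch, strip):
    """Zero-mean NCC of `patch` at every horizontal offset inside `strip`.

    `patch` and `strip` must have the same height. Returns one score per
    offset of the patch's left edge within the strip.
    """
    w = patch.shape[1]
    p = patch - patch.mean()
    p_norm = np.sqrt((p * p).sum())
    n_offsets = strip.shape[1] - w + 1
    scores = np.full(n_offsets, -1.0)
    for x in range(n_offsets):
        window = strip[:, x:x + w]
        q = window - window.mean()
        denom = p_norm * np.sqrt((q * q).sum())
        if denom > 1e-12:  # flat regions have no defined correlation
            scores[x] = (p * q).sum() / denom
    return scores


def superpixel_disparity(left, right, labels):
    """Assign one disparity per superpixel of a rectified grayscale pair.

    `labels` is an integer superpixel label map for the left image
    (assumed to be produced elsewhere, e.g. by SLIC).
    """
    left = left.astype(np.float64)
    right = right.astype(np.float64)
    disparity = np.zeros(left.shape, dtype=np.float64)
    for lab in np.unique(labels):
        mask = labels == lab
        ys, xs = np.nonzero(mask)
        y0, y1 = ys.min(), ys.max() + 1
        x0, x1 = xs.min(), xs.max() + 1
        patch = left[y0:y1, x0:x1]   # bounding box around the superpixel
        strip = right[y0:y1, :]      # the "scanrow" in the right image
        best_x = int(np.argmax(ncc_along_row(patch, strip)))
        # Patch centres are x0 + w/2 (left) and best_x + w/2 (right),
        # so the centre difference reduces to x0 - best_x.
        disparity[mask] = x0 - best_x
    return disparity
```

This sketch scans the entire scanrow, as in the description above. Restricting the search to a plausible disparity range would be a natural refinement, but that is an assumption beyond what is described here.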


Results and Analysis

To test the algorithm, several datasets were used. For initial testing and validation, the Middlebury dataset was used. One of the results is presented below. The disparity image shows fairly good results: the disparity is captured for the majority of the image, with only a few stray superpixels giving a bad result. Most of these bad results occur at occlusions, where the superpixel version of NCC does not perform as well as standard NCC. This dataset was also used for speed benchmarking. A similar program was written to use the standard normalized cross-correlation algorithm with a 3x3 window. Both programs were run on the same dataset and timed. The results showed that the superpixel algorithm was ~6 times faster than the standard NCC.

The second dataset tested was the KITTI stereo dataset. This dataset consists of images from a stereo system on a car, with ground truth from a LIDAR included. Three results with the ground truth are shown below. The results on this dataset are fairly good for what the algorithm needs to produce for aerial mapping of large structures. You can see that it does not perfectly capture every disparity, because it assigns the entire superpixel a single disparity. While this might not produce results that are as pretty or smooth, it is sufficient for outdoor mapping of large obstacles. The KITTI results were run through the KITTI disparity error program. On the training set, the average error was 14.22% with a disparity error threshold of 10 pixels. This result appears to be sufficient for our needs but definitely leaves room for improvement.
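As a rough illustration of a threshold-style disparity error, the sketch below computes the percentage of valid ground-truth pixels whose estimated disparity is off by more than a given number of pixels. It is not the KITTI evaluation program, whose exact definition may differ. The function name `bad_pixel_rate` is hypothetical, and the convention that a ground-truth value of 0 marks a pixel without a LIDAR measurement is an assumption.

```python
import numpy as np


def bad_pixel_rate(estimated, ground_truth, threshold=10.0):
    """Percentage of valid pixels with |estimated - ground_truth| > threshold.

    Assumes ground-truth pixels equal to 0 carry no measurement and are
    excluded from the statistic.
    """
    estimated = np.asarray(estimated, dtype=np.float64)
    ground_truth = np.asarray(ground_truth, dtype=np.float64)
    valid = ground_truth > 0
    if not valid.any():
        return float("nan")
    error = np.abs(estimated[valid] - ground_truth[valid])
    return 100.0 * np.count_nonzero(error > threshold) / error.size
```

Averaging this value over the image pairs of a dataset gives a single percentage comparable in spirit to the figure quoted above.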

Finally, the method was tested on data collected from our system. Our system consists of a custom stereo rig with a 60 cm baseline, designed to operate at around 20 m in altitude. We were only able to test on one dataset, since we had some calibration issues and much of our data has a bad stereo calibration, which results in improperly rectified images. Thus we show one image pair of a car that we captured. You can see that the stereo algorithm picks up the car, but the result is hard to visualize due to the large, inaccurate disparities in the improperly rectified portions of the image. These portions are mostly seen on the left side of the disparity image. We have confirmed that this is a calibration issue, as our current stereo algorithm also fails to produce accurate disparities in these regions.

Overall, this method performs fairly well, but there are still some things I would like to address. I would like to test it on our system once it is properly calibrated. I would also like to look at smoothing methods, such as graph cuts.

KITTI Results with Ground Truth


Test Result on Middlebury Dataset (panels: Generated Disparity, Ground Truth)
Test Result on Data from Our System (panels: Segmented Image, Generated Disparity)

References

[1] D. Hernandez-Juarez, A. Chacón, A. Espinosa, D. Vázquez, J. Moure and A. López, "Embedded Real-time Stereo Estimation via Semi-global Matching on the GPU," Procedia Computer Science, vol. 80, pp. 143-153, 2016.

[2] A. Shetty, V. George, C. Nayak and R. Shetty, "Normalized Cross Correlation for Stereo Matching Under Varying Illumination," International Journal of Computer Technology, vol. 9, no. 21, pp. 39-42, 2016.

[3] L. Li, S. Zhang, X. Yu and L. Zhang, "PMSC: PatchMatch-Based Superpixel Cut for Accurate Stereo Matching," IEEE Transactions on Circuits and Systems for Video Technology, vol. 28, no. 3, pp. 679-692, 2018.