Stereo vision problem

The stereo problem asks: given a stereo image pair, such as the one below, how can we recover the depth information?

[Figure: left image and right image of the stereo pair]

The images are taken from slightly different viewpoints, similar to our eyes. We know from the parallax effect that objects closer to us appear to move more quickly than those further away.

The same idea applies here. We expect the pixels on the statue to have a larger disparity than those in the background. The Tsukuba stereo pair has been rectified such that each pixel row in the left image corresponds exactly to the same row in the right image.

Or, in multi-view geometry speak, the scan lines are the epipolar lines. This means that a pixel in the left image has a match in the right image somewhere along the same row, assuming it is not occluded.

This greatly simplifies the problem because pixel matching becomes a 1D search along a horizontal line. This is probably the simplest stereo vision setup you can work with.

Naive attempt at recovering disparity

Having explained the stereo problem, let's attempt to recover the disparity map using simple block matching with the following parameters:

The result is kind of ugly, really.
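For readers who want to try this themselves, block matching along a scan line can be sketched roughly as follows. This is a minimal sketch, not the code used to produce the result in this post: the function name `block_matching` and the parameters `max_disparity` and `block_radius` (with their default values) are assumptions chosen for illustration, and the matching cost is the sum of absolute differences over a square block.

```python
import numpy as np

def block_matching(left, right, max_disparity=16, block_radius=2):
    """Winner-takes-all block matching on a rectified grayscale pair.

    For every pixel in the left image, compare the block around it with
    blocks on the same row of the right image, shifted by 0..max_disparity-1
    pixels, and keep the shift with the lowest sum of absolute differences.
    """
    left = left.astype(np.float64)
    right = right.astype(np.float64)
    height, width = left.shape
    size = 2 * block_radius + 1

    # Pad with edge values so that blocks near the border are defined.
    pad_left = np.pad(left, block_radius, mode="edge")
    pad_right = np.pad(right, block_radius, mode="edge")
    pad_width = pad_right.shape[1]

    best_cost = np.full((height, width), np.inf)
    disparity = np.zeros((height, width), dtype=np.int64)

    for d in range(max_disparity):
        # Left pixel at column x is compared with right pixel at column x - d.
        shifted = np.empty_like(pad_right)
        shifted[:, d:] = pad_right[:, :pad_width - d]
        if d > 0:
            shifted[:, :d] = pad_right[:, :1]  # replicate the left edge

        abs_diff = np.abs(pad_left - shifted)

        # Sum the absolute differences over each (size x size) block.
        cost = np.zeros((height, width))
        for dy in range(size):
            for dx in range(size):
                cost += abs_diff[dy:dy + height, dx:dx + width]

        # Keep the disparity wherever it beats the best cost so far.
        better = cost < best_cost
        best_cost[better] = cost[better]
        disparity[better] = d

    return disparity
```

Scaling the returned array to the 0 to 255 range gives an image in which larger disparities, that is pixels closer to the camera, appear brighter.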

We can make out the statue, lamp, and maybe some of the background, with pixels closer to us being brighter. Compare this with the ground truth disparity map.

[Figure: ground truth disparity map]

We could improve the results with some post filtering …. Let's choose the fancy route! We expect pixels near each other to have similar disparity, unless there is a genuine boundary.

This is where Markov random fields (MRFs) are useful. MRFs are undirected graphical models that can encode spatial dependencies. Like all graphical models, they consist of nodes and links. However, unlike some graphical models, e.g.

The pink nodes are the hidden variables, which represent the disparity values we are trying to find. The hidden variable values are more generally referred to as labels. The links between nodes represent dependencies: each node depends only on the nodes it is directly linked to.

The beauty of this simple assumption is that it allows us to solve for the hidden variables in a reasonably efficient manner.

MRF formulation

We can formulate the stereo problem in terms of the MRF as the following energy function:

$$\mathrm{energy}(Y, X) = \sum_{i} \mathrm{DataCost}(y_i, x_i) + \sum_{i} \sum_{j \in N(i)} \mathrm{SmoothnessCost}(x_i, x_j)$$

The variables Y and X are the observed and hidden nodes respectively, i is the pixel index, and j ranges over N(i), the neighbouring nodes of node x_i.

Refer to the MRF diagram above. The energy function sums up the costs at each link, given an image Y and some labeling X. The aim is to find a labeling for X (the disparity map, in the stereo case) that produces the lowest energy.
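To make the energy of a labeling concrete, here is a minimal sketch of how it could be evaluated on a 4-connected pixel grid. The function name `energy`, the `(height, width, num_labels)` layout of `data_cost`, and the 4-neighbour grid are assumptions for illustration. Each link is visited from both of its ends, so a symmetric smoothness cost is counted twice per link; this only rescales the smoothness term.

```python
import numpy as np

def energy(data_cost, labels, smoothness_cost):
    """Energy of a labeling on a 4-connected grid.

    data_cost[y, x, l] is the cost of giving pixel (y, x) the label l.
    labels[y, x] is the label chosen for pixel (y, x).
    smoothness_cost(a, b) is the cost of neighbouring labels a and b.
    """
    height, width = labels.shape
    total = 0.0
    for y in range(height):
        for x in range(width):
            # Cost on the link between the hidden node and its observed node.
            total += data_cost[y, x, labels[y, x]]
            # Costs on the links to the neighbouring hidden nodes.
            for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                ny, nx = y + dy, x + dx
                if 0 <= ny < height and 0 <= nx < width:
                    total += smoothness_cost(labels[y, x], labels[ny, nx])
    return total

# Toy example: 2x2 image, two labels, label 1 costs 1 at every pixel.
toy_data_cost = np.zeros((2, 2, 2))
toy_data_cost[..., 1] = 1.0
toy_labels = np.array([[0, 0], [0, 1]])

def toy_smoothness(a, b):
    # Hypothetical penalty: 2 if the labels differ, 0 otherwise.
    return 0.0 if a == b else 2.0

toy_energy = energy(toy_data_cost, toy_labels, toy_smoothness)
```

In the toy example the data term contributes 1 and the two links with differing labels contribute 2 each, counted from both ends, so `toy_energy` is 9.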

The energy function contains two functions that we will now look at, DataCost and SmoothnessCost.

DataCost returns the cost of assigning a given label to a hidden node, given the observed data at that pixel. This means we want a low cost for good matches and a high cost otherwise. An obvious choice is the sum of absolute differences mentioned earlier. The pseudo code is:

Below is a table showing some commonly used cost functions.

The Potts model is a binary penalising function with a single tunable variable, which controls how much smoothing is applied. The linear and quadratic models have an extra parameter K, a truncation value that caps the maximum penalty. Choosing a suitable DataCost and SmoothnessCost function, as well as their parameters, seems like a black art, at least to me.

My guess is that they are chosen through experimentation.
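The three smoothness models described above can be sketched as follows. These are the commonly used textbook forms, written under my own assumptions: the function names, the name `lam` for the tunable smoothing weight, and the default values are all hypothetical, and `k` plays the role of the truncation value K.

```python
def potts(label_a, label_b, lam=1.0):
    """Binary penalty: lam if the labels differ, 0 if they are equal."""
    return lam if label_a != label_b else 0.0

def truncated_linear(label_a, label_b, lam=1.0, k=2.0):
    """Penalty grows linearly with the label difference, capped at lam * k."""
    return lam * min(abs(label_a - label_b), k)

def truncated_quadratic(label_a, label_b, lam=1.0, k=4.0):
    """Penalty grows with the squared label difference, capped at lam * k."""
    return lam * min((label_a - label_b) ** 2, k)
```

The truncation matters at genuine depth boundaries: without it, a large jump in disparity between neighbouring pixels would be penalised so heavily that the boundary would get smoothed away.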

