Next problem: Finding intrinsic/extrinsic parameters from M

If we want to find the intrinsic parameters, we can use QR decomposition to determine the intrinsic matrix K.

To recover the intrinsic and rotation matrices from the projection matrix:

Invert the first 3x3 block of the 3x4 projection matrix M (this block equals KR)

Apply the QR decomposition to (KR)^-1 = R^-1 K^-1

The result of QR gives an orthonormal matrix Q and an upper-triangular matrix U

The orthonormal matrix Q is the transpose of the camera-to-world rotation (R = Q^T), and the intrinsic matrix is K = U^-1 (normalized so its bottom-right entry is 1)

To summarize then, in order to calibrate and find the camera matrix parameters K, R, and t:

First find a set of corresponding world-image points and solve for the null space of the 2Nx12 matrix to get M (each of the N point pairs contributes 2 equations in the 12 entries of M)

Take the SVD of M and use the null-space equation Mc = 0 to find the camera center c

Use QR decomposition on the inverse of the first 3x3 block of M in order to get the K and R matrices.
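A minimal numpy sketch of this recovery step (the function name is mine; it assumes M is exact and its first 3x3 block is invertible):

```python
import numpy as np

def intrinsics_from_M(M):
    """Recover the intrinsic matrix K (upper triangular) and rotation R
    from a 3x4 projection matrix M = K[R | t] via QR decomposition."""
    A = M[:, :3]                            # first 3x3 block: A = K @ R
    Q, U = np.linalg.qr(np.linalg.inv(A))   # inv(A) = R^T @ K^{-1}
    # Fix signs so the triangular factor (K^{-1}) has a positive diagonal.
    D = np.diag(np.sign(np.diag(U)))
    Q, U = Q @ D, D @ U
    K = np.linalg.inv(U)                    # K = U^{-1}
    K = K / K[2, 2]                         # normalize so K[2,2] = 1
    R = Q.T                                 # Q is the transpose of R
    return K, R
```

Note the sign fix: QR is only unique up to per-column signs, so we force the triangular factor to have a positive diagonal, which corresponds to a K with positive focal lengths.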

Real-World Calibration

Because lenses are not perfect and real cameras do not exactly follow the pinhole-camera model, there are other parameters that need to be taken into account.

While we won’t go in depth on those in this course, we should still know how other systems work. The Caltech calibration system, a version of which is included in the OpenCV distribution, uses checkerboard patterns to properly calibrate cameras and account for lens distortions. It uses nonlinear optimization to solve the problem.

Typically about 30 images are needed for proper calibration.

Lens distortions:

Lens distortions - radial distortion, including its barrel and pincushion forms, is a common result of real camera lenses and must be taken into account to properly calibrate a camera.

Simple distortion models: a common choice is a polynomial radial model, where a point at radius r from the image center is moved to r(1 + k1 r^2 + k2 r^4).
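A minimal sketch of such a polynomial radial model (the function name and the specific coefficients k1, k2 are illustrative, not from the notes):

```python
import numpy as np

def radial_distort(x, y, k1, k2):
    """Apply a simple polynomial radial distortion to normalized image
    coordinates (x, y): points move radially by a factor that depends
    on their squared distance r^2 from the image center."""
    r2 = x**2 + y**2
    factor = 1 + k1 * r2 + k2 * r2**2
    return x * factor, y * factor
```

Calibration toolboxes estimate coefficients like k1 and k2 jointly with the intrinsics during the nonlinear optimization, then invert this mapping to undistort images.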

3D Reconstruction

Ideally we want to reverse the typical image formation pipeline (see notes above) and move from pixel coordinates back to world coordinates. However, when dividing the homogeneous image coordinates by the final component we lose the depth information.

Fully Calibrated Cameras

We can perform 3D reconstruction via triangulation if we have 2 images and the M matrix of each camera (or the K, R, and t of each). However, because we’re looking to do 3D reconstruction we assume we don’t have access to the 3D points in space, so the M matrices must have been obtained by some means other than the DLT method explained in the calibration section above.

In fully calibrated reconstruction we:

Multiply a pixel coordinate in image 1 by K1^-1, then multiply the corresponding pixel coordinate in the other image by K2^-1

These results represent rays from each camera center through the unknown 3D point

Intersect rays to get a 3D point

HOWEVER, these rays may not always intersect perfectly

Take the midpoint of the shortest segment between the two rays, i.e. between the points in 3D space where the rays are closest.

Midpoint is the reconstructed 3D point

For stereo reconstruction of an (x, y) coordinate match, do the following:

Given the intrinsic matrices K_l, K_r and the extrinsic matrices of the two cameras

Take p = K_l^-1 u_l and p' = K_r^-1 u_r for the corresponding pixels u_l and u_r

Take the extrinsic matrix of each camera

Convert each into a 4x4 matrix (by appending the row [0 0 0 1]), giving T_lw and T_rw

find T_lr = T_lw * T_wr, the transform from the right camera frame to the left, with rotation R and translation t

Next, solve the equation:

a p - b (R p') + c (p x R p') = t

Convert into an Aq = B form, where A = [p, -R p', p x R p'] and q = (a, b, c)

Invert A to solve for a, b, and c

After obtaining a, b, and c, evaluate the left-hand side of the equation from the step above, except divide c by 2 (this gives the midpoint a p + (c/2)(p x R p'))

Finally, to obtain world coordinates we must invert T_lw from earlier and multiply it by the 3D coordinate point we just obtained.
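The core of this procedure can be sketched in numpy as follows (a sketch under the assumption that R and t map right-camera coordinates into the left camera frame, X_left = R X_right + t; the function name is mine):

```python
import numpy as np

def midpoint_triangulate(ul, ur, Kl, Kr, R, t):
    """Midpoint triangulation for a pixel correspondence ul <-> ur
    (homogeneous 3-vectors). R, t map right-camera coordinates into
    the left camera frame: X_left = R @ X_right + t."""
    p = np.linalg.inv(Kl) @ ul        # ray direction in the left frame
    q = R @ (np.linalg.inv(Kr) @ ur)  # right ray direction, left frame
    w = np.cross(p, q)                # direction of the shortest segment
    # a*p lies on the left ray, t + b*q lies on the right ray, and the
    # two closest points differ by c*w:  a*p - b*q + c*w = t
    A = np.column_stack([p, -q, w])
    a, b, c = np.linalg.solve(A, t)
    return a * p + (c / 2) * w        # midpoint of the shortest segment
```

When the rays happen to intersect exactly, c comes out as zero and the midpoint is the intersection point itself.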

Uncalibrated Reconstruction and Epipolar Geometry

Using epipolar geometry is a kind of “shortcut” to the full triangulation of fully calibrated cameras. It allows us to get the 3D point without having to find the 11-parameter projection matrix required in the previous method.

Epipolar Geometry

Terminology:

Epipolar plane: The plane formed by the 2 camera centers and the world point; the ray from each camera to the point lies in this plane.

Epipolar line: The intersection of the epipolar plane and the image plane.

Epipole: The point where the baseline (the line joining the two camera centers) intersects the image plane; equivalently, the image of the other camera’s center. This point is allowed to be at infinity if the two cameras are parallel.

Property: All epipolar lines go through the epipole

Using the fundamental matrix to perform reconstruction is called projective reconstruction. Using the essential matrix gives reconstruction up to a scale factor.

We know that a point expressed in the left camera frame is related to the same point in the right camera frame by P_r = R P_l + T. With respect to the camera matrices, the translation comes from the difference of the camera centers. Thus the T we refer to from here on is this relative translation between the two camera frames.

Epipolar Constraint

Given the equations above we can derive:

P_r^T (T x R P_l) = 0

(since T x P_r = T x R P_l, and a cross product is perpendicular to both of its arguments)

Where the cross product with T can be written as multiplication by the skew-symmetric matrix S = [T]x, so P_r^T S R P_l = 0, and the essential matrix is E = S R.

With a couple more steps (dividing each point by its depth to move to image coordinates, which does not change the equality with zero)…

The epipolar constraint: p_r^T E p_l = 0

Computation of Essential Matrix

Given the intrinsic matrices, the essential matrix can be computed from the fundamental matrix as E = K_r^T F K_l.

Computing F (or E), the 8 point algorithm

Also, for any pair of corresponding points, the following holds: p_r^T F p_l = 0

If we multiply this out and label the matrix parameters as f11 … f33 we get one linear equation per correspondence:

x_r x_l f11 + x_r y_l f12 + x_r f13 + y_r x_l f21 + y_r y_l f22 + y_r f23 + x_l f31 + y_l f32 + f33 = 0

If we use at least 8 different point pairs then we can calculate the parameters of F using the null-space solution to Af = 0, where A is the matrix whose rows are built from the point coordinates as above.

We can follow the same logic to calculate E, given that the following is also true for normalized image coordinates: (K_r^-1 p_r)^T E (K_l^-1 p_l) = 0.
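The 8-point algorithm above can be sketched as follows (a minimal version: it omits Hartley's normalization of the input points, which a robust implementation would add; the function name is mine):

```python
import numpy as np

def eight_point(pts1, pts2):
    """Estimate the fundamental matrix from N >= 8 correspondences
    (two N x 2 arrays) by solving A f = 0 for the null space of the
    N x 9 matrix A, then enforcing the rank-2 constraint."""
    x1, y1 = pts1[:, 0], pts1[:, 1]
    x2, y2 = pts2[:, 0], pts2[:, 1]
    ones = np.ones(len(pts1))
    # Each row encodes p2^T F p1 = 0 for one correspondence.
    A = np.column_stack([x2*x1, x2*y1, x2, y2*x1, y2*y1, y2, x1, y1, ones])
    _, _, Vt = np.linalg.svd(A)
    F = Vt[-1].reshape(3, 3)           # null-space solution (unit norm)
    # Enforce rank 2 by zeroing the smallest singular value of F.
    U, S, Vt = np.linalg.svd(F)
    S[2] = 0
    return U @ np.diag(S) @ Vt
```

The same function estimates E if it is fed normalized coordinates K^-1 p instead of pixel coordinates.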

Fundamental Matrix Properties

The fundamental matrix is of rank 2.

F p_l corresponds to the epipolar line of p_l in the right image.

F^T p_r corresponds to the epipolar line of p_r in the left image.

Now for every point in the image, the epipolar constraint is satisfied, even at the epipole.

Thus F e = 0, and we can take the SVD(F) to find the null space of F, giving us the epipole of the left image.

Partial Calibration

Get intrinsic matrices using corresponding world and image points via calibration

Get point correspondences between images

Estimate fundamental matrix via point correspondences

Obtain Essential matrix from fundamental

Get camera matrices from essential matrix

Triangulate points

Finding Camera Matrices from F

Right Camera Matrix: M_r = [ [e']x F | e' ]

Left Camera Matrix: M_l = [ I | 0 ]

Where e' is the epipole in the right image and [e']x is the skew-symmetric cross-product matrix of the epipole.

Finding Camera Matrices from E

First define the following:

W = [0 -1 0; 1 0 0; 0 0 1],  Z = [0 1 0; -1 0 0; 0 0 0]

Factor E = SR, where S = [T]x, using the SVD E = U diag(1, 1, 0) V^T:

S = U Z U^T,  R = U W V^T  or  R = U W^T V^T

From this you must calculate the left and right camera matrices using all combinations of R and ±T, and the combination that results in all triangulated points in front of both cameras (positive depth) will result in the correct reconstruction.

Where W is the matrix from above, and T is the 3rd column of U from the SVD of E (equivalently, the left null vector of E), determined up to sign.
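A sketch of extracting the four candidate (R, T) pairs from E (the sign handling shown is one common choice for keeping the rotations proper, an assumption of this sketch rather than the exact procedure from lecture):

```python
import numpy as np

def decompose_essential(E):
    """Return the four candidate (R, T) pairs from an essential matrix,
    following the SVD-based factorization E = SR with S = [T]x.
    W is the matrix defined above; T is +/- the 3rd column of U."""
    U, _, Vt = np.linalg.svd(E)
    W = np.array([[0, -1, 0], [1, 0, 0], [0, 0, 1.0]])
    # Force proper rotations (det = +1); flipping Vt changes E only by
    # an overall sign, which the essential matrix is defined up to.
    if np.linalg.det(U @ W @ Vt) < 0:
        Vt = -Vt
    R1 = U @ W @ Vt
    R2 = U @ W.T @ Vt
    T = U[:, 2]
    return [(R1, T), (R1, -T), (R2, T), (R2, -T)]
```

A reconstruction pipeline would triangulate a few points with each candidate pair and keep the one giving positive depth in both cameras.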

Image Rectification

Makes correspondence easier if images are parallel.

Given two parallel images which have known intrinsic parameters, the rotation is the identity and the translation T = (T, 0, 0) lies along the baseline, so we can represent the essential matrix as E = [T]x = [0 0 0; 0 0 -T; 0 T 0].

So in other words, the goal of rectification is to find a homography, H, which causes the images to ‘become’ parallel.

We can use disparity to help calculate depth, where Z = fB/d: d = x_l - x_r is the disparity, B is the baseline between the cameras, and f is the focal length.

Because of this, correspondence is easier: corresponding points lie on the same image row (same y value).

To calculate:

```matlab
Trw = [Mextright; 0 0 0 1];
Tlw = [Mextleft; 0 0 0 1];
Twr = inv(Trw); % can be done using transpose
Twl = inv(Tlw); % can be done using transpose
Tlr = Tlw * Twr;
% Rotation from right to left coordinate frame
Rlr = Tlr(1:3, 1:3);
% translation
tlr = Tlr(1:3, 4);
% For rectification, we want the x axis of the new coordinate frame to be
% the vector connecting the optical axes, so we can write the first column as:
v1 = tlr ./ norm(tlr);
% The y axis should be perpendicular to the optical axis [0 0 1] and the x
% axis, so take the cross product
v2 = [-v1(2) v1(1) 0]';
v2 = v2 ./ norm(v2);
% the new optical axis, or z axis, must be perpendicular to the x and y axis
v3 = cross(v1, v2);
% Now we can write the rotation matrix by interpreting the columns
Rrectleft = [v1'; v2'; v3']';
% To rectify the points in the right camera, first multiply by Rrect
% so that the relative orientation of the two coordinates is the same;
% then Rlr will align the right coordinate points with the left coordinate
% frame to make parallel cameras.
```

Interest Point Detection

Many algorithms rely on point correspondences.

How do we find them?

A few options exist - the most widely used are SIFT (Scale-Invariant Feature Transform) and SURF (Speeded-Up Robust Features).

SIFT

Uses a 16x16 pixel window around each keypoint: the gradient of each pixel is calculated in the x and y directions, a histogram of gradient angles is created for each 4x4 sub-patch, and the descriptor is the 4x4 grid of weighted gradient-orientation histograms.

For a 4x4 grid with histograms of 8 bins, the descriptor vector is 128x1 (4 * 4 * 8).

The vector is normalized to reduce effects of contrast or gain.
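A toy illustration of building such a descriptor (this omits most of real SIFT: keypoint detection, Gaussian weighting, orientation normalization, and clipping; the function name is mine):

```python
import numpy as np

def sift_like_descriptor(patch):
    """Toy SIFT-style descriptor from a 16x16 intensity patch:
    per-pixel gradients -> 4x4 grid of 8-bin orientation histograms
    weighted by gradient magnitude -> normalized 128-vector."""
    gy, gx = np.gradient(patch.astype(float))     # per-pixel gradients
    mag = np.hypot(gx, gy)
    ang = np.arctan2(gy, gx) % (2 * np.pi)
    bins = (ang / (2 * np.pi) * 8).astype(int) % 8  # 8 orientation bins
    desc = []
    for i in range(0, 16, 4):                     # 4x4 grid of cells
        for j in range(0, 16, 4):
            hist = np.zeros(8)
            for u in range(i, i + 4):
                for v in range(j, j + 4):
                    hist[bins[u, v]] += mag[u, v] # magnitude-weighted
            desc.append(hist)
    desc = np.concatenate(desc)                   # 4 * 4 * 8 = 128 dims
    n = np.linalg.norm(desc)
    return desc / n if n > 0 else desc            # contrast/gain invariance
```

Because of the final normalization, scaling the patch intensities by a constant gain leaves the descriptor unchanged.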

Efficiency is a large concern for SIFT, as computing descriptors over every image location can be time- and resource-consuming.

Many feature matching algorithms will use RANSAC to find the best set of matches.

Deep Learning

Deep learning is all about gradient optimization. The overall goal is to always minimize some type of loss function.

The basic procedure is as follows:

Given some labeled training data x, and the labels y for the expected outputs

Compute a forward pass on the data to obtain predicted outputs ŷ

Compute the difference (the loss) between the predictions ŷ and the labels y

Update network weights to reduce loss

Repeat

These are all usually repeated through stochastic gradient descent (SGD)

The network is really just a composition of functions for each layer, represented as something like f(x) = f_L(f_{L-1}(… f_1(x))).

Weights can be calculated via an iterative process where w_new = w_old - η * dL/dw, with learning rate η.

To calculate SGD on networks, the gradients of a layer’s weights are typically a function of the previous layers’ weights and outputs and the current layer’s inputs.

We can use the Jacobian matrix to help us calculate the gradients for our networks. The Jacobian matrix parameters are defined as J_ij = ∂f_i/∂x_j, the partial derivative of the i-th output with respect to the j-th input.

So in order to calculate the backpropagation of the network we:

Multiply the gradient flowing in from the layer above (the gradient of the loss with respect to this layer’s output) by the layer’s Jacobian, giving the gradient with respect to the layer’s weights and inputs; the input gradient is then passed to the layer before.
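The whole loop can be sketched end to end in numpy (the toy data, architecture, and learning rate are my own choices for illustration, not from the notes):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy labeled data: learn y = x1 + x2 (a hypothetical target).
X = rng.uniform(-1, 1, size=(200, 2))
y = X.sum(axis=1, keepdims=True)

# One tanh hidden layer; squared-error loss.
W1 = rng.normal(0, 0.5, (2, 8)); b1 = np.zeros(8)
W2 = rng.normal(0, 0.5, (8, 1)); b2 = np.zeros(1)
lr = 0.1

for epoch in range(200):
    for i in range(0, len(X), 20):            # mini-batches (SGD)
        xb, yb = X[i:i+20], y[i:i+20]
        # Forward pass: composition of per-layer functions.
        h = np.tanh(xb @ W1 + b1)
        out = h @ W2 + b2
        # Backward pass: chain rule, multiplying by each layer's Jacobian.
        g_out = 2 * (out - yb) / len(xb)      # dL/d(out)
        g_W2 = h.T @ g_out; g_b2 = g_out.sum(0)
        g_h = g_out @ W2.T                    # gradient w.r.t. layer input
        g_pre = g_h * (1 - h**2)              # through the tanh nonlinearity
        g_W1 = xb.T @ g_pre; g_b1 = g_pre.sum(0)
        # Update: w <- w - lr * dL/dw
        W2 -= lr * g_W2; b2 -= lr * g_b2
        W1 -= lr * g_W1; b1 -= lr * g_b1

pred = np.tanh(X @ W1 + b1) @ W2 + b2
mse = float(np.mean((pred - y) ** 2))         # small after training
```

Each backward step is exactly the rule above: the incoming gradient is multiplied by the layer's Jacobian to get gradients for that layer's weights and for the layer below.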

Classifier Accuracy

Computational Appearance - Reflectance, Texture, and Light Fields

Reflectance is described by the BRDF, or Bidirectional Reflectance Distribution Function. Basically this means that the amount and angle at which light reflects will change depending on the object’s material.

Mathematically, the function will, given an input (illumination) angle and an output (viewing) angle, return the amount of reflected light for those two specific angles.

Work by Prof. Dana has been done to measure the reflectance over all viewing angles by using a parabolic mirror.

Motion Estimation

Given two images where there is movement (via camera, or world movement), we want to find out the function of that motion. For our purposes we assume that the motion is linear or affine.

That means the intensity of any given coordinate pair at time t is given as I(x, y, t), and the same intensity should be found at I(x + u, y + v, t + 1) in the 2nd image.

These u and v are composed of linear equations such that u(x, y) = a + bx + cy and v(x, y) = d + ex + fy.

Overall this is called the brightness constancy assumption, and we use it to solve for our u and v parameters.

The equation for this is I(x + u, y + v, t + 1) = I(x, y, t), which linearizes (via a first-order Taylor expansion) to I_x u + I_y v + I_t = 0.

If we do this for all points on the image we can use a least-squares estimate to obtain the best fit parameters for a, b, c, d, e, and f
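The least-squares fit can be sketched as follows (assuming the image gradients Ix, Iy, It have already been computed; the function name is mine):

```python
import numpy as np

def affine_flow(Ix, Iy, It):
    """Least-squares estimate of affine motion parameters (a..f) from
    image gradients, using the linearized brightness constancy equation
    Ix*(a + b*x + c*y) + Iy*(d + e*x + f*y) + It = 0 at every pixel."""
    h, w = Ix.shape
    ys, xs = np.mgrid[0:h, 0:w]
    x, yv = xs.ravel().astype(float), ys.ravel().astype(float)
    ix, iy, it = Ix.ravel(), Iy.ravel(), It.ravel()
    # One row per pixel, six unknowns (a, b, c, d, e, f).
    A = np.column_stack([ix, x * ix, yv * ix, iy, x * iy, yv * iy])
    params, *_ = np.linalg.lstsq(A, -it, rcond=None)
    return params  # a, b, c, d, e, f
```

Stacking one equation per pixel makes the system heavily overdetermined, so the least-squares solution averages out noise in the gradient estimates.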