I am working on finding the distance between a camera and an object using image processing.

In many places, it's written that we have to use a stereo pair, but my project allows only one camera. Is it feasible to find distance to an object using just one camera? If so, what are the possible ways to do this?

This question came from our site for professional and enthusiast programmers.

$\begingroup$Can you move the camera? Moreover, can you assume the scene doesn't move?$\endgroup$
– Jan Dvorak Dec 31 '12 at 8:18


$\begingroup$Suggestion: close one eye, and try to judge the distances. Is it feasible?$\endgroup$
– Jan Dvorak Dec 31 '12 at 8:19

$\begingroup$Yes, I can move the camera, and the scene is fixed.$\endgroup$
– swapna Dec 31 '12 at 8:20


$\begingroup$Then just take two snapshots at different times (with the camera moving sideways, preferably) and pretend they come from different cameras at the same time.$\endgroup$
– Jan Dvorak Dec 31 '12 at 8:22


$\begingroup$May I ask what this question has to do with MATLAB?$\endgroup$
– EitanT Dec 31 '12 at 9:46

7 Answers

There are various ways to measure distance and recover 3D structure using a single camera.
These include:

Moving the camera and taking another image - this simulates a two-camera system.

If you are imaging an object of a known size, then you can recover the distance with some simple trigonometry.

You can project a known pattern from a known projector location to reconstruct 3D. This is structured light, and is e.g. how the Kinect works. Obviously the image(s) with the pattern will look different from the normal image, so multiple exposures might be necessary if you also want the appearance image.
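The known-size approach above can be sketched with a simple pinhole camera model. The focal length in pixels (which would come from camera calibration) and the object dimensions below are hypothetical values:

```python
def distance_from_known_size(focal_px, real_width, width_px):
    """Pinhole model: an object of real_width (metres) that spans
    width_px pixels in the image lies at distance
    focal_px * real_width / width_px (metres)."""
    return focal_px * real_width / width_px

# Hypothetical numbers: 800 px focal length, a 0.25 m wide object
# spanning 100 px in the image.
d = distance_from_known_size(800.0, 0.25, 100.0)
print(d)  # 2.0 metres
```

The same relation, rearranged, is what lets you calibrate the focal length from one image of an object at a known distance.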

$\begingroup$Another way is when you are located on a flat plane, and you know the height of the camera and other intrinsic parameters. By using trigonometry, you can find the distance of the object (where it touches the floor)$\endgroup$
– Andrey Rubshtein Jan 1 '13 at 15:11

$\begingroup$Thank you for your reply. I am currently using trigonometry, but if I move the object, will the stereo concept still work?$\endgroup$
– swapna Jan 3 '13 at 8:40
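The ground-plane approach from the comments above (known camera height plus trigonometry) can be sketched as follows. A pinhole camera with no lens distortion is assumed, and the height, tilt, and pixel focal length are hypothetical calibration values:

```python
import math

def ground_distance(h, tilt_rad, y_px, cy, focal_px):
    """Distance along a flat floor to the point where an object touches
    it, for a camera at height h (metres) tilted down by tilt_rad.
    y_px is the image row of the contact point, cy the principal-point
    row, focal_px the focal length in pixels."""
    angle_below_horizon = tilt_rad + math.atan((y_px - cy) / focal_px)
    return h / math.tan(angle_below_horizon)

# Camera 1.5 m up, tilted 30 degrees down, contact point at the
# image centre row (all hypothetical values):
print(ground_distance(1.5, math.radians(30), 240, 240, 800))
```

Note this only gives the distance to the point where the object meets the floor, and fails for points at or above the horizon (where the viewing ray never intersects the ground).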

You can take one snapshot, move the camera sideways (or orbit the scene) and take another snapshot. This becomes your stereo pair.

This won't work if the scene isn't fixed.

This will work to a great extent if the camera is moving much faster than the scene. It will also work somewhat (with large errors, but mostly preserving the distance ordering) if the camera is moving at a speed comparable to the scene's.
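Once the two snapshots are treated as a rectified stereo pair, depth follows from the standard disparity relation Z = f·B/d, where the baseline B is the known sideways camera motion. A minimal sketch with hypothetical numbers, assuming the matching of a feature between the two images has already been done:

```python
def depth_from_disparity(focal_px, baseline, disparity_px):
    """Rectified stereo pair: a point whose image shifts by
    disparity_px pixels between the two views lies at depth
    focal_px * baseline / disparity_px."""
    return focal_px * baseline / disparity_px

# Camera moved 0.10 m sideways between shots; a matched feature
# shifted 40 px; focal length 800 px (all hypothetical values):
print(depth_from_disparity(800.0, 0.10, 40.0))  # 2.0 metres
```

The hard part in practice is the matching step and knowing the baseline accurately; errors in either propagate directly into the depth estimate.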

The advantage of using two cameras is that you can easily calculate the distance to an arbitrary object without knowing anything about your surroundings.

If you only have one camera, it becomes a lot more difficult to calculate the distance to a random point in the image.

If you, for example, close one of your eyes, you will notice that it gets much more difficult to tell how far away something really is. But you still have a general idea of the distance, because you recognize objects, know from experience how big they really are, and can thus guesstimate their distance from how big they appear to you.

If you only have one camera, this is something you have to duplicate for your robot. Creating a database of all known objects and making your robot recognize them is probably a little too big of a task, but you could, for example, make the robot recognize a bright orange ball and, based on how many pixels the ball occupies in the image, guess a fairly accurate distance to it.

If you can, however, move the camera and know the distance the camera has moved, you can pretend you have two cameras by taking multiple pictures at known positions and comparing them.

If you know the width of the object, and the transformation the camera applies, you could calculate it.

Observe the quick and dirty drawing below:

The weird thing on the left is the camera, the other thing on the right is the object you want to measure the width (or height, ...) of.

The camera applies a simple transformation on the scene: it shrinks (by estimation). There can be other transformations as well (wide-angle lenses and such), but I'll leave those out of the picture.

Now, to know how much the camera shrinks, you need a reference object. That is: you need an object in the picture of which you know both the width and the distance to the camera. Let's name those w1 and d1. You can measure the width on the picture; call that w1'.

Now one more assumption: the two objects should be close to each other in the picture. Objects are distorted more toward the sides of the image, and we want both objects to be transformed in the same way.

Now, another quick and dirty drawing, zoomed in:

(E is inside the camera, DC is the picture, AB is an object in the scene)

For the reference object, similar triangles give w1/w1' = EB/EC = (d1 + EC)/EC, so EC = w1' * d1 / (w1 - w1'). The same can be done for the measured object (same naming conventions, but with number 2): w2/w2' = EB/EC. Therefore, w2 = w2' * (d2 + EC) / EC.

Now, if the two objects are close to each other on the picture, you can use the EC calculated from the reference object for the measured object as well. It's an approximation, though; it is only exact when there's a common edge.
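A worked numeric version of the similar-triangles calculation, with all dimensions hypothetical: first recover EC from the reference object, then apply the formula to the measured object.

```python
def shrink_reference(w1, w1_img, d1):
    """From w1 / w1' = (d1 + EC) / EC, solve for EC, the distance
    from E (the camera centre) to the picture plane DC."""
    return w1_img * d1 / (w1 - w1_img)

def width_of_object(w2_img, d2, ec):
    """The answer's formula: w2 = w2' * (d2 + EC) / EC."""
    return w2_img * (d2 + ec) / ec

# Hypothetical reference object: 2.0 m wide, 10 m beyond the picture
# plane, appearing 0.4 m wide on the picture.
ec = shrink_reference(2.0, 0.4, 10.0)   # EC = 2.5
# Measured object: appears 0.3 m wide, 20 m beyond the picture plane.
print(width_of_object(0.3, 20.0, ec))   # ~2.7 m
```

Solving the same formula for d2 instead of w2 gives the distance when the width is the known quantity, which is the case the question asks about.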

Stereo vision is the best way to get a depth map. However, if you want to find the coordinates of an end point rather than a full depth map, monocular vision is possible. This is done by having knowledge of the environment, or by using monocular cues like depth from defocus, variation in hue and saturation, occlusion in the image, and so on. This might not provide the exact depth, but it can give relative depth.

From the relative depth, you will have to calculate the exact depth using known elements in the image (e.g. say there is a human in the picture: you know roughly how tall a human is, and from this you can estimate how far into the scene the human is).

Prof. Andrew Ng and Ashutosh Saxena from the Stanford University AI lab have worked on this idea; you may want to refer to their papers. They fed in a lot of depth maps and modelled a system in such a manner that, given any image, a depth map can be obtained. I built a simpler version of the same idea for my helicopter.

$\begingroup$If you have a fixed scene like a basketball court, where you know the size of objects and the placement of lines, can you construct a depth map with one camera?$\endgroup$
– Crashalot Nov 1 '16 at 0:55

Take a look at this paper.
You can calculate lengths using a reference object in the image. If you don't have a reference object, then I think it is not possible with the way images are captured today: cameras have a point of view. If you could invent a camera that captures a plane of view, you could use it to get distances, but the image would probably not be discernible to human eyes, since human eyes use a point of view as well. So, for now, you need at least a pair of calibrated cameras, or a pair of images from the same camera moved a little between shots. That's the reason we have two eyes instead of one.

I realize this is an old question, but this is a better spot for this answer. It is related to a question I answered here.

You can include a mirror in your field of vision which has a reflection of the object you are trying to measure the distance to. If you know the location of the mirror relative to the camera, finding the distance becomes a trigonometry problem.
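A sketch of the mirror idea in 2D, under the assumption that the mirror's position and orientation relative to the camera are known exactly: reflecting the camera across the mirror plane gives a second, virtual viewpoint, and the object is found by intersecting the direct viewing ray with the virtual camera's ray. All coordinates below are hypothetical:

```python
import math

def reflect(p, m, n):
    """Reflect point p across the mirror line through m with unit
    normal n. Passing m=(0,0) reflects a direction vector instead."""
    d = (p[0] - m[0]) * n[0] + (p[1] - m[1]) * n[1]
    return (p[0] - 2 * d * n[0], p[1] - 2 * d * n[1])

def intersect_rays(p1, d1, p2, d2):
    """Solve p1 + t*d1 = p2 + s*d2 for the intersection point
    (Cramer's rule on the 2x2 system)."""
    det = d1[0] * -d2[1] + d1[1] * d2[0]
    rx, ry = p2[0] - p1[0], p2[1] - p1[1]
    t = (rx * -d2[1] + ry * d2[0]) / det
    return (p1[0] + t * d1[0], p1[1] + t * d1[1])

# Hypothetical scene: camera at the origin, flat mirror along the
# line x = 2 with its normal pointing back at the camera.
cam = (0.0, 0.0)
mirror_pt, mirror_n = (2.0, 0.0), (1.0, 0.0)

# The camera measures two viewing directions for the same object:
to_object = (1.0, 1.0)        # direct view of the object
to_reflection = (3.0, 1.0)    # view of the object's mirror image

# The mirror image is equivalent to a second, virtual camera:
virtual_cam = reflect(cam, mirror_pt, mirror_n)              # (4, 0)
virtual_dir = reflect(to_reflection, (0.0, 0.0), mirror_n)   # (-3, 1)

obj = intersect_rays(cam, to_object, virtual_cam, virtual_dir)
print(obj, math.hypot(obj[0] - cam[0], obj[1] - cam[1]))
# object at (1, 1), distance sqrt(2)
```

In effect the mirror turns the single camera into a stereo pair whose baseline is twice the camera-to-mirror distance, so the usual triangulation accuracy trade-offs apply.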