I have a few pictures that show the same scene, but a few things move around a little between shots. I don't have any information about real-life sizes, but I'd like to extract a scaling factor from this scenario.

I know that when a 3D scene (with $X, Y, Z$ coordinates) is captured in a 2D picture (with $X/Z, Y/Z$ coordinates), it isn't easy to calculate the depth of each object/pixel, but it becomes easier if one size is given (e.g. the diameter of a ball).

But is there still a way to determine the real size of the objects in the scene without knowing a scaling factor?

Thanks for any advice.

[Update]:

Imagine a ball is thrown into the scene. Since I know the gravitational acceleration and the time difference between frames, could this be a way too? I mean the horizontal and vertical trajectories are independent of each other, which is why I could recover the displacement along the gravity axis. But the problem I see here is that when the ball hits the ground farther back in the scene, the impact point appears higher up in the picture; I believe this makes it hard to distinguish the axes from each other, and thus to tell which direction is now "to the ground". Or am I going wrong here?
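A minimal sketch of the gravity-calibration idea from the update, assuming the ball's trajectory lies in a plane roughly parallel to the image plane (so one metres-per-pixel factor applies everywhere; the function names and numbers are illustrative, not part of the question):

```python
import numpy as np

g = 9.81  # m/s^2, known physical gravity

def metres_per_pixel(t, y_px):
    """Fit y_px(t) = a*t**2 + b*t + c; the quadratic coefficient a
    equals g_pixels/2, so the scale factor is g / g_pixels."""
    a, b, c = np.polyfit(t, y_px, 2)
    g_pixels = 2.0 * a            # apparent gravity in px/s^2
    return g / abs(g_pixels)      # metres per pixel

# Synthetic check: a free fall filmed at 100 px per metre
t = np.linspace(0.0, 1.0, 30)
scale_true = 0.01                 # m/px (assumed for the demo)
y_px = (0.5 * g * t**2) / scale_true
print(round(metres_per_pixel(t, y_px), 4))  # 0.01
```

If the throw has significant motion along the camera axis, the apparent pixel gravity changes with depth and a single fitted parabola no longer gives one scale, which is exactly the axis-mixing worry raised above.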

$\begingroup$Scene reconstruction is a hard task. Search for "volumetric scene reconstruction" and you will find research on this topic, although it belongs more into computer science/computer graphics than into physics.$\endgroup$
– CuriousOne Jun 25 '15 at 3:15

1 Answer

Generally and fundamentally, the answer to your question is no. Photography / 2D imaging is the prototypical example of the mathematical notion of projection, which, by its definition, destroys information by extracting "class" information from sets of objects: forgetting about fine differences and reporting only the class. Here, for example, if $z$ is the direction away from the plane of the photograph, a photograph can be roughly described as registering the $(x, y)$ co-ordinates of each of the objects in the photo, but forgetting about the $z$. Two bright objects at a point $(x,y)$ will yield the same pixel, regardless of their $z$ co-ordinate.
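The information loss is easy to see in a toy pinhole projection (an illustration of the general point, not anything specific to the answer): all points on the same ray through the pinhole collapse to one image point.

```python
def project(p, f=1.0):
    """Pinhole projection (x, y, z) -> (f*x/z, f*y/z)."""
    x, y, z = p
    return (f * x / z, f * y / z)

p1 = (1.0, 2.0, 4.0)
p2 = (2.0, 4.0, 8.0)   # same ray, twice as far away
print(project(p1), project(p2))  # both (0.25, 0.5): z is lost
```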

Sometimes you can use further information to extract the $z$. Astronomers must do this all the time, through the use of standard candles to infer the distance to an object of known intrinsic brightness from its apparent brightness, the use of redshift, and so forth. See the Cosmic Distance Ladder Wikipedia page.
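The standard-candle idea is just the inverse-square law run backwards: with the intrinsic luminosity $L$ known, the measured flux $F = L/(4\pi d^2)$ fixes the distance $d$. A quick sketch (the solar numbers are standard textbook values used only as a check):

```python
import math

def candle_distance(L, F):
    """Distance to a source of known luminosity L (W) from its
    measured flux F (W/m^2), via F = L / (4*pi*d**2)."""
    return math.sqrt(L / (4.0 * math.pi * F))

L_sun = 3.828e26   # W, solar luminosity
F_earth = 1361.0   # W/m^2, solar flux at Earth
print(candle_distance(L_sun, F_earth))  # ~1.5e11 m, about one AU
```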

At other times the photographic method takes specific steps to encode the $z$ information in the 2D representation, and generally one must make an assumption about the kind of object being photographed: commonly that $z$ is a continuous function of $(x,\,y)$, such as when you're looking down from an aircraft onto the land's surface. For example, Structured 3D Light Scanning casts bright, straight fringes of known spacing and angle onto the object; the object's deviation from flatness warps the otherwise straight fringes, and the $z$ co-ordinate can be calculated from what is essentially triangulation data. Photogrammetry is somewhat similar in principle but very different in technique: it uses stereo photography and makes distance inferences by triangulation from the different $(x,y)$ co-ordinates of the same object in two photographs taken from different points.
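For a rectified stereo pair, the photogrammetric triangulation reduces to a one-liner: the difference in image $x$ co-ordinates (the disparity) of the same object in the two photos gives $z = fB/d$, with $f$ the focal length and $B$ the camera baseline. A sketch with made-up numbers:

```python
def depth_from_disparity(x_left, x_right, f_px, baseline_m):
    """Depth of a point seen at pixel column x_left in the left image
    and x_right in the right image of a rectified stereo pair."""
    d = x_left - x_right            # disparity in pixels
    return f_px * baseline_m / d    # depth in metres

# Assumed example values: 800 px focal length, 0.6 m baseline,
# the object shifted 40 px between the two views.
z = depth_from_disparity(240, 200, f_px=800, baseline_m=0.6)
print(z)  # 12.0 metres
```

The same geometry underlies structured-light scanning; there the "second camera" is effectively the known projector position.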

Further Comments

User Floris makes the following comment:

Don't forget that focus can provide third dimension. Not very good unless the object is close and the depth of focus shallow (low f number). But then it works...

This is true, but telling whether something is in focus can be harder than you might think. There needs to be fine, known structure in the image for a definitive test, especially if you want the method to be an automated image processing procedure. And the deduction of the $z$ information for everything in the image can only be done with a 3D stack of photos, which are then run through a point spread function deconvolution algorithm (this is the idea of "deconvolution microscopy"). The use of coherent light, interferometry and phase reconstruction algorithms is another method to map surfaces with high precision, but this is limited to objects whose $z$ co-ordinate varies by less than about 100 wavelengths or so. The phase camera is a method in between straight interferometry and 3D stack deconvolution.
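The "automated focus test" part can be sketched simply: score each image of a focus stack with a sharpness measure and take the focus distance that maximises it as the depth estimate. This is a bare-bones depth-from-focus illustration (the variance-of-Laplacian score and the synthetic images are my own choices, not from the answer), not the full PSF deconvolution the answer describes:

```python
import numpy as np

def laplacian_var(img):
    """Sharpness score: variance of a discrete Laplacian (interior pixels)."""
    lap = (-4 * img[1:-1, 1:-1]
           + img[:-2, 1:-1] + img[2:, 1:-1]
           + img[1:-1, :-2] + img[1:-1, 2:])
    return lap.var()

def depth_from_focus(stack, focus_distances):
    """Return the focus distance of the sharpest image in the stack."""
    scores = [laplacian_var(img) for img in stack]
    return focus_distances[int(np.argmax(scores))]

# Synthetic stack: a sharp checkerboard between two featureless frames.
sharp = (np.indices((12, 12)).sum(axis=0) % 2).astype(float)
flat = np.full((12, 12), 0.5)
print(depth_from_focus([flat, sharp, flat], [0.8, 1.0, 1.2]))  # 1.0
```

Note how the featureless frames score zero regardless of focus, which is exactly the "needs fine structure" caveat above.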

$\begingroup$Don't forget that focus can provide third dimension. Not very good unless the object is close and the depth of focus shallow (low f number). But then it works...$\endgroup$
– Floris Jun 25 '15 at 4:57

$\begingroup$@Floris you mean if an object is exactly in focus, then its distance can be calculated with the help of the pinhole camera model and the intercept theorem (from $1/f$, the distance from the photo chip to the camera shutter, and the photo chip size)?$\endgroup$
– user3085931 Jun 25 '15 at 7:57

$\begingroup$Well, I could do an edge detection on a given object (let's say a ball). The thinner the edges become (picture after picture), the closer this ball is to the focal point, isn't it?$\endgroup$
– user3085931 Jun 25 '15 at 10:23

$\begingroup$@user3085931 yes - if you are looking at an object with a well defined edge, you can indeed acquire shots at multiple focal settings; plotting a measure of edge "sharpness" as a function of the focal setting will allow you to find the approximate distance. You need to think about what to plot for the best result - I suspect there is some asymmetry that may be resolved by plotting a reciprocal.$\endgroup$
– Floris Jun 25 '15 at 11:41


$\begingroup$@user3085931 Your idea sounds very like deconvolution microscopy. Have a look at the Leica page I linked to for further info.$\endgroup$
– Selene Routley Jun 25 '15 at 22:50