We can recognize the faces of our friends very quickly from just a snapshot. Within 150 milliseconds of being flashed a photo, brain signals respond differently to photos containing animals than photos with no animals. We can categorize scenes as “beach,” “forest,” or “city” when they are flashed for even shorter periods.

But we also get a great deal of information from the motion of people and animals. We can identify our friends and family members just from a point-light display of them walking. We can also detect the emotions of point-light faces, and even the species of point-light animals.

Fascinating as point-light displays are, however, we rarely see them in real life. Point-light displays suggest that motion gives us a great deal of information about the object we’re looking at, but we can’t be sure that real-world perception works the same way. A team led by Quoc Vuong has conducted a study to see if what we know about point-light displays transfers to real-world objects and scenes.

They constructed a set of composite movies like this one (QuickTime required):

In every clip, a machine was superimposed with either a walking human figure (like in this clip) or another machine. To make the task even more difficult, viewers were shown two movies simultaneously, for just two thirds of a second. Viewers had to determine if one of the two movies included a human. The relative visibility of the human figure was also varied.

Even more critically, half of the images they saw were animated, and half were still photos. Here are the results:

As you’d expect, the more visible the human, the higher the accuracy. But at every visibility level, viewers were more accurate at identifying humans in the animated sequences than in the still photos.

One possible objection to these results is that the animated sequences are easier to process, whether or not they contain a human. Vuong’s team repeated the experiment, but instead of the completely still shots, they used movies of animated machines superimposed with still humans. Here are the results:

Viewers were better at identifying the still humans with animated machines, but they were still significantly better at spotting humans when both the machine and human were animated. These results also held when the experiment was repeated with upside-down movies, and with animals instead of machines, like in this movie:

So our impressive ability to identify point-light displays extends to more natural-looking movies. Vuong’s team believes that the upside-down experiment is an indicator that their results may apply to a range of different objects in motion, such as animals and even machines. If we can detect upside-down walking humans better than upside-down still humans, we’re probably also better at detecting walking mice, cats, or goats.

There’s a problem here that I think might be easily corrected: the animation probably contains more information (in the mathematical sense) than the still imagery. The easiest way to check this is simply to check the size of the MPG file that was used in the experiment. If there’s more information in the image, then we shouldn’t be surprised that the human is able to extract more information from it.
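One crude way to see the intuition behind this file-size check is with the Python standard library's zlib compressor standing in for the MPG codec (the "frames" below are synthetic bytes, not the actual stimuli from the study): a clip that repeats the same frame compresses far smaller than one whose frames keep changing, because the animation genuinely contains more bytes of non-redundant data.

```python
import zlib
import random

random.seed(0)

# Toy stand-ins for video frames: each "frame" is 1000 pseudo-random
# pixel bytes. These are illustrative, not real video data.
frame = bytes(random.randrange(256) for _ in range(1000))

# A "still" clip repeats one frame for 30 frames; an "animated" clip
# has 30 frames that all differ from one another.
still_clip = frame * 30
animated_clip = b"".join(
    bytes(random.randrange(256) for _ in range(1000)) for _ in range(30)
)

# Compressed size is a rough proxy for mathematical information content.
print(len(zlib.compress(still_clip)))     # small: the repetition compresses away
print(len(zlib.compress(animated_clip)))  # much larger: each frame adds new data
```

On this toy data the still clip compresses to a fraction of the animated clip's size, which is the sense in which a movie "contains more information" than a still shown for the same duration.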

Of course, the problem here is that mathematical information content is not the same thing as human information content. A complex pattern of video snow may be rich in information, but the human eye will still see it as meaningless noise. The MPG compression algorithm is highly optimized, and that optimization process does take into account some of the visual primitives of the human visual system — but there are still lots of human visual primitives that we don’t yet understand. Tricky business here.

Good point. I’m sure there’s more information in the movies than the still shots, since the movies are composed of a bunch of still shots. I wonder if one could get around that problem by decreasing the resolution of the movies relative to the still shots.

I’m not sure file size is the best measure of how much information is contained in a movie, though — after all, different compression algorithms use different amounts of memory.
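The codec-dependence point is easy to demonstrate with Python's standard-library compressors (used here as stand-ins for video codecs): the very same data comes out at different "file sizes" under different algorithms, so raw byte count is a shaky measure of information content.

```python
import zlib
import bz2
import lzma

# One fixed piece of data, compressed three different ways.
data = b"the quick brown fox jumps over the lazy dog " * 200

for name, compress in [("zlib", zlib.compress),
                       ("bz2", bz2.compress),
                       ("lzma", lzma.compress)]:
    # Each codec reports a different size for identical input.
    print(name, len(compress(data)))
```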

I think we do see better in movies because we are more able to detect “forms” (gestalts) than in a still image.
With a moving object, we have more information that lets us detect exactly what “form” it is, while with a still image the visual system has to “guess” the construction.

If you add motion to the vase, for example make it move upwards, that form will dominate over the “faces” form; on the contrary, if you add motion to the faces, for example make someone laugh, the “faces” form will dominate.

That’s how I see it.

PS: I am still a student and a “novice” compared to the people here, so please correct me if I have something wrong.