I want to see if it is possible to analyse openly available images (from Google, Flickr, etc.) and draw conclusions from them, the same way one might from census data, but with images as the data.

In my opinion online sources offer a wealth of data, and they can easily be grouped by subject (for example, by searching for a specific term on Google: "my house"). I think averaging these images could tell us something about the way most people tend to take pictures of certain subjects (our pets, our family members, etc.). That in turn could possibly tell us something about the way we perceive those subjects, and hopefully even more...

What I am going to do is collect as many images as possible of several readily available, simple subjects (e.g. "house", "portrait", etc.), resize them so they all have the same dimensions, and calculate the average of those images pixel by pixel (mean, median or mode; I am still deciding). Ideally I would also try to align the subjects, so that for the keyword "house" all the houses are directly on top of each other and the same size.
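
As a rough sketch of the averaging step I have in mind (Python/NumPy, with random arrays standing in for the downloaded-and-resized images):

```python
import numpy as np

# Sketch of pixel-by-pixel averaging. The "images" here are synthetic
# random arrays standing in for real, already-resized photos.
rng = np.random.default_rng(0)
images = rng.integers(0, 256, size=(50, 64, 64, 3))  # 50 RGB images, 64x64

mean_image = images.mean(axis=0)          # per-pixel mean
median_image = np.median(images, axis=0)  # per-pixel median, more robust to outliers

print(mean_image.shape)  # (64, 64, 3)
```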

I am very excited about the possibilities, but am curious to know what the experts think. What are the possible pitfalls in doing this? Does this idea make sense to you? Any suggestions or critique?

@whuber, thanks for the link, but I already have a pretty good idea about the pattern/image recognition part of what I'm trying to do, and I also want to expand beyond just faces. I'm more interested in your thoughts on what I'm trying to do (rather than how to do it). Do you think anything sensible can be concluded from averaging images (given good placement, resizing and a decent sample size)? What do you think is something to watch out for, etc.?
– user9318 Feb 20 '12 at 22:02

In general, approaches to tackling your question encompass many topics discussed in this forum, so yes, very exciting, but covering quite a broad spectrum. Image registration (or alignment, as you call it) is a difficult problem to solve properly in its own right!
– martino Feb 20 '12 at 22:39

2 Answers

The idea is nice, but the main problem with averaging is that it works (i.e. removes noise so that you get the essence) only if you average very similar representations of exactly the same entity. To make this clear, imagine performing your process on a watermelon -- for a human, the word "watermelon" works for a whole watermelon as well as for a halved watermelon or even a slice of watermelon, but you won't (even in an idealized world free of practical obstacles) find an algorithm that manages to align them, and in return you'll get a sphere of a hue somewhere between brown and yellow.

Things get worse with less obvious terms -- for instance, "house" actually means more or less "a building which is/might be used by a small, preferably related, group of people for living". This way the act of recognizing it in a picture involves a huge dose of imagination and may depend greatly on culture and environment -- so you'll end up trying to align an igloo, a Mediterranean villa and a yurt, which is again obviously impossible.

Thus, I think you should instead think of clustering the pictures filed under a given subject and analyzing the cluster centroids.
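
As a toy sketch of that direction (plain NumPy, fully synthetic data -- a real pipeline would cluster proper image features rather than raw vectors):

```python
import numpy as np

def kmeans(X, k, iters=20, seed=0):
    """Plain k-means: returns (centroids, labels). Illustrative only."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        # assign each point to its nearest centroid
        d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # move each centroid to the mean of its assigned points
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = X[labels == j].mean(axis=0)
    return centroids, labels

# Two well-separated synthetic "image groups" of flattened pixel vectors:
rng = np.random.default_rng(1)
group_a = rng.normal(0.0, 0.1, size=(30, 16))
group_b = rng.normal(5.0, 0.1, size=(30, 16))
X = np.vstack([group_a, group_b])

centroids, labels = kmeans(X, k=2)
# Each centroid is then the "average image" of one coherent cluster,
# instead of a blur over incompatible representations.
```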

The topic is ultra broad, but here are some directions to start with -- extracting an interesting object from a picture (in order to scale it) is called ROI detection; for aligning, you should start with keypoint recognition; similarity/classification can usually be done without explicit alignment via transformation-independent image features -- for instance PHOW/G or SIFT, to name the basic ones.
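
A full SIFT/PHOW pipeline needs a proper vision library, but to give a flavor of keypoint detection, here is a minimal Harris-style corner response in plain NumPy on a synthetic image containing one bright square (whose corners should score highest):

```python
import numpy as np

def harris_response(img, k=0.05):
    """Crude Harris corner response; illustrative only."""
    iy, ix = np.gradient(img.astype(float))

    def box(a, r=2):
        # sum values over a (2r+1) x (2r+1) window via shifted copies
        out = np.zeros_like(a)
        for dy in range(-r, r + 1):
            for dx in range(-r, r + 1):
                out += np.roll(a, (dy, dx), axis=(0, 1))
        return out

    sxx, syy, sxy = box(ix * ix), box(iy * iy), box(ix * iy)
    det = sxx * syy - sxy ** 2
    trace = sxx + syy
    return det - k * trace ** 2  # high at corners, low/negative on edges

img = np.zeros((32, 32))
img[8:24, 8:24] = 1.0  # bright square: strong corners at its 4 corners
r = harris_response(img)
peak_y, peak_x = np.unravel_index(r.argmax(), r.shape)
```

Real detectors add scale invariance and descriptors on top of this kind of response, which is exactly what SIFT provides.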

Following on from my comment above, to get an idea of the broad number of topics that solutions to this question can touch on, you need only consider a simple example where you have images of cars and images of houses and you want to be able to identify both existing and new images.

Let’s say the images are not all the same size. Then you will want to resize them to some common size, and new pixel values will have to be predicted from the existing ones. You may use anything from simple linear interpolation to a complex nonlinear transformation to determine the new pixel values.
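
For example, bilinear interpolation can be sketched in a few lines of NumPy (grayscale only, for illustration -- a real pipeline would use a library resizer):

```python
import numpy as np

def resize_bilinear(img, out_h, out_w):
    """Resize a 2-D grayscale image with bilinear interpolation (sketch)."""
    in_h, in_w = img.shape
    # coordinates of each output pixel mapped back onto the input grid
    ys = np.linspace(0, in_h - 1, out_h)
    xs = np.linspace(0, in_w - 1, out_w)
    y0 = np.floor(ys).astype(int)
    y1 = np.minimum(y0 + 1, in_h - 1)
    x0 = np.floor(xs).astype(int)
    x1 = np.minimum(x0 + 1, in_w - 1)
    wy = (ys - y0)[:, None]
    wx = (xs - x0)[None, :]
    # blend the four surrounding input pixels for every output pixel
    top = img[np.ix_(y0, x0)] * (1 - wx) + img[np.ix_(y0, x1)] * wx
    bot = img[np.ix_(y1, x0)] * (1 - wx) + img[np.ix_(y1, x1)] * wx
    return top * (1 - wy) + bot * wy

img = np.arange(16, dtype=float).reshape(4, 4)
small = resize_bilinear(img, 2, 2)  # downsample to a common 2x2 size
big = resize_bilinear(img, 8, 8)    # upsample to a common 8x8 size
```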

To determine whether an image is, for example, a car or a house, you will need to find certain characteristics of each that are either unique (unlikely when you scale the problem up to include more types of images) or that best separate the classes of images. Now you are in the realm of classification. You might start off with a range of typical images, compute values for a set of defined characteristics, and then use machine learning to train a solution. Again, many choices exist.
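
One of the simplest such choices is a nearest-centroid classifier -- shown here as a sketch with made-up 3-dimensional characteristic vectors standing in for real image measurements:

```python
import numpy as np

# Synthetic "characteristics" for two image classes (purely illustrative).
rng = np.random.default_rng(2)
cars = rng.normal([1.0, 0.0, 0.0], 0.2, size=(40, 3))
houses = rng.normal([0.0, 1.0, 1.0], 0.2, size=(40, 3))

# "Training" a nearest-centroid classifier is just averaging per class.
centroid_car = cars.mean(axis=0)
centroid_house = houses.mean(axis=0)

def classify(x):
    # label a new image by whichever class centroid its characteristics are nearest to
    d_car = np.linalg.norm(x - centroid_car)
    d_house = np.linalg.norm(x - centroid_house)
    return "car" if d_car < d_house else "house"

print(classify(np.array([0.9, 0.1, -0.1])))  # a point near the car centroid
```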

When dealing with images, the term features is used to describe actual regions of the image; characteristics better describes image-based measurements or statistics.

Finding a set of features that best distinguishes image types is another interesting problem. Given a superset of features, you may want to reduce the set to a more manageable size (feature selection) and use this reduced set to train and build a classifier. This can take you into areas such as bootstrapping, resampling or even evolutionary algorithms, simply to find a best set that is robust against a number of competing sets of features.
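
A minimal filter-style illustration: rank features by a Fisher-type separation score and keep the top-scoring subset (synthetic data; real selection would cross-validate against the classifier):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 100
# Features 0 and 2 separate the two classes; feature 1 is pure noise.
class_a = np.column_stack([rng.normal(0, 1, n), rng.normal(0, 1, n), rng.normal(0, 1, n)])
class_b = np.column_stack([rng.normal(4, 1, n), rng.normal(0, 1, n), rng.normal(4, 1, n)])

def fisher_scores(a, b):
    # between-class separation divided by within-class spread, per feature
    return (a.mean(0) - b.mean(0)) ** 2 / (a.var(0) + b.var(0))

scores = fisher_scores(class_a, class_b)
top2 = np.argsort(scores)[-2:]  # indices of the two most discriminative features
```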

Taking a step back, to find any features at all you will probably have to investigate image segmentation or feature detection. You will have to detect corresponding features on all images and possibly use a feature matching technique to match corresponding features across images. This in turn might act as a starting point for image alignment. Each area mentioned is full of challenging problems and therefore very rewarding.
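
As a taste of the alignment step, the special case of a pure translation can be recovered by locating the peak of the cross-correlation between two images, computed via FFTs (a crude form of image registration; the images here are synthetic):

```python
import numpy as np

def estimate_shift(ref, moved):
    """Return the (dy, dx) to roll `moved` by so it lines up with `ref`."""
    # cross-correlation via the Fourier domain
    f = np.fft.fft2(ref) * np.conj(np.fft.fft2(moved))
    corr = np.fft.ifft2(f).real
    dy, dx = np.unravel_index(corr.argmax(), corr.shape)
    # wrap large indices back to negative shifts
    if dy > ref.shape[0] // 2:
        dy -= ref.shape[0]
    if dx > ref.shape[1] // 2:
        dx -= ref.shape[1]
    return dy, dx

rng = np.random.default_rng(4)
ref = rng.normal(size=(64, 64))
moved = np.roll(ref, shift=(5, -3), axis=(0, 1))  # translate by (+5, -3)

print(estimate_shift(ref, moved))
```

Real registration must also handle rotation, scale and deformation, which is where the keypoint matching above comes in.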

In short, your approach might be to

1/ find a set of interesting features or characteristics
2/ use these to align your images
3/ find a subset of interesting features that are robust when used for image classification
4/ develop a learning model that can classify your images with high sensitivity and specificity
5/ apply the above steps to new images to classify them as, in this example, a house or a car

Of course, this is but one approach of many possible, but it does highlight many of the issues you will face when trying to solve problems such as the one described in the question.