Saturday, May 29, 2010

While coding an AI application I kept hearing the mellow strains of a childish songstress coming from the neighbours upstairs, who played the song on repeat. The verses were sometimes barely audible, but I managed to distinguish several characteristic phrases to look up with a certain great web search engine (I like it, since it puts some of my CodeProject articles on the first 1-2 pages of the search results). The only significant phrase from the song that I submitted to the engine was (to prevent undue advertisement), say, "фиолетовая паста" (violet paste). I expected scores of make-up advertisements; on the contrary, among the cosmetics spam on the first results page, a single link pointed to a music web forum containing exactly that phrase from the lyrics. One more mouse click and a second search gave me the band, the lyrics of the song and guitar tabs, and sent me to YouTube, where I listened to that marvelous music clip.

It is astounding that a person with permanent internet access can, within seconds of hearing a piece of music, be presented with the lyrics, information about the band, and a video clip to watch. The process is described as searching on media content. Current web search uses textual information to return results; imagine being able to submit an audio, video or image sample as a query the same way you submit textual requests. The computer, just by listening to the music, would be able to present you the same information.

The concept, known as Connected Visual Computing (CVC), is actively pursued by Intel. CVC concerns media data processing: when some object, say an ant, emerges in the field of view of your mobile phone camera, the phone analyzes its image and shows an identification on the screen, e.g. that it is Camponotus herculeanus; when you see a street caption in an unknown language, you can view it through your mobile camera and it will display, at the same location in the scene, the same caption in your native language (augmented reality (AR), 2D/3D overlays); or there is the search-by-audio-content example presented above. The market promises immense propagation: such applications will keep the audience consuming modern hardware and software for a very long time.

Here I'd like to present the general idea of how a computer may describe an image by analyzing its pixel content, known as Automatic Linguistic Indexing of Pictures (ALIP). The approach is general: extract some descriptive features from the data and use some rules to attribute the content to a category.

If you're interested in immediate applications, you may contact the supporting firm System7 about the content-based image retrieval (CBIR) part of the project.

Using the application

In my ALIP experiment I decided to annotate some simple natural image categories. There are 5 ANN classifiers in the project, corresponding to:

Pictures that might contain animals

Pictures that might contain flowers

Pictures that might contain landscapes

Pictures that might contain sunsets

Other pictures that do not contain the above categories, or simply unknown image types

You need the unknown category alongside the others you'd like to classify into. Otherwise the AI classifier would identify every image you give it as one of, e.g., animals, flowers, landscapes or sunsets. But in the real world there are other types of images that fall into none of the above categories, so you would have to meddle with AI classification thresholds, which is rather cumbersome and awkward. With an additional unknown-category AI classifier, the result of image identification is either one of the known image categories or simply an unknown image type that the computer cannot identify with its petty knowledge.
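The decision rule this gives you can be sketched in a few lines. This is a hypothetical illustration, not the code of alip.exe: the five category names follow the article, the score values are made up, and the five ANN outputs are assumed to be comparable scores.

```python
# Hypothetical sketch: deciding the image category with an explicit
# "unknown" class instead of per-class rejection thresholds.
# The five class names follow the article; the score values are made up.

CATEGORIES = ["animals", "flowers", "landscapes", "sunsets", "unknown"]

def classify(scores):
    """scores: one output per category from the five ANN classifiers.
    With an "unknown" classifier in the pool, the decision is a plain
    argmax -- no hand-tuned rejection threshold is needed."""
    best = max(range(len(scores)), key=lambda i: scores[i])
    return CATEGORIES[best]

# An image none of the four "known" classifiers is confident about
# simply falls into the unknown class:
print(classify([0.10, 0.20, 0.15, 0.05, 0.90]))  # -> unknown
print(classify([0.85, 0.10, 0.20, 0.10, 0.30]))  # -> animals
```

The point is that the threshold tuning moves into the training data for the unknown class, where it belongs.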

I adore image databases; they contain shots from all over the world that are really nice to browse. I've got about 20000 images for designers, bought on DVD. I took samples from the animals, flowers, landscapes and sunsets image types, and added images from all the other categories to form the unknown image type.

Usage of the program is simple enough. Just run alip.exe and it will load all the necessary AI classifier files (in case of error you will get a message box and will not be able to proceed). Then click the [...] button and select a directory that presumably contains some *.jpg files; you may use the ones supplied with this demo under the pics directory. All the files found will be added to the list box; just click them to view them in the right panel and see the proposed category in the top left panel. In theory it should be able to annotate the image as presented below.

Methodology

Due to competing interests with my former organizations and the one I currently work for, I will not be able to describe the methodology and feature extraction methods in minute detail. Instead I will present the general trend and the categories of features used to describe images; searching the internet for the corresponding feature computations will reveal all the necessary papers with the particular formulae.

There are some demos available online, e.g. ALIPr. They use hidden Markov models (HMMs) and wavelet features extracted from the images. You may try the pictures from this article with their methods, or vice versa, my application with their pictures, and compare the annotation results.

As the AI approach is general and assumes some reduction of the original data dimensionality, using feature extraction, a PCA transform or both, all that is needed is to collect some data, extract the features and train the AI classifiers. If you understand my face detection articles you will be able to repeat the experiment:

After you have converted your raw image data to features, just train some AI classifiers to discriminate the desired positive category from the negative ones.
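As a minimal sketch of this "features in, classifier out" step, assuming the feature vectors have already been extracted from the images: a tiny perceptron stands in here for the ANN classifiers the article actually uses, and the toy feature values are invented.

```python
# Train a toy one-vs-rest classifier on already-extracted feature vectors.
# A perceptron is used only as a stand-in for the article's ANN classifiers.

def train_perceptron(samples, labels, epochs=50, lr=0.1):
    """samples: list of feature vectors; labels: +1 (positive category)
    or -1 (negative/other). Returns (weights, bias)."""
    n = len(samples[0])
    w, b = [0.0] * n, 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            # update only on misclassified samples
            if y * (sum(wi * xi for wi, xi in zip(w, x)) + b) <= 0:
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
    return w, b

def predict(w, b, x):
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else -1

# Toy "features": positive samples cluster high, negatives low.
X = [[0.9, 0.8], [0.8, 0.9], [0.1, 0.2], [0.2, 0.1]]
y = [1, 1, -1, -1]
w, b = train_perceptron(X, y)
print(predict(w, b, [0.85, 0.85]))  # -> 1
```

One such binary classifier per category (including the unknown one) is all the training step amounts to.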

ALIP features

Generally they are divided into:

Color features

Texture features

Shape features

Color features are simply the original raw image data, histograms of the image channels, and image profiles. Texture features come from the known edge extraction methods, wavelet transforms and image statistics (e.g. 1st order: mean, std, skew; 2nd order: contrast, correlation, entropy...). Shape features try to estimate the shapes of objects found in the images. Just have a look at the wiki article on CBIR.

Typically the original RGB color space is transformed to alternative spaces such as YCbCr, HSV, HSI, CIEXYZ, etc., as these may give better discrimination of the data; you need to experiment with them anyway.
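As a quick sketch of such a per-pixel transform, here is RGB to HSV via the Python standard library; YCbCr, HSI and CIEXYZ follow the same pattern with their own formulas. The pixel value is an arbitrary example.

```python
# Convert one RGB pixel to the HSV color space using the stdlib.

import colorsys

def rgb_pixel_to_hsv(r, g, b):
    """r, g, b in 0..255 -> (h, s, v), each component in 0..1."""
    return colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)

h, s, v = rgb_pixel_to_hsv(255, 0, 0)   # pure red
print(h, s, v)  # -> 0.0 1.0 1.0
```

Features computed in the transformed space (histograms, statistics) often separate the categories better than the raw RGB ones.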

Thursday, May 27, 2010

Following on from a New Scientist article that was written a few days ago, I ended up on the website of Taeg Sang Cho -- a graduate student at MIT. He's been working on a bunch of advanced imaging algorithms -- with gifts and grants from big names like Microsoft, Adobe and Google. His recent work -- three research papers -- is all about content-aware manipulation of photos. I'm struggling to pick one because they're all awesome, so I'll just give you the highlights:

A probabilistic jigsaw puzzle solver -- this is the technology featured in the New Scientist article, so there's lots of dumbed-down detail if you don't want to read the paper itself. In essence, it does exactly what a human does: it matches edges, but quickly and very accurately. Similar technology could be used in photo manipulation (and may indeed already be used by Adobe's Content-Aware Fill) -- the biggest give-away when you manipulate images is edges. This technology could magic away those edges!

A content-aware image prior -- this is a funky way of saying 'image restoration', and I wouldn't be surprised if this is a sneak-peek at the technology you'll see in Photoshop CS6! Look at the sample photo -- the results speak for themselves.

Motion blur removal with orthogonal parabolic exposures -- (phew, just typing that gave me a bit of a hard-on) -- in layman's terms, this is blur removal by taking two photos from slightly different viewpoints and then... performing some magic. Again, look at the sample images for some fantastic proof. I wouldn't expect to see moving lenses in still cameras any time soon though...

It's all very exciting stuff that we'll likely begin to see in consumer software in the next year or two. I just wish there was a video of the jigsaw puzzle solving!

New York invasion by 8-bit creatures! PIXELS is Patrick Jean's latest short film, shot on location in New York. Written and directed by: Patrick Jean. Director of Photography: Matias Boucard. SFX by Patrick Jean and guests. Produced by One More Production, www.onemoreprod.com

Over the past year or so, Microsoft’s robotics group has been working quietly, very quietly. That’s because, among other things, they were busy planning a significant strategy shift.

Microsoft is upping the ante on its robotics ambitions by announcing today that its Robotics Developer Studio, or RDS, a big package of programming and simulation tools, is now available to anyone for free.

The Microsoft RDS supports a number of hardware platforms, including the Lego Mindstorms NXT, iRobot Create and Parallax Boe-Bot, and it provides a physics-based simulation environment to allow you to test your designs.

Sunday, May 16, 2010

I have a philosophical question and await your answers. How do you define the term "similar"?

When are 2 images considered "similar images"?

According to Google, similar is defined as:

marked by correspondence or resemblance; "similar food at similar prices"; "problems similar to mine"; "they wore similar coats"

alike(p): having the same or similar characteristics; "all politicians are alike"; "they looked utterly alike"; "friends are generally alike in background and taste"

like: resembling or similar; having the same or some of the same characteristics; often used in combination; "suits of like design"; "a limited circle of like minds"; "members of the cat family have like dispositions"; "as like as two peas in a pod"; "doglike devotion"; "a dreamlike quality"

Figures that have the same shape but not necessarily the same size. Similar polygons have corresponding angles congruent and corresponding sides in proportion. Congruent is a special case of similar where the ratio of the corresponding sides is 1-1. www.aug.edu/~lcrawford/Web_3242/GLOSSARY.htm

Denotes meaning similarity between words that cannot always be used instead of each other, for instance because they only share a part of their ... www.hyperdic.net/en/doc/word

Saturday, May 15, 2010

This year, TPAMI is celebrating its 30th anniversary. To mark this milestone, the IEEE Computer Society’s Publishing Services Department asked journal volunteers to submit their All-Time Favorite Top 10 list and explain their reasons for choosing the papers. Free, limited-time access is available to all of the papers on the list.

I'd like to say I learned about Thin-Plate Splines straight from the papers by Duchon or Meinguet, but I didn't. In fact, I found out about them from this excellent paper by Fred Bookstein. I remember very well punching in the coefficients of the numerical example in that paper into Matlab and realizing how helpful this approach would be to my work on shape matching.

This is the first TPAMI paper I ever read, and it is also the reason I chose to make computer vision my career. I was hooked from their very first example of steered first derivatives of Gaussians. I subsequently devoted several years of my life to studying low level feature extraction, including a pilgrimage to the Mecca of image filtering in Linköping, Sweden.

This was one of the first TPAMI papers whose formation I witnessed from start to finish, since Jianbo was my officemate. We all knew they had a hit on their hands with this one. We also knew that with the publication of this paper, our honeymoon phase with spectral clustering was over, and the nitty gritty phase was about to begin.

Who could forget this paper's dynamic mosaics made from footage of Arnold riding a Harley in Terminator 2. The things they were doing with optical flow at Sarnoff Research Center in the mid-90s were indistinguishable from magic.

Before the SVD mania of the 90s, and long before the boosting craze of the 00s, a handful of towering contributions in the areas of edge detection, optical flow and regularization theory were developed on the foundations of variational calculus. The Canny Edge Detector, developed in the early 80s, was one such contribution. 25 years later it is required learning in virtually every beginning course in computer vision. Not bad for a Master's Thesis!

Quantized tags, approximate geometric arrangements and randomized trees. There were no SIFT or HoG features back then, and the binary handwritten digits were a far cry from the sheep and motorbikes of PASCAL and MSRC, but the essential constellation based recognition approach proposed by this paper was brilliant and ahead of its time.

Spectral graph matching is the lesser known sibling of spectral clustering, but it is nonetheless filled with interesting theoretical nuggets, many of which I encountered for the first time in this paper. I fondly remember this as the paper that prompted me to check out a copy of Papadimitriou and Stieglitz to find out about this so called "Hungarian Algorithm."

Tuesday, May 11, 2010

My collaborators (Vicky and Dim) are working on a new video summarization project based on multimodal data and fuzzy classifiers. The proposed technique automatically generates summaries from online videos (YouTube). Each frame may participate in one or more of the generated classes. The application, once more, will be open source.

Here is a screenshot. More details as well as the paper will be added soon.

The International Conference on Signal Processing (ICSP), sponsored by the IEEE Beijing Section, is the premier forum for the presentation of technological advances and research results in the fields of theoretical, experimental, and applied signal processing. ICSP 2010 will bring together leading engineers and scientists in signal processing from around the world. Research frontiers in fields ranging from traditional signal processing applications to evolving multimedia and video technologies are regularly advanced by results first reported in ICSP technical sessions.

With the support of numerous reviewers and authors, ICSP has been held for 20 years. At this session, as a celebration of ICSP, we will hold celebration events and present awards, including an Outstanding Paper Award, an Outstanding Student Paper Award, etc. For details, please visit http://icsp10.bjtu.edu.cn .

*Proceedings*

The proceedings, with IEEE and Library of Congress catalog numbers, will be published prior to the conference in both hardcopy and CD-ROM and distributed to all registered participants at the conference. The proceedings will be indexed by EI.

*Paper Submission*

Prospective authors are invited to submit full-length, four-page, double-column papers, including figures and references, to the ICSP Technical Committee by June 15, 2010 at http://icsp10.bjtu.edu.cn. For questions about paper submission, please contact the technical program secretaries, Ms. TANG Xiaofang and Dr. AN Gaoyun at bfxxstxf@bjtu.edu.cn and gyan@bjtu.edu.cn .

Monday, May 10, 2010

A small application for unwrapping omnidirectional images using polar to Cartesian coordinate conversion. The size and aspect ratio of the produced images can be adjusted and the application performs bilinear or bicubic interpolation in order to improve the quality. The center of the omnidirectional image can be detected automatically using either a very simple and fast algorithm based on image thresholding or a slower but much more robust method based on edge detection and Hough transform. An example of using the second method is shown below.
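The core polar-to-Cartesian step can be sketched as follows. This is a bare-bones illustration, not the application's code: it uses nearest-neighbour sampling on a toy pixel grid, where the real tool offers bilinear and bicubic interpolation, and the center/radius parameters are assumed inputs from the detection stage.

```python
# Unwrap an omnidirectional image: each output pixel (u, v) maps back
# to polar coordinates (angle, radius) around the detected image center.

import math

def unwrap(src, cx, cy, r_min, r_max, out_w, out_h):
    """src: 2D list of pixels; (cx, cy): detected center;
    r_min..r_max: radial band to unwrap. Returns an out_h x out_w image
    (rows = radii, columns = angles)."""
    out = [[0] * out_w for _ in range(out_h)]
    for v in range(out_h):
        r = r_min + (r_max - r_min) * v / (out_h - 1)
        for u in range(out_w):
            theta = 2.0 * math.pi * u / out_w
            # nearest-neighbour sample; the real app interpolates here
            x = int(round(cx + r * math.cos(theta)))
            y = int(round(cy + r * math.sin(theta)))
            if 0 <= y < len(src) and 0 <= x < len(src[0]):
                out[v][u] = src[y][x]
    return out

# 5x5 toy image with a distinct center pixel:
src = [[0] * 5 for _ in range(5)]
src[2][2] = 9
pano = unwrap(src, 2, 2, 0, 2, 8, 3)
print(pano[0])   # radius 0 samples the center pixel at every angle
```

Adjusting out_w and out_h is what controls the size and aspect ratio of the produced panorama.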

Videos presenting sequences of unwrapped omnidirectional images taken from the COLD database can be downloaded here, here and here. Download, installation and usage instructions for both Linux and Windows can be found below. If you have any questions, you experience problems with the software or you have spotted a bug, please contact Andrzej Pronobis.

Wednesday, May 5, 2010

3. MPEG-7 Descriptors Fusion
- Using HIS* (download empirical/historical files from the web)
- Using Z-Score
- Using Borda Count
- Using IRP
- Using Linear Sum

4. MPEG-7 and CCD Descriptors Fusion
- Using HIS* (download empirical/historical files from the web)
- Using Z-Score
- Using Borda Count
- Using IRP
- Using Linear Sum

5. From now on we are using a compact version of the BTDH for indexing and retrieval.

6. New descriptor (B-CEDD). During the search process, an image query is entered and the system returns images with similar content. Initially, the similarity/distance between the query and each image in the database is calculated with the B-CEDD descriptor, and only if the distance is smaller than a predefined threshold is the comparison of their CEDDs performed.

7. You can now save the retrieval results in trec_eval format.

8. Indexing now works with *.bmp, *.jpg and *.png.
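The two-stage search in item 6 can be sketched like this. It is a hedged illustration only: the descriptor vectors, the L1 distance and the threshold value are stand-ins, not the real B-CEDD/CEDD computation from img(Rummager).

```python
# Two-stage search: a cheap compact descriptor (B-CEDD stand-in) prunes
# the database, and the full CEDD comparison runs only for candidates
# whose pre-filter distance is under a predefined threshold.

def l1_distance(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

def two_stage_search(query_bcedd, query_cedd, db, threshold):
    """db: list of (name, bcedd, cedd) tuples. Returns (name, distance)
    pairs for images surviving the B-CEDD pre-filter, best match first."""
    results = []
    for name, bcedd, cedd in db:
        if l1_distance(query_bcedd, bcedd) < threshold:        # cheap stage
            results.append((name, l1_distance(query_cedd, cedd)))  # full stage
    return sorted(results, key=lambda r: r[1])

db = [("a.jpg", [1, 0, 2], [1, 0, 2, 3]),
      ("b.jpg", [9, 9, 9], [9, 9, 9, 9])]
print(two_stage_search([1, 0, 1], [1, 0, 1, 3], db, threshold=4))
```

The saving comes from the pre-filter: the expensive full-descriptor comparison is skipped for most of the database.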

SIMPLE Descriptors

A set of local image descriptors specifically designed for image retrieval tasks.

Compact Composite Descriptors

A set of global image descriptors for image retrieval tasks.

MPEG-7 Descriptors

Download the latest Version of MPEG-7 Descriptors for C#. The implementation of these descriptors is based on Lire image retrieval System (Lire). Download the Descriptors

The LIRE (Lucene Image REtrieval) library provides a simple way to retrieve images and photos based on their color and texture characteristics. LIRE creates a Lucene index of image features for content-based image retrieval (CBIR). Three of the available image features are taken from the MPEG-7 standard: ScalableColor, ColorLayout and EdgeHistogram; a fourth one, the Auto Color Correlogram, has been implemented based on recent research results. Furthermore, simple methods for searching the index and browsing results are provided by LIRE. The LIRE library and the LIRE Demo application, as well as all the source, are available under the GNU GPL license.

Img(Rummager)

Img(Rummager) can connect to a database and execute a retrieval procedure, extracting the features necessary for the comparison in real time. The image database can be stored either on the computer where the retrieval is actually taking place or on a local network. Moreover, the software is capable of executing the retrieval procedure among the keyword-based results that Flickr provides. Read More

Several image processing and retrieval examples using C#

Caliph & Emir
Caliph & Emir are MPEG-7 based Java prototypes for digital photo and image annotation and retrieval, supporting graph-like annotation for semantic metadata and content-based image retrieval using MPEG-7 descriptors.
Read More