Content-based image retrieval and its benefits for the stock photography market

Author:

Padeste, Romano

Abstract:

The development of powerful low-cost desktop computer systems has
changed the pre-press business, where tight deadlines must be met
persistently. An increasing number of newspapers and magazines are acquiring,
handling, and storing images digitally while the use of hardcopies and
slides decreases.
Today's computers and high-capacity storage media enable stock photography
agencies to build digital image databases, giving users fast access
to large numbers of images. However, the transition from analog to digital
image archives poses new problems: with thousands of images at hand,
the search for a particular image may turn into a search for a needle
in a haystack.
The first image Database Management Systems (DBMSs) were extended
text DBMSs, which stored the image data along with a set of manually
entered descriptive keywords. The major problem with this approach is that
there is no generally agreed-upon language to describe images. Even
sophisticated DBMSs are unable to detect synonyms; hence, an image described
with a property such as "curvy" may not be found if a user enters "wavy"
as a search criterion. Furthermore, some image properties are hard
to describe with keywords. A search is likely to fail if properties were not
described at the database population stage when images are added to the
database. Finally, assigning a sufficient set of keywords to every image adds
a tremendous amount of labor to the population stage.
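
The synonym problem described above can be illustrated with a minimal sketch. The archive contents, image IDs, and the function name `keyword_search` are hypothetical and chosen only for illustration; they do not describe any particular DBMS.

```python
# Hypothetical toy archive: image IDs mapped to manually entered keywords.
archive = {
    "img_001": {"coast", "curvy", "road"},
    "img_002": {"city", "skyline", "night"},
}


def keyword_search(archive, term):
    """Return the sorted IDs of images whose keyword set contains the exact term."""
    return sorted(img for img, keywords in archive.items() if term in keywords)


# The image was described as "curvy" at the population stage ...
hits_curvy = keyword_search(archive, "curvy")
# ... so a user who thinks of the same road as "wavy" finds nothing.
hits_wavy = keyword_search(archive, "wavy")

print(hits_curvy)
print(hits_wavy)
```

Because the match is on exact strings, the search succeeds only when the user happens to choose the same word as the person who entered the keywords.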
Research at many scientific institutions and companies is geared towards
overcoming the shortcomings of image DBMSs with keyword-based search
engines. Pattern recognition, which allows images to be compared based on
their visual content, is being introduced to image DBMSs, improving the
accuracy of search engines. Sketches, sample images, and other means of
describing the visual content of images may be used as search criteria in
addition to keywords. This thesis project summarizes the basics of pattern
recognition and its applications in image database management for content-based
image retrieval.
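
As a rough illustration of comparing images by visual content rather than by keywords, the following sketch ranks candidates against a sample image using gray-level histograms and histogram intersection. This is one simple, well-known similarity measure and is an assumption made here for illustration; the abstract does not state which pattern recognition techniques the thesis covers. The pixel lists and function names are hypothetical.

```python
def gray_histogram(pixels, bins=4):
    """Normalized histogram of 8-bit gray values (0..255); pixels must be non-empty."""
    counts = [0] * bins
    for p in pixels:
        counts[min(p * bins // 256, bins - 1)] += 1
    total = len(pixels)
    return [c / total for c in counts]


def histogram_intersection(h1, h2):
    """Similarity in [0, 1]; 1.0 means the two histograms are identical."""
    return sum(min(a, b) for a, b in zip(h1, h2))


# Hypothetical tiny "images", given as flat lists of gray values.
query = [10, 20, 200, 210]      # sample image used as the search criterion
dark = [5, 15, 25, 35]          # candidate with only dark pixels
similar = [12, 30, 190, 250]    # candidate with a dark/bright mix like the query

q_hist = gray_histogram(query)
score_dark = histogram_intersection(q_hist, gray_histogram(dark))
score_similar = histogram_intersection(q_hist, gray_histogram(similar))

# Rank candidates from most to least similar to the query.
ranking = sorted(
    [("dark", score_dark), ("similar", score_similar)],
    key=lambda item: item[1],
    reverse=True,
)
print(ranking)
```

No keywords are involved in this comparison: the ranking depends only on the pixel data, which is why such measures can complement, rather than replace, a keyword-based search.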
The purpose of this thesis project is to determine the impact of content-based
image retrieval on the stock photography market in the near future. In
order to obtain the necessary information, two different questionnaires were
sent out to a number of selected stock photography agencies, newspapers,
and magazines. The evaluation of the replies was conducted for the three
groups separately.
The replies from stock photography agencies showed a high interest in
digital image archives, but also concern about increased overhead with
digital archives. The estimated amount of work required for categorizing
images and assigning keywords ranged from fifty to ninety percent, as
compared to ten to fifty percent for scanning. All survey participants agreed
that pattern recognition can improve the accuracy of keyword-based search
engines. However, they all denied that this approach would reduce the need
for assigning keywords.
Different needs could be determined for newspapers and magazines.
Newspapers rely heavily on keywords since images are often chosen based
upon the circumstances under which they were taken, while their visual
content may be secondary. Therefore, newspapers benefit little from
content-based image retrieval. For magazines, the visual content of images
seemed to have a higher priority and the appreciation for corresponding
search capabilities was accordingly higher.
To summarize, users of digital image archives can profit from content-based
image retrieval if the visual content is an important issue. For image
providers, a number of reasons delay the transition to content-based
image retrieval. Currently, there is only one shrink-wrapped commercial
product available that meets the needs of stock photography agencies.
Fully exploiting the capabilities of this product requires additional work.
Finally, many companies have already built their image database and the
transition to another system is time-consuming, expensive, and risky.