The goal in image segmentation is to label pixels in an image based
on the properties of each pixel and its surrounding region.
Recently, Content-Based Image Retrieval (CBIR) has emerged as an
application area in which retrieval is attempted by gaining
unsupervised access to the image semantics directly, rather than
via manual annotation. To this end, we present an unsupervised
segmentation technique in which colour and texture models are
learned from the image prior to segmentation, and whose output
(including the models) may subsequently be used as a content
descriptor in a CBIR system. These models are obtained in a
multiresolution setting in which Hidden Markov Trees (HMTs) are
used to model the key statistical properties exhibited by complex
wavelet and scaling function coefficients. The unsupervised Mean
Shift Iteration (MSI) procedure is used to determine a number of
image regions, which are then used to train the models for each
segmentation class.
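
To make the region-finding step concrete, the following is a minimal sketch of a generic mean shift procedure over per-pixel feature vectors: each point is shifted repeatedly to the mean of its neighbourhood until it settles on a mode of the feature density, and points sharing a mode form a candidate training region. This illustration is written in Python under our own assumptions (a flat kernel, synthetic colour-like features, an arbitrary bandwidth) and is not the implementation used here.

```python
import numpy as np

def mean_shift_modes(features, bandwidth, n_iter=50, tol=1e-3):
    """Shift each feature vector towards the mean of the samples
    within `bandwidth` of it until the shifts vanish; points that
    settle on the same mode belong to one candidate region."""
    points = features.copy()
    for _ in range(n_iter):
        shifted = np.empty_like(points)
        for i, p in enumerate(points):
            # Flat kernel: average all samples inside the window.
            dist = np.linalg.norm(features - p, axis=1)
            shifted[i] = features[dist < bandwidth].mean(axis=0)
        if np.linalg.norm(shifted - points) < tol:
            points = shifted
            break
        points = shifted
    return points

# Hypothetical example: 200 three-dimensional feature vectors
# drawn from two well-separated clusters.
rng = np.random.default_rng(0)
feats = np.vstack([rng.normal(0.0, 0.1, (100, 3)),
                   rng.normal(1.0, 0.1, (100, 3))])
modes = mean_shift_modes(feats, bandwidth=0.5)

# Points whose modes coincide (up to rounding) form one class.
groups = np.unique(np.round(modes, 1), axis=0)
print(f"{len(groups)} candidate segmentation classes found")
```

In the full scheme, the pixels belonging to each detected mode would supply the training data for that class's colour and texture (HMT) models.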

Content-Based Image Retrieval is important for two reasons. First, the oft-cited growth of image archives in many fields, and the rapid expansion of the Web, mean that successful image retrieval systems are fast becoming a necessity if the mass of accumulated data is to be useful. Second, database retrieval provides a framework within which the important questions of machine vision are brought into focus: successful retrieval is likely to require genuine image understanding. In view of these points, the evaluation of retrieval systems becomes a matter of priority. There is already a substantial literature evaluating specific systems, but little high-level discussion of the evaluation methodologies themselves seems to have taken place. In the first part of the report, we propose a framework within which such issues can be addressed, analyse possible evaluation methodologies, indicate where they are and are not appropriate, and critique query-by-example and related evaluation methodologies.

In the second part of the report, we apply the results of this analysis to a particular dataset. The dataset is problematic but typical: no ground truth is available for its semantics. Considering retrieval based on image segmentations, we present a novel method for its evaluation. Unlike evaluation methods that rely on the existence or creation of ground truth, the proposed procedure asks human subjects to take a psychovisual test comparing the results of different segmentation schemes. The test is designed to answer two questions: does consensus about a 'best' segmentation exist, and, if it does, what do we learn about segmentation schemes for retrieval? The results confirm that human subjects are consistent in their judgements, thus allowing meaningful evaluation.
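
As one concrete reading of the consensus question, agreement among subjects' rankings of competing segmentations can be quantified with Kendall's coefficient of concordance W, where W = 1 indicates perfect agreement and W = 0 none. The sketch below illustrates this standard statistic; it is not the specific test used in the report, the ranking matrix is hypothetical, and the correction for tied ranks is omitted.

```python
import numpy as np

def kendalls_w(ranks):
    """Kendall's coefficient of concordance for an (m subjects x
    n items) matrix of ranks, without the tie correction."""
    ranks = np.asarray(ranks, dtype=float)
    m, n = ranks.shape
    totals = ranks.sum(axis=0)                 # rank total per item
    s = ((totals - totals.mean()) ** 2).sum()  # spread of the totals
    return 12.0 * s / (m ** 2 * (n ** 3 - n))

# Hypothetical data: 5 subjects each rank 4 segmentation schemes
# (1 = best) for the same image.
ranks = np.array([[1, 2, 3, 4],
                  [1, 3, 2, 4],
                  [2, 1, 3, 4],
                  [1, 2, 4, 3],
                  [1, 2, 3, 4]])
print(kendalls_w(ranks))  # ~0.78: the subjects largely agree
```

A value of W this far above zero (significance would be checked with the associated chi-squared test) is the kind of evidence that supports a claim of consistent human judgements.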