In this paper, our goal is to use human judgments of image similarity to improve the organization of an image database for content-based retrieval. We first derive a statistic, $\kappa_B$, for measuring the agreement between two partitionings of an image set into unlabeled subsets. This statistic can measure the degree of agreement both between pairs of human subjects and between human and machine partitionings of an image set. It also allows database organizations to be compared directly, rather than indirectly through precision and recall measurements. It therefore provides a rigorous means of choosing between competing image database organization systems, and of assessing how closely the performance of such systems approaches that which might be expected from a database organized by hand. We then use the results of experiments in which human subjects were asked to partition a set of images into unlabeled subsets to define a similarity measure for pairs of images, based on the frequency with which they were judged to be similar (that is, placed in the same subset). We show that when this measure is used to partition an image set with a clustering technique, the resulting clustering agrees better with the partitionings produced by human subjects than those of any of the feature-space-based techniques investigated. Finally, we investigate the use of machine learning techniques to discover a mapping from a numerical feature space to this perceptual similarity space. Such a mapping would allow the ground-truth knowledge abstracted from the human judgments to be generalized to unseen images. We show that a learning technique based on an extension of a Kohonen network can learn a similarity space that yields partitionings in excellent agreement with those produced by human subjects.
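Two ideas above can be illustrated concretely: agreement between partitionings into *unlabeled* subsets, and a pairwise similarity derived from how often subjects grouped two images together. The sketch below is an assumption-laden illustration, not the paper's derivation of $\kappa_B$. It applies an ordinary Cohen's kappa to pairwise "same subset / different subset" decisions, which is invariant to how subsets are labeled. The names `co_assigned`, `pairwise_kappa`, and `cooccurrence_similarity` are hypothetical.

```python
from itertools import combinations
from typing import Sequence


def co_assigned(labels: Sequence[int]) -> list[bool]:
    """For every unordered pair (i, j) with i < j, record whether both images share a subset.

    Only label equality is used, so the subsets may carry arbitrary (unlabeled) ids.
    """
    return [labels[i] == labels[j] for i, j in combinations(range(len(labels)), 2)]


def pairwise_kappa(a: Sequence[int], b: Sequence[int]) -> float:
    """Cohen's kappa over pairwise co-membership decisions.

    Hypothetical illustration of chance-corrected partition agreement;
    NOT the kappa_B statistic derived in the paper.
    """
    if len(a) != len(b):
        raise ValueError("both partitionings must cover the same images")
    pa, pb = co_assigned(a), co_assigned(b)
    n_pairs = len(pa)
    if n_pairs == 0:
        raise ValueError("need at least two images")
    # Observed agreement: fraction of pairs on which both partitionings make the same decision.
    p_obs = sum(x == y for x, y in zip(pa, pb)) / n_pairs
    # Chance agreement from each partitioning's marginal rate of "same subset" decisions.
    qa, qb = sum(pa) / n_pairs, sum(pb) / n_pairs
    p_exp = qa * qb + (1 - qa) * (1 - qb)
    if p_exp == 1.0:
        # Both partitionings are trivial in the same way (all together or all apart).
        return 1.0
    return (p_obs - p_exp) / (1 - p_exp)


def cooccurrence_similarity(partitionings: Sequence[Sequence[int]]) -> list[list[float]]:
    """Fraction of subjects who placed images i and j in the same subset."""
    n = len(partitionings[0])
    counts = [[0] * n for _ in range(n)]
    for labels in partitionings:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    m = len(partitionings)
    return [[c / m for c in row] for row in counts]


if __name__ == "__main__":
    subject_1 = [0, 0, 1, 1]
    subject_2 = [5, 5, 9, 9]  # same grouping, different (arbitrary) subset ids
    subject_3 = [0, 1, 0, 1]
    print(pairwise_kappa(subject_1, subject_2))  # identical groupings
    print(pairwise_kappa(subject_1, subject_3))  # worse than chance
    print(cooccurrence_similarity([subject_1, subject_3]))
```

Because only label equality enters the computation, relabeling the subsets leaves the score unchanged, which is the property required for unlabeled partitionings. A dissimilarity such as `1 - s[i][j]` could then be handed to a clustering algorithm; which clustering technique the paper uses is not specified here.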