Literature review is time-consuming, largely because relevant articles are hard to find. Yet it is essential: it allows
researchers to find solutions to their questions and problems in work already performed and published by others. It is
difficult to wade through documents quickly and assess their quality by looking only at their title, abstract, or even full
text. The human visual system, by contrast, lets us glance at images, infer the main subject of an article, and decide
whether we want to read more. In some fields, such as biology, figures showing photographs of experimental results allow a
researcher reviewing the literature to judge the quality of the work from its results. This work describes a literature
review system that uses content-based image retrieval (CBIR) techniques to search for relevant documents by the content of
their figures, refined through relevance feedback, instead of relying on guesswork with keyword search. The long-term goal is
to use it as a subsystem of a content-based document retrieval system in which figures and their captions are combined with
the document's body text. This paper describes how documents are processed to extract their raster graphics and their text,
with the text's layout and formatting information kept intact. It then describes how this layout information is used to
match each figure to its caption. Figure-caption matching is complete; caption-based search is implemented but not yet fully
integrated into the system.
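As a rough illustration of layout-based figure-caption matching, the sketch below assumes that each extracted figure and text block carries a page number and a bounding box (with y growing downward), and that captions begin with a label such as "Figure 3" or "Fig. 3". It greedily assigns each figure the nearest unused caption on the same page that overlaps it horizontally. The `Box`, `TextBlock`, and `match_captions` names and the caption pattern are hypothetical; this is not the system's actual matching procedure.

```python
from __future__ import annotations

import re
from dataclasses import dataclass


@dataclass
class Box:
    page: int
    x0: float
    y0: float  # top edge (y grows downward)
    x1: float
    y1: float  # bottom edge


@dataclass
class TextBlock:
    box: Box
    text: str


# Hypothetical caption pattern: "Figure 3", "Fig. 3", "FIG 3", ...
CAPTION_RE = re.compile(r"^\s*(?:fig\.?|figure)\s*\d+", re.IGNORECASE)


def horizontal_overlap(a: Box, b: Box) -> float:
    """Width shared by the two boxes' x-ranges (0 if disjoint)."""
    return max(0.0, min(a.x1, b.x1) - max(a.x0, b.x0))


def vertical_gap(fig: Box, cap: Box) -> float:
    """Vertical distance between a figure and a caption; 0 if they overlap."""
    if cap.y0 >= fig.y1:  # caption lies below the figure
        return cap.y0 - fig.y1
    if cap.y1 <= fig.y0:  # caption lies above the figure
        return fig.y0 - cap.y1
    return 0.0


def match_captions(figures: list[Box], blocks: list[TextBlock]) -> dict[int, str | None]:
    """Greedily pair each figure (by index) with the closest unused caption block."""
    captions = [b for b in blocks if CAPTION_RE.match(b.text)]
    used: set[int] = set()
    matches: dict[int, str | None] = {}
    for i, fig in enumerate(figures):
        best, best_gap = None, float("inf")
        for j, cap in enumerate(captions):
            if j in used or cap.box.page != fig.page:
                continue
            if horizontal_overlap(fig, cap.box) <= 0:
                continue  # caption must sit in the figure's column
            gap = vertical_gap(fig, cap.box)
            if gap < best_gap:
                best, best_gap = j, gap
        if best is None:
            matches[i] = None  # no caption candidate on this page/column
        else:
            used.add(best)
            matches[i] = captions[best].text
    return matches
```

A nearest-caption rule like this is only a baseline; multi-column pages or captions placed beside their figures would need additional layout heuristics.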
Two novel modified tf-idf measures under consideration are detailed mathematically and explained intuitively; rather than
relying on term frequency alone, they infer text importance from bold/italic formatting, font size, and document structure.
A CBIR query formed from multiple images is issued as separate single-image queries, and their results are then merged.
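The two modified tf-idf measures themselves are not reproduced here. As a hedged illustration of the general idea, the sketch below replaces the raw term count with a sum of per-occurrence weights that grow with bold or italic styling, with font size relative to the body text, and with the section in which the term appears, and then multiplies by a standard idf of log(N/df). The `Token` type, the boost values, the section weights, and the rule that the most frequent font size is the body size are all assumptions chosen for illustration.

```python
from __future__ import annotations

import math
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass
class Token:
    term: str
    bold: bool = False
    italic: bool = False
    font_size: float = 10.0
    section: str = "body"  # e.g. "title", "abstract", "heading", "caption", "body"


# Illustrative weights only; they are not taken from the paper.
BOLD_BOOST = 1.5
ITALIC_BOOST = 1.2
SECTION_WEIGHT = {"title": 3.0, "abstract": 2.0, "heading": 1.5, "caption": 1.5, "body": 1.0}


def occurrence_weight(tok: Token, body_size: float) -> float:
    """Importance of one occurrence, inferred from formatting and structure."""
    w = SECTION_WEIGHT.get(tok.section, 1.0)
    if tok.bold:
        w *= BOLD_BOOST
    if tok.italic:
        w *= ITALIC_BOOST
    return w * tok.font_size / body_size  # text larger than body text counts more


def weighted_tf(doc: list[Token]) -> dict[str, float]:
    """Sum of occurrence weights per term instead of a raw count."""
    if not doc:
        return {}
    # Assumption: the most frequent font size in the document is the body size.
    body_size = Counter(t.font_size for t in doc).most_common(1)[0][0]
    tf: dict[str, float] = defaultdict(float)
    for tok in doc:
        tf[tok.term] += occurrence_weight(tok, body_size)
    return dict(tf)


def formatted_tfidf(docs: list[list[Token]]) -> list[dict[str, float]]:
    """Weighted tf multiplied by the standard (unsmoothed) idf, log(N / df)."""
    n = len(docs)
    df = Counter(term for doc in docs for term in {t.term for t in doc})
    return [{t: v * math.log(n / df[t]) for t, v in weighted_tf(doc).items()} for doc in docs]
```

Because this sketch uses an unsmoothed idf, a term that occurs in every document of the collection scores zero however it is formatted; a smoothed idf would avoid that if it is undesirable.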
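The abstract does not say how the per-image result lists are merged. One common option, shown here as an assumption rather than the system's method, is CombSUM fusion: min-max normalize each list's similarity scores, keep each document's best score within a list (a document may contain several matching figures), and rank documents by the sum across lists, so documents matched by several query images rise to the top. `merge_results`, `normalized_best`, and the `(doc_id, similarity)` input format are hypothetical.

```python
from __future__ import annotations

from collections import defaultdict


def normalized_best(results: list[tuple[str, float]]) -> dict[str, float]:
    """Min-max normalize one result list, keeping each document's best score."""
    scores = [s for _, s in results]
    lo, hi = min(scores), max(scores)
    best: dict[str, float] = {}
    for doc_id, s in results:
        norm = 1.0 if hi == lo else (s - lo) / (hi - lo)
        best[doc_id] = max(norm, best.get(doc_id, 0.0))
    return best


def merge_results(result_lists: list[list[tuple[str, float]]], top_k: int = 10) -> list[tuple[str, float]]:
    """CombSUM fusion of per-image CBIR result lists (higher score = more similar)."""
    fused: dict[str, float] = defaultdict(float)
    for results in result_lists:
        if not results:
            continue  # an image query that returned nothing contributes nothing
        for doc_id, score in normalized_best(results).items():
            fused[doc_id] += score
    # Highest fused score first; ties broken by document id for a stable order.
    ranked = sorted(fused.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top_k]
```

For example, merging `[("A", 0.9), ("B", 0.5), ("C", 0.1)]` with `[("B", 0.8), ("D", 0.6), ("A", 0.4)]` ranks B first, because it scores well for both query images, ahead of A, which tops only one list.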