Indexing and retrieval of degraded handwritten medical forms

By Huaigu Cao, Faisal Farooq and Venu Govindaraju

Abstract

Indexing and retrieval are especially challenging when the input is the error-prone output of handwriting recognition (HR) systems. This paper proposes an approach to indexing and retrieving degraded documents with very low recognition rates. We present a modified version of the popular vector model in information retrieval (IR). Our model incorporates the top n candidates from an HR system into the calculation of the term frequency (tf) and the inverse document frequency (idf). Standardized IR tests show that the proposed approach outperforms retrieval on ordinary HR text in terms of mean average precision (MAP) and R-precision.
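The abstract does not give the exact weighting formulas, so the following is only a minimal sketch of one plausible reading of the idea. It assumes that the recognizer returns, for each word image, a list of candidate words with confidence scores that sum to 1, that each candidate adds its confidence to the term frequency, and that a "soft" document frequency is obtained by capping each document's contribution at 1. The function names (`expected_tf`, `soft_idf`, `cosine_score`) and the toy data are hypothetical and are not taken from the paper.

```python
import math
from collections import defaultdict


def expected_tf(doc):
    """Soft term frequency for one document.

    `doc` is a list of word positions; each position is a list of
    (candidate, confidence) pairs, i.e. the top n recognizer outputs.
    Assumption: each candidate contributes its confidence to tf.
    """
    tf = defaultdict(float)
    for candidates in doc:
        for term, confidence in candidates:
            tf[term] += confidence
    return dict(tf)


def soft_idf(tfs):
    """Inverse document frequency from a soft document frequency.

    Assumption: a document counts toward df by min(1, expected tf),
    so weakly supported candidates count only fractionally.
    """
    n_docs = len(tfs)
    df = defaultdict(float)
    for tf in tfs:
        for term, weight in tf.items():
            df[term] += min(1.0, weight)
    return {term: math.log(n_docs / d) for term, d in df.items()}


def cosine_score(query_terms, tf, idf):
    """Cosine similarity between a query and a document in tf-idf space."""
    doc_vec = {t: w * idf.get(t, 0.0) for t, w in tf.items()}
    query_vec = {t: idf.get(t, 0.0) for t in set(query_terms)}
    doc_norm = math.sqrt(sum(v * v for v in doc_vec.values()))
    query_norm = math.sqrt(sum(v * v for v in query_vec.values()))
    if doc_norm == 0.0 or query_norm == 0.0:
        return 0.0
    dot = sum(v * doc_vec.get(t, 0.0) for t, v in query_vec.items())
    return dot / (doc_norm * query_norm)


# Toy collection (made up for illustration): two forms, two words each.
docs = [
    [[("fever", 0.6), ("fewer", 0.4)], [("pain", 0.9), ("gain", 0.1)]],
    [[("fewer", 0.7), ("fever", 0.3)], [("cough", 1.0)]],
]
tfs = [expected_tf(d) for d in docs]
idf = soft_idf(tfs)
scores = [cosine_score(["fever"], tf, idf) for tf in tfs]
```

In this toy collection the second form would not match the query "fever" at all if only the top-ranked candidate ("fewer") were indexed. With the candidate list folded into tf and idf it receives a nonzero score that is still lower than that of the first form, which is the kind of behavior the top n scheme is meant to provide.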