Characterization and classification of printed text in a multiscale context

Abstract

In this paper, we present a new method for characterizing printed text, based on a criterion of text visibility and legibility. The text is analyzed through its typographic form: we propose to label different kinds of text according to their visual aspect and their textural content, in particular their size and density, but also their line and letter spacing. This discrimination is achieved by means of a scale of legibility and of structural relief of forms. The texture is characterized by a statistical analysis of density, which is insensitive to the scale changes of our multiscale approach. This statistical analysis is the basis of the text labeling. This work is part of a complete scheme for physical and logical document segmentation, and is dedicated to classifying texts according to their eye-catching properties.
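
As an illustration only, the kind of density statistics referred to above might be sketched as follows. The binary-image representation (1 for ink, 0 for background), the tile sizes, and the function names `block_densities` and `multiscale_density_stats` are assumptions made for this example; they are not taken from the method itself.

```python
from statistics import mean, pvariance


def block_densities(image, block):
    """Return the ink fraction of each non-overlapping block x block tile.

    `image` is a list of equal-length rows of 0/1 values (assumed format).
    Tiles that would run past the image border are skipped.
    """
    rows, cols = len(image), len(image[0])
    densities = []
    for r in range(0, rows - block + 1, block):
        for c in range(0, cols - block + 1, block):
            ink = sum(image[r + i][c + j]
                      for i in range(block) for j in range(block))
            densities.append(ink / (block * block))
    return densities


def multiscale_density_stats(image, scales=(2, 4, 8)):
    """Map each tile size to (mean, population variance) of tile densities.

    The scale values are hypothetical; scales with no complete tile are omitted.
    """
    stats = {}
    for block in scales:
        d = block_densities(image, block)
        if d:
            stats[block] = (mean(d), pvariance(d))
    return stats
```

On a regular pattern such as alternating inked and blank rows, the mean density stays the same at every tile size, whereas a region that is dense on one side and empty on the other shows a variance that depends on the tile size. A labeling step could compare such statistics across scales, although the actual decision rule is not specified here.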