Line Segmentation Method for Documents in European Languages

Publishing Venue

IBM

Related People

Yamashita, A: AUTHOR

Abstract

This article describes an efficient method for segmenting character lines from a skewed document image. The method estimates base lines, font size, and degree of skew for each page, and all character lines are segmented on the basis of this information. This method can be used to segment characters with underlines and also characters in tables.

Country

United States

Language

English (United States)

This text was extracted from an ASCII text file.

This is the abbreviated version, containing approximately 52% of the total text.

A page image is vertically divided into several partitions like those shown in Fig. 1. In each partition, a horizontally projected histogram is calculated. Image data are processed horizontally, byte by byte, and the number of black pixels in each 8-bit byte of data is counted and summed in each partition. If all pixels in an 8-bit byte of data are black, the byte may be part of an underline or a scaled line. Therefore, the number of these all-black patterns is also counted. From the distribution of the projected histogram, rectangular data (line-components) that represent parts of character strings are detected in each partition. An example of a line-component and a local base-line is shown in Fig. 1.
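The byte-wise counting described above can be sketched as follows. This is a minimal illustration rather than the article's implementation: it assumes a packed 1-bit-per-pixel image in which a set bit means black, and the function names, the `bytes_per_partition` parameter, and the `min_count` cutoff are hypothetical.

```python
# Number of set bits for every possible byte value (lookup table).
POPCOUNT = [bin(b).count("1") for b in range(256)]


def partition_histograms(rows, bytes_per_partition):
    """Build per-partition horizontal projection histograms.

    rows: list of bytes objects, one per scan line, 1 bit per pixel
          (assumption: a set bit is a black pixel).
    Returns (black, full), where black[p][y] is the number of black pixels
    and full[p][y] is the number of all-black bytes in row y of partition p.
    """
    if not rows:
        return [], []
    width_bytes = len(rows[0])
    n_parts = (width_bytes + bytes_per_partition - 1) // bytes_per_partition
    black = [[0] * len(rows) for _ in range(n_parts)]
    full = [[0] * len(rows) for _ in range(n_parts)]
    for y, row in enumerate(rows):
        for i, byte in enumerate(row):
            p = i // bytes_per_partition
            black[p][y] += POPCOUNT[byte]
            if byte == 0xFF:  # candidate piece of an underline or scaled line
                full[p][y] += 1
    return black, full


def line_components(hist, min_count=1):
    """Return (top, bottom) row ranges where hist is at least min_count."""
    comps = []
    start = None
    for y, value in enumerate(hist):
        if value >= min_count and start is None:
            start = y
        elif value < min_count and start is not None:
            comps.append((start, y - 1))
            start = None
    if start is not None:
        comps.append((start, len(hist) - 1))
    return comps
```

Each partition is a vertical strip of the page, so a line-component here is a run of consecutive rows within one strip whose black-pixel count reaches the cutoff.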

A local base-line is then estimated in each line-component. Since the projected histogram has a maximum value around a base-line in a line-component, the histogram is examined from the bottom of the component, and if its value exceeds the threshold level for a base-line, the position is recorded as a local base-line. However, if parts of character strings are connected to an underline, the position of the underline can be detected as a local base-line. In this case, many 8-bit black patterns are counted around the underline, and therefore the examination of the projected histogram continues until the next maximum value is detected. If no appropriate candidate is found, the bottom line of the component is recorded as a local base-line.
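The bottom-up search for a local base-line might look like the sketch below. It is a simplified reading of the description, not the article's code: the "next maximum" is approximated as the next row above the threshold that is not underline-like, and the function name and both threshold parameters are hypothetical.

```python
def local_baseline(black, full, top, bottom, base_threshold, underline_threshold):
    """Estimate the local base-line row of one line-component.

    black[y]: black-pixel count of row y in this partition.
    full[y]:  count of all-black bytes in row y (underline indicator).
    top, bottom: first and last row of the component (inclusive).
    Both thresholds are illustrative assumptions.
    """
    for y in range(bottom, top - 1, -1):  # scan upward from the bottom
        if black[y] > base_threshold:
            if full[y] >= underline_threshold:
                # Many all-black bytes: probably an underline, keep looking.
                continue
            return y
    # No suitable candidate: fall back to the bottom line of the component.
    return bottom
```

For example, with an underline-like row at the bottom of a component and a dense text row a few rows above it, the function skips the underline and returns the text row.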

In order to eliminate scaled lines connecting the tops of characters, the projected histogram is examined in the same way from the top of each line-component. If a scaled line is detected, the boundary between the characters and the scaled line is estimated by using the distributions of the projected histogram and black patterns. Line-components that contain only underlines or scaled lines are eliminated.
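A rough sketch of the top-down pass is given below. The article does not spell out how the boundary is estimated, so this version makes a strong simplifying assumption: a row belongs to a line if its count of all-black bytes reaches a hypothetical `line_threshold`, and the boundary is the first row below the leading run of such rows.

```python
def trim_top_line(full, top, bottom, line_threshold):
    """Skip a leading run of line-like rows at the top of a component.

    full[y]: count of all-black bytes in row y of this partition.
    Returns the new top row, or None if every row is line-like, in which
    case the component would be eliminated.
    """
    y = top
    while y <= bottom and full[y] >= line_threshold:
        y += 1
    return y if y <= bottom else None
```

A fuller version would also consult the black-pixel histogram near the boundary, as the text indicates.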

To compensate for the skew effect, the degree of skew in a page is estimated on the basis of local base-lines....