The vast majority of text freely available on the Internet is not in a form that computers can
understand. Numerous approaches have been proposed to automatically extract information from
human-readable sources. The most successful rely on vast sets of training data. Others have succeeded
in extracting only restricted subsets of the available information. These approaches have limited use and
require domain knowledge to be coded into the application.
This thesis proposes a novel framework for Information Extraction. From large sets of documents,
the system develops statistical models of the data the user wishes to query, which generally avoid the
limitations and complexity of most Information Extraction systems. The framework uses a semi-supervised
approach to minimize human input. It also eliminates the need for external Named Entity Recognition
systems by relying on freely available databases. The final result is a query-answering system that extracts
information from large corpora with a high degree of accuracy.

Access

Unrestricted

Degree

M.S.

Degree Program

Computer Science

Department

Dept. of Computer Science

Major Professor

Abdelguerfi, Mahdi

Advisory Committee

Richard III, Golden; Tu, Shengru

Date Degree Awarded

2008-12-19

Format

PDF

Rights

The University of New Orleans and its agents retain the non-exclusive license to archive and make accessible this dissertation or thesis in whole or in part in all forms of media, now or hereafter known. The author retains all other ownership rights to the copyright of the thesis or dissertation.