Description

The dissertation presents a number of novel machine learning techniques and applies them to information extraction. The study addresses several information extraction subtasks: part-of-speech tagging, entity extraction, coreference resolution, and relation extraction. Each task is formalized as a learning problem, and appropriate learning algorithms are developed and applied to it. The dissertation studies part-of-speech tagging as a multi-class classification problem and applies the SNoW (Sparse Network of Winnows) learning system to learn a part-of-speech classifier. A comprehensive experimental evaluation of the system confirms that it is appropriate for NLP applications. The dissertation addresses the problem of entity extraction in conjunction with coreference resolution. A classification approach is presented for entity extraction, and coreference resolution is treated from the decoding perspective. The dissertation describes novel decoding algorithms that, given local coreference decisions, produce a globally coherent interpretation of document entities. The dissertation studies the problem of relation extraction as a classification problem and applies kernel methods to learn relation classifiers. Novel kernels are defined in terms of shallow parses, and efficient algorithms are given for computing them. The study evaluates the kernel approach experimentally, with positive results. The dissertation combines the constituent solutions into a single coherent information extraction system and concludes that machine learning is a viable methodology for designing natural language processing applications.
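To give a flavor of the multi-class classification view of part-of-speech tagging mentioned above, here is a minimal sketch of a Winnow-style multi-class learner in the spirit of SNoW. All details here (feature names, the toy data, the promotion factor) are invented for illustration; the actual SNoW system is considerably more elaborate than this sketch.

```python
from collections import defaultdict

class MultiClassWinnow:
    """Illustrative sketch: one sparse Winnow-like weight vector (target
    node) per class label; prediction is winner-take-all over classes.
    This is NOT the SNoW implementation, only a toy in its spirit."""

    def __init__(self, alpha=1.5):
        self.alpha = alpha  # multiplicative promotion/demotion factor
        # weights[label][feature], lazily initialized to 1.0
        self.weights = defaultdict(lambda: defaultdict(lambda: 1.0))

    def score(self, label, features):
        # Sum of active-feature weights for this class
        return sum(self.weights[label][f] for f in features)

    def predict(self, features, labels):
        return max(labels, key=lambda l: self.score(l, features))

    def update(self, features, gold, labels):
        # Mistake-driven: promote the gold class's active weights,
        # demote the wrongly predicted class's active weights
        pred = self.predict(features, labels)
        if pred != gold:
            for f in features:
                self.weights[gold][f] *= self.alpha
                self.weights[pred][f] /= self.alpha

# Toy usage: tag words using invented contextual features.
data = [
    (["word=the", "next=dog"], "DT"),
    (["word=dog", "prev=the"], "NN"),
    (["word=runs", "prev=dog"], "VBZ"),
]
labels = ["DT", "NN", "VBZ"]
clf = MultiClassWinnow()
for _ in range(10):
    for feats, tag in data:
        clf.update(feats, tag, labels)
```

After training, `clf.predict(["word=dog", "prev=the"], labels)` selects the class whose sparse weight vector gives the highest score over the active features.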

You are granted permission for the non-commercial reproduction, distribution, display, and performance of this technical report in any format, BUT this permission is only for a period of 45 (forty-five) days from the most recent time that you verified that this technical report is still available from the University of Illinois at Urbana-Champaign Computer Science Department under terms that include this permission. All other rights are reserved by the author(s).