Prerequisites

Proficiency in at least one programming language is required. Students
should have taken LIN 350 (Words in a Haystack: Methods and Tools for
Working with Corpora, Introduction to Computational Linguistics), or CS
310 and CS 315, or have obtained consent from the instructor.

Exams and Assignments

There will be one mid-term exam and one final exam. The mid-term will
cover the material from the first half of the class, and the final will
cover material from the entire course.

Assignments will be posted on the assignments page. A tentative schedule
for the entire semester is posted on the schedule page. Readings and
exercises may change up to one week in advance of their due dates.
Programming assignments must be completed in either Python or Java.

Attendance is not required. However, given that the homework and the
exams address the material covered in class, good attendance is
essential for doing well in this class.

Philosophy and Goal

The foremost goal of this course is to expose the student to advanced
techniques and applications of natural language processing (NLP),
especially those involving statistical approaches. The course will
address both theoretical and applied topics.

Some specific goals of the course are to enable students to:

understand core algorithms and data structures used in NLP

utilize corpora and annotations added to them

build statistical NLP components, such as n-gram language models, text classifiers, and part-of-speech taggers, that learn from such corpora

evaluate the merits of different machine learning methods for given NLP tasks

appreciate the relationship between linguistic representations and computational applications

This course presents an opportunity for students to gain experience with
models and algorithms used in computational linguistics that underlie
practical applications while gaining an appreciation for the theoretical
questions of the field. It will thus help prepare the student both for
jobs in industry and for doing original research in computational
linguistics.

Content Overview

Natural Language Processing (NLP) is concerned with automatically
processing human language. Applications include machine translation,
search, automatic summarization, and dialog systems. NLP has proved to
be difficult, in part because of the complexity of the structure of
human language, and in part because of the massive amount of world
knowledge that humans use in language understanding.

The field of computational linguistics has experienced significant
growth in the last ten years. In addition to the hard work of
researchers in the field in general, some of the most important factors
behind this include the use of statistical techniques, the availability
of large (sometimes annotated) corpora (including the web itself), and
the availability of relatively cheap and powerful computers. Together,
these factors have played a major part in making computational
linguistics very relevant in applied settings.

This course provides a broad introduction to NLP with a particular
emphasis on core algorithms, data structures, and machine learning for
NLP.

Techniques we will study include

using corpora

n-gram language models

hidden Markov models

distributional models

topic models

probabilistic classifiers

experimental methodology in NLP

Applications discussed in the course will include

sentiment analysis

part-of-speech tagging

spelling correction

word sense disambiguation

machine translation

With respect to content, the goal of this course is to give the student
an appreciation for the broad research topics currently being pursued in
the field of computational linguistics. By the end of the course, the
student should be able to

identify and discuss the characteristics of different NLP techniques

identify and discuss the characteristics of different machine learning techniques used in NLP

implement a naive Bayes classifier

implement the forward-backward algorithm for part-of-speech tagging

understand what constitutes a probabilistic language model and understand the difference in assumptions between different types of such models (e.g. bag-of-words, n-gram, HMM, topic model)

create features for probabilistic classifiers to model novel NLP tasks

Course Requirements

Assignments (60%): A series of six assessed, equally weighted assignments will be given out during the semester. Most of these assignments will have a programming component, which must be completed using the Python programming language.

Mid-term Exam (15%): There will be a mid-term exam over the material covered during the first half of the semester.

Final Exam (25%): There will be a final exam covering all course material.

Attendance is not required, and it is not used as part of determining
the grade.
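
As a rough illustration of how the weights above combine, the following Python sketch computes a weighted course score. It assumes that every component is scored out of 100 and that no rounding or curving is applied; neither assumption is stated in this syllabus, and the function name `course_grade` is made up for the example.

```python
# Weights taken from the Course Requirements above, in percent.
WEIGHTS = {"assignments": 60, "midterm": 15, "final": 25}


def course_grade(assignment_scores, midterm, final):
    """Combine component scores (each assumed to be out of 100).

    The assignments are equally weighted, so their contribution is
    the plain average of the individual assignment scores.
    """
    if not assignment_scores:
        raise ValueError("at least one assignment score is required")
    assignments = sum(assignment_scores) / len(assignment_scores)
    total = (
        WEIGHTS["assignments"] * assignments
        + WEIGHTS["midterm"] * midterm
        + WEIGHTS["final"] * final
    )
    return total / 100


# Six assignments at 90, a mid-term of 80, and a final of 70:
# (60*90 + 15*80 + 25*70) / 100 = 83.5
print(course_grade([90] * 6, midterm=80, final=70))
```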

Extension Policy

Homework must be turned in on the due date in order to receive full credit. Penalty-free extensions will be considered on a case-by-case basis and only if the
student asks for the extension before the deadline. In most cases they
will not be granted.

Points will be deducted for lateness. By default, 10 points (out of 100)
will be deducted for any late submission, plus an additional 5 points
for every 24-hour period beyond the first two that the assignment is
late. For example, an assignment due at 11am on Tuesday will have 10
points deducted if it is turned in late but before 11am on Thursday. It
will have 15 points deducted if it is turned in after that but by 11am
on Friday, and so on.

Late submissions will not be accepted if they are more than one week past the deadline. No points will be received in this case.
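
The lateness rules above can be sketched in Python as follows. This is only an illustration, not the official grading script. The function name `late_penalty` is made up, a submission arriving exactly on a 24-hour boundary is assumed to fall in the earlier period (matching "by 11am Friday" in the example), and a submission more than one week late is represented as a deduction of all 100 points.

```python
import math
from datetime import datetime, timedelta


def late_penalty(due: datetime, submitted: datetime) -> int:
    """Return the points deducted (out of 100) for a submission."""
    late = submitted - due
    if late <= timedelta(0):
        return 0  # on time, no deduction
    if late > timedelta(weeks=1):
        return 100  # more than one week late: no points received
    # Number of 24-hour periods started since the deadline.
    periods = math.ceil(late / timedelta(hours=24))
    # Flat 10 points, plus 5 for each period beyond the first two.
    return 10 + 5 * max(0, periods - 2)


# The example from the text: due at 11am on a Tuesday.
due = datetime(2024, 1, 2, 11, 0)                     # a Tuesday
print(late_penalty(due, datetime(2024, 1, 4, 10, 59)))  # Thursday, before 11am: 10
print(late_penalty(due, datetime(2024, 1, 5, 11, 0)))   # Friday, 11am: 15
```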

The greater the advance notice of a need for an extension, the greater the likelihood of leniency.

Academic Dishonesty Policy

You are encouraged to discuss assignments with classmates, but all
written work must be your own. If in doubt, ask the instructor.

Students who violate University rules on academic dishonesty are subject
to disciplinary penalties, including the possibility of failure in the
course and/or dismissal from the University. Since such dishonesty harms
the individual, all students, and the integrity of the University,
policies on academic dishonesty will be strictly enforced. For further
information please visit the Student Judicial Services Web site: http://deanofstudents.utexas.edu/sjs.

Notice About Students with Disabilities

The University of Texas at Austin provides appropriate accommodations
for qualified students with disabilities. To determine if you qualify,
please contact the Dean of Students at 512-471-6529 or UT Services for
Students with Disabilities. If they certify your needs, we will work
with you to make appropriate arrangements.

Notice About Missed Work Due to Religious Holy Days

A student who misses an examination, work assignment, or other project
due to the observance of a religious holy day will be given an
opportunity to complete the work missed within a reasonable time after
the absence, provided that he or she has properly notified the
instructor. It is the policy of the University of Texas at Austin that
the student must notify the instructor at least fourteen days prior to
the classes scheduled on dates he or she will be absent to observe a
religious holy day. For religious holy days that fall within the first
two weeks of the semester, the notice should be given on the first day
of the semester. The student will not be penalized for these excused
absences, but the instructor may appropriately respond if the student
fails to complete satisfactorily the missed assignment or examination
within a reasonable time after the excused absence.