
The Center for Language and Speech Processing at Johns Hopkins University is offering a unique summer internship opportunity, which we would like you to bring to the attention of your best students in the current junior class. Only two weeks remain for students to apply for these internships. The internship is unique in that the selected students will participate in cutting-edge research as full members alongside leading scientists from industry, academia, and the government. What makes the internship exciting is the exposure of undergraduate students to emerging fields of language engineering, such as automatic speech recognition (ASR), natural language processing (NLP), machine translation (MT), and speech synthesis (TTS).
We are specifically looking to attract new talent into the field and, as such, do not require the students to have prior knowledge of language engineering technology. Please take a few moments to nominate suitable bright students who may be interested in this internship. On-line applications for the program can be found at http://www.clsp.jhu.edu/ along with additional information regarding plans for the 2002 Workshop and information on past workshops. The application deadline is February 15, 2002.
If you have questions, please contact us by phone (410-516-4237) or e-mail (sec@clsp.jhu.edu), or visit our website at http://www.clsp.jhu.edu

Sincerely,
Frederick Jelinek
J.S. Smith Professor and Director

Project Descriptions for this Summer
1. Weakly Supervised Learning For Wide-Coverage Parsing
Before a computer can try to understand or translate a human sentence, it must identify the phrases and diagram the grammatical relationships among them. This is called parsing.
State-of-the-art parsers correctly guess over 90% of the phrases and relationships, but make some errors on nearly half the sentences analyzed. Many of these errors distort any subsequent automatic interpretation of the sentence.
Much of the problem is that these parsers, which are statistical, are not "trained" on enough example parses to know about many of the millions of potentially related word pairs. Human labor can produce more examples, but still too few by orders of magnitude.
In this project, we seek to achieve a quantum advance by automatically generating large volumes of novel training examples. We plan to bootstrap from up to 350 million words of raw newswire stories, using existing parsers to generate the new parses together with confidence measures. We will use a method called co-training, in which several reasonably good parsing algorithms collaborate to automatically identify one another’s weaknesses (errors) and to correct them by supplying new example parses to one another. This accuracy-boosting technique has widespread application in other areas of machine learning, natural language processing and artificial intelligence.
Numerous challenges must be faced: How do we parse 350 million words of text in less than a year (we have 6 weeks)? How do we use partly incompatible parsers to train one another? Which machine learning techniques scale up best? What kinds of grammars, probability models, and confidence measures work best? The project will involve a significant amount of programming, but the rewards should be high.
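As a rough illustration of the co-training idea described above, the following Python sketch lets two deliberately simple classifiers teach one another. Everything in it is an assumption made for illustration: the CentroidClassifier class, the co_train function, the two-feature examples and the "NP"/"VP" labels are hypothetical toy stand-ins, not the parsers, data or confidence measures the project will actually use.

```python
from statistics import mean


class CentroidClassifier:
    """Toy stand-in for one parser: a nearest-centroid classifier on one feature."""

    def __init__(self, view):
        self.view = view        # index of the single feature this model may look at
        self.centroids = {}     # label -> mean feature value seen in training

    def fit(self, examples):
        by_label = {}
        for feats, label in examples:
            by_label.setdefault(label, []).append(feats[self.view])
        self.centroids = {label: mean(vals) for label, vals in by_label.items()}

    def predict(self, feats):
        """Return (label, confidence); confidence is the gap between the
        distances to the two nearest centroids (needs at least two labels)."""
        dists = sorted((abs(feats[self.view] - c), label)
                       for label, c in self.centroids.items())
        (nearest, label), (runner_up, _) = dists[0], dists[1]
        return label, runner_up - nearest


def co_train(model_a, model_b, labeled, unlabeled, rounds=3, per_round=2):
    """Each round, every model labels the examples it is most confident
    about and hands them to the OTHER model as new training data."""
    train_a, train_b = list(labeled), list(labeled)
    pool = list(unlabeled)
    for _ in range(rounds):
        if not pool:
            break
        model_a.fit(train_a)
        model_b.fit(train_b)
        for teacher, student_train in ((model_a, train_b), (model_b, train_a)):
            ranked = sorted(pool, key=lambda x: teacher.predict(x)[1], reverse=True)
            for feats in ranked[:per_round]:
                student_train.append((feats, teacher.predict(feats)[0]))
                pool.remove(feats)
    # Final refit so both models reflect everything they were handed.
    model_a.fit(train_a)
    model_b.fit(train_b)
    return train_a, train_b


# Two hand-labeled seed examples and a small unlabeled pool (toy data).
labeled = [((0.0, 0.1), "NP"), ((1.0, 0.9), "VP")]
unlabeled = [(0.1, 0.2), (0.9, 0.8), (0.2, 0.0), (0.8, 1.0)]
model_a, model_b = CentroidClassifier(view=0), CentroidClassifier(view=1)
train_a, train_b = co_train(model_a, model_b, labeled, unlabeled)
print(model_a.predict((0.05, 0.0))[0], len(train_a), len(train_b))
```

In this toy setting each model sees a different feature of the same example, so an example that one model labels confidently can still be informative for the other. With real parsers, the output formats would also have to be reconciled before such an exchange is possible.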

2. Novel Speech Recognition Models for Arabic
Previous research on large-vocabulary automatic speech recognition (ASR) has mainly concentrated on European and Asian languages. Other language groups have been explored to a lesser extent, for instance Semitic languages like Hebrew and Arabic. These languages possess certain characteristics that present problems for standard ASR systems. For example, their written representation does not contain most of the vowels present in the spoken form, which makes it difficult to utilize textual training data.
Furthermore, they have a complex morphological structure, which is characterized not only by a high degree of affixation but also by the interleaving of vowel and consonant patterns (so-called "non-concatenative morphology"). This leads to a large number of possible word forms, which complicates the robust estimation of statistical language models.
In this workshop group we aim to develop new modeling approaches to address these and related problems, and to apply them to the task of conversational Arabic speech recognition. We will develop and evaluate a multi-linear language model, which decomposes the task of predicting a given word form into predicting more basic morphological patterns and roots. Such a language model can be combined with a similarly decomposed acoustic model, which necessitates new decoding techniques based on modeling statistical dependencies between loosely coupled information streams. Since one pervading issue in language processing is the tradeoff between language-specific and language-independent methods, we will also pursue an alternative control approach that relies on the capabilities of existing, language-independent recognition technology.
Under this approach no morphological analysis will be performed and all word forms will be treated as basic vocabulary units. Furthermore, acoustic model topologies will be used which specify short vowels as optional rather than obligatory elements, in order to facilitate the use of text documents as language model training data. Finally, we will investigate the possibility of using large, generally available text and audio sources to improve the accuracy of conversational Arabic speech recognition.
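To make the idea of decomposing word prediction into roots and patterns more concrete, here is a minimal Python sketch. It is not the multi-linear model the group plans to build: it assumes, purely for illustration, that each word is already split into a (root, pattern) pair, that the two streams are independent, and that each stream follows an add-one-smoothed bigram model. The functions train_bigram and factored_word_prob and the tiny romanized corpus are hypothetical.

```python
from collections import Counter


def train_bigram(sequence):
    """Return an add-one-smoothed bigram probability function for one stream."""
    vocab = set(sequence)
    pair_counts = Counter(zip(sequence, sequence[1:]))
    context_counts = Counter(sequence[:-1])

    def prob(prev, cur):
        return (pair_counts[(prev, cur)] + 1) / (context_counts[prev] + len(vocab))

    return prob


def factored_word_prob(root_prob, pattern_prob, prev_word, word):
    """P(word | prev_word) approximated as P(root | prev root) * P(pattern | prev pattern).
    The independence of the two streams is an assumption of this sketch."""
    prev_root, prev_pattern = prev_word
    root, pattern = word
    return root_prob(prev_root, root) * pattern_prob(prev_pattern, pattern)


# Toy corpus: every word is written as a (root, pattern) pair.
corpus = [
    ("ktb", "CaCaCa"), ("drs", "CaCaCa"), ("ktb", "CuCiCa"),
    ("drs", "CaCaCa"), ("ktb", "CaCaCa"),
]
root_prob = train_bigram([root for root, _ in corpus])
pattern_prob = train_bigram([pattern for _, pattern in corpus])

seen = factored_word_prob(root_prob, pattern_prob, ("ktb", "CaCaCa"), ("drs", "CaCaCa"))
unseen = factored_word_prob(root_prob, pattern_prob, ("ktb", "CaCaCa"), ("drs", "CuCiCa"))
print(seen, unseen)
```

The word form ("drs", "CuCiCa") never occurs in the toy corpus, yet it receives a non-zero probability because its root and its pattern were each observed separately. This is the kind of sharing across word forms that a decomposed model is meant to provide when the number of possible word forms is large.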

3. Generation from Deep Syntactic Representation in Machine Translation
Let’s imagine a system for translating a sentence from a foreign language (say Arabic) into your native language (say English). Such a system works as follows. It analyzes the foreign-language sentence to obtain a structural representation that captures its essence, i.e. "who did what to whom, where." It then translates (or transfers) the actors, actions, etc. into words in your language while "copying over" the deeper relationship between them. Finally it synthesizes a syntactically well-formed sentence that conveys the essence of the original sentence. Each step in this process is a hard technical problem, to which the best-known solutions are either not adequate for applications, or good enough only in narrow application domains, failing when applied to other domains. This summer, we will concentrate on improving one of these three steps, namely the synthesis (or generation).
We will assume that the target language for generation is English and that the source language of the MT system is of a completely different type (Arabic and Czech). We will further assume that the transfer produces a fairly deeply analyzed sentence structure. The incorporation of the deep analysis makes the whole approach novel: so far no large-coverage translation system has tried to operate with such a structure, and the application to very diverse languages makes it an even more exciting enterprise!
Within the generation process, we will focus on the structural (syntactic) part, assuming that a morphological generation module exists to complete the generation process. That module will be added to the suite so that we can evaluate the final result, namely the goodness of the plain English text coming out of the system. Statistical methods will be used throughout. A significant part of the workshop preparation will be devoted to assembling and running a simplified MT system from Arabic/Czech to English (up to the syntactic structure level), in order to have realistic training data for the workshop project. As a consequence, we will not only understand and solve the generation problem, but also learn the mechanics of an end-to-end MT system, preparing team members to work on other parts of the MT system in the future.

4. SuperSID: Exploiting High-level Information for High-performance Speaker Recognition
Identifying individuals based on their speech is an important component technology in many applications, be it automatically tagging speakers in the transcription of a board-room meeting (to track who said what), user verification for computer security, or picking out a known terrorist or narcotics trader among millions of ongoing satellite telephone calls.
How do we recognize the voices of the people we know? Generally, we use multiple levels of speaker information conveyed in the speech signal. At the lowest level, we recognize a person based on the sound of his/her voice (e.g., low/high pitch, bass, nasality, etc.). But we also use other types of information in the speech signal to recognize a speaker, such as a unique laugh, particular phrase usage, or speed of speech, among other things.
Most current state-of-the-art automatic speaker recognition systems, however, use only the low-level sound information (specifically, very short-term features based on purely acoustic signals computed on 10-20 ms intervals of speech) and ignore higher-level information. While these systems have shown reasonably good performance, there is much more information in speech that could be used to improve accuracy and robustness, potentially by a large margin.
In this workshop we will look at how to augment the traditional signal-processing based speaker recognition systems with such higher-level knowledge sources. We will be exploring ways to define speaker-distinctive markers and create new classifiers that make use of these multi-layered knowledge sources. The team will be working on a corpus of recorded telephone conversations (Switchboard I and II corpora) that have been transcribed both by humans and by machine and have been augmented with a rich database of phonetic and prosodic features. A well-defined performance evaluation procedure will be used to measure the progress and utility of newly developed techniques.

http://www.clsp.jhu.edu/ws2002/application/

No limitation is placed on the undergraduate major. Only relevant skills, employment experience, past academic record and the strength of letters of recommendation will be considered. Students of Biomedical Engineering, Computer Science, Cognitive Science, Electrical Engineering, Linguistics, Mathematics, Physics, Psychology, etc. may apply. Women and minorities are encouraged to apply.

An opportunity to explore an exciting new area of research.
A two-week tutorial on speech and language technology.
Mentoring by an experienced researcher.
Use of a computer workstation throughout the workshop.
A $4500 stipend and $2128 towards per diem expenses.
Private furnished accommodation for the duration of the workshop.
Travel expenses to and from the workshop venue.
Participation in project planning activities.
The eight-week workshop provides a vigorously stimulating and enriching intellectual environment and we hope it will encourage students to eventually pursue graduate study in the field of human language technologies.