What is Computational Linguistics?

Computational Linguistics, or Natural Language Processing (NLP), is not a new field. As early as 1946, attempts were made to use computers to process natural language. These attempts concentrated mainly on Machine Translation and, due to the political situation at the time, almost exclusively on translation from Russian into English. Considerable resources were dedicated to this task, both in the U.S.A. and in Great Britain, during the fifties and sixties. Other countries, mainly in continental Europe, joined the enterprise, and the first systems ("SYSTRAN") became operational at the end of this period. However, the limited performance of these systems made it clear that the underlying theoretical difficulties of the task had been grossly underestimated, and in the following years and decades much effort was spent on basic research in formal linguistics. Today, a number of Machine Translation systems are available commercially, although there is still no system that produces fully automatic high-quality translations (and there probably will not be for some time). Human intervention in the form of pre- and/or post-editing is still required in all cases.

Another application that has become commercially viable in recent years is the analysis and synthesis of spoken language, i.e., speech understanding and speech generation. Potential applications range from aids for people with disabilities (e.g., text-to-speech systems for the blind) to telephony-based information systems (e.g., inquiry systems for train or plane connections, telebanking) and office dictation systems (as offered by several vendors). Several text-to-speech systems are commercially available and in daily use in many places. Speech understanding is much more difficult than speech generation, yet some speech understanding systems are also entering the marketplace.

An application that will become at least as important as those already mentioned is the creation, administration, and presentation of texts by computer. Even reliable access to written texts is a major bottleneck in science and commerce. The amount of textual information is enormous (and growing incessantly), and the traditional, word-based information retrieval methods are becoming increasingly inadequate, as either precision or recall is always low (i.e., you either get a large number of irrelevant documents together with the relevant ones, or else you fail to get a large number of the relevant ones in the collection). Linguistically based retrieval methods, which take into account the meaning of sentences as encoded in the syntactic structure of natural language, promise to be a way out of this quandary.

However, the creation of texts is also becoming a problem. Manuals for complex technical systems (airplanes, computers, etc.) are constantly out of date because the systems themselves are upgraded ever faster. Writing manuals by hand is thus getting ever more expensive and unreliable, and if manuals have to be maintained in different languages, manual production becomes increasingly unmanageable. If different versions of the manuals have to be written (for service users, for technicians, for auditors, etc.), things get out of hand altogether. The automatic creation of manuals from a common knowledge base, in different languages and for different types of readers, is a possible solution to this cluster of problems. The creation of natural language texts has always been a bit of a "poor cousin" in the field of Computational Linguistics. The situation described is about to change this in a fundamental manner.
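The precision/recall trade-off of word-based retrieval mentioned above can be made concrete with a small calculation. The following is only a minimal sketch: the document identifiers, the two example queries, and the function name `precision_recall` are hypothetical and chosen purely for illustration.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    precision = relevant documents retrieved / all documents retrieved
    recall    = relevant documents retrieved / all relevant documents
    """
    retrieved = set(retrieved)
    relevant = set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical collection: four documents are actually relevant.
relevant = {"d1", "d2", "d3", "d4"}

# A broad word-based query finds every relevant document,
# but buries them among irrelevant ones (high recall, low precision).
broad_query = ["d1", "d2", "d3", "d4", "d5", "d6", "d7", "d8", "d9", "d10"]

# A narrow query returns only relevant material,
# but misses most of it (high precision, low recall).
narrow_query = ["d1"]

broad = precision_recall(broad_query, relevant)
narrow = precision_recall(narrow_query, relevant)

print("broad :", broad)   # (0.4, 1.0)
print("narrow:", narrow)  # (1.0, 0.25)
```

The broad query corresponds to the first case in the text (many irrelevant documents alongside the relevant ones), the narrow query to the second (many relevant documents are missed).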

Another topic that might come to the forefront of research in Computational Linguistics is the presentation of textual information. Traditionally, text generation systems have created standard, i.e., linear, text. If the amount of text is large, and/or if different types of readers must be addressed, hypertext is a better medium of presentation. The automatic creation of hypertext from an underlying knowledge base calls for an extension of this traditional approach.

What are the main application areas of Computational Linguistics?

Computational Linguistics tries to solve problems in the following areas:

How is the Computational Linguistics job market?

Many people with a degree in Computational Linguistics work in research groups at universities, in governmental research labs, or in large enterprises. In Sweden, for example, Computational Linguists work in research groups at the various universities that offer courses in linguistics (such as Göteborg or Uppsala), at research labs like SICS (The Swedish Institute of Computer Science), or for companies like Telia or IBM.

In addition, there are development groups working on commercial products. These range from software houses like Microsoft, which employs Computational Linguists to work on grammar checkers and automatic summarization, to the Munich-based SailLabs, which develops a machine translation system, to Caterpillar, which employs Computational Linguists for the translation of technical manuals.

In recent years, the demand for Computational Linguists has risen with the increase in language technology products on the Internet. Job offers come from developers who are improving Internet search engines with linguistic means or making user interfaces easier to use with lingubots. Others are integrating speech recognition with language processing techniques.

In general, the job market for Computational Linguists is currently good.

Where and how can I study Computational Linguistics?

Numerous European universities offer degree programs and/or courses in Computational Linguistics (CL). The ACL distributes a "Directory of Graduate Programs in CL". CL is mostly offered as a minor, supplementing a major either in Computer Science or in Linguistics or Language Science.

What are the main professional organizations in Computational Linguistics?