Artificial Intelligence and Expert Systems

Sunday, January 20, 2008

There was a hope that databases could be queried in natural language instead of through complicated data retrieval commands. This was a major problem in the early 1970s, since the staff in charge of data retrieval could not keep up with users' demand for data.

The LUNAR system, built by William Woods in 1973 for the NASA Manned Spacecraft Center, was the first such interface. It was able to correctly answer 78% of the questions asked of it, such as: “What is the average modal plagioclase concentration for lunar samples that contain rubidium?”

Other examples of data retrieval systems would include:

CHAT system

developed by Fernando Pereira in 1983

similar level of complexity to LUNAR system

worked on geographical databases

was restricted

question wording was very important

TEAM system

could handle a wider set of problems than CHAT

was still restricted and unable to handle all types of input
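To make the restriction concrete, here is a minimal sketch of a template-based question interface over a tiny geographical table. It is not how CHAT or TEAM were actually implemented; the table `COUNTRIES`, the list `TEMPLATES`, and the function `answer` are hypothetical names introduced only for illustration.

```python
import re

# Hypothetical toy "database" of geographical facts.
COUNTRIES = {
    "france": {"capital": "Paris", "continent": "Europe"},
    "japan": {"capital": "Tokyo", "continent": "Asia"},
    "kenya": {"capital": "Nairobi", "continent": "Africa"},
}

# Each template pairs one fixed question wording with the field it retrieves.
TEMPLATES = [
    (re.compile(r"^what is the capital of (\w+)\?$"), "capital"),
    (re.compile(r"^which continent is (\w+) in\?$"), "continent"),
]


def answer(question):
    """Return the answer string, or None if no template or record matches."""
    text = question.strip().lower()
    for pattern, field in TEMPLATES:
        match = pattern.match(text)
        if match:
            record = COUNTRIES.get(match.group(1))
            return record[field] if record else None
    return None


print(answer("What is the capital of France?"))
print(answer("Tell me France's capital"))
```

The first question fits a template and is answered; the second asks for the same fact in different words and gets no answer, which is the sense in which question wording mattered for such restricted systems.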

Text Interpretation

In the early 1980s, most online information was stored in databases and spreadsheets

Information retrieval (IR) systems do not attempt to understand all of the text in all of the documents, but they do analyze those portions of each document that contain relevant information

relevance is determined by pre-defined domain guidelines which must specify, as accurately as possible, exactly what types of information the system is expected to find

a query is a good example of such a pre-defined guideline

documents that contain relevant information are retrieved while others are ignored
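The retrieval step described above can be sketched as a simple filter. This is an illustrative toy, assuming that relevance is just a count of pre-defined domain terms; real IR systems use far richer scoring. The functions `tokenize`, `relevance`, and `retrieve`, and the sample documents, are hypothetical.

```python
import re


def tokenize(text):
    """Lowercase the text and return its alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def relevance(document, domain_terms):
    """Count how many tokens of the document are pre-defined domain terms."""
    return sum(1 for token in tokenize(document) if token in domain_terms)


def retrieve(documents, domain_terms, threshold=1):
    """Return documents scoring at least `threshold`, most relevant first."""
    scored = [(relevance(doc, domain_terms), doc) for doc in documents]
    relevant = [pair for pair in scored if pair[0] >= threshold]
    relevant.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in relevant]


# Hypothetical sample documents and domain terms for a banking/oil topic area.
documents = [
    "The weather was sunny.",
    "The bank reported profits.",
    "Oil prices rose as the bank cut rates.",
]
domain_terms = {"oil", "bank", "rates"}

for doc in retrieve(documents, domain_terms):
    print(doc)
```

Only the two documents that mention domain terms are returned, and the one with more matching terms comes first; the weather report is ignored.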

Example: Commercial System (HIGHLIGHT):

It helps users find relevant information in large volumes of text and presents it in a structured fashion.

It can extract information from newswire reports for a specific topic area - such as global banking, or the oil industry - as well as current and historical financial and other data.

Although its accuracy will never match the decision-making skills of a trained human expert, HIGHLIGHT can process large amounts of text very quickly, allowing users to discover more information than even the most highly trained professional would have time to look for.
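As a rough illustration of what "extracting information and presenting it in a structured fashion" can mean, the sketch below pulls a few fields out of newswire-style sentences with a single pattern. It says nothing about how HIGHLIGHT itself works; the pattern, the function `extract`, and the company names in the sample report are all invented for the example.

```python
import re

# Hypothetical pattern for sentences of the form
# "<Company Name> reported profits|losses of $<amount> million|billion".
PATTERN = re.compile(
    r"(?P<company>[A-Z][A-Za-z]*(?: [A-Z][A-Za-z]*)*) reported "
    r"(?P<metric>profits|losses) of "
    r"\$(?P<amount>\d+(?:\.\d+)?) (?P<unit>million|billion)"
)


def extract(report):
    """Return one dictionary of named fields per matching statement."""
    return [match.groupdict() for match in PATTERN.finditer(report)]


# Invented newswire text; the companies and figures are not real.
report = (
    "Acme Bank reported profits of $12 million. "
    "Analysts were surprised. "
    "Northwind Oil reported losses of $3.5 billion."
)

for row in extract(report):
    print(row)
```

Sentences that do not fit the pattern, such as the one about analysts, are simply skipped, which mirrors the idea that only the portions of a text that carry relevant information are analyzed.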

Roughly, we can break the process of understanding natural language down into the following five components:

Morphological Analysis: Individual words are analyzed into their components, and non-word tokens, such as punctuation, are separated from the words. Phonetics is considered for spoken language at this phase.

Syntactic Analysis: Linear sequences of words are transformed into structures that show how the words relate to each other. Some word sequences may be rejected if they violate the language’s rules for how words may be combined.

Semantic Analysis: A mapping is made between the syntactic structures and objects in the task domain. Structures for which no such mapping is possible may be rejected.

Discourse Integration: The meaning of an individual sentence may depend on the sentences that precede it. In this phase, the meaning of a sentence is analyzed in light of the information that precedes it. For example, in “John wanted it.”, “it” depends on the prior discourse context. Similarly, “He always had.” cannot be interpreted without information from previous sentences.

Pragmatic Analysis: The structure representing what was said is reinterpreted to determine what was actually meant. For example, the sentence “Do you know the route?” should be interpreted as a request for directions rather than as a yes/no question.
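The first two components can be sketched in a few lines. This is a deliberately tiny illustration, assuming a three-suffix list, a five-word lexicon, and a grammar that accepts only subject-verb-object sentences; `SUFFIXES`, `LEXICON`, and both function names are hypothetical, and the later semantic, discourse, and pragmatic phases are not covered.

```python
import re

# Hypothetical suffix list and lexicon, far smaller than any real system's.
SUFFIXES = ("ing", "ed", "s")
LEXICON = {"john": "NOUN", "mary": "NOUN", "it": "PRON",
           "want": "VERB", "see": "VERB"}


def morphological_analysis(sentence):
    """Separate non-word tokens and split words into stem and suffix."""
    tokens = re.findall(r"[A-Za-z]+|[^\sA-Za-z]", sentence)
    analyzed = []
    for token in tokens:
        if not token.isalpha():
            analyzed.append({"token": token, "kind": "non-word"})
            continue
        word = token.lower()
        stem, suffix = word, ""
        for candidate in SUFFIXES:
            # Only strip a suffix if at least three letters of stem remain.
            if word.endswith(candidate) and len(word) - len(candidate) >= 3:
                stem, suffix = word[: -len(candidate)], candidate
                break
        analyzed.append(
            {"token": token, "kind": "word", "stem": stem, "suffix": suffix}
        )
    return analyzed


def syntactic_analysis(analyzed):
    """Build a subject-verb-object structure, or return None to reject."""
    stems = [item["stem"] for item in analyzed if item["kind"] == "word"]
    tags = [LEXICON.get(stem) for stem in stems]
    if (len(tags) == 3 and tags[0] == "NOUN" and tags[1] == "VERB"
            and tags[2] in ("NOUN", "PRON")):
        return {"subject": stems[0], "verb": stems[1], "object": stems[2]}
    return None


print(syntactic_analysis(morphological_analysis("John wanted it.")))
print(syntactic_analysis(morphological_analysis("Wanted John it.")))
```

The first sentence yields a structure relating subject, verb, and object; the second uses the same words in an order the toy grammar does not allow and is rejected, as described under Syntactic Analysis.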

Practical Applications

We are going to look at some practical applications of natural language processing:

Machine Translation

Voice Interface for Humanoids

Database Access

Text Interpretation

information retrieval

text categorization

extracting data from text

Machine Translation

Correct translation requires an in-depth understanding of both natural languages, since the structure of expressions varies from one natural language to another

human analysis of a message relies to some extent on information that is not present in the words that make up the message

“The pen is in the box”

[i.e. the writing instrument is in the container]

“The box is in the pen”

[i.e. the container is in the playpen or the pigpen]
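One way to picture the missing information in the pen/box example is as world knowledge about relative sizes. The sketch below is only an illustration under that assumption: the sense inventory `SENSES`, its size values (arbitrary units), and the function `interpret_in` are hypothetical, and real disambiguation draws on much more than size.

```python
# Hypothetical sense inventory: each word maps to (sense, rough size) pairs.
# The sizes are arbitrary illustrative units, not measurements.
SENSES = {
    "pen": [("writing instrument", 1), ("enclosure", 100)],
    "box": [("container", 10)],
}


def interpret_in(inner, outer):
    """Return the sense pairs for which `inner` physically fits in `outer`."""
    return [
        (inner_sense, outer_sense)
        for inner_sense, inner_size in SENSES[inner]
        for outer_sense, outer_size in SENSES[outer]
        if inner_size < outer_size
    ]


print(interpret_in("pen", "box"))  # "The pen is in the box"
print(interpret_in("box", "pen"))  # "The box is in the pen"
```

With the size constraint applied, the first sentence keeps only the writing-instrument reading of "pen" and the second keeps only the enclosure reading, even though nothing in the words themselves marks the difference.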

Examples of poor machine translations would include:

"the spirit is strong, but the body is weak" was translated literally as "the vodka is strong but the meat is rotten”

"Out of sight, out of mind” was translated as "Invisible, insane”

"hydraulic ram” was translated as "male water sheep”

These examples do not imply that machine translation is a waste of time

some mistakes are inevitable regardless of the quality and sophistication of the system

one has to realize that human translators also make mistakes

There is a substantial start-up cost to any machine translation effort. To achieve broad coverage, translation systems should have lexicons of 20,000 to 100,000 words and grammars of 100 to 10,000 rules (depending on the choice of formalism).