"Parsey McParseface is built on powerful machine learning algorithms that learn to analyze the linguistic structure of language and that can explain the functional role of each word in a given sentence," said Slav Petrov, Google senior staff research scientist. The project arose out of Google's pondering of how computers can read and understand human language in order to process it in intelligent ways.

Accessible on GitHub, SyntaxNet serves as a framework for a syntactic parser, a key first component in many NLU systems, Petrov said. "Given a sentence as input, [the parser] tags each word with a part-of-speech tag that describes the word's syntactic function, and it determines the syntactic relationships between words in the sentence, represented in the dependency parse tree. These syntactic relationships are directly related to the underlying meaning of the sentence in question."

Parsey McParseface can analyze a sentence and understand its complexity. On a standard benchmark consisting of English newswire sentences, Parsey McParseface recovers individual dependencies between words with better than 94 percent accuracy.

Parsing is difficult for computers due to the ambiguity of human languages. A moderate-length sentence of 20 to 30 words could have as many as tens of thousands of possible syntactic structures. "A natural language parser must somehow search through all of these alternatives and find the most plausible structure given the context," said Petrov. SyntaxNet uses neural networks to tackle the ambiguity problem.

This story, "Google’s machine learning gains natural language understanding" was originally published by
InfoWorld.