A: This is a prototype Contextual Thesaurus developed by Microsoft Research.
Actually, it's quite a bit more than that: it's an English-to-English machine
translation system that employs the same architecture that the Microsoft
Translator uses when translating between different languages. To the best of
our knowledge, this is the first large-scale paraphrasing system anywhere.

Q: What do you mean by "Contextual Thesaurus"?

A: An ordinary thesaurus provides synonyms and near synonyms, usually only for
single words, often without offering much information about when to use these
terms. Try looking up the word "break" in a conventional thesaurus. Then look up
"businesses are asking for tax breaks" in the Contextual Thesaurus. You will
see the difference.

Q: How do I use it?

A: Type a
short phrase into the input box. Then click the Submit button (the arrowhead in
an orange circle) or hit the Enter key on your keyboard. The system
accepts only one sentence at a time. Some suggestions:

·Limit your input to 4-8 words. The system is capable
of generating paraphrases much longer than that, but results will usually be
more varied and interesting if you type in fewer words rather than more. Even
two or three words will sometimes be enough to retrieve a useful set of
equivalents.

·Formal language works better than colloquial language.
Because our training data consists mostly of documents in the business,
government, or technology domains, the system performs better on input related
to these domains than it does on song lyrics or first-person blog posts.

·Click one of the paraphrases to highlight the path
through the graph taken by that sentence.

·If you click on a word in the graph, the top-ranked
paraphrase containing that term will be highlighted.

·If you click the check mark beside a paraphrase, the
text will be moved into the input box to be paraphrased in turn. This way you
can round-trip your paraphrases to see more alternatives.
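
The input suggestions above (one sentence at a time, ideally 4-8 words) can be
checked before submitting. The following is only a minimal sketch in Python: the
function name check_input and the sentence-splitting rule are assumptions made
for illustration and are not part of the Contextual Thesaurus itself.

```python
import re
from typing import List

MIN_WORDS = 4  # lower end of the suggested input length
MAX_WORDS = 8  # upper end of the suggested input length


def check_input(text: str) -> List[str]:
    """Return advisory notes for a candidate input (hypothetical helper)."""
    stripped = text.strip()
    if not stripped:
        return ["input is empty"]

    notes = []
    # Rough heuristic (an assumption): a sentence ends at '.', '!' or '?'
    # followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", stripped) if s]
    if len(sentences) > 1:
        notes.append("more than one sentence: submit one at a time")

    n_words = len(stripped.split())
    if n_words > MAX_WORDS:
        notes.append("long input: results may be less varied")
    elif n_words < MIN_WORDS:
        notes.append("short input: may still work, but 4-8 words is suggested")
    return notes


if __name__ == "__main__":
    print(check_input("businesses are asking for tax breaks"))
    print(check_input("Taxes are high. Businesses want breaks."))
```

An empty list means the input already follows the suggestions. The notes are
advisory only, since even two or three words can sometimes retrieve useful
equivalents.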

Q: When I type [favorite phrase] it doesn't show me [favorite
paraphrase]. Why don't you have this obvious pair?

A: Our English-English translation model is learned from large amounts of text
found on the web. The system may not find some perfectly good expressions that
don't occur often enough in our data for them to surface. On the other hand,
because we are using real data that reflects real usage, you probably won't see
too many out-of-date expressions of the kind that you would find in a
conventional thesaurus.

Q: It makes a lot of grammatical errors.

A: Yes, it does. The system has no knowledge of grammar, and the kinds of errors
it produces are typical of machine translation systems. It doesn't do well on
pronouns and function words, and tense and number often suffer badly. As we
improve our algorithms, we expect grammatical quality to get better over time.
In the meantime, non-native speakers of English might wish to use the system
with caution.

Q: When I type in a long sentence, everything in the output
seems pretty much similar. Why is this?

A: This is
because of the way the algorithm selects what it thinks are the best options.
Shorter phrases (4-8 words) generally produce results that are more varied.

Q: The first few suggestions seem OK, but there is a pile of
real junk in there.

A: What you are seeing is the ranked output of the algorithm. Most translation
systems don't show users what is happening under the hood. In general, the best
suggestions will be found towards the top of the list. But there may still be
gems to be found even among the lower ranked items.

Q: Ifve found an offensive result. Why does this happen? And who
can I tell about it?

A: We do try to filter out the most obviously offensive terms. However, because
much of our data has been scraped off the web, inappropriate material may
occasionally slip through. In addition, the system can sometimes create
inappropriate juxtapositions even when the input is innocuous. If you do find
something inappropriate or offensive, please report it via the Feedback link,
giving both the input and output so that we can address the issue.

Q: What is this good for?

A: We
expect that the system will prove useful in many applications that need to
recognize or generate semantically similar words and phrases. The following are
just a few examples, in no special order: writing assistance, document
simplification, document style adaptation, in-house style enforcement, grading
of essays and short answers, language learning, plagiarism detection, steganography,
document fingerprinting, summarizing and abstracting, question answering,
conversational agents, interaction with game characters, search and information
extraction and retrieval, search engine optimization, and command and control.
(Contrary to rumor, we have not yet trained it to wash the dishes.)

Q: Is there an API?

A: We are preparing to make this available as an API, and we are using this
page to collect thoughts and feedback.

Q: If I paste a really large block of text three or four times
into the input box, it hangs my browser.