Classroom exercises by students of Modern Languages at the University of Deusto (Spain)

Author: Yera Espinosa

There are dozens of machine translators on the web, but probably none is as widely used as Google Translate. This does not necessarily mean that it is the best machine translation system ever created, but it is a good choice for anyone looking for a free translator. Of course, we should always bear in mind that a machine is never as precise as a human, so we can never rely entirely on the translation it gives. There are always quite a few mistakes, especially in long sentences and texts.

Before starting, I think it is useful to know a little more about machine translation, so before reading on, you should take a look at this article I wrote some time ago. Assuming that you already know a little about machine translation, I will start by talking about Google Translate.

This article is about the British National Corpus, but as I am sure many people do not know what a corpus is, I think it is important to explain it first. That is why I will start with a few lines on corpora in general, and then focus on the British National Corpus and try to explain how it works.

CORPUS

What is a corpus?

According to the Oxford Dictionary, a corpus is “a collection of written or spoken material in machine-readable form, assembled for the purpose of linguistic research”.

The plural of “corpus” is usually “corpora”.

What are they used for?

Corpora store words whose features can be analyzed by means of tagging and concordancing programs, and they help in the study of linguistic competence. They are also used for statistical analysis and hypothesis testing, such as checking occurrences or validating linguistic rules within a specific universe.
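To give a rough idea of what checking occurrences and concordancing involve, here is a minimal sketch in Python. It is not how the British National Corpus or any particular concordancing program works; the sample text, the function names `tokenize` and `concordance`, and the context width are all assumptions made up for illustration.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def concordance(tokens, keyword, width=3):
    """Return keyword-in-context lines with up to `width` tokens per side."""
    lines = []
    for i, token in enumerate(tokens):
        if token == keyword:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append(f"{left} [{token}] {right}".strip())
    return lines


# A tiny made-up sample standing in for a real corpus (hypothetical data).
sample = (
    "The corpus is large. A corpus can be tagged. "
    "Researchers search the corpus for patterns."
)

tokens = tokenize(sample)
frequencies = Counter(tokens)  # how often each word occurs

print(frequencies["corpus"])
for line in concordance(tokens, "corpus"):
    print(line)
```

A real corpus would also carry tags (for example, part of speech) on each word, which this sketch leaves out; it only counts occurrences and shows each hit with its neighbouring words.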

WordReference is a free online dictionary used by thousands of people all around the world, as it covers some of the most important languages in the world: English, Italian, Spanish, French and Portuguese. Its dictionaries are organized into the pairs English-French, English-Italian, English-Spanish, Spanish-Portuguese and English-Portuguese.

Although this might not seem like many languages, French, Italian, Spanish and Portuguese in fact account for around 93% of the Romance language speakers in the world, which, in my view, is quite a lot.

In 2009, more language pairs were added: English-German, English-Russian, English-Romanian, English-Polish, English-Czech, English-Greek, English-Turkish, English-Chinese, English-Japanese, English-Korean and English-Arabic, but they are still being completed.