Tatoeba is a project that aims to collect lots of sentences translated into several languages. In this blog you will find, among other things, news and documentation about it.

Friday, June 11, 2010

Tatoeba update (June 12th, 2010)

What's new

I am glad to announce that we are finally introducing... tags!! :D

This will provide a way for people to add meta-data to sentences. For instance "proverb", "formal", "informal", "male", "female", etc. Such information can be very useful for language learners because they cannot necessarily guess such things just by reading the sentence.

Tags will be restricted for a short period of time. Only trusted users will be able to add tags, but everyone can see the tags associated with a sentence. When we feel the feature is ready, we will allow everyone to add tags.

People will be free to tag sentences with whatever they want. We don't really have any strict rules yet because tags are still new, and we want to see how people use them. But I can at least suggest some basic tags:

proverb, archaic, slang

formal, informal

male, female (to indicate whether the sentence is said by a man or a woman)

to delete, to correct, checked (I will talk more about these)

controversial, unsafe (to mark sentences that can cause problems, are not suitable for kids, etc.)

easy, intermediate, difficult (to indicate the level of difficulty of a sentence)

So these are only my suggestions. Again, the tag feature is new, so we will necessarily go through a phase of experimentation before we can clearly set any rules. We count on everyone to try it and help us figure out what works best. Feel free to discuss issues related to tags on the Wall.

A few more things you need to know about tags:

You can see the list of sentences associated with a certain tag by clicking on the tag.

You can remove a tag from a sentence only if you were the one who added it.

Moderators can remove any tag.

It's not possible to add the same tag to a sentence twice. If someone has already added "proverb", you can't re-add "proverb".
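
The rules above can be sketched in a few lines of Python. This is only an illustration of the behaviour described in this post, not Tatoeba's actual code: the names `Sentence`, `TagError`, `add_tag` and `remove_tag` are all made up for the example.

```python
class TagError(Exception):
    """Raised when a tag operation breaks one of the rules (hypothetical name)."""


class Sentence:
    """Hypothetical model of a sentence and its tags."""

    def __init__(self, text):
        self.text = text
        # Maps each tag name to the username of the person who added it.
        self.tags = {}

    def add_tag(self, tag, user):
        # Rule: the same tag cannot be added twice to a sentence.
        if tag in self.tags:
            raise TagError(f"tag {tag!r} is already on this sentence")
        self.tags[tag] = user

    def remove_tag(self, tag, user, is_moderator=False):
        if tag not in self.tags:
            raise TagError(f"tag {tag!r} is not on this sentence")
        # Rule: only the user who added the tag, or a moderator, can remove it.
        if not is_moderator and self.tags[tag] != user:
            raise TagError(f"{user!r} did not add tag {tag!r}")
        del self.tags[tag]


sentence = Sentence("A rolling stone gathers no moss.")
sentence.add_tag("proverb", user="alice")
sentence.remove_tag("proverb", user="bob", is_moderator=True)  # moderators can remove any tag
```

Storing who added each tag next to the tag itself is what makes the "only the person who added it can remove it" check possible.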

"to delete" tag

This tag will help moderators in their work. At the moment, in Tatoeba, only moderators can delete sentences. The traditional way of requesting a deletion was to add a comment to the sentence pointing out that it should be deleted (and explaining why). But the flow of comments has increased a lot, and it has become harder for moderators to keep track.

So if you come upon a sentence that you feel should be deleted, tag it with "to delete" so that moderators can easily find it and clean Tatoeba of entries that are not valid. Anything that is gibberish is not valid. Anything that is not a complete sentence is not valid. But then again, we haven't decided what exactly a "sentence" is, so it's debatable.

"to correct" tag

In Tatoeba, it is not possible to modify a sentence that doesn't "belong" to you. The sentences that belong to you are typically the ones you have added yourself. Almost no one besides you can touch them. If someone sees a mistake in your sentence, all they can do is post a comment, and you have to correct it.

But certain members contribute sentences with mistakes and never come back. And for now, no one can correct their mistakes... except moderators. So if you want to help moderators: whenever you come across a sentence that needs to be corrected and has a comment asking for a correction, but still has not been corrected after two weeks, you can tag it with "to correct".

"checked" tag

Before I explain further, I must stress that this tag is experimental. Many times people have asked for a way to tell whether a sentence can be trusted or not. Okay, so now we can tag a sentence as "checked" to indicate that it has been proofread and validated as a correct sentence.

Of course, this raises some problems...

What if a user tags a sentence as "checked" just for the fun of it?

What if a user tags a sentence as "checked" but was tired and overlooked a mistake?

Well, we can't guarantee 100% accuracy. A sentence that is tagged "checked" will simply be more reliable than one that isn't, but it won't be 100% reliable (no one can guarantee that anyway).

What's next

We will make tags available to everyone.

We will add a page that lists all tags, to enable people to easily browse by tags.

No, they are not logged. Well, actually tag additions are logged, because we save the date when a tag is added. But removals are not logged.

To become a trusted user, it's simple.

1) Provide a certificate that you have an IQ higher than 200.

2) Win the Eurovision Contest in 2011.

3) Train with the Shaolin monks for several years and become a kung fu master.

But if you feel it's too easy and want more of a challenge, you can instead:

1) Tell me that you want to become a trusted user.

2) Read the guide of the good contributor in full (http://blog.tatoeba.org/2010/02/how-to-be-good-contributor-in-tatoeba.html).

3) Wait until I trust you.

A little suggestion about tags: enable "faceted" tags, after the "faceted classification" concept: http://en.wikipedia.org/wiki/Faceted_classification. For example, such facets could be "author:" (author:Churchill), "country:", "context:" (context:school), "form:", "license:", etc. Not all tags need to be faceted, but it helps to sort between them. To be discussed.
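
To make the suggestion a bit more concrete, here is a minimal Python sketch of how "facet:value" tags could be told apart from plain tags and grouped. It is only an illustration of the idea, assuming a colon as the separator; the function names `parse_tag` and `group_by_facet` are hypothetical, and nothing like this exists in Tatoeba at the moment.

```python
from collections import defaultdict


def parse_tag(tag):
    """Split a tag into (facet, value); plain tags get a facet of None."""
    facet, sep, value = tag.partition(":")
    # Only treat the tag as faceted when both sides of the colon are non-empty.
    if sep and facet.strip() and value.strip():
        return facet.strip(), value.strip()
    return None, tag.strip()


def group_by_facet(tags):
    """Group tag values under their facet; unfaceted tags go under None."""
    groups = defaultdict(list)
    for tag in tags:
        facet, value = parse_tag(tag)
        groups[facet].append(value)
    return dict(groups)


tags = ["author:Churchill", "context:school", "proverb", "to delete"]
grouped = group_by_facet(tags)
# Faceted tags are grouped by facet name; "proverb" and "to delete" stay plain.
```

Because plain tags simply end up with no facet, existing tags such as "proverb" or "to delete" would keep working unchanged.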