Tatoeba is a project that aims to collect lots of sentences translated in several languages. In this blog you will find, among other things, news and documentation about it.

Monday, January 24, 2011

Stats for Tatoeba day #2

The theme for Tatoeba day #2 was quality. For this day we wanted to encourage people to adopt, check, and correct sentences rather than add lots of sentences and translations. So here are the stats to get an idea of how much has been done :)

Adoptions

Shortly before the start of Tatoeba day, we updated the site and made available a page that lists sentences without an owner. The number of orphan sentences at the beginning of Tatoeba day was 254,779. At the end, it was 252,331. So an additional 2,448 sentences had a home at the end of the day :)
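For anyone who wants to double-check the arithmetic, here is a tiny Python sketch using the two counts above. The variable names are made up for illustration, and strictly speaking the result is the net change in the orphan count over the day.

```python
# Orphan-sentence counts reported in this post.
orphans_at_start = 254779
orphans_at_end = 252331

# Net drop in the number of orphan sentences over Tatoeba day.
net_adopted = orphans_at_start - orphans_at_end

# The same drop as a fraction of the starting orphan count.
share = net_adopted / orphans_at_start

print(f"{net_adopted} fewer orphans ({share:.2%} of the starting total)")
```

So the day took care of a bit less than 1% of the orphans, which gives an idea of how much is left to adopt.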

By language

I'm not going to publish the number of orphan sentences for each language. I'll only show the number of adoptions for each language on Tatoeba day.

The languages that need adoption the most are Japanese (148,000+ orphans), English (89,000+ orphans) and French (13,000+ orphans).

Russian, Vietnamese, Esperanto, Spanish and Dutch need a bit of attention too, but they have a low population of orphans (fewer than a few hundred).

By user

24 users adopted sentences.

CK is definitely our most active adopter with 873 adoptions that day. He's like the proof-reading master of English sentences coming from the Tanaka Corpus.

He's followed by szaby78 with 419 adoptions. Notably, szaby78 adopted all the orphan Hungarian sentences.

Then in 3rd position we have Guybrush88, with 194 adoptions for Italian.

Validations

There were 1,370 sentences tagged 'OK' on Tatoeba day, mostly by CK, for English sentences.

CK (1184)

LaraCroft (74)

Guybrush88 (56)

arcticmonkey (48)

xtofu80 (3)

Zifre (2)

fucongcong (1)

Pharamp (1)

Shishir (1)
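As a quick sanity check on the list above, here is a small Python sketch that adds up the per-user counts and works out CK's share. The dictionary is just the numbers from this post typed in by hand; nothing here comes from the Tatoeba database or any Tatoeba API.

```python
# 'OK' tags per user on Tatoeba day, copied from the list in this post.
ok_tags = {
    "CK": 1184,
    "LaraCroft": 74,
    "Guybrush88": 56,
    "arcticmonkey": 48,
    "xtofu80": 3,
    "Zifre": 2,
    "fucongcong": 1,
    "Pharamp": 1,
    "Shishir": 1,
}

# The per-user counts should add up to the reported total.
total = sum(ok_tags.values())

# CK's share of all 'OK' tags that day.
ck_share = ok_tags["CK"] / total

print(total)                 # 1370
print(f"CK: {ck_share:.1%}")  # CK: 86.4%
```

The counts do add up to the 1,370 reported above, with CK accounting for roughly 86% of the validations.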

Corrections

A total of 422 sentences were corrected. The most active corrector was szaby78.

szaby78 (51)

Shishir (33)

Nero (23)

ludoviko (22)

jakov (19)

qdii (19)

zipangu (18)

Zifre (17)

GilHut (14)

CK (11)

martinod (10)

xtofu80 (10)

Eldad (10)

Dejo (10)

U2FS (9)

GrizaLeono (9)

Esperantodan (8)

Hans07 (8)

Archibald (8)

Guybrush88 (8)

Pharamp (8)

LaraCroft (7)

Esperantostern (6)

nickyeow (5)

JimBreen (5)

Farkas (5)

Riskemulo (4)

arcticmonkey (4)

ventana (4)

sysko (4)

Vortarulo (4)

esocom (4)

landano (4)

MUIRIEL (4)

ivanov (4)

rado (3)

kebukebu (2)

mamat (2)

Alois (2)

Muelisto (2)

darinmex (2)

ismailzali (2)

excaelestis (1)

shanghainese (1)

kolonjano (1)

catakaoe (1)

brauliobezerra (1)

kurteago (1)

sigfrido (1)

jxan (1)

sacredceltic (1)

pandark (1)

boracasli (1)

TRANG (1)

pqs (1)

pohli (1)

autuno (1)

manuk7 (1)

MikeMolto (1)

fucongcong (1)

Comments

There were 503 comments posted, almost half of them by CK, who was mostly pointing out duplicate sentences.

We haven't decided yet what the theme of the next Tatoeba day will be, but the deadline for the banner mini-contest is postponed to that date, since I received only 3 submissions. In any case I will write another blog post about it when the time comes. Thank you to everyone who contributed to this 2nd Tatoeba day :)