About this collection

This collection demonstrates Greenstone's ability to identify the language of a document automatically.
It is based on a set of Japanese folktales presented in seven languages: Chinese,
English, French, German, Italian, Japanese and Spanish (note that Greenstone also supports many other languages).

Greenstone automatically attempts to identify the language and encoding of each source document when the collection is built. To do this it uses a modified version of TextCat by Gertjan van Noord (vannoord@let.rug.nl). You can learn more about TextCat, and download its source code, from http://odur.let.rug.nl/~vannoord/TextCat.
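
To give a feel for how this kind of identification can work, here is a minimal Python sketch of character n-gram text categorization (the Cavnar and Trenkle approach that TextCat implements). It is an illustration only, not Greenstone's or TextCat's actual code: the function names, the tiny training samples, the underscore word padding and the profile size of 300 are all assumptions made for this example, and it does not attempt encoding detection.

```python
from collections import Counter


def ngram_profile(text, max_n=3, top_k=300):
    """Return the most frequent character n-grams of text, most frequent first."""
    counts = Counter()
    for word in text.lower().split():
        padded = f"_{word}_"  # mark word boundaries (an assumption of this sketch)
        for n in range(1, max_n + 1):
            for i in range(len(padded) - n + 1):
                counts[padded[i:i + n]] += 1
    # Sort by descending count, then alphabetically so that ties are deterministic.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [gram for gram, _ in ranked[:top_k]]


def out_of_place(doc_profile, lang_profile):
    """Sum of rank differences; n-grams missing from lang_profile get a fixed penalty."""
    rank = {gram: r for r, gram in enumerate(lang_profile)}
    penalty = len(lang_profile)
    return sum(
        abs(r - rank[gram]) if gram in rank else penalty
        for r, gram in enumerate(doc_profile)
    )


def identify_language(text, lang_profiles):
    """Return the language whose profile is closest to the profile of text."""
    doc_profile = ngram_profile(text)
    return min(lang_profiles, key=lambda lang: out_of_place(doc_profile, lang_profiles[lang]))


# Hypothetical toy training text; a real system would train on far larger samples.
SAMPLES = {
    "english": "the old man and the old woman lived in the mountains and the river was near the house",
    "french": "le vieil homme et la vieille femme vivaient dans les montagnes et la rivière était près de la maison",
    "german": "der alte mann und die alte frau lebten in den bergen und der fluss war nahe bei dem haus",
}
PROFILES = {lang: ngram_profile(sample) for lang, sample in SAMPLES.items()}
```

Calling identify_language(some_text, PROFILES) returns the key of the closest profile. With samples this small the guess is unreliable for short or unusual input; larger training texts per language make the profiles much more stable.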

How to find information in the folktales: language extraction demo collection

There are three ways to find information in this collection:

search for particular words that appear in the text by clicking the Search button