About 100. There will be more, but the process of making a new word is a somewhat long one, and I don't need to worry about negative words, because if a word has a specific tone it becomes a negative word.

benny335 wrote: About 100. There will be more, but the process of making a new word is a somewhat long one, and I don't need to worry about negative words, because if a word has a specific tone it becomes a negative word.

No, your corpus, not your lexicon. A lexicon is just a list of words with definitions. A text corpus is a body of texts where these words are used in context. A huge source of texts for machine translation, for instance, is EU and UN documents, since these agencies produce thousands of documents translated into various languages.

The designer of Google Translate, Franz Josef Och, says that it takes a bilingual corpus of about a million words and monolingual corpora of a billion words each to make a good base for each language pair you want to translate between. Better get writing!
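To make the lexicon/corpus difference concrete, here is a rough Python sketch. Everything in it is made up for illustration: the conlang words, the sentence pairs, and the `count_words` helper are all hypothetical, and splitting on whitespace is only an assumption about how words would be counted. It measures the conlang side of the pairs against the million-word bilingual figure mentioned above.

```python
# A lexicon: words with definitions (hypothetical conlang entries).
lexicon = {
    "mira": "water",
    "tol": "big",
}

# A parallel (bilingual) corpus: the same sentences in both languages,
# stored as (English, conlang) pairs. These pairs are invented examples.
parallel_corpus = [
    ("the water is big", "mira tol"),
    ("big water", "tol mira"),
]


def count_words(sentences):
    """Count whitespace-separated words across an iterable of sentences."""
    return sum(len(sentence.split()) for sentence in sentences)


# Assumption: progress is measured on the conlang side of the corpus.
conlang_words = count_words(conlang for _, conlang in parallel_corpus)

TARGET_WORDS = 1_000_000  # the bilingual figure quoted above
progress = conlang_words / TARGET_WORDS

print(f"lexicon entries: {len(lexicon)}")
print(f"corpus words: {conlang_words} ({progress:.4%} of {TARGET_WORDS:,})")
```

The point is only that the lexicon size and the corpus size are separate numbers: adding a word to the dictionary does nothing for the corpus until that word shows up in translated sentences.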

Oh my! So basically I could find a U.N. document or some other long document and translate it into my language, using colloquialisms, context and all that. Right? If so, would that be the base of it?