Issue 8397:
Russian spell-checking issues in new dictionary

Issue description

Moved from internal bug b/1417901, originally reported by Ekaterina
Kamenskaya.
I have installed the new dictionary for Russian and tested spell-checking.
It now works better than before, though I found some minor
issues:
1) Doesn't know some words, for example:
борсетка, парусинник, микрокосмос, микроволновка, персонализация,
обучаемость, стушевать, креативность, употребимый, бесприставочный,
скачивание, наидобрейший....
- including many professional terms:
транслитерировать, топоним, криптоним, диалектизм, номинализация,
семантически (linguistics, literature),
аксельрод, анорексия, аутизм (medical)
идеомоторика, нистагм, парабиоз, хроматопсия, психогенетика,
психофармакология (psychology)
многоплатформенная, проактивная, микродвижение, дискриминантный...
- including slang (and new words):
апдейт, граффити, онлайн, броузер (браузер), коннект, хостинг, киберпанк,
андеграунд, офшор, кайфоломщик, динамщик, жор....
- including some popular names (and foreign names written in Russian):
Мэри (Mary), Марго, Каринэ, Эдип, Ярило....
2) Some nouns aren't accepted in the genitive plural form (though
they are accepted in the nominative plural or singular):
очередностей (очередности is OK), совокупностей (совокупности is OK),
соотнесенностей (соотнесенности is OK), переворачиваний (переворачивания is
OK), снисхождений (снисхождения is OK).
3) Doesn't accept words (nouns, verbs, adjectives, etc.) which are written
with the Russian letter “ё” (or its capital form, “Ё”):
ёжик, идёт, ёлка, отвёртка, бочёнок, ёмкость, нырнёшь, начнём, всё ....
Note: Many Russian words may be written with either “ё” or “е”.
If a word is written with its original “ё” (the same word may also be written
with “е”, but many texts use “ё”), it should be accepted.
The spell-checker should have a list of words which may be written with either
“е” or “ё”.
(See some info about this letter here
http://en.wikipedia.org/wiki/Yo_(Cyrillic))
Comments from Jungshik:
---------------------------------
How about getting some help from the (Russian) 'stemming' team to enhance
Chrome's spellcheck dictionaries (for Russian and other highly inflected
languages)?
Comments from Hironori:
---------------------------------
It is a great idea.
Once they enumerate words to be added to our spell-checker, I'm going to
write a program which maps declension rules of Russian to suffix rules of
hunspell.
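
A minimal sketch of what mapping one declension rule to a hunspell suffix rule could look like. The `SuffixRule` class, the flag `G`, and the single rule for nouns ending in "-ость" are hypothetical illustrations, not the program mentioned above. Only the `SFX flag strip add condition` line layout follows hunspell's affix file format.

```python
import re
from typing import NamedTuple, Optional


class SuffixRule(NamedTuple):
    """One suffix rule: strip `strip` from the stem end, then append `add`."""

    strip: str      # characters removed from the end of the stem ("" = none)
    add: str        # characters appended after stripping ("" = none)
    condition: str  # regex that the end of the stem must match ("" = any)

    def apply(self, stem: str) -> Optional[str]:
        """Return the inflected form, or None if the rule does not apply."""
        if not re.search(f"(?:{self.condition})$", stem):
            return None
        if not stem.endswith(self.strip):
            return None
        return stem[: len(stem) - len(self.strip)] + self.add

    def to_aff_line(self, flag: str) -> str:
        """Format the rule as one SFX line for a hunspell .aff file."""
        strip = self.strip or "0"          # hunspell writes "0" for "nothing"
        add = self.add or "0"
        condition = self.condition or "."  # "." means "any stem"
        return f"SFX {flag} {strip} {add} {condition}"


# Hypothetical rule: genitive plural of feminine nouns ending in "-ость".
ENDING = "ость"
GENITIVE_PLURAL_OST = SuffixRule(strip="ь", add="ей", condition=ENDING)
EXAMPLE_STEM = "очередн" + ENDING

if __name__ == "__main__":
    print(GENITIVE_PLURAL_OST.apply(EXAMPLE_STEM))
    print(GENITIVE_PLURAL_OST.to_aff_line("G"))
```

In a real .aff file the rule line would also need a header line giving the flag, the cross-product setting, and the number of rules, and the dictionary entry would carry the flag (for example `очередность/G`).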
Comments from Hironori:
---------------------------------
Sorry for my lack of responses.
I'm now writing a fix for issue (3), but I'm wondering how I
should implement suggestions for words which include 'ё'. For example, when
we make a typo "ёлак" while typing "ёлка" and need to display a suggestion
list, there are three options:
1. Display both "ёлка" and "елка";
2. Display only "ёлка", or;
3. Display only "елка".
Unfortunately, each option requires its own implementation and we need to
choose the best one. So, I would like to ask native Russian speakers which
option is best.
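
The three options above can be sketched as one function with a policy switch. This is only an illustration under stated assumptions: the function names and the policy strings `"both"`, `"yo_only"`, and `"ye_only"` are hypothetical, and the sketch assumes the dictionary stores each suggestion in its "ё" spelling.

```python
from __future__ import annotations

YO, YE = "\u0451", "\u0435"  # Cyrillic small letters "ё" and "е"


def replace_yo(word: str) -> str:
    """Return `word` with every "ё"/"Ё" replaced by "е"/"Е"."""
    return word.replace(YO, YE).replace(YO.upper(), YE.upper())


def suggestion_variants(word: str, policy: str = "both") -> list[str]:
    """Expand one suggestion (stored with "ё") according to `policy`."""
    if policy not in ("both", "yo_only", "ye_only"):
        raise ValueError(f"unknown policy: {policy!r}")
    without_yo = replace_yo(word)
    if without_yo == word or policy == "yo_only":
        return [word]          # no "ё" in the word, or option 2
    if policy == "ye_only":
        return [without_yo]    # option 3
    return [word, without_yo]  # option 1: "ё" spelling first
```

With the default policy, a suggestion for the typo "ёлак" would be expanded into both spellings, with the "ё" spelling listed first.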

About "Ё".
http://en.wikipedia.org/wiki/Yo_(Cyrillic) says about it:
<<<
Except for a brief period after World War II, the use of <ё> was never obligatory in standard
Russian orthography. By and large, it is only used in pedagogical literature intended for
children and foreign language learners, and in dictionaries. Otherwise <е> is used, and <ё>
occurs only when it is necessary to avoid ambiguity (for example, to distinguish between все and
всё when it is not obvious from the context which is meant) or in words (principally proper
names) the pronunciation of which may not be familiar to the reader. It is however perfectly
permissible under the current standard to mark <ё> whenever it occurs, and this is the
preference of some Russian authors and periodicals.
>>>
As a result, Firefox simply has two dictionaries.
I would say that the dictionary may contain both forms, as both forms can be considered correct.
But from a perfectionist's point of view, it is a user preference.

Though use of the letter "ё" might not be considered obligatory according to the formal rules,
many people consider it as important as any other letter, and refusing to use it is
unacceptable.
I don't see the reason for having two dictionaries. The solution might be to have a
dictionary with "ё", but also accept words spelled with "е" instead of "ё". In
case of misspelled words, the variant with "ё" should be suggested, in my humble
opinion.

"The fact that <ё> is frequently replaced with <е> in print often causes some
confusion to both Russians and non-Russians, as ____it makes Russian words and names
harder to transcribe accurately_____." (enwiki, [[Yo (Cyrillic)]])

Yes, "ё" is a very important letter in Russian. If we use "е" instead of "ё", the word
will sound completely different, and this may lead to confusion in some cases.
Though Russian orthography permits using "е" instead of "ё", this
definitely shouldn't be the default option.
I agree with comment #5: the dictionary should contain words with "ё" and the
spell checker should permit the use of "е". But what about suggestions? It would be
nice if Chrome allowed the user to tune the behaviour: whether to use "ё" or "е" in suggestions. Or it could display both variants (why not?).

Sorry for my slow updates.
The latest Chromium snapshots (*1) now include the Hungarian and Romanian
spellchecker dictionaries. They also update the Russian dictionary with additional words. It would
be very helpful if you could try the snapshots and give us your feedback.
(*1) http://build.chromium.org/buildbot/snapshots/
Regards,

Choose the Firefox dictionary as a base:
https://addons.mozilla.org/en-US/firefox/addon/3703/
It supports the letter YO nicely and logically.
I would also add that it makes a difference whether you write a word with YO or with YE.
For example:
vsJO and vsJE are two different words.
So the logic cannot be based just on the user's experience. For me it is very strange, that Ekaterina says "Some Russian words may be in general written with letter “ё” or “e”.". There may be words which have both spelling forms, but there may also be words which don't.

to deman...
"For me it is very strange, that Ekaterina says "Some Russian words may be in general written with letter “ё” or “e”."."
Yes, in written Russian it is allowed to use “е” instead of any “ё”. For most words it is clear which letter is in the original word. For words which differ only by this letter (“всё” or “все”), the meaning is also clear from the context in most cases. But still.
There is “ё” in my surname, so I insist on the proper use of “ё” in all cases. :)

Addition to comment 22 above...
The distinction between words with «ё» and similar words with «е» may be critical for correct understanding of the text. Take, for example, the verb «распознавать», which literally means „to identify, to recognize”, as in optical character recognition. The form «распознаёт» is the third person singular present tense form of the verb („he / she is recognizing right now”). However, the same verb written with «е» in place of «ё» («распознает») stands for the third person future tense („he / she will make an effort to recognize at some unspecified time in the future”). This may change the whole meaning of the phrase, especially when reporting on the progress of some task.
Therefore please make an effort to add spell-checking for words with YO.
In another spell-checking system, "Myspell", there are two different dictionary files for the Russian language, one with YO and another with YE. A word is identified as misspelled if both dictionaries report false.
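
The two-dictionary check described above can be sketched as follows. The word sets are hypothetical miniature stand-ins for the two dictionary files, and the function name is an assumption made for illustration.

```python
YO, YE = "\u0451", "\u0435"  # Cyrillic small letters "ё" and "е"

PRESENT = "распозна" + YO + "т"  # "распознаёт", present tense
FUTURE = "распозна" + YE + "т"   # "распознает", future tense

# Hypothetical miniature word lists standing in for the two dictionary files.
YO_WORDS = frozenset({PRESENT, YO + "лка"})
YE_WORDS = frozenset({FUTURE, YE + "лка"})


def is_misspelled(word: str, yo_words=YO_WORDS, ye_words=YE_WORDS) -> bool:
    """Flag a word only if neither dictionary contains it."""
    lowered = word.lower()
    return lowered not in yo_words and lowered not in ye_words
```

Both «распознаёт» and «распознает» pass this check, so neither form is flagged. The check cannot tell which of the two the writer meant.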

I always write "ё" when appropriate. Why not? There is a button for that letter. The text is more readable with it. It's quite annoying that Chrome marks all those words as misspelled. In fact, it's the biggest annoyance in Chrome compared to Mozilla Firefox. If only I could install the Mozilla dictionary in Chrome!
Regarding comment 27, let me just explain it for those who don't speak Russian. That's a case where "ё" changes the meaning of the word, and so it's needed. Chrome suggests changing "recognizes" to "will recognize". Just imagine if some spell checker had such a problem with text in English!

The problem itself is simple: there are lots of words in Russian that are correctly spelled with "ё", but Chrome marks them as incorrect, suggesting another spelling, with "е" (which is also correct). That is, Chrome marks the correct spelling of known words as incorrect, and this behavior does not seem correct to me :-)
Moreover, the spelling with "ё" is not only "correct", it is the "primary" correct spelling for these words: if you look in any serious, "academic" dictionary or encyclopedia, you will find these words spelled with "ё", not "е"! Spelling these words with "е" is considered "also grammatically correct", and though it is very common, it is technically not the "completely correct original form of the word", but just "also acceptable in most cases".
That means that Chrome:
1) marks the correct spelling of known words as incorrect, which obviously is an error
2) prefers the "also correct spelling" over the "primary spelling".
But it seems to me nobody cares: we have had this issue for more than a year, and nothing changes...

Letter Ёё is essential for proper modern Russian grammar.
This is an axiom, end of discussion!
It is better not to use any dictionary at all, than to use an incomplete or incorrect one, which is any Russian dictionary that does not support this letter.
Try to imagine an English dictionary without a Yy letter.
(After all, it is similar to Ii, isn't it?)
Whi not?
Wouldn't that simplifi the English grammar?
Hei, iour readers will get the idea aniwai. :)))))

I think Chrome needs a smart dictionary that would contain words only in their "ё" spelling. Words written with "е" instead of "ё" would be accepted, but not the other way around. That should make most users happy, both the fans of "ё" and its haters. That's how Mozilla's spellchecker appears to work (the one installed as an addon).
Better yet, there should be a choice between "relaxed ё" (as described above) and "strict ё", but that would require a GUI change, and only the hardcore "ё" fans would want it.
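
A minimal sketch of the "relaxed ё" and "strict ё" modes, assuming the word list is stored only in its "ё" spelling. The class and method names are hypothetical, and the sketch makes no claim about how Mozilla's spellchecker is actually implemented.

```python
from collections import defaultdict

YO, YE = "\u0451", "\u0435"  # Cyrillic small letters "ё" and "е"


def fold(word: str) -> str:
    """Lower-case `word` and replace every "ё" with "е"."""
    return word.lower().replace(YO, YE)


class YoAwareDictionary:
    """Word list stored in its "ё" spelling, with relaxed or strict lookup."""

    def __init__(self, words):
        # Index entries by their folded form so lookup is a single dict access.
        self._by_folded = defaultdict(set)
        for entry in words:
            self._by_folded[fold(entry)].add(entry.lower())

    def accepts(self, word: str, strict: bool = False) -> bool:
        typed = word.lower()
        entries = self._by_folded.get(fold(typed), ())
        if strict:
            # Strict: the typed word must match an entry letter for letter.
            return typed in entries
        # Relaxed: each typed letter must equal the dictionary letter,
        # except that a typed "е" may stand in for a dictionary "ё".
        return any(
            all(t == d or (t == YE and d == YO) for t, d in zip(typed, entry))
            for entry in entries
        )
```

In relaxed mode "елка" is accepted for the entry "ёлка", while a "ё" typed where the dictionary has "е" is still rejected. Strict mode rejects "елка" as well.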

I think Chrome should always suggest the spelling with "ё" only. If somebody tries to avoid "ё", that user can "correct" the spelling provided by Chrome. Only those users will be annoyed who
1) Insist on avoiding "ё" (as opposed to being too lazy to care)
2) Make many typos
3) Rely on the right click to fix those typos
4) Are too lazy to replace "ё" with "е" manually in the words they mistyped
It's hard for me to believe that somebody would admit to satisfying all of the above conditions at once.

Please exclude the letter "x" from your dictionaries. There are so few words with it, and you can just use "ks" instead.
That is the same thing you are talking about when you exclude "ё". You have no right to force people to write words with mistakes and forget their language.

The new dictionary with 'Ё' is working, but
1) There are no verbs in the future in it: 'Создаёт', 'передаёт'.
2) There are no pronouns: 'Твоё, моё, всё'.
3) There are no particles: 'Ещё'.
How can I contribute to the dictionary?

Many words from the original bug report are in the dictionary now, but some still need to be added. Regarding the "ё" problem, "нырнёшь" and "начнём" are still not recognized.
Both the "е" and the "ё" versions are suggested, but they are not always adjacent, and the "е" version is shown first, which is probably wrong. In some cases, the "ё" version doesn't make it into the list, which is bad. Try e.g. "чорный": "черный" is suggested, but "чёрный" is not. Generally, replacing "о" -> "ё" should be preferred over "о" -> "е" based on pronunciation.
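
The ordering part of this could be sketched as a post-processing step over the suggestion list. The function name is hypothetical. The sketch only regroups suggestions that are already present, so it does not address the case where the "ё" version is missing from the list.

```python
from __future__ import annotations

YO, YE = "\u0451", "\u0435"  # Cyrillic small letters "ё" and "е"


def fold(word: str) -> str:
    """Lower-case `word` and replace every "ё" with "е"."""
    return word.lower().replace(YO, YE)


def order_suggestions(suggestions: list[str]) -> list[str]:
    """Make "е"/"ё" variants adjacent, with the "ё" variant first."""
    groups: dict[str, list[str]] = {}
    for suggestion in suggestions:
        # Variants that differ only in "е"/"ё" share one folded key.
        groups.setdefault(fold(suggestion), []).append(suggestion)
    ordered: list[str] = []
    for variants in groups.values():  # dicts keep first-seen order
        # False sorts before True, so spellings containing "ё" come first.
        ordered.extend(sorted(variants, key=lambda v: YO not in v.lower()))
    return ordered
```

Each group keeps the position of its first member, so the original ranking is otherwise unchanged.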

"Ё" is one of the 33 letters of the alphabet of my native language.
There's no reason to discriminate against it.
Simplifying in this way is like leaving only "u" and "ur" in the English dictionary instead of "you" and "your".
It impoverishes the language.

For 3 years people have been asking for `ё` support and there has been no reaction from Google. Should the government block Google until they start to respect non-US laws, especially since they are a global company with representatives and offices in other countries?

If you unzip it and take a look at russian-aot-ieyo.dic, it has "елка" but does not have "ёлка". That's probably not the right one, unless it somehow converts "е" into "ё" for these words in the .aff file.

In that case let's go with one of Lebedev's original dictionaries. I think it's better than AOT, because AOT does not have many of the words in the original report: борсетка, парусинник, микрокосмос, микроволновка, персонализация, обучаемость, стушевать, креативность, употребимый, бесприставочный, etc. In contrast, Lebedev's dictionary is missing far fewer words: употребимый, аксельрод, номинализация, хроматопсия, психогенетика, многоплатформенная, проактивная, etc.
Technically, we can keep these in the Chrome-specific file ru_RU.dic_diff, but please petition the dictionary maintainers to add the words you need. That way OpenOffice, Firefox, Thunderbird, and Chrome will all benefit from updated dictionaries, not just Chrome alone.

Guys, why do you think it shouldn't find "ёлка" when searching for "елка" or vice versa? It's the same word; that's why search works like this.
Also, you are talking about search, but this issue is about spell-checking. Those are two different parts of the problem.

Unaccenting (removing diacritic signs from lexemes) is a common practice. For Russian, most search engines normalize the input, replacing every “Ё” with “Е”. The popular Russian search engine yandex.ru produces almost identical output for «ёлка» and «елка». The reason is obvious: for historical reasons a lot of people use “Е” instead of “Ё”, and if we treat them as separate letters we will miss the “misspelled” words.
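
A minimal sketch of that kind of normalization for search. The function name is hypothetical, and this does not describe how yandex.ru or any other search engine actually implements it.

```python
# Explicit mapping for "ё" -> "е" and "Ё" -> "Е" only.
_YO_TO_YE = str.maketrans({"\u0451": "\u0435", "\u0401": "\u0415"})


def normalize_for_search(text: str) -> str:
    """Replace "ё"/"Ё" with "е"/"Е", then case-fold for comparison."""
    return text.translate(_YO_TO_YE).casefold()
```

An explicit mapping is used here instead of generic diacritic stripping based on Unicode decomposition, because decomposition also splits "й" into "и" plus a combining breve, so stripping all combining marks would turn "й" into "и".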