Description:

This project is an evolution of the spell
checking engine project I submitted earlier. This project includes
numerous enhancements to the core spelling engine plus the addition of a
"check-as-you-type" edit control and the related support dialogs (see
above).

This project is not complete; it is a
work in progress. The current version has numerous issues which
need to be addressed. My long-term goal is to develop this to
"commercial quality".

I am going to continue to improve this engine
toward my goal, and I will post updates as I feel necessary.

Classes:

CFPSSpellCheckEngine

This is the core spelling engine. It is intended to
be language independent, although it currently is not. The engine encapsulates
the functionality of managing dictionaries, making suggestions (through the
dictionaries), and maintaining spelling options.

CFPSSpellCheckEngineOptions

Support class for CFPSSpellCheckEngine that stores, saves and
loads spell checking options.
It currently stores options in a serialized file, but could
easily be changed to use an INI file or the registry.

CFPSDictionary

Base dictionary class. Defines a set of virtual
functions generic to all dictionaries and provides a base
implementation of each of them, based on current
requirements. This class uses a defined file structure with a file
header followed by any number of dictionary records. Future derivations of
this class will provide language-specific support.

CDlgSpellChecker

CDialog-derived class which implements the spell checker
dialog. Undo is currently based on the edit control's own undo
support rather than a spell checker undo. This needs further improvement.

This function is called when a
user presses the F7 (or configured) hot key from within the
"check-as-you-type" edit control. It displays the
CDlgSpellChecker dialog box.

NOTE: This function is not currently being used because the rich edit
control is not complete.

int EditDistance(const char *szWord1, const char *szWord2)

This function is passed two
words and returns an approximation of the minimum number of changes a
user would need to make for the two words to match. It
is not a true edit-distance algorithm, but a customized algorithm for
this spell checking application.
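
For comparison, a true edit distance can be sketched as follows. This is the textbook Levenshtein algorithm, not the customized EditDistance used by the engine; the name LevenshteinDistance and the use of std::string instead of const char * are assumptions made for the illustration.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Textbook Levenshtein distance: the minimum number of single-character
// insertions, deletions and substitutions needed to turn `a` into `b`.
// NOTE: illustration only; this is not the article's EditDistance function.
int LevenshteinDistance(const std::string& a, const std::string& b)
{
    // prev[j] = distance between the first i-1 characters of a and the
    // first j characters of b; curr[j] is the same for the first i characters.
    std::vector<int> prev(b.size() + 1), curr(b.size() + 1);
    for (std::size_t j = 0; j <= b.size(); ++j)
        prev[j] = static_cast<int>(j);

    for (std::size_t i = 1; i <= a.size(); ++i) {
        curr[0] = static_cast<int>(i);
        for (std::size_t j = 1; j <= b.size(); ++j) {
            const int substitution = prev[j - 1] + (a[i - 1] == b[j - 1] ? 0 : 1);
            const int deletion = prev[j] + 1;
            const int insertion = curr[j - 1] + 1;
            curr[j] = std::min({substitution, deletion, insertion});
        }
        std::swap(prev, curr);
    }
    return prev[b.size()];
}
```

A spell checker may reasonably deviate from this, for example by treating a swap of two adjacent letters as a single change, which is one reason a customized measure can rank suggestions better than the plain distance.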

void MetaphoneEx(const char *szInput, char *szOutput, int iMaxLen)

This function is passed a word
and returns (through the szOutput parameter) a modified-metaphone
representation of the word. This is a variation on the algorithm
originally written by Lawrence Philips. A newer version of his algorithm
(double-metaphone) is also available. I have tested this algorithm
with the spell checking engine and was not impressed with the
results. It does provide fast results and a high hit rate, but on
average it also returns far too many results. However, I am
considering using it in conjunction with the EditDistance
algorithm and will review this further.
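
To show why a phonetic key gives a high hit rate but also many results, here is a deliberately crude sketch. It is neither Metaphone nor the MetaphoneEx function above; the name PhoneticKeySketch and its folding rules are invented for this example.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Crude phonetic key (NOT Metaphone, NOT the article's MetaphoneEx):
//  - non-letters are skipped and letters are upper-cased
//  - a few sound-alike letters are folded together (C,Q -> K; Z -> S; V -> F)
//  - a letter identical to the one just before it is skipped
//  - vowels are kept only when they start the key
// At most maxLen characters are produced.
std::string PhoneticKeySketch(const std::string& word, std::size_t maxLen)
{
    std::string key;
    char last = '\0';  // previous folded letter, kept or not
    for (char raw : word) {
        if (key.size() >= maxLen)
            break;
        if (!std::isalpha(static_cast<unsigned char>(raw)))
            continue;
        char c = static_cast<char>(std::toupper(static_cast<unsigned char>(raw)));
        switch (c) {
            case 'C': case 'Q': c = 'K'; break;
            case 'Z':           c = 'S'; break;
            case 'V':           c = 'F'; break;
            default: break;
        }
        const bool repeated = (c == last);
        last = c;
        if (repeated)
            continue;
        const bool isVowel = (c == 'A' || c == 'E' || c == 'I' ||
                              c == 'O' || c == 'U' || c == 'Y');
        if (isVowel && !key.empty())
            continue;
        key.push_back(c);
    }
    return key;
}
```

With these rules "receive" and the misspelling "recieve" both reduce to the same key, so a dictionary indexed by key finds the correct word quickly. Many unrelated words share short keys as well, which is why the candidates usually need a second ranking step.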

void SortMatches(LPCSTR lpszBadWord, CStringList &Matches)

This function sorts a list of
word suggestions by the approximate edit-distance between each
word in the list and the misspelled word passed in as lpszBadWord.
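
The idea behind this kind of sort can be sketched with standard containers. This is not the SortMatches implementation: the sketch uses std::vector<std::string> instead of CStringList, takes the distance measure as a parameter, and PositionalMismatch is only a stand-in metric invented for the example.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Any function that scores how far apart two words are (lower = closer).
using DistanceFn = std::function<int(const std::string&, const std::string&)>;

// Reorders `matches` so that the words closest to `badWord` come first.
// stable_sort keeps the original order of words with equal distance.
// The distance is recomputed on every comparison, which is acceptable
// for the short suggestion lists this sketch is meant for.
void SortMatchesSketch(const std::string& badWord,
                       std::vector<std::string>& matches,
                       const DistanceFn& distance)
{
    std::stable_sort(matches.begin(), matches.end(),
        [&](const std::string& lhs, const std::string& rhs) {
            return distance(badWord, lhs) < distance(badWord, rhs);
        });
}

// Stand-in metric (hypothetical): number of positions where the letters
// differ, plus the difference in length.
int PositionalMismatch(const std::string& a, const std::string& b)
{
    const std::size_t shared = std::min(a.size(), b.size());
    int mismatches = static_cast<int>(std::max(a.size(), b.size()) - shared);
    for (std::size_t i = 0; i < shared; ++i)
        if (a[i] != b[i])
            ++mismatches;
    return mismatches;
}
```

Calling SortMatchesSketch("teh", matches, PositionalMismatch) on the list "there", "then", "the" moves "the" to the front. Passing the metric as a parameter makes it easy to try a different distance function without touching the sort.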

Architecture:

CORE ENGINE

The core spell checking engine consists of three classes: CFPSSpellCheckEngine,
CFPSSpellCheckEngineOptions and CFPSDictionary. These classes provide
support for dictionary-related functions such as adding a word, removing a word,
ignoring a word, loading a dictionary, saving a dictionary, checking whether a
word is in the dictionary, suggesting possible matches, etc.

The core engine is implemented as a strict
back-end engine. It has no user-interface components. Most of the
functions in these classes that can fail return an int
return code. These return codes are defined in (1) FPSSpellCheckerInclude.h
and (2) the header file for the given class. The return codes should always
be examined to determine the completion status of these functions.

Special care has been taken to ensure that
these classes are very stable and robust. Performance
considerations also weigh heavily on the implementation of these classes. Very
little MFC code is used in these classes and functions.

CHECK-AS-YOU-TYPE EDIT CONTROL

The check-as-you-type edit control is contained in the CFPSSpellingEditCtrl
class. It is derived from CEdit and works by subclassing an existing
edit control through the AttachEdit function.

To improve performance, this control implements
a timer and, whenever there is no user activity (typing, mouse clicking,
scrolling, etc.), checks the spelling of the displayed portion of the edit
control. The function RedrawSpellingErrors is called to perform the
checking. It checks only the displayed portion of the edit control and
calls DrawSpellingError for each displayed word. If a word is not found
in the dictionary, this function calls DrawSquigly to draw the squiggly
underline for the word. DrawSquigly creates a structure of type FPSSPELLEDIT_ERRORS
and adds it to the m_SpellingErrors member list.
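
The "check only when idle" idea can be sketched independently of MFC. The control itself uses a Windows timer; the class below, IdleSpellTrigger, is a hypothetical stand-in that takes timestamps as arguments so the logic can be followed (and tested) without a message loop.

```cpp
#include <chrono>

// Hypothetical helper: decides when a deferred spell check should run.
// Every user action resets the idle period; a check becomes due once the
// user has been idle for `idleDelay` and something changed since the
// previous check.
class IdleSpellTrigger
{
public:
    using Clock = std::chrono::steady_clock;

    explicit IdleSpellTrigger(std::chrono::milliseconds idleDelay)
        : idleDelay_(idleDelay), lastActivity_(), dirty_(false) {}

    // Call on typing, mouse clicks, scrolling, etc.
    void OnUserActivity(Clock::time_point now)
    {
        lastActivity_ = now;
        dirty_ = true;
    }

    // Call from the timer tick. Returns true at most once per idle period.
    bool ShouldCheck(Clock::time_point now)
    {
        if (dirty_ && now - lastActivity_ >= idleDelay_) {
            dirty_ = false;
            return true;
        }
        return false;
    }

private:
    std::chrono::milliseconds idleDelay_;
    Clock::time_point lastActivity_;
    bool dirty_;
};
```

In a real control the timer handler would call ShouldCheck(Clock::now()) and, when it returns true, re-check only the visible text. Deferring the work this way keeps typing responsive because the dictionary is not searched on every keystroke.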

The OnRButtonDown function checks the m_SpellingErrors
list to determine whether to display the normal popup menu or the
spell check popup menu. Suggestions returned from the core engine are
sorted using the SortMatches function to display them in order of
edit-distance.
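
A minimal sketch of the right-click decision follows. The real FPSSPELLEDIT_ERRORS layout is not shown in this article, so the SpellingErrorSketch structure, its fields and the FindErrorAt function are assumptions made for the illustration.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for one entry of the m_SpellingErrors list:
// the on-screen rectangle of an underlined word plus the word itself.
struct SpellingErrorSketch
{
    int left, top, right, bottom;  // client coordinates, right/bottom exclusive
    std::string word;
};

// Returns the error under the point (x, y), or nullptr when the click did
// not land on an underlined word (the normal popup menu should be shown).
const SpellingErrorSketch* FindErrorAt(
    const std::vector<SpellingErrorSketch>& errors, int x, int y)
{
    for (const SpellingErrorSketch& e : errors) {
        if (x >= e.left && x < e.right && y >= e.top && y < e.bottom)
            return &e;
    }
    return nullptr;
}
```

When an entry is found, its word would be passed to the engine for suggestions and the spell check popup menu built from the sorted result.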

The PreTranslateMessage function checks for a hot key
(F7 by default). The hot key can be customized by calling the SetHotKey
static member function. When the hot key is pressed, the CheckSpellingEdit
function is called to display the spell checking dialog box.

SPELL CHECK DIALOG BOX

The spell checking dialog box is implemented in the CDlgSpellChecker
class. This is a standard CDialog derived class based on the IDD_SPELL_CHECK
dialog resource.

The spell checking dialog is modelled after the
Microsoft Word implementation of spell checking. It is laid out the same
and functions (for the most part) the same. This dialog searches an edit
control (or rich edit control) for misspelled words and displays the
containing sentence with the misspelled word highlighted.

Suggestions returned from the core engine are
sorted using the SortMatches function to display them in order of
edit-distance.

How to use the demo:

Unzip the provided file into a directory (be
sure to extract the subdirectories).

Make sure that the USMain.dic file is in the
\Release directory.

Make sure that the USCommon.dic file is in the
\Release directory.

Execute the FPSSpellChecker.exe from the
\Release directory.

How to incorporate the spell checker into an
application:

In your application's InitInstance function,
add a call to the CFPSSpellingEditCtrl::InitSpellingEngine(NULL) static member
function. Instead of NULL, you can pass in a string containing a fully qualified
path to a spell checking engine options file.

In your application's ExitInstance function,
add a call to the CFPSSpellingEditCtrl::Terminate static member function.

Add the following files to your
project.

DlgSpellChecker.cpp

DlgSpellChecker.h

DlgSpellingEditCtrl.cpp

DlgSpellingEditCtrl.h

FPSDictionary.cpp

FPSDictionary.h

FPSSpellCheckEngine.cpp

FPSSpellCheckEngine.h

FPSSpellCheckEngineOptions.cpp

FPSSpellCheckEngineOptions.h

FPSSpellCheckerInclude.cpp

FPSSpellCheckerInclude.h

FPSSpellingEditCtrl.cpp

FPSSpellingEditCtrl.h

PrPgeSpellOptions_Common.cpp

PrPgeSpellOptions_Common.h

PrPgeSpellOptions_General.cpp

PrPgeSpellOptions_General.h

PrPgeSpellOptions_User.cpp

PrPgeSpellOptions_User.h

PrShtSpellOptions.cpp

PrShtSpellOptions.h

Copy the following resource items to your
project.

IDD_SPELL_CHECK

IDD_SPELL_OPTION_COMMON

IDD_SPELL_OPTION_GENERAL

IDD_SPELL_OPTION_USER

Include the "FPSSpellCheckerInclude.h"
file in your stdafx.h file.

#include "FPSSpellCheckerInclude.h"

Place a standard edit control on a form or
dialog resource and give it a unique control ID (e.g. ID_TEST_EDIT).

Add a member variable of type CFPSSpellingEditCtrl
to the dialog/form class (e.g. m_editTest).

In the OnInitDialog function, call the
AttachEdit member function of CFPSSpellingEditCtrl (e.g. m_editTest.AttachEdit(this,
ID_TEST_EDIT);).
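
Put together, the calls described in the steps above might look like the fragment below. It is a sketch, not code from the download: CMyApp and CMyDialog are hypothetical class names, the calling Terminate with no arguments is an assumption, and the fragment only compiles inside an MFC project that already contains the files and resources listed above.

```cpp
// Fragment for an existing MFC project. Assumes stdafx.h already includes
// "FPSSpellCheckerInclude.h" as described in the steps above.
#include "stdafx.h"
#include "FPSSpellingEditCtrl.h"

BOOL CMyApp::InitInstance()            // CMyApp: hypothetical CWinApp subclass
{
    CWinApp::InitInstance();

    // NULL as in the step above; alternatively pass a fully qualified
    // path to a spell checking engine options file.
    CFPSSpellingEditCtrl::InitSpellingEngine(NULL);

    // (the rest of the application's start-up code goes here)
    return TRUE;
}

int CMyApp::ExitInstance()
{
    // Assumed to take no arguments; the article only names the function.
    CFPSSpellingEditCtrl::Terminate();
    return CWinApp::ExitInstance();
}

BOOL CMyDialog::OnInitDialog()         // CMyDialog: hypothetical CDialog subclass
{
    CDialog::OnInitDialog();

    // m_editTest is a CFPSSpellingEditCtrl member of the dialog class;
    // ID_TEST_EDIT is the ID of the edit control on the dialog resource.
    m_editTest.AttachEdit(this, ID_TEST_EDIT);
    return TRUE;
}
```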

Known Issues

Performance is still not as good as it needs
to be.

Language support is limited to US English.

The EditDistance function needs work.

The MetaphoneEx function needs work.

There is a painting problem with the edit
control when it is scrolled while the spelling error "squiggly"
lines are displayed.

Support for the rich edit control is incomplete.

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.

There's a newer algorithm by the author of Metaphone known as the Double Metaphone. You can find it at http://www.cuj.com/archive/1806/feature.html. I don't know how it will compare to your modified Metaphone, but you should check it out.

Thanks for the URL. Actually, I have already been researching the double-metaphone algorithm and I intend to implement it into the spelling engine. Some other things I have learned, though:

1) Some commercial spell checkers also use a word-reduction algorithm (which keeps some vowels) to augment their search results. I have been looking at how to implement such a routine as well.

2) At least MS Word (and probably others) has also developed a database of human-created word-reduction and metaphone outputs. These human-created outputs are used in their dictionaries in place of the computer-created ones to provide better output. I have already gone through my USEnglish dictionary and hand-coded many words with the letter 'G' in them.

The matching engine uses numerous "English-specific" algorithms to enhance the result list. I do not know much German, so I am not sure how well the engine will map to German. It would be worth a try, though. Probably the #1 function in the whole class which would need modification for various languages is the MetaphoneEx function. I think this function could be modified to work for German.

Finish English first, but know this:
English is too easy compared to German, and especially to (eastern) European, Slavic and other languages.

Generally, one big difference is that in English there is one word form for all circumstances;
in German there are 4, and we (s) have 7 sub-kinds (cases) of an object word.
In English you say: of word, about word, with word.
We say: zo slova, o slove, so slovom.

It is similar for other kinds of words (I will not try to name them in English):
in English you have green; in mine: zeleny (he), zelena (she), zelene (it), ... (similar in German: something like gruner, grune, grunes).
In English: more green/greener (?, silly, take it as an example only); we have zelensi; most green is najzelensi.
And combinations: about green word is o zelenom slove (German: um grune wort (?!)).

I guess English is the only western language (the only ones I can talk about) where a spell checker without a grammar check makes sense.
In all other languages you would probably first reduce a word to its root(s) without prefixes and suffixes, spell-check the root(s), check whether the root(s) support the prefixes and suffixes, and then recombine.
Multiple roots occur in languages like German, which allows almost free combination of many words into one; this feature is very commonly used with up to three words (the combinatorics start getting big here).
Roots would in general not be unique (suffixes like -s, -es; prefixes like a-, an-).
The suffixes are mostly implied by grammar and account for a good part of the spelling errors.
Suffixes of different words must match (or rather, the grammatical entities they imply must).
Only grammar can decide whether a specific word is a noun, adjective or verb (with nouns capitalized in German).

So an 'English' spell checker could be used to check the roots, with some code added for the rest.

I think your project has big potential, and lots of us are willing to help you create something really big from it.
I also have a lot of word lists for different languages, so contact me if you are interested in publishing them.

OK, so the problem is not the spell checker, but the languages. As I come from Holland, I know a saying in English: Double Dutch, and so it is. Like its big brother German, Dutch has words which are male, female, multiple or no-gender. In German, you've got articles like:

Der, Des, Dem, Den, Die which all do mean: THE
and
Das, Der, Die which all do mean: IT

In Dutch it's more English-like: "De" and "Het" for IT and "De" only for THE, but if you want to make a word diminutive (to give it a smaller sound), you must use "Het" in any case, even if the word has a gender.

I think that if someone wants to add a new language, the English classes are obsolete. I think the solution is to create different classes with words in them, like:

CGenderMale, CGenderFemale, CMultiple and CNoGender. Also CNoun, and CSuffix (which can be language specific, e.g.

Get the idea? Now, if a word has only one part, only suffixes are displayed, e.g. Cool (Cooler and Coolest). For a word with two parts, both suffixes and prefixes are shown, e.g. Crazy (Crazier but also More Crazy). Finally, when a word has 3 parts or more, only prefixes are shown, e.g. Pathetic (More pathetic, Most pathetic).

Get the drill?

And there are numerous classes needed for feeding information about whether the word is Irregular or not, if it's a verb or not and so on. So I think a wide discussion is needed to get the "Perfect" Spell Checker.

Alright, this is great so far. I'd love to use this in our commercial project, and I'd love to help develop the software at no cost, provided we can use it in our commercial software. BUT it's lacking language support, as mentioned. Previous writers want some European languages, even eastern European ones, but we'd require worldwide support including Thai, Chinese etc., and preferably also some functionality for a translation dictionary (?), meaning there is an English text and you want to get suggestions for translated words in a second language.

How does this sound? Currently I think we would not want to dig into this because it's too far off at the moment.

Any support on this project is appreciated. I have been working on it for about two months in my spare time and there is still a great deal to do. I have been doing some research on non-English language support and have learned a great deal about it. I am comfortable stating that when finished it will support multiple European languages. I have not researched languages like Chinese, but I know that there would be significant requirements to make it work.

That said, I am a professional developer doing this on the side, and I would not feel good recommending this project for a commercial product at this time. The amount of time required to complete it and the probable availability of comparable existing products would probably lead me to look for an off-the-shelf solution.

The concept of suggesting words in another language has come up before. From what I have learned, it is doable, but requires very good and exhaustive dictionaries with information on word usage, sentence patterning, etc.