AutoReplace

I was trying to think of some useful hotstrings to make an auto-replace script, when I remembered that Microsoft Word 2003 has a feature called AutoCorrect (I'm not sure if it's in all versions). So I made a little script to take all the entries from MS Word and create my auto-replace script for me.

Note: I wondered why you used the edit controls instead of scanning the ListView below directly, but it seems that Microsoft thought it was too simple to use standard controls (RichEdit for a one-line Edit control?!) and their "bosa_sdm_Microsoft xxx" control cannot be controlled from AHK.

OK, since I see an option (deactivated in my version) for rich-text corrections (hotstrings), perhaps there was a need for more sophisticated controls, but they should have made their bosa control respond to standard ListView messages...

Sawfoot sent me 900 additional misspellings, mostly based on MS Word's AutoCorrect as described above. I've included these new misspellings in the main AutoCorrect script at http://www.autohotke... ... orrect.ahk

Description: This script corrects about 4700 common English misspellings on-the-fly via hotstrings. It also includes a Win+H hotkey to make it easy to add more misspellings.

Yeesh, looks like you and I have duplicated a lot of work: I, too, got the wiki list, but got the autocorrect list from OpenOffice instead of MSOffice, and a few other sources, like a press release by texttrust, and my own typos.

Here are a few I use in mine, as separate code blocks. The first one corrects the pernicious "-ign" instead of "-ing" ending.

Next, all the word stems and tails. I expected these to slow down my machine a lot, but have not noticed any significant increase in CPU usage when typing.

This has a double advantage: it significantly reduces the size of the wordlist, and it also means that it catches words that I would not normally have bothered to put in the list, like "abbreviatedly".

Some of them are also doing stuff that just wouldn't be possible to fix any other way.

Finally: handy sorting tip. I use Unix's sort to sort my wordlist by the correctly spelled word, then the incorrectly spelled one, and strip duplicates. To do this I use the command:

sort -u -t: -k5 < infile > outfile
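For anyone without a Unix sort to hand, roughly the same tidy-up can be sketched in Python. This is only a sketch: it assumes one `::typo::correction` hotstring per line (split on colons, the correction is field 5, which is what `-k5` picks out), and the name `sort_hotstrings` is made up for this example. It de-duplicates on the whole entry rather than on the sort key. POSIX `sort -u` with a key treats lines with equal keys as duplicates, so it is worth checking that the command above has not kept just one typo per correct word.

```python
import re

# Matches ':options:typo::correction'; the options part may be empty ('::typo::correction').
HOTSTRING = re.compile(r"^:([^:]*):([^:]+)::(.+)$")

def sort_hotstrings(lines):
    """Return unique hotstring lines sorted by correction, then by typo.

    Lines that are not shaped like a hotstring (comments, blanks) are dropped.
    """
    entries = set()
    for line in lines:
        match = HOTSTRING.match(line.strip())
        if match:
            entries.add(match.groups())  # (options, typo, correction)
    ordered = sorted(entries, key=lambda e: (e[2].lower(), e[1].lower(), e))
    return [f":{opts}:{typo}::{fix}" for opts, typo, fix in ordered]

if __name__ == "__main__":
    sample = ["::teh::the", "::hte::the", "::teh::the", "::abotu::about"]
    for entry in sort_hotstrings(sample):
        print(entry)
```

Both typos for "the" survive here, and only the exact repeat of `::teh::the` is removed.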

Please do feel free to use any of the lists in this post: as mere lists, they are not subject to copyright. If there is sufficient creativity in any part of the code to be copyrightable, I release it to the public domain, and you may do as you wish with it, without attribution or limits.

Sorry for the delayed reply. Thanks for posting your lists and additions. If you or anyone else would like to take charge of maintaining the master spelling-correction script, that would be helpful. However, it seems best to omit the really obscure/personal misspellings to prevent the size from ballooning too much. This is because the size affects performance: hotstrings weren't really designed to support thousands of abbreviations (but apparently it performs well enough, especially with Moore's Law as the wind in our sails).

Happy to maintain it, if you tell me what needs doing and where I should stick the uploaded version. In the absence of guidance, I'll create a wiki page similar to http://www.autohotke... ... er_Control and upload to ~anonymous in the same way as on that page.

Things I'd probably do, unless told not to:

1) Personally, other than sections which need to be separated because they perform different tasks that users may want to disable en masse (the -ing fix, diacritics) I'd be inclined to merge the lists from various sources in order to prevent duplicates, and because the source isn't relevant to the user. Sources should probably be credited in the comments at the top of the script instead. This would prevent failure to credit sources for words that had been removed from their section due to duplication, and would mean that a single alphabetised list would be considerably easier to search.

Most importantly, it'll be less work for me. So let me know if you definitely want the sections kept separate.

2) I hate global hotkeys, especially where they are hardcoded. I know, I know, a heretical idea for an AHK user. But, for example, I have three applications that try to make ctrl-alt-I do stuff even if they are not the focused window, and I hate the clashing mess it creates.

So to globally grab Win+H for a function that some people will not even use seems intrusive, and will break any program that wants to use that hotkey. So I'd be inclined to instead have "add an entry" as a right-click context menu item, with an optional user-configurable hotkey.

3) I would probably retain all typos from all current lists. While I agree that "it seems best to omit the really obscure/personal misspellings to prevent the size from ballooning too much", figuring out which words those are would be awfully tricky and time-consuming, unless there's a simple way that I'm missing. I have tended to only add terms where they're regular typos, where they're included in "most frequently misspelled" lists (eg the TextTrust press release), or where they're included in "autocorrect" lists. This pretty much guarantees that there'll be a fair number of people who'll make each typo.

The only two exceptions are the Wikipedia list (which is still made up of fairly common typos that have been made by Wikipedia contributors, but many are not critically frequent), and the diacritics list. My reasoning for including the latter is that creating diacritics in most keyboard layouts is complex enough that most people will not know how to create, say, an umlaut. So, while it contains uncommon words, the diacritics section is valuable because it provides correct spellings for even the rare words which would be otherwise impossible for people to type.

In the extreme case, it could keep a count of which typos people make and how often, and auto-upload that anonymously every month or so. That'd show which corrections are not being used. But the CPU taken by the monitoring would considerably outweigh any saving made.

4) An application exclusion list is important, I feel: some applications should probably be excluded, such as those with their own autoreplace systems, programming IDEs, etc. The diacritics section probably needs a longer list, and should have a context-menu option to disable it completely.

5) Either maintain two lists (commonwealth and US spellings), or have a toggle to switch between them. Though where possible, most corrections should probably just avoid the distinction, eg correcting "collour" to "colour", and "collor" to "color", regardless of the user's locale preferences.

Sadly, I am largely unaffected by Moore's law, so as a test I took your list and appended my code blocks to the end. According to the task manager, on my 500MHz machine, cpu usage hit 4% if I typed frantically. If I just dragged my fingers across the keyboard, I got it to hit 9%.

On my 1.2GHz machine, I could not get higher than 0% CPU by typing, or 4% by sliding my hands up and down the keyboard frantically.

These seem reasonable figures, comparable to Notepad's own CPU usage from the same input. They would be very slightly improved by removing duplicates.

Incidentally I found a (probably-known) bug in the forum's "copy" link while doing that test: if you click "copy" to copy the diacritics block, all non-ascii characters are replaced by "?" but if you manually highlight and copy, this does not happen.

Happy to maintain it, if you tell me what needs doing and where I should stick the uploaded version. In the absence of guidance, I'll create a wiki page similar to www.autohotkey.net to provide best performance). In other words, I prefer no intermediate page unless you think some kind of introduction or documentation becomes necessary.

1) Personally, other than sections which need to be separated because they perform different tasks that users may want to disable en masse (the -ing fix, diacritics) I'd be inclined to merge the lists from various sources in order to prevent duplicates, and because the source isn't relevant to the user. Sources should probably be credited in the comments at the top of the script instead. This would prevent failure to credit sources for words that had been removed from their section due to duplication, and would mean that a single alphabetised list would be considerably easier to search.

That sounds great.
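The merge proposed in point 1 could look something like the following. It is a rough sketch under assumptions: each source is already parsed into a typo-to-correction mapping, the source names are placeholders, and `build_script` is a hypothetical helper, not part of the existing script. It credits sources once in a comment at the top, emits a single alphabetised list, and reports typos that two sources correct differently so they can be resolved by hand.

```python
def build_script(sources):
    """Merge wordlists into one alphabetised hotstring list.

    sources: dict mapping a source name to a dict of typo -> correction.
    Returns (script_lines, conflicts). The first source to define a typo wins;
    each conflict is (typo, kept_correction, rejected_correction, rejected_source).
    """
    merged = {}
    conflicts = []
    for name, wordlist in sources.items():
        for typo, correction in wordlist.items():
            if typo not in merged:
                merged[typo] = correction
            elif merged[typo] != correction:
                conflicts.append((typo, merged[typo], correction, name))
    # AutoHotkey comments start with a semicolon.
    header = ["; Sources: " + ", ".join(sources)]
    body = [f"::{typo}::{fix}" for typo, fix in sorted(merged.items())]
    return header + body, conflicts

if __name__ == "__main__":
    lines, conflicts = build_script({
        "MS Word": {"teh": "the", "adn": "and"},
        "Wikipedia": {"teh": "the", "abotu": "about", "adn": "an"},
    })
    print("\n".join(lines))
    print("Needs a manual decision:", conflicts)
```

Exact duplicates across sources collapse silently, which is the main saving. Only genuine disagreements are surfaced.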

2) ...So to globally grab Win+H for a function that some people will not even use seems intrusive, and will break any program that wants to use that hotkey. So I'd be inclined to instead have "add an entry" as a right-click context menu item, with an optional user-configurable hotkey.

I believe the hotkey was originally Jim Biancolo's idea. From a benefit/cost point-of-view, having some kind of global hotkey enabled by default probably does more good than harm, especially when you take into account existing users (i.e. backward compatibility). On the other hand, existing users would have to explicitly upgrade to the new script you create, so maybe disabling the hotkey by default would be acceptable.

3) I would probably retain all typos from all current lists. While I agree that "it seems best to omit the really obscure/personal misspellings to prevent the size from ballooning too much", figuring out which words those are would be awfully tricky and time-consuming, unless there's a simple way that I'm missing. I have tended to only add terms where they're regular typos, where they're included in "most frequently misspelled" lists (eg the TextTrust press release), or where they're included in "autocorrect" lists. This pretty much guarantees that there'll be a fair number of people who'll make each typo.

That sounds reasonable. I just wanted to avoid having more than 30% of the list made up of off-the-wall misspellings that most people would frown upon if they actually saw them.

...the diacritics list. My reasoning for including the latter is that creating diacritics in most keyboard layouts is complex enough that most people will not know how to create, say, an umlaut.

I'm wary of including that due to possible interference with programming languages such as AutoHotkey, which uses backtick as an escape symbol. However, I'm not sure exactly what you have in mind, so maybe there's no cause for concern.

In the extreme case, it could keep a count of which typos people make and how often, and auto-upload that anonymously every month or so. That'd show which corrections are not being used. But the CPU taken by the monitoring would considerably outweigh any saving made.

That would be an interesting project; but I agree that it's too costly.

4) An application exclusion list is important, I feel: some applications should probably be excluded, such as those with their own autoreplace systems, programming IDEs, etc. The diacritics section probably needs a longer list, and should have a context-menu option to disable it completely.

If it's verified that the autocorrect script actually interferes with a particular app, I agree that it would be nice to exclude it. Perhaps GroupAdd and #IfWinNotActive ahk_group will be sufficient.

5) Either maintain two lists (commonwealth and US spellings), or have a toggle to switch between them. Though where possible, most corrections should probably just avoid the distinction, eg correcting "collour" to "colour", and "collor" to "color", regardless of the user's locale preferences.

Although that's pretty ambitious, I'm sure many users would welcome it if you have the time.

Sadly, I am largely unaffected by Moore's law, so as a test I took your list and appended my code blocks to the end. According to the task manager, on my 500MHz machine, cpu usage hit 4% if I typed frantically. If I just dragged my fingers across the keyboard, I got it to hit 9%.

Ah, it's good to know the performance is acceptable even on older hardware. Thanks for the test.

Incidentally I found a (probably-known) bug in the forum's "copy" link while doing that test: if you click "copy" to copy the diacritics block, all non-ascii characters are replaced by "?" but if you manually highlight and copy, this does not happen.

I just thought having a wiki page as well would be nice. Though thinking about it, I can't think of much to say about it, so maybe I'll save myself the trouble :)

existing users would have to explicitly upgrade to the new script you create,

That's a good point, and could be an issue for the future. People will have their own wordlists and will find it irritating to repeatedly have to cut-n-paste their wordlists into the updates. Should personal wordlists be a separate #included file?

Also, people will always disagree with some of the autocorrections, and will remove or modify them. But when they upgrade, going through the new list to try to remember which ones they removed or changed, to do it again, would be really irritating. Perhaps I should also provide each update as a diff to the previous version, for those who want to upgrade existing modified lists.
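As a rough idea of what such a per-release diff might involve, here is a sketch using Python's standard `difflib`. The file names are placeholders and `wordlist_diff` is a made-up helper. It assumes both versions are available as lists of lines.

```python
import difflib

def wordlist_diff(old_lines, new_lines,
                  old_name="AutoCorrect_old.ahk", new_name="AutoCorrect_new.ahk"):
    """Return a unified diff (as a list of lines) between two versions of the list."""
    return list(difflib.unified_diff(
        old_lines, new_lines,
        fromfile=old_name, tofile=new_name,
        lineterm="",  # inputs have no trailing newlines, so don't add any
    ))

if __name__ == "__main__":
    old = ["::teh::the", "::adn::and"]
    new = ["::teh::the", "::abotu::about", "::adn::and"]
    print("\n".join(wordlist_diff(old, new)))
```

Someone with a modified copy could then read the `+` and `-` lines and apply only the changes they agree with, instead of redoing their edits on a fresh download.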

maybe disabling the hotkey by default would be acceptable.

I'm not fussed either way: I'll have it on by default, but will try to provide the option to turn it off or remap it, and to access the same functionality through right-click for when they can't remember the hotkey :)

I'm wary of including that due to possible interference with programming languages such as AutoHotkey, which uses backtick as an escape symbol. However, I'm not sure exactly what you have in mind, so maybe there's no cause for concern.

Possibly not - by "the diacritics list" I meant the "Next is all (well, a good portion of) the accented words in English" block from my post above.

I'm inclined not to include the zodiac signs block just above that: including that opens the door to titlecasing all sorts of other proper nouns, like people's names, places, weekdays, months, SI units and so on. And that's a project in itself. It's also the sort of thing "that most people would frown upon if they actually saw them."

That would be an interesting project; but I agree that it's too costly.

I may be wrong about the CPU cost though, since the monitoring (incrementing a counter) would only happen when a hotstring triggered - it would not be something that ran during normal use. I still think it's too costly in terms of user privacy (and developer time, and lines of code, etc), though.
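To make the cost argument concrete, here is a small sketch of the bookkeeping, leaving the upload part out entirely. The class and method names are invented for illustration. The only work per correction is one counter increment, and nothing runs while no hotstring fires.

```python
from collections import Counter

class CorrectionStats:
    """Counts how often each known typo is actually corrected."""

    def __init__(self, typos):
        self.known = set(typos)
        self.hits = Counter()

    def record(self, typo):
        """Call once each time a correction fires; this is the only per-use cost."""
        self.hits[typo] += 1

    def unused(self):
        """Typos that have never triggered, i.e. candidates for removal."""
        return sorted(self.known - set(self.hits))

if __name__ == "__main__":
    stats = CorrectionStats(["teh", "adn", "abotu"])
    for typed in ["teh", "teh", "adn"]:
        stats.record(typed)
    print(stats.hits.most_common())
    print("never used:", stats.unused())
```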

If it's verified that the autocorrect script actually interferes with a particular app, I agree that it would be nice to exclude it. Perhaps GroupAdd and #IfWinNotActive ahk_group will be sufficient.

Yup, that's what I was thinking. I was also optimistically imagining something where people could say "exclude THIS window" with a hotkey or something.

Although that's pretty ambitious, I'm sure many users would welcome it if you have the time.

I have mostly maintained my list that way, but made some concessions to the fact that I live in the UK. I can just search through for "ou" and other common commonwealth forms and figure out if they need to be in the "specialist" section. Finding the americanisms will be harder. But I guess it's a "living document", so I can fix 'em as I find 'em :)
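The search for "ou" could be automated along these lines. This is only a sketch: `flag_for_review` is a hypothetical helper, the wordlist is assumed to be a typo-to-correction mapping, and any marker beyond "ou" is the caller's own guess at a commonwealth form. It only produces candidates for manual review, since a plain substring match also catches words like "about".

```python
def flag_for_review(wordlist, markers=("ou",)):
    """Return (typo, correction) pairs whose correction contains any marker.

    These are candidates for the locale-specific section, not a final verdict.
    """
    return sorted(
        (typo, correction)
        for typo, correction in wordlist.items()
        if any(marker in correction.lower() for marker in markers)
    )

if __name__ == "__main__":
    wordlist = {"collour": "colour", "collor": "color", "abotu": "about"}
    print(flag_for_review(wordlist))                    # broad: includes "about"
    print(flag_for_review(wordlist, markers=("our",)))  # narrower marker
```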