respell version 0.1

A tool to convert English text from one spelling system to another.
At present, there are spelling files for american, british, and
canadian spellings.

System requirements

Perl 5, and ispell version 3.2.06.epa1 or later. That ispell version is
an unofficial release made to incorporate the new -e5 expansion option:
the code will be merged back into the main ispell tree when the
maintainer has time.

How it works

Simply having a lookup table from one spelling convention to another is
not enough. Often there are two words which, because of differences
in meaning or in pronunciation, are spelt differently in one system
but the same in another. This is most noticeable when moving from
british to american spelling: for example cheque/check -> check,
curb/kerb -> curb, and many others. But there are examples in the other direction
too, for example vice/vise -> vice, and analyses/analyzes -> analyses
(where the difference is in pronunciation as well as meaning).

Instead we create one lookup table for each spelling convention,
mapping each 'word' to one or more 'spellings'. A 'word' is an uppercase key
like ANALYZE or CHEQUE, and two words are separate if there is any
spelling convention which assigns them different spellings. Then to
convert from one spelling convention to another, we do a reverse
lookup in the source spelling's table from each character string to
its corresponding 'words' (which may be more than one), and then in
the target's table we find the most common spelling for each word.
When a string corresponds to more than one word, and the most common
target spellings of those words differ, the user must be asked what
the intended meaning (or pronunciation) was.

For example, suppose we wanted to translate 'prophesy' from american
to british. Looking up in american reveals two words which could use
that spelling:

PROPHECY: prophesy
PROPHESY: prophesy

Now looking up the two words (PROPHECY and PROPHESY) in british gives:

PROPHECY: prophecy
PROPHESY: prophesy

So the two possible choices are 'prophecy' and 'prophesy'. It's the
user's job to pick between them.
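The lookup described above can be sketched in a few lines of Python.
This is not the code in Spelling.pm. The dictionaries below are a
hypothetical in-memory form of the spelling files, with only the
example entries and each word's spellings listed most common first.
The helper names reverse_table and choices are illustrative.

```python
from collections import defaultdict

# Hypothetical in-memory spelling tables: 'word' key -> spellings,
# most common first. Only a few example entries are included.
AMERICAN = {
    "PROPHECY": ["prophesy"],
    "PROPHESY": ["prophesy"],
    "CHECK": ["check"],
    "CHEQUE": ["check"],
}
BRITISH = {
    "PROPHECY": ["prophecy"],
    "PROPHESY": ["prophesy"],
    "CHECK": ["check"],
    "CHEQUE": ["cheque"],
}


def reverse_table(table):
    """Map each spelling string to the set of words that may be spelt that way."""
    rev = defaultdict(set)
    for word, spellings in table.items():
        for spelling in spellings:
            rev[spelling].add(word)
    return rev


def choices(string, source_rev, target):
    """Return the distinct target spellings that `string` could stand for."""
    words = source_rev.get(string, set())
    # Take the most common target spelling of every candidate word; a
    # result with more than one entry means the user has to choose.
    return sorted({target[w][0] for w in words if w in target})


AMERICAN_REV = reverse_table(AMERICAN)
BRITISH_REV = reverse_table(BRITISH)

print(choices("prophesy", AMERICAN_REV, BRITISH))  # ['prophecy', 'prophesy']
print(choices("cheque", BRITISH_REV, AMERICAN))    # ['check']
```

An empty result means the string is not in the source table, so it
would be passed through unchanged. The sort only makes the output
deterministic; it says nothing about which choice is more common.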

In fact, the spelling files describe several 'words' on each line by
using ispell-style expansion flags. The IspellExpand.pm module runs ispell
with the new -e5 option to convert these to several words. This is
why version 3.2.06.epa1 or later of ispell is needed.
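To show what flag expansion produces, here is a toy stand-in for the
step IspellExpand.pm delegates to ispell -e5. It does not call ispell.
The line format 'KEY/FLAGS: spelling ...' and the two suffix rules
(S adds -s, D adds -d or -ed) are hypothetical simplifications, not
ispell's real affix rules or this project's exact file syntax.

```python
def _add_suffix(word, suffix):
    # Match the word's case so uppercase keys get uppercase suffixes.
    return word + (suffix.upper() if word.isupper() else suffix)


# Toy suffix rules: simplified assumptions, not a real ispell affix file.
SUFFIX_RULES = {
    "S": lambda w: _add_suffix(w, "s"),
    "D": lambda w: _add_suffix(w, "d" if w[-1] in "eE" else "ed"),
}


def expand_line(line):
    """Expand a hypothetical 'KEY/FLAGS: spelling ...' line into entries."""
    head, _, rest = line.partition(":")
    key, _, flags = head.strip().partition("/")
    spellings = rest.split()
    entries = [(key, spellings)]
    # Each flag derives one new word, applied to the key and every spelling.
    for flag in flags:
        rule = SUFFIX_RULES[flag]
        entries.append((rule(key), [rule(s) for s in spellings]))
    return entries


for key, spellings in expand_line("ANALYZE/DS: analyse"):
    print(key, spellings)
# ANALYZE ['analyse']
# ANALYZED ['analysed']
# ANALYZES ['analyses']
```

Applying each flag to the key and to the spellings together keeps the
derived words lined up, so ANALYZES ends up paired with 'analyses'.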

'Universal' spellings

It is possible to combine two or more spelling files to produce a single
spelling which can be converted to any of them without loss of
information, a kind of 'universal donor' spelling. For example, it
would keep the prophecy/prophesy distinction (from british) and also
analyses/analyzes (from american).
It turns out that canadian spelling is fairly close to being 'universal'
for English, but it needs some tweaking. The best universal spelling
comes from combining canadian, british and american in that order
(so the canadian spellings are listed first, where possible); 'make'
generates it in the file 'ucba'.
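The 'without loss of information' property can be checked directly: a
spelling is universal for a set of targets if no string in it maps to
words whose most common spellings differ in any target. The sketch
below only performs that check; it does not reproduce how 'make' builds
'ucba'. The tables and the function name ambiguous_strings are
hypothetical.

```python
from collections import defaultdict

# Hypothetical tables: 'word' key -> spellings, most common first.
AMERICAN = {
    "PROPHECY": ["prophesy"], "PROPHESY": ["prophesy"],
    "ANALYSES": ["analyses"], "ANALYZES": ["analyzes"],
}
BRITISH = {
    "PROPHECY": ["prophecy"], "PROPHESY": ["prophesy"],
    "ANALYSES": ["analyses"], "ANALYZES": ["analyses"],
}
# A candidate 'universal' table that keeps both distinctions.
UNIVERSAL = {
    "PROPHECY": ["prophecy"], "PROPHESY": ["prophesy"],
    "ANALYSES": ["analyses"], "ANALYZES": ["analyzes"],
}


def ambiguous_strings(source, targets):
    """List strings in `source` that cannot be converted to every target without asking."""
    rev = defaultdict(set)
    for word, spellings in source.items():
        for spelling in spellings:
            rev[spelling].add(word)
    bad = []
    for string, words in sorted(rev.items()):
        # Ambiguous if the candidate words disagree on the most common
        # spelling in any one target.
        if any(len({t[w][0] for w in words if w in t}) > 1 for t in targets):
            bad.append(string)
    return bad


print(ambiguous_strings(UNIVERSAL, [AMERICAN, BRITISH]))  # []
print(ambiguous_strings(BRITISH, [AMERICAN]))             # ['analyses']
print(ambiguous_strings(AMERICAN, [BRITISH]))             # ['prophesy']
```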

How to use it

Three spelling files are provided: american, british and canadian.
The Spelling.pm module can load these and build conversion tables from
them. There are also two executables:

respell

The program 'respell' is a filter. Give it two spelling files (from
and to) and it will convert text from one to the other. When more than
one output is possible, all the choices are included in the output
inside square brackets, for example [ prophecy prophesy ]. You can
disable this, and just pick the most common target spelling, with the
-f option.

Words which don't need changing, and nonword characters, are passed
through unchanged. By default, respell will only deal with lowercase
words. The -i option tells it to handle Capitalized words, and -I
handles UPPERCASE words.
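The filter behaviour described above (brackets for multiple choices,
-f, -i and -I) can be sketched as follows. This is not respell's Perl
code. The lookup dictionary is a hypothetical precomputed map from a
lowercase string to its target spellings, most common first, and the
parameters first_only, capitalized and upper are illustrative stand-ins
for -f, -i and -I. Only plain runs of ASCII letters are treated as
words; apostrophes and hyphens are not handled.

```python
import re


def render(choices, first_only):
    """Format the candidate spellings for one input word."""
    if first_only or len(choices) == 1:
        return choices[0]
    return "[ " + " ".join(choices) + " ]"


def respell_text(text, lookup, first_only=False, capitalized=False, upper=False):
    """Respell each run of letters in `text` using `lookup`.

    `lookup` maps a lowercase string to its target spellings, most common
    first. Everything else in `text` is copied through unchanged.
    """
    def replace(match):
        word = match.group(0)
        if word.islower():
            key, restore = word, str
        elif capitalized and word[0].isupper() and word[1:].islower():
            key, restore = word.lower(), str.capitalize
        elif upper and word.isupper():
            key, restore = word.lower(), str.upper
        else:
            return word  # a case this run was not asked to handle
        found = lookup.get(key)
        if not found:
            return word  # unknown word: pass it through
        return render([restore(c) for c in found], first_only)

    return re.sub(r"[A-Za-z]+", replace, text)


LOOKUP = {"prophesy": ["prophecy", "prophesy"], "color": ["colour"]}
print(respell_text("I prophesy a new color.", LOOKUP))
# I [ prophecy prophesy ] a new colour.
```

With capitalized=True and upper=True, 'Color COLOR color' becomes
'Colour COLOUR colour'; by default only the lowercase word changes.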

Finally, the -q flag suppresses most of the chatter.

respell.cgi

A slightly more sophisticated interface for respelling documents. For
speed, it doesn't use the Spelling module but reads prebuilt lookup
files instead. You can build these files with 'make'.

You need to install respell.cgi on your web server together with the
data files. There may be a live demo at the website for this project
(see below).

Download

If you want anonymous CVS access, ask and I might be able to arrange it.

Demo

Sorry, since the move to a new web server the live demo no longer
works. I hope to have it back up soon.

Installing

Currently, there is no 'make install' mechanism. You can either run the
programs from the directory where they were unpacked, or copy the
executables and .pm files somewhere suitable.

'make' will build some data files needed for respell.cgi. 'make test'
will check that the conversion tables are as expected. There are
corresponding 'make full' and 'make test_full' targets for an
exhaustive set of files converting every possible spelling to every
other.

Future plans

Better packaging and documentation, including manual pages.

Convert Unix message catalogues - that was the original purpose of
writing these tools but I haven't had time to do it yet. Ideally
the message file would be written with 'universal' spelling and
variants for en_US, en_GB and en_CA generated automatically.

It would be nice to write the spelling files in a more compact format,
as a set of 'policies' - so that every -ize word doesn't need to be
listed separately, for example. Or to give one spelling as a delta
against another.
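One way the 'policy plus delta' idea might look: derive a target table
from a base table by applying a rule, and list explicitly only the
words the rule gets wrong or does not cover. This is a sketch of the
proposal, not an existing feature. The -ize to -ise rule assumes a
target that prefers -ise, and the names derive, IZE_ENDING and DELTA
are hypothetical.

```python
import re

# Hypothetical policy: an -ize family ending becomes -ise.
IZE_ENDING = re.compile(r"iz(e|ed|es|ing|ation|ations)$")


def derive(base, delta):
    """Build a spelling table from `base` using one policy plus explicit overrides.

    Words listed in `delta` keep their listed spellings; every other
    word has the -ize policy applied to each of its spellings.
    """
    derived = {}
    for word, spellings in base.items():
        if word in delta:
            derived[word] = delta[word]
        else:
            derived[word] = [IZE_ENDING.sub(r"is\1", s) for s in spellings]
    return derived


AMERICAN = {
    "REALIZE": ["realize"],
    "REALIZATION": ["realization"],
    "SIZE": ["size"],
    "COLOR": ["color"],
}
# The delta lists only what the policy gets wrong or does not cover.
DELTA = {"SIZE": ["size"], "COLOR": ["colour"]}

print(derive(AMERICAN, DELTA))
```

Without the SIZE override the rule would produce 'sise', which is why
a policy alone is not enough and a delta is still needed.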

Related projects

This tool doesn't handle the various spelling reforms proposed for
English, which are much more wide-ranging than the small differences
between US and UK spelling. The semi-free program BTRSPL converts
between standard English spelling and one of three spelling reform
proposals.

The varcon table has the same purpose as this project, but it's
inadequate because it doesn't handle one spelling mapping to two or
more. It should, however, be possible to generate a new varcon list
from this project's data files.