@codes = soundex_nara(@names);# Get the list of codes for a list of names.

# Redefine the value that soundex() will return if the input string

# contains no identifiable sounds within it.

$Text::Soundex::nocode = 'Z000';

DESCRIPTION

Soundex is a phonetic algorithm for indexing names by sound, as
pronounced in English. The goal is for names with the same
pronunciation to be encoded to the same representation so that they
can be matched despite minor differences in spelling. Soundex is the
most widely known of all phonetic algorithms and is often used
(incorrectly) as a synonym for "phonetic algorithm". Improvements to
Soundex are the basis for many modern phonetic algorithms. (Wikipedia,
2007)

This module implements the original soundex algorithm developed by
Robert Russell and Margaret Odell, patented in 1918 and 1922, as well
as a variation called "American Soundex" used for US census data, and
current maintained by the National Archives and Records Administration
(NARA).

The soundex algorithm may be recognized from Donald Knuth's
The Art of Computer Programming. The algorithm described by
Knuth is the NARA algorithm.

The value returned for strings which have no soundex encoding is
defined using $Text::Soundex::nocode
. The default value is undef,
however values such as 'Z000'
are commonly used alternatives.

For backward compatibility with older versions of this module the
$Text::Soundex::nocode
is exported into the caller's namespace as
$soundex_nocode
.

In scalar context, soundex()
returns the soundex code of its first
argument. In list context, a list is returned in which each element is the
soundex code for the corresponding argument passed to soundex()
. For
example, the following code assigns @codes the value ('M200','S320')
:

@codes = soundexqw(Mike Stok);

To use Text::Soundex
to generate codes that can be used to search one
of the publically available US Censuses, a variant of the soundex
algorithm must be used:

EXAMPLES

Donald Knuth's examples of names and the soundex codes they map to
are listed below:

Euler,Ellery-> E460

Gauss,Ghosh-> G200

Hilbert,Heilbronn-> H416

Knuth,Kant-> K530

Lloyd,Ladd-> L300

Lukasiewicz,Lissajous-> L222

so:

$code = soundex'Knuth';# $code contains 'K530'

@list = soundexqw(Lloyd Gauss);# @list contains 'L300', 'G200'

LIMITATIONS

As the soundex algorithm was originally used a long time ago in the US
it considers only the English alphabet and pronunciation. In particular,
non-ASCII characters will be ignored. The recommended method of dealing
with characters that have accents, or other unicode characters, is to use
the Text::Unidecode module available from CPAN. Either use the module
explicitly:

Since the soundex algorithm maps a large space (strings of arbitrary
length) onto a small space (single letter plus 3 digits) no inference
can be made about the similarity of two strings which end up with the
same soundex code. For example, both Hilbert
and Heilbronn
end
up with a soundex code of H416
.

MAINTAINER

This module is currently maintain by Mark Mielke (mark@mielke.cc
).

HISTORY

Version 3 is a significant update to provide support for versions of
Perl later than Perl 5.004. Specifically, the XS version of the
soundex() subroutine understands strings that are encoded using UTF-8
(unicode strings).

Version 2 of this module was a re-write by Mark Mielke (mark@mielke.cc
)
to improve the speed of the subroutines. The XS version of the soundex()
subroutine was introduced in 2.00.

Version 1 of this module was written by Mike Stok (mike@stok.co.uk
)
and was included into the Perl core library set.