Spellchecker for Xaraya or other UTF-8 XML files

This Perl script combines two other scripts and lets users spellcheck UTF-8 encoded XML files. It is designed for spellchecking Xaraya CMS translations, but you can use it for other UTF-8 XML files too. The source code is pre-configured for Xaraya files (e.g. the default XML node names).

Spellchecking Xaraya:

If you use your national Xaraya site (NN.xaraya.com) for translating the system, you spellcheck Xaraya by checking a downloaded local copy of the language pack and fixing the errors online: run this program on the local files, then correct the errors it finds in the Translations module on the NLS site.

This may sound a bit odd, but it keeps the advantages you already have on the NLS site, such as co-operation, BitKeeper push and so on. Believe me, the work is very simple and quick this way.

To help with translation, the temporary TXT files use a filename that refers to the real template file, so you can identify which page to load in the Translations module. The name of the file is always displayed at the top of the spellchecker window (assuming you use ispell).

To spellcheck a whole module, run a "find" command on your Linux system, because this "lazy" script can spellcheck only a single file at a time (the Unix philosophy...).

find modules/articles -name \*.xml -exec xml_utf8_spellcheck.pl {} \;
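The same idea can be wrapped in a small shell function when several module directories need checking in one go. This is only a sketch: the function name spellcheck_modules and the SPELLCHECK_CMD variable are made up for illustration and are not part of the script. It assumes xml_utf8_spellcheck.pl is on your PATH.

```shell
#!/bin/sh
# Hypothetical helper, not part of xml_utf8_spellcheck.pl.
# SPELLCHECK_CMD is an assumed override hook; it defaults to the
# script name used in the find command above.
SPELLCHECK_CMD="${SPELLCHECK_CMD:-xml_utf8_spellcheck.pl}"

# Run the spellchecker on every *.xml file below each given directory.
spellcheck_modules() {
    for dir in "$@"; do
        if [ ! -d "$dir" ]; then
            # Report and skip arguments that are not directories.
            echo "skipping $dir: not a directory" >&2
            continue
        fi
        # One file per invocation, as the script expects.
        find "$dir" -name '*.xml' -exec "$SPELLCHECK_CMD" {} \;
    done
}
```

For example, after loading the function into your shell, `spellcheck_modules modules/articles modules/roles` would check both directories (the second directory name is just an example). Setting SPELLCHECK_CMD=echo first gives a dry run that only prints the files that would be checked.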

Theoretically you could also save the changes back to the file directly, if you want to fix a local copy of the language pack.

Download

Original man page

(99% of it was written by the author of xml_spellcheck):

NAME

xml_utf8_spellcheck

SYNOPSIS

xml_utf8_spellcheck [options] <files>

DESCRIPTION

xml_utf8_spellcheck lets you spell check the content of an XML file. It extracts the text (the content of elements and optionally of attributes), decodes UTF-8 to Latin-1/2, calls a spell checker on it and then recreates the XML document.

OPTIONS

Note that all options can be abbreviated to the first letter. These are the original options of xml_spellcheck.pl.

--conf <configuration_file>

Gets the options from a configuration file. NOT IMPLEMENTED YET.

--spellchecker <spellchecker>

The command to use for spell checking, including any options. By default "ispell -d magyar" is used.

--backup-extension <extension>

By default the original file is saved with a ".bak" extension. This option changes the extension.

TODO

PRE-REQUISITE

SEE ALSO

XML::Twig

COPYRIGHT AND DISCLAIMER

This program is Copyright 2005 by Ferenc Veres. Original xml_spellcheck is Copyright 2003 by Michel Rodriguez.

This program is free software; you can redistribute it and/or modify it under the terms of the Perl Artistic License or the GNU General Public License as published by the Free Software Foundation either version 2 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

If you do not have a copy of the GNU General Public License write to the Free Software Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.

Existing editors for the text data in DjVu files, such as DjVuSmooth, are quite limited. So I've implemented a new editor in JavaScript that allows editing both the structure of the text (paragraphs, lines, words, ...) and the coordinates of the text boxes by simply dragging with the mouse. Features like create, delete and merge are also available.