Running as a Windows application

Detagger can be invoked as a normal Windows application. On start-up
you will be presented with the main window. This consists of a menu bar
across the top of the window, and some data entry fields in the main
body of the window.

Main Dialog

To convert your files, take the following steps :-

Select the files to be converted using the input selection options.
You can type in a wildcard and - if you want - ask the software
to search in sub-folders.

Select the conversion type you want. You can convert to text, or
selectively remove markup.

Choose the type of output you want. You can output the results to
files, or to the clipboard. If you've selected multiple files
(e.g. by supplying a wildcard) you can choose to have all the results
concatenated into one big results file.

Choose the Output Directory you want. You can output to the same
directory as the input files, or to a directory of your choosing. If you
are converting files in sub-folders, then you can choose to replicate
the folder structure under the output directory.

Alternatively, if you've previously saved some conversion settings to
a "Policy File", then you can re-load it. If you've loaded the policy
file before, it will be accessible via the drop-down list.
(For a fuller description see policy files)

Once you've made all your selections, press the Convert File(s) button
at the bottom. The status window will briefly appear whilst the
conversion proceeds, displaying messages, and a results viewer may be
launched to display the results. (You can control this behaviour using
the settings menu.)

If you've fine tuned your conversion and want to save the options
for use next time, save them to a policy file by using the
Save Policy File option on the Conversion Options menu.

Input selection

Normally you simply need to select the input file(s) using the Browse
button, and the rest of the fields will be set to default values. If you
want to use wildcards, type the file specification in the file(s) box
directly.

If you check the Search Sub-folders option, the program will look for
matching files in sub-directories.
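
The effect of a wildcard combined with the Search Sub-folders option can
be sketched in a few lines of Python. This is only an illustration of the
idea, not Detagger's own code; the function name find_input_files and its
parameters are made up for the example.

```python
from pathlib import Path

def find_input_files(folder, pattern, search_subfolders=False):
    """Return the files under `folder` that match the wildcard `pattern`."""
    root = Path(folder)
    # rglob() descends into sub-directories; glob() looks in `folder` only.
    matches = root.rglob(pattern) if search_subfolders else root.glob(pattern)
    # Keep files only (a directory name could also match the wildcard).
    return sorted(p for p in matches if p.is_file())
```

Calling find_input_files("c:/input", "in*.html") would list only the
matching files in that one folder, whereas passing search_subfolders=True
would also pick up matching files in any folder beneath it.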

Conversion type selection

The program supports a number of conversion types. You should select
the one that you want to perform.

Convert to text

The output will be a plain text version of the
input file. You can fine tune the conversion to
text by using the text conversion policies.

Selectively remove markup

The output will be HTML, with some of the tags
selectively removed. You can control which by
using the Markup removal policies.

If you're removing markup (as opposed to creating text files), then you
will typically be creating HTML files from HTML files. This means you
run the risk of asking the program to overwrite the input file with the
results. If this is what you want, you will need to check the May overwrite
existing files option.

Output selection

When you select your input, the output will default to being a file in the same
directory. However, there are a number of options available to you.

You can select the output type. The default is to file(s), but you can also
output to the clipboard. When converting wildcards the clipboard option is
only useful if you have a clipboard manager in place, otherwise only the
last file will be held there.

When converting to file, you can select the output filename. This is
not a sensible choice when using wildcards to select the input file(s).
If you don't change this field the output file will match the input
filename, but may have a different extension.

When converting to file, you can select the output directory. If you
are converting wildcard files and including sub-directories, this option
will by default put all the files in the one directory. To keep the
files in a parallel directory structure instead, see Output Directory
below.

Output types

The program supports a number of output types that determine where
the output should go. You should select the one that you want to
perform.

Output text to file
The output will be a file. Depending on the type of conversion
performed, the file will either be a HTML file or a plain text file.

Output to the clipboard
You can use the Output Type to select the option of placing the generated
output onto the Windows clipboard, ready for use in other Windows applications.

Using Detagger in this way can be a very powerful technique which
allows you to merge converted text with more traditionally authored
content.

This approach becomes even more powerful if you use a Clipboard extender like
ClipMate (see www.thornsoft.com) to remember and organise everything copied to
the clipboard. You could convert a few files, and then use ClipMate to
recall the pasted text at your leisure for insertion into your other files.

Concatenate results into one file
When you select an output type of Concatenate results into one file,
all your results will be added together in one big results file.

Converting to text
When you're converting to text, the output will be one big text
file, with the results from converting each input file added
after each other.

Removing markup
When selectively removing markup the output will be one big HTML
file. This file will inherit the <HEAD> and <BODY> tags (which
will include any <TITLE>) from the first input file. All other
<HEAD> tags will be discarded. Because of this many of the result's
properties (e.g. style sheets and <META> tags) will be whatever was
in the first file.

The validity of the resulting HTML file will depend entirely on how well
the markup of the multiple files goes together. It's a classic case of
garbage in, garbage out.

File separators
In the output file a separator can be added between the results of one
input file and the next. These are defined using the text fragment
feature.

When creating a text file, the separator fragment is called TEXT_SEPARATOR,
and when creating a HTML file it's HTML_SEPARATOR.

In the registered version both separators are absent by default unless
you choose to define these fragments. In the 30-day trial, both separators
contain short messages.
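
The way a separator sits between the results of consecutive input files
can be sketched as follows. This is a minimal Python illustration, not
the program's own code; the function name concatenate_results is
hypothetical, and the empty default separator mirrors the registered
version's behaviour described above.

```python
def concatenate_results(results, separator=""):
    """Join the per-file results into one string.

    `separator` stands in for the TEXT_SEPARATOR or HTML_SEPARATOR
    fragment. It is placed between one result and the next, never
    before the first result or after the last.
    """
    return separator.join(results)
```

For example, concatenate_results(["first\n", "second\n"], "-----\n")
yields the two results with a single line of dashes between them.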

Output Directory

When outputting to file, there are a number of options for choosing
where the output files will go.

The default is to output the files to the same directory as the input files.
When the conversion type is set to selectively remove
markup this has the potential to overwrite the original files. For this reason
you have to select the 'May overwrite the input files' option.

Alternatively you can choose to output files to a different directory. In
this case the program will overwrite any files already there because these
are not the input files.

Finally, if you have selected the 'Search sub-folders' option, you can elect
to replicate the input directory structure under the output directory, rather
than have all the files found placed in just the one directory.

Menu Bar

The main menu bar appears at the top of the main screen. It has the
following options:-

Configuration Files menu

The Config File Location menu allows you to specify the location of
additional configuration files. The locations you select will
be stored in your policy file, so in a sense these files act as
extensions of the policy file, but by being stored in separate files
the same configuration files can be shared by multiple policy
files.

Selecting the Text Commands File

Selecting the Text fragments File

Load policy file

Detagger has many program options known as "policies" to help you
tailor the conversion process. These policies can be saved in a policy
file for later re-use in future conversions. This dialog screen is
primarily intended to allow you to load a previously saved policy file.

Load policies from an existing policy file
Save policies to a policy file for later re-use
Reset all policies to their default values

Save Policy File

This window is displayed whenever you wish to save your policies to
a file, usually for use in later conversions.

To save the file, simply select the policy file name, usually
with a .pol extension.

This window contains a radio button with two options:

Save only those policies that have changed

If this option is selected, then only those policies that have
been loaded from an existing file and/or been edited during the
current session will be saved.

This is the recommended option, as it will exclude all policies
that have been set up correctly automatically.

Save all policies

If this option is selected, then all policies are written to file.
This is a good way of documenting the policies used, but is usually
too restrictive to be loaded as input into conversions of other files.

The saved file is a text file designed so that it may be manually
edited and reloaded. If you do so, take care not to change the key
phrases at the start of each line.

Note: If you find that conversions that used to work "stop working" it's
possibly because you're using a complete policy file. If you find this happens,
try creating a new policy file from scratch, or manually removing options
from your existing policy file.

Resetting policies to default values

This option will reset all conversion options ("policies") to their
default values. If a policy file has been loaded, it will be unloaded.

Settings menu

The program settings menu allows you to customise the way Detagger executes
each time it is invoked.

In addition to the above sub-menus, this menu allows you to toggle the
following options, indicated by tick marks.

Show Tool Tips

If checked tool tips will be available to offer
help on the controls on each dialog screen

Show Status Dialog

If checked the Status Window will show
during the conversion, showing messages describing
how the conversion is going.

Automatically view results

If checked a file viewer will be launched
after the conversion to view the results. This
will either be a HTML browser or a text editor
depending on the type of conversion being done.
See results viewers settings

Remember settings on exit

If checked the program will remember
the selected files and conversions details
for next time

If selected the 'Tip of the Day' screen is shown
and you can choose whether or not this should
also be displayed on startup.

Diagnostic settings

These options allow you to set the level of error reporting, or to
suppress messages of various types from being displayed during
conversion.

The types of messages include :-

INFO messages

Informational messages. These convey information
telling you what has been done and why.

WARNING messages

Warning messages. These tell you that something
you have requested has not been done, or something
has been done which may not be correct. It's possible
you may be able to take corrective action.

TAG ERROR messages

Tagging errors. Only occur when you use the
preprocessor in-line tags and directives.

PROGRAM ERROR messages

Program errors. The program has detected it
has done something wrong. The conversion may still
be successful, but there is nothing you can do about
such messages except report them to the program's
author at info<at>jafsoft.com

Drag and drop settings

These options specify the behaviour of Detagger when invoked via drag and drop (i.e. by
dropping a file icon on the program's icon).

Show the status screen

The status dialog, showing messages
reporting how the conversion is going
should be shown.

View results in browser once complete

The selected viewer (browser) for the
results files should be invoked on the
last file converted once conversion is
complete

Start program after conversion

The program should be launched in Windows
mode once the conversion is completed.

Results viewers settings

This identifies the viewers to be used whenever Detagger launches an
application to view a results or documentation file. Viewers may be
required for both HTML (when detagging) and TEXT (when converting to text)
files.

Automatically view results files

You can elect to have results viewed automatically after
each conversion. This will normally result in the named
application being launched to view the last file converted.

Command used to view HTML files

For HTML, you can elect to use Dynamic Data Exchange (DDE) to
have the results displayed in a currently active browser. This can
be quicker and more efficient than launching a new instance of
the browser each time. You should ensure your DDE browser
matches the program named as the default browser so that if
not already active, the program can start a fresh instance.

When DDE is used the results will vary from browser to browser.
IE for example will come to the front, whereas Netscape will not,
and if it is minimised you won't see the results until you
maximise the browser again.

NOTE: On some systems problems can occur with DDE that will
cause the program to hang whenever it attempts to display
a HTML file. When this happens the program will need to
be stopped via the task manager. The next time the program
runs it will detect that this problem has occurred and
disable the use of DDE.

Add "file://localhost/" prefix

For HTML files viewed from your local hard drive the prefix
"file://localhost/" should be used in place of the "http://" used
for Internet access.

Unfortunately some browsers (take a bow IE 3.0) do not support this,
so the addition of this prefix may be disabled if you're using
such a browser.
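
As an illustration of what the prefix does, the Python sketch below
turns a Windows path into a "file://localhost/" URL. It is not taken
from Detagger; the function name local_file_url is hypothetical, and
returning the bare path when the prefix is disabled is an assumption
made for the example.

```python
from pathlib import PureWindowsPath
from urllib.parse import quote

def local_file_url(windows_path, add_localhost_prefix=True):
    """Build the address handed to the browser for a local HTML file."""
    if not add_localhost_prefix:
        # Assumption: with the prefix disabled the plain path is passed on.
        return windows_path
    # Turn back-slashes into forward slashes: c:\out\a.html -> c:/out/a.html
    forward = PureWindowsPath(windows_path).as_posix()
    # Percent-encode characters such as spaces, keeping "/" and the drive colon.
    return "file://localhost/" + quote(forward, safe="/:")
```

With this sketch the file c:\my files\output.html would be addressed as
file://localhost/c:/my%20files/output.html.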

Command used to view TEXT files

For TEXT files, DDE is not currently available, so you simply provide
the command to view TEXT files (usually just a text editor or NotePad).

Use of policy file settings

Using a default policy file

This determines which policy file, if any, is to
be used by default when the program is first invoked. The actual policy
file used can, of course, be changed via the policy dialogue.

The default policy file will also be used if the program is invoked via
drag'n'drop. This avoids the need for creating batch files with
the policy file name on the command line.

Always reload policy file during conversion

This specifies that the current policy file should be reloaded every
time the conversion is done. If the file is large, and you are
repeatedly converting using the same policy file, then this can
slow you down. On the other hand if you are editing the policy
file by hand outside the program between conversions then you will
want this option enabled.

Tip of the day

The "Tip of the Day" screen is shown by default each time you start up
the program. This behaviour can be disabled by clearing the checkbox
on the screen.

The tip shown will change each time the screen is displayed, and in addition
you can review all the tips available by using the buttons marked "<<"
and ">>" to go to the previous and next tips. The number of each tip is
shown in case you should want to revisit it at a later date.

The Tip of the Day screen can be shown at any time by selecting the
option on the Settings menu.

At present all tips are only available in English.

Language menu

It is possible to change the user interface to the language of your choice.
Translations are provided by a number of volunteers who help to convert the
menu, dialog, and ToolTips text. The message and documentation text remains
in English for the time being. As such these don't offer a full translation,
but will hopefully be of some use to those whose first language isn't
English.

At any given time you may still find untranslated English text, especially in
the messages displayed, and in the help and documentation files, but it is hoped
that the efforts of these volunteers will make the program easier to use
for non-English speakers.

Supported languages

At present work is under way on

Spanish

Gonzalo San Martin is undertaking the Spanish translation.
Gonzalo operates a highly popular Real Madrid fan page (in
Spanish and English) which you can visit at
http://members.bigfoot.com/~G.SanMartin/
Gonzalo can be contacted at G.SanMartin<at>bigfoot.com

Italian

The Italian translation is being undertaken by
Gianluigi Pizzuto who can be contacted at gibly<at>libero.it and
has a web page at http://web.tiscalinet.it/fotone

Swedish

The Swedish translation is being undertaken by Dan Svarreby
who can be contacted at dan.svarreby<at>home.se.

French

The French translation is being undertaken by Andre Martinez.

Russian

The Russian translation is being undertaken by
Alexander (aka J-34) at j34<at>mail.ru

Dutch

The Dutch translation is being undertaken by Jurrien Dokter,
who can be contacted at info<at>axswebsolutions.nl and runs
the web site at http://www.axswebsolutions.nl/

If you would like to volunteer to help with this effort, please email
info<at>jafsoft.com (replace "<at>" by "@") or visit the web page at

View menu

View the messages window with messages
generated in the last conversion by bringing
back the Status window

Results of last conversion

View the last file converted in your
preferred browser

Results of last conversion

Once you've converted a file, you can view the results in the browser of your
choice. Detagger will detect the default browser used on your system. If
you wish you can change this through the settings menu.

You can view results in the selected browser by selecting the option on the
view menu or by pressing the View results button on the main screen.

Detagger can also be configured to automatically review results when run from
the command line or in drag'n'drop operation.

Help menu

The help menu has the following options:-

Contents

Brings up the contents page of this help file. Help
can be brought up anywhere in the program by
pressing F1

Register (online)

This option will take you to the registration page,
or - if you have already registered - to the updates
page

HTML doco (offline)

Brings up the local copy of the HTML
documentation in your preferred browser

HTML doco (online)

Brings up the Internet copy of the HTML
documentation in your preferred browser.

Other products

Links to web pages for JafSoft and their various
software products.

About

Shows the program version and other details.
Includes buttons to take you to the home page etc
on the web.

Update menu

The update menu has the following option

Check for newer versions

This option will take you to the web site,
where a check will be made to tell you if this
is still the latest version of the software.

Status window

The status window is displayed whenever a conversion is in progress. It
displays messages showing how the conversion is progressing. You can also
bring up this window by selecting the "messages from last conversion" option
on the View menu. You can stop the window appearing during conversion by
clearing the Show Status Dialog option on the Settings menu.

The messages displayed are usually just informational messages telling you
what Detagger is doing. You should review these messages and check they
don't indicate an error in conversion.

Once conversion is complete you can dismiss the window. You can automate
this by ticking the "dismiss on completion" box.

Should you wish to you can use the save to file button to save the
messages displayed to file. This can be useful for reviewing messages,
extracting URLs reported by the software (if showing URLs is enabled), or
for sending details when requesting support.

Console version

In addition to the Windows version of Detagger, there is a console
version. This can be invoked from the command line, and is thus
well suited to use in batch and automated conversions.

The console version is free to users who register the Windows version.
A trial copy of the console version can be obtained by visiting

/CONCAT       Concatenate the results into one file
/CONSOLE      Write output direct to console
/DETAG        Selectively remove HTML markup
/HELP         Display this useful list of commands ("/?" also works)
/LOG          Generate a .log file
/OUTPUT       Filespec for output file(s)
/OVERWRITE    May overwrite input files with the output
/POLICY       Document policies used in a .pol file
/SILENT       Suppress all output messages (except these :-)
/SUBFOLDERS   Process files that match the filespec in sub-folders as well
/TREE         Place output files in parallel folder structure to input files

Qualifiers are case insensitive and may be reduced to shortest unique
name (e.g. "/lo" for "/log")
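
The abbreviation rule can be illustrated with a short Python sketch.
This is not Detagger's parser; the function name resolve_qualifier is
hypothetical, and treating an ambiguous abbreviation as an error is an
assumption made for the example.

```python
QUALIFIERS = [
    "/CONCAT", "/CONSOLE", "/DETAG", "/HELP", "/LOG", "/OUTPUT",
    "/OVERWRITE", "/POLICY", "/SILENT", "/SUBFOLDERS", "/TREE",
]

def resolve_qualifier(text):
    """Return the full qualifier that `text` names or uniquely abbreviates."""
    wanted = text.upper()              # qualifiers are case insensitive
    if wanted in QUALIFIERS:           # a full name always matches itself
        return wanted
    matches = [q for q in QUALIFIERS if q.startswith(wanted)]
    if len(matches) != 1:
        raise ValueError(f"unknown or ambiguous qualifier: {text!r}")
    return matches[0]
```

Under this sketch "/lo" resolves to /LOG and "/out" to /OUTPUT, while
"/con" is rejected because it could mean either /CONCAT or /CONSOLE.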

Most of the configuration options are passed using a "policy file". This
is most easily created by running the Windows version, selecting the options
you want and then saving those to a policy file.

The policy file itself is just a text file, with one policy per line (hard break). If
you look at the list of policies in the documentation you can edit this by hand,
but usually it's just simpler to use the Windows version.

The /CONCAT command line qualifier

When present this qualifier states that all the results should be
output to a single file. This only makes a difference if you've
supplied multiple filespecs on the command line, or used a wildcard.

The /CONSOLE command line qualifier

When present this specifies that the output should be written to
the console window. This might be useful in piping operations.

The /DETAG command line qualifier

When present this specifies that Detagger should selectively remove
HTML markup and create a HTML output file. The default behaviour
otherwise is to convert the file to text.

If you want to specify which removal options should apply you'll
need to create a policy file and add that to the command line.

The /HELP command line qualifier

Displays the list of supported qualifiers

The /LOG command line qualifier

When present this specifies that Detagger will create a .log file
listing all the actions it takes and any messages created

The /OUTPUT command line qualifier

When present this will tell Detagger where the output should be
placed. If omitted the default is to output the results in the
same folder as the source file, with an extension (.txt or .html)
appropriate to the type of conversion being attempted

Examples :-

c:> h2acons input.html /out="c:\my files\output.txt"

File is output to "c:\my files\output.txt". Because there
is a space in the directory path the filename needs to be
in quotes

c:> h2acons in*.html /out=c:\output\

All the files in*.html will be converted and placed in the
directory "c:\output\"

c:> h2acons in*.html /concat/detag/out=c:\output\bigfile.html

In this case the /concat/detag means that Detagger will
selectively remove markup and concatenate the results in the
single file "c:\output\bigfile.html"

The /OVERWRITE command line qualifier

When the /DETAG qualifier is specified then by default the output
file will be a HTML file in the same directory as the source file.
In this case Detagger could end up replacing the original file by
the output file. That is only allowed if the /OVERWRITE qualifier
is present. If it isn't, an error message is generated.

An alternative to using the /OVERWRITE qualifier is to use the
/OUTPUT qualifier to direct the output to a different folder, or
to a different name in the same folder.
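
The situation that /OVERWRITE guards against can be sketched in Python.
This only illustrates the rule described above and is not the program's
actual logic; the function names are hypothetical, and the assumption
that the default output name is the input name with its extension
replaced comes from the description of /OUTPUT.

```python
from pathlib import PureWindowsPath

def default_output_path(input_path, detag=False):
    """Default output: same folder and name, .html if detagging, else .txt."""
    suffix = ".html" if detag else ".txt"
    return PureWindowsPath(input_path).with_suffix(suffix)

def needs_overwrite(input_path, detag=False):
    """True if the default output file would replace the input file."""
    # PureWindowsPath compares names without regard to case, as Windows does.
    return default_output_path(input_path, detag) == PureWindowsPath(input_path)
```

Here needs_overwrite(r"c:\web\index.html", detag=True) is true, because
the HTML output would land on top of the HTML input, whereas a plain
conversion to text produces index.txt and leaves the input untouched.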

The /POLICY command line qualifier

When present Detagger will create a .pol policy file listing all
the policies used in the conversion and their values. You should
not normally want to do this unless you want to create a policy
file to edit, or want to check that your policies are being
used.

To pass in a set of policies, just list the policy file on the
command line. It must have a .pol extension. For example the
command

c:> h2acons in*.html input.pol /policy=output.pol

will read the policies in "input.pol", use those in the conversion,
and then create a file "output.pol" listing the policies used, which
will be a mixture of default values and those loaded from "input.pol".

The /SILENT command line qualifier

When present all the messages usually displayed to the console window
are suppressed.

The /SUBFOLDERS command line qualifier

When present the software will process files that match the filespec
in sub-folders as well as in the folder named on the command line.

The /TREE command line qualifier

When present the software will place output files in a directory
structure that matches the input structure. This will only apply when
using the /SUBFOLDERS and /OUTPUT options as well. So for example
the command

c:> h2acons c:\input\a*.html /output=d:\new\ /subfolders/tree

would look for all files a*.html in the folder c:\input\ and its
sub-folders. The output files will be placed in d:\new\ and sub-folders
of that, so for example c:\input\sub\answer.html would be converted to
d:\new\sub\answer.txt. If it didn't already exist, the sub-folder
d:\new\sub\ will be created.
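
The mapping from an input file to its place in the parallel output
structure can be sketched in Python as follows. This illustrates the
path arithmetic only, and is not Detagger's own code; the function name
tree_output_path and its parameters are hypothetical.

```python
from pathlib import PureWindowsPath

def tree_output_path(input_file, input_root, output_root, extension=".txt"):
    """Map an input file to the matching location under `output_root`."""
    # The part of the path below the input folder, e.g. sub\answer.html
    relative = PureWindowsPath(input_file).relative_to(PureWindowsPath(input_root))
    # Re-create that part under the output folder and swap the extension.
    return (PureWindowsPath(output_root) / relative).with_suffix(extension)

print(tree_output_path(r"c:\input\sub\answer.html", "c:\\input\\", "d:\\new\\"))
```

A real conversion would also need to create any missing sub-folder
before writing the file, which this sketch leaves out.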

Running from the 'SendTo' menu

Detagger can make a useful addition to your "Send to" menu
(available when you right-click on a file in explorer).

To add Detagger to this menu, simply add a shortcut to your
Send To shortcuts directory. Under Windows 9x this is

/Windows/SendTo

under Windows XP this is

/Documents and Settings/<Your_User_Name>/SendTo

If you want to use a standard policy file (e.g. with a particular
colour scheme), then change the properties of the shortcut so that
the command is

Detagger %1 standard.pol

Working with Unicode

Detagger was not originally designed with Unicode in mind. Support for
Unicode text has been added gradually over time, so earlier versions
of Detagger may not support all the features described in this manual.
If in doubt, please contact JafSoft for details.

What is Unicode?

Traditional single-byte character sets interpret the 8-bit
character values (128-255) as special characters. So on a Russian
machine these would be interpreted as Cyrillic, but on a different
machine they could be read (wrongly) as Arabic (and vice versa). On
most English-based PCs, the 8-bit characters are used for the accented
characters found in certain European languages, so a Russian text would
appear to have lots of accented 'i's, 'e's and 'a's.

Unicode is a way of implementing text that supports multiple types
of character sets at the same time so that - for example - it is possible
to display Chinese and Cyrillic on the same page unambiguously. It does
this by allocating each character in each language a unique code value,
so that codes used for Cyrillic characters no longer overlap and conflict
with those assigned to Arabic.

However, these code values are in most cases larger than can be represented
in a single byte. As a result a way has to be chosen to represent each
character by one or more bytes.

The following Unicode representations are commonly used

UTF-8
Each character is represented by 1 to 4 bytes, depending
on which range the Unicode code value falls into. This has the
advantage that all ASCII characters are a single byte, so for
example every character of the HTML tags in a document is
represented by a single byte. This also means there are no null
bytes contained in the text, which can make programming software
to work with this text easier.

UTF-16
Each character is represented by a 2-byte pair (characters whose
code value is too large for 16 bits require 2 such pairs). For most
characters the 2-byte pair is just the numerical representation of
the Unicode value of the character. This makes the files easier to
interpret, but also means that the byte order depends on how the
machine stores its bytes - i.e. is the machine big-endian or
little-endian. Because ASCII characters have a Unicode value less
than 128 the ASCII characters map onto byte pairs in which one of
the bytes is null. Because each character requires two bytes, a
single byte wrongly inserted into a UTF-16 stream will render all
text that follows as gibberish.
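
The difference between the two representations is easy to see with
Python's built-in codecs. The sketch below is only an illustration; the
helper name byte_lengths is made up for the example.

```python
def byte_lengths(char):
    """Return (UTF-8 length, UTF-16 length) in bytes for a single character."""
    return len(char.encode("utf-8")), len(char.encode("utf-16-le"))

# An ASCII letter: one byte in UTF-8, a byte pair with a null byte in UTF-16.
print(byte_lengths("A"))
print("A".encode("utf-16-le"))   # little-endian: the letter byte comes first
print("A".encode("utf-16-be"))   # big-endian: the null byte comes first

# Cyrillic capital ZHE (U+0416) and a Chinese ideograph (U+4E2D).
print(byte_lengths("\u0416"))
print(byte_lengths("\u4e2d"))
```

The two UTF-16 lines show the same character stored with its bytes in
opposite orders, which is the ambiguity that the Byte Order Mark
resolves.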

Unicode Byte Order Marks (BOMs)

Files that contain Unicode often identify themselves by inserting a "Byte
Order Mark" (BOM) at the top of the file. This is a two-byte marker for
UTF-16 files and a three-byte marker for UTF-8 files. Modern applications
will test for this byte marker and if present will then know how to
interpret the contents of the file. For example Notepad as supplied with
Windows XP can do this, whereas Notepad as supplied with Windows 98 could not.
In UTF-16 each character is represented by two bytes, and computers can
store a two-byte value in different ways (known as "big-endian" and
"little-endian"). Each operating system uses one method or another and
it isn't usually an issue, but when Unicode files get passed from one
machine to another, this becomes important. The BOM allows the two forms
of UTF-16 (known as "UTF-16BE" and "UTF-16LE") to be distinguished.
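
Testing for a BOM amounts to comparing the first few bytes of the file
against the known markers. The Python sketch below shows the idea using
the constants from the standard codecs module; it is an illustration
rather than Detagger's own detection code, and the function name
detect_bom is hypothetical. It covers only the UTF-8 and UTF-16 markers
discussed here.

```python
import codecs

# Marker bytes and the encoding each one announces.
BOMS = [
    (codecs.BOM_UTF8, "utf-8"),          # EF BB BF
    (codecs.BOM_UTF16_LE, "utf-16-le"),  # FF FE
    (codecs.BOM_UTF16_BE, "utf-16-be"),  # FE FF
]

def detect_bom(data):
    """Return the encoding named by a leading BOM, or None if there is none."""
    for bom, encoding in BOMS:
        if data.startswith(bom):
            return encoding
    return None
```

In practice the first three bytes of the file would be read in binary
mode and passed to detect_bom() before deciding how to decode the rest.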

Auto-detecting Unicode input

The software has some ability to auto-detect Unicode text, and will
generally do so under the following circumstances

a 3-byte Byte Order Mark (BOM) is detected at the top of a
UTF-8 input file

a 2-byte Byte Order Mark (BOM) is detected at the top of
a UTF-16 input file

the input HTML contains an HTML entity that maps onto a Unicode
code value which can't be converted into an ANSI or ASCII equivalent. In
this case although the input HTML may not have been encoded as
Unicode, the output will need to be in order to correctly display
the Unicode character.
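
The third case can be illustrated in Python by decoding the entities and
then checking whether the result still fits in a single-byte character
set. This is a sketch of the idea only, not the program's own test; the
function name needs_unicode_output is hypothetical, and using the
Windows "cp1252" code page to stand for ANSI is an assumption.

```python
import html

def needs_unicode_output(html_text, ansi_codec="cp1252"):
    """True if decoding the entities yields a character outside `ansi_codec`."""
    decoded = html.unescape(html_text)   # e.g. "&alpha;" becomes the Greek letter
    try:
        decoded.encode(ansi_codec)
    except UnicodeEncodeError:
        return True                      # no single-byte equivalent exists
    return False
```

With these assumptions "caf&eacute;" does not force Unicode output,
because the accented letter exists in the ANSI code page, whereas
"&alpha;" or "&#1046;" does.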

Creating Unicode output

The software will create Unicode output whenever it detects that the
input files were Unicode, or wherever Unicode characters have been
detected in the HTML entities of the original.

At present all Unicode output files will be UTF-8.

Controlling Unicode handling through use of policies

The following policies can be used to control the handling of
Unicode during the conversion :-

When Unicode is detected in the source the software will output
the text as UTF-8 and optionally add a file marker that will
label the file as "Unicode" in a way that most applications
that can cope with Unicode will recognize.

Certain common HTML entities don't have a single ANSI character
but have common ASCII representations. If you enable this policy
you tell the software to use ASCII/ANSI alternatives where possible,
thereby reducing the chance of Unicode being necessary for the
output file.