$_$_TITLE The JafSoft text conversion FAQ
$_$_CHANGE_POLICY document subject : The JafSoft text conversion FAQ
$_$_CHANGE_POLICY Expect contents list : No
$_$_CHANGE_POLICY background colour : ffefff
$_$_CHANGE_POLICY headings colour : dd00dd
$_$_CHANGE_POLICY LINK Definition : "[a2h man]" = "AscToHTM Manual" + "http://www.jafsoft.com/doco/a2hdoco.html"
$_$_CHANGE_POLICY LINK Definition : "[a2r man]" = "AscToRTF Manual" + "http://www.jafsoft.com/doco/a2rdoco.html"
$_$_CHANGE_POLICY LINK Definition : "[pol man]" = "Policy Manual" + "http://www.jafsoft.com/doco/policy_manual.html"
$_$_CHANGE_POLICY LINK Definition : "[tag man]" = "Tag Manual" + "http://www.jafsoft.com/doco/tag_manual.html"
$_$_CONTENTS_LIST
$_$_BEGIN_IGNORE
** Master copy on VMS **
$_$_END_IGNORE
1.0 Introduction
================
This FAQ is clearly a work in progress. Many of the subjects have no answers
as yet. Nevertheless I intend to flesh this out as and when I get time, and
I welcome new questions (or prompts to write the answers to questions listed
here) from all users.
Direct all correspondence to *info@jafsoft.com*
1.1 Document conventions
Often the answer to a question involves setting a policy value (see the
"[Pol man]" for more about policy files). The policy involved will be
displayed as

Policy Name : value

The policy name is the text that will appear in the policy file. This
must be *exactly* as shown; no variability in the spelling will be tolerated
by the program. If you misspell the policy name (or if it's been changed in
a new version), the program will complain that it doesn't recognize the
policy.
In addition to adding lines to your policy file by hand, the Windows version
allows *most* (not all) policies to be set via property sheets. You'll
need to locate the equivalent policy on property sheets.
More details on policies can be found in the Policy Manual which is included
in downloads, but may also be found online at
http://www.jafsoft.com/doco/policy_manual.html
1.2 Finding JafSoft software on the web
1.2.1 The home page
Currently http://www.jafsoft.com/. Each product has its own page, e.g.
http://www.jafsoft.com/asctohtm/
http://www.jafsoft.com/asctortf/
http://www.jafsoft.com/asctotab/
http://www.jafsoft.com/addlinx/
These are listed on the products page
http://www.jafsoft.com/products/
There is also a .co.uk mirror site.
1.2.2 Online documentation
Currently http://www.jafsoft.com/doco/docindex.html. Documentation is
usually included with all downloads, either as HTML or as ready-to-convert
text.
In Windows this will usually be found in the folder c:\Program Files\JafSoft\AscToHTM
Documentation available includes :
- [a2h man]. Describes the text-to-HTML converter AscToHTM
- [a2r man]. Describes the text-to-RTF converter AscToRTF
- [pol man]. Describes the use of policy files by the software
- [tag man]. Describes the use of a preprocessor and tagging system by
the software
- This FAQ.
If you plan to read one or more of these manuals you'd be best advised to
download one of the documentation .zip files.
1.2.3 Keeping track of updates
There are update pages at
http://www.jafsoft.com/asctohtm/updates.html and
http://www.jafsoft.com/asctortf/updates.html
Registered users get update notifications by mail. To date all updates have
been free to registered users, but we can't guarantee that will always be
the case.
1.2.4 Who is the author?
1.2.4.1 John A Fotheringham
That's me that is. The program is wholly the responsibility of John A
Fotheringham, who maintains it in his spare time. He doesn't make enough to
make a living from it (in case you were wondering).
1.2.4.2 JafSoft Limited
Although authoring shareware doesn't earn enough that I can give up my day job,
I have created a separate company to handle AscToHTM, AscToRTF and all
the shareware and other services I have to offer.
The company is called JafSoft Limited, and the web site is
http://www.jafsoft.com/
1.2.4.3 Contacting the author
Correspondence should be via email to *infosupport.com*. Priority is given
to registered users and people who want to pay for development [ :) ], however
all correspondence will be answered.
1.2.5 Reporting errors and bugs
Despite the best of intentions, bugs do happen, and we're always grateful for
anyone who takes the time to report them to us.
Please feel free to report all errors and bugs to *infosupport.com*. When
you do so please include
- a clear description of the problem
- which version of the software you are using
- a copy of the offending source file (if not too large, say under 50k)
- a copy of any policy file being used.
- a copy of any .log file generated (save the status messages to file)
Please keep any source files small. If the source file is large, try to
generate a smaller file that exhibits the same problem.
1.2.6 Requesting changes to the software
Feel free to send suggestions for enhancements/changes to *infosupport.com*.
A surprising number of features have been added this way although, naturally,
I'm happy for people to think these were all my own ideas.
Minor changes may slip into the next release if I think they enhance the
product.
Major changes to the software can be undertaken on a commercial basis by
contracting my services from Yezerski Roper Ltd. This option is not for the
faint-hearted. Don't let the software's $40 price tag persuade you that it's
anything but a bargain: my hourly rate is more than that amount, although
I can do quite a lot in one hour :-)
1.3 Registration and updates
1.3.1 Registration
Registration can be completed online by visiting
http://www.jafsoft.com/asctohtm/register_online.html or [[BR]]
http://www.jafsoft.com/asctortf/register_asctortf.html
Registration is usually completed via a third party registration service
(I use a couple) and an on-line download. The registration service will
take your payment and then send you download instructions for a fully registered
copy. The registration companies can accept payments using a number of
methods, but the commonest is credit card.
We do not ship software on media at this time. We'd have to double the price
and stop our free upgrade policy if we did. That said, one of the registration
companies will put the software onto CD and ship it to you for an extra
charge. As yet I haven't set this up, but if interested email *infosupport.com*
with details.
1.3.2 Update policy
To date all updates have been free to registered users. This has been true
for both minor and major updates. Over time the price of the software has
risen, but no-one has ever had to pay extra.
I'd like to continue this policy, but I'm unable to actually guarantee this,
especially since I've discovered old registered versions circulating on the Net.
1.4 Other related products by the same author
1.4.1 AscToTab
[AscToTab] is a subset of AscToHTM which is dedicated to creating tables
from plain text and tab-delimited source files. The software is offered
as freeware under Windows and OpenVMS.
1.4.2 AscToRTF
AscToRTF is a text-to-RTF converter which uses the same analysis engine as
AscToHTM, but which creates Rich Text Format (RTF) files instead.
RTF is a format better suited for import into Word and other word processors.
[AscToRTF] was released early in 2000 and has received a number of 5-star
reviews.
1.4.3 AddLinx
A registered user (see "[[GOTO requesting changes to the software]]") contacted
me and asked if I had a program that could add hyperlinks to an *existing*
HTML file. At the time I didn't, but on examining the software it seemed
I had all the bits and pieces necessary to construct such a tool. Within 24
hours I sent him a first attempt at such a utility, and within a few
weeks [AddLinx] was born.
It's a very rough utility that I haven't spent much time on. It's available
as postcard ware.
1.4.4 API versions of the software
For those wanting to programmatically integrate the conversion software into
their own products, an API has been produced and is available under license.
AscToHTM and AscToRTF are written in C++, and an API is available which
provides a C++ header file defining the functions available. The software
is then provided as a Windows library to be linked against.
In the past clients have successfully integrated this with their Java software,
on Windows, Linux and Solaris platforms.
Although I'm not a Visual Basic programmer myself I'm less sure of how the
software could be integrated with VB, although I presume this can be done.
Contact *info@jafsoft.com* if interested.
1.4.5 Linux versions of the software
Linux versions of all programs are planned. The core conversion software
is developed as a command line utility, and in this form it ports to Linux
reasonably easily. I plan to offer AscToHTM and AscToRTF as Linux shareware
in the near future.
1.5 Document conversion consultancy
1.5.1 Do you offer consultancy?
We always like to offer a little help to users just starting out. Once you
register you are free to send a typical sample file to the author, who
will offer some advice on problems you might encounter and policies you may use.
However, for people wanting to do larger conversions (see
"[[GOTO What's the largest file anyone's ever converted with AscToHTM?]]") or
wanting significant amounts of our time, you will need to buy assistance at
consultancy rates. Regrettably this is not cheap, although we feel it's good
value for money :)
Contact *info@jafsoft.com* with details. See also "[[GOTO requesting changes to the software]]"
1.6 Y2K Compliance
From time to time I get asked if my products are Y2K compliant.
The short answer is "yes it was" :-)
1.7 Status of this FAQ
Clearly it's not finished yet. You might even say it's "under construction" :)
I've decided to put this on the web in "unfinished" form so that it may be
of *some* benefit to people as soon as possible.
If you've a particularly urgent need for a question to be answered contact
*infosupport.com*, and don't be surprised if your answer ends up in
this document.
2.0 Getting the best results
============================
2.1 General
2.1.1 Three words: consistency, consistency, consistency
The software works by analysing your document to determine what "rules" you've
used for laying out your file. On the output pass these "rules" (also known
as "policies") are used to determine how to categorize each line, and
inconsistencies can lead to lines being wrongly treated because they "fail
to obey policy".
You can greatly help this analysis by being consistent in your formatting.
Many of the decisions the software makes can be overridden by changing the
"analysis policies" (see "[[GOTO using policy files]]"), but if this becomes
necessary it can quickly become hard work (if only because you need to
familiarize yourself with these policies), so it's better to avoid this
if possible.
If you're writing a document with text conversion in mind, bear in mind
the following
- *use of white space* (see "[[GOTO white space is your friend]]"). In
general white space can be used to separate paragraphs, tables and
diagrams from normal text and columns of data from each other inside
tables.
The software *likes* white space :)
- *use of tabs*. The software will convert all tabs to spaces on input, assuming
that one tab = 8 spaces. This will work fine provided this tab size
is correct, or your use of tabs and spaces is consistent. It may not
work otherwise, in which case you'll need to tell the software what your
tab size is via an analysis policy.
- *use of indentation*. The software will calculate the pattern of indentation
used in your file, and will output text accordingly. If your use of
indentation is inconsistent, then paragraphs will be wrongly broken
and headings may not be correctly recognized.
- *use of numbering*. The software can spot numbered headings and numbered
lists. To avoid confusing the two, the indentation of a given type of
heading is tested (although you can disable this test), together with
the numbering sequence. The software can tolerate small gaps in
numbering, but large gaps will confuse it.
- *use of line lengths*. The software will attempt to determine your "page
width" and text justification. These are then used to spot short lines
(which get a <BR> added) and centred text. The centred text algorithm
has problems and so is disabled by default.
Try to avoid really long lines, or highly variable line lengths. If you
don't, the software is liable to insert <BR> tags where you don't want them,
unless you set the "page width" and "short line length" analysis
policies to correct this behaviour.
- *avoid confusing the program*. Numbered lists inside numbered sections
all at the same level of indentation is a good example. The numbers
become ambiguous and errors start to occur. If you must have this, try
to set the numbered list at a small offset to the heading, so that the
indentation position will distinguish the two.
2.1.2 Make sure your files are "line-orientated"
The software reads files line-by-line. On the first pass it will analyse the
distribution of line lengths to determine the "page width" of your file.
This in turn is used to detect certain features such as centred text and
"short lines".
Some files, especially those created on PC, do not include line breaks,
instead they only have a single break after each paragraph of text.
Whilst not a problem in itself, it does somewhat handicap the software's
ability to analyse the file.
Where possible, you should attempt to save files "with line breaks" to give
the software the best chance of understanding how your file is laid out.
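As an illustration of what "line-orientated" means, here is a rough sketch (in Python, not part of AscToHTM) of how you might check whether a file was saved with line breaks. The 100-character threshold is an arbitrary assumption for illustration:

```python
def looks_line_oriented(text, max_line_len=100):
    """Heuristic check: files saved "with line breaks" rarely contain
    very long lines, whereas editors that store one paragraph per
    "line" produce lines hundreds of characters long."""
    lines = text.splitlines()
    if not lines:
        return True
    return max(len(line) for line in lines) <= max_line_len
```

If a check like this fails for your file, re-saving it "with line breaks" before conversion should give the analysis a better chance.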
2.1.3 Make sure your use of tabs is consistent
The software converts all tabs in your source document on the assumption
that one tab equals 8 spaces. In fact, the actual tab size is irrelevant
provided your use of tabs and spaces is consistent. If it isn't, you may
find tables aren't being analysed correctly.
You can set the actual Tab Size used in your documents via the policy line
Tab Size: n
where n is the number of spaces per tab.
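The effect of the tab size can be illustrated with Python's str.expandtabs, which performs the same tab-to-space conversion described above (an analogy, not the converter's actual code):

```python
def expand_tabs(line, tab_size=8):
    # Each tab advances to the next multiple of tab_size, exactly as
    # described above for the converter's input step.
    return line.expandtabs(tab_size)
```

Note that "ab\tcd" expands differently under tab sizes 8 and 4, which is how columns that look aligned in your editor can come out misaligned if the assumed tab size is wrong.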
2.1.4 White space is your friend
The software attempts to categorize each line into one of a number of types
(e.g. heading, bullet point, part of a table etc).
Often this analysis is influenced by adjacent lines. For example a line
of minus signs can be interpreted as "underlining" a heading, or perhaps
as part of a table or diagram.
Confusion can occur where different features are close to each other (e.g.
an underlined heading immediately followed by a table).
In most cases the ambiguity can be reduced or eliminated by adding 1 or 2 blank
lines between the objects being confused.
The same argument applies to table columns. If two columns get merged
together, try increasing the "white space" between them by moving them apart.
In almost all situations, adding white space to your document will help
reduce the likelihood of analysis errors.
2.1.5 Use a simple numbering system
I've seen documents with section numbers like "Section II-3.b". I'm sorry,
but at present the software can't recognise such an exotic numbering
system. Equally it can't cope with Appendices numbered like A-1 etc. [*]
If possible, change your section numbers to numbers (like this document),
or "underline" all your headings with a row of dashes or equal signs on
the next line. The software will understand that much better.
[*] From version 4 onwards, there is the ability to recognise headings that
start with the same word or phrase (such as Chapter, Appendix, Section
etc), so this may offer a solution to you.
2.1.6 Save policies into a policy file
The program offers a large number of "policies" to customize the conversion.
These policies can be saved in a "policy file", which is simply an ordinary
text file (which you may edit by hand if you like).
By saving policies into files, you can reload these files the next time
you do a conversion, which means you won't need to adjust all the settings
again. You can create multiple policy files for different conversions or
conversion types.
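As a sketch, a hand-written policy file might contain just a few lines like these (the policy names are ones mentioned elsewhere in this FAQ; the values are illustrative):

```
Document title : My favourite URLs
Tab Size : 4
Ignore multiple blank lines : Yes
```

Reloading such a file at the start of a conversion applies all three settings in one go.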
Policy files are described at length in the "[Pol man]".
2.1.7 Add preprocessor commands to your source file
The program has its own built-in preprocessor. This allows you to
add special "directives" and "tags" into your source file which tell the
program to perform special functions. Examples include the addition of
include files into the source, the insertion of contents lists, adding
hyperlinks to sections and much much more.
An example is the following hyperlink, whereby
[[OT]]GOTO Using preprocessor commands[[CT]]
is used to provide the link to the named section, such as the one that appears
in the next sentence. For more details see "[[GOTO using preprocessor commands]]"
The preprocessor is described at length in the "[Tag Man]".
2.2 Using policy files
2.2.1 Saving "incremental" policies
When you choose to save your policies to file you will be asked whether
you want to save "incremental" policies, or "all" policies.
"Incremental" means only those policies loaded from file, or manually
adjusted will be written to file. This is recommended as it leaves the
program free to make all other adjustments itself.
"All" means that all policies will be written to file. This is useful
if you want to document or review the policies used, but it is less useful
if you want to reload this policy file, as it will fully constrain the
program's behaviour. While this may not be a problem when reconverting
the same file, it may well be unsuitable when converting new files.
2.2.2 Editing policy files by hand
Policy files are just text files with a ".pol" extension. If you think of
them like the old Windows .ini files you'll get the idea. This has been
done deliberately so that these files can be manually edited in a normal
text editor.
OpenVMS users actually have no other way of creating policy files, but Windows
users can change most (but not all) policies via the GUI. However I
recommend that anyone who comes to regard themselves as a "power" user
learns how to edit these files.
The policy file consists of one policy per line, usually in the form
Policy Name : value
e.g.
Document title : Here's my favourite URLs
When entering policy lines you must use the *exact* spelling indicated
in the documentation for the policy to be recognized. If I've misspelt
anything then tough, you'll have to follow it (but tell me anyway). The
one exception to this rule is I've allowed both British and American spelling
of colour/color.
The allowed values will vary from policy to policy. Most policy lines
accept a value of "(none)", effectively negating that policy.
The order of lines in the file is largely unimportant. If you're editing
a .pol file generated by the program (see "[[GOTO generate a .pol file]]") then
you'll notice section headings of the form
[Hyperlinks]
These are purely decorative. That is, they have no significance, and you
can ignore them and move the policy lines around, there's no concept of
having to place policy lines in the "right" section.
As new versions of the software are released policies are moved from one section
to another as different groupings expand and appear. As explained above,
this usually has no effect on the validity of the .pol file.
2.2.3 Using include files in policy files
Policy files may include other policy files as follows
include file : ..\policies\Other_policy_file.pol
This can be useful if you have multiple policy files but want certain features
to be the same. For example I use this to introduce the same link dictionary
commands into all my policy files. You could equally put all your colour policies
into one file.
The "include file" line will have to be manually edited into the .pol file
using a text editor... there is currently no support for setting this via
the program itself.
NOTE: If you "save" a policy file that has been loaded, then the include file
structure will be lost, and all the policies will be output into a single
file.
2.2.4 Using a default policy
You can make the program use the same policies by default each time it runs.
To do this select the policies you want, and then save these to a policy file.
Next select the _Settings->Use of Policy Files_ menu option. Check the
"Use a default" flag, and select the file you just created.
Next time you run the program these policies will be loaded and used for
your conversions. Note, you can still reset the policies or load a different
file using the options on the Conversion options menu.
To stop using a default just clear the "Use a default" flag (you don't need
to clear the policy file name).
2.3 Using preprocessor commands
2.3.1 What is the preprocessor?
The program has a built-in preprocessor. This will recognize special commands
inserted into the source file. These commands can be used to correct analysis
errors (e.g. to correctly delimit a table), or to add to the output. For
example the TIMESTAMP tag can cause the text
"this document was converted on [[OT]]TIMESTAMP[[CT]]"
to be output as
"this document was converted on [[TIMESTAMP]]".
Preprocessor commands are of two types
*Directives*. These begin with "$_$_" and must be on a line by themselves
with the "$_$_" being at the start of the line (i.e. there can be no leading
spaces).
*Tags*. These take the form [[OT]]TAG <arguments>[[CT]] and may occur anywhere
within your text, but cannot be split over two lines.
Some commands may be expressed as either directives or tags. A "[Tag Man]"
is also available.
2.3.2 Delimiting tables, diagrams etc
The program will attempt to detect tables and diagrams, but sometimes it
gets the wrong range for the table, and also diagrams may be interpreted
as tables and vice versa.
To correct such mistakes, you can bracket the source lines as follows :-
$_$_BEGIN_PRE
$_$_BEGIN_TABLE
...
$_$_END_TABLE
$_$_END_PRE
or
$_$_BEGIN_PRE
$_$_BEGIN_DIAGRAM
...
$_$_END_DIAGRAM
$_$_END_PRE
2.3.3 How do I add my own HTML to the file?
You can embed raw HTML in your text file in one of three ways using the
preprocessor
a) Insert a single line of HTML as follows

    $_$_HTML_LINE <your HTML here>

The HTML_LINE directive and its arguments must all be on one line.

b) Insert a HTML tag as follows

    [[OT]]HTML <your HTML here>[[CT]]

The HTML tag must all be on one line.
c) Insert a section of HTML between two directive lines

    $_$_BEGIN_PRE
    $_$_BEGIN_HTML
    ...
    lines of HTML, e.g. custom artwork or tables
    ...
    $_$_END_HTML
    $_$_END_PRE
For example, to enter an anchor point in your text so that you can link to
it, try

    $_$_BEGIN_PRE
    $_$_HTML_LINE <a name="my_anchor"></a>
    $_$_END_PRE

To embed an image with a hyperlink you might try

    $_$_BEGIN_PRE
    $_$_BEGIN_HTML
    <a href="http://www.jafsoft.com/"><img src="my_image.gif"></a>
    $_$_END_HTML
    $_$_END_PRE

(The anchor name and image file shown here are illustrative.)
The "$_$_" has to be at the beginning of the line, i.e. not indented as I've
shown above. If you look at the program's HTML documentation, and the text
used to create it, you'll see examples of this and other preprocessor commands.
Indeed if you look at the [[SOURCE_FILE]] for this document you'll see that's
exactly how the image on the right was added to *this* document.
Future versions of the software will introduce in-line tagging so you can
place LINKPOINTs anywhere in your text. Check your program's documentation
for details.
2.3.4 Using standard include files
The preprocessor command INCLUDE can be used to include standard pieces of
text into your source files. For example
$_$_INCLUDE ..\data\footer.inc
will include the file "footer.inc" into your source file at this location.
Note that the path given must be correct relative to the source file being
converted.
The contents of the include file simply get "read into" the source. As such
they get included in the analysis of the whole document.
Include files can be useful to include standard disclaimers or navigation
bars to all your pages. For example you could embed HTML to link back to your
home page (see "[[GOTO how do I add my own HTML to the file?]]")
Of course the same effect could be achieved by using a HTML footer file
(see "[[GOTO adding headers and footers]]") or by defining a "HTML fragment"
called HTML_FOOTER (see "[[GOTO customizing the HTML created by the software]]").
2.3.5 Adding Title, keywords etc
If you want to add title, keywords and descriptions to your HTML you can do this
by embedding special commands in the source file as follows
$_$_BEGIN_PRE
$_$_TITLE This is the title of my HTML page
$_$_DESCRIPTION This page is a wonderful page that everyone should visit
$_$_KEYWORDS wonderful, web, page, full, of keywords, that
$_$_KEYWORDS everyone, will, want, to search, for
$_$_END_PRE
The "$_$_" must be the first characters on the line. You can spread the keywords and
description over several lines by adding extra $_$_KEYWORDS and $_$_DESCRIPTION lines.
Note: Most of these commands have equivalent policies, allowing you to
set title etc through an external policy file should you prefer.
2.3.6 Adjusting policies for individual files or parts of files
You can, if you wish, create one policy file for each file being converted,
however this is liable to become a maintenance nightmare.
If you don't want to maintain multiple policy files, or if you simply want
to adjust a few policies for a given source file, you can use
the $_$_CHANGE_POLICY command.
The effect will vary according to the type and position of the command.
Some policies will affect the whole document, others will only affect
the document from that point onwards... it depends on the nature of
the particular policy. See the "[Pol man]" for details.
For example placing
$_$_CHANGE_POLICY background colour : #FF0000
$_$_CHANGE_POLICY text colour : White
will change the document background colour to be red, and the text to be white
throughout the whole document.
2.4 Making the program run faster
You can make the program run faster in a number of ways by disabling features
that you know you don't want.
2.4.1 Review the "look for" options
As of V3.1, AscToHTM has a number of "look for" options, stating what the
program is looking for. Disable the ones you don't want, although most of them
will not make a major difference to the program speed.
2.4.2 Don't convert URLs
Probably the single most expensive function is the search for URLs to convert
into hyperlinks. Every word (and every word fragment) has to be checked
individually. The problem isn't helped by having to distinguish URLs with
commas in them from comma separated lists of URLs.
If you know your document has *no* URLs to be converted, disable this feature
and watch the software run 10-20% faster. However this is one feature of the
software that people like most.
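To see why this search is costly, consider a naive sketch of per-word URL detection (illustrative Python, not AscToHTM's real matcher); note how trailing punctuation must be stripped so that a URL followed by a comma still links cleanly:

```python
import re

URL_START = re.compile(r'^(https?://|www\.)', re.IGNORECASE)

def find_urls(line):
    """Test every word on the line; strip trailing punctuation first
    so "http://www.jafsoft.com/," converts without the comma."""
    urls = []
    for word in line.split():
        candidate = word.rstrip('.,;)')
        if URL_START.match(candidate):
            urls.append(candidate)
    return urls
```

Doing something like this for every word of every line is what makes the option expensive; disabling it skips the whole scan.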
2.4.3 Don't generate tables
The software will attempt to convert regions of pre-formatted text into tables.
This can take a lot of analysis even if eventually it decides "it's not a table
after all!".
This only comes into effect if the program detects preformatted text, so
you should only disable this feature if your pre-formatted text is largely
non-tabular. If that's the case you probably want to disable this anyway
as the tables created may be inappropriate.
3.0 Conversion Questions
========================
3.1 General
3.1.1 How do I get rid of the "nag" lines?
Easy. You register the software (see "[[GOTO registration and updates]]"), or
you remove them by hand. "Nag" lines only appear in unregistered trial
copies of the software. If you register, these are removed.
3.1.2 My file has had its case changed and letters replaced at random by
numbers. How do I fix that?
Easy. You register the software (see "[[GOTO registration and updates]]").
The case is only adjusted in unregistered trial copies of the software, either
after the line limit is reached, or after the 30 day trial has expired.
The case is adjusted so that you can still evaluate whether the conversion
has produced the right type of HTML, but since the text is now in the wrong
case and has had letters substituted, the HTML is of little use to you.
This is intended as an incentive to register.
That said, you *will* find pages on the web that have been converted in this
manner.
3.1.3 Why do I sometimes get <DL> markup? How do I stop it?
The program is detecting a "definition". Definitions are usually
keywords with a following colon ":" or hyphen "-", e.g. "text:"
You can see this more easily if you go to Output-Style and toggle the
"highlight definition term" option... the definition term (to the left of
the definition character) is then highlighted in bold.
If the definition spreads over 2 "lines", then a definition paragraph is
created, giving the effect you see.
If you have created your file using an editor that doesn't output line breaks
then only long paragraphs will appear to the program as 2 or more "lines".
In such cases only the longer paragraphs will be detected as "definition
paragraphs", the rest are detected as "definition lines", even though they're
displayed in a browser as many lines. If you view the file in NotePad
you'll see how the program sees it.
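The kind of test involved can be sketched as follows (an illustration only, covering just the colon case; the real analysis also weighs context and is policy-driven):

```python
def is_definition_line(line):
    """A short leading term followed by a colon, with text after it,
    looks like a definition (e.g. "text: the body of the page")."""
    head, sep, rest = line.strip().partition(':')
    return bool(sep) and bool(rest.strip()) and 0 < len(head.split()) <= 3
```

Anything matching a rule like this gets the definition treatment described above.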
To stop this you have a number of options.
(1) _Analysis policies -> What to look for -> Look for definitions_
switch this off. This will stop *all* attempts to spot "definition"
lines
(2) _Analysis policies -> Analysis -> recognize colon (:) characters_
switch this off. This will stop anything with a colon (:) being
recognized as a definition.
(3) _Output policies -> Style -> Use <DL> markup for paragraphs_
disable this. The definitions will still be recognized, but the
<DL> markup won't be used.
3.1.4 Why are some of my words being broken in two?
Sometimes AscToHTM will produce HTML with words broken - usually over two
lines. This can happen if your text file has been edited using a program
(like NotePad) that doesn't place line breaks in the output.
AscToHTM is line-orientated (see 2.1.2). Programs like NotePad place an entire
paragraph on a single "line", or on lines of a fixed length (e.g. 1000
characters).
AscToHTM places an implicit space at the end of each line it reads. This
is to ensure you don't get the word at the end of one line merged with that
at the start of the next.
However, in files with fixed length "lines", large paragraphs will be broken
arbitrarily, with the result that a space (and possibly a <BR>) will be
inserted into the middle of a word.
You can avoid this by breaking your text into smaller paragraphs, passing
your file through an editor that wraps differently prior to conversion, or
selecting any "save with line breaks" option you have.
3.1.5 Why am I getting line breaks in the middle of my text?
The software will add a line break to "short" lines, or - sometimes - to
lines with hyperlinks in them.
You can edit your text to prevent the line being short, or you can use
policies to alter the calculation of short lines. Use the "[Pol man]"
to read about the following policies
- "Add <BR> to lines with URLs"
- "Look for short lines"
- "Short line length"
- "Page Width"
3.1.6 Why isn't the software preserving my line structure?
Do you mean line structure, or do you really mean paragraph structure?
The program looks for "short lines". Short lines can mark the last line
in a paragraph, but more usually indicate an intentionally short line.
The calculation of what is a short line and what isn't can be complex, as
it depends on the length of the line, compared to the estimate with of the page.
You have a number of options :-
- enable the "Preserve line structure" policy. This will cause your
output to exactly match the line structure of your input.
- disable the search for short lines using the
_Analysis policies -> What to look for_ tab
- explicitly set the page width and/or short line length using the
_Analysis policies -> analysis_ tab.
See also "[[GOTO how do I preserve one URL per line?]]"
3.1.7 Why am I getting lots of white space?
Usually because you had lots of white space in your original document. If
that is the case, then you can set the policy
Ignore multiple blank lines : Yes
to reduce this effect.
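The effect of this policy can be sketched as (illustrative Python, not the program's own code):

```python
def collapse_blank_lines(lines):
    """Runs of two or more blank lines collapse to a single blank
    line, as "Ignore multiple blank lines : Yes" does."""
    out, prev_blank = [], False
    for line in lines:
        blank = not line.strip()
        if blank and prev_blank:
            continue  # drop the repeated blank line
        out.append(line)
        prev_blank = blank
    return out
```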
Some people complain that there are blank lines between paragraphs, or between
changes in indentation. Often this is the vertical spacing inserted by
default in HTML. This can only be controlled in later versions of HTML
which support [[TEXT HTML 4.0]] and Cascading Style Sheets (CSS).
Occasionally certain combinations of features lead to an extra line of space.
3.1.8 What's the largest file anyone's ever converted with AscToHTM?
Well, at the time of writing, I know of a 56,000-line file (3Mb) which was converted
into a single (4Mb) HTML file. Of course, it was also converted into a suite
of 300 smaller, linked files weighing in at 5Mb of HTML.
This file represented 1,100 pages when printed out.
I *do* sometimes wonder if anyone ever reads files that big though.
3.1.9 Does the software support Hebrew letters / Japanese / right-to-left alignment?
Since version [[text 4.1]] the short answer is "probably".
Although the software has no ability to *understand* documents written this
way, and was designed to cope with the ASCII character set, from version [[text 4.0]]
onwards it is possible to manually set the "charset" used. This tells the
HTML browser how to interpret the characters. Whether or not you see the
page correctly then depends on the browsers and fonts installed on the viewer's
machine.
In version [[text 4.1]] some auto-detection of character sets has been added.
This can usually detect which character encoding is being used. You can switch
this behaviour off should you wish, and you can also set the correct charset
by hand.
See the policies "Character encoding" and "Auto-detect character encoding".
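As an illustration of the general idea only (this is not the product's actual detection logic; the function name and the fallback choice are invented for this sketch), a very crude auto-detection might try UTF-8 first and fall back to a single-byte encoding:

```python
# Illustrative only -- not the product's real detection logic.
# guess_charset() and the ISO-8859-1 fallback are invented for this sketch.

def guess_charset(raw: bytes) -> str:
    """Bytes that decode cleanly as UTF-8 almost certainly are UTF-8;
    anything else falls back to a single-byte encoding, in which every
    byte sequence is decodable."""
    try:
        raw.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        return "iso-8859-1"

hebrew_utf8 = "שלום".encode("utf-8")          # valid multi-byte UTF-8
latin1_only = "café".encode("iso-8859-1")     # a lone 0xE9 byte is invalid UTF-8
```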
3.1.10 Why does the program hang after a conversion?
Under Windows the software usually tries to display the results files in
your browser or viewer of choice. To prevent multiple instances of the
browser being launched, DDE is used. DDE is a Windows mechanism that allows
requests to be passed from one program to another; in this case the software
is asking the browser to display the HTML just created.
Some users have reported problems with DDE - especially under Windows
Millennium. When this occurs any program - including AscToHTM - will hang
whenever it attempts to use DDE... you notice it first with AscToHTM because
it uses DDE all the time. When this happens you will need to use
the Task Manager to kill the program.
You can solve this problem by using the _Settings -> Viewers for results_
menu option to disable the use of DDE.
From version 4 onwards the software will detect when this has happened, and
will disable its use of DDE next time it is run. You can re-enable this
(e.g. after a reboot has cleared the problem) under the _Settings->Viewers_
menu option.
Note, this is a workaround and not a solution. When DDE stops working on
your system other programs will still have problems, e.g. when you click on a
hyperlink inside your email client.
Sadly I don't know a solution for the DDE problem. Sometimes rebooting
helps - initially at least - sometimes stopping a few applications helps.
Sometimes it doesn't. :-(
3.2 What the software _can't_ do
3.2.1 Why doesn't it convert Word/Wordperfect/RTF/my favourite wp documents?
Because it wasn't designed to. No, really.
The software is designed to convert *ASCII* text into HTML - that is, plain,
unformatted documents. Word and other wp packages use binary formats that
contain formatting codes embedded in the text (or in some cases the text is
embedded in the codes :-).
Even RTF, which is a text format, is so heavily laden with formatting
information that it could hardly be perceived as normal text (look at it in
Notepad and you'll soon see what I mean).
Why the omission? Well, like I said, that was *never* the intention of this
program. I always took the view that, in time, the authors of those wp
packages would introduce "export as HTML" options that would preserve all
the formatting, and in general this is what has happened. To my mind
writing such a program is "easy".
My software tackles the much more difficult task of inferring structure
where none is explicitly marked. In other words trying to "read" a plain
text file and to determine the structure intended by the author.
See also "[[GOTO Do you have a html-to-text converter, rtf-to-html converter etc?]]".
3.2.2 How can I use DDE with Netscape 6.0?
You can't. Unlike Netscape versions up to and including 4.7, Netscape 6.0
doesn't support DDE in its initial release under Windows.
3.2.3 Can I use AscToHTM to build a web site with a shopping cart?
By itself, no.
AscToHTM can only really produce relatively "static", mostly-text web pages.
To add any dynamic contents and graphics you'd effectively need to add the
relevant HTML yourself, so the answer is essentially "no".
Adding a shopping cart is actually fairly tricky. You either have to
install the software yourself, or sign up with an ISP that will do this
for you. Most such systems require a database (of items being sold).
Having not dealt much with such systems myself, I can't really advise
on a *web authoring* tool (which is what AscToHTM is) that would integrate
seamlessly with a shopping cart system.
My advice would be to identify an ISP that offers shopping cart
functionality and see what methods they offer for web authoring.
I wish you luck.
3.2.4 How do I interrupt a conversion?
At present you can't. The Windows version won't respond to input while
a conversion is in progress, meaning that its windows will not refresh.
Normally this isn't a problem, but in large conversions this can be a
little disconcerting.
Fixing this is on the "to do" list.
3.3 Tables
3.3.1 How does the program detect and analyse tables?
Here's an overview of how the software works; it should give you a flavour
of the complexity of the issues that need to be addressed.
The software first looks for pre-formatted regions of text. It does this by
1) Spotting lines that are clearly formatted, looking for large amounts of
white space and table-like characters such as '|' and '+'. It may also look
for code-like lines and diagram-like lines according to the policies set.
2) Each time a heavily formatted line is encountered an attempt is made
to extend the preformatted region by "rolling it out" to adjacent, not so
clearly formatted lines
3) This "roll out" process is stopped whenever it encounters a line that is
clearly not part of the formatted region. This might be a section heading
or a set of multiple blank lines (the default is 2).
Once a preformatted region is identified, analysis is performed to see whether this
is a table, diagram, code sample or something else. This decision depends on
4) The mix of "graphics" characters as opposed to "text" characters
5) The presence of "code-like" indicators like curly brackets, semi-colons
and "++" and other special character sequences. Note, the software
doesn't understand code syntax, it just recognises commonly used
character combinations.
6) How well the data can be fitted into columns of a table (below)
If nothing fits then this text is output "as normal", except that the line structure
is preserved to hopefully retain the original meaning.
If the software decides a table is possible, it
7) Characterizes the contents of each character position. So for example
a character position that contains mostly blank characters on each line
is a good candidate for a column boundary.
8) Infers from the character positions the likely column boundaries
Once a tentative set of column boundaries has been identified, the following steps
are repeated
9) Place all text into cells using the current column boundaries
10) Measure how "good a fit" the text is to the columns, looking for values
that span column boundaries, or columns that are mostly "empty"
11) Eliminate any apparently "spurious" columns. For example "empty" columns
may get merged with their neighbours.
Finally, having settled on a column structure the software
12) Tries to identify the table header, preferably by detecting a horizontal
line near the top of the table.
13) Tries to work out column alignments etc. If the cell contents are numeric
the cell will be right aligned, otherwise the placement of the text
compared to the detected boundaries will be observed
14) Identifies how many lines go into each row. If blank lines or
horizontal rules are present, these may be taken as row boundaries.
15) Places all text into cells, using the configuration found.
Naturally any one of these steps can go wrong, leading to less than
perfect results.
The program has mechanisms (via policies and preprocessor commands) to
a) Influence the attempt to look for tables
b) Influence the attempt to extend tables (steps (1)-(3))
c) Influence the decision as to what a preformatted region is (steps (4)-(6))
d) Influence the column analysis (steps (7)-(11))
e) Influence the header size and column alignment (steps (12)-(15))
Read the table sections in the "[Tag Man]" and "[Pol man]" for more details.
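To give a flavour of steps (7)-(11), here is a toy sketch in Python. It is purely illustrative - the real analysis is far more elaborate, and all names and thresholds here are invented. It finds candidate column gutters as runs of character positions that are blank on every line, then splits each row into cells at those gutters:

```python
# A toy version of steps (7)-(11) -- purely illustrative, with invented
# names and thresholds; the real analysis is far more elaborate.

def blank_runs(rows, min_sep=2):
    """Find (start, end) spans of at least min_sep consecutive character
    positions that are blank on every line -- candidate column gutters.
    min_sep plays the role of the "Minimum column separation" policy."""
    width = max(len(row) for row in rows)
    padded = [row.ljust(width) for row in rows]
    blank = [all(line[i] == " " for line in padded) for i in range(width)]
    runs, i = [], 0
    while i < width:
        if blank[i]:
            j = i
            while j < width and blank[j]:
                j += 1
            if j - i >= min_sep:
                runs.append((i, j))
            i = j
        else:
            i += 1
    return runs

def split_into_cells(row, runs):
    """Cut one row into cells at the given gutters (step 9)."""
    cells, start = [], 0
    for gap_start, gap_end in runs:
        cells.append(row[start:gap_start].strip())
        start = gap_end
    cells.append(row[start:].strip())
    return cells

rows = [
    "Hole No.   X        Y",
    "1          3.2500   5.0150",
    "2          1.2500   3.1250",
]
gutters = blank_runs(rows)
```

With min_sep=2 the single space inside "Hole No." is not mistaken for a gutter, so the header splits into the same three cells as the data rows - the same idea as the "Minimum column separation" policy.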
3.3.2 Why am I getting tables? How do I stop it?
The software will attempt to detect regions of "pre-formatted" text. Once
detected it will attempt to place such regions in tables, or if that fails
sometimes in <PRE> ... </PRE> markup.
Lines with lots of horizontal white space or "table characters" (such as "|",
"-", "+") are all candidates for being pre-formatted, especially where
several such lines occur together.
This often causes people's .sigs from email to be placed in a table-like
structure.
You can alter whether or not a series of lines is detected as preformatted
with the policies
Look for preformatted text : No
Minimum automatic <PRE> size : 4
The first disables the search for pre-formatted text completely. The second
policy states that only groups of 4 or more lines may be regarded as
preformatted. That would prevent most 3-line .sigs being treated that way.
If you have pre-formatted text, but don't want it placed in tables (either
because it's not tabular, or because the software doesn't get the table analysis
quite right), you can prevent pre-formatted regions being placed in tables via
the policy
Attempt TABLE generation : No
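As a rough illustration of the sort of test involved (not the actual code; the character set and thresholds are invented for this example):

```python
import re

# A rough illustration of the "does this line look pre-formatted?" test.
# Not the actual code; the character set and thresholds are invented.

def looks_preformatted(line):
    table_chars = sum(line.count(c) for c in "|+")
    wide_gaps = len(re.findall(r"  +", line.strip()))   # runs of 2+ spaces
    return table_chars >= 2 or wide_gaps >= 2

sig_line = "-- John | JafSoft | www.jafsoft.com"
prose_line = "This is an ordinary sentence of running text."
```

This is also why a .sig with a couple of '|' characters in it is such a strong pre-formatted candidate.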
3.3.3 Why am I _not_ getting tables?
First read "[[GOTO how does the program detect and analyse tables?]]" for an
overview of how tables are detected.
If you're not getting tables this is either because they are not being detected,
or because, having been detected, they are deemed not to be "table-like". Look
at the HTML code to see if there are any comments around your table indicating
how it's been processed.
If the table is not being detected this could be because
- the lines don't look table-like. Try increasing the white space, or
adding a vertical bar '|' as your column separator.
- some lines are table-like, but the "roll out" isn't including the adjacent
less formatted lines. Try changing the policy *Table extending factor*
- The detected "table" is too small compared to the value in the policy
*Minimum automatic <PRE> size*.
If all this fails, edit the source to add preprocessor commands around the table
as follows
$_$_BEGIN_PRE
$_$_BEGIN_TABLE
...
...(your table lines)
...
$_$_END_TABLE
$_$_END_PRE
3.3.4 Why do my tables have the wrong column structure?
First read "[[GOTO how does the program detect and analyse tables?]]" for an introduction
to how table columns are analysed.
The short answer is "the analysis went wrong". *Why* it went wrong is almost
impossible to answer in a general way. Some things to consider
- Was the table extent correctly calculated? If adjacent lines were
wrongly sucked into the table this will affect the analysis. Try
adding blank lines around the table, adjusting the "Table extending factor"
policy, or adding BEGIN_TABLE/END_TABLE preprocessor tags to correct any
errors in calculating the extent.
Often the table extent is correct, but the analysis of the table has gone
wrong.
- Check the text doesn't mix tabs and spaces together in an inconsistent
manner. Either set the "Tab size" policy, or replace all tabs by spaces.
- Look to see if some data just "happens" to line up the blanks. In some
small tables this can happen. Consider adjusting the
"Minimum column separation" policy to a value greater than 1.
- Consider adjusting the "Column merging factor" policy to reduce/increase
the number of columns produced for the table.
If all this fails you can explicitly *tell* the software what the table layout
is by using either the TABLE_LAYOUT preprocessor command, or the "Default TABLE
layout" policy. Only use the policy if all tables in the same source file have
the same layout.
3.3.5 Where did all my table lines go?
The software removed them because it thought they would look wrong if left
as characters. The lines are usually replaced by a non-zero BORDER value
and/or some <HR> tags placed in cells.
3.3.6 How can I get the program to recognize my table header?
One tip. If you insert a line of dashes after the header like so...
$_$_BEGIN_PRE
Basic Dimensions
Hole No. X Y
-------------------------
1 3.2500 5.0150
2 1.2500 3.1250
etc.....
$_$_END_PRE
The program *should* recognize this as a heading, and modify the HTML
accordingly (placing it in bold).
Alternatively you can tell the program (via the policy options or preprocessor
commands) that the file has 2 lines of headers.
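The dash-line trick can be sketched as follows (illustrative only - invented names, not the program's real header detection): a run of dashes near the top of the table marks everything above it as header rows.

```python
# Illustrative sketch of the dash-line trick -- invented names, not the
# program's real header detection.

def is_rule_line(line):
    """A line made up solely of 3 or more '-' or '=' characters."""
    s = line.strip()
    return len(s) >= 3 and set(s) <= set("-=")

def header_row_count(rows):
    """If a rule appears in the first few rows, everything above it
    is taken to be the table header."""
    for i, row in enumerate(rows[:4]):
        if is_rule_line(row):
            return i
    return 0

table = [
    "Hole No.    X        Y",
    "-------------------------",
    "1           3.2500   5.0150",
]
```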
3.3.7 Why am I getting strange COLSPAN values in my headers?
(see the example table in 3.3.6)
The spanning of "Basic Dimensions" over the other lines can be hit and
miss. Basically, if you have a space where the column gap is expected the
text will be split into cells; if you don't, the text will be placed in
a cell with a COLSPAN value that spans several columns.
For example
$_$_BEGIN_PRE
| space aligns with column "gap"
v
Basic Dimensions
Hole No. X Y
-------------------------
1 3.2500 5.0150
2 1.2500 3.1250
etc.....
$_$_END_PRE
In this case you'd get "Basic" in column 1 and "Dimensions" spanning columns
2 and 3. If you edit this slightly as follows then the "Basic Dimensions" will
span all 3 columns
$_$_BEGIN_PRE
| space no longer aligns with column "gap"
v
Basic Dimensions
Hole No. X Y
-------------------------
1 3.2500 5.0150
2 1.2500 3.1250
etc.....
$_$_END_PRE
It's a bit of a black art.
Sometimes when the table is wrong, it's a good idea to set the BORDER size
to 0 (again via the policy options) to make things look not so bad. It's
a fudge, but a useful one to know.
3.4 Headings
3.4.1 How does the program _recognize_ headings?
The program can attempt to recognize five types of headings:
*Numbered headings*. These are lines that begin with section
numbers. To reduce errors, numbers must be broadly in sequence
and headings at the same level should have the same indentation.
Words like "Chapter" may be before the number, but may confuse
the analysis when present.
*Capitalised headings*. These are lines that are ALL IN UPPERCASE.
*Underlined headings*. These are lines which are followed by
a line consisting solely of "underline" characters such as underscore,
minus, equals etc. The length of the "underline" line must closely
match the length of the line it is underlining.
*Embedded headings*. These are headings embedded as the first
sentence of the first paragraph in the section. The heading will
be a single all-UPPERCASE sentence. Unlike the other headings, the
program will place these as bold text, rather than using heading
markup. You will need to manually enable the search for such headings;
it is not enabled by default.
*Key phrase headings*. These are lines in the source file that
begin with user-specified words (e.g. "Chapter", "Appendix" etc.)
The list of words and phrases to be spotted is case-sensitive and
will need to be set via the "Heading key phrases" policy.
The program is biased towards finding numbered headings, but will allow
for a combination. It's quite possible for the analysis to get confused,
especially when
- headings are centred, rather than at fixed indents. The policy
"Check indentation for consistency" should be disabled if this is the
case.
- headings include the words Chapter, Part etc. You should consider
using the "Heading key phrase" policy and disabling the search for
numbered headings in such cases.
- The numbering system repeats (e.g. Part I, 1,2,3,... Part II,
1,2,3...). Again, consider using "key phrase" and/or underlined
heading detection as an alternative.
- The file has numbered lists at a similar indentation to the numbered
sections. If possible move your numbered lists a few characters
to the right of the indentation that headings are expected at.
- The file has a large number of capitalised non-heading lines.
Manually disable the search for capitalised headings if this happens.
- The numbering system is "exotic" (e.g. II.3.g)
To tell if the program is correctly detecting the headings
a) Look at the HTML to see if <H1>, <H2> etc. tags are being added
to the correct text.
b) If the headings are wrong, check the analysis policies are being set
correctly by looking at the values shown under
_Conversion Options -> Analysis policies -> headings_
after the conversion.
Depending on what is going wrong do one or more of the following :-
i) Adjust the headings policy (e.g. to disable capitalised headings)
ii) Edit the source to replace centred headings by headings at
a fixed indentation.
iii) Edit the source so that numbered lists are at a different
indentation to numbered sections.
iv) If your numbering system is too exotic, edit your source so
that all the headings are "underlined" and get the program
to recognize underlined, rather than numbered headings.
v) If possible consider the use of the "Heading key phrase" policy
instead.
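As a rough sketch of the sequence check applied to numbered headings (purely illustrative - the real program also checks indentation, and every name here is invented):

```python
# Sketch of the "is this the next heading?" test -- purely illustrative.
# The real program also checks indentation; every name here is invented.

def parse_number(line):
    """Return [3, 2] for a line like '3.2 Tables', or None."""
    head = line.strip().split()[0] if line.strip() else ""
    parts = head.rstrip(".").split(".")
    if parts and all(p.isdigit() for p in parts):
        return [int(p) for p in parts]
    return None

def is_next_in_sequence(prev, cur):
    if cur is None:
        return False
    # Same level, last component incremented by one...
    if len(cur) == len(prev) and cur[:-1] == prev[:-1] and cur[-1] == prev[-1] + 1:
        return True
    # ...or the first heading of the next level down...
    if len(cur) == len(prev) + 1 and cur[:-1] == prev and cur[-1] == 1:
        return True
    # ...or stepping back up to a shallower level.
    if len(cur) < len(prev) and cur[:-1] == prev[:len(cur) - 1] \
            and cur[-1] == prev[len(cur) - 1] + 1:
        return True
    return False
```

Note how, after section 3.11, a stray list line beginning "4" passes this test as the next chapter - exactly the failure mode described in "[[GOTO why are the numbers of my headings coming out as hyperlinks?]]".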
3.4.2 Why are my headings coming out as hyperlinks?
This is a failure of analysis. The program looks for a possible contents list
at the top of the file before the main document (sometimes in the first
section).
If your file has no contents list, but the program wrongly expects one, then
as it encounters the headings it will mark these up as contents lines.
To prevent this, set the analysis policy
Expect contents list : No
to "no". Or add a preprocessor line to the top of your file as follows
$_$_BEGIN_PRE
$_$_CHANGE_POLICY Expect contents list : No
$_$_END_PRE
3.4.3 Why are the numbers of my headings coming out as hyperlinks?
Either a failure of analysis, or an error in your document. The software checks
headings "obey policy" and are in sequence. If you get your numbering sequence
wrong, or if you place the heading line at a radically different indentation
to all the others, then the software will reject this as a heading line, in
which case the number may well be turned into a hyperlink.
If it's an error in your document, fix the error.
For example, a common problem is numbered lists inside sections. If the
list numbers occur at the same level of indentation as the level 1 section
headings, then eventually a number on the list will be accepted as the
next "in sequence" header. For example in a section number [[TEXT 3.11]], any list
containing the number 4 will have the "4" treated as the start of the
next chapter. If section "3.12" is next, the change in section number from 4
will be rejected as "too small", and so all sections will be ignored until
section [[TEXT 4.1]] is reached.
The solution here is to edit the source and indent the numbered list so that
it cannot be confused with the true headers. Alternatively change it to an
alphabetic, roman numeral or bulleted list.
Another possible cause is that the software hasn't recognized this level of
heading as being statistically significant (e.g. if you only
have 2 level 4 headings (n.n.n.n) in a large document). In this case
you'll need to correct the headings policy, which is a sadly messy affair.
3.4.4 Why are various bullets being turned into headings, and the headings ignored?
The software can have problems distinguishing between
1 This is chapter one
and
1) This is list item number one.
To try and get it right it checks the sequence number and the indentation
of the line. However problems can still occur if a list item with the right
number appears at the correct indentation in a section.
If possible, try to place chapter headings and list items at different
indentations.
In extreme cases, the list items will confuse the software into thinking they
are the headings. In such a case you'd need to change the policy file
to say what the headings are, with lines of the form
We have 2 recognized headings
Heading level 0 = "" N at indent 0
Heading level 1 = "" N.N at indent 0
(this may change in later versions).
3.4.5 Why are lines beginning with numbers being treated as headings?
The software can detect numbered headings. Any lines that begin with
numbers are checked to see if they are the next heading. This check
includes checking the number is (nearly) in sequence, and that the line is
(nearly) at the right indentation.
If the line meets these criteria, it is likely to become the next heading,
often causing the *real* heading to be ignored, and sometimes completely
upsetting the numbering sequence.
You can fix this by editing the source so that the "number" either occurs
at the end of the previous line, or has a different indentation to that
expected for headings.
3.4.6 Why are underlined headings not recognized?
The software prefers numbered headings to underlined or capitalised headings.
If you have both, you may need to switch the underlined headings on via
the policy
Expect underlined headings : Yes
3.4.7 Why are only _some_ of my underlined headings not recognized?
If the program is looking for underlined headings (see "[[GOTO Why are underlined headings not recognized?]]")
then the most likely reason is that the "underlining" is of a radically different length
to the line being underlined. Problems can also occur for long lines that
get broken.
Edit your source to
- place the whole heading on one line
- make the underlining the *same* length
3.4.8 How do I control the header level of underlined headings?
The level of heading associated with an underlined heading depends on the
underline character as follows:-
'****' level 1
'====','////' level 2
'----','____','~~~~' level 3
'....' level 4
The actual *markup* that each heading gets may depend on your policies.
In particular level 3 and level 4 headings may be given the same size
markup to prevent the level 4 heading becoming smaller than the text it
is heading. However the _logical_ difference will be maintained, e.g.
in a generated contents list, or when choosing the level of heading at which
to split large files into many HTML pages.
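The mapping above can be sketched as a simple lookup (illustrative only; the names are invented):

```python
# The mapping above as a lookup -- illustrative sketch, invented names.

UNDERLINE_LEVELS = {"*": 1, "=": 2, "/": 2, "-": 3, "_": 3, "~": 3, ".": 4}

def underline_level(underline):
    """Heading level implied by an 'underline' line, or None."""
    s = underline.strip()
    if len(s) >= 3 and len(set(s)) == 1:
        return UNDERLINE_LEVELS.get(s[0])
    return None
```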
3.4.9 Why are only the first few headings working?
A couple of possible reasons :-
- a numbered list is confusing the software. This is the same
problem as "[[GOTO why are the numbers of my headings coming out as hyperlinks?]]"
- Some of your headings are "failing" the checks applied. See the
discussion in "[[GOTO how does the program recognize headings?]]"
One of the reasons for "failure" is that - for consistency - headings must
be in sequence and at the same indentation. This is an attempt to prevent
errors in documents that have numbers at the start of a line by chance being
treated as the wrong headings.
If some headings aren't close enough to the calculated indent then
they won't be recognised as headings. Once a few headings have been
discarded, later headings that *are* at the correct indentation may be
discarded as being "out of sequence".
If you're authoring from scratch then the easiest solution is to edit
all the headings to have the same indent. Alternatively disable the
policy "Check indentation for consistency".
3.5 Hyperlinks
3.5.1 Why doesn't it correctly parse my hyperlinks?
The software attempts to recognize all URLs, but the problem is that - especially
near the end of the URL - punctuation characters can occur. The software then
has difficulty distinguishing a comma separated list of URLs from a URL with
a series of commas in it (as beloved at C|Net).
This algorithm is being improved over time, but there's not much more you
can do than manually fix it, and report the problem to the author who will
pull out a bit more hair in exasperation :)
3.5.2 Why doesn't it recognize my favourite newsgroup?
To avoid errors the program will only recognize newsgroups in the "big 7"
hierarchies. Otherwise filenames like "command.com" might become unwanted
references to fictional newsgroups.
This means that uk.telecom won't be recognized, although if you place
"news:" in front of it like this news:uk.telecom then it is recognized.
If you want to make "uk." recognized as a valid news hierarchy, then set the
policy
recognized USENET groups : uk
Then any word beginning "uk." may become a newsgroup link.
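The hierarchy check might be sketched like this (illustrative only - not the real code; the regex and names are invented for this example):

```python
import re

# Illustrative sketch of the hierarchy check -- not the real code; the
# regex and names are invented for this example.

BIG7 = {"comp", "misc", "news", "rec", "sci", "soc", "talk"}

def is_newsgroup(word, extra_hierarchies=()):
    """Only dotted names whose first component is a recognized hierarchy
    are treated as newsgroups; this stops filenames like 'command.com'
    becoming links."""
    hierarchies = BIG7 | set(extra_hierarchies)
    dotted = re.fullmatch(r"[a-z0-9-]+(\.[a-z0-9-]+)+", word)
    return bool(dotted) and word.split(".")[0] in hierarchies
```

Passing extra_hierarchies=("uk",) mimics the "recognized USENET groups : uk" policy above.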
3.5.3 Why are only some of my section references becoming hyperlinks?
The program will only convert numbers that match known numbered sections
into hyperlinks. If the number is a genuine section heading, then the
chances are that this level of heading has not been detected. This has
happened in large documents which contained only 2 level 5 headings. In
such documents you may need to manually add the extra level to your policy file.
Another limit is that the program won't convert level 1 heading references,
because the error rate is usually too high. For example if I say
"1, 2, 3" it's unlikely I want these to become hyperlinks to chapters
1, 2 and 3.
3.5.4 Why are some numbers becoming hyperlinks?
In a numbered document numbers of the form n.n may well become hyperlinks
to that section of the document. This can cause "Windows [[TEXT 3.1]]" to become
a hyperlink to section 3.1 if such a section exists in your document.
You can either insert some character (such as "V" to make "V3.1"), place the
number inside a protective pre-processor TEXT tag as follows
[[OT]]TEXT [[TEXT 3.1]][[CT]]
or disable this feature entirely via the policy
Cross-refs at level : 3
(which means only "level 3" headings such as n.n.n will be turned into links),
or
Cross-refs at level : (none)
which should disable the behaviour.
3.5.5 Why are some long hyperlinks not working?
The software will sometimes break long lines to make the HTML more readable. If
this happens in the middle of a hyperlink, the browser reads the end of line
as a space in the URL.
You can fix this by editing the output text so that the HREF="" part
of the file is all on the same line.
This "feature" may be fixed in later versions of AscToHTM.
3.5.6 How do I preserve one URL per line?
Some files contain lists of URLs, with one URL per line. By default the
software will not normally preserve this structure because long lines are
usually concatenated into a single paragraph.
You can change this behaviour using the option on the
_Output policies -> Hyperlinks_ policy sheet.
See also "[[GOTO why isn't the software preserving my line structure?]]"
3.6 Policy files
3.6.1 How many policies are there? Where can I read more about individual policies?
The first time I looked it was nearly 200; more recently the number has been approaching 250.
They kind of sneak up on you, I guess. The "[Pol man]" gives a pretty
comprehensive description of what each one does and where it can be found.
Last time I checked that file was 5000 lines of text before conversion to HTML.
People complain that there are too many policies, but then they say "couldn't
you add an option to ...", and so it goes. Organizing these policies in a
logical manner is a fairly difficult problem, and if anyone has any bright
ideas I'm listening. In recent versions I added overview policies to make
things easier to locate or to switch off en masse.
3.6.2 My policy file used to work, but now it doesn't. Why?
Make sure you're using an "incremental" policy file, rather than a full
one. You can do this by viewing the .pol file in a text editor. An
"incremental" policy file will only contain lines for the policies you've
changed. A full policy file will contain all possible policies.
If you load a "full" policy file you prevent the program intelligently
adjusting to the particular file being converted. If this happens either
edit out the lines you don't want from your policy file, or reset the
policies to their defaults and create a new policy file from scratch.
NOTE: There used to be a bug whereby sometimes a policy file would inadvertently
get saved as a "full" file. That should be fixed now.
3.6.3 xxxx Policy is not taking effect. What shall I do?
(see 1.7)
3.7 Bullets and lists
3.7.1 Why is the indentation wrong on follow-on paragraphs?
The program can't distinguish between indented paragraphs and paragraphs
that are intended as follow-on paragraphs from some bullet point or list
item.
This means that whilst the first paragraph (the one with the bullet point)
is indented as a result of being placed inside appropriate list markup,
the second and subsequent paragraphs are just treated as indented text.
The bullet point will be indented one level deeper than the text position
of the bullet. The follow-on paragraph will be indented according to its
own indentation position compared to the prevailing indentation pattern.
Ideally this will be one level deeper than the text position
of the bullet.
Occasionally the two result in different indentations. The solutions are
either to
a) Review your *indent position(s)* policy with a view to adjusting
the values to give the right amount of indentation to the follow-on
paragraphs. Sometimes adding an extra level to match the indentation
of the follow-on paragraph is all that's necessary.
b) Edit your source text slightly, adjusting the indent of either the
list items or follow-on paragraphs until the two match.
3.7.2 Why is the numbering wrong on some of my list items?
HTML doesn't allow list numbering to be marked up explicitly. Instead you
can only use a START attribute in the <OL> tag to get the right first number,
which is then incremented each time a <LI> tag is seen.
Some browsers don't implement the START attribute, and so they always restart
numbering at 1.
There's not much I can do about this problem.
I've also seen a bug in Opera V3.5 where any tag placed between the <OL>
and the <LI> causes the numbering to increment. That shouldn't
be a problem here, as that's illegal HTML markup - and we try very hard not
to generate any of that!
3.7.3 Some of my text has gone missing. What happened?
There's a bug (in Opera) where a tag placed between the <OL> and the <LI> tag
causes all that text to not be displayed.
That shouldn't be a problem here, as that's illegal HTML markup - and we
try very hard not to generate any of that!
If there's any other problem of this sort please email *infosupport.com*
with details.
3.8 Contents List generation
3.8.1 How do I add a contents list to my file?
There are a number of ways:-
- If the file already has a contents list this may be detected
if the sections are numbered, and the contents lines will be turned
into links to the sections concerned.
- You can force the addition of a contents list using the policies
under the menu at
_Conversion Options -> Output Policies -> Contents List_
A hyperlinked contents list will be generated from the headings that
the program detects. This list will be placed at the top of the first
file.
- If you don't want the generated list to be placed at the top of the
file, insert the preprocessor command $_$_CONTENTS_LIST at the
location(s) you want. This command takes arguments that allow a
limited number of formatting options. It can also be limited in
scope, so you can, if you wish, add a $_$_CONTENTS_LIST to each
chapter in your document.
3.8.2 Why doesn't my contents list show all my headings?
First read "[[GOTO how does the program recognize headings?]]".
If you're generating a contents list from the observed headings, then any
missing headings are either because
a) The program didn't recognize the headings
b) The policy *Maximum level to show in contents* has been set to
a value that excludes the desired heading.
If you're converting an in-situ contents list, then only (a) is likely to
apply, in which case you need to ensure the program recognizes your headings.
3.8.3 Some of my contents hyperlinks don't work!
There used to be a problem whereby the software would add hyperlinks to sections
that didn't exist, or would point to the wrong file when a large file was being
split into many smaller files.
Both problems should now be fixed, so if you encounter this problem, contact
*infosupport.com*.
3.9 Emphasis
3.9.1 Why didn't my emphasis markup work?
Emphasis markup can be achieved by placing asterisks (*) or underscores (_)
in pairs around words or phrases. The matching pair can be over a few
lines, but cannot span a blank line. Asterisks and underscores can be nested.
Asterisks generate *bold markup*, underscores generate _italic markup_,
and combining these generates _*bold, italic markup*_.
If you wrap a phrase in underscores, and replace all the spaces
by underscores [[TEXT _like_this_]] then the result will be underlined
_like_this_ and not in italics.
The algorithm copes reasonably well with normal punctuation, but if you use some
unanticipated punctuation, it may not be *recognized*!&%@!
You can have a _phrase that spans a couple of lines that contains
*another phrase of a different type* in the middle of it_, but you can't
have two phrases of the same type nested that way. Be reasonable :-)
Phrases that span a blank line are not permitted. You'll need to end the
markup before the blank line, and re-start it afterward. This is to reduce
the chances of false matches.
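As a rough illustration of the kind of paired-delimiter matching described
above, here is a much-simplified sketch in Python. This is *not* the
program's actual algorithm - it ignores the punctuation handling, nesting
limits and the _joined_words_ rule - but it shows the two key ideas: pairs
match non-greedily, and no pair may span a blank line.

```python
import re

def mark_emphasis(text):
    """Toy version of paired emphasis markup: *bold* and _italic_.

    Pairs may span line breaks but never a blank line, which is
    enforced here by processing each blank-line-separated block
    separately.
    """
    def convert(block):
        # non-greedy matches; DOTALL lets a pair span line breaks
        block = re.sub(r'\*(.+?)\*', r'<B>\1</B>', block, flags=re.DOTALL)
        block = re.sub(r'_(.+?)_', r'<I>\1</I>', block, flags=re.DOTALL)
        return block

    # split on blank lines so no pair can ever span one
    return '\n\n'.join(convert(b) for b in re.split(r'\n\s*\n', text))
```

Because the asterisk pass runs first, a combined phrase such as
[[TEXT _*both*_]] comes out as italic markup wrapped around bold markup,
mirroring the combined emphasis described above.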
3.10 Link Dictionary
3.10.1 What is the Link Dictionary?
The link dictionary allows you to add hyperlinks to particular words or
phrases. You can choose the phrase to be matched, the text to be displayed
and the URL to be linked to.
This can help when building a site by converting multiple text files. For
example the whole www.jafsoft.com site is built from text files, and extensive
use of a link dictionary is made to add links from one page to another.
3.10.2 My links aren't coming out right. Why?
Known problems include
- if the "match text" matches part of the URL the program may get
confused. Try to keep them different.
- if the "match text" of one link is a substring of another the
program will get confused
- if a link is repeated on the same line, only the first occurrence
is converted (fixed post V3.0)
- if the "match text" spans two lines it won't be detected.
One tip is to place brackets round the [match text] in your source file...
this not only makes the chances of a false match less likely, but also makes
it clearer in the source files where the hyperlinks will be.
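To see why the first two problems arise, here is a toy illustration of
naive match-and-replace in Python. This is not the program's actual code,
and the link definitions are invented for the example; it simply shows how
a "match text" that is a substring of another link's URL can mangle the
generated anchor.

```python
def apply_links(line, links):
    """Naively replace each "match text" with a hyperlink."""
    for match, (display, url) in links.items():
        line = line.replace(match, '<A HREF="%s">%s</A>' % (url, display))
    return line

links = {
    "jafsoft": ("JafSoft", "http://www.jafsoft.com/"),
    "jaf":     ("John Fotheringham", "http://www.jafsoft.com/"),
}

# "jaf" is a substring of "jafsoft" and also appears inside the URL
# inserted by the first replacement, so the second pass rewrites part
# of the freshly inserted anchor and corrupts the HTML.
result = apply_links("visit jafsoft", links)
```

After the second pass the inserted URL itself contains a nested anchor,
which is exactly the sort of confusion the warnings above describe.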
3.10.3 I can't enter links into the Link Dictionary. What gives?
The Link Dictionary support in the Windows version of the software is a little
quirky. Apologies for that.
The way it should work is that you click the "add new link definition"
button.
I realize now that this is counterintuitive, and will probably address this
in the next release.
If you save your policy, each link appears as a line of the form
Link definition: "match text" = "display text" + "URL"
e.g.
Link definition: "jaf" = "John Fotheringham" + "http://www.jafsoft.com/"
The whole definition must fit on one line.
You may find it easier to open your .pol file in a text editor and add these
by hand.
3.11 Batch conversion
For more information see the section "Processing several files at once" in the
main documentation. The software supports wildcards, and console versions
are available to registered users which are better suited for batch conversions.
In the shareware versions no more than 5 files may be converted at once.
This limit is absent in the registered version (see
"[[GOTO what's the most files I can convert at one go?]]").
3.11.1 How do I convert a few files at once?
If you only want a few files converted, then the simplest way is to drag and
drop those files onto the program. You can either drag files onto the program's
icon on the desktop, or onto the program itself.
If you drag files onto the program's icon there is a limit with this approach
of around 10 files. This limit arises because the filenames are concatenated
to make a command string, and this seems to have a Windows-imposed limit of
255 characters. This problem may be solved in later versions.
The same limit doesn't seem to apply when you drag files onto the open
program.
Alternatively you can browse to select the files you want converting.
3.11.2 How do I convert _lots_ of files at once?
If you want to convert many files in the same directory, then just type in
a wildcard like "*.txt" into the name of the files to be converted.
Registered users of the software can get a console version of the software.
This can accept wildcards on the command line, and is more suited for batch
conversion, e.g. from inside Windows batch files (for example it won't
grab focus when executed).
If you want to convert many files in different directories, either invoke
the console version multiple times using a different wildcard for each
directory, converting one directory at a time, or investigate the use of
a steering command file when running from the command line. See the main
documentation for details.
3.11.3 What's the most files I can convert at one go?
The largest number of files converted at one time using the wildcard
function was reported to be around 2000. A week later someone contacted
me with around 3000 files to be converted. A few weeks after that someone
was claiming 7000. If you'd like to claim a higher number, let me know.
Theoretically the only limit is your disk space. The program operates
on a flat memory model so that the memory used is largely independent
of the number of files converted, or the size of the files being converted.
Such conversions are a testament to the program's stability and efficient
use of system resources. That said, if possible we recommend you break the
conversion into smaller runs to reduce your risks :-)
3.12 File splitting
3.12.1 Why isn't file splitting working for me?
The program can only split into files at headings it recognises (see
"[[GOTO how does the program recognize headings?]]"). You first need
to check that the program is correctly determining where the headings are,
and what type they are.
Headings can be numbered, capitalised or underlined. To tell if the
program is correctly detecting the headings
a) Look at the HTML to see if <H1>, <H2>, <H3> etc. tags are being added
to the correct text.
b) If the headings are wrong, check the analysis policies are being set
correctly. If necessary set them yourself under
_Conversion Options -> Analysis policies -> headings_
Once the headings are being correctly diagnosed, you can switch on file
splitting using the policies under
_Conversion Options -> output policies -> file generation_
Note that the "split level" is set to 1 to split at "chapter" headings, 2
to split at "chapter and major section" headings etc.
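As a sketch, the corresponding line in a policy file might look like the
one below. The exact policy spelling here is an assumption on my part;
save your policies to file (see the diagnostics chapter) to get the exact
text, since the program insists on exact spellings.

$_$_BEGIN_PRE
Split level : 2
$_$_END_PRE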
Underlined headings tend to start at level 2, depending on the underline
character (see "[[GOTO How do I control the header level of underlined headings?]]")
Hopefully this will give you some pointers, but if you still can't get it to
work, please mail a copy of the source file (and any policy file you're
using) to *infosupport.com* and I'll see what I can advise.
3.13 Miscellaneous questions
3.13.1 How do I suppress the Next/Previous navigation bar when splitting a large
document?
Prior to version 4 there was a bug which meant the policy "Add navigation bar"
was being ignored when splitting files (the only time it was used). This
is now fixed.
However also available in version 4 is a new "HTML fragments" feature that
allows you to customize some of the HTML generated by the software. This
includes the navigation bars so that, for example, if you wanted to suppress
just the top navigation bar, you could define the fragment NAVBAR_TOP to
be empty.
See "[[GOTO customizing the HTML created by the software]]" and the "[Tag Man]"
for more details.
3.13.2 Why am I getting regions of <PRE> text?
The software attempts to detect pre-formatted text in your files and, when
it finds some, attempts to turn these into tables. In many cases having
detected some pre-formatted text it recognises that it cannot make a table
and so resorts to using <PRE>...</PRE> markup instead. These <PRE>
sections actually work quite well for some documents, but in other
cases they would be better not handled this way.
Happily the solution is simple. On the menu go to
_Conversion Options -> Analysis policies -> What to look for_
and disable "pre-formatted regions of text".
3.13.3 Do you have an html-to-text converter, rtf-to-html converter etc?
No.
My converters convert from *plain ASCII text* into HTML or RTF. Their
"unique selling point" is that they intelligently work out the structure
of the text file.
However *other* people provide other converters.
There are a number of html->text converters, and in addition Netscape
has a good "save as text" feature. Or you can import the HTML into
Word and use Word's save as text features (although in my opinion these
are inferior to Netscape's).
If you visit my ZDNet listing at http://www.hotfiles.com/?000M96 and click
on the "related links" you'll see a number of converters listed.
There are at least two RTF-to-HTML converters called RTF2HTML and RTFtoHTML
and of course Word for Windows offers this capability (it doesn't suit
everyone though).
In fact, here are four products:-
RTFtoHTML can be found at http://www.sunpack.com/RTF/ [[BR]]
RTF2HTML can be found at http://www.xwebware.com/products/rtf2html/ [[BR]]
RTF-2-HTML can be found at http://www.easybyte.com/rtf2html.com [[BR]]
IRun RTF converter (free) can be found at http://www.pilotltd.com/irun/index.html [[BR]]
Yet another Word converter can be found at http://www.yawcpro.com/
4.0 Adding value to the HTML generated
======================================
4.1 Adding Title and Description and Keyword META tags
There are policies that allow Title, Description and keywords to be added to
your pages.
The title will default to "Converted from <filename>", but a number of
policies allow the title to be made to adopt the first section title, or
any text that you provide.
Alternatively you can use preprocessor commands embedded in the source file
as follows
$_$_BEGIN_PRE
$_$_TITLE This is my lovely HTML page
$_$_DESCRIPTION This page was converted from text
$_$_DESCRIPTION and this description was added using preprocessor
$_$_DESCRIPTION commands
$_$_KEYWORDS Converted, from, text
$_$_END_PRE
This approach is in many ways simpler, as it avoids the need for policy
files, and keeps all your source in one file.
4.2 Adding other META tags
The program doesn't have a mechanism to explicitly add other META tags,
however you can still achieve this by using the "script file" feature
that allows text to be copied into the <HEAD> section of the document.
You can also use the HEAD_SCRIPT "HTML fragment" in the same way.
See "[[GOTO customizing the HTML created by the software]]".
Originally intended as a way of adding JavaScript to a document, in fact
you can place anything you like in such sections, including <META> tags.
In fact the "script" file need not contain any JavaScript at all, so in that
respect it is mis-named.
See "[[GOTO Adding JavaScript]]"
4.3 Adding Headers and Footers
The software will allow you to add headers and footers to each file generated.
You can do this either through policies or by defining some "HTML fragments"
The "HTML fragments" method is preferred. If both policies and fragments
are defined then the fragments will be used.
You can define the "HTML fragments" HTML_HEADER and HTML_FOOTER (see
"[[GOTO customizing the HTML created by the software]]").
Alternatively, the policies concerned are
HTML header file : c:\include\header.inc
HTML footer file : c:\include\footer.inc
The value is the name of the file to be used (you must supply a full or
relative path so that the file may be located).
Whether defined by file or as a "HTML fragment", these fragments will be
copied into each HTML page generated, after the <BODY> tag and before
the </BODY> tag respectively.
If a large file is being split into many smaller HTML files these headers and
footers will be copied into *every* HTML generated. This is different to using
an $_$_INCLUDE statement, which only gets executed once.
These files can be useful to add a standard title in the header, and links to
other parts of the site (home, contacts etc) in the footer, or whatever.
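For example, a footer file might contain nothing more than a ruler and a
couple of site links (the filenames here are purely illustrative):

$_$_BEGIN_PRE
<HR>
<A HREF="index.html">Home</A> | <A HREF="contact.html">Contacts</A>
$_$_END_PRE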
4.4 Adding Javascript
There's a limit to how you can add JavaScript to a page generated from text.
That said the program will allow you to embed javascript (or indeed anything
else, such as META tags) into the <HEAD>...</HEAD> section of the document.
This is the recommended location for including JavaScript as this ensures it
is all read before anything is drawn.
You can specify the "script" code to be copied either by defining a
HEAD_SCRIPT "HTML fragment" (see "[[GOTO customizing the HTML created by the software]]"),
or by using the policy
HTML script file : ..\scripts\myscript.js
This should point to a file that contains all the scripting required. The
program will simply copy this text into the header of each HTML page generated.
Using the HEAD_SCRIPT fragment has the advantage that you can place the
necessary text into your source file, which avoids the need to manage
individual policy files and script files. This would be done as follows
$_$_BEGIN_PRE
$_$_DEFINE_HTML_FRAGMENT HEAD_SCRIPT
... your <SCRIPT> tags and any script block ...
$_$_END_BLOCK
$_$_END_PRE
See the "[Tag Man]" for more about using "HTML fragments".
NOTE: For the JavaScript to have an effect, you may need to embed
further HTML into the body of the source text.
See "[[GOTO how do I add my own HTML to the file?]]".
4.5 Adding colour/color
A number of policies allow you to choose your document colours. These can be
found under the Windows menu
_Conversion Options -> Output policies -> Document colours_
and
_Conversion Options -> Output policies -> Tables_
All colours should be specified in HTML format, i.e. as 6-character hex values
in the form rrggbb. A few colours like "Red", "White" and "Black" may be
entered by name. Wherever possible the program will use the name so as to
make the HTML more understandable.
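For example, a policy file might contain colour lines like these (the
values here are illustrative):

$_$_BEGIN_PRE
background colour : ffffcc
headings colour : Red
$_$_END_PRE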
If you don't want *any* colours added to your HTML (not even the default white
background) you can use the policy *Suppress all colour markup*.
For a full list of colour policies, see the "[Pol man]".
4.6 Adding images to the HTML
See "[[GOTO how do I add my own HTML to the file?]]" which includes an
example which is used to add an image to the HTML version of this document.
4.7 Adding hyperlinks to keywords and phrases
Use the "[[GOTO Link Dictionary]]".
4.8 Splitting large documents into sections
The program can only split into files at headings it recognizes. So first
you need to check that the program is correctly determining where your
headings are, and what type they are. See "[[GOTO how does the program recognize headings?]]"
Once the headings are being correctly diagnosed, you can switch on file
splitting using the policies under
_Conversion Options -> output policies -> file generation_
Note that the "split level" is set to 1 to split at "chapter" headings, 2 to
split at "chapter and major section" headings etc.
Underlined headings tend to start at level 2, depending on the underline
character (see "[[GOTO How do I control the header level of underlined headings?]]")
Hopefully this will give you some pointers, but if you still can't get it to
work, please mail me a copy of the source file (and any policy file you're
using) and I'll see what I can advise.
4.9 Customizing the HTML created by the software
From version 4 onwards AscToHTM allows you to define "HTML fragments"
that can be used in place of the standard HTML generated by the program in
certain situations.
These fragments can be placed in a separate file, which is pointed to by
the policy "HTML fragments file", or can be included in the source file
itself. For example the fragment
$_$_BEGIN_PRE
$_$_DEFINE_HTML_FRAGMENT HTML_HEADER
<CENTER>Head of each page</CENTER>
<HR>
$_$_END_BLOCK
$_$_END_PRE
defines a centred header and a horizontal ruler that will be placed at the
top of each page. This could include a navigational link to your home
page, and would be useful when splitting a large document into smaller
pages - you'd get the same header on each page.
See the "HTML fragments" chapter in the "[Tag Man]" for more details.
5.0 Diagnosing problems for yourself
====================================
The program offers a number of diagnostic aids. These can be awkward to use,
but if you want to get a better idea of what's going on these can sometimes
help.
The various diagnostic options can be accessed via the menu option
_Conversion Options -> Output policies -> File generation_
5.1 Generate a .lis file
The program can be made to generate listing files. A fragment is shown below.
$_$_BEGIN_PRE
56: 103 |1.2.4 Who is the author?
57: 1 |
58: 104 |1.2.4.1 John A Fotheringham
59: 1 |
60: |That's me that is. The program is wholly the responsibility
61: |Fotheringham, who maintains it in his spare time.
62: 1 |
63: 1 |
$_$_END_PRE
These show the source lines in truncated form. Each line is numbered, and
markers show how the line has been analysed. In this case the line with "Who
is the author?" has been allocated a line type of 103 ("header level 3") and
is followed by a line of type 1 ("blank"). A complete list of line types and
codes is included at the end of the file.
Three files are generated; a ".lis1" file which is a listing from the
Analysis pass, a ".lis" file which is a listing from the output pass and a
".stats" file which lists statistics collected during the analysis. Ignore
this last file.
The ".lis1" and ".lis" files have similar format, but represent the file as
analysed before and after the application of program policies. Thus more
lines will be marked as headings in the ".lis1" file, but only those that
"pass policy" - i.e. are in sequence and at the right indentation - will be
marked as headings in the ".lis" file.
Understanding these files is a black art, but a quick look can sometimes help
you understand how the program has interpreted particular lines that have gone
wrong, and give you a clue as to which policies may be used to correct this
behaviour.
5.2 Generate a .log file
The program will display messages during conversion. You can filter these
messages (e.g. to suppress certain types) by using the Menu option
_Settings -> Diagnostics_
These messages can also be output to a .log file by using the options under
_Conversion Options -> Output policies -> File generation_
This log file will contain *all* messages, including those suppressed by
filtering. In the Windows version you can also choose to save the messages
displayed to file.
Looking through the .log file can sometimes reveal problems that the program
has detected and reported.
5.3 Generate a .pol file
The program operates in three passes.
- The first pass analyses the file, and sets various policies
automatically (assuming these haven't previously been loaded
from a policy file).
- The second pass calculates the output file structure,
- The third pass actually generates the output files.
You can use the options under _Conversion Options_ to review the policies
that have been set.
Alternatively you can save these policies to file, using the menu option
_Conversion options -> Save policies to file_
and selecting the "save all policies" option. Be careful not to overwrite any
existing "incremental" file.
This file will list all policies used, which you may review... particularly
looking for any analysis policies that seem to have been incorrectly set.
5.4 Understanding error messages
In the fullness of time an [Error Manual] will be produced. (see 1.7)
5.5 Diagnosing table problems
See "[[GOTO how does the program detect and analyse tables?]]" and other topics
in the "[[GOTO Tables]]" section of this document.
6.0 Future directions
=====================
6.1 RTF generation
The text analysis engine that lies at the core of AscToHTM is now
available in a text-to-RTF converter. This is called [AscToRTF],
but we prefer the name "rags to Rich Text" :-)
The initial release of this software was in March 2000. For more details
visit the [AsctoRTF] home page.
6.2 Multi-lingual user interface
AscToHTM (and AscToRTF) support several languages in the user interface.
These translations have been provided by volunteers, and so far only extend
to parts of the user interface. All the programs' documentation and support
remain in English.
The software also supports the use of "language skins", i.e. the loading of
text files containing all the user interface text. This will hopefully allow
people to convert the program into more languages. We'd welcome copies of
skins developed, and will consider them for future distribution. Please
send them to *infojafsoft.com*
For more details visit http://www.jafsoft.com/products/translations.html
6.3 Improved standards support
*Standards support is now a stated aim of the program*.
However, due to the complexities of generating standards-compliant HTML from
arbitrarily structured text we don't feel we can *guarantee*
standards-compliance. If you find the program generating faulty HTML, please
report it to *infosupport.com*.
If you want to validate your HTML, visit http://validator.w3.org/
Please note, if you embed your own HTML into your source files, this may
well upset the balance in terms of compliance.
Note: When the program detects that it has violated standards, error messages
will be displayed. You should report such violations to
*infosupport.com*.
6.4 Targeting particular HTML versions
Internally the program is aware of the features and limitations of various
versions of HTML as follows
$_$_BEGIN_TABLE
$_$_TABLE_MIN_COLUMN_SEPARATION 2
[[TEXT HTML 3.2]]
[[TEXT HTML 4.0]] Transitional
[[TEXT HTML 4.0]] Strict (not yet supported)
$_$_END_TABLE
For example certain HTML entities are only supported under newer versions of
HTML.
Bearing in mind we're converting text files, there's a limit as to how advanced
the HTML can be (for example I can't work out which text to animate :-)
If you want to target a particular form of HTML, use the
policy
HTML version to be targeted : [[TEXT HTML 4.0]] Transitional
and the program will adjust to do the best it can.
Note: "[[TEXT HTML 4.0]] Strict" is not yet supported as the program uses
a number of "deprecated" tags that are still allowed in "[[TEXT HTML 4.0]] Transitional".
6.5 CSS and Font support
Font support will be introduced shortly. Due to the program's history, the
HTML currently being generated is more akin to [[TEXT HTML 3.2]].
Over time we plan to offer proper [[TEXT HTML 4.0]] and CSS support, although
obviously this will be limited to what can be sensibly applied to converted
text.
$_$_DEFINE_HTML_FRAGMENT HTML_HEADER