NAME
XML::Handler::YAWriter - Yet another Perl SAX XML Writer
SYNOPSIS
use XML::Handler::YAWriter;
my $ya = new XML::Handler::YAWriter( %options );
my $perlsax = new XML::Parser::PerlSAX( 'Handler' => $ya );
DESCRIPTION
YAWriter implements Yet Another XML::Handler::Writer. The
reasons for this one are that I needed a flexible escaping
technique, and want some kind of pretty printing. If an
instance of YAWriter is created without any options, the
default behavior is to produce an array of strings
containing the XML in :
@{$ya->{Strings}}
Options
Options are given in the usual 'key' => 'value' idiom.
Output IO::File
This option tells YAWriter to use an already open file
for output, instead of using $ya->{Strings} to store
the array of strings. It should be noted that the only
thing the object needs to implement is the print
method. So anything can be used to receive a stream of
strings from YAWriter.
AsFile string
This option will cause start_document to open named
file and end_document to close it. Use the literal
dash "-" if you want to print on standard output.
AsArray boolean
This option will force storage of the XML in
$ya->{Strings}, even if the Output option is given.
AsString boolean
This option will cause end_document to return the
complete XML document in a single string. Most SAX
drivers return the value of end_document as a result
of their parse method. As this may not work with some
combinations of SAX drivers and filters, a join of
$ya->{Strings} in the controlling method is preferred.
Encoding string
This will change the default encoding from UTF-8 to
anything you like. You should ensure that given data
are already in this encoding or provide an Escape
hash, to tell YAWriter about the recoding.
Escape hash
The Escape hash defines substitutions that have to be
done to any string, with the exception of the
processing_instruction and doctype_decl methods, where
I think that escaping of target and data would cause
more trouble than necessary.
The default value for Escape is
$XML::Handler::YAWriter::escape = {
'&' => '&',
' '<',
'>' => '>',
'"' => '"',
'--' => '--'
};
YAWriter will use an evaluated sub to make the
recoding based on a given Escape hash reasonably fast.
Future versions may use XS to improve this performance
bottleneck.
Pretty hash
Hash of string => boolean tuples, to define kind of
prettyprinting. Default to undef. Possible string
values:
AddHiddenNewline boolean
Add hidden newline before ">"
AddHiddenAttrTab boolean
Add hidden tabulation for attributes
CatchEmptyElement boolean
Catch empty Elements, apply "/>" compression
CatchWhiteSpace boolean
Catch whitespace with comments
IsSGML boolean
This option will cause start_document,
processing_instruction and doctype_decl to appear
as SGML. The SGML is still well-formed of course,
if your SAX events are well-formed.
NoComments boolean
Supress Comments
NoDTD boolean
Supress DTD
NoPI boolean
Supress Processing Instructions
NoProlog boolean
Supress Prolog
NoWhiteSpace boolean
Supress WhiteSpace to clean documents from prior
pretty printing.
PrettyWhiteIndent boolean
Add visible indent before any eventstring
PrettyWhiteNewline boolean
Add visible newlines before any eventstring
SAX1 boolean (not yet implemented)
Output only SAX1 compliant eventstrings
Notes:
Correct handling of start_document and end_document is
required!
The YAWriter Object initialises its structures during
start_document and does its cleanup during end_document.
If you forget to call start_document, any other method
will break during the run. Most likely place is the encode
method, trying to eval undef as a subroutine. If you
forget to call end_document, you should not use a single
instance of YAWriter more than once.
For small documents AsArray may be the fastest method and
AsString the easiest one to receive the output of
YAWriter. But AsString and AsArray may run out of memory
with infinite SAX streams. The only method
XML::Handler::Writer calls on a given Output object is the
print method. So it's easy to use a self written Output
object to improve streaming.
A single instance of XML::Handler::YAWriter is able to
produce more than one file in a single run. Be sure to
provide a fresh IO::File as Output before you call
start_document and close this File after calling
end_document. Or provide a filename in AsFile, so
start_document and end_document can open and close its own
filehandle.
Automatic recoding between 8bit and 16bit does not yet
work correctly !
I have Perl-5.00563 at home and here I can specify "use
utf8;" in the right places to make recoding work. But I
dislike saying "use 5.00555;" because many systems run
5.00503.
If you use some 8bit character set internally and want use
national characters, either state your character as
Encoding to be ISO-8859-1, or provide an Escape hash
similar to the following :
$ya->{'Escape'} = {
'&' => '&',
' '<',
'>' => '>',
'"' => '"',
'--' => '--'
'ö' => 'ö'
'ä' => 'ä'
'ü' => 'ü'
'Ö' => 'Ö'
'Ä' => 'Ä'
'Ü' => 'Ü'
'ß' => 'ß'
};
You may abuse YAWriter to clean whitespace from XML
documents. Take a look at test.pl, doing just that with an
XML::Edifact message, without querying the DTD. This may
work in 99% of the cases where you want to get rid of
ignorable whitespace caused by the various forms of pretty
printing.
my $ya = new XML::Handler::YAWriter(
'Output' => new IO::File ( ">-" );
'Pretty' => {
'NoWhiteSpace'=>1,
'NoComments'=>1,
'AddHiddenNewline'=>1,
'AddHiddenAttrTab'=>1,
} );
XML::Handler::Writer implements any method
XML::Parser::PerlSAX wants. This extends the Java SAX1.0
specification. I have in mind using Pretty=>SAX1=>1 to
disable this feature, if abusing YAWriter for a SAX proxy.
AUTHOR
Michael Koehne, Kraehe@Copyleft.De
Thanks
"Derksen, Eduard (Enno), CSCIO" helped me
with the Escape hash and gave quite a lot of useful
comments.
SEE ALSO
the perl manpage and the XML::Parser::PerlSAX manpage