BinHex is a format used on the Macintosh for transporting Mac files
safely through electronic mail, as short-lined, 7-bit, semi-compressed
data streams. This module provides a means of converting those
data streams back into binary data.

The Data fork holds the actual data included in the file. It is typically
the only meaningful part of a Macintosh file on a non-Macintosh computer system.
For example, if a Macintosh user wants to send a file of data to a
user on an IBM-PC, she would only send the Data fork.

The Resource fork contains a collection of arbitrary attribute/value pairs, including
program segments, icon bitmaps, and parametric values.

Additional information regarding Macintosh files is stored by the
Finder in a hidden file, called the ``Desktop Database''.

Because of the complications in storing different parts of a
Macintosh file in a non-Macintosh filesystem that only handles
consecutive data in one part, it is common to convert the Macintosh
file into some other format before transferring it over the network.
The BinHex format squashes that data into transmittable ASCII as follows:

The file is output as a byte stream consisting of some basic header
information (filename, type, creator), then the data fork, then the
resource fork.

The byte stream is compressed by looking for series of duplicated
bytes and representing them using a special binary escape sequence
(of course, any occurrences of the escape character must also be escaped).
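
To make the compression step concrete, here is a minimal sketch of a run-length compressor that uses \x90 as the escape byte, as described further on in this document. The function name rle_compress is hypothetical and is not part of this module; the sketch also assumes that a run is written as the byte, the mark, and a one-byte total count.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of this module): run-length compress a
# byte string using \x90 as the escape ("mark") byte.
#   <byte> \x90 <n>   means <byte> repeated n times in total
#   \x90 \x00         means one literal \x90
sub rle_compress {
    my ($data) = @_;
    my $out = '';
    my $pos = 0;
    my $len = length $data;
    while ($pos < $len) {
        my $byte = substr($data, $pos, 1);
        # Measure the run of identical bytes starting here (capped at 255,
        # the largest count that fits in one byte).
        my $run = 1;
        $run++ while $run < 255
                  && $pos + $run < $len
                  && substr($data, $pos + $run, 1) eq $byte;
        if ($byte eq "\x90") {
            # Keep it simple: escape each mark byte individually.
            $out .= "\x90\x00" x $run;
        }
        elsif ($run >= 3) {
            # A run of 3 or more is no longer than its escaped form.
            $out .= $byte . "\x90" . chr($run);
        }
        else {
            $out .= $byte x $run;
        }
        $pos += $run;
    }
    return $out;
}
```

For instance, rle_compress("aaaaa") yields "a\x90\x05". Runs of the mark byte itself are not compressed here, which keeps the sketch short at the cost of some output size.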

The compressed stream is encoded via the ``6/8 hemiola'' common
to base64 and uuencode: each group of three 8-bit bytes (24 bits)
is chopped into four 6-bit numbers, which are used as indexes into
an ASCII ``alphabet''.
(I assume that leftover bytes are zero-padded; documentation is thin).
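
The 6-bit chopping can be sketched as follows. The helper name encode_6bit is hypothetical, the alphabet string is the BinHex 4.0 alphabet as I understand it (check it against a format reference before relying on it), and the zero-padding of a short final group is the same assumption made above.

```perl
use strict;
use warnings;

# The 64-character BinHex 4.0 alphabet, as I understand it (an assumption;
# verify against a format reference).
my $ALPHABET =
    q{!"#$%&'()*+,-012345689@ABCDEFGHIJKLMNPQRSTUVXYZ[`abcdefhijklmpqr};

# Hypothetical helper: encode a byte string, 6 bits per output character.
# Assumes that a short final group is padded with zero bits.
sub encode_6bit {
    my ($bytes) = @_;
    my $bits = unpack('B*', $bytes);            # "\xFF" becomes "11111111"
    my $pad  = (6 - length($bits) % 6) % 6;     # bits needed to reach 6n
    $bits .= '0' x $pad;
    my $out = '';
    for my $group ($bits =~ /(.{6})/g) {
        # Each 6-bit number is an index into the alphabet.
        $out .= substr($ALPHABET, oct("0b$group"), 1);
    }
    return $out;
}

print encode_6bit("\0\0\0"), "\n";    # !!!!
```

Three input bytes always produce four output characters; one or two leftover bytes produce two or three.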

Compute the MacBinary-II-style CRC for the given DATA, with the CRC
seeded to SEED. Normally, you start with a SEED of 0, and you pump in
the previous CRC as the SEED if you're handling a lot of data one chunk
at a time. That is:

    $crc = 0;
    while (<STDIN>) {
        $crc = macbinary_crc($_, $crc);
    }

Note: Extracted from the mcvert utility (Doug Moore, April '87),
using a ``magic array'' algorithm by Jim Van Verth for efficiency.
Converted to Perl5 by Eryq. Untested.

Compute the HQX-style CRC for the given DATA, with the CRC seeded to SEED.
Normally, you start with a SEED of 0, and you pump in the previous CRC as
the SEED if you're handling a lot of data one chunk at a time. That is:

    $crc = 0;
    while (<STDIN>) {
        $crc = binhex_crc($_, $crc);
    }

Note: Extracted from the mcvert utility (Doug Moore, April '87),
using a ``magic array'' algorithm by Jim Van Verth for efficiency.
Converted to Perl5 by Eryq.
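
The seed-chaining pattern used by both CRC functions can be tried out with a stand-in routine. The function crc16_sketch below is hypothetical: it is a plain bitwise CRC-16 with polynomial 0x1021, and it is not claimed to produce the same values as binhex_crc() or macbinary_crc(). It only illustrates that feeding data in chunks, passing each result back in as the seed, gives the same answer as one pass over the whole string.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a seedable CRC routine: a bitwise CRC-16 with
# polynomial 0x1021. NOT claimed to match binhex_crc() or macbinary_crc();
# it only demonstrates the (DATA, SEED) calling pattern.
sub crc16_sketch {
    my ($data, $seed) = @_;
    my $crc = $seed || 0;
    for my $byte (unpack('C*', $data)) {
        $crc ^= $byte << 8;                 # mix the byte into the top bits
        for (1 .. 8) {
            $crc = ($crc & 0x8000) ? (($crc << 1) ^ 0x1021) : ($crc << 1);
            $crc &= 0xFFFF;                 # keep the register at 16 bits
        }
    }
    return $crc;
}

# One pass over the whole string...
my $whole = crc16_sketch("hello, world", 0);

# ...versus several chunks, chaining the previous CRC in as the seed.
my $chunked = 0;
$chunked = crc16_sketch($_, $chunked) for ("hello", ", ", "world");

print "match\n" if $whole == $chunked;
```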

Instance method.
Get/set the creator of the file. This is a four-character
string (though I don't know if it's guaranteed to be printable ASCII!)
that serves as part of the Macintosh's version of a MIME ``content-type''.

For example, a document created by ``Canvas'' might have
creator "CNVS".

Instance method.
Get/set the type of the file. This is a four-character
string (though I don't know if it's guaranteed to be printable ASCII!)
that serves as part of the Macintosh's version of a MIME ``content-type''.

On each iteration, next() (and done()) may return either
a decent-sized non-empty string (indicating that more converted data
is ready for you) or an empty string (indicating that the converter
is waiting to amass more input in its private buffers before handing
you more stuff to output).

Note that done() always converts and hands you whatever is left.

This may have been a good approach. It may not. Someday, the converter
may also allow you to give it an object that responds to read(), or
a FileHandle, and it will do all the nasty buffer-filling on its own,
serving you stuff line by line.

Note that this converter does not find the initial
``BinHex version'' comment. You have to skip that yourself. It
only handles data between the opening and closing ":".

Unlike its cousins base64 and uuencode, BinHex format is not
amenable to being parsed line-by-line. There appears to be no
guarantee that lines contain 4n encoded characters... and even if there
is one, the BinHex compression algorithm interferes: even when you
can decode one line at a time, you can't necessarily
decompress a line at a time.

For example: a decoded line ending with the byte \x90 (the escape
or ``mark'' character) is ambiguous: depending on the next decoded byte,
it could mean a literal \x90 (if the next byte is a \x00), or
it could mean n-1 more repetitions of the previous character (if
the next byte is some nonzero n).

For this reason, a BinHex parser has to be somewhat stateful: you
cannot have code that simply converts and prints each input line
independently, unless something is happening ``behind the scenes'' to
keep track of
what was last done. The dangerous thing, however, is that this
approach will seem to work, if you only test it on BinHex files
which do not use compression and which have 4n HEX characters
on each line.

Since we have to be stateful anyway, we use the parser object to
keep our state.
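
A minimal sketch of what "keeping state in an object" can look like for the decompression step alone. The class RLEExpander is hypothetical and is not this module's actual internals; it borrows the next()/done() naming used above and remembers, between calls, the last output byte and whether a mark byte is still waiting for its count.

```perl
use strict;
use warnings;

# Hypothetical class, for illustration only; not this module's internals.
package RLEExpander;

sub new {
    my ($class) = @_;
    # 'last' is the most recent output byte; 'pending' is true when the
    # previous input ended on a mark byte whose count has not arrived yet.
    return bless { last => undef, pending => 0 }, $class;
}

# Expand one chunk of compressed bytes, carrying state between calls.
sub next {
    my ($self, $chunk) = @_;
    my $out = '';
    for my $byte (split //, $chunk) {
        if ($self->{pending}) {
            $self->{pending} = 0;
            my $n = ord $byte;
            if ($n == 0) {                      # \x90 \x00: literal mark
                $out .= "\x90";
                $self->{last} = "\x90";
            }
            elsif (defined $self->{last}) {     # n-1 more repetitions
                $out .= $self->{last} x ($n - 1);
            }
        }
        elsif ($byte eq "\x90") {
            $self->{pending} = 1;               # wait for the count byte
        }
        else {
            $out .= $byte;
            $self->{last} = $byte;
        }
    }
    return $out;
}

# No more input. Nothing is buffered here except a possibly dangling
# mark byte, which this sketch silently drops.
sub done { return '' }

package main;

my $x = RLEExpander->new;
# The mark byte ends the first chunk; its count arrives in the second.
my $bin = $x->next("ab\x90") . $x->next("\x04c") . $x->done;
print "$bin\n";    # abbbbc
```

A stateless function would have had to guess what the trailing \x90 of the first chunk meant; here the decision is simply deferred until the next byte shows up.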

Solutions that demand reading everything into core don't cut
it in my book. The first MPEG file that comes along can louse
up your whole day. So, there are no size limitations in this
module: the data is read on-demand, and filehandles are always
an option.

A lot of the byte-level manipulation that has to go on, particularly
the CRC computing (which involves intensive bit-shifting and masking),
slows this module down significantly. What is needed perhaps is an
optional extension library where the slow pieces can be done more
quickly... a Convert::BinHex::CRC, if you will. Volunteers, anyone?

Even considering that, however, it's slower than I'd like. I'm
sure many improvements can be made in the HEX-to-BIN end of things.
No doubt I'll attempt some as time goes on...

...there is a layered parsing algorithm to reverse the process.
Basically, it works in a similar fashion to stdio's fread():

0. There is an internal buffer of decompressed (BIN) data,
   initially empty.
1. The application asks to read() n bytes of data from the object.
2. If the buffer is not full enough to accommodate the request:
   2a. The read() method grabs the next available chunk of input
       data (the HEX).
   2b. HEX data is converted and decompressed into as many BIN
       bytes as possible.
   2c. BIN bytes are added to the read() buffer.
   2d. Go back to step 2a until the buffer is full enough
       or we hit end-of-input.

The conversion-and-decompression algorithms need their own internal
buffers and state (since the next input chunk may not contain all the
data needed for a complete conversion/decompression operation).
These are maintained in the object, so parsing two different
input streams simultaneously is possible.
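
The buffering loop in the numbered steps above can be sketched like this. The class BufferedReader and its 'source' and 'convert' arguments are hypothetical names chosen for illustration; the real module's internals may well differ. Here 'source' is a code ref returning the next chunk of input (or undef at end-of-input), and 'convert' stands in for the whole decode-and-decompress stage.

```perl
use strict;
use warnings;

# Hypothetical class sketching the buffering loop only.
package BufferedReader;

sub new {
    my ($class, %args) = @_;
    return bless {
        source  => $args{source},   # code ref: next input chunk or undef
        convert => $args{convert},  # code ref: HEX chunk to BIN bytes
        buffer  => '',              # step 0: BIN data, initially empty
        eof     => 0,
    }, $class;
}

sub read {
    my ($self, $n) = @_;
    # Step 2: refill until the buffer can satisfy the request or input ends.
    while (length($self->{buffer}) < $n && !$self->{eof}) {
        my $hex = $self->{source}->();                      # step 2a
        if (defined $hex) {
            $self->{buffer} .= $self->{convert}->($hex);    # steps 2b, 2c
        }
        else {
            $self->{eof} = 1;
        }
    }
    # Hand back up to $n bytes and keep the rest for the next call.
    return substr($self->{buffer}, 0, $n, '');
}

package main;

my @chunks = ("ab", "cde", "f");
my $reader = BufferedReader->new(
    source  => sub { shift @chunks },
    convert => sub { uc $_[0] },    # placeholder for decode + decompress
);
print $reader->read(4), "\n";    # ABCD
print $reader->read(4), "\n";    # EF
```

Because the buffer and end-of-input flag live in the object, two readers built this way can be interleaved freely without disturbing one another.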