Perl 5 version 14.0
documentation

pack TEMPLATE,LIST

Takes a LIST of values and converts it into a string using the rules
given by the TEMPLATE. The resulting string is the concatenation of
the converted values. Typically, each converted value looks
like its machine-level representation. For example, on 32-bit machines
an integer may be represented by a sequence of 4 bytes, which will in
Perl be presented as a string that's 4 characters long.

The > and < modifiers can also be used on () groups to force a particular byte-order on all components in that group, including all its subgroups.

The following rules apply:

Each letter may optionally be followed by a number indicating the repeat count. A numeric repeat count may optionally be enclosed in brackets, as in pack("C[80]",@arr). The repeat count gobbles that many values from the LIST when used with all format types other than a, A, Z, b, B, h, H, @, ., x, X, and P, where it means something else, described below. Supplying a * for the repeat count instead of a number means to use however many items are left, except for:

@, x, and X, where it is equivalent to 0.

., where it means relative to the start of the string.

u, where it is equivalent to 1 (or 45, which here is equivalent).
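A minimal sketch of the repeat-count rules above; the values are arbitrary, and each comment shows the bytes the template should produce.

```perl
use strict;
use warnings;

# A numeric repeat count can be written bare or in brackets.
my $plain    = pack("C3",   1, 2, 3);    # "\x01\x02\x03"
my $brackets = pack("C[3]", 1, 2, 3);    # the same three bytes

# "*" consumes every remaining item: four 16-bit values, 8 bytes.
my $star = pack("n*", 1, 2, 3, 4);

# For x, "*" is equivalent to 0, so no padding is added.
my $no_pad = pack("C x* C", 1, 2);       # "\x01\x02"
```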

One can replace a numeric repeat count with a template letter enclosed in
brackets to use the packed byte length of the bracketed template for the
repeat count.

For example, the template x[L] skips as many bytes as in a packed long, and the template "$t X[$t] $t" unpacks twice whatever $t (when variable-expanded) unpacks. If the template in brackets contains alignment commands (such as x![d]), its packed length is calculated as if the start of the template had the maximal possible alignment.
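The bracketed-template form can be exercised like this; $t = "n" is just an illustrative choice of sub-template.

```perl
use strict;
use warnings;

# A 4-byte unsigned long followed by one unsigned byte (5 bytes total).
my $packed = pack("L C", 0xDEADBEEF, 7);

# x[L] skips as many bytes as a packed "L" occupies, landing on the "C".
my ($byte) = unpack("x[L] C", $packed);

# With $t = "n", "$t X[$t] $t" reads a 16-bit field, backs up over it, and reads it again.
my $t     = "n";
my @twice = unpack("$t X[$t] $t", pack("n", 258));

print "$byte @twice\n";    # prints "7 258 258"
```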

When used with Z, a * as the repeat count is guaranteed to add a trailing null byte, so the resulting string is always one byte longer than the byte length of the item itself.

When used with @, the repeat count represents an offset from the start of the innermost () group.
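A short sketch of @ at the top level and inside a group. The templates are single-quoted so that Perl does not interpolate @5 and @2 as arrays.

```perl
use strict;
use warnings;

# At the top level, @5 null-fills up to absolute offset 5.
my $flat = pack('A2 @5 A', "ab", "c");             # "ab\0\0\0c"

# Inside a () group, @2 counts from the start of the group (offset 1),
# so it pads up to absolute offset 3.
my $grouped = pack('A (A @2 A)', "x", "y", "z");   # "xy\0z"

printf "%s | %s\n", unpack("H*", $flat), unpack("H*", $grouped);
```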

When used with ., the repeat count determines the starting position to calculate the value offset as follows:

If the repeat count is 0, it's relative to the current position.

If the repeat count is *, the offset is relative to the start of the packed string.

And if it's an integer n, the offset is relative to the start of the nth innermost () group, or to the start of the string if n is bigger than the group level.
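A sketch of how . picks up its offset from the argument list, assuming no enclosing groups, so a bare . (count 1) measures from the start of the string:

```perl
use strict;
use warnings;

# A bare "." null-fills or truncates to the absolute offset taken from the LIST.
my $filled    = pack("A3 .",  "abc",   5);  # null-fill up to offset 5: "abc\0\0"
my $truncated = pack("A5 .",  "abcde", 2);  # truncate back to offset 2: "ab"

# ".0" measures from the current position (3), so 2 means offset 5.
my $relative  = pack("A3 .0", "abc",   2);  # "abc\0\0"
```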

The repeat count for u is interpreted as the maximal number of bytes to encode per line of output, with 0, 1 and 2 replaced by 45. The repeat count should not be more than 65.
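To see the per-line limit, one can uuencode 100 bytes with the default count and with u30 and count the output lines; the input data is arbitrary.

```perl
use strict;
use warnings;

my $data = "x" x 100;

# Default: 45 input bytes per line, so 100 bytes need 3 lines.
my $default = pack("u", $data);

# u30: at most 30 input bytes per line, so 100 bytes need 4 lines.
my $narrow = pack("u30", $data);

# Count the newlines that terminate each encoded line.
my $default_lines = () = $default =~ /\n/g;
my $narrow_lines  = () = $narrow  =~ /\n/g;
print "$default_lines $narrow_lines\n";    # prints "3 4"

# Either encoding decodes back to the original data.
my $decoded = unpack("u", $narrow);
```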

The a, A, and Z types gobble just one value, but pack it as a string of length count, padding with nulls or spaces as needed. When unpacking, A strips trailing whitespace and nulls, Z strips everything after the first null, and a returns data without any sort of trimming.
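A sketch of the padding and trimming rules for a, A, and Z; the strings are arbitrary test data.

```perl
use strict;
use warnings;

# Packing pads to the requested length: a and Z with nulls, A with spaces.
my $packed_a = pack("a5", "hi");    # "hi\0\0\0"
my $packed_A = pack("A5", "hi");    # "hi   "
my $packed_Z = pack("Z5", "hi");    # "hi\0\0\0"

# Unpacking trims differently for each type.
my $raw  = "hi \0xy";
my ($ua) = unpack("a6", $raw);          # "hi \0xy": no trimming at all
my ($uZ) = unpack("Z6", $raw);          # "hi ": the first null and everything after it dropped
my ($uA) = unpack("A6", "hi \0 \0");    # "hi": trailing spaces and nulls stripped
```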

If the value to pack is too long, the result is truncated. If it's too long and an explicit count is provided, Z packs only $count-1 bytes, followed by a null byte. Thus Z always packs a trailing null, except for when the count is 0.
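Truncation, and the special handling of Z (including Z* from the repeat-count rules), can be checked like this:

```perl
use strict;
use warnings;

# Too-long values are truncated to the count...
my $cut_a = pack("a3", "hello");    # "hel"
# ...but Z reserves the last byte for the terminating null.
my $cut_Z = pack("Z3", "hello");    # "he\0"
# With Z0 nothing is packed, not even the null.
my $none  = pack("Z0", "hello");    # ""
# Z* always appends a null, so the result is one byte longer than the input.
my $star  = pack("Z*", "hello");    # "hello\0"
```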

Likewise, the b and B formats pack a string that's that many bits long. Each character of the input string generates 1 bit of the result.

Each result bit is based on the least-significant bit of the corresponding input character, i.e., on ord($char)%2. In particular, characters "0" and "1" generate bits 0 and 1, as do characters "\000" and "\001".

Starting from the beginning of the input string, each 8-tuple of characters is converted to 1 character of output. With format b, the first character of the 8-tuple determines the least-significant bit of a character; with format B, it determines the most-significant bit of a character.

If the length of the input string is not evenly divisible by 8, the remainder is packed as if the input string were padded by null characters at the end. Similarly, during unpacking, "extra" bits are ignored.

If the input string is longer than needed, remaining characters are ignored.

A * for the repeat count uses all characters of the input field. On unpacking, bits are converted to a string of "0"s and "1"s.
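A sketch of the bit-order difference between b and B, using single-byte results so the effect is easy to see:

```perl
use strict;
use warnings;

# With b the first input character becomes the least-significant bit;
# with B it becomes the most-significant bit.
my $low  = pack("b8", "10000000");    # "\x01"
my $high = pack("B8", "10000000");    # "\x80"

# Only ord($char) % 2 matters, so "\001" acts like "1" and "\000" like "0".
my $same = pack("B8", "\001\000\000\000\000\000\000\000");    # also "\x80"

# Short input is padded with zero bits at the end.
my $padded = pack("B4", "1111");      # "\xf0"

# Unpacking yields a string of "0"s and "1"s.
my $bits_b = unpack("b*", "\x01");    # "10000000"
my $bits_B = unpack("B*", "\x01");    # "00000001"
```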

The h and H formats pack a string that many nybbles (4-bit groups, representable as hexadecimal digits, "0".."9" "a".."f") long.

Each character of the input string generates 4 bits of the result. With non-alphabetical characters, the result is based on the 4 least-significant bits of the input character, i.e., on ord($char)%16. In particular, characters "0" and "1" generate nybbles 0 and 1, as do bytes "\000" and "\001". For characters "a".."f" and "A".."F", the result is compatible with the usual hexadecimal digits, so that "a" and "A" both generate the nybble 0xa==10. Do not use any characters but these with this format.

Starting from the beginning of the input string, each pair of characters is converted to 1 character of output. With format h, the first character of the pair determines the least-significant nybble of the output character; with format H, it determines the most-significant nybble.

If the length of the input string is not even, it behaves as if padded by
a null character at the end. Similarly, "extra" nybbles are ignored during
unpacking.

If the input string is longer than needed, extra characters are ignored.

A * for the repeat count uses all characters of the input field. For unpack(), nybbles are converted to a string of hexadecimal digits.
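The nybble-order difference between h and H, plus the padding rule, in a few lines:

```perl
use strict;
use warnings;

my $H     = pack("H4", "1a2b");    # "\x1a\x2b": first digit of each pair is the high nybble
my $h     = pack("h4", "1a2b");    # "\xa1\xb2": first digit of each pair is the low nybble
my $upper = pack("H2", "AB");      # "\xab": "A".."F" work like "a".."f"
my $odd   = pack("H3", "abc");     # "\xab\xc0": odd length padded as if with a trailing null

my $hex = unpack("H*", "\x1a\x2b");    # "1a2b"
```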

The p format packs a pointer to a null-terminated string. You are responsible for ensuring that the string is not a temporary value, as that could potentially get deallocated before you get around to using the packed result. The P format packs a pointer to a structure of the size indicated by the length. A null pointer is created if the corresponding value for p or P is undef; similarly with unpack(), where a null pointer unpacks into undef.
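A minimal sketch of p, assuming the pointed-to string stays alive for as long as the packed pointer is used. The pointer width is compared with $Config{ptrsize} from the core Config module.

```perl
use strict;
use warnings;
use Config;

my $str  = "hello";              # a named variable, so it outlives the packed pointer
my $ptr  = pack("p", $str);      # the raw pointer, $Config{ptrsize} bytes wide
my $back = unpack("p", $ptr);    # follows the pointer back to "hello"

# undef packs as a null pointer, which unpacks into undef.
my $null = unpack("p", pack("p", undef));
```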

If your system has a strange pointer size--meaning a pointer is neither as
big as an int nor as big as a long--it may not be possible to pack or
unpack pointers in big- or little-endian byte order. Attempting to do
so raises an exception.

The / template character allows packing and unpacking of a sequence of items where the packed structure contains a packed item count followed by the packed items themselves. This is useful when the structure you're unpacking has encoded the sizes or repeat counts for some of its fields within the structure itself as separate fields.

For pack, you write length-item/sequence-item, and the length-item describes how the length value is packed. Formats likely to be of most use are integer-packing ones like n for Java strings, w for ASN.1 or SNMP, and N for Sun XDR.

For pack, sequence-item may have a repeat count, in which case
the minimum of that and the number of available items is used as the argument
for length-item. If it has no repeat count or uses a '*', the number
of available items is used.

For unpack, an internal stack of integer arguments unpacked so far is
used. You write /sequence-item and the repeat count is obtained by
popping off the last element from the stack. The sequence-item must not
have a repeat count.
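A minimal sketch of / for a counted list of integers and for a Java-style string (16-bit big-endian length followed by the bytes). The data values are arbitrary.

```perl
use strict;
use warnings;

# pack: the "C" length-item records how many "n" items follow.
my $packed = pack("C/n*", 10, 20, 30);    # "\x03\x00\x0a\x00\x14\x00\x1e"

# unpack: "C" pushes 3 onto the stack, and "/n" pops it as the repeat count.
my @values = unpack("C/n", $packed);      # (10, 20, 30)

# A Java-style string: the sequence-item has no repeat count when unpacking.
my $java  = pack("n/a*", "hello");        # "\x00\x05hello"
my ($str) = unpack("n/a", $java);         # "hello"
```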

If sequence-item refers to a string type ("A", "a", or "Z"), the length-item is the string length, not the number of strings. With an explicit repeat count for pack, the packed string is adjusted to that length.
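A few illustrative cases of / with string sequence items, with each result noted in a comment:

```perl
use strict;
use warnings;

my @guru  = unpack("W/a", "\004Gurusamy");             # ("Guru")
my @bond  = unpack("a3/A A*", "007 Bond  J ");         # (" Bond", "J")
my @bond2 = unpack("a3 x2 /A A*", "007: Bond, J.");    # ("Bond, J", ".")
my $two   = pack("n/a* w/a", "hello,", "world");       # "\000\006hello,\005world"
my $alpha = pack("a/W2", ord("a") .. ord("z"));        # "2ab"
```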

Supplying a count to the length-item format letter is only useful with A, a, or Z. Packing with a length-item of a or Z may introduce "\000" characters, which Perl does not regard as legal in numeric strings.

The integer types s, S, l, and L may be followed by a ! modifier to specify native shorts or longs. A bare l means exactly 32 bits, although the native long as seen by the local C compiler may be larger. This is mainly an issue on 64-bit platforms. You can see whether using ! makes any difference by checking the native sizes recorded in the Config module.
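A minimal sketch of that check using the core Config module; the numbers printed depend on the platform and on how perl was built.

```perl
use strict;
use warnings;
use Config;

# Native C sizes (in bytes) recorded when this perl was built.
for my $key (qw(shortsize intsize longsize longlongsize)) {
    my $size = defined $Config{$key} ? $Config{$key} : "undef";
    print "$key: $size\n";
}

# A bare "l" is always 4 bytes; "l!" follows the native C long.
my $bare   = length pack("l",  0);
my $native = length pack("l!", 0);
print "l: $bare bytes, l!: $native bytes\n";
```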

$Config{longlongsize} is undefined on systems without long long support.

The integer formats s, S, i, I, l, L, j, and J are inherently non-portable between processors and operating systems because they obey native byteorder and endianness. For example, a 4-byte integer 0x12345678 (305419896 decimal) would be ordered natively (arranged in and handled by the CPU registers) into bytes as

    0x12 0x34 0x56 0x78    # big-endian
    0x78 0x56 0x34 0x12    # little-endian

Basically, Intel and VAX CPUs are little-endian, while everybody else,
including Motorola m68k/88k, PPC, Sparc, HP PA, Power, and Cray, are
big-endian. Alpha and MIPS can be either: Digital/Compaq used/uses them in
little-endian mode, but SGI/Cray uses them in big-endian mode.

The names big-endian and little-endian are comic references to the
egg-eating habits of the little-endian Lilliputians and the big-endian
Blefuscudians from the classic Jonathan Swift satire, Gulliver's Travels.
This entered computer lingo via the paper "On Holy Wars and a Plea for
Peace" by Danny Cohen, USC/ISI IEN 137, April 1, 1980.

Byteorders "1234" and "12345678" are little-endian; "4321" and "87654321" are big-endian.
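One way to see the native order from Perl itself is to pack a known value with the native L format and compare the result with $Config{byteorder}. This is a sketch; the hex string it prints depends on the machine.

```perl
use strict;
use warnings;
use Config;

# Pack 0x12345678 with the native-order L format and show the bytes in hex:
# "12345678" on a big-endian machine, "78563412" on a little-endian one.
my $hex = unpack("H*", pack("L", 0x12345678));

# $Config{byteorder} is "1234" or "12345678" on little-endian builds.
my $little = $Config{byteorder} =~ /^1234/ ? 1 : 0;
print "native L bytes: $hex, byteorder: $Config{byteorder}\n";
```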

For portably packed integers, either use the formats n, N, v, and V or else use the > and < modifiers described immediately below. See also perlport.
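The portable formats produce the same bytes on every platform, as this sketch shows:

```perl
use strict;
use warnings;

# n/N are big-endian ("network order"); v/V are little-endian ("VAX order").
my $n = pack("n", 0x1234);        # "\x12\x34"
my $v = pack("v", 0x1234);        # "\x34\x12"
my $N = pack("N", 0x12345678);    # "\x12\x34\x56\x78"
my $V = pack("V", 0x12345678);    # "\x78\x56\x34\x12"

# These formats are unsigned: a 16-bit all-ones field unpacks as 65535, not -1.
my $u = unpack("n", "\xff\xff");
```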

Starting with Perl 5.9.2, integer and floating-point formats, along with the p and P formats and () groups, may all be followed by the > or < endianness modifiers to respectively enforce big- or little-endian byte-order. These modifiers are especially useful given how n, N, v and V don't cover signed integers, 64-bit integers, or floating-point values.
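A sketch of signed values packed with an explicit byte order, assuming two's-complement integers:

```perl
use strict;
use warnings;

# Signed integers with a fixed byte order, independent of the host.
my $s_be = pack("s>", -2);           # "\xff\xfe"
my $l_le = pack("l<", -2);           # "\xfe\xff\xff\xff"

# Unpacking with the same modifier recovers the signed values.
my $back_s = unpack("s>", $s_be);    # -2
my $back_l = unpack("l<", $l_le);    # -2
```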

Here are some concerns to keep in mind when using an endianness modifier:

Exchanging signed integers between different platforms works only
when all platforms store them in the same format. Most platforms store
signed integers in two's-complement notation, so usually this is not an issue.

The > or < modifiers can only be used on floating-point formats on big- or little-endian machines. Otherwise, attempting to use them raises an exception.

Forcing big- or little-endian byte-order on floating-point values for data exchange can work only if all platforms use the same binary representation such as IEEE floating-point. Even if all platforms are using IEEE, there may still be subtle differences. Being able to use > or < on floating-point values can be useful, but also dangerous if you don't know exactly what you're doing. It is not a general way to portably store floating-point values.

When using > or < on a () group, this affects all types inside the group that accept byte-order modifiers, including all subgroups. It is silently ignored for all other types. You are not allowed to override the byte-order within a group that already has a byte-order modifier suffix.
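A sketch of a group-level modifier, and of the exception raised when a member tries to override it. Only the fact that pack dies is checked, not the wording of the message.

```perl
use strict;
use warnings;

# ">" on the group applies to both "s" and "l" inside it.
my $be = pack("(s l)>", -1, 1);    # "\xff\xff\x00\x00\x00\x01"

# Overriding the group's byte order inside the group is an error.
my $ok = eval { pack("(s< l)>", -1, 1); 1 };
print $ok ? "accepted\n" : "rejected: $@";
```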

Real numbers (floats and doubles) are in native machine format only.
Due to the multiplicity of floating-point formats and the lack of a
standard "network" representation for them, no facility for interchange has been
made. This means that packed floating-point data written on one machine
may not be readable on another, even if both use IEEE floating-point
arithmetic (because the endianness of the memory representation is not part
of the IEEE spec). See also perlport.

If you know exactly what you're doing, you can use the > or < modifiers to force big- or little-endian byte-order on floating-point values.
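A sketch of forcing the byte order of a double, assuming IEEE 754 binary64 doubles (in which 1.5 is 0x3FF8000000000000):

```perl
use strict;
use warnings;

my $be   = unpack("H*", pack("d>", 1.5));    # "3ff8000000000000"
my $le   = unpack("H*", pack("d<", 1.5));    # "000000000000f83f"
my $back = unpack("d>", pack("d>", 1.5));    # 1.5
```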

Because Perl uses doubles (or long doubles, if configured) internally for all numeric calculation, converting from double into float and thence to double again loses precision, so unpack("f", pack("f", $foo)) will not in general equal $foo.
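The precision loss is easy to demonstrate with 0.1, which has no exact binary representation; 0.5, which does, survives the round trip.

```perl
use strict;
use warnings;

my $foo       = 0.1;
my $via_float = unpack("f", pack("f", $foo));
print $via_float == $foo ? "equal\n" : "not equal\n";    # prints "not equal"

# A value exactly representable as a 32-bit float comes back unchanged.
my $half = unpack("f", pack("f", 0.5));
```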

Pack and unpack can operate in two modes: character mode (C0 mode) where the packed string is processed per character, and UTF-8 mode (U0 mode) where the packed string is processed in its UTF-8-encoded Unicode form on a byte-by-byte basis. Character mode is the default unless the format string starts with U. You can always switch mode mid-format with an explicit C0 or U0 in the format. This mode remains in effect until the next mode change, or until the end of the () group it (directly) applies to.
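A sketch of the two modes using the euro sign, U+20AC, whose UTF-8 encoding is the three bytes E2 82 AC:

```perl
use strict;
use warnings;

# A leading U selects U0 mode, so the result is the single character U+20AC.
my $char  = pack("U", 0x20AC);

# An explicit C0 selects character mode: U then emits the UTF-8 bytes
# as three separate characters.
my $bytes = pack("C0U", 0x20AC);          # "\xe2\x82\xac"

# In U0 mode, unpack walks the UTF-8-encoded form one byte at a time.
my @utf8  = unpack("U0C*", "\x{20AC}");   # (226, 130, 172)
```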

You must do any alignment or padding yourself by inserting, for example, enough "x"es while packing. There is no way for pack() and unpack() to know where characters are going to or coming from, so they handle their output and input as flat sequences of characters.

A () group is a sub-TEMPLATE enclosed in parentheses. A group may take a repeat count either as postfix, or for unpack(), also via the / template character. Within each repetition of a group, positioning with @ starts over at 0, measured from the start of that repetition.
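A sketch of group repetition and of @ restarting inside nested groups. The nested template is single-quoted so that @1, @2, and @3 are not interpolated as arrays.

```perl
use strict;
use warnings;

# A postfix repeat count applies the whole group several times.
my @pairs = unpack("(A2)3", "aabbcc");         # ("aa", "bb", "cc")

# For unpack, the count can also come from the data via "/".
my @counted = unpack("C/(A2)", "\x02aabb");    # ("aa", "bb")

# @1 pads the top level to offset 1; both groups start at offset 2, so the
# inner @2 pads to offset 4 and the outer @3 (offset 5) is already reached.
my $nested = pack('@1A((@2A)@3A)', qw(X Y Z)); # "\0X\0\0YZ"
```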

x and X accept the ! modifier to act as alignment commands: they jump forward or back to the closest position aligned at a multiple of count characters. For example, to pack() or unpack() a C structure like

    struct {
        char   c;    /* one signed, 8-bit character */
        double d;
        char   cc[2];
    }

one may need to use the template c x![d] d c[2]. This assumes that doubles must be aligned to the size of double.

For alignment commands, a count of 0 is equivalent to a count of 1; both are no-ops.
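A sketch of the structure above, assuming 8-byte doubles; the field values are arbitrary.

```perl
use strict;
use warnings;

# c (1 byte), x![d] pads to the next multiple of 8 (7 bytes),
# d (8 bytes), c[2] (2 bytes): 18 bytes in total.
my $packed = pack("c x![d] d c[2]", 1, 2.5, 3, 4);
print length($packed), "\n";    # prints 18

my ($c, $d, @cc) = unpack("c x![d] d c[2]", $packed);    # (1, 2.5, 3, 4)

# With a count of 1 (or 0), x! is a no-op.
my $noop = pack("C x!1 C", 1, 2);    # "\x01\x02"
```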

n, N, v and V accept the ! modifier to represent signed 16-/32-bit integers in big-/little-endian order. This is portable only when all platforms sharing packed data use the same binary representation for signed integers; for example, when all platforms use two's-complement representation.
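A sketch contrasting the unsigned and signed readings of the same bytes:

```perl
use strict;
use warnings;

my $bytes    = "\xff\xfe";
my $unsigned = unpack("n",  $bytes);    # 65534
my $signed   = unpack("n!", $bytes);    # -2

my $le   = pack("V!", -1);              # "\xff\xff\xff\xff"
my $back = unpack("V!", $le);           # -1
```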

Comments can be embedded in a TEMPLATE using # through the end of line. White space can separate pack codes from each other, but modifiers and repeat counts must follow immediately. Breaking complex templates into individual line-by-line components, suitably annotated, can do as much to improve legibility and maintainability of pack/unpack formats as /x can for complicated pattern matches.
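A sketch of an annotated template for a hypothetical fixed-size record; the layout and field names are made up for illustration.

```perl
use strict;
use warnings;

# One field per line, each with a comment running to the end of the line.
my $record_fmt = q{
    A8    # name, space-padded to 8 bytes
    n     # id, 16-bit big-endian
    C     # flags byte
};

my $rec = pack($record_fmt, "alice", 42, 3);
my ($name, $id, $flags) = unpack($record_fmt, $rec);
print length($rec), " $name $id $flags\n";    # prints "11 alice 42 3"
```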