NAME

DESCRIPTION

This section of the FAQ answers questions related to the manipulation of data as numbers, dates, strings, arrays, hashes, and miscellaneous data issues.

Data: Numbers

Why am I getting long decimals (eg, 19.9499999999999) instead of the numbers I should be getting (eg, 19.95)?

Internally, your computer represents floating-point numbers in binary. Floating-point numbers read in from a file, or appearing as literals in your program, are converted from their decimal floating-point representation (eg, 19.95) to the internal binary representation.

However, 19.95 can't be precisely represented as a binary floating-point number, just like 1/3 can't be exactly represented as a decimal floating-point number. The computer's binary representation of 19.95, therefore, isn't exactly 19.95.

When a floating-point number gets printed, the binary floating-point representation is converted back to decimal. These decimal numbers are displayed in either the format you specify with printf(), or the current output format for numbers (see "$#" in perlvar) if you use print. $# has a different default value in Perl5 than it did in Perl4. Changing $# yourself is deprecated.

This affects all computer languages that represent decimal floating-point numbers in binary, not just Perl. Perl provides arbitrary-precision decimal numbers with the Math::BigFloat module (part of the standard Perl distribution), but mathematical operations are consequently slower.

To get rid of the superfluous digits, just use a format (eg, printf("%.2f", 19.95)) to get the required precision.
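
As a small illustration, the sketch below formats the value with sprintf() and, for contrast, asks for more digits than the binary representation holds exactly. The variable names are made up for the example:

```perl
use strict;
use warnings;

my $price = 19.95;                     # stored internally as a binary approximation
my $shown = sprintf("%.2f", $price);   # round to two decimal places for display
print "$shown\n";                      # prints 19.95

# Asking for 20 decimal places exposes the approximation.
my $raw = sprintf("%.20f", $price);
print "$raw\n";
```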

Why isn't my octal data interpreted correctly?

Perl only understands octal and hex numbers as such when they occur as literals in your program. If they are read in from somewhere and assigned, no automatic conversion takes place. You must explicitly use oct() or hex() if you want the values converted. oct() interprets both hex ("0x350") numbers and octal ones ("0350" or even without the leading "0", like "377"), while hex() only converts hexadecimal ones, with or without a leading "0x", like "0x255", "3A", "ff", or "deadbeef".

This problem shows up most often when people try using chmod(), mkdir(), umask(), or sysopen(), which all want permissions in octal.
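
A minimal sketch of the difference, assuming a permission string such as "0644" has been read from somewhere (the value and variable names are made up):

```perl
use strict;
use warnings;

my $input = "0644";        # e.g. read from a file or the command line
my $mode  = oct($input);   # 420 decimal, which is octal 644
my $wrong = $input + 0;    # 644 decimal: the string is treated as decimal

printf "%d %d %o\n", $mode, $wrong, $mode;   # prints 420 644 644

my $from_hex = hex("ff");       # 255
my $also_hex = oct("0x350");    # 848: oct() recognizes the 0x prefix
print "$from_hex $also_hex\n";  # prints 255 848
```

Passing $mode (not $wrong) to chmod() gives the permissions the string was meant to describe.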

Does perl have a round function? What about ceil() and floor()? Trig functions?

For rounding to a certain number of digits, sprintf() or printf() is usually the easiest route.

The POSIX module (part of the standard perl distribution) implements ceil(), floor(), and a number of other mathematical and trigonometric functions.
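
A short sketch of both approaches, with arbitrary sample values:

```perl
use strict;
use warnings;
use POSIX qw(ceil floor);

my $rounded = sprintf("%.3f", 3.1415926535);   # "3.142"
my $up      = ceil(3.2);                       # 4
my $down    = floor(3.8);                      # 3
my $neg     = floor(-3.2);                     # -4: floor() goes toward negative infinity

print "$rounded $up $down $neg\n";             # prints 3.142 4 3 -4
```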

In 5.000 to 5.003 Perls, trigonometry was done in the Math::Complex module. With 5.004, the Math::Trig module (part of the standard perl distribution) implements the trigonometric functions. Internally it uses the Math::Complex module and some functions can break out from the real axis into the complex plane, for example the inverse sine of 2.

Rounding in financial applications can have serious implications, and the rounding method used should be specified precisely. In these cases, it probably pays not to trust whichever system rounding is being used by Perl, but to instead implement the rounding function you need yourself.

How do I convert bits into ints?

To turn a string of 1s and 0s like '10110110' into a scalar containing its numeric value, use the pack() function (documented in "pack" in perlfunc) to build the byte, and ord() to get that byte's value:

$decimal = ord(pack('B8', '10110110'));

Here's an example of going the other way:

$binary_string = join('', unpack('B*', "\x29"));
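
As an alternative not mentioned above, and assuming Perl 5.6 or later, oct() also accepts a "0b" prefix and sprintf() has a %b format. A minimal sketch:

```perl
use strict;
use warnings;

my $decimal = oct("0b10110110");     # 182
my $bits    = sprintf("%b", 41);     # "101001" (no leading zeros)
my $padded  = sprintf("%08b", 41);   # "00101001", padded to eight bits

print "$decimal $bits $padded\n";    # prints 182 101001 00101001
```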

How do I multiply matrices?

Use the Math::Matrix or Math::MatrixReal modules (available from CPAN) or the PDL extension (also available from CPAN).

How do I perform an operation on a series of integers?

To call a function on each element in an array, and collect the results, use:

@results = map { my_func($_) } @array;

For example:

@triple = map { 3 * $_ } @single;

To call a function on each element of an array, but ignore the results:

foreach $iterator (@array) {
    &my_func($iterator);
}

To call a function on each integer in a (small) range, you can use:

@results = map { &my_func($_) } (5 .. 25);

but you should be aware that the .. operator creates a list of all integers in the range. This can take a lot of memory for large ranges. Instead, step through the range with an explicit loop counter, so the full list of integers is never built.
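
A minimal sketch of a loop that walks a large range with an explicit counter, so the list of integers is never built. Here my_func() is a hypothetical stand-in and the bounds are arbitrary:

```perl
use strict;
use warnings;

sub my_func { return 2 * $_[0] }   # hypothetical stand-in for your function

my @results;
for (my $i = 5; $i < 500_005; $i++) {
    push @results, my_func($i);    # only one integer exists at a time
}
print scalar(@results), "\n";      # prints 500000
```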

How can I output Roman numerals?

Get the http://www.perl.com/CPAN/modules/by-module/Roman module.

Why aren't my random numbers random?

The short explanation is that you're getting pseudorandom numbers, not random ones, because that's how these things work. A longer explanation is available on http://www.perl.com/CPAN/doc/FMTEYEWTK/random, courtesy of Tom Phoenix.

Data: Dates

How can I compare two date strings?

Use the Date::Manip or Date::DateCalc modules from CPAN.

How can I take a string and turn it into epoch seconds?

If it's a regular enough string that it always has the same format, you can split it up and pass the parts to timelocal in the standard Time::Local module. Otherwise, you should look into one of the Date modules from CPAN.
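
A minimal sketch, assuming the input always looks like "YYYY-MM-DD hh:mm:ss" (a made-up format for this example). It uses timegm() so the result does not depend on the local time zone. timelocal() takes the same arguments but reads them as local time:

```perl
use strict;
use warnings;
use Time::Local qw(timegm);

my $string = "2001-11-13 01:00:00";   # made-up input in a fixed format

my ($year, $mon, $mday, $hour, $min, $sec) =
    $string =~ /^(\d{4})-(\d\d)-(\d\d) (\d\d):(\d\d):(\d\d)$/
    or die "unrecognized date: $string";

# Months are 0-based (January is 0), so subtract 1 from the parsed month.
my $epoch = timegm($sec, $min, $hour, $mday, $mon - 1, $year);
print "$epoch\n";                     # prints 1005613200
```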

How can I find the Julian Day?

Neither Date::Manip nor Date::DateCalc deals with Julian days. Instead, there is an example of Julian date calculation in http://www.perl.com/CPAN/authors/David_Muir_Sharnoff/modules/Time/JulianDay.pm.gz, which should help.

Does Perl have a year 2000 problem?

Not unless you use Perl to create one. The date and time functions supplied with perl (gmtime and localtime) supply adequate information to determine the year well beyond 2000 (2038 is when trouble strikes). The year returned by these functions when used in an array context is the year minus 1900. For years between 1910 and 1999 this happens to be a 2-digit decimal number. To avoid the year 2000 problem simply do not treat the year as a 2-digit number. It isn't.

When gmtime() and localtime() are used in a scalar context they return a timestamp string that contains a fully-expanded year. For example, $timestamp = gmtime(1005613200) sets $timestamp to "Tue Nov 13 01:00:00 2001". There's no year 2000 problem here.
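
Both contexts can be sketched with the timestamp from the example above:

```perl
use strict;
use warnings;

my $when = 1005613200;

# List context: element 5 is the year minus 1900 (101 here), so add 1900.
my $year = (gmtime($when))[5] + 1900;

# Scalar context: a ready-made string with a four-digit year.
my $stamp = gmtime($when);

print "$year\n";     # prints 2001
print "$stamp\n";    # prints Tue Nov 13 01:00:00 2001
```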

Data: Strings

How do I validate input?

The answer to this question is usually a regular expression, perhaps with auxiliary logic. See the more specific questions (numbers, email addresses, etc.) for details.

How do I unescape a string?

It depends just what you mean by "escape". URL escapes are dealt with in perlfaq9. Shell escapes with the backslash (\) character are removed with:

s/\\(.)/$1/g;

Note that this won't expand \n or \t or any other special escapes.

How do I remove consecutive pairs of characters?

To turn "abbcccd" into "abccd":

s/(.)\1/$1/g;

How do I expand function calls in a string?

This is documented in perlref. In general, this is fraught with quoting and readability problems, but it is possible. To interpolate a subroutine call (in a list context) into a string:

print "My sub returned @{[mysub(1,2,3)]} that time.\n";

If you prefer scalar context, similar chicanery is also useful for arbitrary expressions:

print "That yields ${\($n + 5)} widgets\n";

See also "How can I expand variables in text strings?" in this section of the FAQ.

How do I find matching/nesting anything?

This isn't something that can be tackled in one regular expression, no matter how complicated. To find something between two single characters, a pattern like /x([^x]*)x/ will get the intervening bits in $1. For multi-character delimiters, something more like /alpha(.*?)omega/ would be needed. But none of these deals with nested patterns, nor can they. For that you'll have to write a parser.

This is fine if you are just looking for a single character. However, if you are trying to count multiple character substrings within a larger string, tr/// won't work. What you can do is wrap a while() loop around a global pattern match. For example, let's count negative integers:
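
A minimal sketch of that while() loop, using a made-up sample string:

```perl
use strict;
use warnings;

my $string = "-9 55 48 -2 23 -76 4 14 -44";   # sample data
my $count  = 0;

# In scalar context, each /g match resumes where the previous one ended.
while ($string =~ /-\d+/g) {
    $count++;
}
print "There are $count negative numbers in the string\n";   # count is 4
```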

To force each word to be lower case, with the first letter upper case:

$line =~ s/(\w+)/\u\L$1/g;

How can I split a [character] delimited string except when inside [character]? (Comma-separated files)

Take the example case of trying to split a string that is comma-separated into its different fields. (We'll pretend you said comma-separated, not comma-delimited, which is different and almost never what you mean.) You can't use split(/,/) because you shouldn't split if the comma is inside quotes. For example, take a data line like this:

Due to the restriction of the quotes, this is a fairly complex problem. Thankfully, we have Jeffrey Friedl, author of a highly recommended book on regular expressions, to handle these for us. He suggests (assuming your string is contained in $text):

You can also get into subtle problems on those few operations in Perl that actually do care about the difference between a string and a number, such as the magical ++ autoincrement operator or the syscall() function.

Why don't my <<HERE documents work?

Check for these three things:

1. There must be no space after the << part.

2. There (probably) should be a semicolon at the end.

3. You can't (easily) have any space in front of the tag.
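
A small here document that passes all three checks might look like this (the tag name END_TEXT and the variable are made up):

```perl
use strict;
use warnings;

my $name = "world";

# No space after <<, a semicolon ends the statement,
# and the closing tag starts in the first column.
my $text = <<"END_TEXT";
Hello, $name.
    Leading spaces in the body are kept as-is.
END_TEXT

print $text;
```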

Data: Arrays

What is the difference between $array[1] and @array[1]?

The former is a scalar value, the latter an array slice, which makes it a list with one (scalar) value. You should use $ when you want a scalar value (most of the time) and @ when you want a list with one scalar value in it (very, very rarely; nearly never, in fact).

Sometimes it doesn't make a difference, but sometimes it does. For example, compare:

$good[0] = `some program that outputs several lines`;

with

@bad[0] = `same program that outputs several lines`;

The -w flag will warn you about these matters.

How can I extract just the unique elements of an array?

There are several possible ways, depending on whether the array is ordered and whether you wish to preserve the ordering.

a) If @in is sorted, and you want @out to be sorted:

$prev = 'nonesuch';
@out = grep($_ ne $prev && ($prev = $_, 1), @in);   # the ", 1" keeps false values such as "0"

This is nice in that it doesn't use much extra memory, simulating uniq(1)'s behavior of removing only adjacent duplicates.
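
When @in is not sorted, a hash of already-seen values is a common alternative. A minimal sketch with made-up data. It keeps the first occurrence of each element and preserves the original order:

```perl
use strict;
use warnings;

my @in = qw(pear apple pear fig apple kiwi);

my %seen;
my @out = grep { !$seen{$_}++ } @in;   # true only the first time a value appears

print "@out\n";                        # prints pear apple fig kiwi
```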

How can I tell whether an array contains a certain element?

There are several ways to approach this. If you are going to make this query many times and the values are arbitrary strings, the fastest way is probably to invert the original array and keep an associative array lying about whose keys are the first array's values.
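
A minimal sketch of that inversion, with a made-up list of colors:

```perl
use strict;
use warnings;

my @blues   = qw(azure cerulean teal turquoise);
my %is_blue = map { $_ => 1 } @blues;   # keys are the array's values

for my $color (qw(teal crimson)) {
    print "$color: ", ($is_blue{$color} ? "yes" : "no"), "\n";
}
# prints "teal: yes" then "crimson: no"
```

Building the hash costs one pass over the array. Each later lookup is a single hash access.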

How do I find the first array element for which a condition is true?
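
One straightforward approach is a loop that records the index and leaves with last as soon as the condition holds. A minimal sketch. The data and the condition (equality with "Waldo") are made up for illustration:

```perl
use strict;
use warnings;

my @array = qw(Alice Bob Waldo Carol Waldo);

my $found_index;
for (my $i = 0; $i < @array; $i++) {
    if ($array[$i] eq "Waldo") {
        $found_index = $i;
        last;                 # stop at the first match
    }
}

print defined $found_index ? "found at index $found_index\n" : "not found\n";
# prints found at index 2
```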

How do I handle linked lists?

In general, you usually don't need a linked list in Perl, since with regular arrays, you can push and pop or shift and unshift at either end, or you can use splice to add and/or remove an arbitrary number of elements at arbitrary points.

If you really, really wanted, you could use structures as described in perldsc or perltoot and do just what the algorithm book tells you to do.

How do I handle circular lists?

Circular lists could be handled in the traditional fashion with linked lists, or you could just do something like this with an array:

unshift(@array, pop(@array)); # the last shall be first
push(@array, shift(@array)); # and vice versa

How do I shuffle an array randomly?

Here's a shuffling algorithm which works its way through the list, randomly picking another element to swap the current element with:
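
A minimal sketch of one such algorithm, the Fisher-Yates shuffle. The subroutine name is made up, and this is an illustration rather than necessarily the exact code the FAQ had in mind:

```perl
use strict;
use warnings;

# Shuffle the array in place, given a reference to it.
sub shuffle_in_place {
    my ($array) = @_;
    for (my $i = $#$array; $i > 0; $i--) {
        my $j = int rand($i + 1);            # random index with 0 <= $j <= $i
        @$array[$i, $j] = @$array[$j, $i];   # swap (a no-op when $i == $j)
    }
    return;
}

my @deck = (1 .. 10);
shuffle_in_place(\@deck);
print "@deck\n";
```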

How do I sort an array by (anything)?

The default sort function is cmp, string comparison, which would sort (1, 2, 10) into (1, 10, 2). To sort numerically, supply a comparison block that uses <=>, the numerical comparison operator.
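
The two comparisons side by side, as a short sketch:

```perl
use strict;
use warnings;

my @list = (1, 2, 10);

my @as_strings = sort @list;                  # default: string comparison (cmp)
my @as_numbers = sort { $a <=> $b } @list;    # numeric comparison

print "@as_strings\n";   # prints 1 10 2
print "@as_numbers\n";   # prints 1 2 10
```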

If you have a complicated function needed to pull out the part you want to sort on, then don't do it inside the sort function. Pull it out first, because the sort BLOCK can be called many times for the same element. Here's an example of how to pull out the first word after the first number on each item, and then sort those words case-insensitively.
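
A minimal sketch of that idea, with made-up data. The sort keys are computed once per item and stored in a parallel array, and the sort then works on indices:

```perl
use strict;
use warnings;

my @data = ("10 zebra", "2 Apple", "7 mango");   # sample items

# Extract the first word after the first number, once per item.
my @idx;
for (@data) {
    my ($item) = /\d+\s*(\S+)/;
    push @idx, uc($item);           # upper-case it for a case-insensitive sort
}

# Sort the indices by the precomputed keys, then slice @data in that order.
my @sorted = @data[ sort { $idx[$a] cmp $idx[$b] } 0 .. $#idx ];

print join(", ", @sorted), "\n";    # prints 2 Apple, 7 mango, 10 zebra
```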

Here we'll do a reverse numeric sort by value, and if two values are equal, sort by length of key, and if that fails, by straight ASCII comparison of the keys (well, possibly modified by your locale -- see perllocale).
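
A sketch of such a multi-level comparison, using a made-up hash:

```perl
use strict;
use warnings;

my %hash = (apple => 3, fig => 3, kiwi => 3, lime => 3, pear => 5, date => 1);

my @sortedkeys = sort {
    $hash{$b} <=> $hash{$a}          # larger values first
        ||
    length($a) <=> length($b)        # then shorter keys first
        ||
    $a cmp $b                        # then plain string order
} keys %hash;

print "@sortedkeys\n";               # prints pear fig kiwi lime apple date
```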

How can I always keep my hash sorted?

What's the difference between "delete" and "undef" with hashes?

Hashes are pairs of scalars: the first is the key, the second is the value. The key will be coerced to a string, although the value can be any kind of scalar: string, number, or reference. If a key $key is present in the hash %array, exists($array{$key}) will return true. The value for a given key can be undef, in which case $array{$key} will be undef while exists($array{$key}) will return true. This corresponds to ($key, undef) being in the hash.
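
The difference can be seen in a short sketch. The hash is named %array only to match the text, and its contents are made up:

```perl
use strict;
use warnings;

my %array = (a => 1, b => 2, c => 3);

undef $array{a};     # the key stays, its value becomes undef
delete $array{b};    # the key and its value are both removed

my $a_exists  = exists  $array{a} ? 1 : 0;   # 1
my $a_defined = defined $array{a} ? 1 : 0;   # 0
my $b_exists  = exists  $array{b} ? 1 : 0;   # 0

print "$a_exists $a_defined $b_exists\n";    # prints 1 0 0
print scalar(keys %array), "\n";             # prints 2
```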

Why don't my tied hashes make the defined/exists distinction?

They may or may not implement the EXISTS() and DEFINED() methods differently. For example, there isn't the concept of undef with hashes that are tied to DBM* files. This means the true/false tables above will give different results when used on such a hash. It also means that exists and defined do the same thing with a DBM* file, and what they end up doing is not what they do with ordinary hashes.

How do I reset an each() operation part-way through?

Using keys %hash in a scalar context returns the number of keys in the hash and resets the iterator associated with the hash. You may need to do this if you use last to exit a loop early so that when you re-enter it, the hash iterator has been reset.
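
A minimal sketch with a made-up hash. One call to each() stands in for a loop that was left early:

```perl
use strict;
use warnings;

my %hash = (one => 1, two => 2, three => 3);

my ($first_key) = each %hash;   # iterator is now part-way through the hash

my $count = keys %hash;         # scalar context: returns 3 and resets the iterator

my $visited = 0;
while (my ($key, $value) = each %hash) {
    $visited++;                 # all three pairs are seen again
}
print "$count $visited\n";      # prints 3 3
```

Without the call to keys, the second loop would pick up where the first each() stopped and visit only two pairs.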

How can I get the unique keys from two hashes?

First you extract the keys from the hashes into lists, and then solve the array-uniquifying problem described above. For example:
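
A minimal sketch, with two made-up hashes named %foo and %bar:

```perl
use strict;
use warnings;

my %foo = (apple => 1, pear => 2);
my %bar = (pear  => 3, fig  => 4);

my %seen;
$seen{$_}++ for keys(%foo), keys(%bar);   # each distinct key is stored once

my @uniq = sort keys %seen;               # sorted only to make the output predictable
print "@uniq\n";                          # prints apple fig pear
```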

How can I make my hash remember the order I put elements into it?

Why does passing a subroutine an undefined element in a hash create it?

If you say something like:

somefunc($hash{"nonesuch key here"});

Then that element "autovivifies"; that is, it springs into existence whether you store something there or not. That's because functions get scalars passed in by reference. If somefunc() modifies $_[0], it has to be ready to write it back into the caller's version.

This has been fixed as of perl5.004.

Normally, merely accessing a key's value for a nonexistent key does not cause that key to be forever there. This is different than awk's behavior.

How can I make the Perl equivalent of a C structure/C++ class/hash or array of hashes or arrays?

Use references (documented in perlref). Examples of complex data structures are given in perldsc and perllol. Examples of structures and object-oriented classes are in perltoot.

How can I use a reference as a hash key?

You can't do this directly, but you could use the standard Tie::RefHash module distributed with perl.
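
A minimal sketch of Tie::RefHash in use. The hash and variable names are made up:

```perl
use strict;
use warnings;
use Tie::RefHash;

tie my %label_for, 'Tie::RefHash';

my $point = [3, 4];                  # an array reference used as a key
$label_for{$point} = "a point";

for my $key (keys %label_for) {
    # With the tie, the key comes back as a real reference, not a string.
    print ref($key), ": @$key => $label_for{$key}\n";   # prints ARRAY: 3 4 => a point
}
```

In an ordinary hash the key would have been turned into a string such as "ARRAY(0x...)" and could no longer be dereferenced.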

Data: Misc

How do I handle binary data correctly?

Perl is binary clean, so this shouldn't be a problem. For example, this works fine (assuming the files are found):

Or you could check out http://www.perl.com/CPAN/modules/by-module/String/String-Scanf-1.1.tar.gz instead. The POSIX module (part of the standard Perl distribution) provides the strtol and strtod functions for converting strings to longs and doubles, respectively.

How do I keep persistent data across program calls?

For some specific applications, you can use one of the DBM modules. See AnyDBM_File. More generically, you should consult the FreezeThaw, Storable, or Class::Eroot modules from CPAN.

How do I print out or copy a recursive data structure?

The Data::Dumper module on CPAN is nice for printing out data structures, and FreezeThaw for copying them. For example:

use FreezeThaw qw(freeze thaw);
$new = thaw freeze $old;

Here $old can be (a reference to) any kind of data structure you'd like. It will be deeply copied.