A blog about the Perl programming language, by Rob Hoelz (hoelz.ro), on blogs.perl.org

System Sleuthing - Using ptrace to Print Backtraces (2014-06-19)

I recently developed a Perl module that allows you to print backtraces when your program makes certain system calls; see my blog for the full story!

Oh My Glob (2013-11-18)

Hello fellow Perl developers,

I wrote up a blog entry about typeglobs on my personal blog: what they are, and what they are used for. If you've never heard of typeglobs, or you feel you could stand to pick up a trick or two, check out the post.

Perl 5 Internals - Part Four (2013-10-07)

The first three installments of this series covered Perl's core data types: scalars, arrays, and hashes. This final
installment will cover something a bit different: the optree. Those of you who are familiar with compiler concepts
are no doubt familiar with the notion of an abstract syntax tree (known as an AST for short). The optree is perl's
take on the AST: it's something similar to, but not entirely the same as, an AST. Before we begin looking at the
optree, I recommend reviewing the “Subroutines” and “Compiled code” sections of perlguts, as well as
looking at perlcall. It is by no means required, but it might make digesting this content a little
easier.

Perl 5 Internals - Part Three (2013-09-11)

This is a cross post from my blog.

In the previous post, we talked about some of the optimizations
that perl performs when conducting string and array operations. This time, we'll be diving into
how perl implements hashes. But first, a brief clarification…

"Perl Never Downgrades"

In the first two posts, I made a point of mentioning that “Perl never downgrades”; that is, once a
scalar has been upgraded to a more inclusive type (e.g. from an IV to a PV), it won't go
back again. This concept is also reflected in Perl's usage of memory; once it's allocated memory
to an XV, it won't shrink until that XV is freed up. I was chatting with Yves Orton about these
blog posts, and he mentioned that there is (at least) one scenario in which perl will free this
memory: if you invoke undef as a function on a scalar, rather than assigning undef to
said scalar. For example:
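
A minimal sketch of the difference, using Devel::Peek (assuming $s receives undef by assignment and $t has undef invoked on it as a function):

```perl
use Devel::Peek;

my $s = "some fairly long string";
my $t = "some fairly long string";

$s = undef;   # assignment: the character buffer is kept around
undef $t;     # function call: the character buffer is freed

Dump($s);
Dump($t);
```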

Note how $s retained a reference to its character buffer, but it doesn't have any of the OK
flags set, whereas $t has freed up its character buffer. Similar behavior occurs if you do
undef @array or undef %hash; I encourage you to try it out!

And now, back to our scheduled programming…</awful-pun>

Hashes

Hashing and Splitting

Perl implements hashtables in a fairly typical way: as a sparsely populated array of linked lists (AKA
hash buckets). A hashtable implementation runs its keys through a hash function to generate an
index, and usually applies a modulus operation to generate the final index into the bucket array.
Perl has a clever microoptimization here: it always makes sure to allocate a bucket array that has a
size equal to a power of two. Then, instead of using a modulus operation, perl can simply conduct
a bitwise AND operation to discard the high bits. The modulus operation is by no means expensive,
but a bitwise AND takes far fewer cycles.
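
The trick can be sketched in a few lines (the variable names are illustrative, not perl's actual internals):

```perl
my $nbuckets = 8;             # always a power of two in perl's hashtables
my $hash     = 0xDEADBEEF;    # output of the hash function

my $index_mod = $hash % $nbuckets;        # general-purpose modulus
my $index_and = $hash & ($nbuckets - 1);  # perl's cheaper bitwise AND

# For power-of-two sizes the two computations are equivalent.
print $index_mod == $index_and ? "same\n" : "different\n";
```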

When the load factor (the number of key/value pairs versus the size of the sparse array) gets too
high, the number of collisions gets to be too high for the hashtable to be fully effective, so the
hashtable is usually grown and the entries are reinserted. This is known internally as
splitting, as the routine responsible is named hv_split. Perl has a neat optimization for this as well: since it
uses a bitwise AND to find the final array index, it only needs to consider one extra bit of the
hash value when splitting the table.

For example, let's say I have a hash whose buckets array is 8 entries long, and two keys whose hash
values are 0b11111111 and 0b11110111. If we apply a bitwise AND against 0b00000111 (8 -
1), we get the bucket index for these keys, which happens to be 7 for both of our keys. However,
when we increase the bucket array size to 16 (the next power of two), we now bitwise AND the keys
with 16 - 1, or 0b00001111, to get the bucket index, which results in 15 for the first key, and 7
for the second. The second key doesn't change positions! Therefore, when growing the bucket array
from size 2^n to 2^(n+1), we only need to reorganize entries for which the newly considered bit
(bit n, counting from zero) of the hash value is set.
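
The worked example can be checked directly in Perl:

```perl
my @hashes = (0b11111111, 0b11110111);

# With 8 buckets, both keys land in bucket 7.
my @small = map { $_ & (8 - 1) } @hashes;

# With 16 buckets, only the key with the new bit set moves (to 15);
# the other stays in bucket 7.
my @large = map { $_ & (16 - 1) } @hashes;

print "@small / @large\n";   # 7 7 / 15 7
```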

If you happen to know how many hash keys you will need (or if you at least have a good idea), you can
tell Perl to preallocate hash buckets by using keys(%hash) as an l-value:

keys(%hash) = $num_expected_keys;

This can save your program from repeatedly splitting a hashtable that you're inserting a
lot of data into.

Shared Keys

Consider the following program:

my @hashes;
for (1..1_000_000) {
    push @hashes, { value => $_ };
}

This seems like it would take up a lot of space: not only are we creating one million hashes; we're
also creating one million instances of the string “value” and one million numbers, right?

In some languages, you can do what's known as interning a string; this means for each individual
string, there is exactly one copy of that string in memory. Lua does this by default (which makes
sense, considering how heavily it relies on table lookups based on string keys), and Ruby offers
the capability to do this with its Symbol type. Fortunately, Perl benefits from a similar trick.
All strings that are used as hash keys are actually interned into a private hash table (accessible
in XS via the PL_strtab variable); this allows any program using a lot of common hash keys to fit
into a much smaller space in memory.
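
A hint of this is visible with Devel::Peek: dumping a hash shows a SHAREKEYS flag, indicating its keys live in the shared string table. A small sketch (exact Dump output varies by perl version):

```perl
use Devel::Peek;

my %h = ( value => 42 );
my %g = ( value => 43 );

# Both hashes' "value" keys point at the same interned string in
# PL_strtab; Dump's FLAGS line should include SHAREKEYS.
Dump(\%h);
```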

Next Time

Well, that about covers the three primary datatypes in Perl! Join me next time for an overview of
the optree!

Perl 5 Internals - Part Two (2013-09-05)

This is a cross post from my blog.

Welcome back for another exciting episode on the inner workings of
the perl interpreter! Last time we
covered some of the basic optimizations perl performs on SVs, as well as
the consequences of those optimizations. This time, we'll be going over
some of the optimizations specific to strings and arrays. I know I said
we'd be covering hashes too, but this article is already quite lengthy,
and I have enough material on hashes to merit its own article, so look for that
information in the upcoming third part of this series!

Strings

Since Perl was designed to perform operations on bodies of text, you can imagine that it has some
clever optimizations for doing so. One set of optimizations is that trimming from either end of a
string is cheap.

You can easily imagine how this might work from the end of the string: simply place a NUL character
after the new end of the string, and update the length field. Many string implementations do
this, and it should come as no surprise that perl does this as well. Similar to the “never
downgrade” policy that I mentioned in the previous entry, a character buffer in perl is never
shrunk, because perl figures you'll need it again, or at least that you'll be throwing away that
value soon enough anyway.
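
You can watch this with Devel::Peek: trimming the end shrinks CUR (the current length) while LEN (the allocated buffer size) stays put. A sketch:

```perl
use Devel::Peek;

my $s = "a moderately long string";
Dump($s);             # note the CUR and LEN fields

substr($s, -7) = "";  # trim " string" from the end

Dump($s);             # CUR shrank; LEN is unchanged
```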

Perl optimizes removing characters from the beginning of a string as well. It accomplishes this by
setting the OOK flag on the SV (which means “offset ok”), updating the buffer pointer to
point offset bytes past the start of the buffer, and setting the SV's OFFSET field to the
offset (in newer perls, this is placed in the string buffer itself). For example:
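
A minimal sketch, using a substitution that strips a prefix (on recent perls, Dump's FLAGS line should then include OOK):

```perl
use Devel::Peek;

my $s = "0123456789";
$s =~ s/^01234//;   # remove characters from the beginning

Dump($s);           # FLAGS should include OOK; the PV pointer now
                    # points partway into the original buffer
```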

Arrays

If you're constantly pushing new elements onto an array, perl will realloc() the array's storage as you
go. This is fairly common in the implementation of dynamically sized arrays. If you know (or at
least have a good idea) about how many elements your array will contain, you can use $#array
as an lvalue to tell Perl to preallocate space for those elements:

$#array = $expected_size - 1; # - 1 because $#array is the last index, not the size

This fills the array with undef values, though, so you'll have to calculate the insert index by
hand if you do this.

(I'm not sure what the MAGIC is here for in this case; that might merit some future research!)

Next Time

The next article will discuss how perl implements hashtables, along with some of the optimizations
and security implications that come along with that implementation.

Perl 5 Internals - Part One (2013-08-28)

This is a cross post from my blog.

Last week I completed a two-part training at work on perl's internals,
led by Yves Orton and Steffen Mueller. We covered some of the data structures used by the interpreter,
as well as some of the optimizations it uses and the consequences of how those optimizations are
implemented. Someone on Twitter asked me if there were any slides
available, and unfortunately, the talk was conducted in a very ad-hoc fashion (which actually probably contributed to
its success, as the audience was able to propose new areas to cover). However,
I did end up taking some notes, so I'm going to summarize those here along with what I
remember. A lot was covered in the two sessions, so in the interest of actually getting
this information out (I tend not to publish things that I can't write in one sitting),
as well as keeping your attention, I'll be publishing them in a series of posts.

Required Background

I try not to reinvent the wheel when programming, and that's an approach that applies to documentation as well.
Therefore, I'll assume you have read perlguts. If you prefer a
more visual overview, Reini Urban's illguts is a quite popular alternative.

Inspecting Perl's Guts

A mature language like Perl has accrued quite a number of tools for analyzing itself at runtime, but I'll
mention just two here. Devel::Peek is useful for inspecting the internal structures that make
up a Perl value at runtime; especially its reference count. B::Concise is similarly useful for
inspecting the optree (which is Perl's version of a sort-of AST). One of the most crucial things that
I took away from this talk was that a lot can be learned by simply playing with these two modules. Learning to use
these modules effectively, both in debugging and playing around, is crucial to one's evolution as a Perl programmer.

A Word on Terminology

I don't believe that this is mentioned in perlguts, but an SV has two parts: a head and a body. The head
exists for all SVs; it contains bookkeeping information like the reference count and SV flags. The body
exists for most SVs; certain SVs may omit it for optimization purposes. Which brings us to our first
optimization of the evening…

SVs without bodies

SV objects representing the undef value don't have bodies; they don't need them. A more interesting
optimization is what Perl does for integer values; the body pointer points to a location in memory just
before the SV head, so that the value portion of the fake body lines up with the location of the actual value
in the SV itself. I'm sure I'm not doing the technique justice; suffice to say it's very cool! An important consequence
of this is that the following two hashes:

When one of these integer values is used as a string, its SV body (an IV) is transparently upgraded to a
PVIV, which means it now contains a reference to a character buffer representing the integer value. Perl
keeps this buffer around, and will never downgrade the SV
to an IV again. It does this because if you use a value as a string once, you'll probably do it again, and
Perl tends to favor greater memory usage over recalculation. This may not seem to be a big deal (after all,
what's one scalar?), but imagine you have a large array of numbers and you print them to inspect them. The
recommended technique in this case is to copy the list of scalars you intend to print and throw it away after
you're done.
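
In code, that advice might look like the following hypothetical sketch:

```perl
my @numbers = (1 .. 1000);

# Copy before stringifying: the copies get upgraded to PVIVs and are
# thrown away when the block ends, while @numbers stays as plain IVs.
{
    my @copies = @numbers;
    print join(", ", @copies), "\n";
}
```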

Intermission

I hope you've enjoyed wading into the waters of the Perl interpreter and its optimizations; stay tuned for Part 2, where I'll be talking
about some of the optimizations that Perl performs when manipulating strings, arrays, and hashes.

Still alive, Inline::Lua? (2012-04-16)

I was reading about the Inline module the other day, so naturally I looked to see if there was a binding for one of my other favorite languages, Lua. Sure enough, Inline::Lua exists; however, it has not seen a new release in nearly five years, and it doesn't even build on perls newer than 5.10. I like this idea enough that I'd like to put some time into it and cut a new release; however, I haven't been able to reach the author. So, in accordance with the guidelines I read in the CPAN FAQ, I'm making a post asking if the original author, Tassilo von Parseval, is still in the Perl community and interested in updating this module.

-Rob

Perl Podcasts (2012-02-13)

I had to make a last-minute drive to Chicago last Friday, which is about a two-and-a-half hour trip from Madison. So naturally, to pass the time, I loaded up my phone with podcasts. When I was about halfway home, it hit me: I didn't have any Perl podcasts on my phone! Unfortunately, I don't know of any Perl podcasts; I've seen Perlcast, but it looks like it hasn't been updated in a year and a half. So, this is a call to the Perl community: are there any Perl podcasts out there?
Asynchronous MySQL Queries in Perl Using DBD::mysql and AnyEvent (2011-10-19)

A lot of people use MySQL, and these days, asynchronous-style programming has really taken off. If you're involved in both of these camps, you may be wondering
how to send a query to MySQL and have it inform your event loop when it's ready with the results of that query. A common solution is to use a thread or child process
for each connection, and exchange data using IPC. However, if you're using Perl and DBD-mysql 4.019 or better, you have an alternative: the new asynchronous interface.

Using the new async flag that you can provide to the prepare method, along with the new mysql_fd method, it's fairly easy to have MySQL play nice with AnyEvent.
Here's a simple example:
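
A sketch of such a script; the connection details are placeholders, and the SELECT SLEEP(10), 3 query is an assumption chosen to match the output described afterward:

```perl
use strict;
use warnings;
use AnyEvent;
use DBI;

# Connection details are placeholders; substitute your own.
my $dbh = DBI->connect('dbi:mysql:database=test', 'user', 'password',
    { RaiseError => 1 });

# async => 1 tells DBD::mysql to send the query without blocking.
my $sth = $dbh->prepare('SELECT SLEEP(10), 3', { async => 1 });
$sth->execute;

my $done = AnyEvent->condvar;

# Fire a timer every second to show the event loop isn't blocked.
my $timer = AnyEvent->timer(
    after    => 1,
    interval => 1,
    cb       => sub { print "timer fired!\n" },
);

# Watch the MySQL socket; when it becomes readable, the result is in.
my $io = AnyEvent->io(
    fh   => $dbh->mysql_fd,
    poll => 'r',
    cb   => sub {
        print "got data from MySQL\n";
        $sth->mysql_async_result;       # collect the result
        my @row = $sth->fetchrow_array; # (0, 3) - SLEEP returns 0
        print "@row\n";
        $done->send;
    },
);

$done->recv;
```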

This script will print "timer fired!" about once a second for ten seconds, then "got data from MySQL", and finally "0 3", which is the data from our SELECT statement. Obviously, this example is pretty trivial, but you could easily do this with multiple MySQL connections.

Edit-Plackup-Test-Rinse-Repeat (2011-10-17)

At INOC, my place of work, I work on a lot of web applications with the backend written in Perl using Catalyst, and the frontend written in Javascript using ExtJS. With a UI written completely in Javascript, I often encounter bugs of the following form:

1. Fire up Catalyst.
2. Log in.
3. Click through half a dozen controls in the UI.
4. Enter some data.
5. Click “submit”.
6. Watch the web application give you an angry error.

As you can imagine, the time for a single iteration of this cycle is fairly long, and the process is quite tedious. Obviously, if the error lies within the Javascript side, there's not much I can do about it, short of writing a Greasemonkey script to do some of the automation for me. However, half of the time, the server is returning some strange output given a certain set of inputs for a particular RPC call. Wouldn't it be nice if you could go through the application and have it record the requests you make to be submitted over and over again at a later time?

Plack-Middleware-Recorder is a distribution that comes with several modules:

Plack::Middleware::Recorder

A PSGI middleware that knows how to serialize requests to a stream.

Plack::Middleware::Debug::Recorder

A debugging panel that allows you to manipulate the recorder middleware from a browser.

Plack::VCR

A utility module that allows you to read a recorded request stream.

These modules allow you to build PSGI applications and scripts that record and replay requests to a web application. However, Plack-Middleware-Recorder also contains two scripts, plack-record and plack-replay, that do exactly what they sound like. So my workflow for handling a server-side bug goes from this:

1. Steps 1 - 6 above
2. Add some debugging log output
3. Repeat, and observe the new output

to this:

1. plack-record > requests.out
2. Steps 1 - 6
3. Add some debugging log output
4. plack-replay requests.out app.psgi
5. GOTO 3
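
For reference, wiring the recorder middleware into an application might look something like this sketch (the output parameter name is an assumption; check the module's documentation):

```perl
# app.psgi
use Plack::Builder;

my $app = sub {
    my $env = shift;
    [ 200, [ 'Content-Type' => 'text/plain' ], [ "hello\n" ] ];
};

builder {
    # Record incoming requests to a file for later replay.
    enable 'Recorder', output => 'requests.out';
    $app;
};
```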

Plack-Middleware-Recorder is still very young; I plan on adding better session support, dumping request streams to test files, and other features in the future. In the week since I wrote it, I've already gotten a lot of mileage out of it; I hope other people find it just as useful!