For some time now Jens Rehsack (Sno), H.Merijn Brand (Tux) and I have been working on bootstrapping a large project to provide a common test suite for the DBI that can be reused by drivers to test their conformance to the DBI specification.

This post isn’t about that. This post is about two spin-off modules that might seem unrelated: Data::Tumbler and Test::WriteVariants, and the Perl QA Hackathon that saw them released.

This was my first year attending the Perl QA Hackathon, an annual event where key developers get together to discuss and develop the code, services, and standards at the core of the Perl ecosystem.

See the Results and Blogs pages to get a sense of the important work that gets done at these events and in the weeks that follow. What's less clear, but just as important, are the personal connections made and renewed there.

My focus at the hackathon was on pushing the DBI Test project forward with Sno and Tux. Getting Data::Tumbler and Test::WriteVariants polished up and released was a key part of that. We also had valuable discussions with BooK about useful enhancements to Test::Database.

So, what are Data::Tumbler and Test::WriteVariants? To explain that I’ll start 10 years ago…

The DBI distribution includes DBI::PurePerl, a fairly-complete implementation of DBI in pure-perl, and DBD::Gofer, a fairly-transparent proxy.

Both these modules need testing, and both should behave very much like using the normal DBI. The best way to test that was to re-run the DBI tests while using DBI::PurePerl, re-run them again using DBD::Gofer, and re-run them again using DBI::PurePerl and DBD::Gofer at the same time. So, since 2004, that’s what the DBI does.

When you run Makefile.PL in the DBI distribution it looks at the 44 test files and generates 141 new test files with various combinations of contexts. These generated test files look something like this:
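(The sample listing hasn't survived in this copy of the post; the sketch below is reconstructed from memory of the DBI's generated tests, so treat the details as illustrative rather than exact.)

    #!perl -w
    # Generated variant: run t/01basics.t in the
    # DBD::Gofer + DBI::PurePerl context.
    BEGIN {
        $ENV{DBI_PUREPERL}  = 2
            unless exists $ENV{DBI_PUREPERL};
        $ENV{DBI_AUTOPROXY} = 'dbi:Gofer:transport=null;policy=pedantic'
            unless exists $ENV{DBI_AUTOPROXY};
    }
    require './t/01basics.t';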

They set up a 'context' and then execute the original test. In this case the context is DBD::Gofer + DBI::PurePerl.

This arrangement has proved to be extremely effective. I've frequently made a change to the DBI and forgotten to make corresponding changes to DBD::Gofer and/or DBI::PurePerl, only to be forcefully reminded when tests that passed for plain DBI failed noisily in the extra test contexts.

It was clear that something like this was needed for the DBI Test project. We wanted to generate test variants not only for DBI::PurePerl and DBD::Gofer but also each available database driver. Each driver might also want to add test variants of their own. (DBD::DBM, for example, supports a number of DBM backends and serialization formats that all need testing in combination).

After lots of experimentation and refactoring the relevant logic was extracted out into the Data::Tumbler and Test::WriteVariants modules, generalised, polished up and released during the hackathon.

For some reason I struggle when trying to explain what Data::Tumbler is or does. The summary in the documentation says “Dynamic generation of nested combinations of variants”, which is a bit of a mouthful.

It’s basically a single simple subroutine that recurses into itself driven by the results of calling provider callbacks. As it recurses it builds up a path and a context from the keys and values returned by the providers.

The provider callbacks are passed the current path and context plus a cloned copy of a payload which they can edit. Because it’s cloned, any changes made to the payload will only be visible to any later providers and the consumer.

The recursion bottoms-out when there are no more providers. At this point a consumer callback is called with the current path, context, and payload.
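To make that concrete, here's a minimal sketch of the algorithm just described. This is illustrative code, not the actual Data::Tumbler source (the payload is assumed to be a reference):

    use Storable qw(dclone);

    sub tumble {
        my ($providers, $path, $context, $payload, $consumer) = @_;

        # Recursion bottoms out: no providers left, call the consumer.
        return $consumer->($path, $context, $payload)
            unless @$providers;

        my ($provider, @rest) = @$providers;

        # The provider sees a cloned payload it may edit; its edits
        # are visible only to later providers and the consumer.
        $payload = dclone($payload);
        my %variants = $provider->($path, $context, $payload);

        # Recurse once per variant, extending the path with the key
        # and the context with the value.
        for my $name (sort keys %variants) {
            tumble(\@rest,
                [ @$path,    $name            ],
                [ @$context, $variants{$name} ],
                $payload,
                $consumer,
            );
        }
    }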

That’s an abstract description, which is fitting as it’s an abstract algorithm. I hope it’s reasonably clear. There are a couple of examples in the documentation synopsis. Currently Test::WriteVariants, described next, is the only use-case. I’d love to find some more, if only to help improve the documentation. Let me know if you can think of any!

Test::WriteVariants directly addresses the use-case of writing a tree of perl .../*.t test files, each setting up various combinations of context values before invoking the test code.

Hopefully you can see where Data::Tumbler fits in: the payload is a hash of tests for which you’d like extra variant tests written; the providers define variants of the contexts in which you’d like the tests executed, typically by setting environment variables. The consumer writes a new *.t file for each element in the payload hash, using the path to build a directory tree, and using the context to set environment variables, etc., in each test file written.
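As a rough usage sketch (the method and argument names below follow the module's synopsis as I recall it, so check the current documentation rather than trusting the details here):

    use Test::WriteVariants;

    my $test_writer = Test::WriteVariants->new();
    $test_writer->write_test_variants(
        input_tests => {
            # the payload: test name => spec of the test code to invoke
            'core/10-basics' => { require => 't/basics.t' },
        },
        variant_providers => [
            # a provider: returns context variants keyed by name
            # (the real module wraps values in its own context objects)
            sub {
                my ($path, $context, $tests) = @_;
                return (plain => 0, pureperl => 1);   # illustrative
            },
        ],
        output_dir => 'xt/variants',
    );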

The providers can also remove tests from the payload that aren’t relevant in a given context, or add more that are only relevant to a given context.

Test::WriteVariants allows providers to be specified not just as code references but also as namespaces. In this case it uses Module::Pluggable::Object to find installed plugins within that namespace and wraps them in a code reference for Data::Tumbler. This allows extra test variants to be added by installing other modules.

Although Test::WriteVariants is new, and still evolving quite fast, it’s already proving very useful. Jens is experimenting with using it for improving the testing of List::MoreUtils, especially covering both the XS and pure-perl variants.

I hope you can see uses for Test::WriteVariants in improving the testing of your own modules. If so, please do try it out and let me know how it works out for you and if there's anything that needs improving.

Happy testing!

As soon as I saw a Flame Graph visualization I knew it would make a great addition to NYTProf. So I'm delighted that the new Devel::NYTProf version 5.00, just released, has a Flame Graph as the main feature of the index page.

In this post I’ll explain the Flame Graph visualization, the new ‘subroutine calls event stream’ that makes the Flame Graph possible, and other recent changes, including improved precision in the subroutine profiler.

Precision

Let’s start with the improved precision. That work was actually released a few months ago in Devel::NYTProf 4.23 but not announced.

Devel::NYTProf started life as a line/statement profiler, writing a stream of events, one per statement. It’s important for speed that the stream is space efficient, so statement times were expressed as integer microseconds (a ‘tick’) and written in a compressed form. Values less than 128µs use a single byte. This worked very well for v1. Back in early 2008 minimum statement times were typically just a few microseconds.
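That description matches a classic variable-length integer encoding: seven bits of value per byte, with the high bit marking a continuation. Here's a sketch of the idea (NYTProf's actual on-disk format differs in its details):

    # Encode an unsigned integer tick count, least-significant
    # 7-bit group first; a set high bit means 'more bytes follow'.
    sub encode_tick {
        my ($n) = @_;
        my $bytes = '';
        while ($n >= 0x80) {
            $bytes .= chr(($n & 0x7f) | 0x80);
            $n >>= 7;
        }
        return $bytes . chr($n);   # values < 128 fit in one byte
    }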

When I added the subroutine profiler I chose to use double precision floating point values to hold the subroutine call times with seconds as the units. I presume that seemed reasonable at the time, as microseconds (multiples of 1e-6) can be accurately stored in double precision floating point values and are significantly above the typical machine epsilon of 2.220446e-16.

I’d assumed the values weren’t at risk from the pernicious effect of cumulative round-off errors. The situation got worse with NYTProf v2 because that switched the clock ‘tick’ from 1µs to 100ns on some systems (those with POSIX realtime clock API and OS X). And then worse again when profiling of ‘slowops’ was added in NYTProf v3 since slowops are often far from slow.

The way the subroutine profiler works, calculating inclusive and exclusive times as it goes, makes it sensitive to these accumulated errors. (Sometimes a subroutine that did nothing but call a very fast subroutine many times could be reported as having taken less time than the sum of the times in the subroutine it called.)

The subroutine profiler still uses double precision floating point values to accumulate the times, but now accumulates integer ticks instead of fractional seconds.
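The original post illustrated the round-off problem with a small example. That listing isn't preserved in this copy, but judging by the $t and $i mentioned below it was along these lines:

    # Accumulating a fraction that has no exact binary representation
    # drifts away from the true total.
    my $t = 0.0;
    my $i = 3.0;
    $t += $i / 1e6 for 1 .. 1_000_000;
    printf "%.15f\n", $t;   # very close to, but not exactly, 3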

(The $t=0.0 and $i=3.0 ensure perl is using floating point values in that example. I checked it with Devel::Peek.)

Subroutine Call Events

There’s one thing the old and deeply flawed Devel::DProf profiler can do that NYTProf hasn’t been able to: the DProf dprofpp utility can generate a subroutine call tree.

NYTProf hasn’t been able to do that because its subroutine profiler worked entirely in memory, accumulating aggregate data about each call arc, but not outputting anything until the end of the profile. So all the calls on any given arc are merged together.

NYTProf v5 adds a new calls option that enables streaming of subroutine call events as they happen. With calls=2 subroutine call and return events are generated. With calls=1 (the default) only subroutine return events are generated. (A curious side effect of perl internals and the way NYTProf works means it can’t reliably know the name of the subroutine at call entry time. So the call entry event isn’t very useful at the moment.)

The call return events are sufficient to recreate a call tree, albeit with some expensive massaging of the data. NYTProf does this with the new nytprofcalls utility which reads and processes the stream of call return events. At the moment it’s undocumented, rather hackish, and only generates the call data in a collapsed form suitable for generating a flamegraph (more below). It could be extended to produce a call tree without too much work. Then, finally, the ghost of Devel::DProf can be laid to rest.

Flame Graph

Brendan Gregg developed the Flame Graph as a way to visualize very large volumes of stack traces sampled by DTrace.

It’s a wonderfully compact and information-rich way to visualize the where a program is spending its time. It’s also unusual and potentially confusing, so a little explanation is required. Keep in mind that it’s a visualization of distinct call stacks and that the colors are not meaningful.

The y-axis represents stack depth. Each box represents the time spent in a particular subroutine when called by the subroutine below it. So a particular subroutine will appear in multiple places if called via different call stacks.

The x-axis spans the time the profiler was running. It does not show the passing of time from left to right, as most graphs do. The left to right ordering has no meaning (it’s sorted alphabetically).

The width of a box shows the inclusive time that subroutine was running, or was part of the ancestry of subroutines that were running (the boxes above it). Subroutines with wider boxes may be slower than those with narrower boxes, or they may simply be called more often. The call count is not shown.

Brendan’s original flamegraph script generated an SVG that wasn’t well suited to embedding in an application like NYTProf. He’s kindly accepted a series of pull requests to add the key features I was looking for. The most important being the ability to make the boxes clickable: click on a box and you’ll be taken to the report for that subroutine!

Let’s take a closer look at a simple example using a recursive Fibonacci function:

The line at the bottom that spans the full width represents the entire profile run. In this case it was 778µs. (Hover over any block to see the time – you can see one in the image, along with the bold and bordered box it relates to).

The first line above that shows the calls to foo and bar. The line for those is shorter than the total line because the total includes the time perl spent compiling the script. It shows up clearly here because this script is so fast.

Then, above the blocks for both foo and bar, you can see the recursive calls to fib rising like flames (okay, with a little imagination). Two things to note here. Firstly bar is shown to the left of foo simply because the names at each level are in lexicographic order. There’s no deeper meaning in the ordering.

Secondly, you can easily see that bar was faster (narrower) than foo, even though they contain the same code. Why's that? When foo ran first it would have paid the price for growing the stacks and warming the memory pages. Then when bar was called it gained from foo's work.

Flame Graph Generator

Behind the scenes nytprofhtml runs nytprofcalls to generate a file in the report directory called all_stacks_by_time.calls. It then calls flamegraph.pl to read that file and generate the all_stacks_by_time.svg that’s shown in the report.

The all_stacks_by_time.calls file has a very simple format: one line per distinct call stack, with subroutine names separated by semicolons, followed by a number (which is either in 1µs or 100ns units depending on the platform). Here's an example running the code above but calling fib(2) instead of fib(8) to keep it small:
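(The actual listing is missing from this copy; the sample below shows the shape of the format with invented names and numbers.)

    main::bar 3
    main::bar;main::fib 8
    main::bar;main::fib;main::fib 6
    main::foo 4
    main::foo;main::fib 10
    main::foo;main::fib;main::fib 7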

This simple format is perfect for grep’ing! You can effectively zoom-in on any subset of the call stacks by generating a flamegraph of just the stacks that contain the functions you’re interested in. For example, running this command on the profile of perlcritic shown at the top:
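The command itself hasn't survived in this copy, but it would have been of this shape, piping a grep of the stacks straight back into flamegraph.pl (the grep pattern here is a guess at the exception-related frames mentioned below):

    grep 'StackTrace' all_stacks_by_time.calls | flamegraph.pl > zoom.svg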

You can see that a lot of time is being spent gathering stack traces for exceptions (this is with perlcritic 1.118 on perl v5.14.2).

It would be nice to have a Flame Graph generated for each of the top-N files/modules, showing just the subset of call stacks that involve any of the subroutines defined in that file. I didn’t get around to that for v5.00. Feel free to fork the code, add that in, and send me a pull request!

Minor Changes

The very old and very limited nytprofcsv utility has been deprecated. Let me know if you use it, otherwise it won't be around much longer.

The blocks option is no longer on by default – it seems that few people used the ability to view statement times rolled up at the block level. You can always enable it with blocks=1 in the options.

What Next?

For NYTProf? I don’t know.

Next up on my to-do list is giving Devel::SizeMe the love it needs. There’s some deep work I’d really like to get done before YAPC::NA in June.

Maybe I’ll see you there.

I expressed this idea recently in a tweet and then started writing it up in more detail as a comment to Brendan Byrd's The Four Major Problems with CPAN blog post. It grew in detail until I figured I should just write it up as a blog post of my own. (I fell out of the way of blogging over the two years or so of focus and distraction that our major house extension took to go from conception to reality. I've been meaning to start blogging again more regularly anyway. I've a few blog posts brewing in the back of my mind, so we'll see how it goes.)

Among the problems Brendan identified:

There is not enough data on what modules are mature; which ones are the “right ones” to use.

Many modules are only used for semi-private needs.

Modules cannot be renamed or deleted, even with a long-term deprecation process.

I’d like to propose a feature that doesn’t seem to address these issues directly but would, I believe, greatly reduce the significance of all of them.

Olaf Alders responded to Brendan's post with Sifting Through the CPAN and pointed out the need for better search tools, specifically suggesting tagging. While tagging might be helpful in general, I think we need a way to explicitly guide users from one module to another.

Suggested Alternatives

I’ve long thought that CPAN would benefit from a mechanism to track “suggested alternative modules”. (And/or perhaps “suggested alternative distributions“, but I’ll just talk about modules for now.)

I envisage a “Suggested Alternatives” section in the right sidebar on every module page. It would show the top-N suggestions, with a [++] icon beside each, ordered by the number of people who have made the suggestion or agreed to it by pressing the [++] icon. And naturally it would have a text field to enter an existing module name, with type-ahead suggestions. Finally, the Suggested Alternatives heading would be a link to a details page.

The details page would show, for that module, every instance of a suggestion being made or up-voted, with the user and the date. That would let people see who made the suggestion and when. Users would be able to remove their own suggestions.

For modules that are the suggested alternative for some other module, their page could show something like “Suggested as the alternative to X other modules by Y people” with a link to a page that would show the corresponding details.

The Alternatives Graph

This “alternatives” data creates a graph of relationships among similar modules in a powerful and directly useful way.

For search results it would be useful not only for ranking but also for widening the search. Modules that are the suggested alternatives for modules in the ‘natural’ results could be included. That’s potentially a big win.

Of course it would be perfectly reasonable for a pair of modules to have suggestions pointing to each other. Or for there to be loops of suggestions. That’s fine and simply expresses the conflicting views of the users making the suggestions.

Similar Modules (a digression)

I also had the idea that there may be value in having a ‘similar modules’ link that shows the list of modules produced by traversing the graph of suggestions for some number of hops in both directions, and ranked by some combination of votes and placement in the graph.

But then I wondered if that would be better implemented as an explicit way to suggest a ‘similar module’. In other words, generalize the idea of a “suggested alternative” into a “related module” relationship plus attributes like a “weight”, where a positive weight denotes a “suggested alternative” and a zero weight is simply a “similar module” or a “see also”. Perhaps there's also value in having a “complementary module” relationship.

This is all a bit vague. It suggests to me that any code to support a “module relationship” mechanism should be kept generic to allow for other kinds of relationships in future.

The Whys and Wherefores

The primary data of the graph is a link from one module to another with a count of the number of people who agreed with that suggestion.

That surface data is built from a deeper layer that records, for each link, which users made the suggestion and when.

A helpful extra feature would be to let users optionally give a short reason for why they are suggesting this particular alternative. Perhaps because they feel it's unmaintained, or lacks specific features that their suggested alternative has.

Suggestions without the whys would be very useful, and I'd suggest that that much is implemented first. But suggestions without explanations are also very limited. Knowing what motivated someone to suggest a particular alternative would be very helpful to others trying to pick a module for a task. For example, people might make multiple alternative suggestions, recommending Bar instead of Foo if you want a certain feature, and Baz instead of Foo if you want another.

I don’t think there’s much risk of this becoming a comment battlefield because on any given page all the comments share the same direction ‘away’ from the module. Someone with an opposing viewpoint would add a separate suggestion with their own comments on the ‘opposite’ module.

I’d suggest the comment field be kept very short, say 50 characters, and provide a separate url to encourage referencing supporting material such as a blog post or mailing list archive.

Other approaches might be to have a few checkboxes with typical reasons (very limited), or perhaps tags, or link in with cpanratings in some way (possibly complex).

Alternative Distributions

The best way to build and present Alternative Distributions data is probably to simply derive it from the Alternative Modules data.

It would simply be a read-only view that collapses the module level graph data down to links between the corresponding distributions.

Yanick Steps Up

After writing a draft of this post I saw a tweet from Yanick with a link to a specific proposal on his blog. I skimmed it, realised it was similar to mine and replied saying I'd reference it here. I decided I'd finish my post before reading it properly.

So here are my thoughts on Yanick’s suggestions:

Distributions vs Modules: Modules are the fundamental unit of use and the natural focus of attention and reviews. It’s relatively easy to derive distribution suggestions from module suggestions, but not the other way around. Using modules as the focus also means the suggestions will still be valid if a module moves from one distribution to another.

Adding notes: I agree that comments are best avoided for the initial system. I also feel strongly that their value outweighs their risks if implemented and presented carefully, so they should at least be taken into account in the initial design work.

User interface for recommending an alternative: Having a button beside the existing high-profile vote button doesn’t feel right to me. The vote button is a positive action and encouraging low-friction drive-by voting makes sense. Suggesting an alternative is a more negative action, and one to be considered more carefully. Using the sidebar seems more appropriate.

User interface for viewing suggestion alternatives: I’d rather not include any user names on the module page. It complicates the code and confuses the user experience (“which names are shown and why?” etc). The full details are available on the detail page if anyone wants to take the extra step to see them.

For a long time I've wanted to create a module that would shed light on how perl uses memory. This year I decided to do something about it.

My research and development didn’t yield much fruit in time for OSCON in July, where my talk ended up being about my research and plans. (I also tried to explain that RSS isn’t a useful measurement for this, and that malloc buffering means even total process size isn’t a very useful measurement.) I was invited to speak at YAPC::Asia in Tokyo in September and really wanted to have something worthwhile to demonstrate there.

I’m delighted to say that some frantic hacking (aka Conference Driven Development) yielded a working demo just in time and, after a little more polish, I’ve now uploaded Devel::SizeMe to CPAN.

In this post I want to introduce you to Devel::SizeMe, show some screenshots, a screencast of the talk and demo, and outline current issues and plans for future development.

For a while I thought Devel::NYTProf might be a useful framework for building some kind of “memory profiler”. Something that would measure changes in memory use over time between lines and subroutines. Nicholas Clark even created a clever experimental hack to demo the concept. Sadly the data just didn’t seem to be very useful. It turns out that knowing where memory is allocated and freed isn’t nearly as important as knowing where memory is being held.

The Plan

It was clear that some kind of ‘snapshot’ mechanism was needed. Something that would:

1. crawl all the data structures within a perl interpreter

2. have some way of naming the path to each data structure

3. stream the data out for external storage and processing

4. be fast enough that snapshots could be taken frequently

5. visualize the vast amount of data

6. compare different snapshots

Luckily the hardest part, step 1, was already covered by Devel::Size. Originally written by Dan Sugalski in 2005, then maintained by Tels and BrowserUK, it had been picked up and polished by Nicholas Clark to stay in sync with the many internal optimizations he and others were adding to the perl core. It’s not without problems, and I’ll outline those below, but it was a great base for me.

I added a callback mechanism, so my code and others could “hitch a ride” on the back of Devel::Size as it crawled the data structures, and came up with a very lightweight way to track and output the “name path”.

Textual Output

My initial code just wrote a tree-like textual representation to prove the concept:
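(The listing is missing from this copy of the post; the mock-up below is pieced together from the description that follows and only approximates the real layout.)

    SV(PVAV) node
      leaf sv_head +24 =24
      leaf sv_body +40 =64
      leaf av_max  +24 =88
      ~note: ...
      AVelem -> SV(RV) node
          ... -> SV(PVAV) node (no elements)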

There you can see the array (PVAV) ‘node’ with ‘leaf’ sizes for the sv_head (24 bytes), sv_body (40 bytes), and the array of element pointers (av_max, 24 bytes). Below that you can see a ‘link’ called AVelem pointing to a reference (RV) to an array with no elements. The “~note” lines are ‘attributes’ that can be used to provide extra information about nodes. The ‘=NNN‘ gives a running total of the accumulated size.

Graph Visualization

That detail can quickly become overwhelming for non-trivial data structures. Some kind of visualization was needed. So I added a more compact ‘raw’ output format and a script (sizeme_store.pl) to process it. The script ‘decorates’ the nodes with the leaf and attribute data, gives the links better names, and adds extra details like the total size of the children.

The SIZEME env var gives the name of the file to write the raw data to, or in this case the name of a program to pipe the data into. Here I’m asking sizeme_store.pl to write a dot format file which, when rendered by Graphviz, produces a graph like this:
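The command shown in the post is missing here; it would have looked something like this (the option names are from memory, so verify them against the sizeme_store.pl docs):

    SIZEME='|sizeme_store.pl --dot=sizeme.dot' \
        perl -MDevel::SizeMe=total_size -e 'total_size([ 1, "a string", [] ])'
    dot -Tsvg sizeme.dot > sizeme.svg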

You can see the links have been labeled with the index attribute, and the nodes show how the size is calculated (self+children=total) and the sizes accumulate up the graph.

That’s lovely, and works well for modestly sized data structures. It doesn’t scale well though. You quickly find yourself looking at diagrams like this:

Treemap Visualization

The graph visualization is rather more impressive than it is practical. A more useful visualization for this kind of data is an interactive treemap, where the size of the boxes represents the memory use and you can drill down into the data structures. To do that, and have it work on massive data dumps, I needed some kind of database and tree map code that supported on-demand loading. I opted for SQLite as the data store, the JavaScript InfoVis Toolkit for the tree map code, and Mojolicious::Lite as the web app framework.

The overall grey area, which has a title bar labeled “SV(PVAV)”, represents the total memory used by the structure. The area is divided into three parts for the three elements of the array. The smallest, labeled “[0]-> SV(IV)”, is the integer. The next larger one, labeled “[1]-> SV(PV)”, is the string. The largest area is the array reference. Because the referenced array was empty the logic in sizeme_graph.pl has ‘collapsed’ the array into the parent node to simplify the tree map. This is reflected in the label “[2]-> SV(RV) RV-> SV(AV)”.

The darker box is a tooltip that moves with the pointer and displays extra detail about whatever node the pointer hovers over. In this case it's showing that the total memory use is 88 bytes (the head and the body size of the RV and the AV have been summed up). The rest of the content is mostly debugging information. There'll be more useful info here in future.

The Whole Picture

The total_size($ref) function dumps the contents of a particular data structure. But it’s not enough to get the whole picture. For that I wanted to be able to dump everything in a perl interpreter. Executing total_size(\%main::) gets closer to everything, but it’s still a long way off.

So I added a perl_size() function. That starts by dumping the stashes (\%main::, or in internals speak PL_defstash) but then goes on to dump many more items you might never have realized existed. PL_stashcache, PL_regex_padav, PL_encoding, PL_modglobal, and PL_parser to name but a few. It then records the amount of unused space in perl’s arenas.

Finally it scans the arenas looking for any values that haven't been seen yet. Currently this finds quite a lot because the perl_size() code isn't complete yet. (Many thanks to rafl for helping improve the coverage here.) Once it's complete, any unseen values found in the arenas will be leaks. So Devel::SizeMe may turn into a useful leak detection tool.

Taking this idea further, there's also a heap_size() function. The goal here is to try to account for everything in the heap. (See my slides if you're not familiar with that term.) The one key item here is asking malloc for information about how much memory it's using and, especially, how much 'free' memory it's holding on to, for mallocs which support that.

See It In Action

This explanation is rather dry. To get a real sense of what Devel::SizeMe can do you need to see it in action with some non-trivial data. Here's a screencast of my Perl Memory Use talk at YAPC::Asia (also available as a raw mov here and here, m4v here and here, and mp4 here and here). The demonstration starts at 13:00.

Simple Usage

Devel::SizeMe notices that it’s been run as perl -d:SizeMe and arranges to automatically call perl_size() in an END block. Simple.
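For example, something like this (the output filename is illustrative):

    # perl_size() runs automatically in an END block
    SIZEME=sizeme.out perl -d:SizeMe some_script.pl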

Current Issues

There are two weaknesses in the current Devel::Size logic that affect Devel::SizeMe.

The first is that it uses a simple depth-first search. That’s fine when just calculating a total, but for Devel::SizeMe it means that chasing references held by one named item, like a subroutine, can lead to all sorts of other items, including entire stashes, appearing to be “within” the item that held the reference. The second is that Devel::Size doesn’t have well defined sense of when to stop chasing references because it doesn’t consider reference counts.

So I plan to add a multi-phase search mechanism. References with a count of 1 will be followed immediately. References with a count greater than one will be queued, along with a count of how many times the reference has been seen so far. In this way all the ‘named’ data reachable from %main:: will be found first and identified with their natural names before the queued items are crawled. This should greatly improve the output.

More coverage is needed in perl_size() to reduce the number of ‘unseen’ items that show up in the arenas, as seen in the screencast.

Future Plans

A priority is to get my changes to the core of Devel::Size integrated back in. It would be crazy to have two modules duplicating this sometimes complex and perl-version-specific logic. My goal is to have a single C file that’s used by both modules. Each would compile it with different macros to enable the required behavior. This should enable Devel::Size to suffer no performance loss for the extra logic that Devel::SizeMe has added.

I’ve already started adding some support for “named” runs. The idea is to enable the size functions to be called multiple times within a single process, and to store the data in separate tables within the database. This is an important step towards being able to compare multiple runs to see how the memory use has changed.

Lots of refactoring is needed to turn my conference-driven-dash-for-the-finish-line hacking into more robust and reusable code. In particular I'd like to get a reasonably stable and useful database schema so other people can write modules to process the data generated by Devel::SizeMe.

Further in the future I can imagine having an option to record the existence of pointers to data that’s already been seen. That information is currently discarded but would add a great deal of detail to the output. Reference loops would be much easier to see for example. It would turn the output ‘tree’ into a directed graph and enable much richer visualizations.

We’re just at the start.

Enjoy.

This page has been translated into Spanish by Maria Ramos from Webhostinghub.com/support/edu. Thank you Maria.

A key part of my plan for Upgrading from Perl 5.8 is the ability to take a perl library installed for one version of perl, and reinstall it for a different version of perl.

To do that you have to know exactly what distributions were installed in the original library. And not just which distributions, but which versions of those distributions.

I’ve a solution for that now. It turned out to be rather harder to solve than I’d thought… As I mentioned previously, I had developed a “distinctly hackish solution” that seemed to be working well. Sadly it didn’t withstand battle testing.

We have a library with almost 5000 modules installed from CPAN over many years. I ran that hackish script and it duly listed the distributions it thought were installed. Using that list I reinstalled them into a new library and ran diff -r to compare the two. That found a bunch of differences that led me into a vortex of hacking and rerunning. Eventually I had to admit that the whole approach wasn’t robust enough and I started to explore other ideas.

Some searching turned up BackPAN::Version::Discover which is meant to “Figure out exactly which dist versions you have installed”. Perfect. Sadly it simply didn’t work well for me. Probably because it’s using a similarly flawed approach to my own.

I knew brian d foy’s MyCPAN project was working towards a similar goal. His approach required us to either run a large BackPAN indexing process or paying to license the data to offset his costs for doing so. That didn’t seem attractive.

I wondered about using GitPAN and the github API to match git blob hashes of local modules with files in the gitpan repos. Sadly GitPAN has fallen out of date and isn't being maintained at the moment. With hindsight I'm thankful for that because it led me to a better solution.

MetaCPAN

MetaCPAN is full of awesome. On the surface it looks like another kind of search.cpan.org site. Don’t be fooled. Underneath is a vast repository of CPAN metadata powered by an ElasticSearch distributed database (based on Lucene). How vast? Every file in every distribution on CPAN (and, critically for me, the BackPAN archive) has been indexed in great detail. Including details like the file size and which spans of lines are code and which are pod.

The Method

The next step was to work out which of the candidate releases (all the releases on CPAN and BackPAN that could have supplied a given installed module) was the one actually installed. The key realization here was that I could use MetaCPAN to get version and file size info for all the modules in each candidate release and see how well they matched what was currently installed.

The whole process falls into several distinct phases…

The first phase finds the name, version, and file size of all the modules in the library being surveyed. (Taking care to handle an archlib nested within the main lib.)

Then, for every module, it asks MetaCPAN for all the distribution releases that included that module version. For rarely changed modules in frequently released distributions there might be many candidates, so it tries to limit the number of candidates by also matching the file size. This is especially helpful for modules that don't have a version number.

Then, for every candidate distribution release, MetaCPAN is queried to get the modules in the release, along with their version numbers and file sizes. These are compared to the data it gathered about the locally installed modules to yield a “fraction installed” figure between 0 and 1. The candidates that share the highest fraction installed are returned.
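In code terms the scoring is roughly this (an illustrative sketch, not the surveyor's actual code; module_matches_installed() is a hypothetical helper comparing name, version, and file size):

    my $matched = grep { module_matches_installed($_) } @release_modules;
    my $fraction_installed = $matched / @release_modules;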

Typically there’s just one candidate that has fraction installed of 1. A perfect match. Sometimes the fraction is less than 1 for various obscure but valid reasons. Sometimes life isn’t so simple. There may be multiple candidates that have the same highest fraction installed value. So the next phase attempts to narrow the choice from among the “best candidates” for each module. The results are gathered into a two level hash of distributions and candidate releases.

The final phase is the first to work in terms of distributions instead of modules. For each distribution it tries to choose among the candidate releases.

The Results

The method seems to work well. It identifies files with local changes. It deals gracefully with ‘remnant’ modules that were included in an old release but not in later ones. And it copes with distributions that have been split into separate distributions.

It reports progress and anything unusual to stderr and writes the list of distributions to stdout. You should investigate anything that’s reported to ensure that the chosen distribution is the right one.

I checked the results by creating a new library (see below) and running diff -r old_lib new_lib. I didn’t see any differences that I couldn’t account for.

The survey process is not fast. It can take a couple of hours on the first run for a large library. Most of that time is spent making MetaCPAN calls (lots and lots of MetaCPAN calls) so you’re dependent on network and MetaCPAN performance. Most of the calls are cached in an external file so later runs are much faster.

Using The Results

Using a list of distributions to recreate a library isn't as straight-forward as it might seem. You can't just give the list to cpanm because it would try to install the latest version of any prerequisites. I looked at using --scandeps or topological sorting to reorder the list to put the prerequisites first. It didn't work out. I also looked at using CPAN::Mini::Inject (and OrePAN and Pinto) to create a local MiniCPAN for cpanm to fetch from. They didn't work out either, for various reasons.

In the end I added a --makecpan dir option so that the surveyor script itself would fetch the distributions and create a MiniCPAN for cpanm to use.

Testing Bonus

While reinstalling, I was surprised to see how often tests failed due to problems with prerequisites, even though the distribution and its prerequisites had passed their tests when originally installed. For example, imagine distribution A v1 and its prerequisite B v1 are installed. Later, distribution B gets upgraded to v2 but the tests for distribution A don't get rerun.

Reinstalling all the distributions forces all distributions to be tested with the prerequisites that are actually being used.

Imagine the following scenario: you have a production system, with many different kinds of application services running on many servers, all using the perl 5.8.8 supplied by the system.

You want to upgrade to use perl 5.14.1

You don’t want to change the system perl.

You’re using CPAN modules that are slightly out of date but you can’t upgrade them because newer versions have dependencies that require perl 5.10.

The perl application codebase is large and has poor test coverage.

You want developers to be able to easily test their code with different versions of perl.

You don’t want a risky all-at-once “big bang” upgrade. Individual production installations should be able to use different perl versions, even if only for a few days, and to switch back and forth easily.

You want to simplify future perl upgrades.

I imagine there are lots of people in similar situations.

In this post I want to explore how I’m tackling a similar problem, both for my own benefit and in the hope it’ll be useful to others.

Incremental Upgrades

Perl now has an explicit deprecation policy that requires a mandatory warning for at least one major perl version before a feature is removed. So a feature that’s removed in perl 5.14 will generate a mandatory warning, at compile time if possible, in perl 5.12.

This means we should not jump straight from perl 5.8.8 to 5.14.1. It’s important to test our code with the latest 5.10.x and 5.12.x releases along the way. That way if we do hit a problem it’ll be easier to determine the cause.

This also fits in with our desire to simplify future upgrades. Effectively we’re not doing one perl version upgrade but three, although we may only do one or two actual upgrades on production machines.

Multiple Perls

We want the developers to be able to easily test their code with different versions of perl, so we need to allow multiple versions to be installed at the same time. Fortunately perlbrew makes that easy.
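For example (version numbers as of the time of writing):

    perlbrew install perl-5.10.1
    perlbrew install perl-5.12.4
    perlbrew install perl-5.14.1
    perlbrew list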

We’ll probably have the systems team install ready-built and read-only perlbrew perls on all the machines via scp. We’ll use perlbrew as a way to get a set of perls installed but the actual selecting of a perl via PATH etc. we’ll handle ourselves.

Multiple CPAN Install Trees

Major versions of perl aren’t binary compatible with each other. This means extension modules, like DBI, which were installed for one major version of perl can’t be reused with another.

We keep all the code installed from CPAN in a repository, separate from the perl installation. Perl finds them using the PERL5LIB env var, and installers install there using the PERL_MB_OPT and PERL_MM_OPT env vars to set it as the 'install_base'.
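Concretely, that means settings along these lines for each major perl version (paths illustrative):

    export PERL5LIB=/opt/cpan-5.14/lib/perl5
    export PERL_MB_OPT="--install_base /opt/cpan-5.14"
    export PERL_MM_OPT="INSTALL_BASE=/opt/cpan-5.14"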

Since we want developers to switch easily between perl versions, this means we need multiple CPAN installation directories, one per major perl version. We’ll rebuild and reinstall the extension modules into each immediately after building and installing the corresponding perl version.

If we have to rebuild and reinstall the extension modules then we can easily rebuild and reinstall all our CPAN modules. That way we get to rerun all their test suites against each version of perl plus the specific versions of their prerequisite modules that we’re using.

Reinstalling CPAN Distributions

This is where it gets tricky.

Identifying what CPAN distributions we have installed is fairly easy. You can use tools like CPAN.pm or whatdists.pl to generate a list. But there’s a catch. They’ll only tell you what current distributions you need to install to get the same set of modules. That’s not what we need.

We need a list of the specific distribution versions that are currently installed. It turns out that that information isn’t recorded in the installation and it’s amazingly difficult to recreate reliably. (The perllocal.pod file ought to have this information but isn’t updated by the Module::Build installer and doesn’t record the actual distribution name.)

In an extension of his MyCPAN work, brian d foy is trying to tackle this problem by creating MD5 hashes for the millions of files on BackPAN (the CPAN archive) but there’s still much hard work ahead.

Why do we need the specific versions, why not simply upgrade everything to the latest version first as a separate project? Two reasons.

First, we’re caught by the fact that some latest distributions, either directly or indirectly, require a later version of perl. (David Cantrel’s cpxxxan project offers an interesting approach to this problem. E.g., use http://cp5.8.8an.barnyard.co.uk/ as the CPAN mirror to get a “latest that works on 5.8.8″ view. [Thanks to ribasushi++ for the reminder.])

Second, having a complete list of exactly what we have installed also gives us easy reproducibility. Future installs will always yield exactly the same set of files, without risk of silent changes due to new releases on CPAN. The cpxxxan indices for older perls are much less likely to change, but still may. Also, if we upgraded everything to the latest using cp5.8.8an we’d need an extra testing cycle to check for problems with that upgrade before we even start on the perl upgrade.

After contemplating the large, ambitious, and incomplete MyCPAN project, I decided I’d try a distinctly hackish solution to this problem by extending the whatdists.pl script with a perllocal.pod parser and some heuristics. It seems to have worked out well. I’m going to check it by installing the distributions into a different directory and diff’ing that against the original.

If that works out I’ll release the code and write up a blog post about it.

Installing Only Specific CPAN Distributions

Normally when you install a distribution from CPAN you’re happy for the installer to fetch and install the latest version of any prerequisite modules it might need. In our situation we want to install only a specific version of each.

In theory we could arrange that by ordering the list such that the prerequisite modules are installed first. The CPANDB module combined with a topological sort of the requires, test_requires, and build_requires dependencies via the Graph module should do the trick. [Hat tip to ribasushi++ for the CPANDB suggestion.] But there’s a simpler approach…

I’ll probably simply duck that issue by using CPAN::Mini::Inject to create a miniature CPAN that contains only the specific versions of the specific distributions we’re using. Then we can use the cpanm –mirror and –mirror-only options to install from that mini CPAN.

Extending Test Coverage

All the above will give developers the ability to switch perl versions with ease, while keeping exactly the same set of CPAN modules. So now we can turn our attention to testing.

Our test coverage could charitably be described as spotty. Getting it up to a good level across all our code is simply not viable in the short term.

So for now I’m setting a very low goal: simply get all the perl modules and scripts compiled. You could say I’m aiming for 100% “compilation coverage” :-)

This will get all the developers aware of the basic mechanics of testing, like Test::Most and prove, and it gives us a good baseline to increase coverage from. More importantly in the short term, it lets us detect any compile-time deprecation warnings as we test with perl 5.10 and 5.12.

To ensure 100% (compilation) coverage I’ll use Devel::Cover to do coverage analysis and write a utility, probably using Devel::Cover::Covered, to find all our perl scripts and modules and check that they have all at least been compiled.
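A typical Devel::Cover run looks something like this (illustrative):

    HARNESS_PERL_SWITCHES=-MDevel::Cover prove -r t/
    cover    # writes the coverage report from the collected data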

Summary

Multiple perl versions, via perlbrew.

Multiple identical CPAN install trees, one per major perl version.

Proven 100% compilation coverage as a minimum.

So, that’s the plan.

The company I work for, TigerLead.com, has another job opening in West LA:

As a Senior Developer, you will be playing a central role in the design, development, and delivery of cutting-edge web applications for one of the most heavily-trafficked networks of real estate sites on the web. You will work in a small, collaborative environment with other seasoned pros and with the direct support of the company’s owners and senior management. Your canvas and raw materials include rich data sets totaling several million property listings replenished daily by hundreds of external data feeds. This valuable data and our powerful end-user tools to access it are deployed across several thousand real estate search sites used by more than a million home-buyer leads and growing by 50K+ users each month. The 1M+ leads using our search tools are in turn tracked and cultivated by the several thousand real estate professionals using our management software. This is an outstanding opportunity to see your creations immediately embraced by a large community of users as you work within a creative and supportive environment that is both professional and non-bureaucratic at the same time, offering the positives of a start-up culture without the drama and instability.

If that sounds like interesting work to you then take a look at the full job posting.

TigerLead is a lovely company to work for and this is a great opportunity. Highly recommended.

In this post I'm going to talk about the java2perl6api project. What its goals are, why I think it's important, how it relates to a Perl 6 DBI, what exists now, what needs doing, and how you can help.

Firstly I’d like to point out that, funnily enough, I’m not very familiar with Java or Perl6. It’s entirely possible that I’ll make all sorts of errors in the following details. If you spot any do please let me know.

Background

The Java language ecosystem is big and mature after years of heavy investment of time and money.

It doesn’t have a central repository of Open Source modules like CPAN (though Maven repositories likethese are similar I guess). It does, however, have a number of mature high quality class libraries, and a very large number of developers familiar with those libraries (more on that below).

Goals

The primary goal of the java2perl6api project is to make it easy to create Perl 6 class libraries that mirror Java equivalents. By mirror I mean share the same method names and semantics at a high level (though not at a low-level, more on that below).

Secondary goals are to do that well enough that:

the documentation for Java classes can serve as the primary documentation for the corresponding Perl 6 classes. The Perl 6 classes need only document the differences in behavior, which should be minimal and ‘natural’. The same applies to books describing the Java classes.

Java developers familiar with the Java classes should feel comfortable working with the corresponding Perl 6 classes.

and, hopefully, some way can be found to convert test suites for the Java classes into Perl 6 code that’ll test the corresponding Perl 6 classes. (I appreciate that this is a non-trivial proposition, but there are viable approaches available, like xmlvm.) Even if that can’t be done, extracting and translating tests manually is less work, and more effective, than creating them from scratch for a new API.

Why?

Firstly, creating good APIs is hard. Java APIs like JDBC 3.0 and NIO.2 are the result of years of professional effort and demanding commercial experience. Why not build on that experience?

I appreciate that Java APIs are often limited by the constraints of the language, such as the lack of closures, and that Perl 6 can probably express any given set of semantics more effectively than Java. My point here is that some Java APIs embody, however inelegantly, years of hard won experience that we can benefit from. I’d rather make new mistakes than repeat old ones.

Secondly, there are many more Java developers than Perl developers. Many, many more, if job vacancies are any indication.

I think we’d be foolish not to try to smooth the path for any Java developers who might be interested in Perl 6. The java2perl6api project is just one small aspect of that.

I really hope someone starts writing a “Perl 6 for Java Developers” tutorial. Perl 6 has the potential to become a very popular language. Getting just a tiny percentage of Java developers (and Computer Science majors and their teachers) interested in it could be a big help.

Thirdly, any future DBI for Perl 6 and Parrot needs a much better foundation than the very limited and poorly defined one that underlies the Perl 5 DBI. I plan to adopt the JDBC 3.0 API and test suite for that internal role. (You could call this a “Test Suite Driven Strategy”.) I’ll talk more about that in a future blog post.

The History of java2perl6api

I’ve been kicking around various ideas for integrating Java and Perl6/Parrot for years. I think I first decided to use JDBC as the inspiration for the DBI-to-driver API in 2006.

You may remember back in 2004, around the 10th anniversary of the DBI, the Perl Foundation set up a “DBI Development Fund” that people could donate to. I've never drawn any money from that fund. I want to use it to oil other people's wheels.

In 2007 Best Practical sponsored Perl 6 Microgrants through the Perl Foundation. I asked if I could piggyback my idea for a Java to Perl 6 API translator onto their microgrant management process but using money from the DBI Development Fund. TPF and Best Practical kindly agreed. I posted a description of the task and Phil Crow volunteered and was awarded the microgrant in April 2007.

Development ground to a halt around the end of 2007 for various reasons. It picked up again for a few months after OSCON 2009 (where I gave a short lightning talk asking for help) then stalled again in October. Partly because we seemed to have hit a limitation with Rakudo and partly because I was focussed on Devel::NYTProf version 3 and then version 4, which took way more time than I expected.

There’s life in the project again now. We’ve dodged the earlier problem, put the code on github, brought it into sync with current Rakudo Perl 6 syntax, and generally instilled some momentum.

The Current java2perl6api

Let’s take a look at a simple example.

To generate a perl6 file that mirrors the API of the java.sql.Savepoint class you’d just execute java2perl6api like this:
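That is (reconstructed; the real tool's output differed in its details):

    java2perl6api java.sql.Savepoint

which generates java/sql/Savepoint.pm6, roughly:

    =begin pod
    ... description of the class as reported by javap ...
    =end pod

    role java::sql::Savepoint {
        method getSavepointId(   --> Int ) { ... }
        method getSavepointName( --> Str ) { ... }
    }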

The pod section shows the description of the class that javap returned. The java2perl6api utility parsed that Java interface and generated the corresponding Perl6 role. The ‘java.sql.Savepoint’ has been mapped to ‘java::sql::Savepoint’. The generated methods are stubs using ... (the “yada, yada, yada” operator). The types int and java.lang.String have been mapped to Int and Str. Because the only types used were built-ins, no type declarations were added.

The default behavior is to recursively process any Java types referenced by the class which aren't mapped to Perl 6 types. So executing java2perl6api java.sql.Connection, for example, will generate 48 Perl 6 modules! (Because java.sql.Connection refers to many types, including java.sql.Array which refers to many types including java.sql.ResultSet which refers to java.net.URL which refers to java.net.Proxy etc. etc.) The --norecurse option disables this behavior.

Normally you’ll want to use the recursion but instead of letting it drill all the way into the Java types, you would supply your own ‘typemap’ specification via an option. That tells java2perl6api which Java types you want to map to which Perl 6 types. So instead of recursing into the java.net.URL type to generate a java/net/URL.pm6 file, for example, you can tell java2perl6api to use a specific Perl 6 type. Perhaps just Str for now.

How this relates to JDBC / DBDI / DBI v2

I want to start applying java2perl6api to the JDBC classes now to create a “Database Driver Interface” or “DBDI” for Perl 6.

Starting with the DriverManager class and the Connection interface I’ll use java2perl6api to generate corresponding Perl 6 roles with heavy stubbing out of types. Basically anything I don’t need to think about right now will be mapped to the Any type.

I’ll start fleshing out some basic implementation logic for each in a Perl 6 class that does the corresponding role. I’ll probably use PostgreSQL as the first driver and the guts of MiniDBD::Pg as inspiration.

The first minor milestones will be creating connections, then executing non-selects, then selects, then prepared statements. Somewhere along the way I expect there'll be a Perl 6 DBDI driver implemented for the Perl 6 MiniDBI project. The next key step would be to start refactoring the code heavily so anyone wanting to implement a new driver should only have to implement the driver-specific parts. (There are some JDBC driver toolkits that can provide useful ideas for that.)

What needs doing

One fairly simple item is to add a --prefix option to specify an extra leading name for the generated role. So java.sql.Savepoint with a prefix of DBDI would generate a DBDI::java::sql::Savepoint role.

Another item, less simple but more important, is to automatically discover the values of constants and embed them into the generated file. Probably the best way to do that is to extend the parser (which uses Parse::RecDescent) to parse the verbose-mode output of javap, which includes those details.

Perl has three regular expression match variables ($&, $`, and $') which hold the string that the last regular expression matched, the string before the match, and the string after the match, respectively.

As you’re probably aware, the mere presence of any of these variables, anywhere in the code, even if never accessed, will slow down all regular expression matches in the entire program. (See the WARNING at the end of the Capture Buffers section of the perlre documentation for more information.)

Clearly this is not good.

I’ve long planned to add detection and reporting of this to Devel::NYTProf, along with things like method cache invalidation, but it’s never risen to the top of the list. In fact, now I look, I see it never even got entered into the ever-growing collection of ideas recorded in the HACKING file.

After the 4.00 release, plus a few minor releases, I'd put NYTProf on hold and was starting to focus on my java2perl6 API translation project (more news on that soon).

Someone with a performance problem is likely to use a profiler like NYTProf to see where time is being spent in their code. That might point out that significant time is being spent in regular expressions, but even then they might not make the leap to consider these special match variables as a possible cause. The profiler should point it out to them!

NYTProf version 4.03 didn’t. Clearly that was not good. So NYTProf version 4.04 now does!

In the list of files on the index page it highlights the file and adds a comment:

On the report page for the file itself it adds an unmissable, and hopefully self-explanatory, note to the top of the page:

I’d be very interested to hear from anyone who now discovers these problem variables lurking in their application code or any CPAN modules.

Go take a look!

Filed under: perl Tagged: nytprof, perl

The name Buzz Moschetti probably isn’t familiar to you. Buzz was the author of Interperl, the Perl 4 database interface for Interbase.

Buzz was one of the four people to get the email on September 29th 1992 from Ted Lemon that started the perldb-interest project, which defined a specification that ultimately led to the DBI. (The other people were Kurt Andersen re informix, Kevin Stock re oraperl, and Michael Peppler re sybperl. I joined a few days later.)

Update: It turns out that it was actually Buzz who sent that original email, Ted just forwarded it on to others, including me. So Buzz can be said to have started the process that led to the DBI!

I hadn’t heard from Buzz for many years until he sent me an email recently.

This is his story:

Thought I’d share a quick story with you.

Recently, I was frustrated with a development team’s efforts in putting
together some DB-oriented reconciliations. The candidate solution was a
blend of precompiled SQL in COBOL code, file dumps and ftps, programs to
read files, more programs to read other DBs, etc. etc. Not only was
the process orchestration a project in its own right, the end-to-end
logic required to accurately perform the reconciliation was distributed
across several programs and platforms, diluting the knowledge base. I
knew a perl program using multiple DBD drivers to different DB engines
could do it in a much cleaner way, but over the years my job has changed
and although I still use perl regularly, I don’t do much in the way of
DBD/DBI. To make matters worse, one of the targets was mainframe DB2
and very little work had been done here with DBD::DB2. Also, the
Sybperl module continues to be heavily used in addition to DBD::Sybase,
so local DBD/DBI expertise in general is thin. I decided to get it
working on my own.

The infrastructure team spun up for me a Linux virtual machine with a
modern build environment on it. This had the latest gcc compilers and a
firm-approved build of perl 5.8.5 right out of the box. It took a few
days of low-priority requests to get the appropriate 32bit Linux
client-side SDKs for the DB2 and Sybase products but soon enough I had
an environment set up with headers and shared libs. I was ready to
build some perl modules, something I haven’t done in years.

I went to CPAN and downloaded DBD::DB2, untar’d it, and ran perl
Makefile.PL and make. Everything worked perfectly and the whole
exercise took minutes. ‘make test’ sets PERL_DL_NONLAZY and warned of
some unused symbols not being found, but that was OK. The rest of the
tests that I expected to work with my level of permissions worked fine.
‘make install’ worked perfectly. Buoyed by this success, I wrote a
4-liner test program just to connect and fetch some data from a table I
knew about. Outside of the test environment, however, the shared libs
for DB2 were not found so I cheated and relinked and reinstalled DB2.so
with the -Wl,-rpath option to “cement in” the location of those libs so
I wouldn’t have to fuss with LD_LIBRARY_PATH. My test program now
worked fine. Newly comfortable with the process, I downloaded
DBD::Sybase and built and installed the module in scarcely more time than
it took for the compiler to run. In my excitement I skipped over the
DBD::Sybase 4-liner test program and went straight to a slightly bigger
script that used both modules and grabbed data from both DBs. It
quietly and quickly executed.

Total time from initial download with almost no clues to a running
example: about 40 minutes. Later, for grin’s sake, I threw in
DBD::Oracle for good measure. That went even faster — about 5 minutes
— from CPAN download to printing “Oracle connected!” because I was more
familiar with the connection string syntax that is bespoke for each
engine.

As I watched the program run, it made me reflect on how far we’ve come
and how easy yet sophisticated the perl module ecosystem has become.
There is no question that this multi-DBD perl program is easier to
understand and support than a solution involving a set of disconnected
programs, platforms, and files. But I think it is the organization and
design of the resources as a whole — DBI, DBD, CPAN, MakeMaker, pod,
binary and non-binary library locations, etc. — that makes the whole
environment so clear, symmetric, and easy to use with confidence. I
think back to the build environment that I used to create interperl, and
the progress that has been made in terms of both breadth of module
functionality and depth of framework for module build portability is
simply amazing. Perl has grown far beyond just being another language.
It has a value proposition as an able integrator of widely disparate
functionality.

I exited the Perl mainstream some time ago but I am watching from the
side and I applaud the work you’ve done in this space.

Take care.

Thanks Buzz!

Filed under: perl Tagged: cpan, dbi, perl, perl4

I’ve recently started looking into geocoding in perl. We’re currently using some old hand-coded logic to query the Yahoo Search API. I wanted to switch to Geo::Coder::Yahoo but I noticed that it depended on Yahoo::Search, which hadn’t been updated since March 2007 and had accumulated a number of bug reports (which may well be closed by the time you read this).

Several related to the fact that Yahoo::Search didn’t handle Unicode properly when using its default internal XML parser (instead of the optional XML::Simple which does the right thing, but slowly).

What happened next makes a nice little example of getting things done in the Open Source world…
I emailed Jeffrey Friedl, the author of Yahoo::Search, to enquire about the status of the module and to say I might be willing to help out with maintenance. He replied promptly saying he’d be delighted if someone could take it over.

I also changed the distribution from using a traditional Makefile.PL over to the awesome dzil tool to reduce friction and simplify future releases.

Then I changed the permissions on PAUSE to enable me to make releases and eventually, when that propagates to RT, be able to maintain the bug tracker and close the tickets.

And finally for today, I’ve uploaded a development release to CPAN for cpantesters to smoke test. I’ll close the RT tickets in due course.

Once I’ve made a final release in a few days I’ll have shaved my Yahoo::Search yak. After that I’ve no real interest in maintaining it on an ongoing basis.

Could you be interested in looking after a freshly shaven and sweet-smelling Yahoo::Search? Available free of charge to a good home!

Filed under: perl Tagged: geo, perl, xml, yahoo

Where I’m working at the moment we’re using the Yahoo Geocoding API but aren’t very happy with it. I’ve been asked to look into how we can improve our geocoding.

Geocoding services vary greatly in accuracy, precision, availability, throughput capping, and other attributes. So it can help to try multiple services until you get sufficient confidence in the result.

Firstly I’m interested in your experiences with geocoding services. Which you’ve tried, and which you’d recommend (for geocoding for US addresses). What problems you’ve encountered and any advice you’d like to pass on.

Secondly I’m interested in your thoughts on working with multiple services.

Geo::Coder::Multiple looks interesting but quite limited. For example, it’ll accept the first valid response even if it’s of low precision. There’s also no provision for checking multiple results to derive some measure of confidence, for “knowing when to stop”.

Some feature ideas:

Ordered list of geocoders

Auto rate limiting, by detecting over-limit responses and disabling that service for a period, perhaps with exponential back-off.

Result-picking callback to pick the best result from those collected so far. It could tell whether there were more services left to try, and return undef to mean “keep going” (see the sketch below).

Some pre-defined result-picking callbacks for common use cases.
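To make the result-picking idea concrete, here’s a purely hypothetical sketch; the module name, constructor arguments, result keys, and callback signature are all invented for illustration:

my $geo = Geo::Coder::Cascade->new(
    coders => [ $yahoo, $google ],      # ordered list of geocoder objects
    picker => sub {
        my ($results, $more_to_try) = @_;
        for my $r (@$results) {         # accept a high-precision result
            return $r if $r->{precision} eq 'address';
        }
        return undef if $more_to_try;   # undef means "keep going"
        return $results->[0];           # best-effort fallback
    },
);
my $location = $geo->geocode('1600 Pennsylvania Ave, Washington DC');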

Any thoughts on those?

What kind of features would you like to see?

Want to help build this?

Filed under: perl Tagged: geo, perl

I released Devel::NYTProf v3 on Christmas Eve 2009. Over the next couple of months a few more features were added. The v3 work had involved a complete rewrite of the subroutine profiler and heavy work on much else besides. At that point I felt I’d done enough with NYTProf for now and it was time to focus on other more pressing projects.

Over those months I’d also started working on enhancements for PostgreSQL PL/Perl. That project turned into something of an epic adventure with more than its fair share of highs and lows and twists and turns. The dust is only just settling now. I would have blogged about it but security issues arose that led the PostgreSQL team to consider removing the plperl language entirely. Fortunately I was able to help avoid that by removing Safe.pm entirely! At some point I hope to write a blog post worthy of the journey. Meanwhile, if you’re using PostgreSQL, you really do want to upgrade to the latest point-release.

One of my goals in enhancing PostgreSQL PL/Perl was to improve the integration with NYTProf. I wanted to be able to profile PL/Perl code embedded in the database server. With PostgreSQL 8.4 I could get the profiler to run, with some hackery, but in the report the subroutines were all __ANON__ and you couldn’t see the source code, so there were no statement timings. It was useless.

The key problem was that Devel::NYTProf couldn’t see into string evals properly. To fix that I had to go back spelunking deep in the NYTProf guts again; mostly in the data model and report generation code. With NYTProf v4, string evals are now treated as files, mostly, and a whole new level of insight is opened up!

In the rest of this post I’ll be describing this and other new features.

Seeing Into String Evals

Let’s start by taking a look at a small example:

perl -d:NYTProf -e 'eval("sub { $_ }")->() for 1,2,2'

That executes three string evals, each of which defines an anonymous subroutine which is then executed. Two of the three evals have identical source code.

With NYTProf 3.11 the report for the “-e file” looked like this:

Two key things to note: Firstly, there’s no link to drill down to see the actual source code executed by the eval. (In this example we can see the source, but that’s rare in practice.) Secondly, the three anonymous subroutines have been merged. You can’t see individual details like call counts, callers, or timings.

(In case you’re wondering, the main::BEGIN subroutine is defined as a side effect of loading NYTProf, and the main::RUNTIME subroutine is a dummy created by NYTProf to act as the ‘root caller’. You’ll see it appear as the caller of the anonymous subs in a later screenshot.)

With NYTProf 4.00 the same report looks like this:

Now you can see much more detail right there. The two evals with identical source code have been merged, as have the identical anonymous subroutines defined by them. The eval and anonymous sub with different source code have been kept separate. What you can’t easily see from the image is that the “string eval” texts in the grey annotation are links. This is where it gets more interesting…

Clicking on the “2 string evals (merged)” link takes us to a typical NYTProf report page showing the performance-annotated source code executed by the eval:

What you’re looking at here is source code that never existed as a file. (That second line containing a semicolon was added by perl as part of the implementation of eval.)

In the table at the top, you’ll see “Eval Invoked At” with a link that’ll take you to the eval statement that executed this source code. You’ll also see a “Sibling evals” row. That’s added in cases where an eval was executed multiple times and not all were merged into a single report. Finally, because this particular eval includes data merged from others, the report includes a clear banner alerting you to how many evals were merged to produce this report page.

Update: You’ll need a recent version of perl (5.8.9+, 5.10.1+, or 5.12+) to see the eval source code.

So why does NYTProf go to all the effort of merging evals and anonymous subs? Here’s a real-world example using a profile of perlcritic:

Without merging, those 4196 evals would have produced 4196 report pages, their sheer volume making them almost useless. Now it’s clear from the report that only a few distinct source code strings are used, and the reports for each are far more useful.

In that example perlcritic is compiling lots of tiny snippets of code. Many applications use string eval to compile large quantities of code. The ORLite module is one example. It dynamically generates and compiles a large chunk of code with many subroutines that implement a customized interface for a specific SQLite database file.

With NYTProf 3.11 you couldn’t see the hundreds of lines of source code, or the per-caller subroutine performance, or the individual statement performance. All you could see was a list of subs calls and the overall time spent in each:

I see a SQLite in the Distance

Getting all this working correctly, especially the data model manipulations required for merging evals and anonymous subroutines, was far more painful than I’d anticipated. I both blessed and cursed the test suite on numerous occasions!

I think it would be wise for the NYTProf reporting and data model code to read the data from, and manipulate the data in, an SQLite database. That would yield simpler, more maintainable code. It would also enable nytprofhtml to be used for presenting performance data from other sources, including perl6.

If you’re interested in working on this, starting with a utility to load an nytprof.out file into a SQLite database, please contact the mailing list.

String Eval Timings

There’s a slight caveat worth noting about the timings shown for string evals that define subroutines.

Timings for string evals are taken from the statement profiler (the subroutine profiler doesn’t pay attention to evals). So the “time spent executing the eval” is the sum of the time spent executing any statements within the eval.

That’s fine for evals that don’t define subroutines. For those that do, the time for the eval includes not only the time spent executing the eval itself but also time spent executing statements in subroutines defined within the eval but called later from outside it.
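A small example of the effect:

# The statement profiler attributes the loop below to the eval, even
# though the loop only runs when the sub is called after the eval returns.
my $sum = eval 'sub { my $t = 0; $t += $_ for @_; $t }';
my $total = $sum->(1 .. 100_000);   # this time is charged to the eval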

Hence the careful wording you can see in the example from perlcritic shown previously:

I could have automatically subtracted the subroutine time from the eval time but I was wary about doing that (for some reason that currently escapes me). Maybe that’ll change in the near future. Further out, a future version of NYTProf might use the subroutine profiler logic to time evals. That’s a deeper change that would give a more natural view of the timings.

Other Changes in v4

Subroutines that couldn’t be associated with a perl source file, such as xsubs in packages with no perl source, used to not appear in reports at all. So associated caller and timing information couldn’t be seen. Now it can.

Similarly, subroutine calls that couldn’t be associated with a specific line, such as calls made by perl to END blocks, are now shown in reports. They appear as calls from line 0.

NYTProf v3 added renaming of BEGIN subs, so a BEGIN (or use) on line 3 would be called BEGIN@3 and so kept distinct from others in the same package. NYTProf v4 takes that further by detecting the rare cases where the modified name isn’t unique and adding a sequence number to it, like BEGIN@3.59.
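For example, each of these lines now gets its own distinct name in the reports:

use strict;             # reported as main::BEGIN@1
use warnings;           # reported as main::BEGIN@2
BEGIN { print "hi\n" }  # reported as main::BEGIN@3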

NYTProf v2 added the savesrc option to enable storing a copy of the profiled perl source code into the profile data file itself. This makes report generation immune from later changes to the source files. NYTProf v4 now enables that option by default.

The report generator used to only generate report lines up to the maximum number of source lines present. If there was no source code available, for whatever reason, you’d get an empty report for that file, even though there was useful information to report. Now the report generates enough lines to ensure all available profile information gets included. This is especially useful for old perl versions where source code is more likely to be unavailable. Also, the report generator now collapses groups of three or more blank lines.

Nicholas Clark contributed changes to refine the timing of the beginning and end of profiling. Now END blocks defined at runtime are included in the profile and compilation-only checks (e.g., perl -c) can also be profiled.

You may be aware that the POSIX::_exit function exits the process immediately, without flushing stdio buffers and without giving perl a chance to clean up. That means NYTProf didn’t get a chance to finish and the profile wasn’t usable. NYTProf v4 now intercepts calls to POSIX::_exit and cleans up properly.

Finally, tired of waiting for nytprofhtml to produce a report from a long profile run? The new --minimal (-m) option makes nytprofhtml skip building reports for the rarely used ‘blocks’ and ‘subs’ levels of detail and skips generating the graphviz .dot files. That saves a lot of time.
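For example (nytprofhtml reads the default nytprof.out file when no file is specified):

nytprofhtml --minimal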

The Ann Arbor Perl Mongers group was being restarted (after a 10 year gap) by the TigerLead tech team. I’m working for TigerLead and was going to be in Ann Arbor for a meeting so they asked me to give a couple of talks: Devel::NYTProf and Perl Myths.

I like giving talks at events like these because there’s no set time limit and the audience is more relaxed (the free pizza probably helped).

I’ve uploaded a screencast of the Perl Myths talk. As usual it covers the Perl jobs market, CPAN, best practices, power tools, community and perl6. At almost 1 hour 20 minutes it’s significantly longer than my usual, more rushed, 40 minute version given at conferences and includes 15 minutes of Q & A at the end.

Filed under: perl Tagged: modern perl, myths, perl

I’ve just been updating the page where I keep links to my presentations and noticed that, not only had I not updated the section for the 2009 Italian Perl Workshop, but I hadn’t even uploaded the screencasts I’d made.

So, with apologies for the delay, here’s my entry for IPW09, with the links to the uploaded screencasts:

Since then a few point releases have accumulated some changes and features worth mentioning:

Jan Dubois contributed portability fixes for Windows and 64bit configurations. NYTProf should now run well on most, if not all, Windows configurations with recent versions of perl.

Markus Peter contributed a sub-microsecond timer for Mac OS X. It yields the same 100ns resolution used on systems with POSIX realtime clocks.

Nicholas Clark has contributed a huge amount of work recently, including many optimizations and a major refactoring of I/O.

Nicholas’s nytprofmerge utility is now significantly faster as a result of those changes. He’s also fixed a bunch of edge cases. If you’re generating multiple profile data files and would like to merge them into a single report, nytprofmerge is now a very effective tool.

I fixed the usecputime=1 option as it was broken in several ways. It’s still of limited value and the docs now explain that more clearly.

I’ve also added a section to the docs explaining how to make NYTProf faster, for those rare cases where the performance impact of profiling is a problem.

Assorted crashing bugs and odd behaviors in edge cases (like goto &sub out of an AUTOLOAD being called for a DESTROY in perl <5.8.8) have been fixed. NYTProf now also behaves more sanely with multiplicity and threads (although it still can't actually profile multiple threads or interpreters).

One little UI tweak worth noting is that sortable tables now show a little arrow in the heading of the sorted column. If you didn’t know that you could click most column headings to sort by that column, hopefully the arrow will act as a visual reminder.

The only thing I’m likely to work on soon is the handling of string evals. They’re mostly hidden in the reports now. I need to improve that to make PostgreSQL::PLPerl::NYTProf actually useful. So that’s pretty much bound to happen sometime between now and my speaking at PGCon in May.

Filed under: perl Tagged: nytprof, performance, perl, postgresql

After more than six months, and more than a few technical hurdles, NYTProf v3 has been released at last.

In this post I’ll review the major changes and significant new features.

What’s new in Devel::NYTProf v3?

Treemap

The first big feature is a visualization of the exclusive time spent in subroutines represented as a treemap:

That’s a treemap of a profile of perlcritic 1.088. The colors don’t mean anything. They’re just used to visually group subroutines in the same package. (I’m not very happy with the colors but the JIT toolkit I’m using doesn’t make it easy to use an attractive colour range. It interpolates a value in RGB color space. It would be much better to interpolate the value in HSV color space.)

The treemap is interactive! If you click on a square then the treemap is redrawn “zoomed in” one package level ‘closer’ to the package of the subroutine you clicked on.

Subroutine Caller Tracking

The subroutine profiler has been almost completely rewritten, yielding another major new feature. It now finds and records the name of the calling subroutine. (You might have assumed that NYTProf always did that. In fact it guessed based on the calling file and line number, and so was easily confused by nested subroutines and closures.) By properly tracking the calling subroutine NYTProf can now generate a more accurate call graph.

One immediate beneficiary is the nytprofcg utility (contributed by Chia-liang Kao). nytprofcg reads NYTProf profile data and generates callgrind data for viewing via Kcachegrind. The previous guessing behaviour limited the usefulness of nytprofcg. Now it works well, as you can see here:

I’ve not played with it much yet. If you do, let us know how it works out for you!

The subroutine called main::RUNTIME in the image above is the fake name that NYTProf gives to the ‘caller’ of the main script code. Code run at compile time will have a top-level caller of main::BEGIN.

BEGIN

Speaking of BEGINs, they’ve always been a problem because there can be many of them in a single package. Each use statement, for example, generates a BEGIN sub that’s immediately executed then discarded. Previously the data for all those BEGINs was mashed together and so almost useless.

The NYTProf subroutine profiler now renames BEGINs by appending @linenumber to make them unique. A whole new level of detail is opened up by this change. (This, along with a few other new features, requires perl 5.10.1+ or 5.8.9+.)

Goto

NYTProf now handles goto &sub; properly. That tail-call construct is commonly found at the end of AUTOLOAD subroutines—so it’s more common than you might think.

The calling and called subroutine call counts and timings are updated correctly. For the call graph, the destination subroutine appears to have been called by the subroutine that called the subroutine that executed the goto. In other words, if A calls B and B does a goto &C, that call to C will show A as the caller. That fits the way goto &sub works, and ensures inclusive and exclusive times make sense.
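Here’s the A/B/C case as a runnable sketch:

sub A { B() }
sub B { goto &C }          # tail call: B's frame is replaced by C's
sub C { print "in C\n" }
A();                       # NYTProf reports A, not B, as the caller of C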

Slow Opcode Profiling

This is another major new feature. NYTProf can now profile the time spent executing certain opcodes (the lowest-level units of execution in the perl interpreter).

I originally envisaged adding the mechanism for opcodes that corresponded to system calls (read, write, mkdir, chdir etc.) and called the feature ‘sys ops’. Then I realised there were other perl opcodes that would be worth profiling, the main two being match (m/.../) and subst (s/.../.../). So the NYTProf subroutine profiler can now profile time spent in regular expressions!

Here’s an example:

The opcodes are given pseudo-subroutine names in the package that invoked the opcode with “CORE:” prepended to the opcode name. In the example above you can see two instances of CORE:match. One accounting for matches performed in the main:: package, and another accounting for matches performed in the File::Find:: package. (They’re marked ‘xsub’ above but I’ve changed that to ‘opcode’ now.)

Profiling of ‘slowops’, as I’ve called them, is controlled by the slowops=N option. A value of 0 turns off slowop profiling. A value of 2 (the default) gives the behaviour shown above, with opcodes called in different packages being accounted for separately. A value of 1 will put all the slowops into a single package named ‘CORE::’.
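NYTProf options like this are passed via the NYTPROF environment variable, so, for example, to put all slowops into the single CORE:: package:

NYTPROF=slowops=1 perl -d:NYTProf myscript.pl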

Here’s a simple example showing the calling relationship between packages in a little demo script I use for testing:

The dot file for that inter-package view is available as a link on the top-level index page of the report.

On the individual report pages for each source file there’s a link to another dot file. This one shows the calls into, out of, and between the subroutines in package(s) in that source file. For example, here’s the call graph for the subs in the File::Find module:

There are many things that could be improved with that graph, such as adding call counts. Overall though, I’m pretty happy with it.

Report Format Changes

There have been some changes to the main report columns:

There are two new columns, “Calls” and “Time in Subs”. They show the number of subroutine calls executed on that line, and the total time spent in those subroutines due to those calls. Both are color coded, relative to the other values in the same columns, using the same Median Average Deviation that’s used elsewhere.

To make room for the new columns, the column showing the average statement execution time has been removed (it wasn’t much use anyway) and the column headings tightened up. The average value, if you’re interested, is available as a tool-tip, as shown above.

New Options

A few new options have been added, including:

sigexit=S: Some signals will abort a process, leaving a corrupt profile data file. The sigexit option can be used to tell NYTProf to catch those signals and close the profile cleanly before exiting.

forkdepth=N: When a process being profiled is forked, the child process is also profiled. The forkdepth=N option can be used to limit the number of generations that are profiled. The default is -1 (all generations). A value of 0 effectively disables profiling of child processes.

log=F: If you enable NYTProf trace output, via the trace=N option, it’s normally written to stderr. The log=F option can be used to write the log to a specific file instead.
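As with other NYTProf options, these are set via the colon-separated NYTPROF environment variable; the values here are just examples:

NYTPROF=trace=2:log=/tmp/nytprof.log:forkdepth=1 perl -d:NYTProf myscript.pl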

nytprofmerge

As I mentioned above, when a profiled process forks, the child is also profiled, with the profile being written to a new file. So processes which have many children, like mod_perl, end up with many profile data files. Naturally many people have expressed a wish for NYTProf to be able to merge multiple profiles into a single report. Sadly no one had stepped up to actually do the work, till now.

Nicholas Clark, who contributed the great zip compression for v2.04 (and is a major contributor to the perl core, and was pumpking for the 5.8.2+ releases), has come up trumps again. NYTProf v3 includes a new nytprofmerge utility that’ll read multiple profiles and write out a new, merged, profile. It’s very new, and somewhat experimental, but answers a very real need. Give it a whirl and let us know how it goes.

Screencast

I gave a talk on Devel::NYTProf at the (excellent) Italian Perl Workshop in October. I covered both the features in version 3 and the phased approach I take to optimizing perl code. You can watch the screencast.

And finally

It’s Christmas Eve here in Ireland. After days, and nights, of hard frost the countryside is spectacularly encased in tiny ice crystals sparkling in the bright sunshine. I’m delighted to have stumbled into working on NYTProf. It’s a great project at the intersection of two of my professional passions: performance and visualization. And I’m delighted to give you NYTProf v3 in time for this Christmas.

Enjoy!

Tim.

Posted in perl Tagged: nytprof

I’m working with PostgreSQL for my day job, and liking it.

We’re fairly heavy users of stored procedures implemented in PL/Perl, with ~10,000 lines in ~100 functions (some of which have bloated to painful proportions). This creates some interesting issues and challenges for us.

There’s a window of opportunity now to make improvements to PL/Perl for PostgreSQL 8.5. I’m planning to work with Andrew Dunstan to agree on a set of changes and develop the patches.

As a first step along that road I want to map out here the changes I’m thinking of and to ask for comments and suggestions.

Goals:

Enable modular programming by pre-loading user libraries.

Soften the hard choice between plperl and plperlu, so there’s less reason to “give up” and use plperlu.

Improve performance.

Improve flexibility for future changes.

Enable use of tracing/debugging tools.

Specific Proposals:

Enable configuration of perl at initialization

Add ability to specify in postgresql.conf some code to be run when a perl interpreter is initialized. For example:
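Something along these lines, where the directive name follows this proposal and the value is purely illustrative:

plperl.at_init_do = 'use MyCompany::PlPerlLibrary;'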

The Safe compartment used for plperl functions can’t access any namespace outside the compartment. So, by default, any subroutines defined by libraries loaded via plperl.at_init_do won’t be callable from plperl functions.

Some mechanism is needed to specify which extra subroutines, and/or variables, should be shared with the Safe compartment. For example:

plperl.safe_share = '$foo, myfunc, sum'

Permit some more opcodes in the Safe compartment

I’d like to add the following opcodes to the set of opcodes permitted in the Safe compartment: caller, dbstate, tms.

Execute END blocks at process end

Currently PostgreSQL doesn’t execute END blocks when the backend postgres process exits (oddly, it actually executes them immediately after initializing the interpreter). Fixing that would greatly simplify use of tools like NYTProf that need to know when the interpreter is exiting. Updated: used to say “at server shutdown” which was wrong.

Name PL/Perl functions

Currently PL/Perl functions are compiled as anonymous subroutines. Applying the same technique as the Sub::Name module would allow them to have ‘more useful’ names than the current ‘__ANON__’.

For a PL/Perl function called “foo”, a minimal implementation would use a name like “foo__id54321”, where 54321 is the oid of the function. This avoids having to deal with polymorphic functions (where multiple functions have the same name but different arguments).
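Here’s a rough illustration of that technique using Sub::Name directly (the function body and oid-style name are made up):

use Sub::Name qw(subname);

my $fn = sub { return 2 + 2 };   # compiled anonymously, as PL/Perl does today
subname 'foo__id54321', $fn;     # errors and profilers now see this name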

The names won’t enable inter-function calling and may not even be installed in the symbol table. They’re just to improve error messages and to enable use of tools like Devel::NYTProf::PgPLPerl (as yet unreleased).

Miscellaneous updates to the PL/Perl documentation

To document the new functionality and expand/update the related text.

Improve Performance

It seems likely that there’s room for improvement. Some code profiling is needed first, though, so I’ll leave this one vague for now.

Any comments on the above?
Anything you’d like to add?

If so, speak up, time is short!

Footnote

For completeness I’ll mention that I was thinking of adding a way to permit extra opcodes (plperl.safe_permit=’caller’) and a way to use a subclass of the Safe module (plperl.safe_class=’MySafe’). I dropped them because I felt the risks of subtle security issues outweighed the benefits. Any requirements for which these proposals seem like a good fit can also be met via plperl.at_init_do and plperl.safe_share.

Posted in perl Tagged: plperl, postgresql

Last weekend I went up to Dublin to speak at OSSBarcamp. I took the train from Limerick on Friday so I’d already be in Dublin the following morning, without having to get up at the crack of dawn.

Dublin.pm

Aidan Kehoe and I had a very small but interesting Dublin.pm meeting that night. Their first since 2004! Our wide-ranging discussions that night included me trying to understand what led Dublin.pm to flounder instead of flourish. I think a key factor was the (implicit?) expectation that members should make technical presentations.

Living in the west of Ireland there aren’t enough local Perl users (that I’ve found so far) to have a viable Perl Mongers group. So I set up the Limerick Open Source meetup instead.

Here’s what worked for us: We sit around in a quiet comfy hotel bar and chat. Naturally the chat tends towards the technical, and laptops are produced and turned around to illustrate a point or show results of a search, a chunk of video etc. There’s no set agenda, no declared topics, and no presentations. And yet, I think it’s fair to say, that everyone who’s come along has learnt interesting (albeit random) stuff.

I’d like to hear from perl mongers, in groups of all sizes, what kinds of balance between the social and technical aspects of Perl Mongers meetings works (or doesn’t work) for you.

OSSBarcamp

At OSSBarcamp I gave a ~15 minute ‘lightning talk’ on Devel::NYTProf in the morning, and a ~50 minute talk on Perl Myths in the afternoon.

There is so much happy, vibrant, productive life in the Perl community that updating the presentation has been a lovely experience. I keep having to revise the numbers on the slides upwards. There are lots of great graphs and they’re all going upwards too! (Many thanks to Barbie for the great new graphs of CPAN stats.)

I’ve put a PDF of the slides, with notes, on slideshare. Best viewed full-screen or downloaded.

I made a screencast but I think I’ll hang on to that until after I give the same talk, updated again, at the Italian Perl Workshop (IPW09) in Pisa in October — I’m really looking forward to that! I’ll make another screencast there and decide then which to upload.

After OSSBarcamp last week, and before IPW09 in late October, I’ll be flying to Moscow, visa permitting, to give a talk at the HighLoad++ (translated) conference. I’ve never been to Russia before so that’s going to be an amazing experience!

Posted in ireland, perl Tagged: conference, graphs, jobs, language, myths, ossbarcamp, perl6, presentation

I just added a concluding slide to my updated Perl Myths talk. Having comprehensively debunked some myths with hard facts about perl and its ecosystem, I wanted to end with a slide that summarized some truths.

I liked the slide text so much I wanted to share it with you:

Perl:

has a massive library of reusable code
has a culture of best practice and testing
has a happy welcoming growing community
has a great future in Perl 5 and Perl 6
is a great language for getting your job done
for the last 20 years, and the next 20!

It would make more sense after seeing the talk, but I think it stands well on its own as a summary of Perl.