Listen. Reflect. Explore. Solve.

Post navigation

NYTProf 2.08 – better, faster, more cuddly

I’ve just released NYTProf 2.08 to CPAN, some three and a half months after 2.07.

If you’ve been profiling large applications then the first difference you’ll notice is that the new version generates reports much faster.

The next thing you may notice is that statement timings are now nicely formatted with units. Gisle Aas contributed the formatting code for 2.07 but I had to do some refactoring to get it working for the statement timings.

Another nice refinement is that hovering over a time will show a tool-tip with the time expressed as a percentage of the overall runtime.

Almost all the tables are now sortable. I used jQuery and the tablesorter plugin for that. I’ve not added any fancy buttons, just click on a table heading to sort by that column. You’ll see a little black arrow to show the column is sorted. (You can hold the shift key down to add second and third columns to the sort order.)

A profiler isn’t much use if it’s not accurate. NYTProf now has tests for correct handling of times for string evals within string evals. In fact the handling of string evals got a big overhaul for this version as part of ongoing improvements in the underlying data model. I’m working towards being able to show annotated performance reports for the contents of string evals. It’s not there yet, but definitely getting closer.

A related feature is the new savesrc=1 option. When enabled, with a recent version of perl, the source code for each source file is written into the profile data file. That makes the profile self-contained and, significantly, means that accurate reports can be generated even after the original source files have been modified.

Another new option is optimize=0. You can use it to disable the perl optimizer. That can be worth doing if the statement timings, or counts, for some chunk of code seem odd and you suspect that the perl optimizer has rewritten it.

The final new feature noted in the NYTProf 2.08 Changes file is that it’s now possible to generate multiple profile data files from a single application. Since v2.0 you could call DB::disable_profile() and DB::enable_profile() to control profiling at runtime. Now you can pass an optional filename to enable_profile to make it close the previous profile and open a new one. I imagine this would be most useful in long running applications where you’d leave profiling disabled (using the start=none option) and then call enable_profile and disable_profile around some specific code in specific situations – like certain requests to a mod_perl app.

There’s one more new feature that I’ve just realised I’d forgotten to add to the Changes file before the release: Timings per rolled-up package name. What’s that? Well, it’s probably easiest to show you…

These images are taken from a profile of perlcritic. Each shows the time spent exclusively in subroutines belonging to a certain package and any packages below it. Hovering over a time gives the percentage, so I can see that the 57.3s spent in the 36 PPI packages accounted for 42% of the runtime.

This gives you a quick overview for large (wide) codebases that would be hard to get in any other way.

Tables are generated for upto five levels of package name hierarchy, so you can drill-down to finer levels of detail.

I can visualize a much better UI for this data than the series of tables nytprofhtml currently produces, but my limited free time and jQuery skills prevent me doing more. Patches welcome, naturally.

Enjoy!

p.s. I’ve a screencast from my NYTProf talk at the London Perl Workshop in November I hope to (finally) upload soon. It includes a demo of the package roll-up timing tables.

Like this:

Related

9 thoughts on “NYTProf 2.08 – better, faster, more cuddly”

Just wondering, has there been any progress on getting a way to merge the nytprof.out output from multiple processes into one merged report? I’d love to be able to use it more with SpamAssassin, and we tend to use multiple processes a lot.

Hi Justin. David Steinbrunner popped up on the mailing list recently offering to work on this [http://groups.google.com/group/develnytprof-dev/browse_frm/thread/bb6e5a6af6b62353] but nothing much has come from it yet.

I have a simulator that forks off 2 groups of processes. Ideally, I’d like to view the performance of the whole bunch together. It’s also hard to piece together which process is which, if I’d wanted to inspect the pieces manually.

I’m also interested in overall performance, so I want to know which system calls are blocking the most (wall clock time). Does this tool provide that?