Menu

Edit: 2015/7/1: Sessions will now be held each day of the conference during the afternoon coffee breaks from 3 – 3:30 PM.

Edit: 2015/7/2: Sessions will be held in room 210 on the main level of the conference center.

This year at SciPy 2015 I’d like to run some informal “office hours” help sessions to help people with any questions they might have. I can imagine questions about:

scientific Python libraries (NumPy, SciPy, Pandas, matplotlib…)

software installation (Anaconda, conda, pip…)

software packaging

Git & GitHub

the command line (shell)

web applications

much more!

The sessions will be during the afternoon coffee breaks 3 – 3:30 PM each day of the conference (Wednesday – Friday). The SciPy organizers have very kindly reserved room 210 for the sessions. Follow me on Twitter for any last minute updates.

If there seems to be significant interest I’ll try to find times for some additional sessions, but that might be hard to do.

In the interest of helping to improve the diversity and beginner friendliness of the SciPy conference, I’m offering to help first-time speakers from underrepresented groups with their talk proposals and potential talk preparations for SciPy 2015. If that sounds like you and you’d like my help editing a proposal and/or preparing a talk, send me an email.

The Palettable API has also been updated for the IPython age. All available palettes are now loaded at import and are available for your tab-completion pleasure. Need the YlGnBu palette with nine colors? That’s now available at palettable.colorbrewer.sequential.YlGnBu_9. Reversed palettes are also available with a _r suffix.

It has been over two years since Erik Bray and I made the first release of SnakeViz 0.1, a tool for visualizing performance profiles of Python code. It had multiple performance bottlenecks, but it worked just well enough that it took me a long time to prioritize making improvements. That time has finally come around and I’m happy to announce that SnakeViz 0.2 is now available!

What’s New

The look and feel of SnakeViz remains much the same (see a screenshot), but there are some new things on the screen:

Detailed function information when hovering over the visualization

Call stack list for tracking where you are when zooming the visualization

Control the depth of the displayed call tree

Limit the display of functions that take up relatively little time

Under the Hood

The first release of SnakeViz had some performance bottlenecks:

It tried to transfer a complete call tree from the server to the client as JSON

It tried to display the entire call tree in the sunburst visualization

Those limited the usefulness of SnakeViz with profiles that contained calls to a lot of functions. The version 0.2 release is an almost complete rewrite in order to make SnakeViz work with larger profiles.

The first limitation is addressed by moving the building of call trees into the client application. Profile data is passed from the server to the client in close to the same form as it’s available from Python’s pstats module. Once in the client, the profile data is used to construct call trees on demand for visualization.

The second limitation is addressed by limiting how much of the profile is visualized at once. Call trees are built only to a user specified depth and users can opt to omit functions that do not use much time from display. (The “depth” and “cutoff” controls.)

I and others have tested SnakeViz 0.2 with some fairly large profiles and found it works. You can read more about SnakeViz in the updated docs. Please give it a try! Issues can be reported on GitHub.

Testing Python results is often as straightforward as assert result == expected, especially with builtin types. But that doesn’t work with NumPy or Pandas data structures because using == with those doesn’t return True or False. Instead, == results in new arrays filled with boolean values. This is useful for boolean indexing, but leads to this error when testing:

In [2]: a = np.arange(10)
In [3]: b = np.arange(10)
In [4]: assert a == b
------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-4-6bf76ad3480a> in <module>()
----> 1 assert a == b
ValueError: The truth value of an array with more than one element is ambiguous.
Use a.any() or a.all()

You can check whether all the elements in two arrays are equal using the .all() method:

In [5]: (a == b).all()
Out[5]: True

But that errs if the arrays are different sizes/shapes, and the result is an uninformative True or False when they are the same size. Luckily, NumPy has this situation covered.

Library Versions

For reference, these are the versions of NumPy and Pandas I’m currently using:

The examples show how you get somewhat descriptive output when the comparisons fail, including if the shapes are mismatched and what percentage of elements differ between the two arrays.

Similar functionality is available in the array_equal function, which returns True or False instead of raising an exception.

assert_allclose

assert_array_equal checks for exact equality. That’s fine for integer and boolean values, but often fails with floating point values because of very slight differences in the results of values calculated different ways or on different computers. For comparing floating point values I use assert_allclose.

assert_allclose takes atol and rtol arguments for specifying the absolute and relative tolerance of the comparison. For the most part I leave these at their defaults: atol=0 and rtol=1e-07. That’s a small enough tolerance that I’m confident the numbers are quite close, but large enough to let floating point noise go through. Sometimes, though, it’s useful to choose custom tolerances. For example, I was once writing tests based on numbers I copied out of a paper. The numbers were provided to four decimal places so in my tests I used npt.assert_allclose(result, expected, atol=0.0001). Choosing appropriate tolerances for testing with assert_allclose can be tricky depending how accurate you expect your code to be. Unfortunately, I don’t have any great advise on that.

Notes

One very handy thing about assert_array_equal (and its scalar friendly cousin assert_equal) is that it handles values like nan intelligently. Normally nan compared to anything else, even nan, results in False. That’s the official, expected behavior, but it does make testing harder. assert_array_equal handles this for you.

Note that array_equal and equal behave in the official manner and will always return False for comparisons to nan.

Testing with Pandas

Pandas also has a testing module, but it is apparently meant more for internal testing of Pandas itself than for Pandas users. There is no documentation page for it, but it’s still available and I use it in testing. I import it via import pandas.util.testing as pdt.

The three main things I use are assert_frame_equal, assert_series_equal, and assert_index_equal. assert_frame_equal and assert_series_equal take arguments that let you control whether the comparisons are exact or approximate, and whether to compare types in addition to value equality. By default they use an allclose-like comparison.

assert_frame_equal is sensitive to the order of columns and rows in the tables. I’ve found this is not always what I want, sometimes it’s fine if ordering changes as long as the same column names and index labels are in both tables. I’ve made my own assert_frames_equal function for testing that case.

Just because you’re using complex data containers like arrays and DataFrames in your code doesn’t mean you can’t test it. NumPy and Pandas are themselves heavily tested and you can test your own code using the same utilities the NumPy and Pandas developers use. Happy testing!