Python for Scientific Computing – a collection of resources

This post is about the Python ecosystem for scientific/technical computing. Generally, when someone says that they are using Python for technical computing, we should read it as the “Python ecosystem for scientific/technical computing”. Vanilla Python, a versatile general-purpose language, was not designed for technical computing (linear algebra, symbolic computing, vectorized operations, etc.) and is not well suited to it on its own. However, the language provided just the right set of tools, and a framework within which scientists and engineers could easily implement their ideas. Python was quickly embraced by the scientific community, which built several Python packages that are well suited to technical computing. Currently there are hundreds of different Python-based libraries. This post is meant to be a basic introduction to a core set of scientific packages in the Python ecosystem, for someone new to Python (though I highly doubt I have any such audience).

Python is a very powerful language for doing all sorts of things at all stages of research: general computing, system programming, design of experiments, building device interfaces, connecting and controlling multiple hardware/software tools, heavy scientific number crunching, data analysis and visualization. Python is an interpreted, general-purpose, high-level scripting language that supports multiple programming paradigms: procedural, object-oriented, and functional. The core design principles of Python are simplicity, code readability, and expressiveness.

Python is easy to learn. It is intuitive and simple, yet it is powerful, beautiful and expressive.

What makes Python particularly attractive for scientists and engineers is that it is open-source, highly portable, intuitive to use, and both dynamically and strongly typed. Like MATLAB, it provides both interactive and script-based programming environments. It also features automatic memory management and garbage collection, enabling scientists and engineers (with or without a strong computer science background) to direct their time and energy at their algorithms and let the interpreter handle the low-level details. All of the above qualities, together with its large scientific community, allow greater opportunity for code sharing and for open and collaborative research, and thus support the philosophy of reproducible research.
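For readers unsure what “dynamic and strong typing” means in practice, here is a minimal sketch. The variable names are made up for illustration. It shows that a name can be rebound to a value of another type (dynamic), while unrelated types are never silently coerced (strong).

```python
# Dynamic typing: a name is not tied to one type and can be rebound freely.
value = 42
first_type = type(value).__name__   # "int"
value = "forty-two"
second_type = type(value).__name__  # "str"

# Strong typing: a string and an integer are not silently mixed.
try:
    result = "1" + 1
except TypeError:
    result = "TypeError"

# An explicit conversion states the intent and works.
explicit = int("1") + 1

print(first_type, second_type, result, explicit)
```

The failed addition is the useful part for numerical work: a mix-up between text read from a file and actual numbers surfaces as an error instead of a silently wrong result.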

Python itself is a full-featured programming language with a large set of tools in the standard library (surely you have heard “batteries included”!). What is even more attractive is that there is a whole technical computing ecosystem around Python, built by the different scientific communities. Most of the tools are built on top of NumPy, which itself is built on top of Python. NumPy extends Python with capabilities that are essential for scientific/technical computing, such as vectorization, homogeneous multi-dimensional arrays, fast element-wise operations, broadcasting and universal functions. The figure below shows a basic landscape of the scientific Python ecosystem. Please note that it is not meant to be complete, but rather a very basic reference to the most common tools around Python for scientific computing.

The ease and interactivity of the language, coupled with good community support and specialized scientific libraries, enable a newcomer to quickly learn and do meaningful work using Python. The interested reader can read more about the advantages of Python and how it compares to other languages here.
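As a small illustration of the NumPy capabilities mentioned earlier (homogeneous multi-dimensional arrays, vectorized element-wise operations, universal functions and broadcasting), here is a minimal sketch. The array contents and variable names are arbitrary choices for the example, and it assumes only that NumPy is installed.

```python
import numpy as np

# Homogeneous, multi-dimensional array: 3 rows x 4 columns of float64.
data = np.arange(12, dtype=np.float64).reshape(3, 4)

# Vectorized element-wise arithmetic: no explicit Python loop.
scaled = data * 2.0 + 1.0

# Universal function (ufunc): np.sqrt is applied to every element.
roots = np.sqrt(data)

# Broadcasting: the column means have shape (4,) and are stretched
# across all 3 rows, so every column ends up with zero mean.
centered = data - data.mean(axis=0)

print(scaled.shape)
print(centered.mean(axis=0))
```

The same operations written with nested Python lists would need explicit loops, which is slower and harder to read.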

When I started learning Python I collected a number of resources on the use of Python for scientific and numerical computation. I think these resources may help someone who is just getting started with Python for scientific computing, so I have decided to share them in this blog post.

Before listing the resources, I would like to recommend that someone new to Python download one of the free distribution packages (especially if you are using Windows), such as Anaconda, Enthought Python Distribution (EPD, now known as Canopy), Python(x,y) and Pyzo. For Windows systems there is also WinPython, which is great because multiple instances of WinPython can be used without interfering with each other. Anaconda from Continuum Analytics also has this capability. Additionally, WinPython may be used as a portable environment. These distributions come pre-loaded with the common packages and toolboxes required for scientific computing, with IPython (an advanced Python shell and highly interactive editor), and with plotting libraries like Matplotlib (a powerful 2D plotting library with limited 3D plotting capability) and Mayavi (a 3D plotting library that uses VTK underneath). They allow one to get started quickly without having to worry about which packages to install and how.

For someone interested in using an integrated development environment, I can highly recommend the free and open-source IDE called Spyder. It is built specifically for scientific computing with Python. The EPD package comes with the SciTE editor and not the Spyder IDE. Although SciTE is great, I like Spyder more. One can install Spyder separately alongside EPD. The new Enthought Canopy comes with the Canopy IDE, which is very similar to MATLAB’s IDE.

(Personally, I use the IPython notebook 80% of the time for developing initial code, as my work involves a lot of experimentation. The few times I need an IDE, I use Spyder. Other times, I use the Sublime Text editor. I use the EPD, Anaconda and WinPython distributions on my different machines.)
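To confirm that a freshly installed distribution really provides the core packages mentioned above, a quick check along the following lines may help. This is only a sketch: the `PACKAGES` tuple and the `installed` helper are names made up for this example, and the list should be adjusted to whatever you expect your distribution to ship.

```python
import importlib.util

# Import names of the packages discussed above (an illustrative selection).
PACKAGES = ("numpy", "scipy", "matplotlib", "IPython", "mayavi")


def installed(name):
    """Return True if a top-level package can be located, without importing it."""
    return importlib.util.find_spec(name) is not None


report = {name: installed(name) for name in PACKAGES}
for name, found in report.items():
    print(f"{name:<12} {'found' if found else 'MISSING'}")
```

Run it with the Python interpreter that belongs to the distribution you want to check. With several distributions on one machine, each has its own interpreter and its own set of packages.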

For getting started, the “fundamental” tagged video lectures from Marakana are great. [Updated: 08/03/2015]

Online Python Tutor is a great way to understand the guts of Python by actually “seeing” how Python programs get executed.

pyvideo.org has a huge index of Python-related videos on the internet (links, not the actual videos), including talks from most of the Python conferences.

Doug Hellmann’s Python Module of the Week (PyMOTW) series is a tour of the Python standard library using short examples. This resource is really useful to quickly understand and put to use any of the modules in the standard library.

Python Scientific Lecture Notes (Scipy Lecture notes), edited by Valentin Haenel, Emmanuelle Gouillart and Gael Varoquaux (❗ First stop, if you are new). This is a wonderful resource and it can also act as a quick reference. It is certainly worth bookmarking this page.

Introduction to NumPy and Matplotlib using IPython by Eric Jones (SciPy 2012). This is a tutorial video, and quite an enlightening one. The concepts of slicing and indexing ndarrays are lucidly explained. A brief overview of the low-level memory layout of NumPy arrays is presented towards the end.

Matplotlib: Lessons from middle age (SciPy 2012, by John Hunter). This video is not actually a tutorial. It is a keynote speech by the late John Hunter in which he reflects upon, and gives advice on, developing successful open-source projects based on his experience of developing Matplotlib. There is not much to learn about using Matplotlib itself in this talk.

SciPy 2011 tutorials, including recordings of Introduction to SciPy: Optimization, linear algebra, statistics, and more …, Guide to Symbolic Mathematics with SymPy, and Statistical Learning with scikit-learn.

Of course, this is just the beginning. I have purposely not included links to many other powerful scientific Python tools, such as Cython and Numba for speeding up numerical code (by up to a factor of 1000 in favorable cases), with Cython also used for wrapping C code, Pandas for data science, or PyTables/h5py for working with HDF5. This post is meant to be a guide for someone starting off with Python for scientific applications.

This is mostly what my collection looks like. I have surely missed some great tutorials, blogs, websites and videos on the use of Python for scientific and numerical computation, so I would like to ask my readers to let me know about any such resource that you may be aware of. I would be more than happy to update the above list. Thank you very much.

CAUTION: DO NOT USE --pylab inline

If you are in the habit of starting the IPython notebook with the --pylab inline option, stop using it immediately. If you are new, you will find many places that recommend the --pylab inline option, but please don’t pick up this bad habit. Use %matplotlib [inline|qt|osx|gtk] instead. For details see No Pylab Thanks.
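One commonly cited reason for this advice is that --pylab star-imports NumPy and Matplotlib names into the interactive namespace, where some of them replace Python built-ins of the same name. The sketch below reproduces the effect with explicit names rather than with --pylab itself. It assumes only that NumPy is installed, and the variable names are illustrative.

```python
import numpy as np

values = [0, 1, 2, 3, 4]

# Built-in sum: the second argument is the start value, so -1 + 10 = 9.
builtin_total = sum(values, -1)

# NumPy sum: the second argument is the axis, so the result is 10.
numpy_total = np.sum(values, -1)

print(builtin_total, numpy_total)
```

After a star import, a bare `sum` would refer to the NumPy version, so the same line of code quietly gives a different answer. Running `%matplotlib inline` in the first notebook cell and then importing explicitly (`import numpy as np`, `import matplotlib.pyplot as plt`) keeps every name traceable to its origin.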

Thanks, I settled on Anaconda yesterday. I’ll be using your wonderful collection of resources to learn how to use it. Thank you very much, your site is quite valuable to me and I appreciate your effort!