From how the operating system handles your requests through design principles on how to use concurrency and parallelism to optimize your program's performance and scalability. We will cover processes, threads, generators, coroutines, non-blocking IO, and the gevent library.

How processes, threads, coroutines, and non-blocking IO work from the operating system through code implementation and design principles to optimize Python programs. The difference between parallelism and concurrency and when to use each.

The premise is that to make an informed decision you need to know what is happening under the hood. Once you understand the low level functionality, you can make the correct decision in the design phase.

The emphasis is on practical application to solve real world problems.

In this tutorial, I will cover how to write very fast Python code for data analysis. I will briefly introduce NumPy and illustrate how fast code for Python is written in SciPy using tools like Fwrap / F2py and Cython. I will also describe interesting new approaches to creating fast code that is leading changes to NumPy on a fundamental level.

In this tutorial, I will cover how to write very fast Python code for data analysis including making use of NumPy and using GPUs. I will largely focus on writing extensions to Python using hand-wrapping and Cython but will touch also on using tools like weave, Instant, ShedSkin and compare them to PyPy. I will also spend the last part of the tutorial on using GPUs with Python and discuss the performance trade-offs of the technology. This will be a high-level overview of the space with deep dives in Cython and GPUs