Intel Releases Optimized Python for HPC

Is Python overtaking Java as the most popular programming language? Python has successfully found its way into nearly all the major application areas, including data science and machine learning, financial transactions, academic research, and, in particular, numerical computing for science and engineering.

Why is this surprising? Because Python, which was introduced in 1991 as a good way to learn programming, has long been thought of as a slow interpreted language that’s just not the right choice for large industrial-strength applications.

However, the myths about Python seem no longer to be true. Easy to learn and understand, Python comes with a very active user community. As a result, today Python is a significant contender, especially with younger or beginner programmers. What attracts people to Python is its clean style, and the vast number of open source library packages that seem to have solutions for everything worth doing. If your day job is data analysis or computational physics and not computer science, and programming is just one of the tools you use to get work done, Python is now a reasonable choice.

So it’s no surprise to see the growing presence of Python in high performance computing, big data, machine learning, and data science. But what about performance, particularly for really big data sets, and for the seriously compute-bound applications in science and engineering? With all the recent breakthroughs in hardware, processing speed, clustering, storage, and so on, how can Python applications take advantage of these advances, especially the latest Intel Xeon® and Intel Xeon Phi™ processors?

The recent Intel Distribution for Python release, part of Intel Parallel Studio XE 2017, shows that near-native C performance can be achieved with compilers and library packages optimized for Intel architecture. Moreover, the library packages for big data analysis and numerical computation included in this distribution now scale across multi-core and many-core processors as well as distributed cluster and cloud infrastructures.

Popular Python packages such as NumPy, SciPy, and scikit-learn have been built to call the Intel Math Kernel Library (Intel MKL) and the Intel Data Analytics Acceleration Library (Intel DAAL), so Python applications that use them automatically take advantage of the latest architectures. These libraries have also been optimized for multithreading through calls to the Intel Threading Building Blocks (Intel TBB) library. This means that existing Python applications can perform significantly better merely by switching to the Intel distribution.
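To make the idea concrete, here is a minimal sketch (not taken from Intel's materials) of why library-backed calls matter. It compares a hand-written triple loop with NumPy's `@` operator, which hands the work to whichever BLAS library NumPy was built against; in an MKL-linked build that would be Intel MKL. The function name `matmul_loops` and the 60x60 matrix size are illustrative assumptions, and timings will vary by machine.

```python
import time

import numpy as np


def matmul_loops(a, b):
    """Naive triple-loop matrix product executed by the Python interpreter."""
    n, k = a.shape
    _, m = b.shape
    out = np.zeros((n, m))
    for i in range(n):
        for j in range(m):
            s = 0.0
            for p in range(k):
                s += a[i, p] * b[p, j]
            out[i, j] = s
    return out


# Small, reproducible inputs (sizes chosen only for illustration).
rng = np.random.default_rng(0)
a = rng.random((60, 60))
b = rng.random((60, 60))

t0 = time.perf_counter()
slow = matmul_loops(a, b)
t1 = time.perf_counter()
fast = a @ b  # dispatched to the BLAS backend NumPy was built with
t2 = time.perf_counter()

print(f"pure-Python loops : {t1 - t0:.4f} s")
print(f"a @ b (library)   : {t2 - t1:.6f} s")
```

The application code is the same one-line `a @ b` regardless of which BLAS sits underneath, which is why swapping distributions can change performance without source changes. Calling `np.show_config()` reports which BLAS and LAPACK libraries a given NumPy build is linked to.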

Python is also compilable. The distribution includes both the Numba just-in-time (JIT) compiler and the Cython compiler, which delivers C-like performance along with Python bindings for many C and C++ libraries.
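As a rough illustration of the JIT approach, the sketch below decorates a plain numeric loop with Numba's `njit`, which compiles the function to machine code the first time it is called. The function `sum_of_norms` and its inputs are hypothetical examples, not code from the distribution, and the fallback no-op decorator is an assumption added only so the sketch still runs where Numba is not installed.

```python
import math

import numpy as np

try:
    from numba import njit
except ImportError:
    # Assumption for this sketch: without Numba, fall back to a no-op
    # decorator so the function simply runs as ordinary Python.
    def njit(func=None, **kwargs):
        if func is None:
            return lambda f: f
        return func


@njit
def sum_of_norms(xs, ys):
    """Sum of sqrt(x^2 + y^2) over paired elements, written as an explicit loop."""
    total = 0.0
    for i in range(len(xs)):
        total += math.sqrt(xs[i] * xs[i] + ys[i] * ys[i])
    return total


xs = np.array([3.0, 6.0])
ys = np.array([4.0, 8.0])
result = sum_of_norms(xs, ys)  # first call triggers compilation under Numba
print(result)
```

The loop body is ordinary Python; only the decorator changes. Compilation happens on the first call, so the benefit shows up on repeated calls or large arrays rather than on a two-element input like this one.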

Even more surprising, Python supports distributed parallel systems through the mpi4py library, which interfaces with the Intel MPI Library over InfiniBand and the Intel Omni-Path communications fabric. The result is decreased latency and increased scaling for distributed Python applications.
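A minimal sketch of what distributed Python looks like with mpi4py follows. Each process sums a strided slice of the integers 0 to 999, and `allreduce` combines the partial sums so every rank ends up with the total. The workload is an illustrative assumption, and the single-process fallback is added only so the sketch runs on a machine without mpi4py; it is not part of the library.

```python
try:
    from mpi4py import MPI

    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()  # this process's index
    size = comm.Get_size()  # total number of processes
except ImportError:
    # Assumption for this sketch: behave as a single process without mpi4py.
    MPI = None
    comm, rank, size = None, 0, 1

# Each rank handles every size-th integer, starting at its own rank.
local_sum = sum(range(rank, 1000, size))

if comm is not None:
    # Combine the partial sums; every rank receives the same total.
    total = comm.allreduce(local_sum, op=MPI.SUM)
else:
    total = local_sum

if rank == 0:
    print(f"{size} process(es), total = {total}")
```

Run directly, the script executes as a single process. Started through an MPI launcher such as `mpirun`, the same file runs as several cooperating processes, and the MPI library underneath (Intel MPI in the distribution described here) determines which interconnect carries the messages.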

But it’s not just the optimized libraries and compilers that generate vector/parallel code for the latest Intel processors; there is also first-class Python support in Intel VTune™ Amplifier, the performance analyzer in the Intel Parallel Studio XE suite. Intel VTune provides line-by-line source code profiling to help find and correct performance hot spots and bottlenecks in Python as well as in C and C++ source.

It’s also no surprise that many data scientists already use Python for their production data analysis jobs, but performance is often the limiting factor. Intel has included optimized versions of popular machine learning packages such as scikit-learn, as well as pyDAAL, a Python interface to its Data Analytics Acceleration Library, in the Intel Distribution for Python. Benchmarks show significant performance benefits with both, including in many-core environments.

Clearly, Python is already here to stay in the HPC domain. With Intel Distribution for Python, those early myths are history.

Intel Distribution for Python, along with other optimized tools and compilers, is integrated into Intel Parallel Studio XE 2017.
