Comparing Python, NumPy, Matlab, Fortran, etc.

We carry out a series of basic experiments to compare Python-related packages (Python, NumPy) and compilers (GNU Fortran, Intel Fortran). We also include Matlab and Java in our study. The primary objective of this exercise is to determine how NumPy performs with respect to the other packages and compilers. All the calculations were carried out on dali.

| Compilers/Packages | Version |
|---|---|
| Python | 2.5.4 |
| NumPy | 1.3.0 |
| Matlab | matlab-R2008a |
| GNU Fortran (gfortran) | 4.1.2 |
| Intel Fortran (ifort) | intel-11.1.038 |
| Java | 1.6.0 |

Problem 1

This example shows the importance of avoiding loops (as far as possible) when manipulating arrays in NumPy. We have a 5000×5000×3 matrix A, and we want to perform the following operations on A:

We record the elapsed time needed to perform the array assignments. The results are summarized in the table below.

| Packages/Compilers | Elapsed Time (s) |
|---|---|
| Python | 48.55 |
| NumPy | 0.96 |
| Matlab | 2.398 |
| gfortran | 0.540 |
| gfortran with -O3 | 0.280 |
| ifort | 0.276 |
| ifort with -O3 | 0.2600 |
| Java | 12.5518 |

In Fortran, NumPy, and Matlab we did not use loops to carry out the desired operations; we used array indexing instead. Note how slow Python was and how efficient NumPy was.
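The loop-versus-indexing contrast can be sketched as below. Since the exact assignments from the original problem statement are not reproduced above, a representative elementwise assignment is assumed, and the array is reduced from 5000×5000×3 for a quick run:

```python
import time
import numpy as np

n = 300  # reduced from 5000 for a quick illustration

# Pure-Python triple loop: a representative elementwise assignment
# (the exact assignments of the original benchmark are assumed, not known)
A = np.zeros((n, n, 3))
t0 = time.perf_counter()
for i in range(n):
    for j in range(n):
        for k in range(3):
            A[i, j, k] = i + j + k
loop_time = time.perf_counter() - t0

# The same assignment vectorized with broadcasting: no Python-level loop
t0 = time.perf_counter()
i, j, k = np.ogrid[:n, :n, :3]
B = (i + j + k).astype(float)
vec_time = time.perf_counter() - t0

assert np.array_equal(A, B)
print(f"loop: {loop_time:.3f} s, vectorized: {vec_time:.4f} s")
```

The vectorized version pushes the loops into compiled code, which is what accounts for the gap between the Python and NumPy rows in the table.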

Problem 2

Here, we want to multiply two randomly generated n×n matrices A and B:

C = A × B

For NumPy and Matlab, we use the predefined matrix-multiplication functions, whereas in Fortran we wrote the code to perform the multiplication ourselves. The elapsed times presented here measure only the time spent on the multiplication, as the size of the matrices varies.
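The NumPy measurement can be sketched as follows (a smaller n than the benchmark sizes is assumed here, and `default_rng` is a modern-NumPy API, not the one available in 1.3.0):

```python
import time
import numpy as np

n = 500  # smaller than the benchmark sizes, for a quick run
rng = np.random.default_rng(0)
A = rng.random((n, n))
B = rng.random((n, n))

t0 = time.perf_counter()
C = A @ B  # NumPy's built-in (BLAS-backed) matrix product
elapsed = time.perf_counter() - t0

# Spot-check one entry against an explicit row-column dot product
assert np.isclose(C[0, 0], A[0, :] @ B[:, 0])
print(f"n={n}: {elapsed:.4f} s")
```

How fast this runs depends entirely on the BLAS library NumPy is linked against, which is the same effect the DGEMM row below illustrates for Fortran.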

| Compilers/Packages | Configuration | n=1000 | n=1200 | n=1500 |
|---|---|---|---|---|
| NumPy | intrinsic function | 6.1 | 10.48 | 29.31 |
| Matlab | intrinsic function | 0.548 | 0.714 | 0.97 |
| Matlab | triple do-loop | 116.90 | 201.16 | 385.95 |
| gfortran | triple do-loop | 7.844 | 13.736 | 28.077 |
| gfortran | matmul | 1.328 | 2.5401 | 8.484 |
| gfortran | -O3 and triple do-loop | 1.7121 | 3.236202 | 9.156573 |
| gfortran | -O3 and matmul | 1.328 | 2.532 | 8.364 |
| ifort | triple do-loop | 0.9960630 | 2.112131 | 7.372461 |
| ifort | -O3 and triple do-loop | 0.9800609 | 2.056128 | 7.356460 |
| ifort | matmul | 3.5642 | 6.224 | 12.6927 |
| ifort | -O3 and matmul | 0.3760 | 0.6480 | 1.3560 |
| ifort | DGEMM(*) | 0.2360 | 0.4480 | 0.8080 |
| Java | | 1.92805 | 3.6075 | 10.2055 |

(*) No compilation options were used; we only linked with "-lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -liomp5 -lpthread".

The above table suggests that the built-in functions are better suited for matrix multiplication, and DGEMM is by far the most efficient. It is important to note, however, that DGEMM is most suitable for large matrices; if for instance n=100, the function matmul outperforms DGEMM. An interesting discussion on the performance of DGEMM and matmul using the Intel Fortran compiler can be read at:

Problem 3

The Jacobi iterative solver stops when the difference between two consecutive approximations falls below 10^{-6}.
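This stopping criterion can be sketched with a vectorized Jacobi solver in NumPy. The exact PDE and boundary conditions of the benchmark are not given above, so a 2-D Laplace equation on a square grid with an assumed Dirichlet boundary (u = 1 on the top edge, 0 elsewhere) stands in for them; the five-point stencil corresponds to the second-order scheme:

```python
import numpy as np

def jacobi_laplace(n=50, tol=1e-6, max_iter=100_000):
    """Jacobi iteration for the 2-D Laplace equation on an (n+2)x(n+2) grid.

    Second-order five-point stencil; stops when the max difference
    between two consecutive iterates falls below `tol`.
    """
    u = np.zeros((n + 2, n + 2))
    u[0, :] = 1.0  # assumed Dirichlet boundary: u = 1 on the top edge
    for it in range(max_iter):
        u_new = u.copy()
        # Vectorized update of interior points: average of the 4 neighbours
        u_new[1:-1, 1:-1] = 0.25 * (u[:-2, 1:-1] + u[2:, 1:-1] +
                                    u[1:-1, :-2] + u[1:-1, 2:])
        if np.max(np.abs(u_new - u)) < tol:
            return u_new, it
        u = u_new
    return u, max_iter

u, iters = jacobi_laplace(n=50)
print(f"converged after {iters} iterations")
```

The slicing in the interior update is the "array indexing" that separates the NumPy and Matlab rows from the pure-Python row in the tables below.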

Second-Order Scheme

| Compilers/Packages | n=50 | n=100 |
|---|---|---|
| Python | 31.9193 | 509.859 |
| NumPy | 0.41324 | 4.19535 |
| Matlab | 0.460252 | 5.483992 |
| gfortran | 0.212 | 2.496 |
| gfortran with -O3 | 0.056 | 0.696 |
| ifort | 0.036 | 0.368 |
| ifort with -O3 | 0.036 | 0.368 |
| Java | 0.0537 | 0.9012 |

Fourth-Order Compact Scheme

| Compilers/Packages | n=50 | n=100 |
|---|---|---|
| Python | 46.15203 | 751.783 |
| NumPy | 0.610216 | 6.38891 |
| Matlab | 0.640044 | 6.531990 |
| gfortran | 0.236 | 3.248 |
| gfortran with -O3 | 0.088 | 1.256 |
| ifort | 0.052 | 0.656 |
| ifort with -O3 | 0.052 | 0.672 |
| Java | 0.12180 | 2.2022 |

Once again, we observe that pure Python is not suitable for manipulating arrays. NumPy and Matlab give comparable results, whereas the Intel Fortran compiler displays the best performance. Java did not use array indexing the way NumPy, Matlab, and Fortran did, yet it still did better than NumPy and Matlab.

The results presented above are consistent with those obtained by other groups: