Python for IDL users

A guide for IDL users using Python for the first time. This web page is intended to help get you started using Python and making plots with Matplotlib. Please report any errors or ommisions to me - a.j.heaps@reading.ac.uk.

A more detailed look at Python is available on-line at Johnny Lin's website.

Setting up Python

To use the Canopy Enthought Python distribution in the Meteorology department type:setup canopy

Users outside the department will need Python with the NumPy, matplotlib, basemap and netCDF4 packages installed.

Importing packages

In IDL all the functionality of the language is available by just typing idl - maps, stats etc. In Python you have to explicitly import packages you need. Some of the commonly used packages are:

After importing a package functions can then referenced - for example, np.mean(a) for the NumPy mean function.

Running Python programs

In Python no end statement or procedure/function name is needed. To run a Python program typepython prog.py

Python basics

Python floats are at least 64 bit numbers.
Python integers are 32 bit and not 16 bit as in IDL. A long integer is specified by putting a L or l after the number. Long integers can grow as large as is needed.

In Python variable names case sensitive while in IDL they are case insensitive. There is no need to declare variable types - just use them. As in IDL you can reassign variables to be different types.

# is the comment symbol. All text after a # is a comment.
Use print a,b,c to see the variables a,b,c. In IDL this would be print, a,b,c.

Array elements are accessed via [] brackets only. In IDL you can use both [] and (). Both Python and IDL indicies start at zero - i.e. the first element is a[0].

Indentation is the way of closing if then statements or loops in Python - there's no need for an endif or endfor as in IDL.

The rules of operator precedence are the same as in IDL - an integer multiplied by a float gives a float.

Python has a C style data interface while IDL has a Fortran style. If we have read in a standard atmosphere grid, in IDL it will appear as longitude, latitude, height, in Python it will appear as height, latitude, longitude. In IDL you would use a[i,*], the equivalent in Python is a[:,i]. See section on netCDF reading later.

Loops in Python

for i in np.arange(4):
print i

0
1
2
3
Python is indentation sensitive so make sure your loop statements are indented. There is no end of loop statement in Python.

Large loops in Python are very slow. The following example is ten times slower than in IDL for a loop of 100 million points. A standard UM grid 96x73x22 for 360 days has 57 million points so this is not an unusual amount of data to be looking at.

First of all we create some test random data from -5 to 5. We will loop over the data and set the b array to be 1 for all points that are greater than zero.a=np.random.rand(100000000)*10-5
b=np.zeros(100000000,dtype=int)
pts=np.arange(0,100000000)
for i in pts:
if a[i] > 0: b[i]=1
This took 160 seconds for a 100 million points.
Using pts=range(0,100000000) reduces this to 100 seconds.

Using NumPy where gives a much quicker answer
b=np.where(a < 0, a, 1) took 3.1 seconds

Using WHERE in IDL completed the same task in 0.16 seconds.

Creating arrays

The NumPy arange function is like INDGEN, FINDGEN in IDL. Up to three arguments are generally used - min, max and step. Note that result doesn't include the max value passed to it. np.arange(5) - array([0, 1, 2, 3, 4])np.arange(2,6) - array([2, 3, 4, 5])np.arange(3,9,1.5) - array([3., 4.5, 6., 7.5])

Passing data to routines and getting results back.

lonticks,lonlabels=mapaxis(min=plotvars.lonmin, max=plotvars.lonmax, type=1)
Define a routine with a def statement.

def myproc(min=min, max=max):
return max-min

Don't forget to indent the lines after the procedure def line.

myproc(min=1, max=4) - 3myproc(max=4, min=1) - 3myproc(1,4) - 3 - this is using Python's keyword position to avoid putting in the min and max keywords.

To return multiple values return them comma separated with the return command

def myproc(min=min, max=max):
return max-min, min+(max-min)/2.0

myproc(min=1, max=4) - (3, 2.5)a,b=myproc(min=1, max=4)

Reading in netCDF data

Use the netCDF4 python interface.from netCDF4 import Dataset as ncfilenc=ncfile('gdata.nc')lons=nc.variables['lon'][:]lats=nc.variables['lat'][:]p=nc.variables['p'][:]temp=nc.variables['temp'][0,:,:]
: is a delimeter which here means all values. Python orders its data C style and IDL Fortran style. temp is a lon,lat,height grid and that's how it would appear in IDL. In Python it appears as height,lat,lon. In this example temp=nc.variables['temp'][0,:,:] gives the first value of pressure - 1000mb and all the longitudes and latitudes.
The data file - gdata.nc

Timing functions

To see where your code is spending the most time use the time moduleimport timet1 = time.time()
do stufft2 = time.time()print 'Time taken was ', t2-t1, 'seconds'

Python projects in the department

Making plots

Matplotlib is the basis for all the following plots. The code and data are available under each plot. Basemap is used here for the map plots.

Graphs

ex35.py
- sine and cosine curves and a legendex36.pydata
- lines with symbols, a non-numeric x-axis and minor tick marksex38.pydata
- one dataset but with two colours to represent different locationsex40.pydata
- filling a curve with colourex41.pydata
- graph with two y axes using twinxex43.py
- using cubic spline fitting from scipy
ex44.py
- line fitting
ex45.pydata
- error plotex46.pydata
- bar plot. Horizontal lines behind bars were created using zorder=-1 to place them
behind all the plot elements.