This is the first article in a series; to read the entire series go here.

Why Python is Interesting

Programs must be written for people to read,
and only incidentally for machines to execute.
-Abelson and Sussman

Readable Syntax

When Guido van Rossum designed Python, it's clear that he had this principle, if not this exact quote, in mind. The result is a language that often reads like pseudo-code. For example:

def buyFruit(fruit):
    if fruit.color == 'red' and fruit in [Apple, Orange, Tomato]:
        return fruit.price

Flexible Semantics

Although Python is primarily an object oriented/procedural language, it has functional elements including functions as first class objects, lambda expressions and map/filter/reduce functions. This gives considerable flexibility in how you go about solving problems.
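As a small illustration of those functional elements, the sketch below passes a function around as a value and uses lambda with map, filter and reduce. The names applyTwice, numbers, doubled, evens, total and plusTwo are illustrative and not from the article; reduce is imported from functools so that the same code runs on Python 2.6+ and Python 3.

```python
from functools import reduce  # a builtin on Python 2, an import on Python 3

def applyTwice(func, value):
    """Call func on value, then call it again on the result."""
    return func(func(value))

numbers = [1, 2, 3, 4, 5]

# list() is needed on Python 3, where map and filter return lazy iterators.
doubled = list(map(lambda x: x * 2, numbers))        # [2, 4, 6, 8, 10]
evens = list(filter(lambda x: x % 2 == 0, numbers))  # [2, 4]
total = reduce(lambda a, b: a + b, numbers)          # 15

# A function is just another value that can be handed to another function.
plusTwo = applyTwice(lambda x: x + 1, 5)             # 7
```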

For an excellent summary of the design philosophy that guides both Python's own design and the style of Python programs in general, see The Zen of Python.

List Comprehension

Python's list comprehension syntax is a compact and clear way to create new lists from existing lists. The syntax itself looks a lot like set comprehension notation as it's used in discrete mathematics. Here are a few examples:

Given the base list of integers:

xs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

You can create a list of every integer doubled with:

[x * 2 for x in xs]

Or you can create a list of only odd integers with:

[x for x in xs if x % 2 == 1]

Or a combination of both, doubling all of the odd integers from 1 to 10:

[x * 2 for x in xs if x % 2 == 1]

Generators

Generators are a powerful addition to Python's syntax. They permit a form of lazy evaluation where they create values as they are needed rather than all at once. For example, a common way to write a function that generates prime numbers, here by testing each candidate against the primes found so far (a simpler relative of the Sieve of Eratosthenes), is:

def generatePrimes(topValue):
    primes = [2]
    for testValue in range(3, topValue + 1):
        divisors = [prime for prime in primes if testValue % prime == 0]
        if divisors == []:
            primes += [testValue]
    return primes

The above function will, when called, generate all of the prime numbers from 2 up to the value given in the argument. The problem is that it will generate them all immediately, whether you need them or not. This is potentially a serious waste of resources if it comes out that you only needed a handful of them. Python solves this problem by using generators:

def generatePrimes(topValue):
    # 2 is known in advance, so hand it back before the search starts.
    yield 2
    primes = [2]
    for testValue in range(3, topValue + 1):
        divisors = [prime for prime in primes if testValue % prime == 0]
        if divisors == []:
            primes += [testValue]
            yield testValue

The new function differs from the original only in how it hands back its results. There is no return statement; instead each prime is passed to the caller with a 'yield' statement as soon as it is found, once for 2 before the loop and once inside the for loop. The new function returns a generator, which can be passed to any code that iterates over its argument, such as a for loop. Unlike a list, it produces each value only when asked for it and cannot be indexed. Should the code that uses the generator stop before all of the values have been extracted, the remaining values will never be generated. This allows you to avoid doing work that isn't needed. Even if you find that you need all of those values, rather than having a potentially long wait at the start of the process, the work will be spread throughout the process, allowing for just in time delivery of your data.
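To see the laziness in action, here is a minimal sketch that asks for primes up to a million but only consumes the first five. It uses a slightly condensed variant of the generator (starting from an empty list of primes and using all()), which is my own restatement rather than the article's listing; itertools.islice is the standard library tool for taking a fixed number of items from any iterable.

```python
from itertools import islice

def generatePrimes(topValue):
    """Lazily yield the primes from 2 up to topValue by trial division."""
    primes = []
    for testValue in range(2, topValue + 1):
        # testValue is prime if no previously found prime divides it evenly.
        if all(testValue % prime != 0 for prime in primes):
            primes.append(testValue)
            yield testValue

# Ask for a large range but consume only the first five values;
# the generator does no further work once islice stops asking.
firstFive = list(islice(generatePrimes(1000000), 5))
```

Because only five values are ever requested, this finishes immediately even though the upper bound is large.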

Python's Lineage

ABC

Python was originally developed to be a replacement for ABC, and on the surface the resemblance is striking:

ABC:

PUT {} IN telephone
PUT 5551212 IN telephone["Spam Foo"]
PUT 8674309 IN telephone["Eggs Bar"]
FOR name IN keys telephone:
    WRITE "Name:", name, " Phone:", telephone[name] /

Python:

telephone = {}
telephone["Spam Foo"] = 5551212
telephone["Eggs Bar"] = 8674309
for name in telephone.keys():
    print "Name:", name, " Phone:", telephone[name]

Both of the above map telephone numbers to names in an associative array, then print them in the form:
Name: Spam Foo Phone: 5551212
Name: Eggs Bar Phone: 8674309

In addition to the similarities in syntax, they both use an interactive environment which allows you to type your code directly into the command line and get a response from the interpreter.

Haskell

From the appearance of the syntax, it's clear that Python acquired its list comprehension functionality from Haskell.

Haskell:

[x * 2 | x <- [1..10], odd x]

Python:

[x * 2 for x in range(1, 11) if x % 2 == 1]

Both of the above produce a list of integers, [2, 6, 10, 14, 18], which are the odd numbers from 1 to 10, doubled.

Python Installation

Complications to Look Out For

Currently there are two active branches in Python, 2.7 and 3.1. Python 3.1 has a number of improvements to the language which make the syntax regular and prevent some difficult to track bugs that are a serious issue in 2.7. Unfortunately it is not backwards compatible. I'll be using 2.7 for this article because it continues to have the most libraries available for it.

Downloading and Installing Python 2.7

You can find the latest release of Python 2.7 at Python download. From that point installation is easy: run the installer and accept the default values. The full install will take 52MB of space on your drive.

First Program

For a first program, we'll get started with "Hello World". In Python this one is so trivial that it takes more effort to run it than to write it. Copy and paste the code below into your preferred text editor and save it as 'helloworld.py':
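A minimal sketch of such a program follows. The exact text printed and the message variable are assumptions; print is written with parentheses so that the same file runs under both Python 2.7 and Python 3.

```python
# helloworld.py
message = "Hello World"
print(message)
```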

Executing the Program

The primary purpose of the hello world program is to create a simple program that demonstrates how to compile/execute a program in the given language.

To execute this one, type "c:\python27\python helloworld.py" at the command line.

There is no direct use of a 'main' function in Python. Code execution starts at the beginning of the file and progresses through the end.

There are a number of ways to impose more structure on a Python program, but the purpose of this first project is to make sure our systems are configured correctly. I'll get to the more interesting stuff in next week's post.

Trivia

Python was named for the British comedy group Monty Python.
Python's creator, Guido van Rossum, has the final say on all of Python's design decisions and is called the Benevolent Dictator for Life.

fizzbuzz

The fizzbuzz problem is derived from a game in which the players sit in a circle and count, one after the other. If the number is divisible by 3 they say 'fizz' instead of the number, and if it is divisible by 5 they say 'buzz'. If the number is divisible by both 3 and 5 they say 'fizzbuzz.' A fairly common test of basic programming skill is to write a program that will iterate through a sequence of numbers assigning them the appropriate value from the fizzbuzz game.

I'll be using two different solutions to the fizzbuzz problem to take a look at how Python handles conditionals and loops. One solution will use a for loop, the other will use list comprehension.

Function Formatting

The first point to examine is function definition. In the previous post I skimmed over how blocks are delimited in Python. Here you can see that Python uses the off-side rule (indentation) to indicate the start and end of a block and new lines to indicate the end of a statement. This is wonderful because it makes for a very clean looking syntax, and it eliminates two entire classes of bugs: the 'you forgot a semicolon here' bug and the 'remember to close your braces' bug. It also creates its own problems.

To Python, a tab character and the run of spaces that looks the same width on screen are two different indentations. This can lead to a lot of frustration for new programmers because the two look exactly the same to someone reading the code. To help avoid this kind of problem, it's considered standard in the Python community to set your editor to use only spaces and never insert a tab character. How to go about this varies from editor to editor. In IDLE, Python's IDE, tabs are treated as spaces by default.

Most frequently, the line preceding a new block is something like a function declaration, an if statement or a looping statement. Those lines end with a colon, which marks the end of the statement and the start of the block.

def isfizz(num):
    """Returns true if the given number is divisible by 3"""
    return num % 3 == 0

def isbuzz(num):
    """Returns true if the given number is divisible by 5"""
    return num % 5 == 0

The other point of interest in the above code is the first line of each function after the declaration. If you start a Python function with a triple-quoted string, that string becomes the function's docstring, and documentation generated by pydoc will automatically incorporate it.
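The docstring is also available at runtime, which is what tools like pydoc and the interactive help() function rely on. A brief sketch, where docText is just an illustrative variable name:

```python
def isfizz(num):
    """Returns true if the given number is divisible by 3"""
    return num % 3 == 0

# The triple-quoted string is stored on the function object itself.
docText = isfizz.__doc__

# In an interactive session, help(isfizz) displays the same text.
```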

If I Were More Assertive

def fizzbuzzSingle(num):
    """This follows the fairly simple pattern of generating
    a single instance of fizz buzz depending on the number
    value given."""
    assert num > 0, "The given number " + str(num) + " is not a natural number."

    if isfizz(num) and isbuzz(num):
        return "fizzbuzz"
    elif isfizz(num):
        return "fizz"
    elif isbuzz(num):
        return "buzz"
    else:
        return str(num)

A few interesting pythonisms show up here. On line 5 there's an assertion statement with the format "assert <a boolean expression>, <the message to report if it fails>". When assertions are enabled, which is the default, this is roughly syntactic sugar for:

if not num > 0:
    raise AssertionError("The given number " + str(num) + " is not a natural number.")

Lines 7-13 contain the standard setup for Python if/elif/else statements. Although Python uses parentheses to delimit the arguments of a function, they are unnecessary in Python conditionals and loops, where the colon indicates the end of the statement. Of course, that leads to the question of why they are necessary in function definitions. Anyone who can answer this, feel free to let me know.

On line 14, there's an explicit type conversion from an integer to a string. This is necessary because Python does not implicitly convert between numbers and strings. Each of the basic types has an equivalent conversion function named after the type (str, int, float), which means the programmer only has to learn one function per type, instead of "intToStr", "floatToStr" etc.
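A short sketch of what that looks like in practice; the variable names are illustrative only:

```python
asText = str(42)         # "42"
asInt = int("42")        # 42
asFloat = float("2.5")   # 2.5

# Mixing a string and a number without converting raises TypeError
# instead of guessing what was meant.
try:
    "The answer is " + 42
    mixedAllowed = True
except TypeError:
    mixedAllowed = False

label = "The answer is " + str(42)
```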

Ranges and Method Calls

def fizzbuzzComprehension(topNum):
    """In this example I've used list comprehension to generate the list of
    fizzbuzz strings, then combined them afterwards."""
    fizzbuzzList = [fizzbuzzSingle(num) for num in range(1, topNum + 1)]
    return " ".join(fizzbuzzList)

I've used the range function several times, but haven't gotten around to explaining how it works. It shows off a few of the nice features in Python. At its base, range generates an ordered list of numbers. What that list is depends on the number of arguments.

range(5)         # 1 argument generates [0, 1, 2, 3, 4]
range(3, 7)      # 2 arguments generates [3, 4, 5, 6]
range(0, 30, 5)  # 3 arguments generates [0, 5, 10, 15, 20, 25]

The important idea here is that Python allows variable numbers of arguments in its functions. We'll get to more detail on how this works in a later article.
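The three call forms can be checked directly. In this sketch each call is wrapped in list() so that the values are visible on Python 3, where range is lazy; on Python 2.7 range already returns a list and the wrapper is harmless. The variable names are illustrative.

```python
oneArg = list(range(5))            # stop only: starts at 0
twoArgs = list(range(3, 7))        # start and stop; stop is excluded
threeArgs = list(range(0, 30, 5))  # start, stop and step
countdown = list(range(5, 0, -1))  # a negative step counts downwards
```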

The last line in fizzbuzzComprehension introduces Python's method call notation. The syntax is fairly common in object oriented languages: <object>.<method>(<arguments>). Here the object is a string containing one space and the method is "join". Join takes a list of strings and intersperses its object between them.
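A few quick uses of join, with illustrative variable names. Note that join only accepts strings, so numbers have to be converted first.

```python
words = ["1", "2", "fizz"]

spaced = " ".join(words)    # the separator is the object before the dot
commaed = ", ".join(words)  # any string can act as the separator

# Convert numbers with str() before joining them.
numbers = "-".join(str(n) for n in [1, 2, 3])
```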

For Loops, Iterables and Slices

def fizzbuzzFor(topNum):
    """Here I've used the more common for loop to generate the fizzbuzz string,
    It's less concise than the list comprehension version, but is also less
    surprising to those who aren't as familiar with the syntax."""
    fizzbuzzString = ""
    for num in range(1, topNum + 1):
        fizzbuzzString += fizzbuzzSingle(num) + " "
    # Return the total string, dropping the last space.
    return fizzbuzzString[:-1]

On line 6 we introduce Python's version of the for loop. Unlike most languages I've encountered, where the format is "for(<start value>, <end condition>, <value change>)", Python's format is "for <new variable> in <iterable>". An iterable, broadly described, is any value that you could perceive as a list of something. Lists, lines in a file, the output of a generator and sets are a few examples of iterables.
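To make the idea of an iterable concrete, the sketch below feeds several different kinds of value to the same for loop. The helper names countItems and squares are made up for this example.

```python
def countItems(iterable):
    """Count the items in anything a for loop can walk over."""
    count = 0
    for _ in iterable:
        count += 1
    return count

def squares(limit):
    """A generator yielding the squares of 0 .. limit - 1."""
    for n in range(limit):
        yield n * n

listCount = countItems([10, 20, 30])      # a list
stringCount = countItems("spam")          # a string, one item per character
setCount = countItems(set([1, 2, 2, 3]))  # a set, duplicates collapse
generatorCount = countItems(squares(5))   # the output of a generator
```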

The last line will look especially unusual to those who are used to conventional array index notation. Python uses a rather powerful syntax called slice notation to refer to parts of a sequence, such as a list or a string, where 0 is the index of the first value. You can refer to sub-lists with the syntax "<a list>[<first value>:<value after the last value>]". If you leave out either the start or end value then Python will take from the beginning or to the end respectively. Finally, as can be seen in fizzbuzzFor, you can use negative numbers to count from the end of the sequence rather than the beginning. To make the functionality more clear, here are a few examples:

"spam"[0:1] == "s"

"spam"[:3] == "spa"

"spam"[-2:] == "am"

"spam"[:-1] == "spa"

Making it Run

if __name__ == "__main__":
    print fizzbuzzComprehension(20)

As I mentioned in the last article, Python doesn't have a 'main' function, but there are times when you want to load a module without having it execute all of its code. If you want code that is only executed when the file is run on its own, check whether the __name__ variable's value is "__main__".

Testing.... Testing

Python comes with its unit test framework built in, among many other libraries. There are a number of others available, but pyunit (called unittest in the Python 2.7 release) is more than adequate.

Importing Modules/Libraries

import unittest
from fizzbuzz import *

Both of the above commands import the named libraries. The two differ in how much information you need to reference the contents of their modules. The functions in fizzbuzz can be referenced as though they were declared in the same file. To refer to an element in the unittest module, you must indicate that you are referring to a member of that module with dot notation: "<module>.<element>". Unfortunately this notation is confusingly similar to the notation used to reference the methods of objects.

The notation "from <module> import <something>" can either be general and import all of the elements in the module, as above, or import specific elements by name, as in "from fizzbuzz import fizzbuzzSingle".
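The difference between the import forms is easiest to see side by side. This sketch uses the standard math module rather than the article's fizzbuzz module so that it runs on its own; the variable names are illustrative.

```python
import math                 # names must be reached through the module
from math import sqrt, pi   # only these two names are imported directly

viaModule = math.floor(2.7)  # dot notation: <module>.<element>
direct = sqrt(16)            # no prefix needed
circumference = 2 * pi * 1.0
```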

Class syntax in Python isn't especially surprising: "class <name of class>(<superclasses>):". It's worth noting that Python allows for multiple inheritance and that methods are declared in a similar fashion to functions.

All methods have 'self' as their first argument, which allows you to refer to the object that contains the method without any extra keywords. This can cause some confusion because, while every method has 'self' as its first argument when declared, 'self' is left off when the method is called.
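As a sketch of how these pieces fit together, here is a small test class with a setUp method. The class name FizzBuzzTest, the contents of justBy3 and the stand-in fizzbuzzSingle are assumptions made so that the example runs on its own.

```python
import unittest

def fizzbuzzSingle(num):
    """Stand-in for the function from the fizzbuzz module."""
    if num % 15 == 0:
        return "fizzbuzz"
    if num % 3 == 0:
        return "fizz"
    if num % 5 == 0:
        return "buzz"
    return str(num)

class FizzBuzzTest(unittest.TestCase):  # the superclass goes in parentheses
    def setUp(self):
        # unittest runs this before every test; self is the test object.
        self.justBy3 = [3, 6, 9, 12]

    def testFizzBy3(self):
        for num in self.justBy3:
            self.assertEqual(fizzbuzzSingle(num), "fizz")

# Methods are declared with self but called without it: Python passes
# the object before the dot as the first argument.
case = FizzBuzzTest("testFizzBy3")
case.setUp()
case.testFizzBy3()
```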

def testFizzBy3(self):
    """Verifies that if a number is divisible by 3, it returns fizz"""
    for num in self.justBy3:
        result = fizzbuzzSingle(num)
        self.assertEqual(result, "fizz")

Pyunit makes use of the exception handling system for its tests. In this case, should the assertion in line 5 fail then the test fails. Also note in line 3, the use of the list generated in the setUp method.

def testMustBePositive(self):
    """fizzbuzzSingle should throw an error when given a non-positive value"""
    try:
        fizzbuzzSingle(0)
    except AssertionError:
        pass
    else:
        # fail is a method of the test case, so it is called through self.
        self.fail("Expected an assertion error, non-positive should fail.")

In cases where an error is the expected behavior, using the try/except/else syntax demonstrates that the error was thrown. Here Python is actually using a pun on pass/fail. fail is a method that unittest provides on its test cases. pass is a Python keyword indicating that it should do nothing. It's similar to NOP in assembler code.
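unittest also offers assertRaises, which expresses the same check in one line. The sketch below is an alternative to the try/except/else form rather than the article's own test; the class name PositiveTest and the cut-down fizzbuzzSingle are assumptions so that the example is self-contained.

```python
import unittest

def fizzbuzzSingle(num):
    """Stand-in that only reproduces the article's precondition check."""
    assert num > 0, "The given number " + str(num) + " is not a natural number."
    return str(num)

class PositiveTest(unittest.TestCase):
    def testMustBePositive(self):
        # Passes only if calling fizzbuzzSingle(0) raises AssertionError.
        self.assertRaises(AssertionError, fizzbuzzSingle, 0)

# Build and run the suite by hand instead of calling unittest.main().
suite = unittest.TestLoader().loadTestsFromTestCase(PositiveTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```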

if __name__ == "__main__":
    unittest.main()

The main() function in unittest uses reflection to find all of the methods whose names start with "test" in the test classes and runs them as the unit test suite.

Parsing XML

The RSS feed parser extracts the names of the channels and the titles of each of the articles.

The handler below gets most of its functionality from its parent class. There are four methods added: startDocument, startElement, endElement and characters. These each fire as an XML document is being read. Because the methods themselves are generic enough to apply to any XML document, you need a lot of if/else statements inside to deal with specific kinds of XML document.

import xml.sax.handler

class RssHandler(xml.sax.handler.ContentHandler):
    def startDocument(self):
        self.inItem = False
        self.inTitle = False
        self.channels = {}
        self.currentChannel = None
        self.str = ""

    def startElement(self, name, attrs):
        lname = name.lower()
        if lname == "item":
            self.inItem = True
        elif lname == "title":
            self.inTitle = True

    def endElement(self, name):
        lname = name.lower()
        if lname == "item":
            self.inItem = False
        elif lname == "title":
            self.inTitle = False
            if self.inItem:
                # A title inside an item is an article title.
                self.channels[self.currentChannel] += [self.str]
            else:
                # A title outside an item names the channel.
                self.currentChannel = self.str
                if self.currentChannel not in self.channels.keys():
                    self.channels[self.currentChannel] = []
            self.str = ""

    def characters(self, content):
        if self.inTitle:
            self.str += content

This class reads through an XML document and accumulates a dictionary that maps each channel name to a list of its article titles, of the form: {"Channel 1":["Article 1", "Article 2"], "Channel 2":["Article 3"]}
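To see the order in which those callbacks fire, here is a minimal sketch with a handler that does nothing but record each event. EventLogger and the one-item sample document are made up for illustration; xml.sax.parseString is the standard library function for parsing a document held in memory.

```python
import xml.sax
import xml.sax.handler

class EventLogger(xml.sax.handler.ContentHandler):
    """Hypothetical handler that records each callback as it fires."""
    def __init__(self):
        xml.sax.handler.ContentHandler.__init__(self)
        self.events = []

    def startElement(self, name, attrs):
        self.events.append("start " + name)

    def endElement(self, name):
        self.events.append("end " + name)

    def characters(self, content):
        self.events.append("text " + content)

logger = EventLogger()
xml.sax.parseString(b"<item><title>First</title></item>", logger)
# logger.events now holds the callbacks in document order.
```

The handler never sees the document as a whole, only this stream of events, which is why RssHandler has to keep flags such as inItem and inTitle to remember where it is.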

Context Managers and Downloading Webpages

Context Managers

Files, network connections and other information that require streams of data generally require some amount of boilerplate to open and close those streams. This both adds uninteresting noise to your code and can cause unexpected bugs when you open a stream but fail to close it. Python deals with this through context managers.

Without context managers, printing the contents of a file to screen looks like this:

file = open("infile.txt", "r")
print file.read()
file.close()

Though this is clear and obvious in a small example, in real code there can be problems between opening and closing the file, which cause it not to be closed properly. The standard fix is to wrap this with a try/finally to deal with any exceptions that are thrown in the process:

file = open("infile.txt", "r")
try:
    print file.read()
finally:
    file.close()

Here finally makes certain that, regardless of what happens in the middle, the file gets closed. This is effective, but contains a fair amount of boilerplate for a simple action. Context managers allow us to do this:

with open("infile.txt", "r") as file:
    print file.read()

The open function returns a file object, which is itself a context manager. Any object can become a context manager by adding an __enter__ and an __exit__ method for setup and cleanup. Many of Python's standard stream objects have these methods already.
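The protocol is small enough to demonstrate with a toy class. In this sketch Tracked is a hypothetical resource that only logs what happens to it, which shows that __exit__ runs even when the body of the with statement raises an exception.

```python
class Tracked(object):
    """Hypothetical resource that records when it is entered and exited."""
    def __init__(self):
        self.log = []

    def __enter__(self):
        self.log.append("enter")
        return self  # this is the value bound by "as"

    def __exit__(self, excType, excValue, traceback):
        self.log.append("exit")
        return False  # a false value lets any exception propagate

resource = Tracked()
try:
    with resource as r:
        r.log.append("body")
        raise ValueError("something went wrong")
except ValueError:
    pass
# resource.log is now ["enter", "body", "exit"]
```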

Downloading Webpages

Unfortunately the urlopen function from urllib2 does not come with a context manager, so I had to write one myself. Fortunately, it's easy.

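import urllib2

class Url(object):
    def __init__(self, url):
        self.url = url

    def __enter__(self):
        # Open the connection and hand the stream to the with block.
        self.stream = urllib2.urlopen(self.url)
        return self.stream

    def __exit__(self, type, value, traceback):
        # Runs on the way out of the with block, even after an error.
        self.stream.close()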
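For objects that only need close() called on the way out, the standard library's contextlib.closing offers a ready-made alternative to writing a class like Url. The sketch below uses a hypothetical FakeStream in place of a real network connection so that it runs without internet access; with urllib2 the equivalent would wrap the result of urlopen.

```python
from contextlib import closing

class FakeStream(object):
    """Hypothetical stand-in for a network stream with read() and close()."""
    def __init__(self):
        self.closed = False

    def read(self):
        return "<rss></rss>"

    def close(self):
        self.closed = True

stream = FakeStream()
with closing(stream) as s:
    data = s.read()
# closing() has called stream.close() by this point.
```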

Tying it All Together

Using the XML Parser

def generateRsses(feedFile):
    with open(feedFile, "r") as file:
        urls = [url.strip() for url in file.readlines()]
    for url in urls:
        with Url(url) as rss:
            handler = RssHandler()
            parser = xml.sax.make_parser()
            parser.setContentHandler(handler)
            parser.parse(rss)
            yield handler.channels

def printFeed(rss):
    for channelName in rss.keys():
        print "*** " + channelName + " ***"
        for title in rss[channelName]:
            print "\t" + title

The most complex part of this is the actual parsing of the XML, so I'll walk you through it line by line.

Create the RssHandler (the class created above). This defines how the parser will work.

make_parser creates an object that does the XML heavy lifting.

Attach the handler to the parser.

Do the actual parsing.

Extract the resulting data.
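The five steps above can be tried without a network connection by parsing an in-memory stream. In this sketch TitleCollector is a simplified, hypothetical handler (not the article's RssHandler) and the sample feed is invented; make_parser, setContentHandler and parse are the same calls used in generateRsses.

```python
import io
import xml.sax
import xml.sax.handler

class TitleCollector(xml.sax.handler.ContentHandler):
    """Hypothetical handler that gathers the text of every <title>."""
    def __init__(self):
        xml.sax.handler.ContentHandler.__init__(self)
        self.titles = []
        self.inTitle = False
        self.text = ""

    def startElement(self, name, attrs):
        if name.lower() == "title":
            self.inTitle = True
            self.text = ""

    def endElement(self, name):
        if name.lower() == "title":
            self.inTitle = False
            self.titles.append(self.text)

    def characters(self, content):
        if self.inTitle:
            self.text += content

feed = io.BytesIO(b"<rss><channel><title>News</title>"
                  b"<item><title>First</title></item>"
                  b"<item><title>Second</title></item></channel></rss>")

handler = TitleCollector()         # create the handler
parser = xml.sax.make_parser()     # create the parser
parser.setContentHandler(handler)  # attach the handler to the parser
parser.parse(feed)                 # do the actual parsing
titles = handler.titles            # extract the resulting data
```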

Commandline Arguments

import sys

if __name__ == "__main__":
    [scriptName, feedFileName] = sys.argv
    for rss in generateRsses(feedFileName):
        printFeed(rss)

The only new element here is sys.argv, which contains all of the command line arguments handed to the script. The first is the name of the script itself, so the program ignores it; the second is the name of the file that contains a list of RSS feeds.
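Unpacking sys.argv directly raises a ValueError if the user supplies too few or too many arguments. A slightly more forgiving sketch is shown below; feedFileFromArgs is a hypothetical helper, not part of the article's code, and it takes the argument list as a parameter so that it can be tried without a real command line.

```python
import sys

def feedFileFromArgs(argv):
    """Return the feed file name from an argv-style list, or None."""
    if len(argv) != 2:
        return None
    scriptName, feedFileName = argv
    return feedFileName

# In a real script the argument would be sys.argv.
example = feedFileFromArgs(["RssReader.py", "feeds.txt"])
missing = feedFileFromArgs(["RssReader.py"])
```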


This is the last article in a series; to read the entire series go here.

Writing a GUI for our RSS Aggregator

The Plan

I'll be finishing up the Python set with an overview of the Tkinter GUI library. There are numerous other GUI libraries available for Python, the best known of which is wxPython. I chose Tkinter because it comes built into Python and the primary goal here is to, as much as possible, look at Python as it comes out of the box.

What to Expect

When this article is complete:

You will have:

A basic GUI for last week's RSS reader.

You will know:

How to write a GUI in Python's Tkinter.

How to use lambda expressions to create functions on the fly.

The final GUI will look like this:

Files Used in this Project

RssReader.txt: A library for reading RSS information, that I wrote in the previous project. Rename it to "RssReader.py" after downloading.

RssReaderGui.txt: The sourcecode for this project. Rename it to "RssReaderGui.py" after downloading.

The Code

Libraries

from Tkinter import *
from tkFileDialog import askopenfilename
from RssReader import generateRsses

Global Variables

currentFeeds = {}
rssDisplay = None
chooseChannel = None
currentChannel = None

I've seen two ways of organizing the code for a simple one window GUI in Python. One is to wrap it all in a single large class. The other is to write a group of functions that share a set of global variables. In this case a single class would have the same problems as globals along with the overhead of adding self. to every method call and class variable.

The variables that are set to None above are place holders for global variables that will be assigned later. They don't have to be assigned here, but it's useful to know what your globals are at the front end.

Toplevel Window Creation

def rssWindow():
    root = Tk()
    root.title("Rss Reader")
    root.geometry("750x500")
    root = channelFrame(root)
    root = buttonFrame(root)
    return root

Although you can define your functions in arbitrary order, I've written these in a more or less hierarchical order. I'll walk through what I did here line by line:

At the top level we create a Tk object.

Give the window a title

and an initial size.

The frame that contains the drop down menu and list of articles.

The frame that contains the select and load buttons.

The Upper Window with the Dropdown and Listbox Layout

def channelFrame(parent):
    channelFrame = Frame(parent)
    channelSelectFrame = Frame(channelFrame)
    channelSelectFrame = channelLabel(channelSelectFrame)
    channelSelectFrame = channelSelect(channelSelectFrame)
    channelSelectFrame.pack(side=LEFT)
    channelFrame = rssDisplay(channelFrame)
    channelFrame.pack(side=TOP, expand=YES, fill=BOTH)
    return parent

Here we create the window that contains textual information from the RSS feeds. To accomplish this, I create frames which contain either other frames or GUI elements. Once the elements are created and organized into their frames, their layout within the window is set with .pack.

def rssDisplay(parent):
    # Note: this rebinds the global name rssDisplay from this function
    # to the Listbox, so the function can only be called once.
    global rssDisplay
    rssDisplay = Listbox(parent)
    rssDisplay.pack(side=RIGHT, expand=YES, fill=BOTH)
    return parent

def channelLabel(parent):
    label = Label(parent, text="Select a channel:")
    label.pack(side=TOP)
    return parent

def channelSelect(parent):
    global chooseChannel, currentChannel
    currentChannel = StringVar(parent)
    channelList = ["None"]
    currentChannel.set(channelList[0])
    chooseChannel = OptionMenu(parent, currentChannel, *channelList)
    chooseChannel.pack(side=BOTTOM)
    return parent

Here the individual components are generated; the behaviors of each are described later. In rssDisplay and channelSelect I used the global keyword. You only need this keyword if you plan on assigning a new value to the global variable, not if you're only going to read it.
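The rule about global is easy to check with a counter. In this sketch counter, readCounter, brokenIncrement and increment are illustrative names, not part of the RSS reader.

```python
counter = 0

def readCounter():
    # Reading a global needs no declaration.
    return counter

def brokenIncrement():
    # Without "global", the assignment makes counter a local name,
    # so reading it on the right-hand side raises UnboundLocalError.
    counter = counter + 1

def increment():
    global counter
    counter = counter + 1

increment()
try:
    brokenIncrement()
    failed = False
except UnboundLocalError:
    failed = True
```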

The Lower Window with the 'Set Config File' and 'Load Feeds' Buttons

def buttonFrame(parent):
    buttonFrame = Frame(parent)
    buttonFrame = setConfigFileButton(buttonFrame)
    buttonFrame = loadRssButton(buttonFrame)
    buttonFrame.pack(side=RIGHT, expand=YES, fill=X)
    return parent

def setConfigFileButton(parent):
    button = Button(parent, command=setConfigFile)
    button["text"] = "Set Config File"
    button.pack(side=LEFT)
    return parent

def loadRssButton(parent):
    button = Button(parent, command=loadRss)
    button["text"] = "Load Feeds"
    button.pack(side=RIGHT)
    return parent

Here I create a top level layout for the buttons, then describe the specific behavior of each one. We describe a button's behavior by passing the function to be called via the command argument.

Commands called by various elements

Lambda, functions are first class objects

def loadRss():
    global chooseChannel, currentChannel
    chooseChannel["menu"].delete(0, END)
    channelList = currentFeeds.keys()
    for channelName in channelList:
        chooseChannel["menu"].add_command(label=channelName,
            command=lambda temp=channelName: selectChannel(temp))
    selectChannel(channelList[0])

The loadRss function clears the values from the drop down menu then adds each of the current channels to it. It also uses a Lambda function to set the behavior of the drop down box when a given channel is selected.

Lambda functions, as used in line 7 of the above code, are extraordinarily useful. They come in handy, as in the above case, when you need to create a function for which some of the internal values are not known until runtime. Here the lambda's purpose is to capture the current value of channelName as the default value of its temp argument, so that each menu entry calls selectChannel with its own channel name when it is chosen.

In general, lambda functions come in the form lambda arg1, arg2: arg1 + arg2. For people who aren't used to functional programming, it's important to remember that the body of a lambda function is only ever a single expression, and the value of that expression is always returned without any need to use the return keyword.
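The default argument in loadRss matters more than it might appear. A lambda that simply refers to the loop variable looks the variable up when it is called, by which time the loop has finished. The sketch below contrasts the two approaches using made-up channel names.

```python
names = ["News", "Sports", "Weather"]

lateBound = []
captured = []
for name in names:
    # Looks up name when called, after the loop has moved on.
    lateBound.append(lambda: name)
    # Stores the current value of name as the default for temp.
    captured.append(lambda temp=name: temp)

lateResults = [f() for f in lateBound]
capturedResults = [f() for f in captured]
```

Every function in lateBound returns the last name in the list, while each function in captured returns the name that was current when it was created.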

When one of the channels is selected from the drop down, the selectChannel function fires and populates the display window with the names of the current articles.

def setConfigFile():
    global currentFeeds
    currentFeedFile = askopenfilename(filetypes=[("allfiles", "*"),
                                                 ("textfiles", "*.txt")])
    currentFeeds = combineFeeds(currentFeedFile)
    loadRss()

Here the program opens a file dialog with askopenfilename, which returns the name of the file selected. It then combines all of the feeds with the same channel name and loads them into the select box with loadRss.

def combineFeeds(fileName):
    feeds = {}
    for feed in generateRsses(fileName):
        for channelName in feed.keys():
            # "in" is the idiomatic replacement for has_key.
            if channelName in feeds:
                feeds[channelName] += feed[channelName]
            else:
                feeds[channelName] = feed[channelName]
    return feeds

combineFeeds reads the RSS sources from the given filename, downloads the feeds using the library from the last article then combines any channels that happen to have the same name.
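The merging step can be tried on its own with plain dictionaries. In this sketch mergeChannels is a hypothetical stand-alone variant that takes the feeds as an argument instead of downloading them, and the sample data is invented.

```python
def mergeChannels(feeds):
    """Merge an iterable of {channel: [titles]} dictionaries into one."""
    merged = {}
    for feed in feeds:
        for channelName, titles in feed.items():
            # setdefault creates the list the first time a channel is seen;
            # extend copies the titles without altering the input lists.
            merged.setdefault(channelName, []).extend(titles)
    return merged

sample = [
    {"News": ["Article 1"]},
    {"Sports": ["Article 2"]},
    {"News": ["Article 3"]},
]
combined = mergeChannels(sample)
```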

if __name__ == "__main__":
    rssWindow().mainloop()

To actually create the window and make it useful, the program calls rssWindow(), described at the top of the file, then runs the mainloop() method on it.

This has been an interesting run. Next week I'll be taking a look at Markdown, a lightweight markup language which allows you to create nicely typeset documents from the comfort of your text editor. Starting in December I'll be taking a look through Scala, a functional/object oriented language that runs in the Java VM.