The XML.com Column: Python and XML "Python has long been a popular language for developing XML applications. Python and XML expert Uche Ogbuji gives XML.com readers the essential information needed to use these two technologies together." As of 2006-09-15, Ogbuji had contributed some forty (40) articles on Python and XML.

Python Wiki. Python is an object-oriented, interpreted, and interactive programming language. It is often compared [favorably of course :-)] to Lisp, Tcl, Perl, Ruby, C#, Visual Basic, Visual Fox Pro, Scheme or Java. Python combines remarkable power with very clear syntax. It has modules, classes, exceptions, very high level dynamic data types, and dynamic typing. There are interfaces to many system calls and libraries, as well as to various windowing systems. New built-in modules are easily written in C or C++. Python is also usable as an extension language for applications that need a programmable interface.

Python & XML. By Christopher A. Jones and Fred L. Drake. O'Reilly, December 2001. ISBN: 0-596-00128-2. See the online sample Chapter 1: Python and XML. "Python is an ideal language for manipulating XML, and this new volume gives you a solid foundation for using these two languages together. Complete with practical examples that highlight common application tasks, the book starts with the basics then quickly progresses to complex topics, like transforming XML with XSLT and querying XML with XPath. It also explores more advanced subjects, such as SOAP and distributed web services..." See the description and Table of Contents.

ElementTree Overview. "The Element type is a simple but flexible container object, designed to store hierarchical data structures, such as simplified XML infosets, in memory. The element type can be described as a cross between a Python list and a Python dictionary. The ElementTree wrapper adds code to load XML files as trees of Element objects, and save them back again. The Element type is available as a pure-Python implementation [for Python 1.5.2 and later]. A C implementation is also available, for use with CPython 2.1 and later. A Jython implementation will be included in a future release. There's also an independent implementation, lxml.etree, based on the well-known libxml2/libxslt libraries. This adds full support for XSLT, XPath, and more... The cElementTree module is a C implementation of the ElementTree API, optimized for fast parsing and low memory use. On typical documents, cElementTree is 15-20 times faster than the Python version of ElementTree, and uses 2-5 times less memory. On modern hardware, that means that documents in the 50-100 megabyte range can be manipulated in memory, and that documents in the 0-1 megabyte range load in zero time (0.0 seconds). This allows you to drastically simplify many kinds of XML applications." [description 2006-09]
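The "cross between a Python list and a Python dictionary" description can be illustrated with a short sketch. It assumes the xml.etree.ElementTree module from the current Python standard library; the tag and attribute names are invented for the example.

```python
import xml.etree.ElementTree as ET

# Build a small tree in memory; "library", "book", and the attributes are made up.
root = ET.Element("library", {"name": "demo"})
book = ET.SubElement(root, "book", id="b1")
book.text = "Python & XML"

# Dictionary-like access to attributes.
library_name = root.get("name")
book.set("year", "2001")

# List-like access to child elements.
child_count = len(root)
first_child_tag = root[0].tag

# Serialize to a string and parse it back into a new tree.
xml_text = ET.tostring(root, encoding="unicode")
reparsed = ET.fromstring(xml_text)
print(xml_text)
```

Attributes behave like dictionary entries and children like list items, which is why simple documents can often be handled without a separate DOM layer.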

PyXML. A SourceForge XML package for Python. PyXML is a package collecting the tools required for writing basic XML applications in Python, along with documentation and sample code. Features include (but are not limited to) SAX, DOM, the xmlproc validating parser, an Expat interface, and more... The PyXML package is a collection of libraries to process XML with Python. It contains, among other things (1) xmlproc: a validating XML parser; (2) Expat: a fast non-validating parser; (3) sgmlop: a C helper module that can speed up xmllib.py and sgmllib.py by a factor of 5; (4) PySAX: SAX 1 and SAX2 libraries with drivers for most of the parsers; (5) 4DOM: A fully compliant DOM Level 2 implementation; (6) javadom: An adapter from Java DOM implementations to the standard Python DOM binding; (7) pulldom: a DOM implementation that supports lazy instantiation of nodes; (8) marshal: a module with several options for serializing Python objects to XML, including WDDX and XML-RPC. Downloads are made available on SourceForge...
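As a rough illustration of the SAX style of processing listed above, the sketch below uses the xml.sax module from the Python standard library rather than the PyXML distribution itself. The handler class and the sample document are hypothetical.

```python
import xml.sax

class TitleCollector(xml.sax.ContentHandler):
    """Collects the text of every <title> element as the parser streams events."""

    def __init__(self):
        super().__init__()
        self.titles = []
        self._buffer = None  # list of text chunks while inside a <title>

    def startElement(self, name, attrs):
        if name == "title":
            self._buffer = []

    def characters(self, content):
        # Text may arrive in several chunks, so accumulate rather than assign.
        if self._buffer is not None:
            self._buffer.append(content)

    def endElement(self, name):
        if name == "title" and self._buffer is not None:
            self.titles.append("".join(self._buffer))
            self._buffer = None

SAMPLE = b"<books><title>Python &amp; XML</title><title>Learning Python</title></books>"
handler = TitleCollector()
xml.sax.parseString(SAMPLE, handler)
print(handler.titles)
```

Because the handler only keeps what it needs, this event-driven approach avoids building the whole document in memory.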

IronPython. "IronPython is a new implementation of the Python programming language running on .NET. It supports an interactive console with fully dynamic compilation. It is well integrated with the rest of the .NET Framework and makes all .NET libraries easily available to Python programmers, while maintaining full compatibility with the Python language. All of Python's dynamic features like an interactive interpreter, dynamically modifying objects and even metaclasses are available." Release 1.0 Production version: September 05, 2006. See: (1) "Microsoft Frees IronPython," and (2) "Dynamic Languages for Agile Enterprises."

ASPN Python Cookbook. A collaborative collection of your contributions to Python lore. Edited by David Ascher (ActiveState). Python and XML are two technologies that fit with one another. Both are built around trees of elements with attributes. Python's skill at manipulating data structures makes it a great tool for parsing, transforming, and producing XML data. Python Cookbook code is freely available for use and review; users can contribute recipes (code and discussion), comments, and ratings.

PyObjC The Python <-> Objective-C Bridge. The PyObjC project aims to provide a bridge between the Python and Objective-C programming languages. The bridge is intended to be fully bidirectional, allowing the Python programmer to take full advantage of the power provided by various Objective-C based toolkits and the Objective-C programmer transparent access to Python based functionality. The most important usage of this is writing Cocoa GUI applications on Mac OS X in pure Python.

Twisted. "Twisted is an event-driven networking framework written in Python and licensed under the MIT license. See the FAQ for commonly asked questions about Twisted. Twisted projects variously support TCP, UDP, SSL/TLS, multicast, Unix sockets, a large number of protocols (including HTTP, NNTP, IMAP, SSH, IRC, FTP, and others), and much more. Supporting numerous protocols, it contains a web server, numerous chat clients, chat servers, mail servers, and more." See the Twisted Matrix Laboratories Projects.

[November 22, 2006] "Drill-down on Three Major New Modules in Python 2.5 Standard Library." By Gigi Sayfan. From devX.com (November 22, 2006). "The freshly minted 2.5 version of Python has lots of goodies, but the three in this article are the cream of the crop. Find out how ctypes, pysqlite, and ElementTree can save you time and aggravation in this extensive article with a ton of great sample code... Module No. 3: xml.etree.ElementTree. This module contains pythonic XML processing tools for parsing and constructing XML documents. Python boasts several standard XML modules that support the DOM and SAX APIs. However, the DOM API (xml.dom.minidom) is modeled after the W3C DOM API and is quite cumbersome. ElementTree is the brainchild of Fredrik Lundh (http://www.effbot.org). It is a highly pythonic and high-performance XML package. Lundh also contributed the cElementTree, which is a C extension that exposes the same API as the Python package. The performance of cElementTree is amazing (speed and memory footprint). Many pythonistas reject XML as a data exchange format altogether and prefer to simply use direct Python data structures for data exchange. This can be done either as plain text (to be evaluated on the other side using the eval() function) or pickled. However, no one can escape XML these days. It is especially dominant in the important web services domain. To discuss ElementTree, I will continue with the role-playing game example. ElementTree is based on the Element data type. An element has a tag and may also have children (sub-elements), attributes (key-value pairs), content (text string), and a tail (text string that follows the element until the next sibling element). ElementTree is optimized for non-mixed data models (where text never contains elements), which will be the focus of this article... ElementTree is not just an XML builder. It is a parser too. It can take an XML file or string and create an ElementTree out of it. This area of ElementTree is probably the most common one and yet the interface is pretty clunky IMHO. To parse files you call the parse() function with a filename; to parse an XML string you can use one of two identical functions: XML() or fromstring(). Nothing is consistent about this choice of functions. It feels wrong to have more than one way to parse XML strings... ElementTree is a fine piece of software that proves that a friendly API can also be performant. ElementTree offers much more than that, including decent namespace support, fine-grained XML tree building, reading and writing to files, etc. For performance buffs the cElementTree is a real boon..."
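The parsing entry points and the four parts of an element described in the abstract can be seen in a few lines. This is a sketch against the standard-library xml.etree.ElementTree module; the role-playing data is invented here and is not taken from the article.

```python
import io
import xml.etree.ElementTree as ET

# Made-up sample document; the trailing " and friends" is tail text after <hero>.
doc = '<party><hero name="Aria" level="3">Ranger</hero> and friends</party>'

# XML() and fromstring() both parse a string and return the root Element.
root_a = ET.XML(doc)
root_b = ET.fromstring(doc)

# parse() takes a filename or file object and returns an ElementTree wrapper.
tree = ET.parse(io.StringIO(doc))
root_c = tree.getroot()

hero = root_a[0]
print(hero.tag)     # element name
print(hero.attrib)  # attributes as a dictionary
print(hero.text)    # text inside the element
print(hero.tail)    # text following the element's end tag
```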

[October 26, 2006] "What's New in Python 2.5?" By Jeff Cogswell. From O'Reilly ONLamp.com (October 26, 2006). "It's hard to believe Python is more than 15 years old already. While that may seem old for a programming language, in the case of Python it means the language is mature. In spite of its age, the newest versions of Python are powerful, providing everything you would expect from a modern programming language. This article provides a rundown of the new and important features of Python 2.5. I assume that you're familiar with Python and aren't looking for an introductory tutorial, although in some cases I do introduce some of the material, such as generators. Python 2.5 includes many useful improvements to the language. None of the changes are huge; nor do they require changes to your existing 2.4 code... Python 2.5 includes many welcome changes. For me personally, I will get great use out of the new ElementTree classes for XML processing and the SQLite classes for storing data. The changes to the language itself, particularly generators and contexts, will help me write more robust code, taking away the emphasis on how to do something, and focusing more on what to do. For my own work, I know that moving the emphasis in that direction always helps reduce bugs. As you explore the changes to version 2.5, remember that many of the changes are precursors of what will come when the big Python 3.0 eventually appears..."
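Two of the features singled out above, context managers built from generators and the bundled SQLite module, combine naturally. The sketch below is written for current Python 3 (in Python 2.5 the with statement had to be enabled via a __future__ import); the transaction helper and the table name are hypothetical.

```python
import sqlite3
from contextlib import closing, contextmanager

@contextmanager
def transaction(conn):
    """Generator-based context manager: commit on success, roll back on error."""
    try:
        yield conn
        conn.commit()
    except Exception:
        conn.rollback()
        raise

# An in-memory database keeps the example self-contained.
with closing(sqlite3.connect(":memory:")) as conn:
    with transaction(conn):
        conn.execute("CREATE TABLE notes (id INTEGER PRIMARY KEY, body TEXT)")
        conn.execute("INSERT INTO notes (body) VALUES (?)", ("with statement",))
    rows = conn.execute("SELECT body FROM notes").fetchall()

print(rows)
```

The with blocks state what should happen (a committed transaction, a closed connection) and leave the cleanup mechanics to the context managers.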

[September 28, 2006] "Introducing WSGI: Python's Secret Web Weapon." By James Gardner. From XML.com (September 27, 2006). "The recent Python 2.5 release features the addition of the Web Server Gateway Interface Utilities and Reference Implementation package (wsgiref) to Python's standard library. In this two-part article, we look at what the Web Server Gateway Interface [WSGI] is, how to use it to write web applications, and how to use middleware components to quickly add powerful functionality. Python is a great language for web development. It is straightforward to learn, has a broad and powerful standard library, and benefits from an active community of developers who maintain a range of XML and database tools, templating languages, servers, and application frameworks. In 2003, when the Web Server Gateway Interface specification was drawn up, the Python community also had one major problem. It was often easier for developers to write their own solutions to web-development problems from scratch rather than reusing and improving existing projects. This resulted in a proliferation of largely incompatible web frameworks. If developers wanted a full and powerful solution, they could use Zope, but many of them coming into the community in search of a lightweight framework found it hard to know where to start. Developers within the Python community quickly recognized that it would be preferable to have fewer and better-supported frameworks, but since each framework had its strengths and weaknesses at the time, none stood out as a clear candidate for adoption. The Web Server Gateway Interface (often written WSGI, pronounced "whiskey") was designed to bring the same interoperability that the Java world enjoyed to Python, and to go some way toward unifying the Python web-framework world without stifling the diversity. Most Python web frameworks today have a WSGI adapter, and most server technologies (Apache, mod_python, FastCGI, CGI, etc.) can be used to run WSGI applications, so the vision of web-application portability is fast becoming a reality..."
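A WSGI application is simply a callable that receives an environ dictionary and a start_response function. The sketch below, written for Python 3, defines one and invokes it in-process the way a server would, using wsgiref only to fill in a test environment. The names hello_app and call_app are hypothetical.

```python
from wsgiref.util import setup_testing_defaults

def hello_app(environ, start_response):
    """A minimal WSGI application: a callable taking environ and start_response."""
    body = ("Hello from " + environ.get("PATH_INFO", "/")).encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]  # an iterable of byte strings

def call_app(app, path="/"):
    """Invoke a WSGI app in-process and capture status, headers, and body."""
    environ = {}
    setup_testing_defaults(environ)  # fills in the required CGI-style keys
    environ["PATH_INFO"] = path
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status
        captured["headers"] = headers

    body = b"".join(app(environ, start_response))
    return captured["status"], captured["headers"], body

status, headers, body = call_app(hello_app, "/demo")
print(status, body)

# To serve over HTTP instead (blocks until interrupted):
# from wsgiref.simple_server import make_server
# make_server("localhost", 8000, hello_app).serve_forever()
```

Because the application only depends on the two arguments it is given, the same callable can be handed to any WSGI-capable server.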

[September 26, 2006] "Why not Python?" By Vagmi Mudumbai. SAP Blog (SAP Labs India). September 26, 2006. "I was as usual bragging about Python and coaxing one of my colleagues to try it. But he said that he does not like scripting languages because there aren't IDEs which support intellisense and all the bells and whistles that strongly typed languages support. I was trying to make him understand that the dynamism of python makes up for the type safety net. I do not want to talk about all the extra goodies that you get with scripting languages as it has been discussed elsewhere. I wanted to present a convincing case of Python's dynamic nature and came up with an example that I thought was worth sharing with the community as well. Python is the only language where I get to code working right the very first time. Anyways, here it goes. I took up the example of tracing method calls. Suppose you would like to trace a call for every method and record statistics like the start time, end time and the execution time in milliseconds; doing this for every method in a C# or Java object would involve instrumenting the code (easy way) or setting up some complicated proxy object which involves careful design and coding. Instead it can be easily achieved using python. [... so there:] A scalable tracing module in about 17 lines of code. Why would anyone not want to use python?..."
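The blog post's own code is elided in the excerpt above, so the following is only a sketch of the general idea: wrapping every method of a class at runtime to record start time, end time, and elapsed milliseconds. All names (traced, trace_methods, Account) are hypothetical and written for Python 3.

```python
import functools
import inspect
import time

def traced(func, log):
    """Wrap func so each call appends (name, start, end, elapsed_ms) to log."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.time()
        try:
            return func(*args, **kwargs)
        finally:
            end = time.time()
            log.append((func.__name__, start, end, (end - start) * 1000.0))
    return wrapper

def trace_methods(cls, log):
    """Replace every public plain-function attribute of cls with a traced version."""
    for name, attr in list(vars(cls).items()):
        if inspect.isfunction(attr) and not name.startswith("_"):
            setattr(cls, name, traced(attr, log))
    return cls

class Account:  # hypothetical class used only for the demonstration
    def __init__(self):
        self.balance = 0

    def deposit(self, amount):
        self.balance += amount
        return self.balance

call_log = []
trace_methods(Account, call_log)
acct = Account()
acct.deposit(50)
acct.deposit(25)
for name, start, end, elapsed_ms in call_log:
    print(f"{name}: {elapsed_ms:.3f} ms")
```

No source changes to Account are needed; the instrumentation is applied after the class is defined, which is the kind of dynamism the author is arguing for.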

[September 15, 2006] "Amara XML Toolkit Version 1.1.9: Python Tools for XML Processing." By Uche Ogbuji. Announcement September 15, 2006. "Amara XML Toolkit is a collection of Python tools for XML processing -- not just tools that happen to be written in Python, but tools built from the ground up to use Python's conventions and take advantage of the many advantages of the language. Amara builds on 4Suite, but whereas 4Suite focuses more on literal implementation of XML standards in Python, Amara focuses on Pythonic idiom. It provides tools you can trust to conform with XML standards without losing the familiar Python feel. The components of Amara are: (1) Bindery: data binding tool -- a very Pythonic XML API; (2) Scimitar: implementation of the ISO Schematron schema language for XML; converts Schematron files to Python scripts; (3) domtools: set of tools to augment Python DOMs; (4) saxtools: set of tools to make SAX easier to use in Python; (5) Flextyper: user-defined datatypes in Python for XML processing. Changes since Amara Version 1.1.7: (a) Add support for EasyInstall; other packaging & installer improvements; (b) Add trimxml command line utility (for running reports on XML files); (c) Switch to Docbook for documentation source; (d) Bindery: Add support for dict-like accessors; (e) Tenorsax: Restore support for PySax; (f) Scimitar: Implement abstract rules; (g) Scimitar: Update Schematron namespace to ISO; (h) Scimitar: Implement phases; (i) Scimitar: Support Schematron queryBinding attribute: XPath, XSLT, EXSLT; (j) Add binderytools.fixup_namespaces function; (k) Add binderytools.quick_xml_scan function; (l) Fix APIs for adding comments and PIs; (m) Fix domtools.abs_path to be more namespace aware; (n) Bug fixes..."

[September 15, 2006] Python Web Server Gateway Interface v1.0. By Phillip J. Eby. PEP: 333. 2006-04-03 or later. "This document specifies a proposed standard interface between web servers and Python web applications or frameworks, to promote web application portability across a variety of web servers. Python currently boasts a wide variety of web application frameworks, such as Zope, Quixote, Webware, SkunkWeb, PSO, and Twisted Web — to name just a few. This wide variety of choices can be a problem for new Python users, because generally speaking, their choice of web framework will limit their choice of usable web servers, and vice versa. By contrast, although Java has just as many web application frameworks available, Java's 'servlet' API makes it possible for applications written with any Java web application framework to run in any web server that supports the servlet API. The availability and widespread use of such an API in web servers for Python — whether those servers are written in Python (e.g. Medusa), embed Python (e.g., mod_python), or invoke Python via a gateway protocol (e.g., CGI, FastCGI, etc.) — would separate choice of framework from choice of web server, freeing users to choose a pairing that suits them, while freeing framework and server developers to focus on their preferred area of specialization. This PEP, therefore, proposes a simple and universal interface between web servers and web applications or frameworks: the Python Web Server Gateway Interface (WSGI). But the mere existence of a WSGI spec does nothing to address the existing state of servers and frameworks for Python web applications. Server and framework authors and maintainers must actually implement WSGI for there to be any effect. However, since no existing servers or frameworks support WSGI, there is little immediate reward for an author who implements WSGI support. 
Thus, WSGI must be easy to implement, so that an author's initial investment in the interface can be reasonably low. Thus, simplicity of implementation on both the server and framework sides of the interface is absolutely critical to the utility of the WSGI interface, and is therefore the principal criterion for any design decisions..."

[May 02, 2006] "Get Started With an Open Source CMS, Part 6: Build a Python WebDAV Client for Jakarta Slide." By Michael Oliver (CTO, Alarius Systems LLC). From IBM developerWorks. "Want to learn how to build Python applications? In this tutorial -- the sixth in the series -- you'll create a Python Web-based Distributed Authoring and Versioning (WebDAV) client for Jakarta Slide that, in turn, lets you build Python applications for content management. Upon completion, you'll be able to access the Slide or any other WebDAV server from your Python applications... After completing this tutorial, you'll know how to build a Python WebDAV client, have the basic knowledge necessary to build other Python applications, and be able to access the Jakarta Slide or any other WebDAV server from your Python applications..."

[January 31, 2006] "Discover Python, Part 9: Putting it all together." By Robert Brunner (NCSA Research Scientist, Assistant Professor of Astronomy, University of Illinois, Urbana-Champaign). From IBM developerWorks (January 31, 2006). "Previous articles in this 'Discover Python' series have discussed a number of topics that confront beginning Python programmers, including variables, container objects, and compound statements. This article builds on these concepts to construct a complete Python program. It introduces Python functions and modules and shows how to build a Python program, store it in a file, and run it from the command line... [Summary:] This article explained how to write reusable code in Python. It discussed how to use methods, or reusable blocks of code, in a Python program. Methods can take input parameters and also return data, including container datatypes. Together, this functionality makes using methods a powerful way to tackle a range of problems. The article also discussed modules, which let you group related methods and data together into an organized hierarchy that can be reused easily in other Python programs. Finally, you saw how to put it all together to create a fully functioning, stand-alone Python program. You have seen that reusing code means reducing your workload. And when it comes to programmers, being lazy can be a virtue, not a vice..." See: Part 8; Part 7; Part 6; Part 5; Part 4; Part 3; Part 2; Part 1.
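The shape of such a stand-alone program can be sketched as follows: a reusable function that returns a container, a main function, and a guard so the file works both as a script and as an importable module. The file name wordcount.py and the function names are hypothetical, and the code targets Python 3.

```python
import sys

def word_counts(text):
    """Return a dictionary mapping each lower-cased word to its frequency."""
    counts = {}
    for word in text.lower().split():
        counts[word] = counts.get(word, 0) + 1
    return counts

def main(argv):
    """Count the words given on the command line, or a default phrase if none."""
    text = " ".join(argv[1:]) or "being lazy can be a virtue"
    for word, count in sorted(word_counts(text).items()):
        print(word, count)

# Runs only when the file is executed directly, not when it is imported.
if __name__ == "__main__":
    main(sys.argv)
```

Saved as wordcount.py, it could be run as `python wordcount.py some words here`, while another program could `import wordcount` and reuse word_counts without triggering main.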

[August 22, 2006] "Mix and Match Web Components with Python WSGI." By Uche Ogbuji (Principal Consultant, Fourthought, Inc). From IBM developerWorks (August 22, 2006). ['Learn about the Python standard for building Web applications with maximum flexibility'] "The Python community created the Web Server Gateway Interface (WSGI), a standard for creating Python Web components that work across servers and frameworks. It provides a way to develop Web applications that take advantage of the many strengths of different Web tools. This article introduces WSGI and shows how to develop components that contribute to well-designed Web applications... Many people have complained the popular Python programming language has too many Web frameworks, from well-known entrants such as Zope to under-the-radar frameworks such as SkunkWeb. Some have argued this diversity can be a good thing, as long as there is some underpinning standardization. Python and Web expert Phillip J. Eby went about the task of such standardization. He authored Python Enhancement Proposal (PEP) 333, which defines WSGI. The goal of WSGI is to allow for greater interoperability between Python frameworks. WSGI's success brings about an ecosystem of plug-in components you can use with your favorite frameworks to gain maximum flexibility. In this article, I'll introduce WSGI, and focus on its use as a reusable Web component architecture... WSGI is a fairly young specification, but compatible servers, middleware, and utilities are emerging rapidly to completely revamp the Python Web frameworks landscape. The next time you have a major Web project to develop in Python, be sure to adopt WSGI by using existing WSGI components, and perhaps creating your own either for private use or for contribution back to your fellow Web developers."
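The "plug-in component" idea rests on middleware: a callable that looks like a server to the application it wraps and like an application to the server above it. The following sketch, written for Python 3 with hypothetical names throughout, stacks two small middleware components around a trivial application.

```python
def inner_app(environ, start_response):
    """The wrapped application; it knows nothing about the middleware."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"hello wsgi"]

class UppercaseMiddleware:
    """Class-style middleware that upper-cases each chunk of the response body."""

    def __init__(self, app):
        self.app = app

    def __call__(self, environ, start_response):
        for chunk in self.app(environ, start_response):
            yield chunk.upper()

def add_header(app, name, value):
    """Function-style middleware that appends one response header."""
    def wrapped(environ, start_response):
        def patched_start(status, headers, exc_info=None):
            return start_response(status, headers + [(name, value)], exc_info)
        return app(environ, patched_start)
    return wrapped

# Components compose by simple nesting.
stacked = add_header(UppercaseMiddleware(inner_app), "X-Demo", "1")

# Stand-in for a server: a fake start_response that records what it is given.
seen = {}
def fake_start_response(status, headers, exc_info=None):
    seen["status"], seen["headers"] = status, headers

result = b"".join(stacked({"REQUEST_METHOD": "GET", "PATH_INFO": "/"}, fake_start_response))
print(seen["status"], seen["headers"], result)
```

Neither middleware depends on the other or on the application, so each can be reused with any WSGI-compliant framework.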

[July 11, 2006] "Python Web Frameworks, Part 2: Web Development with TurboGears and Python." By Ian Maurer (Senior Consultant, Brulant, Inc). From IBM developerWorks (July 11, 2006). "In this second article of a two-part series, we demonstrate TurboGears, another open source MVC-style Web application framework based on Python. Where the first article was an introduction to the Django framework, this article shows how to use TurboGears to create a Web-based shopping application and concludes with a comparison between TurboGears and Django. The TurboGears developers call this project a 'megaframework,' because it is made up of several pre-existing subprojects. TurboGears helps glue together a number of main components: (1) MochiKit: JavaScript library; (2) Kid: Templating language; (3) CherryPy: Base Web framework; (4) SQLObject: Object-relational mapper (ORM)... Comparing TurboGears and Django: Django and TurboGears are both MVC-style frameworks that allow for the agile and rapid development of Web sites using the Python language. To choose the best one for your needs, consider these differences: Both projects, like Ruby on Rails, were extracted from existing applications and released to the open source community. Django has been around longer and originally came from an online newspaper that serves millions of page views per day. TurboGears was pulled from a rich-client, RSS News Reader application that is still under development. TurboGears is more community-driven than Django because it was built with pre-existing, open source components. The different backgrounds of each project have led to different project priorities. The Django team, coming from the high-demand, fast-paced world of online journalism, have focused on a framework that allows content-based applications to be constructed quickly and modified easily. The TurboGears team, with its consumer-product foundation, has geared itself toward rich client applications and a pluggable architecture..."

[June 06, 2006] "Python Web frameworks, Part 1: Develop for the Web with Django and Python." By Ian Maurer (Senior Consultant, Brulant, Inc). From IBM developerWorks (June 06, 2006). "Django is an open-source model-view-controller (MVC)-style Web application framework powered by the Python programming language. With Django, you can create high-quality, easy-to-maintain, database-driven Web applications in minutes. The Django project is a custom-built framework that originated with an online newspaper Website and was released as open source in July 2005. The core components of the Django framework are: (1) Object-relational mapping for creating models; (2) Polished administrator interface designed for end users; (3) Elegant URL design; (4) Designer-friendly template language; (5) Caching system. A view is a simple Python method that accepts a request object and is responsible for: (a) Any business logic, directly or indirectly; (b) A context dictionary with data for the template; (c) Rendering the template with a context; (d) The response object that passes the rendered results back to the framework. In Django, the Python method called when a URL is requested is called a view, and the page loaded and rendered by the view is called a template. Because of this, the Django team refers to Django as an MVT (model-view-template) framework. TurboGears, on the other hand, calls its methods controllers and their rendered templates views so that they can fit squarely into the MVC acronym..."

[September 15, 2005] "Charming Python: Scaling a new PEAK." By David Mertz (Developer, Gnosis Software, Inc). From IBM developerWorks. "The Python Enterprise Application Kit (PEAK) is a Python framework for rapidly developing and reusing application components. While Python itself is already a very high-level language, PEAK provides even higher abstractions. One fairly recent capability added to PEAK is the capability to create generic functions and specifically to dispatch them on predicates, not simply on type... The greatest benefit in PEAK's dispatch package is the possibility it offers for a much more accurate and concise modularization of code. Once you define a generic function and a collection of specializations, you remain free to add as many additional specializations as you want later on -- all without so much as touching the original (hopefully well-tested) code. For large-scale collaboration or simply for applications that are adjusted for a family of related versions, this package looks extremely promising..."
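To make the idea of dispatching on predicates rather than types concrete, here is a toy generic function in plain Python 3. It does not use or reproduce PEAK's dispatch API: the GenericFunction class and the shipping example are hypothetical, and where PEAK selects the most specific matching rule, this sketch simply tries the most recently registered predicate first.

```python
class GenericFunction:
    """Toy predicate dispatcher: later registrations are checked first."""

    def __init__(self, default):
        self.default = default
        self.cases = []  # list of (predicate, implementation) pairs

    def when(self, predicate):
        """Decorator that registers an implementation guarded by predicate."""
        def register(impl):
            self.cases.append((predicate, impl))
            return impl
        return register

    def __call__(self, *args, **kwargs):
        for predicate, impl in reversed(self.cases):
            if predicate(*args, **kwargs):
                return impl(*args, **kwargs)
        return self.default(*args, **kwargs)

@GenericFunction
def shipping_cost(weight_kg):
    # Default rule, used when no predicate matches.
    return 5.0 + 1.0 * weight_kg

# Specializations are added later without touching the original function.
@shipping_cost.when(lambda weight_kg: weight_kg > 20)
def _heavy(weight_kg):
    return 30.0 + 0.5 * weight_kg

@shipping_cost.when(lambda weight_kg: weight_kg == 0)
def _empty(weight_kg):
    return 0.0

print(shipping_cost(2), shipping_cost(40), shipping_cost(0))
```

The point mirrored from the article is the extension model: new cases are registered from outside, so the original, tested code stays untouched.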

[September 14, 2005] "Processing Atom 1.0." By Uche Ogbuji. From XML.com (September 14, 2005). ['In his final Python-XML column, Uche Ogbuji shows us three ways to process Atom 1.0 feeds in Python.'] "In the fast-moving world of weblogs and Web-based marketing, the approval of the Atom Format 1.0 by the Internet Engineering Task Force (IETF) as a Proposed Standard is a significant and lasting development. Atom is a very carefully designed format for syndicating the contents of weblogs as they are updated, the usual territory of RSS, but its possible uses are far more general, as illustrated in the description on the home page: Atom is the name of an XML-based Web content and metadata syndication format, and an application-level protocol for publishing and editing Web resources belonging to periodically updated websites. All Atom feeds must be well-formed XML documents, and are identified with the application/atom+xml media type. Atom is a very important development in the XML and Web world. Atom technology is already deployed in many areas (though not all up-to-date with Atom 1.0), and parsing and processing Atom is quickly becoming an important task for web developers. In this article, I will show several approaches to reading Atom 1.0 in Python. All the code is designed to work with Python 2.3, or more recent, and is tested with Python 2.4.1... Atom 1.0 is pretty easy to parse and process. I may have serious trouble with some of the design decisions for the format, but I do applaud its overall cleanliness. I've presented several approaches to processing Atom in this article. If I needed to reliably process feeds retrieved from arbitrary locations on the Web, I would definitely go for Universal Feed Parser. Mark Pilgrim has dunked himself into the rancid mess of broken Web feeds so you don't have to. In a project where I controlled the environment, and I could fix broken feeds, I would parse them myself, for the greater flexibility. 
One trick I've used in the past is to use Universal Feed Parser as a proxy tool to convert arbitrary feeds to a single, valid format (RSS 1.0 in my past experience), so that I could use XML (or in that case RDF) tools to parse the feeds directly. And with this month's exploration, the Python-XML column has come to an end. After discussions with my editor, I'll replace this column with one with a broader focus. It will cover the intersection of Agile Languages and Web 2.0 technologies. The primary language focus will still be Python, but there will sometimes be coverage of other languages such as Ruby and ECMAScript..."
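For the "parse them myself" case with well-formed feeds, one possible approach (not one of the three from the column, which are not reproduced here) is the standard-library ElementTree module with the Atom 1.0 namespace. The sample feed, its identifiers, and its URLs are invented for illustration; the code targets Python 3.

```python
import xml.etree.ElementTree as ET

# Atom 1.0 elements live in this namespace.
ATOM_NS = {"atom": "http://www.w3.org/2005/Atom"}

# Made-up feed used only for this example.
FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Example Feed</title>
  <id>urn:example:feed</id>
  <updated>2005-09-14T12:00:00Z</updated>
  <entry>
    <title>First post</title>
    <id>urn:example:entry-1</id>
    <updated>2005-09-14T12:00:00Z</updated>
    <link rel="alternate" href="http://example.org/first"/>
  </entry>
</feed>"""

root = ET.fromstring(FEED)
feed_title = root.findtext("atom:title", namespaces=ATOM_NS)

entries = []
for entry in root.findall("atom:entry", ATOM_NS):
    link = entry.find("atom:link[@rel='alternate']", ATOM_NS)
    entries.append({
        "title": entry.findtext("atom:title", namespaces=ATOM_NS),
        "updated": entry.findtext("atom:updated", namespaces=ATOM_NS),
        "href": link.get("href") if link is not None else None,
    })

print(feed_title, entries)
```

This only works on well-formed XML; for feeds gathered from arbitrary sites, the author's recommendation of a tolerant parser such as Universal Feed Parser still applies.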

[August 24, 2005] "Should Python and XML Coexist?" By Uche Ogbuji. From XML.com (August 24, 2005). "XML is the result of the meeting of two very distinct worlds: the database/data structure worlds and the document management world. As a result, XML is reasonably suitable for expressing data structures, and reasonably so for documents as well. I personally argue that XML is much more suited for documents than for data structures, but this is a long-standing debate in the XML community... I have long said that I would rather use Python and XPath to access XML documents and even XML data stores than XQuery, but being familiar with Java/XML APIs, I can understand why XQuery would be attractive in that case. In an interesting twist on this whole matter, even in languages such as Java, there is some backlash emerging against overuse of XML. Some developers rue the need for complex XML in scripting scenarios where it might have been better to use a language such as Jython, which is already tightly integrated into the host language, and is far better suited to writing code than XML. Conclusion: There is plenty of room for discussion about where XML can be useful to Python programmers, and where it can be a hindrance. There is also plenty of room to discuss which XML-related technologies are well suited to use with Python, and which might be best avoided. I'll cover such matters in coming articles. Meanwhile, it's great to see that the Python community has been doing a lot more than just complaining about XML..."

[August 09, 2005] "Building and Filling Out Templates with Python and Cheetah. Generate HTML, XML, Plain Text, and More With This Powerful Templating Engine for Python." By Leonard Richardson (Software Engineer, CollabNet). From IBM developerWorks (August 09, 2005). ['This article explains how to generate any kind of text-based content with Python scripts and Cheetah templates. Cheetah templates are easy to understand and maintain, and they help you separate the static parts of a document from the dynamic parts.'] "The earlier article 'Connecting databases to Python with SQLObject' mentioned the wide variety of open source object-relational mapping libraries for Python. Python programmers like to do things their own way, which leads to a lot of duplication of effort. Out of all that effort, though, often comes one package that's good enough for just about everyone. The same pattern has played out for templating systems: ways of representing static text as forms to be filled out, so that dynamic elements can be plugged in later. The official Python Wiki links to nearly 20 templating systems, and those are just the major ones. What's more, Python comes packaged with several basic templating systems that will work in simple cases. Cheetah is the best Python templating system yet devised... Cheetah has a long pedigree. It's inspired by a Java templating system called Velocity, an improved version of the Webmacro templating system, which is itself an attempt to improve on JavaServer Pages. Cheetah provides a simple language for defining templates that provides basic flow control and object access constructs. It borrows its basic template syntax from Velocity, but adds features that give Cheetah templates access to the convenient constructs of Python. Cheetah offers many more features that aren't covered here. For instance, you can set up a filter that modifies the output of all variable references in a certain way. 
You can use the #import directive to import arbitrary Python modules into Cheetah templates and call their functions. In fact, almost anything you can do in Python you can do inside Cheetah. However, I recommend that you keep it simple. Remember the goal behind templating systems: separate the dynamic parts of a document from its static description..."
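
The separation of static text from dynamic values described above can be illustrated without Cheetah itself. The following is a minimal sketch that uses string.Template from the Python standard library, one of the basic templating facilities that ship with Python. It is not Cheetah's API, and the names PAGE and render_page, as well as the template text, are hypothetical.

```python
from string import Template
from xml.sax.saxutils import escape

# Static part of the document: a form with named placeholders.
PAGE = Template(
    "<html><head><title>$title</title></head>"
    "<body><p>Hello, $user!</p></body></html>"
)

def render_page(title, user):
    """Plug the dynamic values into the static form.

    Values are escaped first so that '&', '<' and '>' in the data
    cannot break the surrounding markup.
    """
    return PAGE.substitute(title=escape(title), user=escape(user))

html = render_page("Greetings & Salutations", "Ada <admin>")
print(html)
```

Keeping the template as data and the substitution as one small function mirrors the article's advice to keep templates simple.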

[August 02, 2005] "Connecting Databases to Python with SQLObject." By Leonard Richardson (Software Engineer, CollabNet). From IBM DeveloperWorks. "An object-relational mapping tool helps improve your productivity by providing classes and objects to manipulate database tables. The best object-relational mapping tool for Python is SQLObject -- an open-source project that does just about everything you might need to program a database. This article introduces SQLObject and its capabilities. After reading this article, you'll be able to connect Python to databases without writing any SQL code... SQLObject is a versatile tool with many small, convenient features. Its limitations are easy to understand, and you can work around them by writing the SQL you need. SQLObject is to relational database programming what Python is to application programming: a convenient way to get the work done in a lot less time..."

[July 27, 2005] "EaseXML: A Python Data-Binding Tool." By Uche Ogbuji. From XML.com (July 27, 2005). "EaseXML is an XML data-binding tool for Python, available under the Python Software Foundation License. The package used to be called 'XMLObject,' but that generic name led to the situation I mentioned in 'Location, Location, Location'. Philippe Normand responded in a comment on that article that he would be changing the name of his project. In this article, I'll look at EaseXML 0.2, which I downloaded for installation on Python 2.4; Python 2.2 is the minimum version. The installation is standard distutils, a simple matter of python setup.py install. Earlier in this column I have covered Python data bindings that need no more information than the source XML, such as Amara Bindery and Gnosis Objectify. I also introduced one example, generateDS.py, of a data binding that requires an XML schema file to drive the binding. EaseXML is similar to this latter approach, except that the schema format it uses is just a set of Python classes defined with a set of conventions, with each XML element generally corresponding to a distinct Python class. In this way it is very similar to XIST, although it's less comprehensive... EaseXML lacks proper namespaces support, and I think the binding schema API could do with some close analysis. Fortunately, the version control logs seem to show a reasonable rate of development. I think it's worth keeping an eye on EaseXML because it does bring some innovative touches to XML processing in Python, but I would suggest waiting for another couple of releases before using it in production..."

[June 15, 2005] "More Unicode Secrets [and Python]." By Uche Ogbuji. From XML.com (June 15, 2005). "In a previous article I started a discussion of the Unicode facilities in Python, especially with XML processing in mind. In this article I continue the discussion. I do want to mention that I don't claim these articles to be an exhaustive catalogue of Unicode APIs; I focus on the Unicode APIs I tend to use most in my own XML processing. You should follow up these articles by looking at the further resources I mentioned in the first article. I also want to mention another general principle to keep in mind: if possible, use a Python install compiled to use UCS4 character storage. You can determine when you configure Python before building it whether it stores Unicode characters using (informally) a two-byte or a four-byte encoding, UCS2 or UCS4. UCS2 is the default but you can override this by passing the --enable-unicode=ucs4 flag to configure. UCS4 uses more space to store characters, but there are some problems for XML processing in UCS2, which the Python core team is reluctant to address because the only known fixes would be too much of a burden on performance. Luckily, most distributors have heeded this advice and ship UCS4 builds of Python. Even though some of the techniques I've gone over will enable you to generate correct XML, there is more to well-formedness than just getting the Unicode character model right. For example, there are some Unicode characters that are not allowed in XML documents, even in escaped form. I still recommend that you use one of the many tools I've discussed in this column for generating XML output..."
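
The remark that some Unicode characters are not allowed in XML documents, even in escaped form, can be made concrete. The following is a minimal sketch, not taken from the article, that tests characters against the Char production of XML 1.0 (tab, line feed, carriage return, and the ranges U+0020-U+D7FF, U+E000-U+FFFD, U+10000-U+10FFFF). It assumes Python 3, where the UCS2/UCS4 build distinction discussed in the article no longer applies; the function names are hypothetical.

```python
def is_xml_char(ch):
    """Return True if ch is permitted by the XML 1.0 Char production."""
    cp = ord(ch)
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )

def strip_illegal_xml_chars(text):
    """Drop characters that cannot appear in an XML 1.0 document at all."""
    return "".join(ch for ch in text if is_xml_char(ch))

# Form feed (U+000C) and bell (U+0007) are not legal XML 1.0 characters.
cleaned = strip_illegal_xml_chars("form\x0cfeed and bell\x07")
print(cleaned)
```

Whether to drop, replace, or reject such characters is an application decision; silently dropping them is only one option.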

[May 19, 2005] "Connect to Apache Derby Databases Using Python." By Bob Gibson (Advisory Software Engineer, IBM). From IBM DeveloperWorks. "There are sometimes good reasons for manipulating Derby relational databases in a language other than Java. If you are interested in using a flexible interpreted language such as Python, then this article can help you get everything set up properly... Although Derby is written in Java, there are times when programs written in other languages need to access a Derby database. Python is an interpreted, high-level, object-oriented dynamically-typed programming language. The fact that the interpreter can be used interactively, or iteratively, provides us with an interesting rapid-prototype language and development/test environment. When combined with a few existing utilities, Python lets you write high-level, dynamic, object-oriented scripts for manipulating Derby relational databases quite easily... You can set up a Windows environment that allows us to access Apache Derby, which is an easy to use, open-source relational database. In addition, you can develop high-level, dynamic, object-oriented programs in Python that allow us to connect to and manipulate a Derby database."

[May 18, 2005] "Unicode Secrets [and Python]." By Uche Ogbuji. From XML.com (May 18, 2005). "Poor understanding of Unicode is probably the biggest obstacle users face when trying to learn how to process XML, and Python users are no exception. In my experience, Unicode matters are the most common component in users' cries for help with Python XML tools. In this article and the next I'll present a variety of tips, tricks, and best practices in order to help users minimize Unicode problems... Proper Unicode support is so important for any XML tool that I would go as far as to say it's the single most important criteria in tool selection. Do not use any XML tools that do not have solid Unicode support at the core. This is one of the things I look out for when examining tools for this column, but unfortunately it's not always easy to tell when a package has poor Unicode support. Luckily, all the most widely used XML tools in Python have sufficient Unicode support.... In 'Proper XML Output in Python' I stated a rule: In all public API's for XML processing, character data should be passed in strictly as Python Unicode objects. This is primarily an admonition for XML API designers, but it also applies to users because many API's allow you to pass in strings or Unicode objects interchangeably. Resist the temptation to use this flexibility. Convert all strings to Unicode when passing them to XML APIs. Doing so isn't always as easy as you might think..."
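
The rule quoted above (pass character data to XML APIs as Unicode, not as encoded byte strings) can be sketched as follows. This is an illustration in Python 3, where str is already a Unicode type, rather than the Python 2 code of the article's era; the sample text and the assumption that the input arrives as ISO-8859-1 bytes are hypothetical.

```python
import xml.etree.ElementTree as ET

# Bytes as they might arrive from a file or socket (assumed ISO-8859-1 here).
raw = "Caf\u00e9 Z\u00fcrich".encode("iso-8859-1")

# Decode explicitly, naming the encoding, before touching any XML API.
text = raw.decode("iso-8859-1")

elem = ET.Element("place")
elem.text = text                      # character data passed as Unicode
serialized = ET.tostring(elem, encoding="utf-8")

# Reading the document back yields the same Unicode text.
round_tripped = ET.fromstring(serialized).text
print(round_tripped)
```

The encoding is named once, at the boundary where bytes enter the program, and the XML library chooses the output encoding when serializing.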

[April 20, 2005] "Making Old Things New Again." By Uche Ogbuji. From XML.com (April 20, 2005). "There have been recent releases of two of the Python-XML projects in which I'm involved; 4Suite and Amara XML Toolkit. One common theme in both releases was marked improvements to the XML document creation APIs. These improvements are significant enough to discuss and compare to the other systems for XML output I have presented in this column. The code uses 4Suite version 1.0b1 and Amara 1.0b2, running under Python 2.3.5. Installation is basically the same as in my earlier articles covering the two packages... New in 4Suite 1.0b1 is the class Ft.Xml.MarkupWriter, which is specialized for creating XML documents from scratch. It offers at least one feature I haven't seen in any other output libraries... Amara's Bindery component now also allows you to create XML documents from scratch. The interface is not quite as rich as MarkupWriter, but it has some similarities. Amara's API is probably more suitable if you're writing programs that have a variety of document reading and update tasks, besides just creating output. If you really just want to write XML as directly as possible, MarkupWriter is probably a better bet..."

[March 16, 2005] "Writing and Reading XML with XIST." By Uche Ogbuji. From XML.com (March 16, 2005). "XIST is a very interesting project I've been meaning to dig into for some time. If you've been following the news section at the end of each of these columns, you'll have noticed the steady work that Walter Dörwald, the project leader, has put into this toolkit. It started out as a framework for generating HTML and incidentally XML, but the XML facilities have steadily grown and matured, until it is now a sophisticated system for not only generating, but also processing, XML. As the legend on the project page says: "XIST is also a DOM parser (built on top of SAX2) with a very simple and Python-esque tree API. Every XML element type corresponds to a Python class and these Python classes provide a conversion method to transform the XML tree (e.g. into HTML). XIST can be considered 'object-oriented XSL'". XIST isn't one of those projects you hear loudly advocated and debated when Python/XML processing options come up, but it probably should be... Based on my experimentation, XIST is definitely worth serious consideration when you're looking for a Python-esque XML processing toolkit. The extremely object-oriented framework can feel a bit heavy, but I can appreciate some of the resulting benefits, and it would certainly suit some users' tastes very well. I should also mention that there is a lot more to XIST than I was able to cover in this article. I didn't touch on its support for different HTML and XHTML vocabularies, XML namespaces, XML entities, validation and content models, tree modification, pretty printing, image manipulation, and more..."

[January 19, 2005] "Introducing the Amara XML Toolkit." By Uche Ogbuji. From XML.com (January 19, 2005). "As part of my roundup of Python data bindings, I introduced my own Anobind project. Over the column's history, I've also developed other code to meet some need emphasized in one of the previous articles. I recently collected all of these various little projects together into one open source package of XML processing add-ons, Amara XML Toolkit. Amara is meant to complement 4Suite in that 4Suite works towards fidelity to XML technical ideals, while Amara works towards fidelity to Python conventions, taking maximum advantage of Python's strengths. The main components of Amara XML Toolkit are the following: (1) Bindery: data binding tool. The code that was formerly available standalone as "Anobind" but with extensive improvements and additions, including a move of the fundamental framework from DOM to SAX. (2) Scimitar: an implementation of the ISO Schematron schema language for XML. It also used to be a standalone project, which I've announced here in the past. It converts Schematron files to standalone Python scripts... The aim of the project is versatility — giving the developer many flexible ways of processing XML using idioms and native advantages of Python. Because of the popularity of languages such as Java, many XML standards have evolved in directions that don't match up with Python's strengths. Amara looks to bridge that gap..."

[October 13, 2004] "The State of Python-XML in 2004." By Uche Ogbuji. From XML.com (October 13, 2004). "The table [provided] lists the currently available Python-XML software that I judge to be significant. It is not a list of every bit of software in Python that has anything to do with XML. For example, I do not list pyglade (part of PyGTK), which is software for generating user interfaces in the GNOME desktop system for UNIX. The user interface specifications in question are in XML, but this is not really enough to call it an XML processing tool for Python. However, you can certainly use the tools I mention for convenient manipulation of pyglade specifications... I organize the table according to selected areas of XML technology. This will give newcomers to Python a quick look at the coverage of XML technologies in Python and should serve as a quick guide to where to go to address any particular XML processing need. I have added reference links to column articles on software I've covered in this column. I have set a "heartbeat" rating for each project. One heart means the project is almost inactive and three means the project is very active. I judge this rating subjectively, according to recent activity I can find for each project: mailing list traffic, releases, articles, other projects that use it, etc. In 2002 I reported 34 Python-XML projects. Last year I added 24 and this year 16 (marked with an asterisk) for a grand total of 74. This month alone two new projects have emerged, showing the continuing interest in Python processing of XML. This year I added a new category, for XML generators, with 9 entries. There has been a bloom in Python packages for generating XML. An existing category that keeps on growing is in Pythonic APIs or data bindings. There are 15 as of this year's count.
There is no doubt that patience for non-Pythonic ways of processing XML has worn thin, but considering that my list may not even be complete (rumor has it Guido van Rossum has a data-binding tool of his own), one wonders whether this area is ripe for consolidation..."

[August 11, 2004] "Practical SAX Notes." By Uche Ogbuji. From XML.com (August 11, 2004). "In this article I discuss issues related to recent articles in this column, including some practical problems using XML facilities — SAX in particular — across Python versions and installed software configurations. I also revisit ElementTree's support for XML namespaces and discuss some other Python tools' support for breaking large documents into chunks... If you want strict DOM conformance, use pxdom. If you're feeling adventurous enough to avoid DOM altogether, try ElementTree, one of the data bindings I've covered, or PyRXPU (but not PyRXP). Besides space, another factor behind the decision not to move all of PyXML into Python was the fact that PyXML could be updated more frequently than Python as a whole, allowing for quicker bug fixes and feature additions. I think this is no longer much of an issue now that Python has settled into a regular and fairly short release cycle. The main obstacle to making this happen is the lack of a clear owner who can take charge of the state of all things XML in the Python standard library. Many people have generously donated time to Python XML development, but no obvious candidates present themselves who happen to have the available time or sponsorship to lead and maintain a merger of PyXML into Python. It's probably too late for this to be done in time for Python 2.4, but perhaps Python 2.5 is within reach...."

[June 30, 2004] "XML Namespaces Support in Python Tools, Part Three." By Uche Ogbuji. From XML.com (June 30, 2004). "In the last two articles I've discussed namespace handling in Python 2.3's SAX and minidom libraries and in 4Suite. In this article I focus on ElementTree, libxml/Python and PyRXPU... ElementTree supports XML namespaces using James Clark's notation directly for element and attribute names. This is a rather different mechanism from most XML processing APIs, and we'll find out how smoothly it works in comparison... As I discussed in the earlier article, ElementTree does not maintain namespace prefix information. I found out how to use a specialized class to build the element tree, defined as ns_tracker_tree_builder in listing 2. This class receives expat parse events, but I was only able to figure out how to capture information from the namespace events in a "flat" manner: by updating a single dictionary each time I encounter a namespace declaration event (_start_ns). The problem with this is that all namespace scoping information is lost. I expect this approach will cause oddities in any document where a given namespace is used with more than one prefix at different points... my workaround for recording prefixes does not take into account the scope of namespace declarations and, in effect, always reports the last prefix seen for any given namespace. Notice also the fact that plain strings are returned in most cases rather than Unicode objects. I find this problematic..." See also Part 1 and Part 2.
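
The namespace behavior described above can be seen in a few lines. This is a minimal sketch, assuming the xml.etree.ElementTree module of the current Python standard library rather than the standalone elementtree release and tree-builder subclass the article used. The document and namespace URI are hypothetical. It shows James Clark's {uri}localname notation for element and attribute names, and one way to observe prefix declarations through iterparse's "start-ns" events, since the tree itself does not keep prefixes.

```python
import io
import xml.etree.ElementTree as ET

NS = "http://example.org/ns"   # hypothetical namespace URI
DOC = (
    '<a:root xmlns:a="http://example.org/ns">'
    '<a:item a:kind="demo">text</a:item>'
    '</a:root>'
)

root = ET.fromstring(DOC)
print(root.tag)                       # the prefix "a" is gone; the URI is inlined

# Names are looked up with the {uri}local notation, not with prefixes.
item = root.find("{%s}item" % NS)
kind = item.get("{%s}kind" % NS)

# Prefix declarations are reported as parse events, in document order.
prefixes = [
    payload
    for event, payload in ET.iterparse(io.BytesIO(DOC.encode("utf-8")),
                                       events=("start-ns",))
]
print(prefixes)
```

Collecting the events into a flat list, as here, loses scoping information in the same way as the flat dictionary the article describes; pairing "start-ns" with "end-ns" events would be needed to track scope.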

[December 20, 2003] "xmltramp and pxdom." By Uche Ogbuji. From XML.com (December 17, 2003). ['In his Python column, Uche Ogbuji covers "xmltramp", a tool for parsing XML documents into a data structure that's very friendly to Python, and "pxdom", a highly compliant DOM Level 3 implementation.'] "In this article I cover two XML processing libraries with very disjoint goals. xmltramp, developed by Aaron Swartz, is a tool for parsing XML documents into a data structure very friendly to Python. Recently many of the tools I've been covering with this primary goal of Python-friendliness have been data binding tools. xmltramp doesn't meet the definition of a data binding tool I've been using; that is, it isn't a system that represents elements and attributes from the XML document as custom objects that use the vocabulary from the XML document for naming and reference. xmltramp is more like ElementTree, which I covered earlier, defining a set of lightweight objects that make information in an XML document accessible through familiar Python idioms. The stated goal of xmltramp is simplicity rather than exhaustive coverage of XML features... pxdom, on the other hand, has the goal of strict DOM Level 3 compliance. It is developed by Andrew Clover, who contributed to the XML-SIG the document 'DOM Standards compliance', a very thorough matrix of feature and defect comparisons between Python DOM implementations. DOM has generally not been the favorite API of Python users -- or, for that matter, of Java users -- but it certainly has an important place because of its cross-language support..."

[September 17, 2003] "The State of the Python-XML Art, 2003." By Uche Ogbuji. From XML.com (September 10, 2003). A year after his first XML and Python review, the author "updates his overall Python-XML survey to encompass notable developments over the past year, many of which have been mentioned in the previous XML.com Python articles. This article serves as a ready and rapid index to folks who want to process XML using 'the best language available for the purpose.' The author organizes the review in a table according to the areas of XML technology. This will give newcomers to Python a quick look at the coverage of XML technologies in Python and should serve as a quick guide to where to go to address any particular XML processing need. He rates the vitality of each listed project as either 'weak', 'steady', or 'strong' according to the recent visible activity on each project: mailing list traffic, releases, articles, other projects that use it, etc. The table uses these categories for tools supporting Python-based processing: XML parsing engines, DOM, Data bindings and specialized APIs, XPath and XSLT, Schema languages, Protocols, RDF and Topic Maps, Miscellaneous. A year ago the author reported 34 Python-XML projects; this year he adds 24; most of the additions point to the impressive activity that continues on the Python-XML front..."

[August 18, 2003] "Introducing Anobind." By Uche Ogbuji. From XML.com (August 13, 2003). ['Uche Ogbuji introduces anobind, his new Python databinding tool.'] "My recent interest in Python-XML data bindings was sparked not only by discussion in the XML community of effective approaches to XML processing, but also by personal experience with large projects where data binding approaches might have been particularly suitable. These projects included processing both data and document-style XML instances, complex systems of processing rules connected to the XML format, and other characteristics requiring flexibility from a data binding system. As a result of these considerations, and of my study of existing Python-XML data binding systems, I decided to write a new Python-XML data binding, which I call Anobind. I designed Anobind with several properties in mind, some of which I have admired in other data binding systems, and some that I have thought were, unfortunately, lacking in other systems: (1) A natural default binding, i.e., when given an XML file with no hints or customization; (2) Well-defined mapping from XML to Python identifiers; (3) Declarative, rules-based system for finetuning the binding; (4) XPattern support for rules definition; (5) Strong support for document-style XML, especially with regard to mixed content; (6) Reasonable support for unbinding back to XML; (7) Some flexibility in trading off between efficiency and features in the resulting binding... In this article I introduce Anobind, paying attention to the same considerations that guided my earlier introduction of generateDS.py and gnosis.xml.objectify... Anobind is really just easing out of the gates. I have several near-term plans for it, including a tool that reads RELAX NG files and generates corresponding, customized binding rules. I also have longer-term plans such as a SAX module for generating bindings without having to build a DOM..."

[May 16, 2003] "Using libxml in Python." By Uche Ogbuji. From XML.com (May 14, 2003). ['Uche Ogbuji introduces libxml and its Python bindings.'] "The GNOME project, an open source umbrella project like Apache and KDE, has spawned several useful subprojects. A few years ago the increase of interest in XML processing in GNOME led to the development of a base XML processing library and, subsequently, an XSLT library, both of which are written in C, the foundational language of GNOME. These libraries, libxml and libxslt, are popular for users of C, but also those of the many other languages for which wrappers have been written, as well as language-agnostic users who want good command-line tools. libxml and libxslt are popular because of their speed, active development, and coverage of many XML specifications with close attention to conformance. They are also available on many platforms. Daniel Veillard is the lead developer of these libraries as well as their Python bindings. He participates on the XML-SIG and has pledged perpetual support for the Python bindings; however, as the documentation says, 'the Python interface [has] not yet reached the maturity of the C API.' In this article I'll introduce the Python libxml bindings, which I refer to as Python/libxml. In particular I introduce libxml2. I am using Red Hat 9.0 so installation was a simple matter of installing RPMs from the distribution disk or elsewhere. The two pertinent RPMs in my case are libxml2-2.5.4-1 and libxml2-python-2.5.4-1. The libxml web page offers installation instructions for users of other distributions or platforms, including Windows and Mac OS X... libxml offers a SAX API, both through the low-level API and through the bundled drv_libxml2.py, a libxml driver for the SAX that comes with Python and PyXML. libxml supports W3C XML Schema, RELAX NG, OASIS catalogs, XInclude, XML Base, and more. There are also extensive features for manipulating XML documents.
I hope to cover these other features of this rich library in subsequent articles..."

[April 15, 2003] "Gems From the Archives." By Uche Ogbuji. From XML.com (April 09, 2003). ['In this month's Python and XML column Uche Ogbuji hunts for treasures in the archives of the Python XML SIG, locating interesting tidbits for producing and displaying XML.'] "The Python XML SIG, particularly its mailing list, is the richest resource there is for those looking to use Python for XML processing. In fact, efforts such as XML Bookmark Exchange Language (XBEL), created by the XML-SIG in September of 1998 and now used in more browsers and bookmark projects than any other comparable format, demonstrate this group's value to the entire XML world. We're all developers here, though, and for developers there is nothing as valuable as running code. There has been plenty of running code emanating from the XML-SIG over the years, including PyXML and a host of other projects I have mentioned in this column. But a lot of the good stuff is buried in examples and postings of useful snippets on the mailing list, and not readily available elsewhere. In this and in subsequent articles I will mine the richness of the XML-SIG mailing list for some of its choicest bits of code. I start in this article with a couple of very handy snippets from 1998 and 1999. Where necessary, I have updated code to use current APIs, style, and conventions in order to make it immediately useful to readers. All code in this article was tested using Python 2.2.1 and PyXML 0.8.2... There is more useful code available in the XML-SIG archives, and I will return to this topic in the future, presenting updates of other useful code from the archives..."

[March 18, 2003] "Using SAX for Proper XML Output." By Uche Ogbuji. From XML.com (March 12, 2003). ['Uche Ogbuji's Python and XML column explains how to use SAX to generate proper XML output: "Generating XML from Python is one of the most common XML-related tasks the average Python user will face; thus, having more than one way to complete such a common task is especially helpful".'] "In an earlier Python and XML column I discussed ways to achieve proper XML output from Python programs. That discussion included basic considerations and techniques in generating XML output in Python code... In this article I introduce an important one that comes with Python itself. Generating XML from Python is one of the most common XML-related tasks the average Python user will face; thus, having more than one way to complete such a common task is especially helpful... Probably the most effective general approach to creating safe XML output is to use SAX more fully than just cherry-picking xml.sax.saxutils.escape. Most users think of SAX as an XML input system, which is generally correct; because, however, of some goodies in Python's SAX implementation, you can also use it as an XML output tool. First of all, Python's SAX is implemented with objects which have methods representing each XML event. So any code that calls these methods on a SAX handler can masquerade as an XML parser. Thus, your code can pretend to be an XML parser, sending events from the serialized XML, while actually computing the events in whatever manner you require. On the other end of things, xml.sax.XMLGenerator, documented in the official Python library reference, is a utility SAX handler that comes with Python. It takes a stream of SAX events and serializes them to an XML document, observing all the necessary rules in the process..." Article includes the regular roundup of what's new in the Python/XML world.
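
The technique described above, code that "masquerades as an XML parser" by calling SAX event methods on a serializing handler, can be sketched as follows. This is a minimal illustration rather than the article's own listing. It imports XMLGenerator from xml.sax.saxutils, which is where the class lives in the current standard library; the element name, attribute, and function name are hypothetical.

```python
import io
import xml.etree.ElementTree as ET
from xml.sax.saxutils import XMLGenerator
from xml.sax.xmlreader import AttributesImpl

def write_note(out, author, body):
    """Emit a small document by sending SAX events to XMLGenerator."""
    gen = XMLGenerator(out, encoding="utf-8")
    gen.startDocument()
    # Attributes are passed as a SAX attributes object, not a plain dict.
    gen.startElement("note", AttributesImpl({"author": author}))
    gen.characters(body)              # escaping is handled by the generator
    gen.endElement("note")
    gen.endDocument()

buf = io.StringIO()
write_note(buf, "Tom & Jerry", "1 < 2 & 3 > 2")
xml_text = buf.getvalue()
print(xml_text)

# The result parses, and the original strings come back unchanged.
parsed = ET.fromstring(xml_text.encode("utf-8"))
```

Because the generator applies the escaping rules itself, the calling code never builds markup by string concatenation.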

[February 12, 2003] "Simple XML Processing With elementtree." By Uche Ogbuji. From XML.com (February 12, 2003). ['Uche Ogbuji introduces elementtree, a pythonic way of processing XML.'] "Fredrik Lundh, well known in Python circles as "the effbot", has been an important contributor to Python and to PyXML. He has also developed a variety of useful tools, many of which involve Python and XML. One of these is elementtree, a collection of lightweight utilities for XML processing. elementtree is centered around a data structure for representing XML. As its name implies, this data structure is a hierarchy of objects, each of which represents an XML element. The focus is squarely on elements: there is no zoo of node types. Element objects themselves act as Python dictionaries of the XML attributes and Python lists of the element children. Text content is represented as simple data members on element instances. elementtree is about as pythonic as it gets, offering a fresh perspective on Python-XML processing, especially after the DOM explorations of my previous columns... elementtree is very easy to set up. I downloaded version 1.1b3 and you can always find the latest version on the effbot download page. You need Python 2.1 or newer; I used 2.2.1... elementtree is fast, pythonic and very simple to use. It is very handy when all you want to do is get in, do some rapid and simple XML processing, and get out. It also includes some handy tools for HTML processing. The module elementtree.TidyTools provides a wrapper for the popular HTML Tidy utility, which, among other things, can take all sorts of poorly structured HTML and convert it into valid XHTML. This makes possible the elementtree.TidyXMLTreeBuilder module, which can parse HTML and return an elementtree instance of the resulting XHTML..."
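
The data model described above (attributes accessed like a dictionary, children like a list, text as a plain data member) looks roughly like this in practice. The sketch assumes xml.etree.ElementTree, the descendant of elementtree that has shipped in the Python standard library since Python 2.5, not the standalone 1.1b3 release the article used. The sample document is hypothetical.

```python
import xml.etree.ElementTree as ET

order = ET.fromstring(
    '<order id="42" status="open">'
    '<item sku="A1">Widget</item>'
    '<item sku="B2">Gadget</item>'
    '</order>'
)

# Attributes behave like a dictionary.
order_id = order.get("id")
order.set("status", "shipped")
attributes = dict(order.attrib)

# Children behave like a list: len(), indexing, iteration.
count = len(order)
first = order[0]
skus = [item.get("sku") for item in order]

# Text content is a simple data member on each element.
names = [item.text for item in order]
print(order_id, attributes, count, skus, names)
```

There is no separate node type for attributes or text here, which is the "no zoo of node types" point made in the article.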

[February 05, 2003] "Python: Language of Choice for EAI." By Aron Trauring (CEO, Zoteca). In EAI Journal Volume 5, Number 1 (January 2003), pages 43-46. ['If you think that the language for Web services is a straight fight between Java and C#, think again. Python is supported by the big Web services vendors, including Microsoft. This object-oriented, high-level interpreted language may be ideal for EAI.'] "... Python plays well with programming standards. Many Python extensions are available that support almost all Internet standards, including Common Object Request Broker Architecture (CORBA), Component Object Model (COM), Simple Object Access Protocol (SOAP), eXtensible Markup Language (XML), and others... Twisted is an open source, Python-based framework and event-driven network that provides powerful, scalable, and flexible EAI capabilities. At the core of Twisted is its network layer, which can be used to rapidly integrate any existing protocol and model new ones. Whenever the need arises to develop a new protocol, the asynchronous, multiplexed, and two-way Remote Object Protocol (ROP) can be used to quickly implement it. Because the ROP is used with object-level abstractions, changes can be made easily, and new features added, without having to deal with the design restrictions and application development complexities of a custom protocol. Out of the box, Twisted supports several service protocols. These include but are not limited to: HyperText Transfer Protocol (HTTP), File Transfer Protocol (FTP), Simple Mail Transfer Protocol (SMTP), Lightweight Directory Access Protocol (LDAP), Domain Name System (DNS), Sockets Server Version 4 (SOCKSv4), Secure Shell (SSH), Internet Relay Chat (IRC), Telnet, Post Office Protocol 3 (POP3), and America Online's instant messaging... Developers can immediately use these protocols without having to spend time reimplementing them. In addition, Twisted can talk to multiple, industry-standard Database Management Systems (DBMSes).
It also can be used to communicate with COM servers and to control and integrate with standard Windows applications. Unlike other frameworks designed to address a specific domain, Twisted is designed to simultaneously support both multiple frameworks and multiple protocols. So it's useful for implementing Websites, Web services, e-mail servers, or instant messaging servers. Moreover, these services can all run in the same process..."

[December 18, 2002] "A Python & XML Companion." By Uche Ogbuji. From XML.com December 11, 2002. ['In his monthly Python & XML column, Uche Ogbuji provides an overview and updates to O'Reilly's Python & XML book, and keeps us up to date with the latest developments in the Python XML world.'] "Python & XML, written by Christopher Jones and Fred Drake, Jr. (O'Reilly and Associates, 2002), introduces Python programmers to XML processing. Drake is a core developer of PyXML, and Jones is an experienced developer in Python and XML, as well as an experienced author. As you would expect from such a team, this book is detailed and handy; however, I have a few notes, amplifications, and updates to offer (the book was released in December of 2001) -- all of which are distinct from the errata that the authors maintain. In this article I will provide updates, additional suggestions, and other material to serve as a companion to the book. You don't have to have the book in order to follow along... Python & XML is a very handy book. The examples are especially clear, and in the latter part of the book the authors develop a sample application which uses much of the book's contents very practically. My main complaint is that it covers XML namespaces so sparsely. Namespaces are very hard to avoid these days in XML processing, regardless of what you may think of them. More examples and coverage of where namespaces intersect DOM, XPath, XSLT, and so on would help a lot of readers. I plan to write an article focusing on XML namespaces in Python processing..."

[November 15, 2002] "Proper XML Output in Python." By Uche Ogbuji. From XML.com. November 13, 2002. ['One of the first issues a newcomer to XML discovers is that of encoding characters: even if you're just using ASCII you bump up against the need to escape characters like '<' and '&'. The problems become worse when you're handling documents in Unicode or other encodings. Uche Ogbuji's Python and XML column this week addresses just these problems. Uche dismisses the notion that writing XML is as simple as a 'print' statement, and provides guidelines that are applicable both for Python programmers and anyone dealing programmatically with XML.'] "... First, I consider ways of producing XML output in Python, which might make you wonder what's wrong with good old print... Indeed, programmers often use simple print statements in order to generate XML. But this approach is not without hazards, and it's good to be aware of them. It's even better to learn about tools that can help you avoid the hazards... The main problem with simple print is that it knows nothing about the syntactic restrictions in XML standards. As long as you can trust all sources of text to be rendered as proper XML, you can constrain the output as appropriate; but it's very easy to run into subtle problems which even experts may miss. XML has been praised partly because, by setting down some important syntactic rules, it eases the path to interoperability across languages, tools, and platforms. When these rules are ignored, XML loses much of its advantage. Unfortunately, developers often produce XML carelessly, resulting in broken XML. The RSS community is a good example. RSS uses XML (and, in some variants, RDF) in order to standardize syntax, but many RSS feeds produce malformed XML. Since some of these feeds are popular, the result has been a spate of RSS processors that are so forgiving, they will even accept broken XML. Which is a pity. 
Eric van der Vlist -- as reported in his article for XML.com, 'Cataloging XML Vocabularies' -- found that a significant number of Web documents with XML namespaces are not well-formed, including XHTML documents. Even a tech-savvy outfit like Wired has had problems developing systems that reliably put out well-formed XML. My point is that there's no reason why Python developers shouldn't be good citizens in producing well-formed XML output..."
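
The hazard described above can be illustrated with a minimal sketch. It uses only the Python standard library (xml.sax.saxutils and xml.dom.minidom) under Python 3, not necessarily the tools the article itself recommends; the make_item helper, the sample strings, and the example.org URL are hypothetical.

```python
from xml.dom.minidom import parseString
from xml.parsers.expat import ExpatError
from xml.sax.saxutils import escape, quoteattr


def make_item(title, link):
    """Build one element as a string, escaping text and attribute values.

    escape() replaces &, < and > in character data; quoteattr() escapes
    an attribute value and wraps it in quotes.
    """
    return "<item href=%s>%s</item>" % (quoteattr(link), escape(title))


def is_well_formed(xml_text):
    """Return True if xml_text parses as XML, False otherwise."""
    try:
        parseString(xml_text)
        return True
    except ExpatError:
        return False


title = "Q&A: is 1 < 2?"

# Naive string formatting: the raw '&' and '<' break well-formedness.
naive = "<item>%s</item>" % title

# Escaped output: the same text, safely encoded.
safe = make_item(title, "http://example.org/?a=1&b=2")

print(is_well_formed(naive))
print(is_well_formed(safe))
```

Escaping by hand still leaves character-encoding issues to the caller, so for anything beyond small fragments a serializer that writes the whole document is usually the safer choice.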

[October 18, 2002] "A Tour of 4Suite." By Uche Ogbuji. From XML.com (October 16, 2002). ['in his latest installment of Python and XML, Uche Ogbuji provides a tour of the core XML processing facilities of 4Suite, an XML application platform for Python.'] "Mike Olson and I began the 4Suite project in 1998 with the release of 4DOM, and it quickly picked up an XPath and XSLT implementation. It has grown to include Python implementations of many other XML technologies, and it now provides a large library of Python APIs for XML as well as an XML server and repository system. In this article and the next, I'll introduce just the basic Python library portion of 4Suite, which includes facilities for XML parsing (complementing PyXML), RELAX NG, XPath, XPatterns, XSLT, RDF, XUpdate and more. If you are unfamiliar with any of these technologies, see the resources section at the end where I provide relevant pointers. Finally, after reviewing 4Suite, I'll summarize events in the Python-XML world since the last article... In the general case, the only prerequisite for 4Suite is Python 2.1 or more recent. PyXML is required if you wish to parse XML in DTD validation mode, or if your Python install does not have pyexpat built in (many Python distributions do)..." Note also the O'Reilly publication Python & XML, by Christopher A. Jones and Fred L. Drake, Jr.

[September 19, 2002] "The Python Web Services Developer: XML-RPC for Python." By Mike Olson (Principal Consultant, Fourthought, Inc) and Uche Ogbuji (Principal Consultant, Fourthought, Inc). From IBM developerWorks. September 19, 2002. ['XML-RPC is a simple, lightweight Web services technology that predates SOAP. In this installment of the Python Web services developer, Mike Olson and Uche Ogbuji examine the XML-RPC facilities in Python.'] "XML-RPC is the granddaddy of XML Web services. It is a simple specification for remote procedure calls (RPC) that uses HTTP as the transport protocol and an XML vocabulary as the message payload. It has become very popular because of its simplicity (the full specification is less than ten printed pages), and most languages now have standard or readily available XML-RPC implementations. This includes Python, which started bundling xmlrpclib, an XML-RPC implementation by Fredrik Lundh, in version 2.2. Joe Johnston's IBM developerWorks article "Using XML-RPC for Web services" covers the basics of XML-RPC in the first three sections. Start there if you need to review the basic technology. In this article, we will focus on using the Python implementation. You must have Python 2.2 to run the examples in this article. Also, in the last article, we looked at the relative performance of XML-RPC, SOAP, and other distributed programming technologies. You may want to read that before making major decisions to deploy XML-RPC..."
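
The "XML vocabulary as the message payload" can be inspected without any network traffic. The sketch below assumes Python 3, where the xmlrpclib module mentioned above lives on as xmlrpc.client; the method name sample.add and its arguments are hypothetical and not taken from the article.

```python
import xmlrpc.client

# Marshal a call to a hypothetical remote method into an XML-RPC request body.
request_xml = xmlrpc.client.dumps((2, 3), methodname="sample.add")
print(request_xml)

# Unmarshal the request again, as a server would on receipt.
params, method = xmlrpc.client.loads(request_xml)

# Marshal and unmarshal a response carrying a single return value.
response_xml = xmlrpc.client.dumps((5,), methodresponse=True)
result, no_method = xmlrpc.client.loads(response_xml)
```

In a real deployment the same marshalling happens behind xmlrpc.client.ServerProxy, which sends the request body over HTTP.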

[September 20, 2002] "The State of the Python-XML Art." By Uche Ogbuji. From XML.com (September 18, 2002). "Welcome to the first Python-XML column. Every month I'll offer tips and techniques for XML processing in Python and close coverage of particular packages. Python is an excellent language for XML processing, and there is a wealth of tools and resources to help the intrepid developer be productive. In what follows I'll survey these tools and resources, giving a sense of how broadly Python supports XML technologies and giving you a head start on the more in-depth topics to follow. One of the best things about Python-XML is the active community of practitioners and contributors. From introductory texts to references to mailing lists, these resources will provide answers to most questions worth asking about Python and XML... The following table lists the currently available Python-XML software that I judge to be significant... The user interface specifications in question are in XML, but this is not really enough to call it an XML processing tool for Python. However, you can certainly use the tools I mention for convenient manipulation of pyglade specifications. The general rules of thumb for including software are, first, whether it implements a technology or set of technologies strongly associated with XML; and, second, whether it does so in a way that is useful for any arbitrary XML file I may want to process. I've organized the table according to the areas of XML technology. This will give newcomers to Python a quick look at the coverage of XML technologies in Python and should serve as a quick guide to where to go to address any particular XML processing need. I rate the vitality of each listed project as either 'weak', 'steady' or 'strong' according to the recent visible activity on each project: mailing list traffic, releases, articles, other projects that use it, etc... 
In the next article I'll tour the many facilities added to core Python by the PyXML package..."

[July 17, 2002] IBM alphaWorks Releases UDDI for Python Package (UDDI4Py). The IBM alphaWorks development team has released a UDDI4Py Python package that "allows the sending of requests to and processing of responses from the UDDI Version 2 APIs. UDDI4Py supports access to the UDDI Registry by abstracting the underlying XML constructs and by the transmission/processing of the various SOAP API messages. It is meant to complement the UDDI tool kit available to the Java development community, and gives customers the alternative of using a different Web services development platform. UDDI4Py is not for the development of Web Services, but rather for discovering and/or publishing the technical interfaces that describe specific Web services using the UDDI Registry. UDDI4Py supplies glue that allows Python applications to dynamically discover and/or publish Web services to and from the public registry. The rapid application development that the Python language provides is leveraged by any system working within the Web services arena and utilizing the UDDI4Py package." [Full context]

[May 25, 2001] "Indexing XML Documents. [XML Matters, Part #10.]" By David Mertz, Ph.D. (He-Of-Innumerable-Epithets e.g., 'Objectifier,' Gnosis Software, Inc.) From IBM developerWorks. May 2001. ['As XML document storage formats become popular, especially for prose-oriented documents, the task of locating contents within XML document collections becomes more difficult. This column extends the generic full text indexer presented in David's Charming Python #15 column to include XML-specific search and indexing features. This column discusses how the tool design addresses indexing to take advantage of the hierarchical node structure of XML.'] "Large multi-megabyte documents consisting of thousands of pages are not uncommon in corporate and government circles. Writers and technicians routinely produce voluminous product specifications, regulatory requirements, and computer system documentation in SGML (Standard Generalized Markup Language) format. In a technical sense, XML is a simplification and specialization of SGML. At a first approximation then, XML documents should also be valid SGML documents. Culturally, however, XML has evolved from a different direction. In one respect, XML is a successor for EDI. In another respect, it is a successor for HTML. Having a different cultural history from SGML, XML is undergoing its own process of tool development. It is becoming more popular, so expect to see more and more of both (usually) informal HTML documents and (usually) formal SGML documents migrating in the direction of XML formats -- particularly using XML dialects like DocBook. However, XML has not yet grown, within its own culture, a tool that effectively and efficiently locates content within large XML documents. 
General file-search tools like grep on Unix, and similar tools on other platforms, are perfectly able to read the plain text of XML documents (except for possible Unicode issues), but a simple grep search (or even a complicated one) misses the structure of an XML document. When searching for content in a file containing thousands of pages of documentation, you are likely to know much more than you can specify in just a word, phrase, or regular expression. Just which of those agricultural reports, for example, did Ms. June Apple write? A coarse tool like grep will generally find a lot of things that are not of interest. Moreover, ad hoc tools like grep, while very efficient at what they do, need to check the entire contents of large files each time a search is performed. For frequent searches, repeated full-file searching is inefficient... In response to the need outlined above, I have created the public-domain utility xml_indexer. This Python module can be used as a runtime utility and can also be easily extended by custom applications that use its services. The module xml_indexer, in turn, relies on the services of two public-domain utilities I have described in earlier IBM developerWorks articles: indexer and xml_objectify... It turned out that the design of xml_indexer was aided enormously by the object-oriented principles that went into designing indexer. Overriding just a few methods in the GenericIndexer class (actually, in its descendent SlicedZPickleIndexer -- but one could just as easily mix in any concrete Indexer class), made possible the use of an entirely new set of identifiers and data source. Readers who wish to use xml_indexer as part of their own larger Python projects should find its further specialization equally simple." Article also available in PDF format.
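
The idea of indexing by node structure rather than by raw text can be sketched as follows. This is not the xml_indexer API: it is a minimal illustration built on the standard-library ElementTree module, and the sample document, the build_index and search functions, and the path notation are all assumptions made for the example.

```python
import re
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical sample data, loosely following the "June Apple" example above.
SAMPLE = """<reports>
  <report><author>June Apple</author><title>Orchard yields</title></report>
  <report><author>Sam Pear</author><title>Apple blight survey</title></report>
</reports>"""


def build_index(xml_text):
    """Map each lowercased word to the set of element paths containing it.

    Only the text directly inside an element (elem.text) is indexed;
    tail text after child elements is ignored to keep the sketch short.
    """
    index = defaultdict(set)
    root = ET.fromstring(xml_text)

    def walk(elem, path):
        for word in re.findall(r"\w+", (elem.text or "").lower()):
            index[word].add(path)
        counts = defaultdict(int)
        for child in elem:
            counts[child.tag] += 1
            walk(child, "%s/%s[%d]" % (path, child.tag, counts[child.tag]))

    walk(root, "/" + root.tag)
    return index


def search(index, word, tag=None):
    """Return sorted paths for word, optionally limited to one element type."""
    hits = index.get(word.lower(), set())
    if tag is not None:
        hits = {p for p in hits
                if re.sub(r"\[\d+\]$", "", p).rsplit("/", 1)[-1] == tag}
    return sorted(hits)


index = build_index(SAMPLE)
print(search(index, "apple"))                 # every element mentioning the word
print(search(index, "apple", tag="author"))   # only reports written by an Apple
```

Because the index is built once and consulted many times, repeated searches avoid rereading the whole file, which is the inefficiency of grep-style tools that the article points out.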

[September 30, 2000] Python XML Package from XML-SIG: "There will be an omnibus package that contains everything required for basic XML applications, along with documentation and sample code, and that's also easy to compile and install. A release candidate of the latest release of this package is now available as PyXML-0.5.5.1.tar.gz, dated June 5, 2000. This version contains: (1) SAX; (2) The Pyexpat module; (3) sgmlop; (4) The prototype DOM code (subsequently much revised in the CVS tree); (5) xmlproc, an XML parser written in Python."

"Vaults of Parnassus: Python Resource." Python/XML resources, for example: "4DOM 0.9.3 (A library for XML and HTML processing and manipulation using the W3C's Document Object Model for interface.); 4XPath 0.8.3 (An XML path processing library based on the W3C's specification for the XPath language for addressing parts of an XML document.); 4XSLT 0.8.3 (A python implementation of the W3C's XSLT language.); Expat wrapper module 1.3 - (Wrapper for the Expat XML parser toolkit.); o2x (Convert (emacs) outline-mode files to XML.); Pyxie (XML Processing Library); PyXPath 0.1 (An implementation of the XPath working draft 9-July-1999.); Quick XML Parser 1.2 (A fast, lightweight XML parsing tool); SAX for Python (SAX (Simple API for XML) is a common parser interface for XML parsers. It allows application writers to write applications that use XML parsers, but are independent of which parser is actually used.); sgmlop (The sgmlop module is a fast replacement for the regular expression-based parsers used in the sgmllib/htmllib and xmllib module. 
A single module supports both SGML and XML.); Wpre (A Simple Tool/Module for Dynamic HTML, XML, SQL, and Other Text Processing - Similar to Here Documents in the Shell); XIST (XML based extensible HTML generator.); XML Document 1.0a1 (Allows you to use xml objects in the Zope environment; you can create xml documents in Zope and leverage Zope to format, query, and manipulate xml.); XML Parser (Extended Markup Language experimental parser); xml-builder.tar.gz (A tool for manipulating libraries written in python from within xml); XML-RPC for Python (The xmlrpclib module is a client-side implementation of Userland's XML-RPC protocol.); XML-Sig CVS snapshot (A nightly updated snapshot of the XML-Sig's CVS package.); XML-Sig Package for Win32 (An easy to install EXE package containing all the Python XML tools for Win32 from the XML-Sig.); xml-toolkit 0.8 (XML app toolkit, including a full XML processor and implementation of WIDL.); xml_objects 0.1 (Parse xml file into list of python xml_object for db storage.); xml_pickle (Pickle python objects to an XML format.); xmlproc (An XML parser written in Python; it is a nearly complete validating parser, with only minor deviations from the specification.); XMLTreeCntrol 0.1 (Parses XML tags in a file into a wxPython tree control.)"

[June 23, 2000] "Charming Python: Tinkering with XML and Python. An introduction to XML tools for Python." By David Mertz, Ph.D. (President, Gnosis Software, Inc.). From IBM developerWorks. June 2000. ['Get a run-down on the most useful Python modules for XML in this first installment of David Mertz' new Python column. A major element of getting started on working with XML in Python is sorting out the comparative capabilities of all the available modules. In this first installment of his new Python column, 'Charming Python,' David Mertz briefly describes the most popular and useful XML-related Python modules, and points you to resources for downloading individual modules and reading more about them. This article will help you determine which modules are most appropriate for your specific task.'] "Python is in many ways an ideal language for working with XML documents. Like Perl, REBOL, REXX, and TCL, it is a flexible scripting language with powerful text manipulation capabilities. Moreover, more than most types of text files (or streams), XML documents typically encode rich and complex data structures. The familiar 'read some lines and compare them to some regular expressions' style of text processing is generally not well suited to adequately parsing and processing XML. Python, fortunately (and more so than most other languages), has both straightforward ways of dealing with complex data structures (usually with classes and attributes), and a range of XML-related modules to aid in parsing, processing, and generating XML. One general concept to keep in mind about XML is that XML documents can be processed in either a validating or non-validating fashion. In the former type of processing, it is necessary to read a "Document Type Definition" (DTD) prior to reading an XML document it applies to. The processing in this case will evaluate not just the simple syntactic rules for XML documents in general, but also the specific grammatical constraints of the DTD. 
In many cases, non-validating processing is adequate (and generally both faster to run, and easier to program) -- we trust the document creator to follow the rules of the document domain. Most modules discussed below are non-validating; descriptions will indicate where validation options exist. . ."
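
The difference between the two processing modes can be seen in a small sketch. It assumes Python 3 and the standard-library ElementTree module, whose parser checks well-formedness but does not validate against a DTD; the sample documents and the is_well_formed helper are hypothetical.

```python
import xml.etree.ElementTree as ET

# Well-formed, but invalid against its own DTD: <body> is required yet missing.
INVALID_BUT_WELL_FORMED = """<?xml version="1.0"?>
<!DOCTYPE note [
  <!ELEMENT note (to, body)>
  <!ELEMENT to (#PCDATA)>
  <!ELEMENT body (#PCDATA)>
]>
<note><to>Reader</to></note>"""

# Not well-formed: the <to> element is never closed.
NOT_WELL_FORMED = "<note><to>Reader</note>"


def is_well_formed(xml_text):
    """Return True if the text obeys XML's general syntactic rules."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# A non-validating parser accepts the first document despite the DTD
# violation, and rejects only the second.
print(is_well_formed(INVALID_BUT_WELL_FORMED))
print(is_well_formed(NOT_WELL_FORMED))
```

Catching the missing body element would need a validating parser, which is the case the article flags when it notes where validation options exist.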

[March 03, 1999] "Processing XML With Python." By Bob DuCharme and Paul Prescod. In <TAG> Volume 13, Number 1 (January 1999), pages 1-3. In this <TAG> feature article, Bob DuCharme interviews Paul Prescod on the use of XML in Python. "To learn more about what Python can offer to the XML developer, I talked to Paul Prescod, a Consulting Engineer for ISOGEN and the [Python evangelist] 'St. Paul' of Python in the XML world . . . Next question [sample Q/A]: (Q) why is Python so great. . . (A) #1) Python has a really great standard library; Now, languages like Python, Perl and Java are in a race to have the most robust standard libraries. In Python, you can build an HTTP server in three lines of code by subclassing an HTTServerBase class; #2) There are add-on libraries for everything in the world. The Python community is smaller than the Java or Perl communities, but I think that Python's library support is as good as those other languages because Python programmers are very prolific and share everything; #3) Python is interpreted, dynamic and really flexible. There is no compilation step and no need to design an entire type system before you start hacking; #4) Python is easy to integrate with other stuff. Python talks COM, CORBA, HTTP, FTP, SMTP, CGI, WDDX and almost everything else."

[April 16, 1999] A.M. Kuchling has announced the availability of the Python/XML distribution release 0.5.1. "The Python/XML distribution contains the basic tools required for processing XML data using the Python programming language, assembled into one easy-to-install package. The distribution includes parsers and standard interfaces such as SAX and DOM, along with various other useful modules. The code is being developed bazaar-style by contributors from the Python XML Special Interest Group. The Python/XML package currently contains: [1] XML parsers: Pyexpat (Jack Jansen), xmlproc (Lars Marius Garshol), xmllib.py (Sjoerd Mullender) using the sgmlop.c accelerator module (Fredrik Lundh); [2] A SAX interface (Lars Marius Garshol); [3] DOM interface (Stefane Fermigier, A.M. Kuchling); [4] xmlarch.py, for architectural forms processing (Geir Ove Grønmo); [5] Unicode wide-string module (Martin von Löwis); [6] Various utility modules and functions; [7] Documentation and example programs." Additions in this version include a sizable DOM test suite, updates to various subpackages, added marshalling into various XML-based formats: a generic one for Python objects, WDDX, and XML-RPC. See also the Python/XML Documentation.

See the database entry Python for XML/SGML Processing. This section references work by Paul Prescod, including: "SGML Processing in Python"; "Using SGML Groves from Python, Visual Basic and other OLE client scripting languages"; "PySgml: A Module for SGML Processing in Python"; "An Introduction to Groves for Python Programmers."

"Python and SGML" - By W. Eliot Kimber. ". . . Its easy-to-use object orientation, its built-in list semantics, and the fact that it's interpreted make it really easy to create the same sorts of programs you might use DSSSL or Balise for, but with a general-purpose programming language that is easy to learn and much more familiar than DSSSL or Omnimark. Python is a free, publicly-developed language, not a commercial product. . ."

[May 14, 1998] Lumberjack. Work in progress (Sean McGrath). "Lumberjack is a toolkit for SGML/XML programming developed in the Python programming language. It will be made freely available as soon as it a) works and b) is sufficiently documented/packaged to be usable."

[September 30, 2000] "Content Management Provider PyBiz Announces Strategic Partnership With BeOpen in Utilizing Python Programming Language." "PyBiz Announces BeOpen.com as a Technology and Service Partner for co-promotion of products and services. BeOpen.com, the leader in Python technologies, employs the core Python development team, including Python's creator and open source luminary Guido van Rossum. PyBiz strongly evangelizes Python and believes in BeOpen's initiatives, which include sponsorship of the core Python team and PythonLabs Professional Services. BeOpen believes that PyBiz's products and services, particularly eContentMgr, and XDisect, can have a major impact on the market for next-generation XML-based web solutions. Upon seeing a demo of eContentMgr, BeOpen's CEO, Mark Kaleem noted. PyBiz solves customer problems in the areas of distributed content management and personalized web publishing. Our products, eContentMgr and XDisect, help customers address their evolving e-business requirements. eContentMgr provides an XML-based open solution for content management built on XDisect. XDisect is our next-generation XML repository and search engine technology."

[December 14, 1998] "Python Slithers Forward." By Jeff Walsh. In InfoWorld (December 11, 1998). "Like its namesake's tendency to squeeze its prey, the success of the Python scripting language is applying pressure to vendors through open-source projects, adding Extensible Markup Language (XML) support and serving as the scripting language for a new Web application platform. The Corporation for National Research Initiatives, in Reston, Va., is in charge of Python's development, while the Web application platform -- called the Z Object Publishing Environment (Zope) is an open-source project overseen by Digital Creations, in Fredericksburg, Va. Digital Creations previously developed Principia, a Web application platform, and Bobo, a toolkit for publishing objects. These products have been rolled into Zope. . . An XML strategy for Zope is being solidified, because it is intended to support WebDAV and other XML-based standards. Zope can also serve as a repository for HTML pages, so users would not need to know how to write any code."

[October 20, 1998] Sean McGrath (Digitome Electronic Publishing) posted an announcement for a half-day XML tutorial, "Programming XML from Python," to be held before the International Python Conference on Tuesday, November 10, 1998. The tutorial will be presented by Sean McGrath and Paul Prescod. McGrath will also be presenting a paper "A Python Based Production System for High Volume Electronic Publishing" (Lumberjack, XML). Note that "the XML-SIG is closing in on its first deliverable, a package containing the fundamental components: a few XML parsers, the SAX (and soon DOM) interfaces for processing XML documents, and various other useful things. At the Python Developer's Day session, we will discuss the current status of the software, and try to determine where we should go after version 1.0 is released."

[November 09, 1998] Uche Ogbuji posted an announcement for 4DOM Version 0.6.0. 4DOM is a CORBA-aware implementation of the W3C's Document Object Model written in Python. "4DOM is a close implementation of the DOM, including DOM Core level 1, DOM HTML level 1, and a few utility and helper components. 4DOM was designed from the start to work in a CORBA environment. Currently, the open-source Python orb, Fnorb is supported, indeed required. 4DOM is designed to allow developers rapidly design applications that read, write or manipulate HTML and XML."
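
For readers unfamiliar with the DOM style of reading, writing, and manipulating XML, here is a minimal sketch. It does not use 4DOM or Fnorb; it assumes Python 3 and the standard-library xml.dom.minidom module, which exposes a similar set of W3C DOM Core methods. The catalog document and its element names are hypothetical.

```python
from xml.dom.minidom import parseString

# Read: parse a small document into a DOM tree.
doc = parseString("<catalog><tool>4DOM</tool></catalog>")

# Manipulate: create a new element with an attribute and a text child,
# then append it to the document element.
new_tool = doc.createElement("tool")
new_tool.setAttribute("status", "example")
new_tool.appendChild(doc.createTextNode("another tool"))
doc.documentElement.appendChild(new_tool)

# Query: collect the text of every <tool> element.
names = [t.firstChild.data for t in doc.getElementsByTagName("tool")]

# Write: serialize the modified tree back to markup.
serialized = doc.documentElement.toxml()
print(names)
print(serialized)
```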