Clone this wiki locally

Towards Full Support of Python Syntax in Cython

Abstract

The Cython language makes writing C extensions for the Python language as
easy as writing in Python itself, but it currently has some shortcomings.

I propose to improve Cython's compiler design by developing and integrating:

a new Parser,

an Abstract Syntax Tree ,and

a Flow Graph

These enhancements will allow:

full parsing of all valid Python 2.5 code

easier transition to Python 3000

simpler support of new features

better reasoning about the code for optimizations

This is not a from-scratch implementation. I worked on similar projects before and many of
the proposed ideas have been already tested in the CPython compiler and in pypy, another open source python compiler.
(I participated in a pypy sprint in 2006 and worked on an AST peephole optimizer)

By modifying and adapting existing open source code and leveraging from my
experience, I shall accomplish this project in the available 12 weeks.

The ability to maintain Cython in the long term will be significantly enhanced by these modifications.

Detailed description

This document is meant to be a detailed technical version of the submitted Soc proposal.
Some content may be repeated.

Intro

The Python language has been demonstrated to be successful for
its extendibility and his clean C API, besides at its intuitive and simple syntax.
Many of the existing extensions are developed using the C language and this can be error prone and tedious which means longer development cycles or the rejection of developing the extension. Cython, as Pyrex before, focuses on solving this problem.

What is the Objective ?

What is Cython ? How it is related to Python?

The Cython language makes writing C extensions for the Python language as easy as writing in Python.

The Cython Compiler generates C code that makes full use of the CPython API, making it easy
to mix compiled and non-compiled code. It does not convert Python code to a pure C program.

Cython is based on the well-known Pyrex, but supports more cutting edge functionality
and optimizations.

Other solutions exist, although generated code is ugly and poorly optimized.

What about the proposal ?

The current Cython implementation has some shortcomings.
After studying Cython's design, I propose to improve Cython's compiler design by developing and integrating:

1. a new Parser
1. an Abstract Syntax Tree
1. a Flow Graph

These enhancements will allow full parsing of all valid Python 2.5 code.

The central idea in my design is the construction of an Abstract Syntax Tree from the input source.
The benefit of this approach are further underlined by noticing that Python itself adopted it during the 2.4 -> 2.5 transition
[http://www.python.org/dev/peps/pep-0339/ PEP 0339]

(More in the "How it will be done?" section)

Why the need of this project ? Why should be the project approved?

Cython's long term maintainability will be significantly enhanced by the project

The project would benefit the python, Cython and Open Source community in several ways

What will the accomplished project deliver? What is in it for the Cython project ?

These enhancements will allow:

1. Simpler support of new features
1. Full parsing of Python 2.5 syntax
1. Better reasoning about the code for optimizations
1. Pave the way to support Python 3000 features
1. Improve performance: many well know optimizing techniques could be implemented, boosting extension speed execution

What is in it for the Python community ?

1. Better usability: Faster prototyping, development and code re-use
1. Better module support for the different Python API versions
1. Easier porting for existing C code, giving the python community more extensions to use and leverage
1. Improve performance: existing python modules could be translated to Cython

What is in it for the Open Source community ?

1. Easier way of integrating python with existing C/C++ applications
1. C Programs could leverage from the power of python in a faster way

What is in it for me?

This work will be the base of my master's thesis on
"automatic code generation from dynamic language for high performance computers"

1. Gain a better understanding of Cython and Python internals
1. Improve my skill in writing optimizing python applications
1. Learn more on Python and C / C++ integration
1. Have fun with the community of Cython
1. Be sponsored by my favorite company (Google)

How it will be done ?

Introduction

In the current compilation phase, from Cython source to C code, two steps are involved:

#!python
ast = parser.parse (source)
ast.visit(AstVisitor())
class AstVisitor:
...
def visitIfStmt (self, node):
node.test.accept (self)
node.then.accept (self)
node.else_.accept (self)
This is NOT a "from-scratch" implementation. I have done this before and many of
the proposed ideas are have been already tested in CPython compiler and in pypy (another open source python compiler)(I know this, because I participated to a pypy sprint in 2006 and worked on an Ast peephole optimizer).
By modifying and adapting existing open source code and leveraging from my
experience, I shall accomplish this project in the available 12 weeks.

Does the Project have sub-tasks ?

The project is divided in three sub tasks:

Parser

Abstract Syntax Tree

Flow Graph

Parser

Task:: Adapt existing implementation of python parsers to Cython:
Goal:: The Parser is able to parse the full Cython Test Suite (~200 tests, ~4176 line of code)
Estimated Time:: 1 - 2 weeks

Summary Description::
* creating the grammar
* parsing the grammar
Task Page:
I can adapt this from a previous experience in a pypy sprint (see further section: Why me?)
The parser will be generated from the standard Python grammar file, with additions for Cython keywords.

Abstract Syntax Tree

Task:: Use the Parser to generate an Abstract Syntax Visitor
Goal:: The Visitor runs and passes the entire Python and Cython test Suites (~200 tests, ~4176 line of code)
Estimated Time:: 3 - 6 weeks
Description::
From the parsed structure

Flow Graph

Task:: Adapt the Flow Model to implement a Builder from the AST
Goal:: The FlowGraph Logic is able to resolve the Locals test
Estimated Time:: 6+ weeks
Description::
At this phase we have functional AST. To build a function's flow graph you create
a Visitor, that hooks when he enter in a visitFuncDef Node:

class FlowGraphGen(AstVisitor):
def visitFuncDef(self,node):
graph = FlowGraph(node)
#...
The Flow Graph's model is described in [:enhancements/FlowGraph: ]
The proposed model is largely inspired by Pypy's one.
The main difference is how it will be assembled:
Their implementation assembles the Flow Graph visiting the Source-Compiled Byte Code
I will assemble it visiting the FunctionNode's AST.

How do you know that your project is feasible?

During this week I spent time on the IRC Cython channel discussing with the Cython community the pros, cons, caveats and pitfalls
of my proposal.

With the objective to understand my idea, to explore the entity of the project, and obtain an inside understanding of the written code, I started modifying the existing code, and wrote some first tests.

Why Me ?

Many of the ideas of the Flow Graph and the base code comes from the pypy project.

In Geneve 2006, I participated in the EuroPython Conference's pypy sprint and worked on the stackless back-end support.
During the sprint I proposed to implement a peephole optimizer for the pypy AST.
For evidence see http://codespeak.net/pipermail/pypy-svn/2006-July.txt and search for "misto" (my nick)

I am an experienced Python / C/C++ programmer

Cython is written in python and I have 3+ years of experiance as a python
programmer. The task also involves knowledge of C/C++ that I have.
Finally, I have experience in writing compilers.

I am competent about project's topic

I'm graduating with a master's degree in "Computer Science" from the University of Rome "La Sapienza".
I am writing a thesis on "automatic code generation from dynamic language for high performance computers", where I dealt with and
studied many of the project pitfalls.