Friday, July 17, 2009

Because PyPy will be presenting at the upcoming EuroSciPy conference, I have recently been playing with the idea of NumPy and PyPy integration. My idea is to integrate PyPy's JIT with NumPy, or at least a very basic subset of it. Time constraints make it impossible to hand-write a JIT compiler that understands NumPy. But given PyPy's architecture we actually have a JIT generator, so we don't need to write one :-)

Our JIT has shown that it can speed up small arithmetic examples significantly. What happens with something like NumPy?

I wrote a very minimal subset of NumPy in RPython, called micronumpy (it supports only single-dimension int arrays, with nothing more than getting and setting items), and a benchmark against it. The point of this benchmark is to compare the performance of a builtin function (numpy.minimum) against the equivalent hand-written function, written in pure Python and compiled by our JIT.

The goal is to prove that it is possible to write algorithms in Python instead of C without loss of efficiency. Sure, we can write some functions (like minimum in the following example), but there is a whole universe of other ufuncs which would be cool to have in Python instead, assuming this could be done without a huge loss in efficiency.
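
To make the comparison concrete, here is a minimal sketch of the kind of pure-Python minimum that such a benchmark pits against the builtin; the exact benchmark source is not reproduced in this post, so the array setup and sizes below are purely illustrative:

    import numpy

    def minimum(a, b, out):
        # element-wise minimum of two one-dimensional arrays, written as a
        # plain Python loop so that PyPy's JIT can compile it
        i = 0
        while i < len(a):
            if a[i] < b[i]:
                out[i] = a[i]
            else:
                out[i] = b[i]
            i += 1
        return out

    # rough usage, comparing against the builtin (size is illustrative only)
    size = 10000000
    a = numpy.arange(size)
    b = numpy.arange(size)
    out = numpy.zeros(size, dtype=int)

    minimum(a, b, out)        # pure Python; JIT-compiled under PyPy
    numpy.minimum(a, b, out)  # the builtin implementation (C or RPython)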

Here are the results. This is comparing PyPy svn revision 66303 in the pyjitpl5 branch against Python 2.6 with NumPy 1.2.1. The builtin numpy.minimum in PyPy is just a naive implementation in RPython, which is comparable to the speed of a naive implementation written in C (and thus a bit slower than the optimized version in NumPy):

NumPy (builtin function):               0.12s
PyPy's micronumpy (builtin function):   0.28s
CPython (pure Python):                  11s
PyPy with JIT (pure Python):            0.91s

As we can see, PyPy's JIT is slower than NumPy's optimized C version, but still much faster than CPython (about 12x).

Why is it slower? When you actually look at the generated assembler, it's pretty obvious that it's atrocious. There's a lot of speedup to be gained just from simple optimizations on the resulting assembler. There are also some obvious limitations, like the x86 backend not being able to emit opcodes for floats, or the lack of an x86_64 backend. Those limitations are not fundamental in any sense and should be relatively straightforward to overcome. Therefore it seems we can get C-level speed for pure Python implementations of numeric algorithms using NumPy arrays in PyPy. I think it's an interesting prospect: Python has the potential to become less of a glue language and more of a real implementation language in the scientific field.

Cheers,
fijal

Thursday, July 16, 2009

Last week (from the 6th to the 10th of July) Anto, Armin and I (Carl Friedrich) were
in the magnificent city of Genova, Italy, at the ECOOP conference. In this blog
post I want to give a (necessarily personal) account of what we did there.

Nearly all the other talks were rather interesting as well. I particularly liked
the one by Hans Schippers, who presented a machine model built on delegation
called delMDSOC. The model is meant to implement most of the features a language
needs in order to separate cross-cutting concerns. In the talk at ICOOOLPS he
presented an extension to the model that adds concurrency support, using a
combination of actors and coroutines. He then showed that the concurrency
mechanisms of Java, Salsa (an extension of Java adding actors) and Io can all
be mapped to this model.

Furthermore there were two interesting invited talks, one by Andreas Gal
(Mozilla), and one by Cliff Click (Azul Systems). Andreas explained how
TraceMonkey works. This was very useful for me, because his talk was just before
mine and I could thus kill most of my introduction about tracing JIT compilers
and have more time for the really interesting stuff :-). Cliff talked about
implementing other languages on top of the JVM and some of the pitfalls in
getting them to perform well.

All in all, ICOOOLPS was a very enjoyable workshop, also with many interesting
discussions.

On Tuesday there were more workshops, but also the PyPy tutorial, so I only went
to a few talks of the COP workshop and spent the rest of the morning
preparing the tutorial (see next section).

Tutorial

On Tuesday afternoon we gave a PyPy tutorial, as part of the ECOOP summer
school. The first lesson we learned was that (as opposed to a community
conference) people don't necessarily want to actually take out their laptops and
try stuff. We gave a slow walk-through of the full life-cycle of development of
a dynamic language interpreter using PyPy's tool-chain: starting from writing
your interpreter in RPython, through testing it on top of CPython and
translating it to C, .NET or Java, to actually adding hints to get a JIT
inserted.
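
To give an idea of that last step, here is a minimal sketch of a toy RPython
interpreter with JIT hints. It is not taken from the tutorial material: the toy
bytecodes are made up, and the hint API is reduced to the JitDriver and
jit_merge_point pair from pypy.rlib.jit (the module path used at the time).

    from pypy.rlib.jit import JitDriver

    # greens identify a position in the interpreted program,
    # reds are the remaining loop variables
    jitdriver = JitDriver(greens=['pc', 'program'], reds=['acc'])

    def interpret(program):
        acc = 0
        pc = 0
        while pc < len(program):
            # tells the generated JIT where the interpreter loop starts
            jitdriver.jit_merge_point(pc=pc, program=program, acc=acc)
            op = program[pc]
            if op == 'i':      # increment the accumulator
                acc += 1
            elif op == 'd':    # decrement the accumulator
                acc -= 1
            pc += 1
        return acc

    def entry_point(argv):
        print interpret('iiid' * 1000)
        return 0

    def target(*args):
        # hook looked up by translate.py when translating the interpreter
        return entry_point, None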

There were about seven people attending the tutorial, a couple of whom were
very interested, asking questions and discussing. Some of the discussions were
even very technical, e.g. one about the details of our type-inference algorithm
for RPython and why we cannot do a bottom-up analysis but have to use
forward-propagation instead.

Jan Vitek of Purdue University told us about some of the problems of the OVM
project, which is (among other things) a Java implementation in Java (OVM also
wants to support implementing VMs for other languages, if I understood
correctly). He said that the project has essentially grown too large and
complicated, which makes it very hard for new people to get into it. While PyPy
doesn't have some of the problems of a full Java implementation (e.g. right now
our concurrency support is minimal), I definitely think that some of these risks
apply to PyPy as well, and we should find ways to improve the situation in this
regard. Channeling Samuele: somewhere inside the large lumbering blob of PyPy
there is an elegant core trying to get out.

Main Conference

The main conference ran from Wednesday until Friday. Many of the talks were not
all that interesting for me, being quite Java-centric. One talk that I liked a
lot was "Making Sense of Large Heaps", presented by Nick Mitchell (IBM). He
presented a tool called "Yeti" that can be used to analyze large heaps of Java
programs. The tool uses some clever algorithms and heuristics to summarize the
heap usage of data structures in intelligent ways, making it easier to find
possible memory-wasters in a program. Nick also gave Anto and me a demo of the
tool, where we tried to apply it to pypy-jvm (we found out that a fifth of the
static data in there belongs to the parser/compiler :-( ).

On each of the days of the conference there was a keynote. I missed the one by
Simon Peyton-Jones on Wednesday about type classes in Haskell. On Thursday,
David Ungar was awarded the Dahl-Nygaard-Prize for his work on the Self
programming language. Subsequently he gave a really inspiring keynote with the
title "Self and Self: Whys and Wherefores" where he recollected Self's history,
both on a technical as well as on a social level. Parts of the talk were
snippets from the movies Self: The Movie and Alternate Reality Kit, both
of which I highly recommend.

The keynote on Friday was by Cliff Click, with the title "Java on 1000 Cores:
Tales of Hardware/Software Co-design". He described the custom CPU architecture
that Azul Systems has developed to run Java server applications on hundreds of
cores. The talk focused mostly on the hardware, which I found very interesting
(though some people didn't care for that too much). Azul's CPU is essentially 54
in-order RISC cores on a single processor. The cores have a lot of extensions
that make it easier to run Java on them, e.g. hardware read- and write-barriers,
hardware transactional memory and hardware escape-detection (!).

In addition to the talks, there is of course always the hallway track (or
coffee track), which is where you stand in the hallway and discuss things with
people. As usual, this was the most interesting part of the conference. One of
those conversations was Anto and me giving a PyPy demo to David Ungar. We had a
very interesting discussion about VM implementation in general and the sort of
debugging tools you need to write in particular. He liked PyPy a lot, which
makes me very happy. He also liked the fact that I have actually read most Self
papers :-).
