Emily Morehouse

Co-Founder, Director of Engineering

@ Cuttlesoft

Cuttlesoft is a digital product development firm where I get to work on anything and everything, from
CI/CD pipelines and system architectures to web and mobile development, UX, and IoT.

We're going to cover:

Life cycle of a piece of Python code

Interacting with your code at various stages

Current Python optimizations

Practical applications

That said, what we're going to cover in this talk:

10,000 foot view of how we get from source code to execution of a piece of Python code

What we're going to cover in this talk:

How to interact and inspect each step of that process

What we're going to cover in this talk:

Current Python optimizations in the compilation process

What we're going to cover in this talk:

AND Practical applications of using ASTs both in and outside of the Python world.

Should you care about language internals?
Should you care about programming language internals?
Yes, yes you should.

** DRINK SOME WATER! 🚰 **

A Peek Under The Hood

Let's take a peek under Python's hood.

From Source Code To Execution:

We'll start with our journey from source code to execution, to build an understanding of the broad steps it takes to compile a piece of Python code.

Q: Interpreted or compiled?

A: Both!

Compiler → Generates Bytecode

Interpreter → Executes Bytecode

Compiler generates bytecode

Interpreter makes sense of the bytecode in order to execute your code

One of the ways in which Python is “dynamic” is that the same bytecode doesn’t always have the same effect.
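To make that concrete, here is a small sketch. The function name `combine` is made up for illustration. It is compiled once, so there is a single sequence of bytecode, yet what the add instruction actually does depends on the objects it receives at runtime.

```python
import dis


def combine(a, b):
    # One function, compiled once, so there is one sequence of bytecode.
    return a + b


# Show the bytecode; the exact opcode names vary between Python versions.
dis.dis(combine)

# The same instructions behave differently depending on the operand types.
int_result = combine(1, 2)            # numeric addition
str_result = combine("py", "thon")    # string concatenation
list_result = combine([1], [2])       # list concatenation
print(int_result, str_result, list_result)
```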

Life Cycle of a Piece of Python Code

Let's zoom in a bit to see more of this process.
So we'll start with our Python source code.

We tokenize and parse that source code...

...into a parse tree.

Parse trees are a bit more detailed than we want; you'll see a diagram in a moment.

So we transform our parse tree...

...into an abstract syntax tree.

Another transformation into...

...a control flow graph, or CFG, which is a directed graph that models the flow of a program.

From here, we can now emit...

...our bytecode.

The CPython virtual machine then executes the bytecode...

...to get our final output.

Now, this is quite the process with some very intricate steps,
so we're going to focus on a few key parts:
our source, the AST, and the generated bytecode.
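The very first step, tokenizing, can be poked at from Python itself. This is a rough sketch using the standard library's `tokenize` module, which is a pure-Python tokenizer and not the internal C tokenizer CPython uses when compiling, so treat it as an approximation of that stage.

```python
import io
import tokenize

source = "(1 + 2) * 3\n"

# generate_tokens expects a readline callable that yields lines of text.
tokens = list(tokenize.generate_tokens(io.StringIO(source).readline))

for token in tokens:
    # tok_name maps the numeric token type to a readable name (OP, NUMBER, ...).
    print(tokenize.tok_name[token.type], repr(token.string))
```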

What is an Abstract Syntax Tree?

Tree representing the structure of your source code.

An AST is a structural representation of your code in a tree format, where every node represents a language construct (e.g. expressions, statements, variables, literals, etc.).

What is... a tree?

This is a tree.

This is ALSO a tree.

Trees have one root: the top node.

Nodes can branch off...

...to other nodes.

But each node except the root has a single unique parent.

So when we read this...

...we start at the top node...

...and work our way as far down our first branch...

...as we can, before we start on the next side...

...performing a...

...depth-first...

...traversal. (pause to show full tree)
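The reading order just described can be sketched in a few lines. The tree shape and node labels below are invented for illustration; each node is a `(label, children)` pair.

```python
# Hypothetical tree: each node is a (label, list_of_children) pair.
tree = ("root", [
    ("left", [("left.a", []), ("left.b", [])]),
    ("right", [("right.a", [])]),
])


def depth_first(node):
    """Yield labels by visiting a node, then fully exploring each child in turn."""
    label, children = node
    yield label
    for child in children:
        # Go all the way down this branch before moving to the next sibling.
        yield from depth_first(child)


order = list(depth_first(tree))
print(order)
```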

Parse Tree vs AST: (1 + 2) * 3

Let's look at the difference between our two types of trees.

First, we have a parse tree for a very simple piece of code.

Here we have an AST for the same piece of code.
We leave behind syntactical specifics so we can instead focus on how underlying objects are structured and the true meaning of the code.

It's important to note that there is information that you lose when moving from a parse tree to an abstract syntax tree, and that's very much on purpose.

A parse tree, also called a concrete syntax tree (CST), has all of the information from your source code, including things like comments, whereas an AST is sort of stripped down and simplified. Thus, ASTs are simpler and easier to work with, but may not have all the information you want for certain applications.
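One way to see what gets left behind is to build the AST for the same expression with the built-in `ast` module. In this sketch the parentheses from the source never show up as nodes; the grouping survives only in the way the nodes are nested. The exact text of the dump differs between Python versions, so no output is shown.

```python
import ast

# mode="eval" because the source is a single expression.
tree = ast.parse("(1 + 2) * 3", mode="eval")
print(ast.dump(tree))

root = tree.body
# The multiplication is the root; the addition is its left child.
print(type(root).__name__, type(root.op).__name__)
print(type(root.left).__name__, type(root.left.op).__name__)
```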

Let's dig in.

Now that we have an understanding of the life cycle of a piece of Python code,
let's see how we can interact with the process ourselves.

** DRINK SOME WATER! 🚰🚰🚰 **

Primary Tools:

These are built into the Python standard library.

ast module

dis module

Secondary Libraries:

astor

meta

codegen

Some of these don't fully work with Python 3, but they work for the most part.
We'll first import the built-in modules that we need.
Then we'll start with a very simple piece of Python code: a call to print
that takes a string as its argument.

We can then generate its AST...

...using `ast.parse`. We specify the source, as well as the mode that we want to process our code in.

It's important for us to use exec here, as it allows us to run any sort of valid Python code, whereas eval only allows us to evaluate a single expression.

So now we have a tree generated from our code, stored as an _ast.Module.
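Here is a minimal sketch of that step, including what happens when a statement is handed to eval mode. The variable names are just for illustration.

```python
import ast

source = 'print("may the force be with you")'

# exec mode accepts any valid Python code and produces a Module node.
tree = ast.parse(source, mode="exec")
print(type(tree).__name__)

# eval mode accepts a single expression and produces an Expression node.
expression_tree = ast.parse("1 + 2", mode="eval")
print(type(expression_tree).__name__)

# An assignment is a statement, so eval mode rejects it.
try:
    ast.parse("a = 32", mode="eval")
    eval_accepts_statements = True
except SyntaxError:
    eval_accepts_statements = False
print(eval_accepts_statements)
```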
So what does this look like? Well, we can dump the AST, but it doesn't give us anything that's extremely legible right off the bat,

but if we look closely, we see that we have a function called "print" that is being passed a string.
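As a sketch, this is roughly how to dump the tree and then pick the interesting pieces out of it by hand. The exact text of the dump changes between Python versions, so it is not reproduced here.

```python
import ast

tree = ast.parse('print("may the force be with you")', mode="exec")

# The raw dump: complete, but not very friendly to read.
print(ast.dump(tree))

# The module body holds one expression statement wrapping a call.
call = tree.body[0].value
print(call.func.id)    # the name being called
print(len(call.args))  # how many positional arguments it receives
```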
We'll push forward. We can now take this AST and compile it into bytecode by calling `compile`.
Awesome! We now have something called a code object.
We can poke at that and get a little more information out of it.
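A minimal sketch of the compile step follows. The filename `"<ast>"` is an arbitrary label chosen for this example; it only shows up in tracebacks and in the code object's metadata.

```python
import ast
import types

tree = ast.parse('print("may the force be with you")', mode="exec")

# compile accepts an AST as well as a source string.
code = compile(tree, filename="<ast>", mode="exec")
print(code)
print(isinstance(code, types.CodeType))
```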

Code Objects:

What's a code object?
Contains instructions and information needed to run the code.

Internal representation of a piece of Python code.

Code objects are immutable structures used as internal representations of Python code that are generated by the Python compiler...
... that store necessary information to run the code.
There are a lot of parts to a code object, and most of them are only meaningful
to the bytecode interpreter. BUT, there are a few interesting parts for us to touch on.
Code objects:

co_name

A name stored as a string for this code object. For a function this would be the function's name. For a class this would be the class's name. Since our code object was generated using the compile function, we can't specify the name; it automatically gets named <module>.

co_varnames

A tuple containing the names of the local variables (including arguments).

co_stacksize

The maximum size required of the value stack when running this object. This size is statically computed by the compiler.

co_consts

A tuple containing the literals used by the bytecode.

co_argcount

The number of positional arguments the code object expects to receive, including those with default values.

co_code

And the most important part for us: the bytes object (a string in Python 2) representing the sequence of bytecode instructions.
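These attributes are easier to see on a function than on module-level code, since a function has arguments and locals. The function `greet` below is a made-up example; every function carries its code object on `__code__`.

```python
def greet(name, greeting="hello"):
    message = greeting + ", " + name
    return message


code = greet.__code__

print(code.co_name)       # the function's name
print(code.co_varnames)   # arguments first, then other local variables
print(code.co_argcount)   # positional arguments, including defaulted ones
print(code.co_consts)     # literals used by the bytecode
print(code.co_stacksize)  # computed by the compiler; value varies by version
print(code.co_code)       # the raw bytecode
```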
We know we can peek inside the code object and see a bunch of different things, so let's take a look at our raw bytecode. Awesome, this definitely looks like something a computer could understand.
We can also run our compiled code directly using exec
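Both steps can be sketched together. The stdout redirection is only there so the result of `exec` can be captured and checked; it is not needed just to run the code object.

```python
import ast
import contextlib
import io

source = 'print("may the force be with you")'
code = compile(ast.parse(source, mode="exec"), filename="<ast>", mode="exec")

# The raw bytecode: a bytes object whose contents depend on the Python version.
raw = code.co_code
print(raw)

# exec accepts a code object directly, not just a source string.
buffer = io.StringIO()
with contextlib.redirect_stdout(buffer):
    exec(code)
output = buffer.getvalue()
print(output, end="")
```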
Let's go back and take a closer look at our bytecode. We can force the interpreter to show us bytecode that's more legible to humans.

Somewhere along the line, our interpreter makes sense of these bytes, but it would take a lot of time for us to go through the giant switch statement that handles opcodes to figure it out ourselves.
We can use Python's disassembler to help us out. We can now see a clear depiction of what is going on -

(elaborate)
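A small sketch of the disassembler in action. The full listing changes from one Python version to the next (newer versions add and rename instructions), so only the name lookup for `print` is relied on here.

```python
import ast
import dis

source = 'print("may the force be with you")'
code = compile(ast.parse(source, mode="exec"), filename="<ast>", mode="exec")

# Human-readable listing of the bytecode.
dis.dis(code)

# The same information as objects we can inspect programmatically.
instructions = list(dis.get_instructions(code))
opnames = [instruction.opname for instruction in instructions]
print(opnames)
```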

And there's a built-in helper function to print out a bunch of this useful information for us.

(elaborate)
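The notes don't name the helper, so this sketch assumes it is `dis.show_code`, which prints a summary of a code object. `dis.code_info` returns the same summary as a string, which makes it easier to inspect.

```python
import dis

# "<example>" is an arbitrary filename label for this sketch.
code = compile('print("may the force be with you")', filename="<example>", mode="exec")

# code_info returns the summary that show_code would print.
info = dis.code_info(code)
print(info)
```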

But... what does all of this look like in tree form?
print("may the force be with you")
Here's the AST and disassembled bytecode for a print statement, side by side.

This is pretty readable at this point! We can see that there's a single statement
so we don't need any nesting in our AST.
We'll call LOAD_NAME for our print function, LOAD_CONST for our string that we
passed in, and then simply execute and return.

Simple enough?
if a == 23:
    print("may the force be with you")
But as we continue to add code, even just a simple if statement...
a = 32
if a == 23:
    print("may the force be with you")
... or variable declaration, we see that our examples get progressively more complex,
but our AST still helps us visualize the path that our code takes.
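To reproduce the larger example, the same two calls work unchanged. This is only a sketch; the dump and the disassembly both vary in detail across Python versions, so the check below looks only at the top-level statement types.

```python
import ast
import dis

source = '''a = 32
if a == 23:
    print("may the force be with you")
'''

tree = ast.parse(source, mode="exec")
print(ast.dump(tree))

# Two top-level statements: the assignment and the if.
statement_types = [type(node).__name__ for node in tree.body]
print(statement_types)

dis.dis(compile(tree, filename="<ast>", mode="exec"))
```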

However, a caveat to all of this is that what we wind up with in our AST or bytecode is not necessarily going to match our
source code 1 to 1. There are certain shortcuts that our compiler takes when creating our
bytecode, so let's take a look....

Current Optimizations

... at what sorts of compiler optimizations are affecting our bytecode.
Python's compiler is purposefully simple

Peephole Optimizer

Few AST optimizations besides constant folding

Python's compiler is purposefully simple, or as simple as can be.

The best way to optimize Python is to replace the implementation with another one,
like PyPy, Jython, Cython (strictly a compiler rather than an interpreter), IronPython, Stackless Python, even MicroPython.

Python, the language, can be completely independent of the implementation used to bring the language to life.

Peephole Optimizations

Looking around without moving your head.

x = 1
y = x + 2

y = 1 + 2

One of the other interesting examples of a peephole optimization is the example seen here. Essentially,
the optimization takes excessive or redundant logic and simplifies it for you.

Constant Folding

Evaluating constant expressions at compile time.

Constant folding is the process of recognizing and evaluating constant expressions at compile time rather than computing them at runtime. Terms in constant expressions are typically simple literals, such as the integer literal 2, but they may also be variables whose values are known at compile time.

a = 2 * 3

Now, not all languages do this, and some don't evaluate all expressions into constants, but a simple example of this is taking 2 times 3...

a = 6

...and replacing it with 6.
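We can check whether CPython folded the expression by looking for a binary-operation instruction in the bytecode. The helper `uses_binary_op` is a made-up name for this sketch, and it relies on the assumption that CPython's arithmetic opcodes start with `BINARY_` (for example `BINARY_MULTIPLY` in older versions and `BINARY_OP` in newer ones).

```python
import dis


def uses_binary_op(source):
    """Return True if the compiled source still performs arithmetic at runtime."""
    code = compile(source, filename="<example>", mode="exec")
    return any(
        instruction.opname.startswith("BINARY_")
        for instruction in dis.get_instructions(code)
    )


# Both operands are literals, so the multiplication happens at compile time.
folded_case = uses_binary_op("a = 2 * 3")

# x is only known at runtime, so the multiplication stays in the bytecode.
runtime_case = uses_binary_op("a = x * 3")

print(folded_case, runtime_case)
dis.dis(compile("a = 2 * 3", filename="<example>", mode="exec"))
```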

How The AST Affects Your Code

Bytecode is created using an AST

Bytecode is stored in *.pyc files

VERY micro speed-ups

As of Python 3.2, *.pyc and *.pyo files are neatly organized into __pycache__ directories.
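If you're curious where a given file's bytecode would be cached, the standard library can tell you. In this sketch `example.py` is a hypothetical filename; nothing is read from or written to disk.

```python
import importlib.util

# Compute the cache path for a (hypothetical) source file.
pyc_path = importlib.util.cache_from_source("example.py")
print(pyc_path)
```

The printed path includes a tag for the running interpreter, so it differs between Python versions and implementations.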