Looks like we can give a PyDict of global variables where we give None now.

Setting variables inside a PyDict looks easy enough: PyDict::set_item
only requires the key and value to implement ToPyObject,
which is already implemented for a lot of types, including strings, integers, vectors, and many more.

Let’s see how it works, by defining five = 5 from Rust:

    fn run_python(code: &str) {
        let py = pyo3::Python::acquire_gil();
        let globals = pyo3::types::PyDict::new(py.python());
        // "five" and 5 are automatically converted to a PyObject by ToPyObject.
        globals.set_item("five", 5).unwrap();
        if let Err(e) = py.python().run(code, Some(globals), None) {
            e.print(py.python());
            panic!("Python code failed");
        }
    }
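To get a feel for why set_item can take "five" and 5 without any explicit conversion, here is a toy model of the pattern using only the standard library. Everything in it (Value, ToValue, Dict) is made up for illustration and is not PyO3's actual API; the point is only that a generic method bounded on a conversion trait accepts any type that has an implementation.

```rust
use std::collections::HashMap;

// Hypothetical stand-in for a Python object.
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Int(i64),
    Str(String),
    List(Vec<Value>),
}

// Hypothetical stand-in for a conversion trait in the spirit of ToPyObject.
trait ToValue {
    fn to_value(&self) -> Value;
}

impl ToValue for i32 {
    fn to_value(&self) -> Value {
        Value::Int(i64::from(*self))
    }
}

impl ToValue for &str {
    fn to_value(&self) -> Value {
        Value::Str(self.to_string())
    }
}

// Collections convert element by element.
impl<T: ToValue> ToValue for Vec<T> {
    fn to_value(&self) -> Value {
        Value::List(self.iter().map(|item| item.to_value()).collect())
    }
}

// Hypothetical stand-in for the globals dictionary.
#[derive(Default)]
struct Dict {
    items: HashMap<String, Value>,
}

impl Dict {
    // Accepts any value whose type implements the conversion trait.
    fn set_item<V: ToValue>(&mut self, key: &str, value: V) {
        self.items.insert(key.to_string(), value.to_value());
    }

    fn get_item(&self, key: &str) -> Option<&Value> {
        self.items.get(key)
    }
}
```

With this in place, calls like set_item("five", 5) and set_item("list", vec![1, 2]) both compile, because the compiler picks the matching ToValue implementation for each argument type.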

Transparent syntax

Now that we know converting objects from Rust to Python is not going to be a problem (thanks to PyO3’s ToPyObject),
we can move on to the problem of how the user will indicate which of their variables need to be converted and injected in the globals dictionary.

The most ergonomic way would probably be something that’s completely transparent:

    let a = 5;
    python! {
        b = 10
        print(a + b)
    }

If we could make this work, users could just refer to any variable in scope as if they didn’t even switch languages.
Sounds perfect.

However, our procedural macro does not have access to the surrounding code,
so it would not know that a even exists in the Rust code.
In order to know a comes from Rust, but b does not,
it’d have to understand the Python code and see that a is used without being initialized,
but b is initialized inside the Python code.
This goes much further than parsing b = 10 and understanding that it defines b, meaning it should not be captured.
For example, print is not explicitly defined anywhere, but it does not refer to anything from Rust.
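To make the problem concrete, here is a deliberately naive sketch of such a heuristic, using only the standard library. The function name naive_captures and its rules are invented for this illustration: it treats every name that is never the target of a simple `name = value` line as something to capture from Rust.

```rust
use std::collections::BTreeSet;

/// Returns true if `s` looks like a plain identifier.
fn is_identifier(s: &str) -> bool {
    let mut chars = s.chars();
    matches!(chars.next(), Some(c) if c.is_alphabetic() || c == '_')
        && chars.all(|c| c.is_alphanumeric() || c == '_')
}

/// Naive heuristic: every identifier that is used but never assigned with
/// `name = value` is assumed to come from Rust.
fn naive_captures(python: &str) -> BTreeSet<String> {
    let mut assigned = BTreeSet::new();
    let mut used = BTreeSet::new();
    for line in python.lines() {
        // Treat `name = value` as a definition of `name`.
        if let Some((lhs, _)) = line.split_once('=') {
            let lhs = lhs.trim();
            if is_identifier(lhs) {
                assigned.insert(lhs.to_string());
            }
        }
        // Collect every identifier-like word on the line.
        for word in line.split(|c: char| !(c.is_alphanumeric() || c == '_')) {
            if is_identifier(word) {
                used.insert(word.to_string());
            }
        }
    }
    used.difference(&assigned).cloned().collect()
}
```

For the input "b = 10\nprint(a + b)" this returns both a and print. The heuristic correctly leaves out b, but it has no way to know that print is a Python builtin rather than a Rust variable.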

If we think a bit more about this, it only gets worse:

    let a = 5;
    python! {
        from somelibrary import *
        globals()['c'] = 10
        print(a + b + c)
    }

Without fully parsing and running all of the Python code, we can’t possibly do this.

So, there needs to be some way the user can tell the macro which variables need to be captured.

Capture list

Capturing variables…
That sounds like something closures (sometimes referred to as lambdas) do.
Maybe we can draw some inspiration from there.

In Rust, a closure is defined using ||:

    let a = 5;
    let f = |b| a + b;
    assert_eq!(f(10), 15);

Here, f behaves like a function. It takes one argument (b), and returns a + b.
a was implicitly captured from the environment.
This implicit capturing is exactly what we discussed before,
which is not feasible for python!{}.

I’ve omitted commas or any other syntax apart from the [] and the names,
to keep parsing as simple as possible.
If it turns out to work well, we can always improve the syntax later.

Implementation

Alright, let’s see if we can implement this.

We’ll have to parse the capture list
and generate code that calls globals.set_item("var", var) for each variable.
This code should end up in run_python and executed after making the globals dictionary,
but before executing Python::run.

Passing arbitrary code to a function is easy in Rust:
using impl Fn, run_python can accept a closure containing the code with all the set_item calls.
We’ll have to give the closure access to the PyDict:
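As a stand-alone sketch of that shape, the following uses a HashMap as a stand-in for the PyDict so that it runs without PyO3. The names Globals, run_python_sketch and example_call are assumptions made for this illustration; the real function would run the Python code where the sketch only returns the dictionary.

```rust
use std::collections::HashMap;

// Stand-in for the PyDict of globals (an assumption for this sketch).
type Globals = HashMap<String, i64>;

fn run_python_sketch(code: &str, set_variables: impl Fn(&mut Globals)) -> Globals {
    let mut globals = Globals::new();
    // Run the caller-supplied code after creating the dictionary...
    set_variables(&mut globals);
    // ...and before the point where the real function would execute `code`.
    let _ = code;
    globals
}

// Roughly what a call generated for the captures `a` and `b` could look like.
fn example_call() -> Globals {
    let a = 5;
    let b = 10;
    run_python_sketch("print(a + b)", |globals| {
        globals.insert(stringify!(a).to_string(), a);
        globals.insert(stringify!(b).to_string(), b);
    })
}
```

The closure borrows a and b from the surrounding scope, which is exactly the access to the user's variables that run_python itself does not have.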

That #(..)*-syntax is a quote!-feature
which will repeat its contents as many times as needed. In this case, for each element of captures.
We again use stringify!() to turn the variable name into a string.

Hold up. a unused? b unused? name 'a' not defined? What is going on?
Warnings aside, the code compiled fine. So the macro did generate valid code.

But it’s wrong, somehow.

Now how do we debug our procedural macro?

There is an unstable rustc option called --pretty=expanded, which will show us the code after all macro expansions.
Unlike what the name suggests, the output is usually not very pretty, so it’s a good idea to pass the output through rustfmt for readability.
The cargo rustc command allows us to pass custom options to rustc:

    // What it generated:
    globals.set_item("\"a\"", "a").expect("Conversion failed");
    // What we wanted:
    globals.set_item("a", a).expect("Conversion failed");

Instead of just the identifier a, quote!() produced the string literal "a".
So instead of (stringify!(a), a), we got (stringify!("a"), "a"), which expands to ("\"a\"", "a").
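The difference is easy to reproduce in isolation. This small sketch (the function names are only for illustration) shows what stringify!() does with an identifier versus a string literal:

```rust
// What we wanted: the identifier `a` becomes the key, the variable is the value.
fn wanted() -> (&'static str, i32) {
    let a = 5;
    (stringify!(a), a)
}

// What we got: the string literal "a" is stringified including its quotes.
fn generated() -> (&'static str, &'static str) {
    (stringify!("a"), "a")
}
```

Here wanted() yields the key a, while generated() yields a key that is three characters long: a quote, the letter a, and another quote.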

We can’t really blame quote!() here, because we did ask it to insert a String (from captures, which is a Vec<String>).
It makes sense that it’ll turn Strings into string literals.

To fix this, we should instead give it Idents, which represent identifiers.

Looking at the documentation of Ident,
we see that Idents are made from a string (the name) and a Span.
This span is used not only for compiler errors to display their location, but also for macro hygiene.
It makes sure that names defined in macros do not get mixed up with names on the outside.

In this case, we do want to refer to variables on the outside, and it’d be nice
if errors (e.g. about a variable not existing) would point to the place where
the user named it in the capture list.
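Hygiene can be seen in action without writing a procedural macro. The sketch below uses macro_rules! instead, as an analogy that runs with the standard library alone; the macro names are invented for this example. A name the macro makes up itself stays separate from the caller's variables, while an identifier passed in by the caller keeps referring to the caller's scope.

```rust
// `x` here is created inside the macro, so hygiene keeps it apart
// from any `x` at the call site.
macro_rules! define_own_x {
    () => {
        let x = 10;
        let _ = x;
    };
}

// `$name` is an identifier that came from the caller, so the binding
// is visible at the call site.
macro_rules! define_given {
    ($name:ident) => {
        let $name = 10;
    };
}

fn hygiene_demo() -> (i32, i32) {
    let x = 1;
    define_own_x!();
    let after_own = x; // still the caller's x
    define_given!(x);
    let after_given = x; // now shadowed by the macro's binding
    (after_own, after_given)
}
```

This is the same reason for reusing the Idents that came out of the user's tokens: they carry the information that ties them to the user's scope.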

This means we should not make our own Idents, but simply use the original ones we got out of the TokenStream.
This is a pretty simple change to get_captures. All we have to do is change the return type:

The documentation of proc-macro2
explains it is simply a wrapper around proc_macro,
but one which also allows using it outside procedural macros, where the compiler-provided proc_macro crate doesn’t exist.
Useful for unit testing and more.

For procedural macros, it suggests converting proc_macro::TokenStreams
directly to proc_macro2::TokenStreams before doing anything else,
since crates like quote and syn use proc_macro2 for everything.

Thanks to quote, generating the right code was quite easy.
Quite ergonomic how it allows placeholders like #a to substitute variables.

Uh. Wait. Placeholders. That gives me an idea.

Placeholder syntax

What if we use quote's solution, instead of the one we stole from C++'s closures?

    let a = 3;
    let b = 20;
    python! {
        c = 100
        print(#a + #b + c)
    }

I like it.

Now, is #a the best option? Or should we use @a, $a, ^a, rust:a, «a», or something else?

First of all, it needs to be something that the Rust tokeniser allows.
As we’ve seen in part 1A, there’s no way around that.

Then, it’s important to pick something that doesn’t already have a meaning in Python.
# is used for comments, @ is used for decorators, etc.

Another consideration is syntax highlighting.
Users of our macro will probably be writing their code in an editor that knows how to syntax-highlight Rust code,
but knows nothing about our python!{} macro.
It’d be nice if our syntax doesn’t completely break syntax highlighting,
or better: if our placeholders would show up in some recognizable way.

To do that, we need to pick some existing Rust syntax, which has no meaning in Python.

The first thing that comes to mind is lifetime syntax: 'a.
Most Rust editors will understand the ' and the name afterwards belong together,
and although single quoted strings are already a thing in Python,
those are already unusable in our macro anyway.
(See part 1A for details.)
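A quick way to check that Rust's tokenizer is happy with 'a outside of a real lifetime position is to pass it through a small macro_rules! macro. This is only a sanity check with invented names, not part of the python!{} implementation:

```rust
// Accepts a single lifetime-style token and turns it into text.
macro_rules! token_text {
    ($placeholder:lifetime) => {
        stringify!($placeholder)
    };
}

// The placeholder as the tokenizer sees it, quote included.
fn placeholder_text() -> &'static str {
    token_text!('a)
}

// The variable name the placeholder refers to.
fn placeholder_name() -> &'static str {
    placeholder_text().trim_start_matches('\'')
}
```

The macro input is just tokens, so no lifetime named 'a has to be declared anywhere for this to compile.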

Sounds like this can work!

Here’s how that would look:

    let a = 3;
    let b = 20;
    python! {
        c = 100
        print('a + 'b + c)
    }

Nice. Let’s do this.

Implementation

What do we need to do to implement this? Let’s see…

Throw out the get_captures function and call.

Modify our reconstruction function to detect 'a-syntax, replace it by a variable, and remember the Ident for later.

Modify the quote!() to use our new list of Idents.

Done?

Users might refer to the same variable multiple times, but we should capture them only once.
So it’s probably a good idea to use a set or map to store all the placeholders we find:

    struct Source {
        // <snip>
        captures: BTreeMap<String, Ident>,
    }
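Here is a small sketch of how such a map deduplicates placeholders, using only the standard library. FakeIdent is a hypothetical stand-in for Ident that records where a name was first seen, and collect_captures is likewise invented for this example.

```rust
use std::collections::BTreeMap;

// Hypothetical stand-in for an Ident: the name plus where it first appeared.
#[derive(Debug, Clone, PartialEq)]
struct FakeIdent {
    name: String,
    position: usize,
}

fn collect_captures(placeholders: &[&str]) -> BTreeMap<String, FakeIdent> {
    let mut captures = BTreeMap::new();
    for (position, name) in placeholders.iter().enumerate() {
        // Only the first occurrence of each name is stored.
        captures
            .entry(name.to_string())
            .or_insert_with(|| FakeIdent {
                name: name.to_string(),
                position,
            });
    }
    captures
}
```

A side effect of choosing a BTreeMap is that iterating over the captures visits the names in sorted order, so the generated code does not depend on the order in which the placeholders were found.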

In our Source::reconstruct_from function,
we now need to look for '-tokens and extract the token after it.

We now no longer want to process exactly one token per iteration,
but sometimes consume the next one as well (to see what’s after a ').
This means we can’t easily use a for-loop anymore,
as it doesn’t give us access to the underlying iterator.
Using a while let lets us define the iterator ourselves:

    let mut input = input.into_iter();
    while let Some(t) = input.next() {
        // Now we can use input.next() to consume more tokens.
    }
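The same loop shape can be tried out on plain strings. In this sketch, string slices stand in for tokens, and the _rust_ prefix for the replacement variable is a made-up naming scheme; the real function works on a TokenStream and stores Idents.

```rust
// Rebuilds source text from "tokens", replacing each `'name` pair by a
// variable name and remembering which names were captured.
fn reconstruct(tokens: &[&str]) -> (String, Vec<String>) {
    let mut source = String::new();
    let mut captures = Vec::new();
    let mut input = tokens.iter();
    while let Some(&t) = input.next() {
        if t == "'" {
            // Consume the token after the quote as well.
            if let Some(&name) = input.next() {
                source.push_str("_rust_");
                source.push_str(name);
                captures.push(name.to_string());
            }
            // A quote at the very end is silently dropped here;
            // a real macro would want to report an error instead.
        } else {
            source.push_str(t);
        }
    }
    (source, captures)
}
```

Because the loop owns the iterator, the body is free to pull an extra token whenever it sees a quote, which a for-loop over the same iterator would not allow.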

Note how it points to both the ' and c.
rustc knows these two tokens belong together,
because we used a syntax it is already familiar with.

What’s next

Now we have a way to very easily get data into our python!{} blocks.
It works for everything that implements ToPyObject, including strings, numbers,
and all kinds of collections.

What’s still missing is a nice way to get data out.
We’ll look at that in a later post,
but first we’re going to look at a very different topic.

In part 3,
we’re going to ‘compile’
the Python code into Python byte-code at compile time, to catch errors like invalid Python syntax before even running it.
This also speeds up execution times,
by moving part of the work into the compilation step.