Planet Scheme

June 05, 2020

Given an array A consisting of N integers, return the maximum sum of two numbers whose digits add up to an equal sum. If there are not two numbers whose digits have an equal sum, the function should return -1. For example, A = [51, 71, 17, 42] would output 93 because there are two sets of numbers with the same digit-sum, (51, 42) with a digit-sum of 6 and (17, 71) with a digit-sum of 8, and the first pair has the maximum sum of two numbers of 93.

Your task is to write a program to calculate the requested maximum sum. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.
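A sketch of one possible approach in Guile Scheme (names are mine, not the official suggested solution): track the largest number seen so far for each digit sum, and pair each new number against it.

```scheme
(use-modules (srfi srfi-1))   ; fold

;; Sum of the decimal digits of a non-negative integer.
(define (digit-sum n)
  (if (< n 10)
      n
      (+ (remainder n 10) (digit-sum (quotient n 10)))))

;; Maximum a+b over pairs with equal digit sums, or -1 if none exist.
(define (max-pair-with-equal-digit-sum lst)
  (let ((best (make-hash-table)))     ; digit sum -> largest number seen
    (fold (lambda (x answer)
            (let* ((d (digit-sum x))
                   (prev (hash-ref best d)))
              (when (or (not prev) (> x prev))
                (hash-set! best d x))
              (if prev (max answer (+ prev x)) answer)))
          -1
          lst)))
```

On the example, `(max-pair-with-equal-digit-sum '(51 71 17 42))` considers 17 against 71 (sum 88) and 42 against 51 (sum 93), and returns 93.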

June 03, 2020

Greets, my peeps! Today's article is on a new compiler for Guile. I made things better by making things worse!

The new compiler is a "baseline compiler", in the spirit of what modern web browsers use to get things running quickly. It is a very simple compiler whose goal is speed of compilation, not speed of generated code.

Honestly I didn't think Guile needed such a thing. Guile's distribution model isn't like the web, where every page you visit requires the browser to compile fresh hot mess; in Guile I thought it would be reasonable for someone to compile once and run many times. I was never happy with compile latency but I thought it was inevitable and anyway amortized over time. Turns out I was wrong on both points!

The straw that broke the camel's back was Guix, which defines the graph of all installable packages in an operating system using Scheme code. Lately it has been apparent that when you update the set of available packages via a "guix pull", Guix would spend too much time compiling the Scheme modules that contain the package graph.

The funny thing is that it's not important that the package definitions be optimized; they just need to be compiled in a basic way so that they are quick to load. This is the essential use-case for a baseline compiler: instead of trying to make an optimizing compiler go fast by turning off all the optimizations, just write a different compiler that goes from a high-level intermediate representation straight to code.

So that's what I did!

it don't do much

The baseline compiler skips any kind of flow analysis: there's no closure optimization, no contification, no unboxing of tagged numbers, no type inference, no control-flow optimizations, and so on. The only whole-program analysis that is done is a basic free-variables analysis so that closures can capture variables, as well as assignment conversion. Otherwise the baseline compiler just does a traversal over programs as terms of a simple tree intermediate language, emitting bytecode as it goes.
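As a toy illustration of the idea (entirely my own sketch, nothing like Guile's actual code): a baseline compiler is essentially one recursive traversal that emits stack-machine instructions as it walks the tree, with no analysis passes at all.

```scheme
;; Hypothetical mini-language: numbers, variables, and binary + / *.
;; Each expression compiles directly to stack-machine "bytecode"
;; in a single post-order walk.
(define (compile-expr e)
  (cond
   ((number? e) `((push ,e)))
   ((symbol? e) `((local-ref ,e)))
   ((pair? e)
    (append (compile-expr (cadr e))
            (compile-expr (caddr e))
            (case (car e)
              ((+) '((add)))
              ((*) '((mul)))
              (else (error "unknown operator" (car e))))))
   (else (error "bad expression" e))))
```

`(compile-expr '(+ 1 (* x 2)))` emits `((push 1) (local-ref x) (push 2) (mul) (add))`: one pass over the tree, one instruction sequence out, no flow analysis anywhere.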

Interestingly, the quality of the code produced at optimization level -O0 is pretty much the same between the two compilers.

This graph shows the generated-code performance of the CPS compiler relative to the new baseline compiler, at optimization level 0. Bars below the line mean the CPS compiler produces slower code; bars above mean CPS makes faster code. Note that the Y axis is logarithmic.

The tests in which -O0 CPS wins are mostly because the CPS-based compiler does a robust closure optimization pass that reduces allocation rate.

At optimization level -O1, which adds partial evaluation over the high-level tree intermediate language and support for inlining "primitive calls" like + and so on, I am not sure why CPS peels out in the lead. No additional important optimizations are enabled in CPS at that level. That's probably something to look into.

Note that the baseline of this graph is optimization level -O1, with the new baseline compiler.

But as I mentioned, I didn't write the baseline compiler to produce fast code; I wrote it to produce code fast. So does it actually go fast?

Well against the -O0 and -O1 configurations of the CPS compiler, it does excellently:

Here you can see comparisons between what will be Guile 3.0.3's -O0 and -O1, compared against their equivalents in 3.0.2. (In 3.0.2 the -O1 equivalent is actually -O1 -Oresolve-primitives, if you are following along at home.) What you can see is that at these optimization levels, for these 8 files, the baseline compiler is around 4 times as fast.

If we compare to Guile 3.0.3's default -O2 optimization level, or -O3, we see bigger disparities:

Also of note is that -O0 and -O1 take essentially the same time, with -O1 often taking less time than -O0. This is because partial evaluation can make the program smaller, at a cost of being less straightforward to debug.

Similarly, -O3 usually takes less time than -O2. This is because -O3 is allowed to assume top-level bindings that aren't exported from a module can be transformed to lexical bindings, which are more available for contification and inlining, which usually leads to smaller programs; it is a similar debugging/performance tradeoff to the -O0/-O1 case.

But what does one gain when choosing to spend 10 times more on compilation? Here I have a gnarly graph that plots performance on some microbenchmarks for all the different optimization levels.

Like I said, it's gnarly, but the summary is that -O1 typically gets you a factor of 2 or 4 over -O0, and -O2 often gets you another factor of 2 above that. -O3 is mostly the same as -O2 except in magical circumstances like the mbrot case, where it adds an extra 16x or so over -O2.

worse is better

I haven't yet seen numbers for this new compiler in Guix, but I hope it can have a good impact. Already in Guile itself, though, I've seen a couple of interesting advantages.

One is that because it produces code faster, Guile's bootstrap from source can take less time. There is also a felicitous feedback effect in that because the baseline compiler is much smaller than the CPS compiler, it takes less time to macro-expand, which reduces bootstrap time (as bootstrap has to pay the cost of expanding the compiler, until the compiler is compiled).

The second fortunate result is that now I can use the baseline compiler as an oracle for the CPS compiler, when I'm working on new optimizations. There's nothing worse than suspecting that your compiler miscompiled itself, after all, and having a second compiler helps keep me sane.

Although this work has been ongoing throughout the past month, I need to add some words on the now before leaving you: there is a kind of cognitive dissonance between nerding out on compilers in the comfort of my home, rain pounding on the patio, and at the same time the world on righteous fire. I hope it is clear to everyone by now that the US police are an essentially racist institution: they harass, maim, and murder Black people at much higher rates than whites. My heart is with the protestors. Godspeed to you all, from afar. At the same time, all my non-Black readers should reflect on the ways they participate in systems that support white supremacy, and on strategies to tear them down. I know I will be. Stay safe, wear eye protection, and until next time: peace.

June 02, 2020

A number n may have squares hidden among its digits. For instance, in the number 1625649, the consecutive digits 1, 4, 9, 16, 25, 49, 64, 256 and 625 are all squares.

Your task is to write a program that takes a positive integer n and finds all of its hidden squares. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.
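A sketch of one approach in Scheme (not the official suggested solution): enumerate all substrings of the decimal representation, keep those that are perfect squares, and skip substrings with a leading zero (including a bare "0").

```scheme
(use-modules (srfi srfi-1))   ; delete-duplicates

;; #t if k is a perfect square.
(define (square? k)
  (call-with-values (lambda () (exact-integer-sqrt k))
    (lambda (s r) (zero? r))))

;; All distinct squares hidden among the consecutive digits of n,
;; in increasing order.
(define (hidden-squares n)
  (let* ((s (number->string n))
         (len (string-length s))
         (acc '()))
    (do ((i 0 (+ i 1))) ((= i len))
      (unless (char=? (string-ref s i) #\0)   ; no leading zeros
        (do ((j (+ i 1) (+ j 1))) ((> j len))
          (let ((k (string->number (substring s i j))))
            (when (square? k)
              (set! acc (cons k acc)))))))
    (sort (delete-duplicates acc) <)))
```

`(hidden-squares 1625649)` returns `(1 4 9 16 25 49 64 256 625)`, matching the example above.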

May 29, 2020

Given an array of integers, determine if it is in decreasing-increasing order, with some initial segment of the array in decreasing order and the remainder of the array in increasing order. If the array is in decreasing-increasing order, return the pivot element (the minimum element in the array); otherwise, return an indication the array is not in decreasing-increasing order. Array elements may be duplicated. For example, the array (10 10 10 8 8 6 4 4 3 12 13 22 31 40 59 68) is in decreasing-increasing order, with pivot element 3, but the array (1 2 4 8 12 32 64) is not in decreasing-increasing order.

The student who asked the question suggests a solution using binary search that takes O(log n) time, but can’t get it to work.

Your task is to write a program to determine if an array is in decreasing-increasing order as described above. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.
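A linear-time sketch in Scheme over lists (mine, not the suggested solution; the O(log n) binary-search version is left to the exercise). It requires at least one strict decrease before the increasing part, which rules out the purely increasing (1 2 4 8 12 32 64); a purely decreasing list is accepted, with an empty increasing remainder.

```scheme
;; Return the pivot element, or #f if the list is not in
;; decreasing-increasing order.
(define (decreasing-increasing-pivot lst)
  (let down ((xs (cdr lst)) (prev (car lst)) (decreased? #f))
    (cond ((null? xs) (and decreased? prev))   ; purely decreasing
          ((<= (car xs) prev)                  ; still going down (ties allowed)
           (down (cdr xs) (car xs)
                 (or decreased? (< (car xs) prev))))
          ((not decreased?) #f)                ; rose before ever decreasing
          (else                                ; prev is the pivot; the rest
           (let up ((xs xs) (last prev))       ; must be non-decreasing
             (cond ((null? xs) prev)
                   ((>= (car xs) last) (up (cdr xs) (car xs)))
                   (else #f)))))))
```

On the examples: the first list yields the pivot 3, and the second yields #f.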

How many numbers below fifty million can be expressed as the sum of a prime square, prime cube, and prime fourth power?

Your task is to solve Project Euler 87; in the spirit of Project Euler, show only your code but not the solution. When you are finished, you are welcome to read
or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.
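In that spirit, here is a Scheme sketch that computes the count without printing it here. The sieve bound of 7072 is just above the square root of fifty million, since even the squared prime must stay below the limit.

```scheme
(define limit 50000000)

;; Sieve of Eratosthenes: list of primes below n.
(define (primes-below n)
  (let ((v (make-vector n #t)))
    (vector-set! v 0 #f)
    (vector-set! v 1 #f)
    (do ((i 2 (+ i 1))) ((>= (* i i) n))
      (when (vector-ref v i)
        (do ((j (* i i) (+ j i))) ((>= j n))
          (vector-set! v j #f))))
    (let loop ((i (- n 1)) (acc '()))
      (if (< i 2)
          acc
          (loop (- i 1) (if (vector-ref v i) (cons i acc) acc))))))

;; Count distinct n < limit expressible as p^2 + q^3 + r^4
;; with p, q, r prime; duplicates are removed via a hash table.
(define (prime-power-triples)
  (let ((seen (make-hash-table))
        (ps (primes-below 7072)))
    (for-each
     (lambda (r)
       (when (< (expt r 4) limit)
         (for-each
          (lambda (q)
            (let ((qr (+ (expt r 4) (expt q 3))))
              (when (< qr limit)
                (for-each
                 (lambda (p)
                   (let ((n (+ qr (* p p))))
                     (when (< n limit)
                       (hash-set! seen n #t))))
                 ps))))
          ps)))
     ps)
    (hash-fold (lambda (key val count) (+ count 1)) 0 seen)))
```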

May 19, 2020

[ My apologies that this exercise is so long delayed. We have been rewriting all of our business practices in the light of the pandemic, and work has been madness. The biggest projects are now complete, and we have retreated to merely super-busy, so maybe I will have time to write some more exercises. I hope everyone is well; I am. ]

Today’s task is to implement a minstack data structure that provides the normal stack operations push, pop and top and also the operation least which returns the smallest item in the stack without altering the stack in any way. All four operations must operate in O(1) time and O(n) space, where n is the size of the stack.

Your task is to implement a minstack as described above. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.
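One classic way to meet the O(1) bound (a sketch; the names are mine): store with each stack cell the minimum of everything at or below it, so least is just a field read.

```scheme
;; A minstack is a mutable box holding a list of (value . min-below) cells.
(define (make-minstack) (list '()))

(define (minstack-empty? s) (null? (car s)))

(define (minstack-push! s x)
  (let* ((cells (car s))
         ;; minimum of the new stack: x or the old minimum, whichever is less
         (m (if (null? cells) x (min x (cdar cells)))))
    (set-car! s (cons (cons x m) cells))))

(define (minstack-top s) (car (caar s)))
(define (minstack-least s) (cdr (caar s)))

(define (minstack-pop! s)
  (let ((x (minstack-top s)))
    (set-car! s (cdar s))
    x))
```

Each operation touches only the head cell, so push, pop, top and least are all O(1), and the per-cell pair keeps the space at O(n).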

May 17, 2020

One of the things which sets Guix apart from other GNU/Linux
distributions is that it uses GNU
Shepherd instead of the now
ubiquitous systemd. A side effect of this is that user systemd units do
not work on Guix System. Whether you love systemd, hate it, or are
extremely ambivalent toward it, this means that users cannot rely on
already-written systemd unit files for their regular user-level services.

There are a couple of benefits to using GNU Shepherd, and not all of
them are due to it already being installed on Guix. Becoming comfortable
with using Shepherd and understanding how to write and edit Shepherd
service configurations makes the transition from other GNU/Linux
distributions to Guix System easier. More complex services with their
own logic tree, using the full power of GNU
Guile, are also possible. This
means you can have one service that behaves differently if it's running
on a different system or architecture without needing to call out to
shell scripts or using minimally different service definitions.

The GNU Shepherd manual
suggests
putting all the services inside a
monolithic init.scm file, located by default at
$XDG_CONFIG_DIR/shepherd/init.scm. While this does make it easy to keep
everything in one place, it does create one glaring issue: any change
to the file means that all the services need to be stopped and
restarted for the changes to take effect.

Luckily there's a nice function called scandir hiding in ice-9 ftw
which returns a list of all files in a specified directory (with options
for narrowing down the list or sorting it). This means that our init.scm
can contain a minimum of code and all actual services can be loaded from
individual files.
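A minimal init.scm along these lines might look as follows (a sketch; the init.d directory name and layout are my choice, not mandated by Shepherd):

```scheme
(use-modules (shepherd service)
             ((ice-9 ftw) #:select (scandir)))

;; Load every ".scm" file found in ~/.config/shepherd/init.d/,
;; each of which defines and registers one service.
(let ((dir (string-append (getenv "HOME") "/.config/shepherd/init.d")))
  (for-each
   (lambda (file)
     (load (string-append dir "/" file)))
   (or (scandir dir (lambda (file) (string-suffix? ".scm" file)))
       '())))

;; Send shepherd into the background.
(action 'shepherd 'daemonize)
```

With this in place, adding or editing a service means touching only its own file and restarting that one service.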

As with any other shepherd service it is defined and registered, and in
this case it will start automatically. When the file is loaded by
shepherd after being discovered by scandir everything works exactly as
though the service definition were located directly inside the init.scm.
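A service file of that kind — say init.d/syncthing.scm — might look like this (a sketch using the GOOPS <service> API from the Shepherd manual; the flags and log path are illustrative):

```scheme
(define syncthing
  (make <service>
    #:provides '(syncthing)
    #:docstring "Run syncthing without opening the browser"
    #:start (make-forkexec-constructor
             '("syncthing" "-no-browser")
             #:log-file (string-append (getenv "HOME")
                                       "/log/syncthing.log"))
    #:stop (make-kill-destructor)
    #:respawn? #t))

(register-services syncthing)
```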

Now let's make a change. Since syncthing already has a -logfile flag and
built-in log rotation, that sounds better than using shepherd's
#:log-file option. First we'll make our changes to the service:
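The changed service might look like this (again a sketch; the -logfile path is illustrative): the #:log-file keyword is dropped and syncthing's own flag is passed instead.

```scheme
(define syncthing
  (make <service>
    #:provides '(syncthing)
    #:docstring "Run syncthing without opening the browser"
    #:start (make-forkexec-constructor
             (list "syncthing" "-no-browser"
                   (string-append "-logfile=" (getenv "HOME")
                                  "/.config/syncthing/syncthing.log")))
    #:stop (make-kill-destructor)
    #:respawn? #t))

(register-services syncthing)
```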

In this example I want to refresh my font cache but I don't want to
actually install fontconfig either system-wide or in my profile.
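The fccache service started in the transcript below might be sketched as follows (hypothetical; make-system-constructor runs a shell command, and #:one-shot? — available in recent Shepherd versions — marks a service that runs once and stops):

```scheme
(define fccache
  (make <service>
    #:provides '(fccache)
    #:docstring "Run 'fc-cache -frv'"
    #:start (make-system-constructor
             "guix environment --ad-hoc fontutils -- fc-cache -frv")
    #:one-shot? #t))

(register-services fccache)
```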

$ which fc-cache
which: no fc-cache in (/home/user/.config/guix/current/bin:/home/user/.guix-profile/bin:/home/user/.guix-profile/sbin:/run/setuid-programs:/run/current-system/profile/bin:/run/current-system/profile/sbin)
$ herd start fccache
Service fccache has been started.

Of course we can import other modules and leverage the code already
written there. In this case, instead of using the shell command string
"guix environment --ad-hoc fontutils -- fc-cache -frv", let's use the
guix-environment function already available in (guix scripts environment):
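That substitution might look something like this (a sketch; I am assuming guix-environment accepts its arguments in command-line form):

```scheme
(use-modules (guix scripts environment))

(define fccache
  (make <service>
    #:provides '(fccache)
    #:docstring "Run 'fc-cache -frv'"
    #:start (lambda _
              (guix-environment "--ad-hoc" "fontutils"
                                "--" "fc-cache" "-frv"))
    #:one-shot? #t))

(register-services fccache)
```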

The problem with this approach is that guix-environment returns the
exit code of the programs it calls, while #:start expects a constructor
to return #t or #f, so there's some work to be done here.

This was just a quick peek into what's possible with GNU Shepherd when
run as a user. Next time we'll take a look at integrating
mcron to replicate some of
systemd's timer functionality.

About GNU Guix

GNU Guix is a transactional package
manager and an advanced distribution of the GNU system that respects
user
freedom.
Guix can be used on top of any system running the kernel Linux, or it
can be used as a standalone operating system distribution for i686,
x86_64, ARMv7, and AArch64 machines.

In addition to standard package management features, Guix supports
transactional upgrades and roll-backs, unprivileged package management,
per-user profiles, and garbage collection. When used as a standalone
GNU/Linux distribution, Guix offers a declarative, stateless approach to
operating system configuration management. Guix is highly customizable
and hackable through Guile
programming interfaces and extensions to the
Scheme language.

May 15, 2020

While I was writing a manpage for Scheme’s quasiquote,
something I saw surprised me and changed my understanding of
quasiquote. It turns out that a new language, with semantics that are
interesting to PLT enthusiasts, hides behind the innocent backtick
character. Starting with R6RS Scheme, quasiquote became total
magic.

Background

It is not going to be easy to understand the argument in this article
if you lack some background knowledge, so here’s a brief explanation
of quasiquote, Scheme’s concept of locations, Scheme’s handling of
literal constants, and finally referential transparency.

Briefly on quasiquote

Quasiquote is a language feature in Scheme that lets you write a
template for a structure of lists and vectors. These templates are
more like web templates than C++ templates; don’t let the terminology
confuse you.

Basically you write a backtick character to start a template. The code
immediately following the backtick is the template. You write a comma
wherever you want to fill in some variable or other expression.
(There’s also a list-splicing version of the comma which often comes in
handy).
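For example (,@ is the list-splicing comma):

```scheme
(define x 42)

;; The backtick starts the template; each comma fills in a value,
;; and ,@ splices a list into place.
`(a b ,x ,@(list 1 2))
;; => (a b 42 1 2)
```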

Quasiquote is very useful to build SXML and SHTML. Forget learning a
new templating system every week; this one’s a keeper. But this being
Scheme, the most popular use is likely to write S-expressions that
represent code in some language. It’s used for just that
in the nanopass framework.

Location, location, location!

Making a language is a difficult job. Everything should ideally work
smoothly together as a coherent whole, like pineapples on a pizza.
Objects in Scheme programs implicitly refer to locations. There are
many details around that which affect the whole language, and they
have interesting consequences for quasiquote.

What’s a location? It’s just a place where you can store a value. The
vector #(1 2 3) has three locations where values are stored, and
currently it’s the numbers 1, 2 and 3. If you make a vector using
make-vector then the vector is mutable and you can change the
values. Later when you see the same vector again it will still contain
the new values.

In practice a location is some address in memory, but the garbage
collector might move it around, so its address changes, but it is
still the same location.

Other objects in Scheme do not have locations. The number 1 does not
have any locations that hold values that you can change. This is only
because of the wisdom, kindheartedness and foresight of the language
designers, because it is possible to design things differently.

As a consequence of numbers not having locations, there is also very
little point in worrying about which number object you have. Suppose
that numbers did have locations and you could store important
information in them. You’d be very concerned that the 1 number
object where you stored all your passwords is the same 1 that you
now have in your secrets variable. (Nobody will ever think to look
inside the number 1 for your passwords, so your secrets are safe
there.) But number objects do not actually have locations, so it
doesn’t matter if the Scheme implementation fiddles with them behind
your back and gives you a different 1 object the next time you’re
looking.

Literal constants

Pairs and vectors have locations, but the rules for their locations
are much relaxed if they are literal constants in the program code.

Constants in Scheme are allowed to have read-only locations. If you
compile a program with Loko Scheme, you will notice that you get an
assertion if you try to change any of the constants. From R6RS:

It is desirable for constants (i.e. the values of literal
expressions) to reside in read-only memory. To express this, it is
convenient to imagine that every object that refers to locations is
associated with a flag telling whether that object is mutable or
immutable. Literal constants, the strings returned by
symbol->string, records with no mutable fields, and other values
explicitly designated as immutable are immutable objects, while all
objects created by the other procedures listed in this report are
mutable. An attempt to store a new value into a location referred to
by an immutable object should raise an exception with condition type
&assertion.

Literal constants can also share locations. If the same constant
appears in different places in the program then the compiler is
allowed to create a single shared instance of the constant, here
explained as it applies to structures, in the section on eqv?:

Implementations may share structure between constants where
appropriate. Thus the value of eqv? on constants is sometimes
implementation-dependent.
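Concretely (my example, not from the report):

```scheme
(define (f) '(1 2))
(define (g) '(1 2))

;; May be #t or #f depending on whether the compiler coalesced the
;; two textually distinct constants into one shared location.
(eqv? (f) (g))
```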

So when it comes to literal constants, Scheme’s normal storage rules
do not apply. A program might find that two different locations have
become the same location, so that changing the value in a quoted
vector ends up changing the value in another quoted vector. It’s also
very likely that the program gets an exception when it tries to change
the value in such a location. The last example shows that going the
other way is not allowed: the compiler is not allowed to create two
different versions of the list (a) in that program.
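The example referred to here is missing from this copy; the point can be illustrated like this (my sketch): a procedure that returns a literal must return the same location every time, so the compiler cannot split one literal into two.

```scheme
(define (f) '(a))

;; Both calls return the object stored in the single location of the
;; literal (a), so this must be #t.
(eq? (f) (f))
```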

Referential transparency

Referential transparency is a concept that is important in purely
functional programming languages. An expression that is referentially
transparent can be replaced by the values it returns.

Some expressions in Scheme are referentially transparent. Constants,
references to variables that are never mutated, arithmetic in general,
type predicates, etc. A Scheme compiler is allowed to replace (+ 1
2) with 3. It doesn’t matter that the program might actually have
returned a “different” 3 each time the expression runs. In the same
way it doesn’t matter if the compiler turns two different constants
into the same constant.

Most parts of Scheme are not referentially transparent. As an example,
a Scheme compiler cannot replace (vector 1 2 3) with '#(1 2 3). The
locations created by the vector procedure need to be fresh and
mutable. But it can replace (vector-ref '#(1 2 3) 0) with 1, so
this expression is referentially transparent. And as we previously
saw, it can replace (cdr '(a b)) with '(b).

All of this should be fairly widely known, but now comes the
interesting part.

There is a crack in everything

So far I have explained Scheme’s notion of locations, referential
transparency and that the rules are different for literal constants.

A quasiquote expression may return either fresh, mutable objects or
literal structure for any structure that is constructed at run time
during the evaluation of the expression. Portions that do not need to
be rebuilt are always literal. Thus,

This part of R6RS originally came from Kent
Dybvig’s formal comment #204. The same type of language was
adopted in R7RS.

The meaning is that a quasiquoted expression can be turned into a
literal, or parts may be turned into literals. Where there was code in
the quasiquote expression, there can now be a literal. Going the other
direction is not allowed: literals cannot be turned into code that
returns fresh, mutable structure. But as the example '((1 2) 3 4 five
6) shows, a compiler is allowed to even propagate constants into
quasiquote.

There is a very deep rabbit hole here! Have a look again: return […]
literal structure for any structure that is constructed at run
time during the evaluation of the expression.

There is now a way to construct literals from run time code, but to do
so ahead of run time.

Literal magic

Let me demonstrate the power of Scheme’s magic quasiquote. Let -->
mean “equivalent to”. It can be the result of an expansion or another
compiler pass, such as a partial evaluator. Here is the original,
innocuous-looking example:

(let ((a 3)) `((1 2) ,a ,4 ,'five 6))
; -->
'((1 2) 3 4 five 6)

Can we get literal structure copied into the constant part? Easy:

(let ((a '(3))) `((1 2) ,a ,4 ,'five 6))
; -->
'((1 2) (3) 4 five 6)

But we’re just getting started. Can we construct a structure at
runtime and have that appear as a constant? Of course!

`((1 2) ,(list 3) ,4 ,'five 6)
;; -->
'((1 2) (3) 4 five 6)

Those Schemers who are paying attention will be thinking I’ve gone mad
now. Maybe I have, but this example simply returned literal structure
for the (list) structure that was constructed at run time during the
evaluation of the expression, to paraphrase the quasiquote
specification. Let’s increase the volume:

`((1 2) ,(map + '(1 1) '(2 3)) ,'five 6)
;; -->
'((1 2) (3 4) five 6)

Hang on, shouldn’t map return a fresh, mutable list? Not anymore,
this is quasiquote. The map function constructs a list at run time
during the evaluation of the quasiquote expression, so the structure
no longer needs to be fresh. (Besides, the R6RS and R7RS definitions
of map do not actually say that the list needs to be fresh and
mutable, but everyone probably assumes it does.)
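(The code example the next paragraph refers to is missing from this copy of the article; it presumably resembled the following hypothetical reconstruction.)

```scheme
(use-modules (srfi srfi-1))   ; iota

;; iota builds its list with cons at run time, yet inside quasiquote
;; the compiler may return literal structure for it.
`((1 2) ,(iota 3) ,'five 6)
;; -->
'((1 2) (0 1 2) five 6)
```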

Is this valid? I think most would say it isn’t; the list created by
iota is constructed with cons, which returns fresh, mutable pairs.
But this is happening inside quasiquote where the normal rules of
society break down. The iota procedure constructs a list structure
at runtime during the evaluation of a quasiquote expression, so a
compiler is allowed to return literal structure for that list
structure.

These go to 11

Let’s crank it up and make quasiquote transform code.

Compilers like Chez Scheme, Loko Scheme, Guile Scheme and many others
use partial evaluators to perform source-level optimizations. A
partial evaluator runs code symbolically. The output is a new program
that is hopefully more efficient than the program that went into it.

The partial evaluators used by Scheme compilers are pretty mild as far
as partial evaluators go, mostly because of the semantics of Scheme.
Doing more powerful transformations on Scheme programs would require
quite powerful static analysis, and that is both slow and difficult.

To get quasiquote to work with code, we need something that enables a
partial evaluator to run a given procedure in such as way that it’s
always inside a quasiquote expression. If we have such a procedure
then the partial evaluator can start using the tricks described above,
and treat all code that constructs new structures as if their
structures were literals. This makes the partial evaluator very happy,
so here is happy:

(define (happy proc)
  (lambda x
    `,(apply proc x)))

In a normal Scheme implementation this operator doesn’t do much more
than maybe waste a little time and space. But in a Scheme that knows
the magic nature of quasiquote, it would enable powerful program
transformations on lists and vectors, without the need for as much
analysis as it would normally require. In particular, it should no
longer be necessary to analyze if intermediate results are mutated,
nor to analyze if programs check results for pointer equality with
eq?.

Here is an illustrative example of a potential program transformation,
based on Philip Wadler’s 1990 paper Deforestation: Transforming
programs to eliminate trees:
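(The original example is missing here; a hypothetical reconstruction in the same spirit, summing the squares of a list:)

```scheme
(use-modules (srfi srfi-1))   ; fold

;; Before: map builds an intermediate list that fold then consumes.
(define (f xs)
  (fold + 0 (map (lambda (x) (* x x)) xs)))

;; After deforestation: one loop, no intermediate list.
(define (f-fused xs)
  (let loop ((xs xs) (acc 0))
    (if (null? xs)
        acc
        (loop (cdr xs) (+ acc (* (car xs) (car xs)))))))
```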

After deforestation (or fusion), the intermediate lists used in
f have been eliminated. This is beneficial in that you can write
high-level code, but still have the compiler produce the efficient
loop you would have had to write by hand. Scheme compilers normally
don’t do these transformations due to the required analysis.

The happy operator does not completely open the barn doors: the
transformation still needs to not change other program side effects,
such as I/O and exceptions.

The fly in the ointment

Imagine a program written according to this template:

(define (main)
  ...)

(define main* (happy main))

(main*)

What are the consequences for the main program? It would seem that
everything in it follows the rules of quasiquote and it can’t use
cons in the normal way. This is bad news for the main program.

Conclusions

I don’t know where this leads. What is the precise limit for what a
compiler can and can’t turn into literal structure in quasiquote? The
example with the main program makes it seem that quasiquote actually
gives the compiler a bit too much freedom.

Perhaps it’s actually just poor wording, so R6RS and R7RS will get a
new erratum that clarifies what is and what isn’t allowed. I suspect
that this is the most likely outcome.

But it doesn’t stop someone who is working on a partial evaluator or
another program transformation from proposing a happy operator as a
SRFI, giving it semantics that enable even more powerful
transformations, but without the need to rely on language lawyering.

There is one conclusion I can draw from all this: don't assume that
what comes out of quasiquote can be mutated.

May 12, 2020

We are thrilled to announce that three people will join Guix as interns
over the next few months! As part of Google’s Summer of Code (GSoC),
under the umbrella of the GNU Project, one person is joining us:

Brice Waegeneire (liberdiko) will work on network booting Guix System.
This will involve both making Guix System network bootable,
and making it easy to set up a network boot server on Guix System.

Through Outreachy, the internship program
for groups underrepresented in free software and tech, two people will
join:

Danjela will work on improving internationalization support for the
Guix Data Service,

Raghav Gururajan will work on integrating desktop environments
into Guix System.

Christopher Baines and Danny Milosavljevic will be their
primary mentors, and the whole Guix crowd will undoubtedly help and
provide guidance as it has always done.

We welcome all three interns; exciting things are sure to come!


May 06, 2020

Guix includes a mechanism called grafts that allows us to provide
users with security
updates
in a timely fashion, even for core packages deep down in the dependency
graph. Most users value the benefits of grafts, but newcomers were also
unavoidably struck by what turned out to be the undesirable side effect
of our graft implementation on user experience. This had been a
well-known problem for a while, but 1.1.0 finally addressed these
issues.

This article recaps how grafts are implemented, what problems that
caused, and how we solved it. It’s a deep dive into core Guix, and I
hope it’ll be insightful to all and intriguing to the functional
programming geeks among us!

What’s this “graft” thing anyway?

Grafts were introduced in the early days of Guix to address probably
the main practical shortcomings of functional software
deployment.
In a nutshell, functional deployment as implemented by Nix and Guix
means that, when a package changes, everything that depends on it must
be rebuilt (or re-downloaded). To deploy a security fix in the C
library or in Bash, you would thus need to rebuild everything. Even
with a huge build farm, that can significantly delay the deployment of
fixes; users would find themselves either rebuilding things locally or,
at best, re-downloading binaries of everything.

To address this, Guix developers can instead specify a replacement in
a package
definition.
If we have a bug-fix for, say, libc, developers would (1) define a
package for the fixed libc, and (2) add a replacement field in the
original libc package pointing to that fixed package. The effect is
that only the bug-fix libc needs to be built. When building a
package, the bug-fix libc is automatically grafted onto that package,
such that the resulting package refers to the bug-fix libc. See the
manual
for more.
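In package-definition terms, the sketch looks roughly like this (field names as in the Guix manual; the package contents and patch name are hypothetical, and the replacement field is thunked, which is why the forward reference works):

```scheme
(define-public glibc
  (package
    (name "glibc")
    (version "2.31")
    ;; ... source, build system, inputs ...
    (replacement glibc/fixed)))

;; The fixed libc: same package, plus a security patch.  Only this
;; package is built; Guix then grafts it onto dependents' outputs.
(define glibc/fixed
  (package
    (inherit glibc)
    (source (origin
              (inherit (package-source glibc))
              (patches (append (origin-patches (package-source glibc))
                               (search-patches "glibc-security-fix.patch")))))))
```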

When “lowering” a high-level package
definition
to a low-level
derivation,
Guix traverses the package dependency graph and identifies a set of
potentially applicable grafts. Why “potentially applicable”? Consider
this scenario: assume perl has a replacement; coreutils has a
dependency on perl, but it’s a build-time dependency: coreutils does
not depend on perl at run time. Thus, coreutils can be used as is,
there is no need to graft it.

But how do we know whether a dependency is a build-time-only dependency?
The native-inputs
field
of a package usually lists build-time dependencies, but it’s more of a
hint. Ultimately, the set of run-time dependencies, which we call the
references, is the subset of the build-time dependencies that the
garbage collector (GC) in the build daemon finds in the build
result—Section 5.5.1 of Eelco Dolstra’s PhD
thesis describes how the
GC
scans for references. In our example, we first have to actually build
coreutils before we can tell whether it depends on perl at
run time.

Guix arranges to graft only when necessary. In this example, guix build coreutils would return the same derivation as guix build coreutils --no-grafts. Conversely, since inkscape has a run-time dependency on
perl, guix build inkscape returns a derivation that grafts the
perl replacement onto the original inkscape build result, the one
returned by guix build inkscape --no-grafts. The (simplified)
dependency graph of the derivation for the grafted inkscape looks like
this:

Grafts are a form of what Build Systems à la
Carte
by Mokhov et al. (a good read!) refers to as dynamic dependencies:
grafting depends on intermediate build results.

Still here? With the background in place, let’s look at the problems
that arose.

Grafts, the user interface, and performance

Conceptually, to decide whether to graft a package, we examine the
references of the build result of the ungrafted package. However, we
usually want guix install & co. to first display an overall build
plan, especially when invoked with --dry-run:

To accommodate that, the pre-1.1.0 implementation of grafts did the
following: when substitutes were enabled, it would get the list of
references of ungrafted packages
from substitutes; only when substitutes for an ungrafted package are
missing would it first try to build that package. Thus, when
substitutes are available, guix install and similar commands would be
able to display the build plan upfront. However, when a package had no
substitutes, you’d see Guix start building it without saying a word
about the build plan, which was arguably
confusing.

But it’s worse than that. Grafting is per-package, so every time you
would lower a package to a derivation, you would need to answer the
question “does this specific package have substitutes, and if so,
should it be grafted?” The end result was poor resource usage and
terrible user-interface feedback. For every package that is a graft
candidate, the user would see that infamous line:

updating substitutes from 'https://ci.guix.gnu.org'...

The problem was particularly acute when building whole systems with
guix system because there would typically be a large number of such
packages. Furthermore, each of these lines would correspond to
(roughly) a single HTTP GET request on a fresh TLS connection. That can
be slow… and annoying. Perhaps to some users this “updating
substitutes” stuttering was proof of the developers’ incompetence and
perhaps, truth be told, to some of us developers it was a small price to
pay for the sophistication of grafts.

For users who disable substitutes and build everything locally, the
situation wasn’t much better: all the packages that were candidates for
grafting would be built one by one, thereby missing parallelization
opportunities as specified by --max-jobs.

Gathering dynamic dependencies

To address this, all these individual dynamic dependencies need to be
gathered somehow instead of being treated one by one. Conceptually, we
would like to, roughly, do a first pass lowering packages to derivations
as if grafting was disabled, build all these derivations, and then do a
second pass to determine which packages in the graph need to be grafted and
to compute the relevant grafting derivation. That would address the
performance issue: we’d now have as much parallelism as possible, so we
wouldn’t query substitutes or build packages one by one. If we reify
that second pass to the user interface code, it also addresses the user
interface issue by allowing it to display, possibly, two build plans:
the “ungrafted” one followed by the grafted one.

The problem is that our API is inherently serial: the
package-derivation function takes one package, lowers it, and
returns its derivation:

Lowering includes dealing with grafts, and
that’s why we ended up with one-by-one inefficiencies. An option would
be to make all the API “plural”: have package-derivation and its
friends accept a list of packages instead of a single one. That would
be a huge amount of work and the end result would be unpleasant to use:
it’s easier to reason one-by-one.

The solution implemented in 1.1.0 instead starts from this observation:
the call graph of package-derivation mirrors the package graph. Thus,
we could gather dynamic dependencies using monad
trickery
or using “control effects”. We went for the latter, which didn’t have
the “contamination” problem of monads and led to simpler code.

The starting point is that, by definition, code with dynamic
dependencies necessarily calls build-derivations. Taking advantage of
delimited continuations in Guile, build-derivations is instrumented to
abort to a “build handler” prompt when it’s called. The build handler
receives the list of derivations to
build along with a continuation to invoke to resume the aborted
computation and start building things. User interface code can install
a build handler that displays what’s going to be built:
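The snippet from the original post is not reproduced here, but the
mechanism can be sketched in plain Guile with the core
delimited-continuation primitives. Everything below (build-prompt,
build-derivations*, with-build-handler*) is a hypothetical stand-in for
illustration, not Guix’s actual API:

```scheme
;; Self-contained sketch of the "build handler" mechanism, using Guile's
;; core call-with-prompt / abort-to-prompt.  Starred names are
;; hypothetical stand-ins, not Guix's actual API.

(define build-prompt (make-prompt-tag "build"))

;; Stand-in for build-derivations: abort to the handler with the list of
;; derivations to build; resuming the continuation returns control here.
(define (build-derivations* drvs)
  (abort-to-prompt build-prompt drvs))

;; Install a build handler around THUNK.
(define (with-build-handler* handler thunk)
  (call-with-prompt build-prompt
    thunk
    (lambda (continue drvs)
      (handler continue drvs))))

;; A handler that records the plan, then resumes the computation.
;; For a dry run, a handler would simply not call CONTINUE.
(define plan #f)
(define result
  (with-build-handler*
   (lambda (continue drvs)
     (set! plan drvs)             ;display what's going to be built
     (continue 'done))            ;resume and actually "build"
   (lambda ()
     (build-derivations* '(coreutils.drv inkscape.drv)))))
```

Guix’s real handler receives more context than this (notably the store
connection), but the control flow is the same idea.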

To implement dry runs, simply omit the call to continue and nothing
will be built. (This is a slightly simplified artist’s view; see
build-notifier
for the real thing.)

Now, we need to take advantage of this mechanism to gather the
individual build-derivations calls so we can later emit a single
build-derivations call for all the gathered derivations. The goal is
to effectively gather all the calls for ungrafted packages, build them
all at once, and then resume graft computation.

To achieve that, we write a build handler that, when invoked, returns an
<unresolved> object that captures what to build and the continuation.
In addition, we provide a primitive to introduce parallelism such
that, if a dynamic dependency is encountered, we keep going and attempt
to compute as much as possible without resolving that dynamic
dependency. These are build-accumulator and
map/accumulate-builds.
map/accumulate-builds is like map, except that it accumulates and
gathers build-derivations requests.
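Here is a self-contained sketch of that accumulate-then-resume idea in
plain Guile. The starred names are hypothetical; this is not Guix’s
actual implementation:

```scheme
(use-modules (srfi srfi-1)    ;any, append-map
             (srfi srfi-9))   ;define-record-type

;; Sketch of map/accumulate-builds: map PROC over LST; whenever PROC
;; aborts to the build prompt, store an <unresolved> object holding the
;; request and the captured continuation, and keep going.  Afterwards,
;; "build" all gathered requests in one batch and resume each
;; continuation.  (Aborts happening after a resume are not handled.)

(define build-prompt (make-prompt-tag "build"))

(define (build-derivations* drvs)
  (abort-to-prompt build-prompt drvs))

(define-record-type <unresolved>
  (unresolved requests continue)
  unresolved?
  (requests unresolved-requests)
  (continue unresolved-continue))

(define (map/accumulate-builds* proc lst)
  (define results
    (map (lambda (x)
           (call-with-prompt build-prompt
             (lambda () (proc x))
             (lambda (k reqs) (unresolved reqs k))))
         lst))
  (if (any unresolved? results)
      (let ((batch (append-map unresolved-requests
                               (filter unresolved? results))))
        ;; A single build-derivations call for BATCH would go here.
        (format #t "building ~s in one batch~%" batch)
        (map (lambda (r)
               (if (unresolved? r)
                   ((unresolved-continue r) 'built)  ;resume
                   r))
             results))
      results))
```

For example, mapping a procedure that “builds” every even number over
'(1 2 3 4) triggers one batched build for requests (2) and (4) instead
of two separate ones.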

By using map/accumulate-builds instead of map in a few key places, we
obtain a good approximation of what we wanted, as illustrated in this
run:

What we see above is first a build plan that downloads binaries for the
two ungrafted packages, followed by a build plan for one grafting
derivation: we have successfully preserved parallelism.

The solution resembles the suspending scheduler discussed in the à la
Carte paper, though the decomposition is not as principled as what the
paper describes. It remains an approximation and not the optimal way to
deal with dynamic dependencies. There are still
situations where that shows,
but overall, it’s a significant improvement. Unlike other solutions
prototyped before, this one has the advantage of being orthogonal and
simple: fewer than 100 new lines of code, and about 30 lines removed
from the graft implementation. That alone contributes a lot to the
author’s satisfaction. :-)

Interlude: a scatter/gather pattern?

In the end, we’re just gathering all the build-derivations calls,
turning them into a single call, and finally calling all the original
site continuations with the result. The same kind of issue shows up
when dealing with sequences of remote procedure calls (RPCs) and HTTP
requests, and it seems there’s a more general pattern lurking here.
Consider code like this:

(map (lambda (thing)
       (http-get (thing->url thing)))
     lst)

Wouldn’t it be nice if we could somehow capture all the http-get
calls, turn them into a series of pipelined GET
requests, and resume the
continuations with their result?

I haven’t found a standard functional pattern to address this and would
welcome ideas!

Dynamic dependencies of all shapes

We have seen how Guix deals with dynamic dependencies. Nix supports a
similar but limited form of dynamic dependencies through the import
primitive of the Nix language, which can take the result of a derivation
build; it does not attempt to gather the resulting buildPaths calls.

Another form of dynamic dependency is derivation-building derivations,
or recursive derivations, which were recently implemented in Nix. With
these, the build process of a derivation can itself create and build
derivations (moldable tasks, in scheduling parlance). It’s a great
feature because, in a nutshell, it allows Nix to be used not only to
compose packages, but also at a finer grain as part of a package build
process.

Guix supports yet another form of dynamic dependencies. The newfangled
guix deploy tool works by evaluating g-expressions (gexps) remotely.
For example, before actually deploying an operating system, it first
runs code on the remote node to perform sanity checks: checking whether
the declared file system UUIDs or labels are valid, checking whether
additional kernel modules should be added to the initial RAM disk, and
so forth. To do that, remote-eval first builds a derivation that
produces a Scheme program, deploys it
along with all its dependencies on that target machine, runs it, and
retrieves the result. This form of dynamic dependency also benefits
from the gathering machinery discussed above.

Conclusion

This is a long article on what may look like a fine point of Guix design
and implementation, but there’s much to say about it! Grafts are key to
the use of functional deployment in production because they enable quick
security updates, and it’s a lot better if they don’t harm the user
experience.

The pre-1.1.0 implementation of grafts had a negative impact on the user
interface and on performance, which was due to the sequential handling
of grafts, one package at a time. In 1.1.0 we addressed it by using
delimited continuations to gather dynamic dependencies such as grafts,
perform builds in bulk, and resume each derivation computation.

As it turned out, the implementation of dynamic dependencies raises lots
of interesting design and implementation issues, and it’s probably not
the end of the road!

About GNU Guix

GNU Guix is a transactional package
manager and an advanced distribution of the GNU system that respects
user
freedom.
Guix can be used on top of any system running the kernel Linux, or it
can be used as a standalone operating system distribution for i686,
x86_64, ARMv7, and AArch64 machines.

In addition to standard package management features, Guix supports
transactional upgrades and roll-backs, unprivileged package management,
per-user profiles, and garbage collection. When used as a standalone
GNU/Linux distribution, Guix offers a declarative, stateless approach to
operating system configuration management. Guix is highly customizable
and hackable through Guile
programming interfaces and extensions to the
Scheme language.

May 04, 2020

This blog post is to announce a change of hands in the Guix
co-maintainer collective: Ricardo Wurmus is stepping down from his
role, and Mathieu Othacehe will be filling in to ensure continuity,
after being elected by the other Guix co-maintainers.

Ricardo has been around since the start, and has been invaluable to
the project. He has been key in maintaining the infrastructure Guix
runs on, contributed countless packages, core APIs and tools
(importers, build systems, and Docker image creation to name a few).
Over the years, he's also brought us a fair share of cool hacks such
as a nifty issue tracker, and generously
spent time helping Guix users in the IRC channel and mailing lists.
Equally important was his taking care of many administrative tasks
such as expanding the build farm and organizing Outreachy
participation. We're sad to let him go, and hope he'll stick around
as time permits :-).

On the happier side of things, the appointment of Mathieu Othacehe as
a co-maintainer means Guix will benefit from renewed energy and vision
to grow further. Mathieu has already made valuable contributions to
Guix; the graphical installer that allows users to easily install the
Guix System on their machine is one of them. He has also
demonstrated the qualities we expect from a co-maintainer. We're
thrilled to make official his new role as a Guix co-maintainer!

Let's take a moment to show our gratitude to Ricardo and welcome
Mathieu in his new role!

The Guix co-maintainers

The Guix maintainer collective now consists of Marius Bakke, Maxim
Cournoyer, Ludovic Courtès, Tobias Geerinckx-Rice and Mathieu
Othacehe. You can reach us all by email at
guix-maintainers@gnu.org, a private alias.

For information about the responsibilities assumed by the Guix
co-maintainers, you are encouraged to read a previous blog
post
that covered the topic.


It’s been 11 months since the previous release, during which 201 people
contributed code and packages. This is a long time for a release, which
is in part due to the fact that bug fixes and new features are
continuously delivered to our users via guix pull. However, a
number of improvements, in particular in the installer, will greatly
improve the experience of first-time users.

It’s hard to summarize more than 14,000 commits! Here are some
highlights as far as tooling is concerned:

The new guix system describe
command tells you which commits of which channels were used to
deploy your system, and also contains a link to your operating
system configuration file. Precise provenance tracking that gives
users and admins the ability to know exactly what changed between
two different system instances! This feature builds upon the new
provenance service.

guix pack
has improved support for generating Singularity and Docker images,
notably with the --entry-point option.

3,514 packages were added, for a total of more than 13K
packages. 3,368 packages were
upgraded. The distribution comes with GNU libc 2.29, GCC 9.3,
GNOME 3.32, MATE 1.24.0, Xfce 4.14.0, Linux-libre 5.4.28, and
LibreOffice 6.4.2.2 to name a few.

The remote-eval procedure in (guix remote)
supports remote execution of Scheme code as G-expressions after
having first built and deployed any code it relies on. This
capability was key to allowing code sharing between guix deploy,
which operates on remote hosts, and guix system reconfigure.
Similarly, there’s a new
eval/container
procedure to run code in an automatically-provisioned container.

The new lower-gexp procedure returns a low-level intermediate
representation of a G-expression. remote-eval, eval/container,
and gexp->derivation are expressed in terms of lower-gexp.

The
with-parameters
form allows you, for instance, to pin objects such as packages to a
specific system or cross-compilation target.

That’s a long list! The NEWS
file
lists additional noteworthy changes and bug fixes you may be interested
in.

Enjoy!


April 10, 2020

[ I continue working at home. It’s no fun, and less productive than working in my office, but I’m coping. There remains much emergency work to be done related to moving all of our students online. I hope everyone is doing well during this time. ]

A student recently asked for help on a beginning-programmer forum to write a Scheme program that finds the nth item in a linked list, mimicking the list-ref built-in function. He posted his code, which was awful; I won’t repost it here. Instead of engaging him, I sent a private email suggesting that he consult either his professor or his teaching assistant, as his posted code showed several misconceptions about Scheme. He wrote back, saying he was sure there was only one thing wrong with his code and I could easily point it out. I didn’t respond, as there was far more than one thing wrong with his code.

Your task is to write a program to find the nth item in a linked list; your program must be recursive, as was required by the original assignment. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.
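For what it’s worth, here is one straightforward recursive solution
(mine, not necessarily the suggested one):

```scheme
;; Return the nth item (zero-indexed, like list-ref) of a linked list,
;; using only recursion: count N down while cdr-ing down the list.
(define (nth-item lst n)
  (cond ((null? lst) (error "index out of range" n))
        ((zero? n) (car lst))
        (else (nth-item (cdr lst) (- n 1)))))

(nth-item '(a b c d e) 2)  ; => c
```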

April 08, 2020

There’s been a bit of speculation as to whether our April 1st
post
was a joke. Part of it was a joke: we’re not deprecating Linux-libre,
fear not! But when we published it, it was already April 2nd in Eastern
parts of the world and thus, not surprisingly, the remainder of the post
was less of a joke.

Getting to a bootable system

For all of you who tried our April 1st image and ran guix, we sure hope
you had a good laugh. We set out to cross-build that virtual machine
(VM) image using Guix and, while we made good progress on Wednesday, in
the end we decided to cheat to make the release deadline.

What we got stuck on for a while was getting past the ext2fs translator
(the user-land process that implements the ext2 file system), which
seemingly froze on boot, saying:

start ext2fs:

and then nothing... Running ext2fs cross-built with Guix on
Debian GNU/Hurd would hang similarly. The kernel debugger would show an
intriguing backtrace in ext2fs suggesting that ext2fs was not
handling page fault
messages.
Long story short: we eventually realized that the server interfaces were
compiled with a 64-bit MiG whereas
we were targeting a 32-bit platform. From there on, we embarked on a
delightful hacking journey ensuring the Hurd boot process would
correctly run in our VM up to a proper login prompt.

Today we have a humble gift for you: on the wip-hurd-vm branch (Update:
this has now been merged!) we have an initial hurd.scm system
description that can be used to cross-build a VM running the Hurd.

Running:

./pre-inst-env guix build -f gnu/system/hurd.scm

cross-compiles all the relevant packages for GNU/Hurd—specifically the
i586-pc-gnu triplet—and produces a VM image:

/gnu/store/yqnabv1zmlkviwzikc23w9qvfnyfwvj7-qemu-image

You can build it and start it from your GNU/Linux machine with this
command:

Woohoo! (Actually we already have more stuff not shown here, such as
guix itself running… for a future post! :-))

Why bother?

Why bother with the Hurd anyway? Isn’t it a pipe dream or “vaporware”,
depending on one’s perspective? There’s some unquestionable truth in
that: we know that Hurd development started in the early 90’s, months
before Linux development started, and yet it still lacks so much in
terms of hardware support, even though significant progress was made in
recent years in particular with the use of Rump
kernels.

The more we witness how new features are retrofitted into the kernel
Linux, the more we think the Hurd’s design is better suited to today’s
needs. Linux
namespaces, the
foundation of “containers”, are such an example of an afterthought;
unprivileged user namespaces, which allow unprivileged users to benefit
from lightweight “container” virtualization, are still often disabled by
distros due to a lack of confidence. This is in sharp contrast with the
Hurd’s inherent unrestricted support for fine-grained virtualization: a
PID namespace is just another proc server, a file system namespace is
just another root file system server, and so on. Container-like
lightweight virtualization is native on the Hurd.

Last but not least, with an eye on the security and transparency of free
software systems, a microkernel-based system seems to lend itself
naturally to bootstrapping from a reduced trusted base. This is one of
the topics we discussed at the last Reproducible Builds Summit.

The question is not so much whether 2020 or 2021 will be the year of the
Hurd. It’s more about the kind of systems we want to build. A lot of
work remains to be done, but we think, in 2020 more than ever, that this
is a promising approach for the betterment of the security of our
systems and the freedom of users.

We also have to admit that this is an amazing system to hack on, even
more so when combined with Guix, so… happy hacking! :-)

About GNU Guix

GNU Guix is a transactional package
manager and an advanced distribution of the GNU system that respects
user
freedom.
Guix can be used on top of any system running the Hurd or the Linux
kernel, or it can be used as a standalone operating system distribution
for i686, x86_64, ARMv7, and AArch64 machines.


About the GNU Hurd

The GNU Hurd is the GNU project's
replacement for the Unix kernel. It is a collection of servers that
run on the Mach microkernel to implement file systems, network
protocols, file access control, and other features that are
implemented by the Unix kernel or similar kernels (such as Linux).
More
info.

The mission of the GNU
Hurd
project is to create a general-purpose kernel suitable for the GNU
operating system, which is viable for everyday use, and gives users
and programs as much control over their computing environment as
possible.

April 03, 2020

[ I continue working from home. It is interesting that my commute has gone from a daily 75-minute round-trip by car to a 75-foot round-trip by foot, so I think I should have more energy in the evening, but the opposite is true; I am exhausted by the end of the day, just from sitting in front of my laptop. ]

Today’s exercise is somebody’s homework:

Given positive integers C and N, find N numbers that sum to C such that the difference between the highest and the lowest of these numbers is no more than one. For example: with C = 26 and N = 7, the desired output is [4 4 4 4 4 3 3].

Your task is to write a program to solve the student’s homework. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.
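A sketch of one possible solution (not necessarily the suggested one):
with q = C div N and r = C mod N, output r copies of q+1 followed by
N-r copies of q.

```scheme
;; Split C into N numbers summing to C whose max and min differ by at
;; most one: r copies of q+1 and n-r copies of q, where q = c div n
;; and r = c mod n.
(define (split-sum c n)
  (let ((q (quotient c n))
        (r (remainder c n)))
    (append (make-list r (+ q 1))
            (make-list (- n r) q))))

(split-sum 26 7)  ; => (4 4 4 4 4 3 3)
```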