This post introduces how mdoc evaluates Scala code examples with good
performance while reporting clear error messages. mdoc is a markdown
documentation tool inspired by tut.

Like tut, mdoc reads markdown files as input and produces markdown files as
output with the Scala code examples evaluated. Unlike tut, mdoc does not use the
Scala REPL to evaluate Scala code examples. Instead, mdoc translates each
markdown file into a regular Scala program that evaluates in one run. In this
post, we look into the implications of this change and how it can deliver up to
27x faster performance when processing invalid documents.

REPL semantics

A key feature of the REPL is that it shows you the value of an expression right
after you type it. Although this feature is great for explorative programming,
it can be limiting when writing larger programs.

It's not possible for object User to be a companion of class User because
they're defined in separate objects $line3 and $line4. This encoding is
required for the REPL because we need to eagerly evaluate each expression as
it's typed. However, this limitation can be lifted if we know the entire program
ahead of time, which is the case when evaluating all Scala code examples in
markdown files.
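
The wrapping described above can be sketched roughly as follows. The names Line3 and Line4 stand in for the REPL's synthetic $line3 and $line4 wrappers, and the bodies of User are assumptions made for illustration, not the REPL's literal output.

```scala
// REPL-style encoding (approximation): every input gets its own wrapper object.
object Line3 {
  class User(val name: String)
}
object Line4 {
  // Not a companion of Line3.User: it lives in a different enclosing object,
  // so it could not call a private constructor or see private members.
  object User {
    def apply(name: String): Line3.User = new Line3.User(name)
  }
}

// Program-style encoding: both definitions share one scope.
object Program {
  class User private (val name: String)
  // A true companion, so calling the private constructor is allowed.
  object User {
    def apply(name: String): User = new User(name)
  }
}
```

Making the constructor in Line3 private would break Line4.User, while the same change compiles fine inside Program.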

Program semantics

Instead of using the REPL to eagerly evaluate individual expressions, mdoc
builds a single Scala program from all code examples in the markdown file and
evaluates them in one run. This approach is possible because we know which
statements appear in the document. For example, consider the following markdown
document.

```scala mdoc
val x = 1
```

```scala mdoc
println(x)
```

This document gets translated by mdoc into roughly the following instrumented
Scala program.
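
mdoc's exact output is not reproduced here. As a rough sketch of the idea, assuming a hypothetical Recorder helper that is not part of mdoc's real runtime API, the two code fences could end up in one program like this:

```scala
import java.io.{ByteArrayOutputStream, PrintStream}
import scala.collection.mutable.ListBuffer

// Hypothetical helper (assumption): records bound values and captured stdout.
final class Recorder {
  val binders: ListBuffer[(String, Any)] = ListBuffer.empty
  val output = new ByteArrayOutputStream()

  // Remembers the name and value of a definition, then returns the value.
  def binder[A](name: String, value: A): A = {
    binders += (name -> value)
    value
  }

  // Runs a statement while redirecting Console output into `output`.
  def statement[A](body: => A): A = {
    val stream = new PrintStream(output, true)
    try Console.withOut(stream)(body)
    finally stream.flush()
  }
}

object Session {
  val doc = new Recorder
  // First code fence.
  val x: Int = doc.binder("x", 1)
  // Second code fence: x is still in scope because it is the same program.
  doc.statement { println(x) }
}
```

Because both fences compile as one unit, the compiler sees every definition at once, and the recorded values and output can be rendered back into the markdown after a single run.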

To report readable error messages, mdoc translates positions in the synthetic
program to positions in the markdown source. To translate positions, mdoc
tokenizes both the original source code and the synthetic source code and aligns
the tokens using edit distance.
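
A minimal sketch of this kind of token alignment is shown below. It uses a longest common subsequence table, which is an edit distance restricted to insertions and deletions, and a naive regex tokenizer. Both are assumptions made for illustration and may differ from what mdoc actually does.

```scala
object TokenAlign {
  // A token together with its character offset in the text it came from.
  final case class Token(text: String, offset: Int)

  // Naive tokenizer (assumption): identifiers, numbers, or single symbols.
  private val TokenPattern = """[A-Za-z_$][A-Za-z0-9_$]*|\d+|\S""".r

  def tokenize(source: String): Vector[Token] =
    TokenPattern.findAllMatchIn(source).map(m => Token(m.matched, m.start)).toVector

  // Returns a map from offsets in the synthetic source to offsets in the original.
  def align(original: Vector[Token], synthetic: Vector[Token]): Map[Int, Int] = {
    val n = original.length
    val m = synthetic.length
    // lcs(i)(j) = length of the longest common subsequence of the two suffixes.
    val lcs = Array.ofDim[Int](n + 1, m + 1)
    for (i <- n - 1 to 0 by -1; j <- m - 1 to 0 by -1) {
      lcs(i)(j) =
        if (original(i).text == synthetic(j).text) lcs(i + 1)(j + 1) + 1
        else math.max(lcs(i + 1)(j), lcs(i)(j + 1))
    }
    // Walk the table, pairing equal tokens and skipping inserted or deleted ones.
    val mapping = Map.newBuilder[Int, Int]
    var i = 0
    var j = 0
    while (i < n && j < m) {
      if (original(i).text == synthetic(j).text) {
        mapping += (synthetic(j).offset -> original(i).offset)
        i += 1
        j += 1
      } else if (lcs(i + 1)(j) >= lcs(i)(j + 1)) i += 1
      else j += 1
    }
    mapping.result()
  }
}
```

Given such a map, a compiler error at some offset in the synthetic program can be reported at the offset of the matching token in the markdown source.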

It's expected that there are compile errors because mdoc uses program semantics
instead of REPL semantics. For example, meteredClient was already defined in
this document, which is not a problem for the REPL but is invalid in normal
programs.

We rename a few conflicting variables and comment out two ambiguous implicits.
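
As a minimal illustration of this kind of conflict, the sketch below uses plain String values in place of the real definitions. The name meteredClient2 is hypothetical and only shows the renaming idea.

```scala
object Document {
  // An earlier code fence introduces the name.
  val meteredClient: String = "first client"

  // A later code fence reuses the name. The REPL accepts this because each
  // input shadows earlier ones, but inside a single program it is rejected
  // as a duplicate definition:
  // val meteredClient: String = "second client"

  // Renaming the later definition resolves the conflict.
  val meteredClient2: String = "second client"
}
```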

Observe that the error is reported within one second, faster than it takes to
process the document when it's valid. The position of the error message points
to line 449 and column 41, which is exactly where the year identifier is
referenced. In some terminals, you can cmd+click on the error to open your
editor at that position.

We compare the performance with tut by checking out the master branch and
running docs/tutOnly dsl.md four times.

Observe that it took 22 seconds to report the compile error, about as long as it
takes to process the valid document. Also, the position points to line 458,
which is the last line of the code fence containing the closing }, but it is
not the exact line where year is referenced.

Some observations:

- We had to make changes in the document to migrate from REPL semantics to
  program semantics. The migration can't be automated because it requires
  renaming variables and reorganizing the implicit scope.

- For cold performance, mdoc takes 10 seconds while tut takes 21 seconds to
  process a 500 line markdown document with 32 evaluated code fences. My theory
  is that the primary reason for this difference is REPL semantics vs. program
  semantics.

- For hot performance, mdoc processes the same document in 2.4 seconds while tut
  takes between 21 and 28 seconds. Under --watch mode, mdoc reuses the same
  compiler instance between runs, allowing the JVM to warm up. I suspect tut can
  enjoy similar speedups by introducing a --watch mode.

- mdoc reports compile errors for invalid documents in 0.8 seconds while it
  takes 22 seconds for tut. The reason for this difference is likely the fact
  that the REPL needs to compile and evaluate each leading statement in the
  document to reach the compile error (which appeared late in the document)
  while mdoc typechecks the entire document before evaluating the statements.
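
The speedup factors implied by these timings are simple ratios. A small sketch, using a hypothetical speedup helper and the numbers quoted above:

```scala
object Speedups {
  // Hypothetical helper: how many times faster the candidate is than the baseline.
  def speedup(baselineSeconds: Double, candidateSeconds: Double): Double =
    baselineSeconds / candidateSeconds

  // Cold run on the valid document: tut 21s vs. mdoc 10s, about 2.1x.
  val cold: Double = speedup(21, 10)
  // Hot run on the valid document, using tut's fastest run of 21s: about 8.75x.
  val hot: Double = speedup(21, 2.4)
  // Invalid document: tut 22s vs. mdoc 0.8s, about 27.5x.
  val compileError: Double = speedup(22, 0.8)
}
```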

Conclusion

In this post, we looked into the difference between REPL semantics used by tut
and program semantics used by mdoc. Program semantics enable mdoc to process
valid markdown documents up to 2x faster under cold compilation, and report
compile errors for invalid documents up to 27x faster when combined with
--watch mode under hot compilation.

To report clear error messages, mdoc uses edit distance to align tokens in the
original markdown source with tokens in the instrumented program. This technique
enables mdoc to generate instrumented Scala source code while reporting
positions in the original markdown source.

Migrating from REPL semantics to program semantics requires manual effort. If
you write a lot of documentation and want a tight edit/preview feedback loop,
the migration might be worth your effort.