nominolo's Blog, by Thomas Schilling<br />
<br />
<h2>Beyond Package Version Policies</h2><i>2012-08-17</i><br />
<br />
When I read the announcement of the latest GHC release candidate, I did not feel excitement but rather annoyance: now I have to go and check all my packages' dependency specifications and see if they require a version bump. <a href="http://www.haskell.org/pipermail/haskell-cafe/2012-August/102885.html">I was not the only one.</a>&nbsp;In the following I sketch an approach that I think could take out most of the pain we currently experience in the Hackage ecosystem.<br />
<br />
<h3>The Problem</h3><br />
The reason for this annoyance is the rather strict <a href="http://www.haskell.org/haskellwiki/Package_versioning_policy">Hackage package versioning policy</a> (PVP). The PVP specifies the format of the version number of a Hackage package as a sequence of four integers separated by dots:<br />
<br />
<div style="text-align: center;"><i>majorA</i>.<i>majorB</i>.<i>minor</i>.<i>patchlevel</i></div><br />
The top two components denote the major version number, and must be incremented if a potentially breaking change is introduced in a new package release. The minor number must be incremented if new functionality was added, but the package is otherwise still compatible with the previous version. Finally, a patchlevel increment is necessary if the API is unchanged and only bugfixes or non-functional changes (e.g., documentation) were made.&nbsp;This sounds fairly reasonable and is basically the same scheme used for shared libraries in the C world.<br />
<br />
When specifying dependencies on other packages, authors are strongly encouraged to specify upper bounds on the major version. This is intended to avoid breaking the package if a new major version of the dependency is released (Cabal-install always tries to use the latest possible version of a dependency). If the package also works with a newer version of the dependency, then the author is expected to release a new version of his/her library with an increased upper bound for the dependency version.<br />
<br />
In Haskell, unfortunately, this system doesn't work too well for a number of reasons:<br />
<br />
<ol><li>Say, my package P depends on package A-1.0 and I now want to test if it works with the newly released version A-1.1. My package also depends on package B-0.5 which in turn also depends on A-1.0. GHC currently cannot link two versions of the same package into the same executable, so we must pick one version that works with both -- in this case that's A-1.0. D'oh!<br />
I now have two options: (a) wait for the author of package B to test it against A-1.1, or (b) do it myself. If I choose option (b) I also have to send my patch to the author of B, wait for him/her to upload the new version to Hackage and only then can I upload my new version to Hackage. The problem is multiplied by the number of (transitive) dependencies of my package and the number of different authors of these packages. This process takes time (usually months) and the fast release rate of GHC (or many other Haskell packages, for that matter) doesn't make it any easier.</li>
<li>Packages get major-version upgrades rather frequently. One reason is that many Haskell libraries are still in flux. Another is that if a package adds a new instance, a major version upgrade is required. We can protect against new functions/types being added to a package because we can use explicit import lists. New instances are imported automatically, and there's no way to hide them when importing a module.</li>
<li>A package version is a very crude and conservative approximation of whether a dependent package might break.</li>
</ol><div>Generally, I think it's a good thing that Haskell packages are updated frequently and improved upon. The problem is that the current package infrastructure and tools don't work well with it. The PVP is too conservative.</div><div><br />
</div><h3>A Better Approach</h3><div><br />
The key notion is to <i>track dependencies at the level of individual functions, types, etc. rather than at the level of whole packages</i>.<br />
<br />
When a package P depends on another package A it usually doesn't depend on the whole package. Most of the time P just depends on a few functions and types. If some other part of A is changed, that shouldn't affect P.&nbsp;We have so much static information available to us, it's a shame we're not taking advantage of it. Consider the following system:</div><div><ol><li>When I compile my code, the compiler knows exactly which functions, types, etc. my program uses and from which packages they come from. The compiler (or some other tool) writes this information to a file (preferably in a human-readable format). Let's call this file: <code>dependencies.manifest</code></li>
<li>Additionally, the compiler/tool also generates a list of all the functions, types, etc. defined by code in my package. Let's call that file: <code>exports.manifest</code>. I believe GHC's ABI versioning already does something very similar to this, although it just reduces this all to a hash.</li>
</ol><div>The first use of this information is to decide whether a package is compatible with an expected dependency. So, if my package's <code>dependencies.manifest</code> contained (for example)</div></div><div><br />
</div><div><pre>type System.FilePath.Posix.FilePath = Data.String.String
System.FilePath.Posix.takeBaseName :: System.FilePath.Posix.FilePath -&gt; System.FilePath.Posix.FilePath</pre></div><br />
then it is compatible with any future (or past) version of the filepath package that preserves this API and that defines FilePath as a type synonym for Strings.<br />
<br />
Of course, this only checks for API name and type compatibility, not actual semantic compatibility. Checking semantic compatibility requires some hints from the package authors, as described below. Together with these annotations about semantic changes, the only information we need to decide whether a newer package is a compatible dependency is the manifest my package was originally built against and the manifest of the new package.<br />
<br />
For example, let's say version 0.1 of my package looks as follows:<br />
<br />
<pre>module Gravity where
bigG :: Double -- N * (m / kg)^2
bigG = 6.674e-11
force :: Double -&gt; Double -&gt; Double -&gt; Double -- N
force m1 m2 r = (bigG * m1 * m2) / (r * r)</pre><br />
Its manifest will look something like this:<br />
<br />
<pre>Gravity.bigG :: Double, 0.1
Gravity.force :: Double -&gt; Double -&gt; Double -&gt; Double, 0.1</pre><br />
The version of each item is the version of the package at which it was introduced or changed its semantics.<br />
<br />
Now I add a new function in version 0.1.1:<br />
<br />
<pre>standardGravity :: Double -- m/s^2
standardGravity = 9.80665</pre><br />
The manifest for version 0.1.1 now will be<br />
<br />
<pre>Gravity.bigG :: Double, 0.1
Gravity.force :: Double -&gt; Double -&gt; Double -&gt; Double, 0.1
Gravity.standardGravity :: Double, 0.1.1</pre><br />
<div>Now, let's say I want to improve the accuracy of bigG in version 0.2:</div><div><br />
</div><pre>bigG = 6.67384e-11</pre><br />
<div>Since <code>bigG</code> was changed and <code>force</code> depends on it, by default the new manifest would be:</div><br />
<pre>Gravity.bigG :: Double, 0.2
Gravity.force :: Double -&gt; Double -&gt; Double -&gt; Double, 0.2
Gravity.standardGravity :: Double, 0.1.1</pre><br />
<div>However, one could argue that this is a backwards compatible change, hence the manifest would be adjusted by the author (with the help of tools) to:</div><br />
<pre>Gravity.bigG :: Double, 0.1
Gravity.force :: Double -&gt; Double -&gt; Double -&gt; Double, 0.1
Gravity.standardGravity :: Double, 0.1.1</pre><br />
<div>That is the same manifest as version 0.1.1, thus 0.2 is 100% compatible with all users of 0.1.1 (according to the package author), and even all users of 0.1 because no functionality has been removed.</div><br />
<div>Even if manifests didn't include the version number (for now)&nbsp;I believe the API information alone is precise enough for most cases. It will still sometimes be necessary to constrain the allowed range of package dependencies, but that should be the rare exception (e.g., a performance regression) rather than the current state of affairs, where dependencies need to be adjusted every few months.<br />
<br />
<br />
</div><h4>Upgrade Automation</h4><br />
This mechanism alone only helps us be less conservative when checking whether a package can work with an updated dependency. The other issue is that Haskell package APIs often move quickly, so breaking changes are unavoidable. If a package only has a few dependents this may not be such a big deal, but it becomes a problem for widely used packages. For example, during the discussions about including the vector package in the Haskell Platform, some reviewers asked for functions to be moved from one module into another. Roman, vector's maintainer, argued against this, noting that it would break many dependent packages -- a valid concern. Even if this were only a small issue, fear of breaking dependent packages can slow down improvements in package APIs.<br />
<br />
The Go programming language project has a tool called "<a href="http://blog.golang.org/2011/04/introducing-gofix.html">gofix</a>", which can automatically rewrite code for simple API changes and generates warnings for places that require human attention. Haskell has so much static information, that such a tool is quite feasible (e.g., HaRe can already do most of the important bits).<br />
<br />
So, I imagine that a newly-released package specifies up to two additional pieces of information:<br />
<br />
<ul><li>An annotated manifest indicating where semantic changes were made while retaining the same API. This can be seen as bumping the version of a single function/type, rather than of the whole API. To avoid the impact of human error this, too, should be tool supported. For example, if we compute an ABI hash for each function, we can detect which functions were modified. The package author can then decide if that was just a refactoring or an actual semantic change.<br />
(This has to be done with the help of tools. Imagine we refactor a frequently used internal utility function. Then all functions that use it would potentially have changed semantics. However, as soon as that function is marked as backwards compatible, all of its users become compatible as well. So it's important that a tool asks the package author about compatibility starting with the leaf nodes.)</li>
<li>Optionally, the author may specify an upgrade recipe to be used by an automated tool or even just a user of the library. This could include simple instructions like renaming of functions (which includes items moved between modules or even packages), or more complicated things like a definition of a removed function in terms of newly-added functions. For more complicated changes a textual description of the changes can give higher-level instructions for how to manually upgrade. Since this should be human-readable anyway, we may as well specify this upgrade recipe in a (formally defined) format that looks like a Changelog file.</li>
</ul><br />
<div><h3>Summary</h3><br />
The PVP doesn't work well because it is too conservative and too coarse-grained. Haskell contains enough static information to accurately track dependencies at the level of functions and types. We should take advantage of this information.<br />
<br />
The ideas presented above certainly require refinement, but even if we have to be conservative in a few places (e.g., potentially conflicting instance imports), I think it will still be much less painful than the current system.<br />
<br />
Comments and constructive critiques welcome!<br />
<br />
</div>
<h2>Implementing Fast Interpreters: Discussion &amp; Further Reading</h2><i>2012-07-31</i><br />
<br />
My <a href="http://nominolo.blogspot.co.uk/2012/07/implementing-fast-interpreters.html">last article on "Implementing Fast Interpreters"</a> was posted to both Hacker News and the Programming subreddit. Commenters pointed out some valid questions and some related work. This post aims to address some of those issues and to collect the related work mentioned in the HN and Reddit comments, plus some more.<br />
<br />
<h3>
Why not write a simple JIT?</h3>
<br />
The main benefit of an interpreter usually is simplicity and portability. Writing the interpreter in assembly makes it a bit harder to write and you lose portability. So, if we have to go down to the level of assembly, why not write a simple template-based code generator? Wouldn't that be faster? The answer is: it depends.<br />
<br />
As Mike Pall (author of LuaJIT) pointed out, LuaJIT v1 was based on a simple JIT and does not consistently beat the LuaJIT v2 interpreter (which is written in assembly).<br />
<br />
<table cellpadding="0" cellspacing="0" class="tr-caption-container" style="clear: right; margin-bottom: 1em; margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="http://3.bp.blogspot.com/-AXBSxaMupD4/UBWLvBmLOoI/AAAAAAAAAXo/7y1Jue3UzjE/s1600/luajit-v1jit-vs-v2interp.png" imageanchor="1" style="clear: right; margin-bottom: 1em; margin-left: auto; margin-right: auto;"><img border="0" height="640" src="http://3.bp.blogspot.com/-AXBSxaMupD4/UBWLvBmLOoI/AAAAAAAAAXo/7y1Jue3UzjE/s640/luajit-v1jit-vs-v2interp.png" width="632" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Performance comparison of LuaJIT v1.1 (simple JIT) vs. LuaJIT v2 interpreter.</td></tr>
</tbody></table>
A JIT compiler (even a simple one) needs to:<br />
<div>
<ul>
<li>Manage additional memory. If the code is short-lived (e.g., there is another optimisation level) then unused memory must be reclaimed and fragmentation may become an issue.</li>
<li>Mark that memory as executable (possibly requiring flushing the instruction cache or operating system calls).</li>
<li>Manage the transition between compiled code segments. E.g., we need a mapping from bytecode targets to corresponding machine code. If the target hasn't been compiled yet, this can be done on demand and the branch instruction can be updated to jump directly to the target. If the target may ever change in the future, we also need a way to locate all the branches to it in order to invalidate them if necessary.</li>
<li>If the interpreter has multiple execution modes (e.g., a trace-based JIT has a special recording mode which is only used for a short time), then code needs to be generated for each execution mode.</li>
</ul>
<div>
All this is quite a bit more complicated than a pure interpreter, even one written in assembler. Whether it's worth taking on this complexity depends on how high the interpreter overhead actually is. If the bytecode format is under our control, then we can optimise it for our interpreter, as done in LuaJIT 2. This isn't always the case, though. The DynamoRIO system is a runtime instrumentation framework for x86. The x86 instruction format is very complex and thus expensive to decode. DynamoRIO therefore does not use an interpreter and instead decodes the host program into basic blocks. (No code generation is really needed, because DynamoRIO emulates x86 on x86, but you could imagine using the same technique to emulate x86 on, say, ARM.) The challenges (and solutions) for this approach are discussed in <a href="http://www.burningcutlery.com/derek/phd.html">Derek Bruening's excellent PhD thesis on DynamoRIO</a>. Bebenita <i>et al</i>. describe an interpreter-less JIT-compiler design for a Java VM in their paper <a href="http://www.ics.uci.edu/~mbebenit/pubs/pppj-2010.pdf">"Trace-Based Compilation in Execution Environments without Interpreters"</a> (PDF).<br />
<br />
<h3>
DSLs for generating interpreters</h3>
<br />
The PyPy project generates a C interpreter and a trace-based JIT from a description written in RPython ("Restricted Python"), a language with Python-like syntax but C-with-type-inference-like semantics. Laurence Tratt described his experiences using PyPy in his article <a href="http://tratt.net/laurie/tech_articles/articles/fast_enough_vms_in_fast_enough_time">"Fast Enough VMs in Fast Enough Time"</a>. PyPy's toolchain makes many design decisions for you; its usefulness therefore depends on whether these decisions align with your goals. The <a href="http://morepypy.blogspot.com/">PyPy Status Blog</a> contains many examples of language VMs written in PyPy (e.g. <a href="http://morepypy.blogspot.com/2012/07/hello-everyone.html">PHP</a>, <a href="http://morepypy.blogspot.com/2011/04/tutorial-writing-interpreter-with-pypy.html">BF</a>, <a href="http://morepypy.blogspot.com/2010/06/jit-for-regular-expression-matching.html">regular expressions</a>).<br />
<br />
A DSL aimed at developing a fast assembly interpreter can still be useful for other purposes. One problem in implementing a JIT compiler is ensuring that the semantics of the compiled code match the semantics of the interpreter (or baseline compiler). By having a slightly higher-level abstraction it could become possible to use the same specification to both generate the interpreter and parts of the compiler. For example, in a trace-based JIT, the code for recording and interpreting looks very similar:<br />
<pre>Interpreter:                  Trace Recorder:
mov tmp, [BASE + 8 * RB]      Ref r1 = loadSlot(ins-&gt;b);
                              Ref r2 = loadSlot(ins-&gt;c);
add tmp, [BASE + 8 * RC]      Ref r3 = emit(IR_ADD_INT, r1, r2);
mov [BASE + 8 * RA], tmp      writeSlot(ins-&gt;a, r3);
</pre>
<br />
If we treat the trace recorder as just another architecture, it may save us a lot of maintenance effort later on.<br />
<br />
<h3>
Further Reading</h3>
<div>
<br />
There are many other resources on this topic, but here's a short selection of interesting articles and papers.</div>
<ul>
<li>Mike Pall on <a href="http://article.gmane.org/gmane.comp.lang.lua.general/75426">why compilers are having such a hard time optimising bytecode interpreters</a>.</li>
<li>Mike Pall explains <a href="http://www.reddit.com/r/programming/comments/hkzg8/author_of_luajit_explains_why_compilers_cant_beat/c1w8xyz">some important features of LuaJIT's ARM interpreter</a>. On x86 LuaJIT uses SSE instructions which are no longer costly on modern desktop processors. On ARM fast floating point support is not guaranteed, so now the interpreter has to efficiently support two number types (integer and float).</li>
<li>Some <a href="http://www.emulators.com/docs/nx25_nostradamus.htm">notes on branch prediction of interpreters</a>.</li>
<li>A longer <a href="http://eli.thegreenplace.net/2012/07/12/computed-goto-for-efficient-dispatch-tables/">article explaining a direct-threaded interpreter in C</a>.</li>
<li>The CPython interpreter uses direct threading if possible. It has an interesting note on <a href="http://hg.python.org/cpython/file/b127046831e2/Python/ceval.c#l828">how to stop GCC from unoptimising direct-threaded code</a>. If we're not careful, GCC (at least in some versions) may decide to "optimise" direct-threaded code by merging the duplicated dispatch code, which ends up decreasing branch prediction accuracy and thus performance.</li>
<li><a href="http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.59.1271">Context threading</a> is a technique that uses a small amount of code generation to expose interpreter jumps to the hardware branch predictor thus eliminating most overhead due to branch prediction failures. Also check out <a href="http://www.cs.toronto.edu/~matz/pubs.html">Mathew Zaleski's other publications</a> on related topics.</li>
<li>Wingolog has an <a href="http://wingolog.org/archives/2012/06/27/inside-javascriptcores-low-level-interpreter">article on WebKit's assembly-based interpreter and the Ruby-based DSL used to generate it</a>. Also check out Wingo's articles on V8.</li>
<li>There's quite a large set of optimisations that can be done for purely interpreter-based implementations. Stefan Brunthaler's PhD thesis <a href="https://students.ics.uci.edu/~sbruntha/cgi-bin/download.py?key=thesis">"Purely Interpretative&nbsp;Optimizations"</a> (PDF) discusses these.</li>
<li><a href="http://www.complang.tuwien.ac.at/anton/vmgen/">VMGen</a> is a tool that generates interpreters from a higher-level description. I think I read some papers referencing it, but I haven't tried it myself.</li>
<li>The <a href="http://www.webkit.org/blog/189/announcing-squirrelfish/">announcement of SquirrelFish</a> from the WebKit team discusses the reasoning behind a register-based bytecode design. They also link to relevant papers.</li>
</ul>
</div>
<div>
If you have more links to good articles on the topic, please leave a comment.</div>
<div class="separator" style="clear: both; text-align: center;">
</div>
</div>
<h2>ARM's New 64 Bit Instruction Set</h2><i>2012-07-26</i><br />
<br />
You may have heard that ARM, whose CPUs are extremely popular in embedded devices, is trying to move into the low-power server market. One of the main difficulties with using current ARM processors in servers is that ARM is only a 32 bit architecture (A32). That means that a single process can address at most 4GB of memory (and some of that is reserved for the OS kernel). This isn't a problem on current embedded devices, but it can be for large multi-threaded server applications. To address this issue ARM has been working on a 64 bit instruction set (A64). To my knowledge there is no commercially available hardware that implements this instruction set yet, but ARM has already released <a href="http://lkml.indiana.edu/hypermail/linux/kernel/1207.0/03025.html">patches to the Linux kernel</a> to support it.<br />
<br />
To my surprise, this new 64 bit instruction set is quite different from the existing 32 bit instruction sets. (Perhaps I shouldn't be surprised, since the two Thumb instruction sets were also quite different from the existing instruction sets.) It looks like a very clean RISC-style design. Here are my highlights:<br />
<ul>
<li>All instructions are 32 bits wide (unlike the Thumb variants, but like the original A32).</li>
<li>31 general-purpose 64-bit registers (instead of 14 general-purpose 32-bit registers in A32). Register number 31 either reads as zero or denotes the stack pointer, depending on the instruction. The registers can be accessed as 32-bit registers (called <code>w0, w1, ..., w30</code>) or as 64-bit registers (called <code>x0, x1, ..., x30</code>).</li>
<li>Neither the stack pointer (SP) nor the program counter (PC) are general purpose registers. They are only read and modified by certain instructions.</li>
<li>In A32, most instructions could be executed conditionally. This is no longer the case.</li>
<li>Conditional instructions are not executed conditionally, but instead pick one of two inputs based on a condition. For example, the "conditional select" instruction&nbsp;<code>CSEL x2, x4, x5, cond</code> implements <code>x2 = if cond then x4 else x5</code>. This subsumes a conditional move: <code>CMOV x1, x2, cond</code> can be defined as a synonym for <code>CSEL x1, x2, x1, cond</code>. There are many more of these conditional instructions, but they all will modify the target register.</li>
<li>A conditional compare instruction can be used to implement C's short-circuiting semantics. In a conditional compare the condition flags are only updated if the previous condition was true.</li>
<li>There is now an integer division instruction. However, it does not generate an exception/trap upon division by zero. Instead (x/0) = 0. That may seem odd, but I think it's a good idea. A conditional test before a division instruction is likely to be cheaper than a kernel trap.</li>
<li>The virtual address space is 49 bits or 512TB. Unlike x86-64/AMD64, where the top 16 bits must all be zero or all one, the highest 8 bits may optionally be usable as a tag. This is configured using a system register. I'm not sure if that will require kernel support. It would certainly come in handy for implementing many higher-level programming languages.</li>
<li>A number of instructions for PC-relative addressing. This is useful for position independent code.</li>
<li>SIMD instruction support is now guaranteed. ARMv8 also adds support for crypto instructions. These are also available in A32.</li>
</ul>
<div>
All the existing ARM instruction sets (except perhaps Jazelle) will still be supported. I don't think you can dynamically switch between different instruction sets as was the case for A32/Thumb, though.</div>
<div>
<br /></div>
<div>
Further reading:</div>
<div>
<ul>
<li><a href="http://www.arm.com/files/downloads/ARMv8_Architecture.pdf">ARMv8 Technology Preview (PDF slides)</a></li>
<li><a href="http://www.element14.com/community/servlet/JiveServlet/previewBody/41836-102-1-229511/ARM.Reference_Manual.pdf">ARMv8 Instruction Set Overview (PDF)</a></li>
</ul>
</div>
<h2>Implementing Fast Interpreters</h2><i>2012-07-22</i><br />
<br />
Many modern virtual machines include either a fast interpreter or a fast baseline compiler. A baseline compiler needs extra memory and is likely slower for code that is only executed once. An interpreter avoids this memory overhead and is very flexible. For example, it can quickly switch between execution modes (profiling, tracing, single-stepping, etc.). So what, then, is the state of the art in building fast interpreters?<br />
<div>
<br /></div>
<div>
Modern C/C++ compilers are very good at optimising "normal" code, but they are not very good at optimising large switch tables or direct-threaded interpreters. The latter also needs the "<a href="http://gcc.gnu.org/onlinedocs/gcc-3.2/gcc/Labels-as-Values.html">Labels as Values</a>" GNU extension. In a direct-threaded interpreter, each instructions takes the following form:</div>
<br />
<div>
<pre>op_ADD_INT:
int op1 = READ_OP1;
int op2 = READ_OP2;
int result = op1 + op2; // The actual implementation.
WRITE_RESULT(result);
unsigned int opcode = pc-&gt;opcode;
++pc;
goto *dispatch[opcode]; // Jump to code for next instruction.</pre>
</div>
<br />
<div>
The variable <code>dispatch</code>&nbsp;(the dispatch table) is an array of labels, one for each opcode. The last three lines transfer control to the implementation of the next bytecode instruction. If we want to change the implementation of an instruction dynamically, we can just update the pointer in the dispatch table, or change the dispatch table pointer to point to a different table altogether.<br />
<br />
Note that everything except for the line "<code>result = op1 + op2</code>" is interpreter overhead and its cost should be minimised. The program counter (<code>pc</code>) and the dispatch table are needed by every instruction, so they should be kept in registers throughout. Similarly, we want a variable that points to the stack and possibly a variable that points to a list of literals — two more registers. We also need further registers for holding the operands. These registers better be the same for each instruction, since otherwise some instructions need to move things around, leading to memory accesses. On x86 we only have 7 general-purpose registers which makes it very hard for the compiler to optimally use them for all instructions.<br />
<br />
<h3>
Interpreters Written in Assembly</h3>
<br />
For this reason, most interpreters are hand-optimised assembly routines with a specially designed calling convention that keeps as much state as possible in registers. A particularly nice and fast example is the LuaJIT 2 interpreter.<br />
<br />
Both the standard Lua interpreter and the LuaJIT 2 interpreter use a register-based bytecode format. Compared to the more well-known stack-based bytecodes, a register-based bytecode has larger instructions, but requires fewer instructions overall. For example, the expression "<code>s = x + (y * z)</code>" in a stack-based bytecode would translate to something like:<br />
<br />
<pre>PUSHVAR 0 -- push variable "x" onto operand stack
PUSHVAR 1 -- push variable "y"
PUSHVAR 2 -- push variable "z"
MUL -- top of operand stack is (y * z)
ADD -- top of operand stack is x + (y * z)
STOREVAR 3 -- write result into variable "s"</pre>
<br />
With a few optimisations, this can be encoded as only 6 bytes. In a register-based bytecode this would translate to something like this:<br />
<br />
<pre>MUL 4, 1, 2 -- tmp = y * z
ADD 3, 0, 4 -- s = x + tmp</pre>
<br />
Each instruction takes variable indices, the (virtual) registers, and reads from and writes directly to the variables (stored on the stack). In LuaJIT 2, each instruction requires 4 bytes, thus the overall bytecode size is a bit larger. However, executing these instructions incurs the interpreter overhead only twice instead of 6 times in the stack-based bytecode. It may also avoid memory traffic by avoiding the separate operand stack.<br />
<br />
The LuaJIT 2 interpreter also uses a very simple bytecode format that avoids further bit shuffling (increasing the cost of decoding). Instructions can only have one of two forms:<br />
<br />
<pre>OP A, B, C
OP A, D</pre>
<br />
Here OP, A, B, and C are 8-bit fields. OP is the opcode and A, B, C are usually register IDs or sometimes literals. D is a 16-bit field and overlaps with B and C. It usually holds a literal value (e.g., a jump offset).<br />
<br />
The LuaJIT 2 interpreter now combines this with a calling convention where part of the next instruction is decoded before control is actually transferred to it. This tries to take advantage of superscalar execution on modern CPUs. For example, an integer addition instruction implemented using this technique would look as follows:<br />
<br />
<pre>bc_ADD_INT:
-- edx = BASE = start of stack frame. E.g., virtual register 5
--       is at memory address BASE + 5 * BYTES_PER_WORD
-- esi = PC, always points to the next instruction
-- ebx = DISPATCH = the dispatch table
-- ecx = A = pre-decoded value of field A
-- eax = D = pre-decoded value of field D
-- ebp = OP, opcode of current instruction (usually ignored)
-- Semantics: BASE[A] = BASE[B] + BASE[C]
-- 1. Decode D into B (ebp) and C (eax, same as D)
movzx ebp, ah           -- zero-extend 8-bit register ah into ebp = B
movzx eax, al           -- zero-extend 8-bit register al into eax = C
-- 2. Perform the actual addition
mov ebp, [edx + 4*ebp]  -- read BASE[B]
add ebp, [edx + 4*eax]  -- ebp = BASE[B] + BASE[C]
mov [edx + 4*ecx], ebp  -- BASE[A] = ebp
-- 3. Dispatch next instruction
mov eax, [esi]          -- load next instruction into eax
movzx ecx, ah           -- predecode A into ecx
movzx ebp, al           -- zero-extend OP into ebp
add esi, 4              -- increment program counter
shr eax, 16             -- predecode D
jmp [ebx + ebp * 4]     -- jump to next instruction via dispatch table</pre>
<br />
The reason for predecoding some of the arguments is that the final "jmp" instruction may be quite expensive because indirect branches cause difficulties for branch predictors. The previous instructions help keep the pipeline somewhat busy if the branch did indeed get mispredicted.<br />
<br />
These 11 instructions are typically executed in about 5 clock cycles on an Intel Core 2 processor. Based on a very simple benchmark of only simple (addition, branch, compare) instructions, interpreted code is roughly 7x slower than machine code. For more complicated instructions the interpreter overhead becomes less severe and the slowdown should be smaller.<br />
<br />
Note, though, that in this example the ADD operation was type-specialised to integers. In many dynamic languages the addition operator is overloaded to work over several types. A generic ADD bytecode instruction then must include a type check (e.g., int vs. float) and then dispatch to the relevant implementation. This can introduce severe execution overheads due to the high cost of branch mispredictions on modern (desktop) CPUs. However, this is partly a language design issue (or at least language/implementation co-design issue) and independent of how we choose to implement our interpreter.<br />
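To see why the generic case is costly, consider a hedged Haskell sketch of dynamic addition; the <code>Val</code> type and <code>addVal</code> are made up for illustration, not taken from any real VM:

```haskell
-- A dynamic value, as a tagged union.
data Val = I !Int | D !Double
  deriving (Eq, Show)

-- A generic ADD must inspect both tags before it can do any
-- arithmetic; in a real VM each branch here is a potential
-- branch misprediction on every single ADD.
addVal :: Val -> Val -> Val
addVal (I x) (I y) = I (x + y)
addVal (D x) (D y) = D (x + y)
addVal (I x) (D y) = D (fromIntegral x + y)
addVal (D x) (I y) = D (x + fromIntegral y)

main :: IO ()
main = print (addVal (I 2) (D 0.5))  -- D 2.5
```

The type-specialised bytecode instruction corresponds to knowing statically that only the first equation applies.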
<br />
In addition to the performance advantages over C-based interpreters, there is a size advantage. It's a good idea to ensure that the frequently-executed parts of the interpreter fit within the L1 instruction cache (typically around 32-64 KiB), and writing the interpreter in assembly helps with that. For example, the above code requires exactly 32 bytes. Had we chosen register ebp for BASE, the code would have been a few bytes larger due to encoding restrictions on x86.<br />
<br />
<h3>
Portable Fast Interpreters?</h3>
<br />
The downside of writing an interpreter in assembly is that it is completely unportable. Furthermore, if we need to change the semantics of a bytecode instruction, we have to update it for each architecture separately. For example, LuaJIT 2 has interpreters for 6 architectures (ARMv6, MIPS, PPC, PPCSPE/Cell, x86, x86-64), at around 4000 lines per architecture.<br />
<br />
The WebKit project therefore built its own <a href="http://trac.webkit.org/browser/trunk/Source/JavaScriptCore/offlineasm">custom "portable" assembly language as a Ruby DSL</a>. For an example of what it looks like, see <a href="http://trac.webkit.org/browser/trunk/Source/JavaScriptCore/llint/LowLevelInterpreter.asm">LowLevelInterpreter.asm</a>. Occasionally, you probably still want to special-case some code for a specific architecture, but it seems to go in the right direction. It also seems to make the code more readable than Dalvik's (Android's VM) template substitution method. I'd rather have all the code in the same file (possibly with a few #ifdef-like places) than spread over 200+ files. It also should be quite simple to translate this assembly into C to get a default interpreter for architectures that are not yet supported.<br />
<br />
So far, every project seems to have built its own tool chain. I guess it's too much of a niche problem with too many project-specific requirements to give rise to a reusable standard tool set.</div>Thomas Schillinghttp://www.blogger.com/profile/04274984206279511399noreply@blogger.com8tag:blogger.com,1999:blog-35609023.post-41386047170565484102010-04-30T17:25:00.005+02:002010-04-30T18:07:51.894+02:00Haskell Tip: Redirect stdout in HaskellHave you ever wanted to make sure that a call to a library cannot print anything to <code>stdout</code>? The following does this except that it redirects stdout globally and not just across a library call. This should be doable, but I haven't needed it yet.
<pre>import GHC.IO.Handle -- yes, it's GHC-specific
import System.IO
main = do
  stdout_excl <- hDuplicate stdout
  hDuplicateTo stderr stdout            -- redirect stdout to stderr
  putStrLn "Hello stderr"               -- will print to stderr
  hPutStrLn stdout_excl "Hello stdout"  -- prints to stdout</pre>
The above code first creates a new handle to the standard output resource using <code>hDuplicate</code>. The call to <code>hDuplicateTo</code> redirects any output written to the <em>Haskell</em> handle <code>stdout</code> to the handle <code>stderr</code>. The Haskell handle <code>stdout_excl</code> is now our only handle to the standard output resource.Thomas Schillinghttp://www.blogger.com/profile/04274984206279511399noreply@blogger.com0tag:blogger.com,1999:blog-35609023.post-24740982222290302742008-05-12T08:22:00.009+02:002008-05-13T12:34:06.896+02:00The Thing That Should Not Be (Or: How to import 18500+ patches from Darcs into Git in less than three days)<p>I like <a href="http://darcs.net/">Darcs</a>. Really. It is easy to learn and use and for smallish projects I never had any real problems. Unfortunately, it still has some performance problems and it is likely that some operations will never be fast.</p>
<p>An extreme example of where you run into those problems is the <a href="http://haskell.org/ghc/">GHC</a> repository. It consists of over 18500 patches and spans over 12 years of history. When I tried to build the latest version, I ran into a linker error which I know I didn't get with the snapshot from one month ago. As GHC builds take quite a while, I wanted an efficient way to find which exact change introduced the problem. More precisely, I wanted <code>git bisect</code>.</p>
<p>I know that <a href="http://www.cse.unsw.edu.au/~dons/">Don</a> had converted Darcs repositories to Git in order to get <a href="http://www.ohloh.net/">ohloh</a> statistics, but he reported that this process was rather painful. It took four weeks(!) to convert the GHC repository.</p>
<p>So I looked at what tools were out there, and at how to improve them. I know that there is <a href="http://progetti.arstecnica.it/tailor">Tailor</a>, but I looked at <a href="http://www.sanityinc.com/articles/converting-darcs-repositories-to-git">darcs-to-git</a> by Steve Purcell first and found it very hackable. I didn't like that it saved the Darcs patch ID in the Git commit message, so I changed that, and I extended it to properly translate Darcs escape sequences. I also added a parameter to pull only a limited number of patches at a time, so that I can import a big repository in stages, and I allowed custom mappings from committer names to other committer names. I used this to map various pseudonyms to (a unique) full name and email address. (I hope no one minds being credited with his or her full name. ;) )</p>
<p>It worked rather well for smallish repositories (a bit less than 2000 patches), but I had serious problems getting it to work with GHC.</p>
<ul>
<li>Darcs has a bug on case-insensitive volumes (which OS X uses by default), so Steve suggested using a case-sensitive sparse image. This works, but it is probably a bit slower. I tried running it on my FreeBSD home server, but it has only 256 MB of RAM (usually fine for a home file server), so Darcs ran out of memory and eventually got killed by the OS. (Getting Darcs to compile on my server was an adventure in itself&mdash;first a few hours to update the ports tree, then one more hour to compile GHC 6.8, which then just failed to install...) Fortunately, my laptop has 2 GB, so it works fine there.
</li>
<li>At startup darcs-to-git reads the full Darcs patch inventory. For such a big repo as GHC this takes over a minute (and lots of RAM). Caching it in a file didn't seem to help much. I could have lived with that, but there was a more serious problem: the approach used by darcs-to-git (and, it seems, also by Tailor) doesn't work!
</li>
<li>darcs-to-git pulls one patch at a time by giving its ID to <code>darcs pull --match 'hash ...id...'</code>, then <code>git add</code>s the changes on the Git side and <code>git commit</code>s them with the appropriate commit message. The patches are pulled in the order in which they were applied in the source repo, so any dependencies should be fulfilled. Nevertheless, Darcs refused to apply some patches -- silently. Darcs just determined that I didn't want to pull any patches and didn't do anything. This is most likely a Darcs bug, but I heard it was only a known bug for some development version of Darcs 2 (I used Darcs 1.0.9 at that time). Anyways, that didn't work; it failed at about patch 30 of the GHC repository.
</li>
<li>OK. So instead of pulling patches by ID we could fake user interaction. Something like this:
<pre>
$ echo "yd" | darcs pull source-repo</pre>
The input corresponds to "<strong>Y</strong>es, I want to pull this patch" and "Ok, I'm <strong>d</strong>one and want to pull all the selected patches". This works reliably and also has the advantage that we don't have to read the whole history up front but instead can just retrieve the info for the last applied patch via
<pre>
darcs changes --last 1 --xml</pre>
</li>
<li>By now you might have guessed, though, that this still didn't work very well. It took about 60 seconds per patch (with about 1 second of this used by Git), resulting in an estimated 13 days(!) of CPU time for the full repository.
</li>
<li>Interestingly, most of those 60 seconds are spent before any patch choice is displayed, so apparently Darcs is doing something to calculate which patches to show. After that, displaying more choices is relatively quick. Apparently, the startup time depends on the number of patches <em>not yet pulled</em>.
</li>
</ul>
<p>This leads to the following trick.</p>
<p>We use two intermediate repositories. We use one to pull several patches at a time from the source repository. I use 15 patches, i.e.:
<pre>
$ cd tmp/ghc.pull
$ echo "yyyyyyyyyyyyyyyd" | darcs pull /path/to/ghc</pre>
We could now import from this intermediate repository into Git, since the startup time to pull from this repo is much lower. However, we'd like to already start pulling the next 15 patches into the temporary repository. Pulling from and into the same repo at the same time doesn't work (Darcs locks the repo), so we also need to mirror this temporary repository. A <code>cp -r</code> would work, but as the repository grows larger, this would do unnecessary work. So I just pull all the changes at once.
<pre>
$ cd /tmp/ghc.pull_mirror
$ darcs pull --all /tmp/ghc.pull # this is pretty quick now</pre>
Now we can import into our Git mirror from there, and already start pulling the next 15 patches (the proper term for this is "macro pipelining", I believe).
<pre>
$ cd /path/to/ghc.git
$ ./darcs-to-git /tmp/ghc.pull_mirror & # run in background
$ cd /tmp/ghc.pull
$ echo "yyyy..." # etc</pre>
Of course, before pulling from the first mirror into the second mirror we have to make sure that <code>darcs-to-git</code> has finished pulling from the second mirror. I have implemented this as a shell script on top of darcs-to-git, but I may move it into darcs-to-git at some point.</p>
<p><a href="http://github.com/nominolo/darcs-to-git">My fork of darcs-to-git</a> as well as <a href="http://github.com/purcell/darcs-to-git">Steve's main repo</a> are both available at Github. I haven't pushed all of my local changes yet, but I plan to implement pulling Darcs patches "interactively" as a possible option for darcs-to-git, so maybe check the repo in a week or two.</p>
<p>With this approach I am down to about 200 seconds per 15 patches, or about 68 hours for the 18500 patches of the GHC repo, which is just below the promised three days. (Of course, YMMV)</p>
<p>So the moral of this story? Darcs is very slow for biggish repositories, especially for rarely used corner cases (such as pulling patches one by one). It may be possible to fix these problems, but I doubt that this will be easy. I tried using the new hashed format and the darcs-2 format, but converting the GHC repo didn't work for me. I certainly hope that things get better, and I plan to help at least a little by submitting several bug reports in the next couple of days about the problems I ran into.</p>
<p>Oh, and Darcs needs a killer-app like <a href="http://github.com/">Github</a>!</p>Thomas Schillinghttp://www.blogger.com/profile/04274984206279511399noreply@blogger.com17tag:blogger.com,1999:blog-35609023.post-45339340791188965172008-03-16T16:34:00.005+01:002008-03-16T20:36:15.960+01:00A short reminderFolds and maps are Haskell's default iteration combinators. Mapping is easy enough, but folds can often be rather messy, especially if nested. For example, given a map of sets of some values, we want to write a function to swap keys and values. The function's type will be:
<pre><code>
type SetMap k a = Map k (Set a)
invertSetMap :: (Ord a, Ord b) => SetMap a b -> SetMap b a
</code></pre>
The resulting map should contain a key for each value of type b occurring in any set in the original map. The new values (of type <code>Set a</code>) are all those original keys for which the new key occurred in the value set. Intuitively, if the original map represents arrows from values of type a to values of type b, this function reverses all arrows.
We can easily implement this function using two nested folds.
<pre><code>
invertSetMap sm =
  M.foldWithKey
    (\k as r ->
       S.fold (\a r' -> M.insertWith S.union a (S.singleton k) r')
              r
              as)
    M.empty
    sm
</code></pre>
That's not pretty at all!
I had written quite a bit of this kind of code (and hated it each time), until I finally remembered a fundamental Haskell lesson. Haskell uses lists to simulate iteration and specify other kinds of control flow. In particular list comprehensions are often extremely cheap, since the compiler can automatically remove many or all intermediate lists and generate very efficient code. So let's try again.
<pre><code>
invertSetMap sm = M.fromListWith S.union
[ (a, S.singleton k) | (k, as) <- M.assocs sm
, a <- S.toList as ]
</code></pre>
So much more readable!
A quick benchmark also shows that it's slightly faster (a few percent for a very big map).
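For reference, here is the comprehension-based definition as a self-contained program, with a tiny example; the sample map is made up for illustration:

```haskell
import qualified Data.Map as M
import qualified Data.Set as S

type SetMap k a = M.Map k (S.Set a)

-- Reverse all "arrows": every value b in a key's set becomes a key,
-- mapped to the set of original keys whose sets contained it.
invertSetMap :: (Ord a, Ord b) => SetMap a b -> SetMap b a
invertSetMap sm = M.fromListWith S.union
    [ (a, S.singleton k) | (k, as) <- M.assocs sm
                         , a <- S.toList as ]

main :: IO ()
main = print (M.toList (invertSetMap example))
  where
    example = M.fromList [ (1 :: Int, S.fromList "ab")
                         , (2,        S.fromList "b") ]
-- prints [('a',fromList [1]),('b',fromList [1,2])]
```

Note that <code>fromListWith S.union</code> does exactly the merging the nested folds did by hand: duplicate keys from the comprehension are combined by set union.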
Lesson to take home: If your folds get incomprehensible, consider list comprehensions.Thomas Schillinghttp://www.blogger.com/profile/04274984206279511399noreply@blogger.com6tag:blogger.com,1999:blog-35609023.post-74096875711684477572007-10-05T15:07:00.000+02:002007-10-05T15:18:02.508+02:00New Haskell Tutorial<a href="http://lisperati.com/">Conrad Barski</a> recently made a new <a href="http://lisperati.com/haskell/">Haskell Tutorial</a> available. A while ago I stumbled upon Conrad's excellent <a href="http://lisperati.com/casting.html">Lisp Tutorial</a> which was well-received in the community and actually led to a pretty cool (Common) <a href="http://www.lisperati.com/logo.html">Lisp-logo</a>. I haven't yet read the tutorial completely, but Conrad's tutorials are usually very well-written, newbie-friendly and based on interesting problems. So, check it out!
<p>PS: Greetings from the second Hackathon 2007.</p>Thomas Schillinghttp://www.blogger.com/profile/04274984206279511399noreply@blogger.com0tag:blogger.com,1999:blog-35609023.post-40859902363489283852007-05-21T22:29:00.001+02:002009-04-05T16:38:56.768+02:00Network.HTTP + ByteStrings<p><strong>Update:</strong> I mixed up some numbers. I wrote about 375 MB, but it was 175 MB. (No one seemed to have noticed, though. Anyways, the argument still holds.)</p>
Haskell's <a href="http://www.haskell.org/http">Network.HTTP package</a> isn't quite as good as it could be. Well, to be precise, it is <em>not at all</em> as good as it <em>should</em> be. In addition to API problems (for which I proposed a solution in <a href="http://nominolo.blogspot.com/2007/05/towards-better-error-handling.html">my previous blog entry</a>) there's also a major performance problem, due to strictness and use of regular list-based <code>String</code>s. A simple <code>wget</code>-style program written in Haskell used like <code>./get http://localhost/file.big</code> on a local 175 MB file almost locked up my 1GB laptop due to constant swapping. I had to kill it, as it was using up more than 500 MB of RAM (still swapping). At this point it had run for 50 seconds and had written not a single byte to the output file. At the same time, a normal <code>wget</code> completed after about 10 seconds. Since the file was retrieved from a local server, I assume overall performance was inherently limited by disk speed (or the operating system's caching strategy).
The current implementation performed so badly for two reasons:
<ul>
<li>Since it uses list-based strings each retrieved byte will take up (at least) 8 byte in program memory (one cons cell, or tag + data + pointer to tail).</li>
<li>It implements custom, list-based buffering. The buffer size is 1000 characters/bytes, which is rather OK for line-based reading, but if the HTTP protocol requests to read a large block of data, this block will be read in 1000-byte chunks which are then <em>appended</em> to the part that has already been read. So if we read a block of 8000 bytes, the first chunk will be read and consequently be copied 8 times(!). Let's not think about reading a block of 175000000 bytes. Also because we already know the answer.</li></ul>
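The cost of that append-per-chunk pattern is easy to reproduce in miniature; this is a toy sketch with invented names, not the actual HTTP package code:

```haskell
-- Left-nested (++) recopies the accumulated prefix on every append,
-- so assembling n chunks this way costs O(n^2) character copies.
appendPerChunk :: Int -> String
appendPerChunk n = foldl (++) "" (replicate n "chunk")

-- concat builds the result in one pass: linear in the output length.
singlePass :: Int -> String
singlePass n = concat (replicate n "chunk")

main :: IO ()
main = print ( length (appendPerChunk 1000)
             , length (singlePass 1000) )
-- same result either way, very different amount of copying
```

For 8000 chunks the quadratic version does millions of redundant character copies, which matches the behaviour observed above.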
But let's not flame the original author(s). It's better than nothing and it gave me and my project partner <a href="http://www.dtek.chalmers.se/~tox/site/">Jonas</a> an interesting project topic.
So we decided to attack the evil at its root and replace Strings with <a href="http://www.cse.unsw.edu.au/~dons/fps.html">ByteString</a>s--this way we would get buffering for free. To give you a taste of what this accomplishes:
<table>
<tr><th>Program</th><th>Runtime</th><th>Memory Use</th></tr>
<tr><td>wget</td><td style="text-align:right">~10s</td><td style="text-align:right">~0.5MB</td></tr>
<tr><td>./get using strict ByteStrings</td><td style="text-align:right">~18s</td><td style="text-align:right">~175MB</td></tr>
<tr><td>./get using lazy ByteStrings</td><td style="text-align:right">~11s</td><td style="text-align:right">~3MB</td></tr>
</table>
Adding strict ByteStrings was relatively straightforward. <code>Network.HTTP</code> already implements a simple Stream abstraction with a simple interface:
<pre class="example">
class Stream x where
  readLine   :: x -> IO (Result String)
  readBlock  :: x -> Int -> IO (Result String)
  writeBlock :: x -> String -> IO (Result ())
  close      :: x -> IO ()
</pre>
Implementing this for strict ByteStrings is just a matter of calling the corresponding functions from the ByteStrings module. With one small annoyance: The HTTP parsing functions expect <code>readLine</code> to return the trailing newline, which <code>hGetLine</code> does not include, so we have to append it manually, which in turn is an O(n) operation.
For simplicity, we also didn't convert the header parsing and writing functions to use ByteStrings, but instead inserted the appropriate calls to <code>pack</code> and <code>unpack</code>. This could become a performance bottleneck if we have many small HTTP requests. OTOH, we might soon have a <a href="http://code.google.com/soc/haskell/appinfo.html?csaid=B97EF4562EF3B244">Parsec version that works on ByteStrings</a>.
As can be seen from the benchmarks above, using strict ByteStrings still forces us to load a packet completely into memory before we can start using it, which may result in unnecessarily high memory usage. The obvious solution to this problem is to use lazy ByteStrings.
For lazy ByteStrings things work a bit differently. Instead of calling <code>hGet</code> and <code>hGetLine</code> inside the stream API, we call <code>hGetContents</code> when we open the connection. This gives us a lazy ByteString which we store in the connection object and then use regular list functions on that string to implement the required API.
<pre class="example">
openTCPPort uri port =
  do { s <- socket AF_INET Stream 6
     -- [...]
     ; h <- socketToHandle s ReadWriteMode
     ; bs <- BS.hGetContents h  -- get the lazy ByteString
     ; bsr <- newIORef bs       -- and store it as an IORef
     ; v <- newIORef (MkConn s a h bsr uri)
     ; return (ConnRef v)
     }

readBlock c n =
  readIORef (getRef c) >>= \conn -> case conn of
    ConnClosed -> return (Left ErrorClosed)
    MkConn sock addr h bsr host ->
      do { bs <- readIORef bsr
         ; let (bl,bs') = BS.splitAt (fromIntegral n) bs
         ; writeIORef bsr bs'
         ; return $ Right bl
         }

readLine c =
  readIORef (getRef c) >>= \conn -> case conn of
    ConnClosed -> return (Left ErrorClosed)
    MkConn sock addr h bsr host ->
      do { bs <- readIORef bsr
         ; let (l,bs') = BS.span (/='\n') bs
         ; let (nl,bs'') = BS.splitAt 1 bs'
         ; writeIORef bsr bs''
         ; return (Right (BS.append l nl))  -- re-attach the '\n'
         }
  `Prelude.catch` \e -> [...]
</pre>
There are two main problems with this implementation, though:
<ul>
<li>ByteStrings currently only work on handles, not on sockets. Thus we have to turn sockets into handles using <code>socketToHandle</code> which, according to the source code linked from the Haddock documentation, will fail if we're in a multithreaded environment. (Search for "PARALLEL_HASKELL" in <a href="http://darcs.haskell.org/packages/network/Network/Socket.hsc">Network.Socket's source</a>.)</li>
<li>Furthermore, after converting a socket to a handle we should no longer use this socket. So we can't change any settings of the socket; we can only close it by calling <code>hClose</code> on the handle.
HTTP allows the user to specify whether a socket should be closed after the response has been received. This is a bit trickier when we use lazy ByteStrings, since our request function will return immediately with a lazy ByteString as a result, but no data has been read (except, maybe, one block). We thus must not close the socket right away, but only after all its contents have been read. So we must rely on <code>hGetContents</code> to close our handle (and thus the socket)&mdash;which it does not! From recent #haskell comments this seems to be a bug. In any case, though, we'd want to be able to specify the behavior, as we might as well keep the socket open.</li>
</ul>
There are further issues to consider. E.g., can we rely on the operating system to buffer everything for us if we don't read it right away? I don't know the details, but I assume this is handled by some lower layer, possibly dropping packets and re-requesting them if necessary. That's just guessing, though.
Unfortunately, I will not have the time to work out these issues anytime soon, as I will be busy with my Google Summer of Code project (cabal configurations). There also is a <a href="http://code.google.com/soc/haskell/appinfo.html?csaid=D4DEE221DAC4E810">SoC project to replace Network.HTTP with libcurl bindings</a>, but it would probably be a good idea to still have a reasonable Haskell-only solution around. So if anyone wants to pick it up, you're welcome!
You can get the sources for the lazy version with
<code>darcs get <a href="http://www.dtek.chalmers.se/~tox/darcs/http">http://www.dtek.chalmers.se/~tox/darcs/http</a></code>
and for the strict version
<code>darcs get <a href="http://www.dtek.chalmers.se/~tox/darcs/http-strict">http://www.dtek.chalmers.se/~tox/darcs/http-strict</a></code>
If you're interested you can take a look at <a href="http://www.dtek.chalmers.se/~tox/site/http.php4">our project page</a>.Thomas Schillinghttp://www.blogger.com/profile/04274984206279511399noreply@blogger.com10tag:blogger.com,1999:blog-35609023.post-38473206524892802032007-05-07T12:45:00.000+02:002007-05-07T15:59:01.358+02:00Towards Better Error Handling<p>A while ago Eric Kidd wrote a <a href="http://www.randomhacks.net/articles/2007/03/10/haskell-8-ways-to-report-errors">rant about inconsistent error reporting mechanisms</a> in Haskell. He found eight different idioms, none of which were completely satisfying. In this post I want to propose a very simple but IMO pretty useful and easy-to-use scheme, that works with standard Haskell.</p>
<p>The Haskell <a href="http://www.haskell.org/http/">HTTP Package</a> is a good test case for such scheme. The most immediate requirements are:</p>
<ul>
<li>It should work from within any monad (not just <code>IO</code>).</li>
<li>It should be possible to catch and identify any kind of error that happened inside a call to a library routine.</li>
<li>It should be possible to ignore the error-handling (e.g., for simple scripts that just die in case of error)</li>
</ul>
<p>So far, the public API functions mostly have a signature like</p>
<pre class="example">
type Result a = Either ConnError a
simpleHTTP :: Request -&gt; IO (Result Response)
</pre>
<p>This requires C-style coding where we have to check for an error after each call. Additionally, we might still get an <code>IOException</code>, and have to catch it somewhere else (if we want to). A simple workaround is to write a wrapper function for calls to the HTTP API. For example:</p>
<pre class="example">
data MyErrorType = ... | HTTPErr ConnError | IOErr IOException

instance Error MyErrorType where
  noMsg    = undefined -- who needs these anyways?
  strMsg _ = undefined

instance MonadError MyErrorType MyMonad where ...

-- | Perform the API action, transform any error into our custom
-- error type and re-throw it.
ht :: IO (Result a) -&gt; MyMonad a
ht m = do { r &lt;- io m
          ; case r of
              Left cerr -&gt; throwError (HTTPErr cerr)
              Right x   -&gt; return x
          }

-- | Perform an action in the IO monad and re-throw possible
-- IOExceptions as our custom error type.
io :: IO a -&gt; MyMonad a
io m = do { r &lt;- liftIO $
                   (m &gt;&gt;= return . Right)
                   `catchError` (\e -&gt; return (Left e))
          ; case r of
              Left e  -&gt; throwError (IOErr e)
              Right a -&gt; return a
          }
</pre>
<p>We defined a custom error type, because we can have only one error type per monad. Exceptions in the IO monad and API error messages are then caught immediately and wrapped in our custom error type.</p>
<p>But why should every user of the library do that? Can't we just fix the library? Of course we can! Now, that we have a specific solution we can go and generalize. Let's start by commenting out the type signatures of <code>ht</code> and <code>io</code> and ask <code>ghci</code> what it thinks about the functions' types:</p>
<pre class="example">
*Main&gt; :t io
io :: (MonadIO m, MonadError MyErrorType m) =&gt; IO a -&gt; m a
*Main&gt; :t ht
ht :: (MonadIO t, MonadError MyErrorType t) =&gt;
IO (Either ConnError t1) -&gt; t t1
</pre>
<p>Alright, this already looks pretty general. There's still our custom <code>MyErrorType</code> in the signature, though. To fix this we apply the standard trick and use a type class.</p>
<pre class="example">
data HTTPErrorType = ConnErr ConnError | IOErr IOException

-- | An instance of this class can embed 'HTTPError's.
class HTTPError e where
  fromHTTPError :: HTTPErrorType -&gt; e
</pre>
<p>Our wrapper functions now have a nice general type, that allows us to move them into the library.</p>
<pre class="example">
throwHTTPError = throwError . fromHTTPError

ht :: (MonadError e m, MonadIO m, HTTPError e) =&gt;
      IO (Result a) -&gt; m a
ht m = do { r &lt;- io m
          ; case r of
              Left cerr -&gt; throwHTTPError (ConnErr cerr)
              Right a   -&gt; return a
          }

-- | Perform an action in the IO monad and re-throw possible
-- IOExceptions as our custom error type.
io :: (MonadError e m, MonadIO m, HTTPError e) =&gt;
      IO a -&gt; m a
io m = do r &lt;- liftIO $
                 (m &gt;&gt;= return . Right)
                 `catchError` (\e -&gt; return (Left e))
          case r of
            Left e  -&gt; throwHTTPError (IOErr e)
            Right a -&gt; return a
</pre>
<p>After wrapping, all exported functions will have a signature of the form:
<pre class="example">
f :: (MonadError e m, MonadIO m, HTTPError e) =&gt;
     ... arguments ... -&gt; m SomeResultType
</pre></p>
<p>Now the user is free to choose whichever monad she wants (that allows throwing errors and I/O). The only added burden is for the user to specify how to embed a <code>HTTPError</code> in the respective error type of the monad. We can already specify the instance for <code>IO</code>, though.</p>
<pre class="example">
instance HTTPError IOException where
  fromHTTPError (IOErr e)   = e
  fromHTTPError (ConnErr e) = userError $ show e
</pre>
<p>This way, our modified API works nicely out of the box whenever we just use the <code>IO</code> monad and we can use it in our custom monad by writing only one simple instance declaration.</p>
<pre class="example">
data MyErrorType = ... | HTTPErr HTTPErrorType

instance HTTPError MyErrorType where
  fromHTTPError = HTTPErr

test1 req = do { r &lt;- simpleHTTP req
               ; print (rspCode r)
               } `catchError` handler
  where handler (HTTPErr (ConnErr e)) = putStrLn $ &quot;Connection error.&quot;
        handler (HTTPErr (IOErr e))   = putStrLn $ &quot;I/O Error.&quot;
        handler _                     = putStrLn $ &quot;Whatever.&quot;
</pre>
<p>If we don't care about the error and thus don't want to implement the instance, we can still force our API to be in the <code>IO</code> monad and thus reuse <code>IOException</code> to embed possible HTTP errors.</p>
<pre class="example">
test2 req = do { r &lt;- liftIO $ simpleHTTP req
               ; print (rspCode r)
               }
</pre>
<p>I think this is a very simple but useful scheme. I already implemented this with a friend in the HTTP package&mdash;and it works (<em>without</em> <code>-fglasgow-exts</code>).</p>
<p>In addition to the added type class, there is the further potential drawback that an <code>IOException</code> will always be wrapped in an API-specific error type. So when a program uses more than one API that uses this scheme, an <code>IOException</code> may be wrapped in either, which may or may not be what is desired. A more sophisticated system that deals with this problem and provides additional features is explained in Simon Marlow's paper <a href="http://www.haskell.org/~simonmar/papers/ext-exceptions.pdf">&quot;An Extensible Dynamically-Typed Hierarchy of Exceptions&quot; (PDF)</a>.</p>
<p>Comments welcome.</p>Thomas Schillinghttp://www.blogger.com/profile/04274984206279511399noreply@blogger.com1tag:blogger.com,1999:blog-35609023.post-11961552031454626882006-12-20T22:18:00.000+01:002006-12-21T00:01:50.360+01:00More on SyntaxMy last post <a href="http://programming.reddit.com/info/utz4/comments">appeared on reddit</a>--thanks dons! This induced some comments I'd like to respond to.
First of all, there already is a macro system for Haskell, called (somewhat misleadingly) <a href="http://www.haskell.org/th/">Template Haskell</a>. It already provides the capability to generate arbitrary Haskell code. (More correctly, Haskell 98 code, since extensions like Generalized ADTs are not supported.) It also provides the quasi-quotation mechanism I used in my last post's mock-ups: <code>[| ... |]</code>.
However, it has two problems. Firstly, macros are marked specially using the <code>$(<em>macro</em> ...)</code> syntax, which is not as seamless as it could be, although there might be good reasons to keep it, namely to make it easily recognizable when macros are involved. Secondly, its quasi-quotation syntax is very limited, i.e., you cannot introduce new bindings and it's hard to modularize code&mdash;but I might be wrong here, since I might not have pushed it as far as possible. The problem is: when you cannot use the quasi-quotation syntax, you're left building up the quite complex Haskell parse tree yourself. Due to limited documentation and Haskell's syntax rules, you usually write your macros by first getting the AST of some sample code, e.g. using:
<pre>-- | print a human-readable representation of a given AST
printAST :: ExpQ -> IO ()
printAST ast = runQ ast >>= putStrLn . show
pp = printAST [| let x = $([|(4+)|]) in x 5 |]</pre>
which then (reformatted) looks like this:
<pre>$ pp
LetE [ValD (VarP x_0)
           (NormalB (InfixE (Just (LitE (IntegerL 4)))
                            (VarE GHC.Num.+) Nothing))
           []]
     (AppE (VarE x_0) (LitE (IntegerL 5)))</pre>
Then you try to customize this for your purposes. Not pretty.
My actual attempt was to take a type name as a parameter, inspect it, and then generate some boilerplate code. Well, I tried, but gave up after being unable to construct some type. Maybe I didn't try hard enough. Anyways, macro-writing shouldn't be that hard!
My proposed solution certainly is just a sketch of an idea, essentially pointing to prior art. I don't claim that this will in fact work nicely or even that it will work at all. I am pretty confident that it <em>might</em>, though, and I am planning to give it a shot later on. Maybe extending Template Haskell with features similar to Scheme's <code>syntax-case</code> might be enough, for a start.
And yet, I don't consider this a high-priority project, since a lot of uses for macros in Lisp can be solved differently in Haskell, as has also been mentioned in the comments to my previous post:
<ul><li>Controlling the order of evaluation is usually not necessary in Haskell, due to laziness. And if we have to control it somehow, we mostly use monads.</li>
<li>The whole category of <code>(with-<em>something</em> (<em>locally bound vars</em>) ...)</code> can be implemented almost as conveniently using <code>with<em>Foo</em> $ \<em>locally bound vars</em> -> do ...</code></li>
<li>A lot of cases for special syntax can be achieved using clever operator and constructor naming. E.g., in <a href="http://wxhaskell.sourceforge.net/">wxHaskell</a>: <code>t &lt;- timer f [interval := 20, on command := nextBalls vballs p]</code>, or, for an in-progress project of mine I simulate a convenient assembler syntax by allowing a notation like: <code>res &lt;-- a `imul` c</code>. However, I was not able to use the <code>&lt;-</code> notation, since I have different scoping rules than Haskell and I'm not in a monad.</li>
<li>Many cases of boilerplate code generation can be covered using generic programming, e.g. using <a href="http://www.cs.vu.nl/boilerplate/">Scrap Your Boilerplate</a>.</li>
</ul>
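As a concrete instance of the <code>with<em>Foo</em></code> pattern from the list above, here is a minimal sketch using <code>bracket</code>; the names <code>withGreeting</code> and <code>demo</code> are made up for illustration:

```haskell
import Control.Exception (bracket)
import Data.IORef

-- A with-Foo combinator: acquire a resource, hand it to the body,
-- release afterwards -- even if the body throws. Events are logged
-- to an IORef so the ordering is observable.
withGreeting :: IORef [String] -> (String -> IO a) -> IO a
withGreeting evs = bracket
    (modifyIORef evs (++ ["acquire"]) >> return "hello")
    (\_ -> modifyIORef evs (++ ["release"]))

-- Run a body and return the observed event order.
demo :: IO [String]
demo = do
  evs <- newIORef []
  withGreeting evs $ \s -> modifyIORef evs (++ [s])
  readIORef evs

main :: IO ()
main = demo >>= print   -- ["acquire","hello","release"]
```

No macro is needed: the resource-handling discipline lives entirely in a higher-order function.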
So where would (more usable) macros still make sense?
<ul><li>Allow more flexible syntax for domain-specific embedded languages (DSELs), e.g. an XML library or a parser library might profit from this right now. (Yes, I think <a href="http://www.cs.uu.nl/~daan/parsec.html">Parsec</a> could be more readable). Also, DSLs like Happy would be even nicer if embedded directly into Haskell. Arrows and Monads were considered general enough concepts to introduce new syntax for them, but I think there's more out there that deserves it. I also think that an upcoming project of mine might hit the limits of what's currently possible in Haskell. <a href="http://article.gmane.org/gmane.comp.lang.haskell.cafe/17735">Some people seem to agree</a>.</li>
<li>Speaking of Parsec, there's still one common use for Lisp macros: optimizing at compile time. You can get quite far by carefully designing the combinators for your DSELs; however, combining nice syntax and performance is very hard. In Lisp, the <code>loop</code> embedded language, for example, does quite heavy transformations on the given code. Partial evaluation is probably the more general solution here, but it seems not quite ready for primetime yet.</li>
<li>That a powerful enough system would essentially make syntactic sugar a library can be seen as a positive side effect, too. But I think this doesn't have much practical significance.</li></ul>
Bottom line: there are certainly fewer useful applications of macros in Haskell than in, e.g., Lisp, but the arguments are serious enough to at least consider them.

<em>Thomas Schilling (0 comments)</em>

<h2>Syntax</h2>
<em>2006-12-14</em>

Coming to Haskell from Common Lisp, I soon started to miss macros, and tried out Template Haskell... and gave up. Haskell's syntax parse tree is way too complex to be usefully manipulated by Haskell functions.
You can get used to Lisp syntax, but you always have to justify it to outsiders, and there lies a lot of value in resembling mathematical syntax. I certainly agree with <a href="http://cgi.cse.unsw.edu.au/~dons/blog/2006/12/14#on-syntax">dons' post on this issue</a>. If only it weren't for the disadvantages!
Sure, syntactic extensions have to be done carefully, and powerful abstraction mechanisms in the language always have to come first. But having macros as an integral part of the language definition would make desugaring a library instead of a part of the compiler. This is a very powerful form of syntactic abstraction.
I know that ML, being designed as a meta-language, has some means of syntactic abstraction, though I haven't taken a closer look at them yet. Let me, however, outline a way of achieving macros almost as powerful as general Lisp macros that also works for languages with a less simple syntax.
This system is (partly) implemented in the <a href="http://www.opendylan.org/">Dylan programming language</a> and is called D-Expressions. It is essentially an extension of the Lisp syntax.
Lisp in its most primitive form looks like this:
<code>Expr ::= Atom | ( Expr* )</code>
This represents a simple tree. However, why should we restrict ourselves to represent nested structure with parens? We might as well use brackets or braces or keywords. So:
<code>Expr ::= Atom | ( Expr* ) | [ Expr* ] | { Expr* } | def Expr* end</code>
This way we still retain the unambiguous nesting structure, but have a more varied syntax. Reader macros in Common Lisp (and probably Scheme, too) do this and already perform some sort of desugaring, by translating <code>{</code> ... <code>}</code> into <code>(<em>some-macro ...</em>)</code>. This however has one big problem: You can only have <em>one</em> macro for <code>{</code> ... <code>}</code>. Common Lisp "fixes" this by using some prefix to the parens, e.g. <code>#c(3 4)</code> denotes a complex number. This isn't exactly beautiful and doesn't scale well, though.
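To make the extended grammar concrete, here is a throwaway recursive-descent reader in Haskell for <code>Expr ::= Atom | ( Expr* ) | [ Expr* ] | { Expr* }</code>. It is a sketch only: atoms are single lowercase letters, and there is no whitespace handling.

```haskell
-- S-expressions with several bracket kinds; the opening character is
-- remembered, so each bracket kind could later select a different macro.
data SExpr = Atom Char | Group Char [SExpr] deriving (Eq, Show)

parseExpr :: String -> Maybe (SExpr, String)
parseExpr (c:rest)
  | c `elem` "([{"      = do (es, rest') <- parseMany (close c) rest
                             return (Group c es, rest')
  | c `elem` ['a'..'z'] = Just (Atom c, rest)
  where
    close '(' = ')'
    close '[' = ']'
    close _   = '}'
parseExpr _ = Nothing

-- Parse expressions until the expected closing delimiter is seen.
parseMany :: Char -> String -> Maybe ([SExpr], String)
parseMany end (c:rest) | c == end = Just ([], rest)
parseMany end s = do (e, s')   <- parseExpr s
                     (es, s'') <- parseMany end s'
                     return (e : es, s'')
```

For example, <code>parseExpr "(a[bc]{d})"</code> yields one group whose children remember which brackets enclosed them; the nesting structure stays unambiguous even with four delimiter kinds.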
In fact, we'd like to be able to use this more varied syntax <em>inside</em> our macros, and we need a way to make it scale. Dylan solves this by allowing three kinds of macros:
<ul><li>Definition-style macros have the form: <code>define <em>modifiers*</em> <em>macro-name</em> <em>Expr*</em> end</code></li>
<li>Function-style macros have the form: <code><em>macro-name</em>(<em>Expr*</em>)</code></li>
<li>Statement-style macros have the form: <code><em>macro-name</em> <em>Expr*</em> end</code></li>
</ul>
Macros are defined using pattern matching rewrite rules, e.g.:
<pre>define macro when
{ when ?cond:expr ?body:body end }
=> { if ?cond ?body else #f end }
end</pre>
Here <code>?cond:expr</code> states that the pattern variable <code>cond</code> matches a form of the <em>syntactic category</em> of an expression. Similarly, for <code>?body:body</code>.
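For comparison: in Haskell this particular macro is unnecessary, since laziness lets a plain function delay evaluation of its branches. A trivial sketch (<code>when'</code> is a made-up name, to avoid clashing with <code>Control.Monad.when</code>):

```haskell
-- Dylan's `when` as a plain Haskell function: the branches are only
-- evaluated on demand, so no macro is needed for correct semantics.
when' :: Bool -> a -> a -> a
when' c t e = if c then t else e
```

The unused branch is never forced, which is exactly the behavior the Dylan macro has to arrange by rewriting syntax.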
Multiple patterns are possible, and extending this to Lisp-style abstract syntax tree transformations is possible too, as shown in this <a href="http://people.csail.mit.edu/jrb/Projects/dexprs.htm">paper on D-Expressions</a>.
Adding this feature to Haskell would probably require some small modifications to Haskell's syntax, but I think we wouldn't have to drop whitespace sensitivity. This way, embedding DSLs in Haskell should be even more seamless, and <a href="http://syntaxfree.wordpress.com/2006/12/12/do-notation-considered-harmful/">rants about the do-notation</a> would not be needed.
Here's how an implementation of the do-notation syntactic sugar could look:
<pre>macro do {
  [| do { ?e:expr } |] => [| ?e |]
  [| do { ?p:pat <- ?e:expr; ??rest:* } |]
    => [| ?e >>= (\?p -> do { ??rest ... }) |]
  [| do { let ?p:pat = ?e:expr; ??rest:* } |]
    => [| let ?p = ?e in do { ??rest ... } |]
  ...
}</pre>
where <code>??rest:*</code> matches anything up to the closing "}" (leaving the pattern variable <code>rest</code> bound to a sequence of tokens) and <code>??rest ...</code> expands all the tokens in <code>rest</code>.
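For a concrete instance, here is what rewrite rules of this shape would produce for a small do-block, written out in ordinary Haskell so both forms can be evaluated and compared (the names <code>sugared</code>/<code>desugared</code> are made up):

```haskell
-- The sugared form, with explicit braces as in the rules above...
sugared :: Maybe Int
sugared = do { x <- Just 2; let { y = 3 }; return (x + y) }

-- ...and the expansion such a `do` macro would generate.
desugared :: Maybe Int
desugared = Just 2 >>= (\x -> let y = 3 in return (x + y))
```

Both evaluate to <code>Just 5</code>, which is the whole point: the macro only rearranges syntax, leaving the semantics to <code>>>=</code>.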
Sure, there are a lot of open issues to be solved--e.g., type signatures form a separate sub-language of Haskell, and it's unclear how to make macros whitespace-sensitive--but I think it would be a very useful extension.
Comments welcome! :)
<strong>Edit:</strong> The last example is actually just a mockup of some possible syntax and doesn't even make much sense in Dylan. But you get the idea (I hope).
I've been pointed to a similar idea, called <a href="http://www.cs.uu.nl/people/arthurb/macros.html">Syntax Macros</a>. Seems to be along the same lines.

<em>Thomas Schilling (4 comments)</em>

<h2>Specification-based Testing</h2>
<em>2006-11-18</em>

I just had the rare opportunity to hear a live talk by <a href="http://www.cs.chalmers.se/%7Erjmh/">John Hughes</a>, given to a small group of <a href="http://www.chalmers.se/">Chalmers</a> students which I happened to be part of. He repeated the talk he had given a week earlier at the <a href="http://www.erlang.se/euc/06/">Erlang User Conference</a> about specification-based testing of Ericsson software (written in <a href="http://www.erlang.org/">Erlang</a>).
In the following I'll try to summarize what I consider the most essential parts of his very interesting talk. For further information, see the <a href="http://www.ituniv.se/program/sem_research/Publications/2006/AHJW06/">paper</a> by John et al. or the <a href="http://lambda-the-ultimate.org/node/1827">LtU discussion</a>.
Users of <a href="http://www.md.chalmers.se/%7Erjmh/QuickCheck/">QuickCheck</a> know what nice advantages specification-based testing has: Instead of lots and lots of test cases one writes <span style="font-style: italic;">properties</span> of functions, for example (in Erlang):
<pre> prop_reverse() ->
     ?FORALL(Xs, list(int()),
     ?FORALL(Ys, list(int()),
         reverse(Xs ++ Ys) ==
         reverse(Xs) ++ reverse(Ys))).</pre>
In fact this specification is <em>wrong</em>, as a quick check (pun intended) shows us:
<pre> Failed! After 12 tests.
 [-3,2]
 [3,0]
 Shrinking....(10 times)
 [0]
 [1]</pre>
QuickCheck provides two important features:
<ul><li>It provides the interface to a <em>controlled</em> generation of test cases (as opposed to simply random ones, which usually are of little help).</li>
<li>It generates and runs tests of the specified properties and--this might be new to users of the <a href="http://www.haskell.org/ghc/">GHC</a> package <code>Test.QuickCheck</code>--<em>shrinks</em> failing cases to smaller ones. This is important, as it is often hard to find the actual cause of a bug, especially if test cases are very large. (John mentioned that the Mozilla developers actually offer T-shirts to people just for <em>reducing</em> test cases, as they found this to be an extremely time-consuming task.)</li></ul>
In our example the error was in the specification, and can be fixed by swapping <code>Xs</code> and <code>Ys</code> in the last line.
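Transliterated to Haskell, the corrected property reads as follows. It is shown as a plain function of its randomly generated inputs; with the QuickCheck library in scope one would run it as <code>quickCheck prop_reverse</code>:

```haskell
-- The fixed specification: Xs and Ys swapped on the right-hand side.
prop_reverse :: [Int] -> [Int] -> Bool
prop_reverse xs ys = reverse (xs ++ ys) == reverse ys ++ reverse xs
```

Note that the shrunk counterexample from above, <code>[0]</code> and <code>[1]</code>, is exactly the smallest input pair on which the original (unswapped) property fails.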
<h2>Advantages of Specification-based Testing</h2>
The most obvious advantage of QuickCheck over conventional test cases is that it dramatically reduces the number of test cases one has to write. It also does a great job of finding corner cases. In their field study at Ericsson, spending only six days to implement all the test properties (plus a library for state-machine-based testing), they found 5 bugs in an already well-tested, soon-to-be-released product--one of which would have been very unlikely to be triggered by any hand-written test case--and revealed 9 bugs in an older version of the project, only one of which had been documented at that time (see the paper). Neil Mitchell also recently <a href="http://neilmitchell.blogspot.com/2006/11/systemfilepath-automated-testing.html">posted about the merits of QuickCheck</a>.
John added one caveat, though. The bugs they found were very small (due to shrinking) and would in fact never have occurred in the real system, because the command sequences that lead to the failures would never have been issued by the real controller. However, they tested against the documented (400+ pages) specification. This leads us to another important advantage of specification-based testing.
QuickCheck forces (or allows) us to actually state the specification as executable properties. Therefore we might not only find errors in the program but, as seen in our example, also in the specification. This can be very useful, and he gave a nice demonstration of a buggy (or incomplete) specification for the Erlang functions <code>register/2</code>, <code>unregister/1</code>, and <code>whereis/1</code>, which provide a sort of name server for Erlang processes.
To do this, he generated sequences of Erlang calls and checked whether their results corresponded to a state machine model derived from the specification. The current state consisted of a simple map representing the (assumed) state of the name server. Using preconditions he constrained the allowable sequences of generated commands, and he checked their outcomes using postconditions.
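The idea can be sketched in miniature: interpret commands against a pure model state (a map) and compare the model's answers with the real system's. Everything below (<code>Cmd</code>, <code>step</code>, the <code>String</code>/<code>Int</code> simplifications) is made up for illustration and is not the code from the talk:

```haskell
import qualified Data.Map as M

-- A toy model of the name server: register a name for a pid,
-- unregister it, or look it up.
data Cmd = Register String Int | Unregister String | WhereIs String

-- One transition of the model state machine. The first component is
-- the observable result that a postcondition would compare against
-- the real system's answer.
step :: M.Map String Int -> Cmd -> (Maybe Int, M.Map String Int)
step m (Register n p)
  | n `M.member` m = (Nothing, m)              -- name taken: call must fail
  | otherwise      = (Nothing, M.insert n p m)
step m (Unregister n) = (Nothing, M.delete n m)
step m (WhereIs n)    = (M.lookup n m, m)
```

A generated command sequence is then folded through <code>step</code> alongside the real calls; any disagreement is a bug in the program or, as in the talk, a gap in the specification.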
This small experiment revealed quite a few incompletenesses in the Erlang specification. E.g., it does state that <code>register/2</code> will fail if you try to register the same process under different names, but it does not state that the process is then not added to the list of registered processes (although that is sensible to assume--nevertheless, the specification is incomplete). (I think he had a more serious example, but I can't remember what exactly it was.)
<h2>Remarks</h2>
Having used it myself, I am very convinced of the advantages of QuickCheck. In reply to my question about the testability of non-deterministic faults caused by the effects of concurrent execution of processes, John remarked that you can get quite deterministic behavior (and thus reproducible test results) by running the programs on a single CPU and relying on the deterministic scheduler. I am not sure how far this reaches, but then again, you should use Erlang's behaviors as much as possible.

<em>Thomas Schilling (0 comments)</em>

<h2>Being Lazy</h2>
<em>2006-11-10</em>

Lazy? Me? No. Noo. Never!
But here's a <a href="http://programming.reddit.com/info/pylx/comments">Reddit discussion</a> (warning: long!) that--even though blown up by an obvious troll--contains some nice statements about the performance, usability, and composability implications of lazy evaluation. Quite interesting (if you filter out the noise).

<em>Thomas Schilling (2 comments)</em>

<h2>Uno</h2>
<em>2006-10-06</em>

Everything has a beginning and an ending.
So, this is the beginning of my humble blog--let's hope it's not going to see its end soon.
This blog will (presumably) be about programming languages, compilers, concurrency, maybe a bit about human-computer interaction, and about some general life-related stuff. With time I'll also try to adopt some <span style="text-decoration: underline;"><a href="http://www.useit.com/alertbox/weblogs.html">blog usability...</a></span>

<em>Thomas Schilling (0 comments)</em>