gregory david collinshttp://gregorycollins.net/posts/feed.xml2011-10-01T18:31:00+0200Gregory Collinsgreg@gregorycollins.nethttp://gregorycollins.net/posts/2011/10/01/cufp-tutorial-slidesSlides from CUFP 2011: Snap Framework Tutorial2011-10-01T18:31:00+0200Gregory Collinsgreg@gregorycollins.net<p>I recently gave a <a href="http://cufp.org/conference/sessions/2011/t7-snap-framework-web-applications-haskell-gregory">tutorial</a> at <a href="http://cufp.org/">CUFP 2011</a> about web programming in Haskell using the Snap Framework.</p>
<p>The tutorial application is called “Snap Chat”, and is an implementation of a simple multi-user chat room, using Haskell concurrency primitives and HTTP long-polling.</p>
<p><a href="cufp2011/index.html">View the slides from the talk in HTML here.</a></p>
<div class="figure">
<img src="snap-chat-1.png" alt="Logging into the chat room" /><p class="caption">Logging into the chat room</p>
</div>
<div class="figure">
<img src="snap-chat-2.png" alt="An exciting discussion" /><p class="caption">An exciting discussion</p>
</div>2011-10-01T18:23:00+0200Slides for the tutorial I gave about web programming in the Snap Framework, from CUFP 2011 in Tokyo, Japan.http://gregorycollins.net/posts/2011/06/11/announcing-hashtablesAnnouncing: "hashtables", a new Haskell library for fast mutable hash tables2011-10-21T16:27:00+0200Gregory Collinsgreg@gregorycollins.net<!--[if lte IE 8]><script language="javascript" type="text/javascript" src="../excanvas.min.js"></script><![endif]-->
<script language="javascript" type="text/javascript" src="util.js"></script>
<script language="javascript" type="text/javascript" src="data.js"></script>
<script language="javascript" type="text/javascript" src="jquery.js"></script>
<script language="javascript" type="text/javascript" src="jquery.flot.js"></script>
<style type="text/css">
body {
font-family: sans-serif;
margin: 50px auto;
max-width: 960px;
}
figure { font-size: 80%; text-align: center; color: #777; margin: 2em auto; }
div.chart { width: 610px; height: 340px; }
table.chartTable { margin: 1em auto; }
table.chartTable td.yaxis { width: 60px; }
table.chartTable td { font-size: 70%; color: #888; }
</style>
<p>I’m very pleased to announce today the release of the first version of <a href="http://hackage.haskell.org/package/hashtables">hashtables</a>, a Haskell library for fast mutable hash tables. The <code>hashtables</code> library contains three different mutable hash table implementations in the <code>ST</code> monad, as well as a type class abstracting out the functions common to each and a set of wrapper functions to use the hash tables in the <code>IO</code> monad.</p>
<div id="whats-included"><h2>What’s included?</h2>
<p>The <code>hashtables</code> library contains implementations of the following data structures:</p>
<ul>
<li><p><code>Data.HashTable.ST.Basic</code>: a basic open addressing hash table using <a href="http://en.wikipedia.org/wiki/Linear_probing">linear probing</a> as the collision resolution strategy. On a pure speed basis, this should be the fastest currently-available Haskell hash table implementation for lookups, although it has a higher memory overhead than the other tables. Like many hash table implementations, it can also suffer from long delays when the table is grown due to the rehashing of all of the elements in the table.</p></li>
<li><p><code>Data.HashTable.ST.Cuckoo</code>: an implementation of <a href="http://en.wikipedia.org/wiki/Cuckoo_hashing">Cuckoo hashing</a>, as introduced by Pagh and Rodler in 2001. Cuckoo hashing features worst-case <em>O(1)</em> lookups and can reach a high “load factor”, meaning that the table can perform acceptably well even when more than 90% full. Randomized testing shows this implementation of cuckoo hashing to be slightly faster on insert and slightly slower on lookup than <code>Data.HashTable.ST.Basic</code>, while being more space-efficient by about a half word per key-value mapping. Cuckoo hashing, like open-addressing hash tables, can suffer from long delays when the table is forced to grow.</p></li>
<li><p><code>Data.HashTable.ST.Linear</code>: a <a href="http://en.wikipedia.org/wiki/Linear_hashing">linear hash table</a>, which trades some insert and lookup performance for higher space efficiency and much shorter delays during table expansion. In most cases, randomized testing shows this table to be slightly faster than <code>Data.HashTable</code> from the Haskell base library.</p></li>
</ul></div>
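<p>As a quick taste of the interface, here is a minimal sketch assuming the <code>IO</code>-wrapper module <code>Data.HashTable.IO</code> (the <code>ST</code> interface is analogous); it requires the <code>hashtables</code> and <code>hashable</code> packages:</p>

```haskell
import qualified Data.HashTable.IO as H

main :: IO ()
main = do
  -- The three implementations share one interface; swapping
  -- BasicHashTable for CuckooHashTable or LinearHashTable changes
  -- the backing structure without touching any other code.
  ht <- H.new :: IO (H.BasicHashTable String Int)
  H.insert ht "foo" 1
  H.insert ht "bar" 2
  H.lookup ht "foo" >>= print   -- Just 1
  H.lookup ht "baz" >>= print   -- Nothing
```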
<div id="why-data.hashtable-is-slow"><h2>Why Data.HashTable is slow</h2>
<p><a href="http://flyingfrogblog.blogspot.com/2009/04/f-vs-ocaml-vs-haskell-hash-table.html">People often remark</a> that the hash table implementation from the Haskell base library is slow. Historically, there have been a couple of reasons why. First, Haskell programmers tend to prefer persistent data structures to ephemeral ones. Second, until <a href="http://hackage.haskell.org/trac/ghc/ticket/650">GHC 6.12.2</a>, GHC had unacceptably large overhead when using mutable arrays, due to a lack of <a href="http://www.memorymanagement.org/glossary/c.html#card.marking">card marking</a> in the garbage collector.</p>
<p>However, performance testing on newer versions of GHC still shows the hash table implementation from the Haskell base library to be slower than it ought to be. To explain why, let’s examine the data type definition for <code>Data.HashTable</code>:</p>
<pre class="sourceCode"><code class="sourceCode haskell"><span class="kw">data</span> <span class="dt">HashTable</span> key val <span class="fu">=</span> <span class="dt">HashTable</span> {<br /><span class="ot"> cmp </span><span class="ot">::</span> <span class="fu">!</span>(key <span class="ot">-&gt;</span> key <span class="ot">-&gt;</span> <span class="dt">Bool</span>),<br /><span class="ot"> hash_fn </span><span class="ot">::</span> <span class="fu">!</span>(key <span class="ot">-&gt;</span> <span class="dt">Int32</span>),<br /><span class="ot"> tab </span><span class="ot">::</span> <span class="fu">!</span>(<span class="dt">IORef</span> (<span class="dt">HT</span> key val))<br /> }<br /><br /><span class="kw">data</span> <span class="dt">HT</span> key val<br /> <span class="fu">=</span> <span class="dt">HT</span> {<br /><span class="ot"> kcount </span><span class="ot">::</span> <span class="fu">!</span><span class="dt">Int32</span>, <span class="co">-- Total number of keys.</span><br /><span class="ot"> bmask </span><span class="ot">::</span> <span class="fu">!</span><span class="dt">Int32</span>,<br /><span class="ot"> buckets </span><span class="ot">::</span> <span class="fu">!</span>(<span class="dt">HTArray</span> [(key,val)])<br /> }</code></pre>
<p>For now, let’s ignore the <code>HashTable</code> type, as it is essentially just an <code>IORef</code> wrapper around the <code>HT</code> type, which contains the actual table. The hash table from <code>Data.HashTable</code> uses separate chaining, in which keys are hashed to buckets, each of which contains a linked list of <code>(key,value)</code> tuples. To explain why this is not an especially smart strategy for hash tables in Haskell, let’s examine what the memory layout of this data structure looks like at runtime.</p>
<div class="figure">
<img src="Data.HashTable.png" alt="Memory layout of Data.HashTable" /><p class="caption">Memory layout of Data.HashTable</p>
</div>
<p>Each arrow in the above diagram represents a pointer which, when dereferenced, can (and probably does) cause a CPU cache miss. During lookup, each time we test an entry in one of the buckets against the lookup key, we cause <em>three</em> cache lines to be loaded:</p>
<ul>
<li>one for the cons cell itself</li>
<li>one for the tuple</li>
<li>one for the key</li>
</ul>
<p>If the average bucket contains <code>b</code> elements, the average successful lookup causes <code>1 + 3b/2</code> cache line loads: one to dereference the bucket array slot and get the pointer to the first cons cell, and <code>3b/2</code> to walk the chain to the matching key. An <em>unsuccessful</em> lookup is worse, causing <code>1 + 3b</code> cache line loads, because we need to examine every key in the bucket.</p></div>
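<p>The accounting above can be sketched as a pair of throwaway formulas (the function names here are just for illustration):</p>

```haskell
-- One load to index the bucket array, then three loads (cons cell,
-- tuple, key) per entry examined.  A successful lookup examines b/2
-- entries on average; an unsuccessful one examines all b of them.
successfulLoads, unsuccessfulLoads :: Double -> Double
successfulLoads   b = 1 + 3 * b / 2
unsuccessfulLoads b = 1 + 3 * b

main :: IO ()
main = print (successfulLoads 4, unsuccessfulLoads 4)  -- (7.0,13.0)
```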
<div id="why-this-new-library-is-faster"><h2>Why this new library is faster</h2>
<p>The datatype inside <code>Data.HashTable.ST.Basic</code> looks like this:</p>
<pre class="sourceCode"><code class="sourceCode haskell"><span class="kw">data</span> <span class="dt">HashTable_</span> s k v <span class="fu">=</span> <span class="dt">HashTable</span><br /> { _<span class="ot">size </span><span class="ot">::</span> <span class="ot">{-# UNPACK #-}</span> <span class="fu">!</span><span class="dt">Int</span><br /> , _<span class="ot">load </span><span class="ot">::</span> <span class="fu">!</span>(<span class="dt">U.IntArray</span> s)<br /> , _<span class="ot">hashes </span><span class="ot">::</span> <span class="fu">!</span>(<span class="dt">U.IntArray</span> s)<br /> , _<span class="ot">keys </span><span class="ot">::</span> <span class="ot">{-# UNPACK #-}</span> <span class="fu">!</span>(<span class="dt">MutableArray</span> s k)<br /> , _<span class="ot">values </span><span class="ot">::</span> <span class="ot">{-# UNPACK #-}</span> <span class="fu">!</span>(<span class="dt">MutableArray</span> s v)<br /> }</code></pre>
<p>Here, to avoid pointer indirections, I’ve flattened the keys and values into parallel arrays, and stored the hash code for every cell in an unboxed array to speed lookups. Lookup in these three parallel arrays looks like this:</p>
<div class="figure">
<img src="Data.HashTable.ST.Basic.png" alt="Memory layout of Data.HashTable.ST.Basic" /><p class="caption">Memory layout of Data.HashTable.ST.Basic</p>
</div>
<p>The coloured region represents the location of the key we are looking for. Counting the cache line loads for a typical successful lookup in this structure:</p>
<ul>
<li>one, maybe two, to find the correct hash code in the hash codes array. Note that since this is a contiguous unboxed integer array, a cache line load causes eight (sixteen on 32-bit machines) hash codes to be loaded into cache at once.</li>
<li>one to dereference the key pointer in the keys array.</li>
<li>one to dereference the key.</li>
<li>one to dereference the value pointer in the values array.</li>
</ul>
<p>Unsuccessful lookups are even faster here, since we don’t touch the keys or values arrays except on hash collisions. Astute readers may wonder why we store the hash codes in the table at all, since many implementations don’t. There are several reasons:</p>
<ul>
<li>We can use the hash code slot to mark the cell as empty or deleted by writing a privileged value (zero or one). If we didn’t do this, we would either have to use a datatype like <code>Maybe</code> in the keys array to distinguish empty or deleted entries, causing an extra indirection, or we would have to play tricks with <a href="http://www.haskell.org/ghc/docs/7.0.3/html/libraries/base-4.3.1.0/Unsafe-Coerce.html"><code>unsafeCoerce</code></a> and/or <a href="http://www.haskell.org/ghc/docs/7.0.3/html/libraries/ghc-prim-0.2.0.0/GHC-Prim.html#v:reallyUnsafePtrEquality-35-"><code>reallyUnsafePtrEquality#</code></a>. (The linear hash table actually uses these tricks to save indirections, but they’re marked “unsafe” for a reason!)</li>
<li>For the types we really care about (specifically <a href="http://hackage.haskell.org/package/bytestring"><code>ByteString</code></a>, <a href="http://hackage.haskell.org/package/text"><code>Text</code></a>, and <a href="http://hackage.haskell.org/package/smallstring"><code>SmallString</code></a>), the <code>Eq</code> instance is <em>O(n)</em> in the size of the key, as compared to the <em>O(1)</em> machine instruction required to compare two hash codes.</li>
<li>Keeping the hash codes in an unboxed array allows us to do super-efficient branchless <a href="https://github.com/gregorycollins/hashtables/blob/1.0.0.0/cbits/cfuncs.c">cache line lookups in C</a>. Moreover, we can take advantage of 128-bit SSE4.1 instructions on processors which support them (Intel chips, Core 2 and above) to make searching cache lines for hash codes even faster.</li>
<li>I tested it both ways, and storing the hash codes was consistently faster.</li>
</ul></div>
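<p>The sentinel trick in the first bullet can be sketched as follows; the names and the choice of remapping value are hypothetical, not the library’s actual scheme:</p>

```haskell
-- Reserve two hash-code values to mark empty and deleted slots, so
-- real hash codes must be nudged off those values before storage.
emptyMarker, deletedMarker :: Int
emptyMarker   = 0
deletedMarker = 1

-- Remap a raw hash code so it never collides with a marker.  This
-- slightly skews the hash distribution but costs no indirection.
slotHash :: Int -> Int
slotHash h
  | h == emptyMarker || h == deletedMarker = 2
  | otherwise                              = h

main :: IO ()
main = print (map slotHash [0, 1, 2, 42])  -- [2,2,2,42]
```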
<div id="performance-measurements"><h2>Performance measurements</h2>
<p>I ran benchmarks for lookup and insert performance of the three hash table implementations included in the hashtables library against:</p>
<ul>
<li><code>Data.HashTable</code> from the <a href="http://hackage.haskell.org/packages/archive/base/latest/doc/html/Data-HashTable.html">base</a> library</li>
<li><code>Data.Map</code> from the <a href="http://hackage.haskell.org/packages/archive/containers/latest/doc/html/Data-Map.html">containers</a> library</li>
<li><code>Data.HashMap.Strict</code> from the <a href="http://hackage.haskell.org/packages/archive/unordered-containers/0.1.3.0/doc/html/Data-HashMap-Strict.html">unordered-containers</a> library</li>
</ul>
<p>Unfortunately I cannot release the benchmark code at this time, as it relies on an unfinished data-structure benchmarking library based on <a href="http://hackage.haskell.org/packages/criterion">criterion</a>, for which I have not yet sought permission to open-source.</p>
<p>The methodology for lookups is:</p>
<ul>
<li>create a vector of <code>N</code> random key-value pairs, where the key is a random <code>ByteString</code> consisting of ASCII hexadecimal characters, between 8 and 32 bytes long, and the value is an <code>Int</code></li>
<li>load all of the key-value pairs into the given datastructure</li>
<li>create a vector of <code>N/2</code> random successful lookups out of the original set</li>
<li>perform all of the lookups, and divide the total time taken by the number of lookups to get a per-operation timing</li>
<li>repeat the above procedure using <a href="http://hackage.haskell.org/packages/criterion">criterion</a> enough times to be statistically confident about the timings</li>
</ul>
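<p>The key-generation step might look something like the following sketch; it uses a toy linear congruential generator (assuming a 64-bit <code>Int</code>) to stay dependency-free, whereas the real benchmark presumably used a proper random source:</p>

```haskell
import Data.Bits (shiftR, (.&.))

-- A toy 64-bit linear congruential step, masked to stay non-negative.
lcg :: Int -> Int
lcg s = (s * 6364136223846793005 + 1442695040888963407)
          .&. 0x7fffffffffffffff

-- Produce a pseudo-random hexadecimal "key" between 8 and 32
-- characters long, as described in the methodology above.
hexKey :: Int -> String
hexKey seed = take len (map digit (iterate lcg s1))
  where
    s1      = lcg seed
    len     = 8 + (s1 `shiftR` 8) `mod` 25   -- 8..32 characters
    digit s = "0123456789abcdef" !! (s `mod` 16)

main :: IO ()
main = mapM_ (putStrLn . hexKey) [1 .. 3]
```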
<p>The above procedure is repeated for doubling values of <code>N</code> from 250 up to 2,048,000. Note that, to be fair and to ensure that random fluctuations in the input distribution don’t influence the timings for the different data structures, each trial uses the same input set for each data structure. We also force a garbage collection between trials to isolate its unpredictable impact as much as possible.</p>
<p>The methodology for inserts is similar:</p>
<ul>
<li>create a vector of <code>N</code> random key-value pairs as described above</li>
<li>time how long it takes to load the <code>N</code> key-value pairs into the given data structure. Where applicable, the data structure is pre-sized to fit the data set (i.e. for the hash tables). Note here, though, that I’m not being 100% fair to <code>Data.HashTable</code> in this test, as the <a href="http://hackage.haskell.org/packages/archive/base/latest/doc/html/Data-HashTable.html#v:newHint"><code>newHint</code></a> function wasn’t called — when I tried to use it, the benchmark took forever. I’m ashamed to say that I didn’t dig too deeply into why.</li>
</ul>
<p>The benchmarks were run on a MacBook Pro running Snow Leopard with an Intel Core i5 processor, running GHC 7.0.3 in 64-bit mode. The RTS options passed into the benchmark function were <code>+RTS -N -A4M</code>.</p>
<div id="lookup-performance"><h3>Lookup performance</h3>
<table class="chartTable">
<tr>
<td class="yaxis" valign="middle" align="right">
Avg time per lookup (seconds)
</td>
<td><div class="chart" id="lookupChart"></div></td>
</tr>
<tr>
<td></td>
<td align="middle">
Input Size
</td>
</tr>
</table>
</div>
<div id="lookup-performance-log-log-plot"><h3>Lookup performance, log-log plot</h3>
<table class="chartTable">
<tr>
<td class="yaxis" valign="middle" align="right">
Avg time per lookup (seconds)
</td>
<td><div class="chart" id="lookupChartLog"></div></td>
</tr>
<tr>
<td></td>
<td align="middle">
Input Size
</td>
</tr>
</table>
</div>
<div id="insert-performance"><h3>Insert performance</h3>
<table class="chartTable">
<tr>
<td class="yaxis" valign="middle" align="right">
Avg time per insert (seconds)
</td>
<td><div class="chart" id="insertChart"></div></td>
</tr>
<tr>
<td></td>
<td align="middle">
Input Size
</td>
</tr>
</table>
</div>
<div id="insert-performance-log-log-plot"><h3>Insert performance, log-log plot</h3>
<table class="chartTable">
<tr>
<td class="yaxis" valign="middle" align="right">
Avg time per insert (seconds)
</td>
<td><div class="chart" id="insertChartLog"></div></td>
</tr>
<tr>
<td></td>
<td align="middle">
Input Size
</td>
</tr>
</table>
</div></div>
<div id="performance-measurements-round-2"><h2>Performance measurements, round 2</h2>
<p>My first thought upon seeing these graphs was: “what’s with the asymptotic behaviour?” I had expected lookups and inserts for most of the hash tables to be close to flat, especially for cuckoo hash, which is guaranteed <em>O(1)</em> for lookups in the worst case. I had some suspicions, and re-running the tests with the RTS flags set to <code>+RTS -N -A4M -H1G</code> (specifying a 1GB suggested heap size) seems to confirm them:</p>
<div id="lookup-performance-1"><h3>Lookup performance</h3>
<table class="chartTable">
<tr>
<td class="yaxis" valign="middle" align="right">
Avg time per lookup (seconds)
</td>
<td><div class="chart" id="lookupChartH1G"></div></td>
</tr>
<tr>
<td></td>
<td align="middle">
Input Size
</td>
</tr>
</table>
</div>
<div id="lookup-performance-log-log-plot-1"><h3>Lookup performance, Log-log plot</h3>
<table class="chartTable">
<tr>
<td class="yaxis" valign="middle" align="right">
Avg time per lookup (seconds)
</td>
<td><div class="chart" id="lookupChartLogH1G"></div></td>
</tr>
<tr>
<td></td>
<td align="middle">
Input Size
</td>
</tr>
</table>
</div>
<div id="insert-performance-1"><h3>Insert performance</h3>
<table class="chartTable">
<tr>
<td class="yaxis" valign="middle" align="right">
Avg time per insert (seconds)
</td>
<td><div class="chart" id="insertChartH1G"></div></td>
</tr>
<tr>
<td></td>
<td align="middle">
Input Size
</td>
</tr>
</table>
</div>
<div id="insert-performance-log-log-plot-1"><h3>Insert Performance, Log-log plot</h3>
<table class="chartTable">
<tr>
<td class="yaxis" valign="middle" align="right">
Avg time per insert (seconds)
</td>
<td><div class="chart" id="insertChartLogH1G"></div></td>
</tr>
<tr>
<td></td>
<td align="middle">
Input Size
</td>
</tr>
</table>
<p>These are more or less the graphs I had been expecting to see. The main impact of setting <code>-H1G</code> is to reduce the frequency of major garbage collections, and the difference here would suggest that garbage collection overhead is what’s causing the poorer-than-expected asymptotic performance. The linear probing and cuckoo hash tables included in this library use very large boxed mutable arrays. The GHC garbage collector uses a card marking strategy in which mutable arrays carry a “card table” containing one byte per <code>k</code> entries in the array (I think here <code>k</code> is 128). When the array is written to, the corresponding entry in the card table is marked “dirty” by writing a “1” into it. It would seem that the card table is scanned during garbage collection no matter whether the array has been dirtied or not; this would account for the small linear factor in the asymptotic time complexity that we’re seeing here.</p></div></div>
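<p>A quick back-of-the-envelope using the post’s guess of one card byte per 128 array slots shows how this scanning overhead scales with the table (the function name is illustrative):</p>

```haskell
-- Card-table size for a boxed mutable array, assuming one card byte
-- per 128 slots (the value of k guessed above).  Even if no slot was
-- written, the collector may still walk this table on each major GC,
-- which grows linearly with the array -- matching the small linear
-- factor seen in the benchmarks.
cardTableBytes :: Int -> Int
cardTableBytes slots = (slots + 127) `div` 128

main :: IO ()
main = print (cardTableBytes 2048000)  -- 16000 bytes at the largest N
```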
<div id="conclusion"><h2>Conclusion</h2>
<p>While Haskell people prefer to use immutable/persistent data structures in their programs most of the time, there are definitely instances in which you want a mutable hash table: no immutable data structure I know of supports <em>O(1)</em> inserts and lookups, nor can any match a good mutable hash table for space efficiency. These factors are very important in some problem domains, especially for things like machine learning on very large data sets.</p>
<p>The lack of a really good Haskell hash table implementation has been a sticking point for quite some time for people who want to work in these problem domains. While the situation is still not as good as it might eventually be due to continuing concerns about how the GHC garbage collector deals with mutable arrays, it’s my hope that the release of the <code>hashtables</code> library will go a long way towards closing the gap.</p>
<p>The source repository for the <code>hashtables</code> library can be found on <a href="http://github.com/gregorycollins/hashtables">github</a>. Although I’ve made substantial efforts to test this code prior to release, it is a “version 1.0”. Please send bug reports to the <code>hashtables</code> <a href="https://github.com/gregorycollins/hashtables/issues">github issues page</a>.</p></div>
<div id="update-oct-21-2011"><h2>Update (Oct 21, 2011)</h2>
<p>A fellow by the name of Albert Ward has <a href="http://www.fatcow.com/edu/announcing-hashtables-bl/">translated this blog post into Bulgarian</a>.</p>
<script type="text/javascript">
var axisFont = { family: 'arial',
size: 11,
style: 'normal',
weight: 'normal',
variant: 'normal' };
function showTooltip(x, y, contents) {
$('<div id="tooltip">' + contents + '</div>').css( {
position: 'absolute',
display: 'none',
top: y + 5,
left: x + 5,
border: '1px solid #fdd',
padding: '6px',
'background-color': '#fee',
opacity: 0.95
}).appendTo("body").fadeIn(200);
}
var previousDataIndex = null;
var previousSeriesIndex = null;
function hoverText(item) {
var i = item.seriesIndex;
var j = item.dataIndex;
var series = item.series;
var row = series.data[j];
var label = series.label;
var obj = row[2];
var isz = obj['size'];
var mean = humanSeconds(obj['mean'], 3);
var sd = humanSeconds(obj['stddev'], 3);
var n95 = humanSeconds(obj['n95'], 3);
return ("<b>" + label + "</b>" +
"<br/>input size: " + isz +
"<br/>mean: " + mean +
"<br/>stddev: " + sd +
"<br/>95th percentile: " + n95);
}
function hoverFunc(event, pos, item) {
if (item) {
if (previousDataIndex != item.dataIndex ||
previousSeriesIndex != item.seriesIndex) {
previousDataIndex = item.dataIndex;
previousSeriesIndex = item.seriesIndex;
$("#tooltip").remove();
showTooltip(item.pageX, item.pageY, hoverText(item));
}
} else {
$("#tooltip").remove();
previousDataIndex = null;
previousSeriesIndex = null;
}
}
function logSeries(seriesList) {
var outSeriesList = [];
for (var i in seriesList) {
var series = seriesList[i];
var outseries = {};
outseries['label'] = series.label;
var outdata = [];
var data = series.data;
for (var j in data) {
var row = data[j];
var x = row[0];
var y = row[1];
var z = row[2];
outdata.push([Math.log(x), Math.log(y), z]);
}
outseries['data'] = outdata;
outSeriesList.push(outseries);
}
return outSeriesList;
}
function mkChart(elem, series, lbl, logPlot) {
if (logPlot) series = logSeries(series);
var opts =
{
legend: { position: 'nw' },
series: {
lines: { show: true },
points: { show: true }
},
grid: { hoverable: true },
xaxis: {
autoscaleMargin: 0.02,
font: axisFont,
label: 'Input Size'
},
yaxis: { tickFormatter: function(x) { return humanSeconds(x,1); },
font: axisFont,
label: lbl
}
};
if (logPlot) {
var l = function (x) {
return Math.log(x);
};
var il = function (x) {
return Math.exp(x);
};
opts['xaxis']['tickFormatter'] = function (x) {
return Math.exp(x).toFixed(0);
}
opts['yaxis']['tickFormatter'] = function (x) {
return humanSeconds(Math.exp(x), 1);
}
/*
opts['xaxis']['transform'] = l;
opts['xaxis']['inverseTransform'] = il;
opts['yaxis']['transform'] = l;
opts['yaxis']['inverseTransform'] = il;
*/
}
var plot = $.plot(elem, series, opts);
elem.bind("plothover", hoverFunc);
return plot;
}
$(function () {
mkChart(
$("#lookupChart"),
lookupSeries,
'Avg time per\nlookup (s)',
false);
mkChart(
$("#insertChart"),
insertSeries,
'Avg time per\ninsert (s)',
false);
mkChart(
$("#lookupChartLog"),
lookupSeries,
'Avg time per\nlookup (s)',
true);
mkChart(
$("#insertChartLog"),
insertSeries,
'Avg time per\ninsert (s)',
true);
mkChart(
$("#lookupChartH1G"),
lookupSeriesH1G,
'Avg time per\nlookup (s)',
false);
mkChart(
$("#insertChartH1G"),
insertSeriesH1G,
'Avg time per\ninsert (s)',
false);
mkChart(
$("#lookupChartLogH1G"),
lookupSeriesH1G,
'Avg time per\nlookup (s)',
true);
mkChart(
$("#insertChartLogH1G"),
insertSeriesH1G,
'Avg time per\ninsert (s)',
true);
});
</script>
</div>2011-06-11T14:15:00-0400A new Haskell library for mutable hash tables, which is several times faster than any previous Haskell associative array datatype.http://gregorycollins.net/posts/2011/05/19/lyah-reviewReview: "Learn You a Haskell for Great Good!"2011-05-19T15:40:00+0200Gregory Collinsgreg@gregorycollins.net<p>A couple of weeks ago, the good folks at <a href="http://nostarch.com/">No Starch Press</a> were kind enough to send me a review copy of <a href="http://nostarch.com/lyah.htm">“Learn You a Haskell for Great Good!”</a>, by Miran Lipovača.</p>
<div class="float-right-img">
<img src="lyah.png"/>
</div>
<p>People often perceive Haskell to be overly academic and hard to learn, and in many cases this reputation is deserved. Its history as a research language and fertile proving ground for new ideas in type systems and compiler technology has spawned hundreds (or even thousands) of very computer-sciencey papers, containing what in my grad school days we used to euphemistically call “a very high ratio of Greek letters to text.”</p>
<p>In the past few years this perception has been slowly changing, as more practically-minded software engineering types have cottoned on to the indisputably useful features Haskell gives you for “real-world” programming: a super-fast native code compiler, expressive code with fewer bugs, great testing tools like <a href="http://hackage.haskell.org/package/QuickCheck">QuickCheck</a>, and many other little nuggets of pure awesome. Our community has suffered something of a bootstrapping problem, however: people hear a lot of noise about Haskell but resources for true beginners have been fairly scarce.</p>
<p>The release of <a href="http://www.realworldhaskell.org/">Real World Haskell</a> a couple of years ago went a long way towards making Haskell more accessible to the curious, but that book is most suitable for the intermediate-level practitioner, or for experienced programmers who already know other languages. What’s been missing is a gentle, clearly written guide covering the basics. I think “Learn You a Haskell” might be the book to change all that.</p>
<p>The thing that’s most impressive about “Learn You a Haskell” is how damned <em>well-written</em> it is. I’m assuming that English isn’t Miran’s mother tongue, and he deserves lots of kudos for creating such a clear and easy-to-read text in a foreign language. The other thing Miran does very well in this book is <em>not</em> overwhelming the beginner with a lot of technical verbiage or jargon: I was especially chuffed to see that the word “monad” doesn’t even appear in the book until page 267! After having suffered through innumerably many clumsy “monad tutorials” which go through the material all backwards, it’s completely satisfying to see it finally be done <em>properly</em>.</p>
<p>The book starts off on the right foot by starting with expressions, the fundamental building blocks of Haskell programs. Slowly, thoroughly, and with lots of examples, Miran proceeds from there to cover the rest of the concepts: functions, types, recursion, higher-order functions, lazy evaluation, type classes, algebraic data types, I/O — all are covered methodically and clearly, without any confusing discontinuities or weird conceptual leaps. By the time the book digs into more advanced topics like applicative functors and monads, I believe the beginning reader will have built up enough confidence to clear the conceptual hurdles.</p>
<p>To be clear, this is a beginner’s book: experienced Haskell programmers won’t find much to dig into here, and those expecting to find a lot of information about the practical aspects and best practices of “real world” day-to-day Haskell programming might be disappointed. As a beginner’s guide, however, this book is just fantastic, and those spending the time to read “Learn You a Haskell” and “Real World Haskell” in sequence will find a clear and easy path from the very beginning all the way through to the intermediate level of Haskell programming.</p>2011-05-19T15:40:00+0200A review of "Learn You a Haskell for Great Good!", by Miran Lipovača.http://gregorycollins.net/posts/2010/05/30/snap-framework-updateSnap Framework: What's new this week?2010-05-30T14:00:00-0400Gregory Collinsgreg@gregorycollins.net<p>Hi all,</p>
<p>Since we put out <a href="http://snapframework.com/">the Snap framework</a> last weekend, we’ve been working like busy beavers on squashing correctness and performance bugs. Updated haddocks/etc should be up on our website by tomorrow afternoon. Here’s a short list of the changes in Snap this week:</p>
<ul>
<li><p><strong>WINDOWS SUPPORT</strong> thanks to Jacob Stanley (a.k.a. “jystic”).</p></li>
<li><p>A fix for a grave performance bug with <code>Transfer-encoding: chunked</code>; we weren’t buffering its input, causing lots of tiny http transfer chunks for certain pathological input, ruining performance. (This is the one <a href="http://www.snoyman.com/blog/entry/bigtable-benchmarks/">Michael Snoyman reported</a> btw.) Switching to buffering its input increased performance on this test by at least an order of magnitude.</p></li>
<li><p>Huge improvements to the <code>libev</code> backend for <code>snap-server</code>, including fixing a correctness/hang bug and an edge-/level-triggering issue. Performance should be improved to the point where the <code>libev</code> backend should be considered the “go-to” setup for production <code>snap-server</code> deployments.</p></li>
<li><p>Improved timeout handling in the “simple”/stock haskell <code>snap-server</code> backend. This costs us some performance on the stock backend, but correctness is more important (and users wanting maximum performance should stick with the <code>libev</code> backend).</p></li>
<li><p>Fixed an <code>attoparsec-iteratee</code> bug that resulted in spurious “parser did not produce a value” messages cluttering <code>error.log</code>.</p></li>
<li><p>Fixed a localtime/GMT timezone bug which prevented static files from being recognized as “not modified.”</p></li>
<li><p>Fixed an HTTP cookie reading bug in <code>snap-server</code>.</p></li>
<li><p>Killed several space leaks.</p></li>
<li><p>Fixes to the way Snap handles <code>accept-encoding</code> headers in the GZip code — requests from Konqueror and Links are no longer incorrectly rejected.</p></li>
<li><p>The <code>snap</code> command-line tool now has an option to not depend on heist.</p></li>
<li><p>Exposed error logging to the <code>Snap</code> monad.</p></li>
<li><p>..and a whole host of smaller additions/improvements….</p></li>
</ul>2010-05-30T02:42:00-0400Summarizes the changes between snap-core/-server v0.1.1 (released last week) and snap-core v0.2.5 (released today).http://gregorycollins.net/posts/2010/05/22/announce-snap-frameworkAnnouncing: Snap Framework v0.12010-05-22T01:25:00-0400Gregory Collinsgreg@gregorycollins.net<p>To coincide with <a href="http://www.haskell.org/haskellwiki/Hac_%CF%86">Hac Phi 2010</a>, the Snap team is happy to announce the first public release of the Snap Framework, a simple and fast Haskell web programming server and library for unix systems. For installation instructions, documentation, and more information, see our website at <a href="http://snapframework.com/">snapframework.com</a>.</p>
<p>Snap is well-documented and has a test suite with a high level of code coverage, but it is early-stage software with still-evolving interfaces. Snap is therefore most likely to be of interest to early adopters and potential contributors.</p>
<p>Snap is BSD-licensed and currently only runs on Unix platforms; it has been developed and tested on Linux and Mac OSX Snow Leopard.</p>
<p>Snap Features:</p>
<ul>
<li><p>A simple and clean monad for web programming, similar to happstack’s but simpler.</p></li>
<li><p>A <em>fast</em> HTTP server library with an optional high-concurrency backend (using libev).</p></li>
<li><p>An XML-based templating system for generating xhtml that allows you to bind Haskell functionality to XML tags in your templates.</p></li>
<li><p>Some useful utilities for web handlers, including gzip compression and fileServe.</p></li>
<li><p>Iteratee-based I/O, allowing composable streaming in O(1) space without any of the unpredictable consequences of lazy I/O.</p></li>
</ul>
<p>If you have questions or comments, please contact us on our <a href="http://mailman-mail5.webfaction.com/listinfo/snap">mailing list</a> or in the <a href="http://webchat.freenode.net/?channels=snapframework&amp;uio=d4">#snapframework</a> channel on the freenode IRC network.</p>2010-05-22T01:25:00-0400The first public release of the Snap Framework is now available. Snap is a simple and fast web development framework for unix systems, written in the Haskell programming language.