<h1><a href="http://blog.nix.is/index.xml">literal thoughts</a></h1>
<p>Recent content on literal thoughts</p>
<h2><a href="http://blog.nix.is/perl6-vim-gets-more-love">perl6.vim gets more love</a></h2>
<p>Sun, 01 Mar 2015 20:30:00 +0000</p>
<p>tl;dr: Some tuits came my way, so Vim&rsquo;s syntax highlighting of Perl 6 is
much better and faster now. Try it!
</p>
<blockquote>
<p>&lt;timotimo&gt; hoelzro: i just skimmed the perl6 vim syntax and &hellip; oh crap</p>
<p>&lt;timotimo&gt; there&rsquo;s just no way i could even make a dent in that thing</p>
<p>* psch usually turns off highlighting in vim because it&rsquo;s so slow :/</p>
<p>&lt;sorear&gt; perl 6 is not really syntax-highlightable</p>
</blockquote>
<p>So&hellip;yeah, perl6.vim is a bit of a beast.</p>
<h3 id="recent-history-of-perl6-vim">Recent history of perl6.vim</h3>
<p>Sometime in 2008 or so, I got interested in Perl 6, and I really wanted it
to be nicely highlighted in my editor (Vim) as I learned. As it turned out,
there was already a Vim syntax file for it, in the
<a href="http://svn.openfoundry.org/pugs/util/perl6.vim">Pugs SVN</a> repository. This
was mostly the work of <code>lpalmer++</code> and <code>moritz++</code>. It worked decently, but
as I learned about more nooks and crannies of Perl 6 (there sure are enough
of them) I started finding the limitations of the highlighting to be
frustrating.</p>
<p>I wanted to make it better, so I started learning all about Vim&rsquo;s
regexes and syntax highlighting support. I began hacking on <code>perl6.vim</code>,
and soon enough it grew to be the largest Vim syntax file in existence.
Perl 6 is hard enough to parse already; doing so with Vim patterns is doubly
hard. I made some progress, but eventually the performance of the
highlighting suffered as it grew to support ever more complicated syntax.</p>
<p>In 2009, <code>alester++</code> created the vim-perl repository on Github to consolidat
Perl support for Vim. <code>perl6.vim</code> gained a new home shortly thereafter
when he imported it from the Pugs repo. Around the same I discovered
<a href="https://metacpan.org/pod/Text::VimColor">Text::VimColor</a> and used it to
create a test harness for the syntax highlighting of <code>perl.vim</code> and
<code>perl6.vim</code>. I fixed a few Perl 5 highlighting bugs and added some tests
for them, but I never got around to doing the same for Perl 6.</p>
<p>Since then, a few contributors (<code>hoelzro++</code> et al) have continued to fix
bugs in <code>perl6.vim</code>. In the meantime, Perl 6 has also evolved, and some
of the assumptions <code>perl6.vim</code> has made are no longer valid.</p>
<p>A few weeks ago I went to FOSDEM 2015, where Larry announced that this
Christmas would be The One™. That rekindled my interest in <code>perl6.vim</code>,
and I&rsquo;d like to share with you some of the progress I&rsquo;ve made with it.</p>
<h3 id="performance">Performance</h3>
<p>Vim 7.4 ships with a <a href="http://vimhelp.appspot.com/syntax.txt.html#%3Asyntime">profiler</a>
for its syntax highlighting. I used it to identify a lot of inefficient
patterns in the syntax file. I tried to reduce the use of zero-width
assertions (lookarounds) and to make more use of
<a href="http://vimhelp.appspot.com/syntax.txt.html#%3Asyn-nextgroup"><code>nextgroup</code></a>
instead when possible. I also optimized many patterns to reject matches as
soon as possible. As a result, Vim now needs to attempt far fewer matches,
and the ones it does attempt are taking less time to do their work.
Editing most Perl 6 files feels pretty snappy now, even
<a href="https://github.com/perl6/std/blob/master/STD.pm6">STD.pm6</a>.</p>
<h3 id="identifiers">Identifiers</h3>
<p>In <code>ftplugin/perl6.vim</code>, I added the apostrophe and high-bit alphabetical
characters (<code>æóð...</code>) to the list of characters Vim recognizes in keywords
(<a href="http://vimhelp.appspot.com/options.txt.html#%27iskeyword%27"><code>iskeyword</code></a>).
This allows all valid Perl 6 identifiers to be used in Vim commands that
depend on matching keywords.</p>
<p>I also cleaned up the highlighting of Perl 6 identifiers, in particular
adding support for the high-bit characters and better support for dashes,
so these are now highlighted correctly in sub/package/variable/etc names.</p>
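<p>Here is a rough illustration of what counts as an identifier. This Python sketch encodes the rule as I understand it: an identifier starts with a letter or underscore and continues with letters, digits or underscores. It may also contain single dashes or apostrophes, as long as each one is followed by a letter or underscore. Python&rsquo;s Unicode-aware <code>\w</code> stands in for the high-bit alphabetical characters. This is an approximation, not the pattern <code>perl6.vim</code> actually uses.</p>

```python
import re

# Approximation of a Perl 6 identifier: an "alpha" (letter or underscore)
# followed by word characters, with optional interior '-' or "'" that must
# be followed by another alpha. [^\W\d] means "word char that is not a digit".
IDENTIFIER = re.compile(r"[^\W\d]\w*(?:['-][^\W\d]\w*)*")


def is_identifier(name: str) -> bool:
    """Return True if the whole string looks like a Perl 6 identifier."""
    return IDENTIFIER.fullmatch(name) is not None


for candidate in ["cööl-páttérn", "Johnny's", "foo-bar", "foo-", "foo-1", "2fast"]:
    print(candidate, is_identifier(candidate))
```

<p>The trailing-dash and dash-digit cases are what make a plain "letters, digits, dashes" character class too generous.</p>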
<h3 id="multiline-comments">Multiline comments</h3>
<p>These have been updated to recognize the new <code>#&#x60;</code> prefix,
and to be allowed at the beginning of a line. Previously, multiline comments
with stacked delimiters (<code>««</code>, <code>&lt;&lt;</code>, <code>&lt;&lt;&lt;</code>, etc) were not highlighted by
default. Some profiling showed that doing so does not have a noticeable
performance impact, so I changed it to always highlight them.</p>
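<p>Stacked delimiters need care. This minimal Python sketch shows how to find where a <code>#&#x60;</code> comment ends:</p>
<ul>
<li>count how many times the opening bracket repeats;</li>
<li>look for the same number of closing brackets;</li>
<li>track nested pairs of the same delimiters along the way.</li>
</ul>
<p>The bracket table and the nesting behavior are my assumptions about the rules. A Vim syntax file expresses this with region patterns rather than with a scanner like this one.</p>

```python
# Opening bracket -> matching closing bracket (an assumed, partial set).
PAIRS = {"(": ")", "[": "]", "{": "}", "<": ">", "«": "»"}


def embedded_comment_end(src: str, start: int = 0) -> int:
    """Return the index just past the #` comment that begins at src[start]."""
    if not src.startswith("#`", start):
        raise ValueError("not an embedded comment")
    i = start + 2
    if i >= len(src) or src[i] not in PAIRS:
        raise ValueError("embedded comment needs an opening bracket")
    opener = src[i]
    count = 0
    while i < len(src) and src[i] == opener:  # e.g. "<<<" gives count == 3
        count += 1
        i += 1
    open_run, close_run = opener * count, PAIRS[opener] * count
    depth = 1
    while i < len(src):
        if src.startswith(close_run, i):
            depth -= 1
            i += count
            if depth == 0:
                return i
        elif src.startswith(open_run, i):  # a nested pair of the same delimiters
            depth += 1
            i += count
        else:
            i += 1
    raise ValueError("unterminated embedded comment")


code = "my $x #`[why hello there] = 1;"
begin = code.index("#`")
print(code[begin:embedded_comment_end(code, begin)])  # #`[why hello there]
```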
<h3 id="heredocs">Heredocs</h3>
<p>I added highlighting of heredocs with <code>qto</code>, <code>qqto</code>, <code>q:to</code>, <code>qq:to</code>,
<code>q:heredoc</code>, and <code>qq:heredoc</code>, for the most commonly used delimiters.</p>
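<p>For a highlighter, the tricky part of a heredoc is finding where it stops. Here is a minimal Python sketch under two assumptions:</p>
<ul>
<li>the body ends at the first line that consists only of the terminator, optionally indented;</li>
<li>the terminator&rsquo;s indentation is removed from each body line.</li>
</ul>
<p>The <code>heredoc_body</code> helper is hypothetical.</p>

```python
from typing import List, Tuple


def heredoc_body(lines: List[str], terminator: str) -> Tuple[List[str], int]:
    """Return (dedented body lines, index of the first line after the heredoc)."""
    for end, line in enumerate(lines):
        if line.strip() == terminator:
            indent = len(line) - len(line.lstrip())
            body = []
            for text in lines[:end]:
                # Strip at most `indent` leading whitespace characters.
                leading = len(text) - len(text.lstrip())
                body.append(text[min(indent, leading):])
            return body, end + 1
    raise ValueError(f"heredoc terminator {terminator!r} not found")


source = ["    you there", "    END", "say $document;"]
print(heredoc_body(source, "END"))  # (['you there'], 2)
```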
<h3 id="setting-functions-and-methods">Setting functions and methods</h3>
<blockquote>
<p>* TimToady wishes all these highlighters wouldn&rsquo;t treat setting functions and methods as reserved words; they&rsquo;re just functions and methods</p>
<p>&lt;moritz&gt; TimToady: agreed</p>
<p>* flussence wishes perl6.vim didn&rsquo;t highlight Test.pm functions as reserved words either&hellip;</p>
</blockquote>
<p>By popular request, I removed highlighting of builtin functions and
methods. This makes sense since there&rsquo;s no way to avoid highlighting
user-defined methods that happen to share the name of a builtin one.</p>
<h3 id="metaoperators">Metaoperators</h3>
<p>Highlighting these is one of the trickiest jobs of the syntax file. Yet
it&rsquo;s one of the most crucial because if you let e.g. a stray <code>R&lt;</code> or
<code>[/</code> go unhighlighted, they will screw up the highlighting of the rest
of the file by starting a string where there isn&rsquo;t one.</p>
<p>I managed to make great improvements to the highlighting of reduce
(<code>[+]</code>), hyper (<code>»+«</code>), post-hyper (<code>@foo».bar</code>), reverse/cross/sequence/zip
(<code>Rdiv</code>, <code>R&lt;</code>, etc), and set operators (<code>(&lt;)</code>, <code>(&gt;=)</code>). All the spec tests
for metaoperators I have looked at are now highlighted without issues.</p>
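<p>Here is a simplified Python sketch of telling these metaoperator forms apart. The tiny <code>INFIX</code> set and the <code>classify</code> helper are hypothetical. The real syntax file has to cope with every infix operator, including user-defined ones, and that is what makes this job hard.</p>

```python
import re
from typing import Optional

# A hypothetical, tiny subset of infix operators.
INFIX = r"(?:<=|>=|==|div|cmp|[-+*/<>,~])"
REDUCE = re.compile(rf"\[\\?{INFIX}\]")                    # [+], [\+]
HYPER = re.compile(rf"(?:«|»|<<|>>){INFIX}(?:«|»|<<|>>)")  # »+«, >>+<<
SET_OP = re.compile(r"\((?:<=|>=|<|>|\||&|-)\)")           # (<), (>=)
PREFIXED = re.compile(rf"([RXZS]){INFIX}")                 # Rdiv, Z,
PREFIX_NAMES = {"R": "reverse", "X": "cross", "Z": "zip", "S": "sequential"}


def classify(token: str) -> Optional[str]:
    """Name the metaoperator form of `token`, or None if it isn't one."""
    if REDUCE.fullmatch(token):
        return "reduce"
    if HYPER.fullmatch(token):
        return "hyper"
    if SET_OP.fullmatch(token):
        return "set"
    m = PREFIXED.fullmatch(token)
    if m:
        return PREFIX_NAMES[m.group(1)]
    return None


for tok in ["[+]", "»+«", "Rdiv", "R<", "Z,", "(<)", "[/"]:
    print(tok, classify(tok))
```

<p>Note that <code>[/</code> on its own is not a metaoperator. That is exactly the kind of fragment that must not be allowed to open a string.</p>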
<h3 id="strings-and-patterns">Strings and patterns</h3>
<p>Another part of the language that presents a challenge is <code>/foo/</code>,
<code>&lt;bar&gt;</code>, <code>&lt;&lt;baz&gt;&gt;</code>, and <code>«quux»</code>. Determining when these are not actually
numeric division, less-than, or hyperoperators is tricky. These things
are now matched much more accurately. I also added support for the <code>qqx</code>
and <code>qqw</code> operators.</p>
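<p>The usual way to tell these apart is to ask whether a term or an operator is expected at that point. This deliberately naive Python sketch applies that idea to <code>/</code>, under two assumptions:</p>
<ul>
<li>text ending in a variable, a number or a closing bracket is a complete term, so a slash after it is division;</li>
<li>anything else, including a bare function name like <code>split</code>, means a pattern is starting.</li>
</ul>

```python
import re

# Text ending in a sigiled variable, a number, or a closing bracket is taken
# to be a complete term, so a "/" right after it is infix division.
TERM_AT_END = re.compile(r"(?:[$@%&][^\W\d][\w'-]*|\b\d+|[)\]}])\s*\Z")


def slash_role(before: str) -> str:
    """Guess whether a '/' following `before` is division or starts a regex."""
    return "division" if TERM_AT_END.search(before) else "regex"


for prefix in ["my $x = $a ", "(1, 2)", "my $p = ", "split "]:
    print(repr(prefix), slash_role(prefix))
```

<p>Real Perl 6 code has many cases this heuristic gets wrong, such as a whatever-star (<code>*/2</code>) or user-defined terms. That is why the syntax file needs so many special cases.</p>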
<h3 id="grammars">Grammars</h3>
<p>Grammars are a whole other language in their own right, and this is where
there is still the most room for improvement (of both performance and
accuracy) in the syntax file. I still don&rsquo;t know grammars well enough,
but I&rsquo;ve fixed all the highlighting issues I&rsquo;ve found. STD.pm6 in its
entirety is now highlighted properly, although Vim can still get confused
occasionally until you scroll a bit or tell it to redraw the screen.</p>
<h3 id="pod">Pod</h3>
<p>The biggest change here is that I added highlighting of indented Pod
blocks. As part of the identifier improvements mentioned above, Pod
block names containing hyphens or high-bit alphabetical chars are now
correctly highlighted.</p>
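<p>Here is a minimal Python sketch of recognizing a Pod block directive. The directive may be indented, and its name may contain hyphens or non-ASCII letters. The directive list (<code>=begin</code>, <code>=end</code>, <code>=for</code>) is an assumed subset, and <code>pod_directive</code> is a hypothetical helper.</p>

```python
import re
from typing import Optional, Tuple

# Leading whitespace is allowed, so indented blocks are recognized too.
POD_DIRECTIVE = re.compile(
    r"^\s*=(begin|end|for)\s+([^\W\d]\w*(?:['-][^\W\d]\w*)*)"
)


def pod_directive(line: str) -> Optional[Tuple[str, str]]:
    """Return (directive, block name) for a Pod directive line, else None."""
    m = POD_DIRECTIVE.match(line)
    return (m.group(1), m.group(2)) if m else None


print(pod_directive("    =for Pro-tip"))  # ('for', 'Pro-tip')
print(pod_directive("=begin Überblick"))  # ('begin', 'Überblick')
print(pod_directive("my $x = 1;"))        # None
```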
<h3 id="folding">Folding</h3>
<p>I added rudimentary support for syntax-based folding (disabled by default).</p>
<h3 id="miscellaneous">Miscellaneous</h3>
<ul>
<li>I improved highlighting of numbers and version literals</li>
<li>Fixed a few issues where interpolated things weren&rsquo;t highlighted as such</li>
<li>Made big improvements to highlighting of transliteration (<code>tr///</code>,
<code>tr{}{}</code>, etc) operators</li>
<li>Added highlighting of bare/anonymous sigils in more places</li>
<li>Highlighted the Unicode set operators (<code>∈</code>, <code>≽</code>, et al)</li>
<li>Certain keywords (like <code>die</code>, <code>state</code>, etc) are no longer highlighted where
they shouldn&rsquo;t be, e.g. as method calls (<code>$foo.state</code>)</li>
</ul>
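<p>Here is a small Python sketch of reading integer literals with an explicit radix prefix, such as <code>-0b01010</code>. It assumes the <code>0b</code>, <code>0o</code>, <code>0d</code> and <code>0x</code> prefixes and single underscores between digits. For simplicity it treats a leading sign as part of the literal (in Perl 6 it is really a prefix operator). It illustrates the number syntax only; it is not how the syntax file matches numbers.</p>

```python
import re

RADIX = {"b": 2, "o": 8, "d": 10, "x": 16}
# Optional sign, "0" + radix letter, then digit groups separated by single underscores.
RADIX_INT = re.compile(r"([+-]?)0([bodx])([0-9a-fA-F]+(?:_[0-9a-fA-F]+)*)")


def parse_radix_int(text: str) -> int:
    m = RADIX_INT.fullmatch(text)
    if m is None:
        raise ValueError(f"not a radix integer literal: {text!r}")
    sign, prefix, digits = m.groups()
    # int() rejects digits that are invalid for the base, e.g. "9" in base 8.
    value = int(digits.replace("_", ""), RADIX[prefix])
    return -value if sign == "-" else value


print(parse_radix_int("-0b01010"))  # -10
print(parse_radix_int("0xff_ff"))   # 65535
```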
<h3 id="testing">Testing</h3>
<p>Of course, I added tests for every improvement and bugfix listed above, so
at least the highlighting should not be getting <em>worse</em> from now on.</p>
<h3 id="end">__END__</h3>
<p>All open <a href="https://github.com/vim-perl/vim-perl">vim-perl</a> GitHub issues
relating to Perl 6 highlighting bugs have been resolved. If you discover
any more problems, please open new issues.</p>
<p>I tried patching <a href="https://github.com/perl6/doc">perl6/doc</a> to add an option
to use Text::VimColor to highlight code blocks in the documentation. While
it does result in richer and more accurate highlighting than the default
pygments-based highlighter, it is painfully slow to generate. This is because
Text::VimColor (and <a href="http://ftp.vim.org/vim/runtime/syntax/2html.vim"><code>2html.vim</code></a>,
on which it is based) extracts Vim&rsquo;s highlighting by calling the
<a href="http://vimhelp.appspot.com/eval.txt.html#synID%28%29"><code>synID()</code></a> function
at every character position in the file, which is very slow. So while it may
only take Vim a few dozen milliseconds to highlight an entire file,
extracting the highlighting state into something usable by an external
tool takes several <em>seconds</em>. I wonder how difficult it would be to patch Vim
(or NeoVim) to export an entire file&rsquo;s highlighting in a more efficient
manner&hellip;</p>
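<p>Calling <code>synID()</code> at every position yields one highlight group per character. An exporter then has to merge those groups into spans like the <code>synString</code> and <code>synIdentifier</code> ones in this post&rsquo;s sample. Here is a minimal Python sketch of that merging step. It assumes the per-character group names (or <code>None</code> for unhighlighted text) have already been obtained from Vim somehow, and that step is the slow part. The <code>syn</code> class prefix mirrors the HTML used here; <code>to_html</code> is a hypothetical helper.</p>

```python
from html import escape
from itertools import groupby
from typing import List, Optional


def to_html(text: str, groups: List[Optional[str]]) -> str:
    """Merge per-character highlight groups into <span> runs."""
    if len(text) != len(groups):
        raise ValueError("need exactly one group per character")
    out = []
    pos = 0
    for group, run in groupby(groups):  # consecutive equal groups form one run
        length = len(list(run))
        chunk = escape(text[pos:pos + length])
        pos += length
        if group is None:
            out.append(chunk)
        else:
            out.append(f'<span class="syn{group}">{chunk}</span>')
    return "".join(out)


line = "my $x"
groups = ["Special", "Special", None, "Identifier", "Identifier"]
print(to_html(line, groups))
# <span class="synSpecial">my</span> <span class="synIdentifier">$x</span>
```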
<p>To close, here is a sample of some meaningless code that used to get
<code>perl6.vim</code> confused but just looks nice now:</p>
<p><link rel="stylesheet" type="text/css" href="http://blog.nix.is/css/vim_syntax.css"></p>
<div class="vimcolor"><pre><span class="synKeyword">class</span> Johnny's::Super-Cool::Module<span class="synOperator">;</span>
<span class="synSpecial">my</span> <span class="synIdentifier">$cööl-páttérn</span> <span class="synComment">#`[why hello there]</span> <span class="synOperator">=</span> <span class="synDelimiter">/</span><span class="synString">foo</span><span class="synSpecialChar">+</span><span class="synDelimiter">/</span><span class="synOperator">;</span>
<span class="synKeyword">sub</span> infix<span class="synOperator">:</span><span class="synDelimiter">«</span><span class="synString">foo</span><span class="synDelimiter">»</span> { }
<span class="synKeyword">sub</span> foo-bar (<span class="synType">Int</span> <span class="synIdentifier">$a</span> <span class="synPreCondit">where</span> <span class="synOperator">*</span> <span class="synOperator">&gt;</span> <span class="synNumber">3</span><span class="synOperator">,</span> <span class="synIdentifier">@bar</span><span class="synOperator">,</span> <span class="synType">Str</span> <span class="synIdentifier">$b</span> <span class="synOperator">=</span> <span class="synDelimiter">'</span><span class="synString">foo</span><span class="synDelimiter">'</span>) {
<span class="synSpecial">my</span> <span class="synIdentifier">$document</span> <span class="synOperator">=</span> <span class="synDelimiter">qto/END/</span><span class="synString">;</span>
<span class="synString"> you there</span>
<span class="synDelimiter"> END</span>
<span class="synSpecial">my</span> <span class="synIdentifier">$document</span> <span class="synOperator">=</span> <span class="synDelimiter">qq</span><span class="synOperator">:</span><span class="synString">heredoc</span><span class="synDelimiter">/END/</span><span class="synString">;</span>
<span class="synString"> and </span><span class="synIdentifier">$b</span>
<span class="synDelimiter"> END</span>
<span class="synSpecial">my</span> <span class="synIdentifier">$result</span> <span class="synOperator">=</span> <span class="synDelimiter">qqx/</span><span class="synString">ls -l bla</span><span class="synDelimiter">/</span><span class="synOperator">;</span>
<span class="synSpecial">my</span> <span class="synIdentifier">@bla</span> <span class="synOperator">=</span> <span class="synOperator">[+]</span> <span class="synIdentifier">@bar</span><span class="synOperator">;</span>
<span class="synSpecial">my</span> <span class="synIdentifier">@list</span> <span class="synOperator">=</span> <span class="synIdentifier">@bla</span> <span class="synOperator">Z,</span> <span class="synIdentifier">@bar</span><span class="synOperator">;</span>
say <span class="synDelimiter">&quot;</span><span class="synString">yes</span><span class="synDelimiter">&quot;</span> <span class="synConditional">if</span> <span class="synIdentifier">$set1</span> <span class="synOperator">(&lt;)</span> <span class="synIdentifier">$set2</span><span class="synOperator">;</span>
<span class="synStatement">=for</span> <span class="synType">Pro-tip</span>
<span class="synComment"> Pay careful attention to the following code</span>
<span class="synConditional">if</span> <span class="synIdentifier">$a</span> <span class="synOperator">!&lt;</span> <span class="synNumber">4</span> {
<span class="synIdentifier">@bar</span><span class="synOperator">».</span>bla<span class="synOperator">:</span> <span class="synDelimiter">&quot;</span><span class="synString">this </span><span class="synIdentifier">&amp;nice-func</span>()<span class="synString"> thing</span><span class="synDelimiter">&quot;</span><span class="synOperator">;</span>
}
<span class="synSpecial">my</span> <span class="synIdentifier">$bin</span> <span class="synOperator">=</span> <span class="synNumber">-0</span><span class="synSpecial">b</span><span class="synNumber">01010</span><span class="synOperator">;</span>
(<span class="synNumber">1</span><span class="synOperator">,</span><span class="synNumber">2</span>)[<span class="synOperator">*/</span><span class="synNumber">2</span>]<span class="synOperator">;</span>
say <span class="synIdentifier">$@foo</span><span class="synOperator">.</span>perl<span class="synOperator">;</span>
<span class="synDelimiter">m</span><span class="synOperator">:</span><span class="synString">s</span><span class="synDelimiter">/</span><span class="synString">foo</span><span class="synDelimiter">/</span><span class="synIdentifier">$bar</span><span class="synOperator">/;</span>
}
<span class="synKeyword">sub</span> foo (<span class="synType">Int</span><span class="synOperator">:</span><span class="synString">D</span> <span class="synIdentifier">$bla</span>) <span class="synPreCondit">is</span> <span class="synTag">cached</span> { }
</pre>
</div>
<h2><a href="http://blog.nix.is/testing-vim-syntax-files">Testing vim syntax files</a></h2>
<p>Tue, 01 Sep 2009 13:13:00 +0000</p>
<p>Yesterday, Andy Lester opened an <a href="http://github.com/petdance/vim-perl/issues#issue/15">issue</a>
for vim-perl on github about adding an automated test suite. I&rsquo;ve thought
about doing something like this before, so last night I got busy
prototyping a test harness.
</p>
<p>What I&rsquo;ve got so far (in <a href="https://github.com/hinrik/vim-perl/tree">my fork</a>)
is a test file that uses Text::VimColor to generate HTML and compare it
against a reference HTML document to determine if the syntax file is doing its
job. If a reference file can&rsquo;t be found, it will create it and skip that
test. Here&rsquo;s what it looks like:</p>
<pre><code>$ make test
prove -rv t
t/01_highlighting.t ..
ok 1 - Correct output for t_source/perl/basic.t
ok 2 # skip Created t_source/perl/advanced.t.html
ok 3 - Correct output for t_source/perl6/basic.t
1..3
ok
All tests successful.
</code></pre>
<p>In case of failure, it will use Test::Differences to show you what&rsquo;s wrong,
and write the incorrect output to disk for you to inspect:</p>
<pre><code>$ vim syntax/perl6.vim # make a bad change
$ make test
prove -rv t
t/01_highlighting.t ..
ok 1 - Correct output for t_source/perl/basic.t
ok 2 - Correct output for t_source/perl/advanced.t
not ok 3 - Correct output for t_source/perl6/basic.t
# Failed test 'Correct output for t_source/perl6/basic.t'
# at t/01_highlighting.t line 77.
#
«output from Test::Differences showing the offending lines»
# You can inspect the incorrect output at t_source/perl6/basic.t_fail.html
1..3
# Looks like you failed 1 test of 3.
Dubious, test returned 1 (wstat 256, 0x100)
Failed 1/3 subtests
</code></pre>
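<p>The harness itself is Perl (Text::VimColor plus Test::Differences). Here is the same golden-file logic sketched in Python with <code>difflib</code>. It assumes the highlighted HTML has already been rendered by other means, and <code>check_against_reference</code> is a hypothetical helper. The file naming follows the output above: <code>basic.t.html</code> is the reference and <code>basic.t_fail.html</code> holds failed output.</p>

```python
import difflib
from pathlib import Path
from typing import Tuple


def check_against_reference(source: Path, rendered_html: str) -> Tuple[str, str]:
    """Compare rendered HTML with <source>.html; return (status, detail)."""
    reference = Path(f"{source}.html")
    failed = Path(f"{source}_fail.html")
    if not reference.exists():
        # First run: record the output as the new reference and skip.
        reference.write_text(rendered_html, encoding="utf-8")
        return "skip", f"Created {reference}"
    expected = reference.read_text(encoding="utf-8")
    if rendered_html == expected:
        return "ok", ""
    # Keep the bad output around for inspection and show a line diff.
    failed.write_text(rendered_html, encoding="utf-8")
    diff = difflib.unified_diff(
        expected.splitlines(keepends=True),
        rendered_html.splitlines(keepends=True),
        fromfile=str(reference),
        tofile=str(failed),
    )
    return "not ok", "".join(diff)
```

<p>Taking the source path as a parameter also makes it easy to check a single file, rather than always running over the whole directory.</p>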
<p>The only big downside to this is that <code>01_highlighting.t</code> tests all the source
files in one go. You currently can&rsquo;t tell it to only test one specific file.</p>
<h2><a href="http://blog.nix.is/more-docs-added">More docs added</a></h2>
<p>Mon, 17 Aug 2009 23:52:00 +0000</p>
<p>I&rsquo;ve added Jonathan Scott Duff&rsquo;s introduction to Perl 6 regexes to Perl6-Doc
as <code>perlreintro</code>. In addition I put in a first draft of a <code>perlobjintro</code>.
Neither of these correspond to names from the set of Perl 5 man pages, which
is fitting since they are not directly based on them. So, Perl6-Doc now has
4 man pages, the aforementioned two as well as <code>perlintro</code> and <code>perlsyn</code>. I&rsquo;ll
be carefully adding to and polishing these in the near future. I&rsquo;m hesitant to
start on man pages relating to modules, IO, laziness/iterators, and other very
important features, as the specs (and implementation) for those are still
shifting heavily from month to month.
</p>
<p>I&rsquo;ve been thinking about using the Perl Table Index as initial source material
for the magical syntax recognition feature, the implementation of which has
remained elusive for a while. The current implementation is just a proof-of-
concept with only a few terms as of yet. However, Damian Conway just published
a significant update to Synopsis 26 on Perl documentation, focused on tying
docs to code, so it might be time to figure out some Pod-ish source format for
the terms to be stored in. Writing Pod is a breeze, so it should be easy to
contribute new terms.</p>
<p><code>grok</code> also needs more love. I&rsquo;m not as happy as I&rsquo;d like to be with some of
its implementation details, most notably how it picks out functions from
Synopses 29 and 32, using simple regexes (to be fair, this is exactly what
good ol&rsquo; <code>perldoc -f</code> does). It should parse the Pod and pick out parts of the
document tree instead, I think. There are also still a few visual differences
between textual output of Pod 5 and Pod 6 that I have to address. Pod 5
textual output is pleasingly indented (gradually depending on the current
heading level), which makes it very easy to follow, while Pod 6 is not.</p>
<p>As a sidenote, I believe I&rsquo;ve found what was causing intermittent FAIL reports
from CPAN testers. Some machines had a pretty old version of Pod::Simple which
didn&rsquo;t take well to subclassing. Another failure was caused by an old
Pod::Parser (deprecated, but relied on by Pod::Xhtml) version which didn&rsquo;t
recognize <code>=encoding</code> directives. The relevant distributions now depend on
newer versions of those troublesome modules.</p>
<h2><a href="http://blog.nix.is/tagging-fail-regexes">Tagging, FAIL, regexes</a></h2>
<p>Wed, 29 Jul 2009 19:03:00 +0000</p>
<p>I&rsquo;m currently working on making <code>grok</code> index all <code>X&lt;&gt;</code> (and maybe <code>C&lt;&gt;</code>) tags
in Pod documents. The user will then be able to look them up and see which
docs contain the search term in question.
</p>
<p>Gábor Szabó has already done <a href="http://perlcabal.org/syn/index_X.html">something like this</a>
for the Perl 6 Synopses. His implementation only looked for formatting codes
using the standard <code>&lt;&gt;</code> format, but I just patched it to look for <code>&lt;&lt; &gt;&gt;</code>,
<code>&lt;&lt;&lt; &gt;&gt;&gt;</code>, and <code>«»</code> tags as well. Ideally I would like to use a Pod parser to
get the tags, but it&rsquo;s just using regexes for now. Gábor also asked me to
have <code>grok</code> look up the predefined subrules from Synopsis 5, which I think
would be a good addition.</p>
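<p>For illustration only, here is a Python version of the regex approach. It is not Gábor&rsquo;s actual script. It matches <code>X</code> followed by the longest delimiter that fits, trying <code>&lt;&lt;&lt;</code> before <code>&lt;&lt;</code> before <code>&lt;</code>. Like any regex-based approach it misses Pod subtleties such as nested formatting codes, which is why a real Pod parser would be preferable.</p>

```python
import re
from typing import List

# Longer delimiters first, so X<<< ... >>> isn't mistaken for X< ... >.
X_TAG = re.compile(r"X(?:<<<(.+?)>>>|<<(.+?)>>|«(.+?)»|<(.+?)>)", re.S)


def index_terms(pod: str) -> List[str]:
    """Return the contents of every X<> code, with padding whitespace removed."""
    terms = []
    for m in X_TAG.finditer(pod):
        content = next(g for g in m.groups() if g is not None)
        terms.append(content.strip())
    return terms


text = "X<foo> and X<< a<b >> then X«bar» and X<<< x >>>."
print(index_terms(text))  # ['foo', 'a<b', 'bar', 'x']
```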
<p>For a while, I&rsquo;ve been getting some rare <a href="http://static.cpantesters.org/distro/G/grok.html">FAIL reports</a>
about <code>grok</code> from CPAN testers. There seem to be two problems that always
manifest themselves at the same time. One of them is obviously because the
tester has an old version of Pod::Parser which doesn&rsquo;t understand the
<code>=encoding</code> directive, but the other one is more mysterious. I&rsquo;ll have to do
some more digging, as I haven&rsquo;t been able to reproduce it on my machines.</p>
<p>Jonathan Scott Duff pointed me to his unpublished
<a href="http://feather.perl6.nl/~duff/articles/perl6/p6-regex.pod">introduction</a> to
Perl 6 regexes which I could use as a starting point for a <code>perlretut</code> man
page. I haven&rsquo;t got it into a releasable state yet, but I will soon.</p>
<h2><a href="http://blog.nix.is/grok-refactor">Grok refactor</a></h2>
<p>Fri, 24 Jul 2009 13:30:00 +0000</p>
<p>As the title suggests, I reorganized <code>grok</code> a bit, in addition to adding a
few new features.
</p>
<p>The code is now more modular, and much easier to maintain. I added support for
looking up individual functions/methods from all the chapters of Synopsis 32.
In other news, Perl6::Doc 0.42 is just out, which includes first drafts of
<code>perlintro</code> and <code>perlsyn</code> documents, which <code>grok</code> will now find.</p>
<p>This brings the total number of things known to <code>grok</code> up by almost a hundred:</p>
<pre><code>$ grok -i|wc -l
613
</code></pre>
<h2><a href="http://blog.nix.is/on-to-the-man-pages">On to the man pages</a></h2>
<p>Wed, 22 Jul 2009 18:31:00 +0000</p>
<p>This week I&rsquo;ve been working on the <code>perlintro</code> document found in the pugs
repository, as well as porting Perl 5&rsquo;s <code>perlsyn</code>. These are the most &ldquo;basic&rdquo;
man pages about the language, and should be ported first (especially since
many of the more specific bits in Perl 6 are still in flux).
</p>
<p>As for <code>perlintro</code>, I can see a few more-than-trivial things that need
changing. For one thing, the introduction to regular expressions is very
casual, but Perl 6&rsquo;s regex/grammar support has seen a major overhaul. Simply
translating the examples to the Perl 6 equivalent seems to just make them
longer, which gives the wrong impression. Focusing less on quick-and-dirty
examples and more on the big picture might help the newcomer here, as grammars
are integral to Perl 6, not just some special strings with weird syntax as
they were in ol&rsquo; Perl 5.</p>
<p>A lot of special cases and &ldquo;warts&rdquo; have been replaced with new, exciting
concepts that show up again and again in Perl (laziness, autothreading, all
blocks are closures, everything is an object) that need to be touched upon.
You could say that in Perl 6, the harmony / idiosyncrasy ratio has been
raised, and that&rsquo;s a very <strong>good</strong> thing. Some of these can be saved for
<code>perlsyn</code>, but nonetheless, they need to be introduced well.</p>
<p>In other news, I noticed that <a href="http://perldoc.perl.org">perldoc.perl.org</a> has
seen a facelift. I wonder if the software behind it might be used to power a
Perl 6 doc site sometime in the future.</p>
<h2><a href="http://blog.nix.is/lucky-013">Lucky 0.13</a></h2>
<p>Thu, 16 Jul 2009 06:42:00 +0000</p>
<p>I just uploaded <code>grok</code> 0.13 to PAUSE. It has the things I mentioned in my last
post, plus some bug fixes.</p>
<p><code>grok</code> can look up quite a few things now (527 including documents such as
Synopses), but many of the answers it provides lack thoroughness. According to
my project schedule, it&rsquo;s time to start writing new documentation. That means
I can focus on making these answers better.
</p>
<p>I should also write/update a tutorial pretty soon, though I&rsquo;m not exactly sure
where I&rsquo;ll start. I could update the
<a href="http://svn.pugscode.org/pugs/docs/Perl6/Tutorial/perlintro.pod6"><code>perlintro</code></a>
in the Pugs repository and add it to Perl6::Doc, or I could use the <a href="http://svn.pugscode.org/pugs/docs/tutorial/">Perl 6
part</a> of the book <em>Perl 6 and
Parrot Essentials</em> which was donated to the Perl Foundation by O&rsquo;Reilly. That
one is a bit longer, so I might just use it for a more detailed introduction
(something like Perl 5&rsquo;s <code>perlsyn</code>).</p>
<p>P.S. Since I&rsquo;ve got the Pod 5 ANSI-color renderer mostly working now, users of
Perl 5&rsquo;s <code>perldoc</code> who like colors might want to install Pod::Text::Ansi
and put the following in their <code>~/.bashrc</code> (or equivalent):</p>
<div class="highlight" style="background: #f8f8f8"><pre style="line-height: 125%"><span style="color: #204a87">export </span><span style="color: #000000">PERLDOC</span><span style="color: #ce5c00; font-weight: bold">=</span><span style="color: #4e9a06">&quot;-MPod::Text::Ansi&quot;</span>
</pre></div>
<h2><a href="http://blog.nix.is/half-way-there">Half way there</a></h2>
<p>Sat, 11 Jul 2009 01:54:00 +0000</p>
<p>With Summer of Code&rsquo;s mid-term evaluations coming up, some interesting things
are about to happen to <code>grok</code>. I was contacted by Herbert Breunung, author of
<a href="http://search.cpan.org/dist/Perl6-Doc/">Perl6::Doc</a> (formerly
Perl6::Bible), which is a project that shares some of <code>grok</code>&rsquo;s goals.
</p>
<p>He told me that he doesn&rsquo;t have enough time to maintain his project anymore
and would like some sort of merge to happen. I said I&rsquo;d look into it. His
project bundles all the Synopses, Apocalypses, Exegeses, and some other Perl 6
documentation, and ships with a <code>perldoc</code> wrapper for reading them. What I
would like to do is to move all documentation out of the <code>grok</code> distribution
and update/add more to Perl6::Doc while eliminating the <code>perldoc</code> wrapper
(in favor of <code>grok</code>).</p>
<p>He also brought to my attention the <a href="http://www.perlfoundation.org/perl6/index.cgi?perl_table_index">Perl Table Index</a>,
a sort of Perl 6 glossary. Ahmad Zawawi has <a href="http://padre.perlide.org/trac/changeset/5994">just patched</a>
his <a href="http://search.cpan.org/dist/Padre-Plugin-Perl6/">Padre::Plugin::Perl6</a> to
feed this index to <code>grok</code>. I will probably port the Perl Table Index to the
Perl6::Doc distribution and make <code>grok</code> look things up in it.</p>
<p>I&rsquo;ve started writing a Pod::Text subclass as well as amending
Perl6::Perldoc::To::Ansi to conform to a consistent color scheme when using
the default ANSI-colored output.</p>
<p>Something which <code>grok</code> should eventually be able to do is recognize arbitrary
Perl 6 syntax (at a very fine level of granularity, that is) and tell you what
it means. A first stab at this will be to simply include a table of some
common ones and look those up. Stuff like <code>my</code>, <code>+</code>, and so on. Doing this
reliably is the original inspiration for the u4x project (Userdocs for
Christmas), of which <code>grok</code> is a part.</p>
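<p>Here is a minimal Python sketch of that first stab, using a few made-up glossary entries. It also shows the teasing-apart idea for a reduction like <code>[*]</code>. If the whole expression isn&rsquo;t in the table, a bracketed operator is split into the <code>[]</code> metaoperator and the operator inside. The table contents and the <code>explain</code> function are hypothetical; this is not <code>grok</code>&rsquo;s code.</p>

```python
from typing import Dict, List, Tuple

# Hypothetical glossary entries; real text would come from the documentation.
SYNTAX_TABLE: Dict[str, str] = {
    "my": "declares a lexically scoped variable",
    "+": "numeric addition",
    "*": "numeric multiplication",
    "[]": "reduction metaoperator",
}


def explain(expr: str) -> List[Tuple[str, str]]:
    """Look up a piece of syntax, splitting [op] into [] and op if needed."""
    expr = expr.strip()
    if expr in SYNTAX_TABLE:
        return [(expr, SYNTAX_TABLE[expr])]
    if len(expr) > 2 and expr[0] == "[" and expr[-1] == "]":
        return [("[]", SYNTAX_TABLE["[]"])] + explain(expr[1:-1])
    return []


print(explain("[*]"))
# [('[]', 'reduction metaoperator'), ('*', 'numeric multiplication')]
```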
<p>I hope to make a release of <code>grok</code> and Perl6::Doc shortly which will include
all of the above.</p>
<h2><a href="http://blog.nix.is/grok-009-is-out">grok 0.09 is out</a></h2>
<p>Wed, 01 Jul 2009 19:33:00 +0000</p>
<p>The code is starting to take a more stable form. I&rsquo;ve prepended an underscore
to all private/internal subroutines and documented the rest. Perl authors
wishing to use <code>grok</code>&rsquo;s functionality will now have an easier time doing so.
</p>
<p>I&rsquo;ve also added various author tests to the distribution to help keep the code
in shape (or at least remind me when it&rsquo;s not), notably Test::Pod,
Test::Pod::Coverage, and Test::Perl::Critic.</p>
<p>Since my last blog post, <code>grok</code> has gained a few features. It can print the
name of the target file (like <code>perldoc -l</code>), print an index of known
documentation files, output xhtml, and detect whether the target file has Pod
5 or Pod 6 in it. It&rsquo;s also got some Win32 fixes and more informative error
messages.</p>
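<p>Here is one way such a check might work, sketched in Python under my own assumptions rather than taken from <code>grok</code>&rsquo;s source. Pod 6 documents are typically wrapped in <code>=begin pod</code> and <code>=end pod</code>, while Pod 5 uses markers like <code>=head1</code> and <code>=cut</code>. The sketch looks for those markers at the start of a line and checks the Pod 6 marker first. Pod 6 written without a <code>=begin pod</code> wrapper would be misclassified, so treat this as a heuristic only.</p>

```python
import re
from typing import Optional

POD6_MARK = re.compile(r"^=begin\s+pod\b", re.M)
POD5_MARK = re.compile(r"^=(?:cut|pod|head[1-4])\b", re.M)


def pod_flavor(text: str) -> Optional[str]:
    """Guess "Pod 6" or "Pod 5" from block markers; None if neither is found."""
    if POD6_MARK.search(text):
        return "Pod 6"
    if POD5_MARK.search(text):
        return "Pod 5"
    return None


print(pod_flavor("=begin pod\n=head1 NAME\n=end pod\n"))  # Pod 6
print(pod_flavor("=head1 NAME\n\nFoo\n\n=cut\n"))         # Pod 5
```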
<p>Pretty soon I will start bringing in some more docs to bundle with it.
Tutorials and such. I will also most likely implement function documentation
lookup (like <code>perldoc -f</code>) in <code>grok</code>.</p>
<h2><a href="http://blog.nix.is/grok-update">grok update</a></h2>
<p>Fri, 26 Jun 2009 05:54:00 +0000</p>
<p>Some things have kept me very busy lately and I&rsquo;m a bit behind on my GSoC
schedule. I&rsquo;m starting to catch up now, though.
</p>
<p>First of all, as of version 0.05, you can now easily install <code>grok</code> from the
command line on most operating systems like so:</p>
<pre><code>$ cpanp -i App::Grok
</code></pre>
<p>It can now handle Pod 5 files as well (via Pod::Text), the format most of the
Perl 6 Synopses are written in. The ANSI-colored output looks a bit different
than the Pod 6 output because Pod::Text is very conservative in its use of
colors. I&rsquo;ll probably make them more similar later to keep it consistent.</p>
<p>When called interactively, <code>grok</code> now uses your system&rsquo;s pager to view the
output by default.</p>
<p>It will now treat an argument as a Pod 6 file to read if said argument doesn&rsquo;t
match a known documentation target. As for the targets, the only known ones so
far are the synopses (which are bundled with <code>grok</code> for now), so you can do
things like:</p>
<pre><code>$ grok s02
$ grok s32-rules
</code></pre>
<h2><a href="http://blog.nix.is/first-gsoc-post">First GSoC post</a></h2>
<p>Wed, 27 May 2009 17:25:00 +0000</p>
<p>I officially started on my Google Summer of Code project (project details
<a href="http://nix.is/gsoc/">here</a>) last weekend.</p>
<p>I&rsquo;ve been tasked with writing a <code>perldoc</code> equivalent for Perl 6. I&rsquo;ve decided
to write it in Perl 5 for now, since it&rsquo;s already got Perl6::Perldoc, which
is a fast and feature-complete parser for the Perl 6 version of Pod (see
<a href="http://perlcabal.org/syn/S26.html">specification</a>), as well as lots of other
useful CPAN modules which I won&rsquo;t have to rewrite in Perl 6 (yet).
</p>
<p>The program will be called <code>grok</code>
(<a href="http://github.com/hinrik/grok">repository</a>). So far it&rsquo;s just a barebones
command-line reader for Pod 6, but I did create an ANSI-colored terminal
renderer for Perl6::Perldoc. It might be too colorful for some people&rsquo;s
taste currently, so maybe I&rsquo;ll tone it down a little. I think Ruby&rsquo;s <code>ri</code>
documentation reader is the only such tool which colors the output, and I
thought it was a nice enough feature (one of many!) to copy. I&rsquo;ve also been
looking at <code>pydoc</code> and <code>javadoc</code> in search of interesting features to
implement.</p>
<p>The plan is to make it more modern and extensive than <code>perldoc</code>. One
interesting thing I might end up doing is making use of STD (the standard
Perl 6 grammar) to parse arbitrary syntax, so you could do stuff like <code>grok
'[*]'</code> and the program would tease the expression apart and show you
documentation for both the <code>[]</code> and the <code>*</code> operators.</p>
<p>Some portion of the project also includes writing new documentation for Perl
6. I haven&rsquo;t written any yet, but I bet it will be fun. Before doing so, we
first have to figure out how to organize the docs. Since Perl 6 is a
specification, it&rsquo;s not as obvious as with Perl 5, where the docs are included
(sometimes inline) with the implementation.</p>
<h2><a href="http://blog.nix.is/poe-component-irc-600-is-here">POE::Component::IRC 6.00 is here</a></h2>
<p>Thu, 05 Mar 2009 03:03:00 +0000</p>
<p><a href="http://search.cpan.org/dist/POE-Component-IRC">POE::Component::IRC</a> version
6.00 has just been released on CPAN. I&rsquo;ve neglected to blog about PoCo::IRC
since I started contributing to it, but since a new major release has been
rolled out[1], now would be a good time. Also, as it turns out, next May will
be the tenth anniversary of the project&rsquo;s first release.
</p>
<p>For the uninitiated, POE::Component::IRC is an event-driven IRC client library
built on top of POE. People mostly use it to write bots. Some have made that
even easier by creating a simpler interface suited to that task (see
Bot::BasicBot).</p>
<p>I became involved in the project about 14 months ago, fixing bugs and adding
features. There&rsquo;ve been about 50 releases during that time, so there&rsquo;s
something for everybody. Following is a list of the most prominent changes.</p>
<h3 id="important-squashed-bugs">Important squashed bugs</h3>
<ul>
<li>Quite a few DCC-related bugs have been fixed, and error handling and diagnostics have been improved.</li>
<li>A bug causing the NickReclaim plugin to only try to reclaim the nick once has been fixed.</li>
<li>POE::Component::IRC::State was reacting incorrectly to some WHO replies sent by IRC servers that veered from the RFCs, causing it to hold inconsistent information. This has been fixed.</li>
<li>When raw messages were enabled, the raw line was not provided with CTCP-related events. Fixed.</li>
<li>POE::Component::IRC::State would issue more WHO commands than necessary when another user would join more than one of the component&rsquo;s channels. No more.</li>
</ul>
<h3 id="new-major-features">New major features</h3>
<ul>
<li>POE::Component::IRC::Common, which provides many helper functions, now has functions for identifying and stripping color/formatting from IRC messages. It also defines IRC color constants for use in messages.</li>
<li>We now handle FreeNode&rsquo;s IDENTIFY-MSG capability, which means that you can always know whether a user had identified with NickServ when s/he wrote a particular message.</li>
<li>Sending and receiving files with spaces in their names over DCC is now supported.</li>
<li>All DCC-related events now provide the IP address of the peer.</li>
<li>DCC resume support has been implemented.</li>
<li>The BotTraffic plugin now sends an event for every CTCP ACTION issued by the client.</li>
<li>We now guard against sending IRC protocol messages that are too long and might get us booted off the server.</li>
<li>The Connector plugin (takes care of maintaining the connection to the IRC server) now supports cycling through a list of servers when reconnecting.</li>
<li>The CTCP plugin can now respond to CTCP SOURCE requests for you.</li>
<li>POE::Component::IRC::State can now track the away status of users for you.</li>
<li>POE::Component::IRC::State now keeps track of a channel&rsquo;s creation time.</li>
<li>Added NICKSERV, SERVLIST, and SQUERY commands.</li>
<li>Plugins can now respond to custom events which have not been explicitly defined by POE::Component::IRC.</li>
</ul>
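<p>Guarding against overlong messages comes down to the 512-octet line limit in RFC 1459, which includes the trailing CR-LF. The following is a minimal sketch of splitting a long <code>PRIVMSG</code>. It assumes the text is already an encoded byte string. <code>split_privmsg</code> is a hypothetical helper, not POE::Component::IRC&rsquo;s actual implementation.</p>

```perl
use strict;
use warnings;

# RFC 1459: a message is at most 512 octets, including the trailing CR LF.
use constant MAX_LINE => 512;

# Hypothetical helper: split $text (an already-encoded byte string) into
# as many "PRIVMSG $target :..." lines as needed so each fits the limit.
sub split_privmsg {
    my ($target, $text) = @_;
    my $head = "PRIVMSG $target :";
    my $room = MAX_LINE - length($head) - 2;    # 2 octets for CR LF
    die "Target name too long\n" if $room < 1;

    my @lines;
    while (length($text) > $room) {
        # 4-argument substr removes the first $room octets and returns them
        push @lines, $head . substr($text, 0, $room, '');
    }
    push @lines, $head . $text;
    return @lines;
}

my @lines = split_privmsg('#perl', 'x' x 1000);
print scalar(@lines), " lines\n";   # prints "3 lines"
```

<p>A real client needs more slack than this. When the server relays the message, it prepends the sender&rsquo;s <code>:nick!user@host</code> prefix. Splitting on raw octets can also cut a multibyte UTF-8 character in half.</p>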
<h3 id="new-plugins">New plugins</h3>
<p>I wrote 5 additional core plugins:</p>
<ul>
<li>First of all, the <strong>Logger</strong> plugin. It logs channel/private/DCC chat activity to files on disk like normal IRC clients do.</li>
<li>Then there&rsquo;s <strong>AutoJoin</strong>, which takes care of keeping you on your favorite channels, whatever happens.</li>
<li><strong>NickServID</strong> deals with identifying your user to NickServ.</li>
<li>A <strong>CycleEmpty</strong> plugin which reclaims ops on channels that become empty.</li>
<li><strong>BotCommand</strong>, which allows you to register commands that your bot handles, and get back an appropriate event when one is issued.</li>
</ul>
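<p>The idea behind a command plugin like BotCommand can be shown with a plain dispatch table. The following is a minimal sketch. The <code>!</code> prefix, the <code>%commands</code> table and <code>handle_line</code> are hypothetical and do not reflect BotCommand&rsquo;s real interface, which delivers commands as POE events instead.</p>

```perl
use strict;
use warnings;

# Hypothetical command table; this is NOT the BotCommand plugin's API.
my %commands;
%commands = (
    echo => sub { my ($nick, $args) = @_; return "$nick said: $args" },
    help => sub { return 'Commands: ' . join(', ', sort keys %commands) },
);

# Parse "!name args" and dispatch to the matching handler.
# Returns undef for lines that aren't commands at all.
sub handle_line {
    my ($nick, $line, $prefix) = @_;
    $prefix //= '!';
    return unless $line =~ /^\Q$prefix\E(\w+)\s*(.*)$/;
    my ($name, $args) = (lc $1, $2);
    my $handler = $commands{$name} or return "Unknown command: $name";
    return $handler->($nick, $args);
}

print handle_line('alice', '!echo hi'), "\n";   # prints "alice said: hi"
print handle_line('bob', '!help'), "\n";        # prints "Commands: echo, help"
```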
<h3 id="testing">Testing</h3>
<p>The test suite has been reorganized, many tests improved and more added. The
test coverage (as reported by Devel::Cover) has increased from 40% (version
5.48) to 61% (version 6.00).</p>
<h3 id="refactoring">Refactoring</h3>
<p>Much refactoring was done. The coding and indenting style has also been made
consistent across the project, and many spotty coding practices have been
eliminated (thanks, Perl::Critic).</p>
<p>POE::Filter::CTCP was merged with POE::Filter::IRC::Compat, and the former was
removed. DCC support has been moved into its own plugin, and the plugin system
itself has been ripped out in favor of POE::Component::Pluggable (which is
based on the aforementioned plugin system).</p>
<p>Using the project&rsquo;s current Perl::Critic parameters, version 6.00 has zero
policy violations in 11,791 lines of code, compared to version 5.48&rsquo;s 242
violations in 10,634 lines of code. The average
<a href="http://en.wikipedia.org/wiki/Cyclomatic_complexity">McCabe</a> score of
subroutines also dropped from 4.21 to 3.45.</p>
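<p>For reference, here is the McCabe (cyclomatic) complexity of a control-flow graph with <em>E</em> edges, <em>N</em> nodes and <em>P</em> connected components. For a single subroutine built from binary branches, it reduces to the number of decision points plus one. Static analyzers approximate this by counting branching keywords and logical operators. The last line works out the relative drop reported above: the average fell by roughly 18%.</p>

```latex
\begin{aligned}
M &= E - N + 2P \\
M &= d + 1 \qquad (P = 1,\ d = \text{number of binary decision points}) \\
\frac{4.21 - 3.45}{4.21} &= \frac{0.76}{4.21} \approx 0.18
\end{aligned}
```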
<h3 id="documentation">Documentation</h3>
<p>Last but not least, the Pod docs have been improved. Errors have been fixed,
much more formatting and linking has been added for easier reading and
browsing, consistency has been improved, and many sections have been expanded.</p>
<p>I also added a
<a href="http://search.cpan.org/perldoc?POE::Component::IRC::Cookbook">cookbook</a> with
a few recipes showing off some of the things one can do with
POE::Component::IRC.</p>
<h3 id="credits">Credits</h3>
<p>Thanks to all the users who provided feedback, bug reports and patches. You
helped make this happen. I also couldn&rsquo;t have done many of these things
without the help of Chris &lsquo;BinGOs&rsquo; Williams, the senior maintainer of
POE::Component::IRC.</p>
<p>Now go write some IRC bots (or clients)!</p>
<p><em>Notes:</em></p>
<ol>
<li>It&rsquo;s actually quite an insignificant release. Historically,
POE::Component::IRC versions have always passed the whole-number boundaries
naturally as part of a regular &ldquo;bump the version number up by 0.02 for the
next release&rdquo; process.</li>
</ol>Spicing up IRC with MegaHALhttp://blog.nix.is/spicing-up-irc-with-megahal
Fri, 29 Aug 2008 04:45:00 +0000http://blog.nix.is/spicing-up-irc-with-megahal<p>I released <a href="http://search.cpan.org/perldoc?POE::Component::IRC::Plugin::MegaHAL">POE::Component::IRC::Plugin::MegaHAL</a>
the other day, a <a href="http://search.cpan.org/perldoc?POE::Component::IRC">POE::Component::IRC</a>
plugin for the esteemed and humorous conversation simulator known as
<a href="http://megahal.alioth.debian.org/">MegaHAL</a>. This makes it very easy to add a
MegaHAL brain to an IRC bot. I wrote one and fed him IRC logs of a channel
that I frequent, then let him loose online (he&rsquo;s a &ldquo;he&rdquo; because the IRC logs
primarily include ramblings from male users). Some examples of his brilliance
follow.
</p>
<pre><code>&lt;failo&gt; Or maybe it's about sexual liberation and not perl &amp; python.
&lt;failo&gt; Yes, printing passport photos at home is surely a sign of a flock of aids infected prisoners with big sloppy dicks.
&lt;failo&gt; I heard failo is a fact of evolution that cannot be shown because it looks like he knows what electricity is
&lt;failo&gt; Fucking get your cock out and jerk each other off in proportion to how much dexter's wedding dinner is going your way
&lt;failo&gt; Sir you had me at &quot;balls.&quot;
&lt;failo&gt; Every porn dvd has a &quot;grill setting&quot;
&lt;failo&gt; So only 2% of americans also believe that the government is composed of thong-wearing oiled-up muscle boys.
&lt;failo&gt; Increased struggling from the bees made honey in the darkened cubicle, waiting for another musky man-pole to probe its way through the light to go mushroom hunting with me
&lt;failo&gt; I bet all drug users keep a human skull on their myspace page.
&lt;failo&gt; Earle: everybody's gay for a prison system in which millions of our descendants murder each other in prison.
&lt;failo&gt; Either you're gay or associated with all drugs.
&lt;failo&gt; The war on polarbears is going to cum buckets into my gaping /dev/null.
</code></pre>
<p>I&rsquo;m afraid these results say more about the quality of conversation in this
channel than they do about anything else&hellip;</p>Perl book (mini) reviewshttp://blog.nix.is/perl-book-mini-reviews
Sun, 04 May 2008 21:33:00 +0000http://blog.nix.is/perl-book-mini-reviews<p>I much prefer reading things in well-typeset paperback books over reading them on computer
screens. The subject of Perl is no exception. I&rsquo;ve got quite a few Perl books.
Some reviews are in order. I haven&rsquo;t actually written a book review in many
years, so they won&rsquo;t be very &ldquo;in-depth&rdquo;, though.</p>
<p>Oh, and by the way, this, too, <a href="http://use.perl.org/~schwern/journal/36263">is a Perl blog</a>.
</p>
<style>.book-pic { float: right; padding: 10px; }</style>
<h2 id="programming-perl">Programming Perl</h2>
<figure class="book-pic">
<img src="http://blog.nix.is/books/progperl.png" alt="Programming Perl" />
</figure>
<p><em>By Larry Wall, Tom Christiansen, and Jon Orwant</em></p>
<p>Ah, the Camel book. The latest edition is about 7 years old now, but it has
stood the test of time. If you only read one Perl book, make it this one.
Aside from being a comprehensive and detailed reference on the language, it&rsquo;s
also funny. One of my favorite passages is the start of chapter 10, Packages:</p>
<blockquote>
<p>In this chapter, we get to start having fun, because we get to start
talking about software design. If we&rsquo;re going to talk about good software
design, we have to talk about Laziness, Impatience, and Hubris, the basis
of good software design.</p>
<p>We&rsquo;ve all fallen into the trap of using cut-and-paste when we should have
defined a higher-level abstraction, if only just a loop or a subroutine.*</p>
<p>To be sure, some folks have gone to the opposite extreme of defining
ever-growing mounds of higher-level abstractions when they should have
used cut-and-paste.† Generally, though, most of us need to think about
using more abstraction rather than less.</p>
<p>Caught somewhere in the middle are the people who have a balanced view of
how much abstraction is good, but who jump the gun on writing their own
abstractions when they should be reusing existing code.‡</p>
<p>――――――――――</p>
<p>* This is a form of False Laziness.</p>
<p>† This is a form of False Hubris.</p>
<p>‡ You guessed it — this is False Impatience. But if you&rsquo;re determined to
reinvent the wheel, at least try to invent a better one.</p>
</blockquote>
<p>The book is written for programmers, so if you haven&rsquo;t programmed before,
you should probably start with an easier book. I&rsquo;ve heard &ldquo;Learning Perl&rdquo; is
pretty good.</p>
<h2 id="the-perl-cookbook">The Perl Cookbook</h2>
<figure class="book-pic">
<img src="http://blog.nix.is/books/perlcookbook.png" alt="Perl Cookbook" />
</figure>
<p><em>By Tom Christiansen and Nathan Torkington</em></p>
<p>Here is another highly useful book. If you read the Camel book and thought
&ldquo;What next?&rdquo;, then this book is the answer. It&rsquo;s got 900 pages of recipes for
how to do everything from sorting hashes to writing pre-forking TCP servers.
Each one of the 22 chapters begins with a few pages of very informative
discussion of the problem domain at hand. This book has been extremely useful
to me. It saves you a lot of Googling and looking through various Perl man
pages. I highly recommend it.</p>
<h2 id="mastering-regular-expressions">Mastering Regular Expressions</h2>
<figure class="book-pic">
<img src="http://blog.nix.is/books/masteringregex.png" alt="Mastering Regular Expressions" />
</figure>
<p><em>By Jeffrey Friedl</em></p>
<p>While not strictly a Perl book, it is about regular expressions, which were
to a large extent popularized by Perl. Or the other way around. Or both. I&rsquo;m
not sure. But hey, at least the chapter on Perl is the longest one in the
book.</p>
<p>Anyway, the book truly lives up to its title. It describes in detail how
regular expression engines work, how to write efficient expressions, how
programming languages support regexes, and more. Until I read this book, I did
not know that there are two main approaches to writing a regular expression
engine, a DFA (deterministic finite automaton) and an NFA (nondeterministic
finite automaton), or that the fastest way to strip leading and trailing
whitespace is (usually) like so:</p>
<div class="highlight" style="background: #f8f8f8"><pre style="line-height: 125%"><span style="color: #4e9a06">s/^\s+//</span><span style="color: #000000; font-weight: bold">;</span>
<span style="color: #4e9a06">s/\s+$//</span><span style="color: #000000; font-weight: bold">;</span>
</pre></div>
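<p>Perl uses a backtracking NFA engine, and this has a visible practical consequence. An alternation is tried left to right, and the first alternative that leads to an overall match wins. A POSIX DFA would instead report the longest match. The following is a minimal sketch using a made-up <code>first_match</code> helper.</p>

```perl
use strict;
use warnings;

# Return the text captured by the first group of $re, or undef on no match.
sub first_match {
    my ($string, $re) = @_;
    return $string =~ $re ? $1 : undef;
}

# Perl's NFA stops at the first alternative that succeeds...
print first_match('tournament', qr/(to|tour|tournament)/), "\n";   # prints "to"
# ...so reordering the alternatives changes the result.
print first_match('tournament', qr/(tournament|tour|to)/), "\n";   # prints "tournament"
```

<p>A leftmost-longest (POSIX DFA) engine would report "tournament" in both cases, regardless of the order of the alternatives.</p>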
<p>I hadn&rsquo;t actually learned Perl before getting this book. Perl is used in most
of the examples in the book, which is what got me interested in the language.</p>
<h2 id="mastering-algorithms-with-perl">Mastering Algorithms with Perl</h2>
<figure class="book-pic">
<img src="http://blog.nix.is/books/masteringalgorithms.png" alt="Mastering Algorithms with Perl" />
</figure>
<p><em>By Jon Orwant, Jarkko Hietaniemi, and John Macdonald</em></p>
<p>I&rsquo;m not a computer scientist. Perhaps that&rsquo;s why I found this book so
interesting. It has working Perl code accompanying all the algorithms that
are explained in the book, as well as discussions of pertinent CPAN modules.
Though I suspect that last part might be a bit outdated, as the book was
written almost a decade ago. If you know Perl and would like a hands-on
introduction to algorithms, this book is great.</p>
<h2 id="perl-best-practices">Perl Best Practices</h2>
<figure class="book-pic">
<img src="http://blog.nix.is/books/perlbestpractices.png" alt="Perl Best Practices" />
</figure>
<p><em>By Damian Conway</em></p>
<p>I liked this book. I like my code especially clean and readable, and this book
has very good guidelines that can help achieve that. The book is a joy to read
(it has humour!), and it&rsquo;s got very useful appendices at the end with quick
summaries of all the guidelines. Then, if you want to audit your source code
according to some of these (and other) guidelines, you can use the excellent
<a href="http://search.cpan.org/dist/Perl-Critic/">Perl::Critic</a> module.</p>
<h2 id="perl-hacks">Perl Hacks</h2>
<figure class="book-pic">
<img src="http://blog.nix.is/books/perlhacks.png" alt="Perl Hacks" />
</figure>
<p><em>By <code>chromatic</code>, Damian Conway, and Curtis &ldquo;Ovid&rdquo; Poe</em></p>
<p>This one is sort of like The Perl Cookbook, except it focuses more on the
process of working with Perl rather than specific solutions to various
programmatic problems. As such, it&rsquo;s not really aimed at beginners. It has
many very useful tricks regarding things like customizing your editor for
Perl, working with modules, (ab)using the guts of Perl, and debugging.
There&rsquo;s something in it for everyone.</p>
<h2 id="higher-order-perl">Higher-Order Perl</h2>
<figure class="book-pic">
<img src="http://blog.nix.is/books/higherorderperl.png" alt="Higher Order Perl" />
</figure>
<p><em>By Mark Jason Dominus</em></p>
<p><em>Now</em> it&rsquo;s getting interesting. In this book, the author tries to familiarize
the reader with ways of programming most common in functional
languages like Lisp and Haskell. That makes it very different from most Perl
books (in a good way). The book is all about &ldquo;being clever&rdquo; as the author has
said, so you&rsquo;re bound to learn something from it. I certainly have. The topics
covered include recursion, dispatch tables, currying, and parsing, to name a
few. There are <a href="http://hop.perl.plover.com/#free">plans</a> to turn the book into a
wiki, but I&rsquo;m not sure when that&rsquo;ll happen.</p>
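<p>Here is a small taste of two of those topics, written in the spirit of the book rather than taken from it. The first is currying with a closure. The second is a dispatch table, meaning a hash of code references used in place of an if/elsif chain. <code>curry_add</code>, <code>%OPS</code> and <code>apply_op</code> are illustrative names.</p>

```perl
use strict;
use warnings;

# Currying by hand: curry_add(5) returns a new function that adds 5
# to whatever it is given, because the closure captures $x.
sub curry_add {
    my ($x) = @_;
    return sub { my ($y) = @_; return $x + $y };
}

# A dispatch table: a hash mapping names to code references.
my %OPS = (
    add => sub { $_[0] + $_[1] },
    mul => sub { $_[0] * $_[1] },
);

sub apply_op {
    my ($name, @args) = @_;
    my $op = $OPS{$name} or die "Unknown operation: $name\n";
    return $op->(@args);
}

my $add5 = curry_add(5);
print 'add5(3) = ', $add5->(3), "\n";              # prints "add5(3) = 8"
print 'mul(6, 7) = ', apply_op(mul => 6, 7), "\n";  # prints "mul(6, 7) = 42"
```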
<h2 id="perl-6-and-parrot-essentials">Perl 6 and Parrot Essentials</h2>
<figure class="book-pic">
<img src="http://blog.nix.is/books/perl6andparrot.png" alt="Perl 6 and Parrot Essentials" />
</figure>
<p><em>By Allison Randal, Dan Sugalski, and Leopold Toetsch</em></p>
<p>This book is an excellent introduction to Perl 6 and Parrot (the virtual
machine which will run Perl 6), though I&rsquo;ve been told that the Parrot bits
are a little dated now. I read it in one sitting; it was that exciting. I
realize that this book is about programming, and that this statement makes me
a total geek. Oh well, there&rsquo;s no looking back; Perl 6 will be awesome! The
first half of the book covers the Perl 6 language, while the second covers
the design of Parrot. I didn&rsquo;t absorb as much as I could have from the second
half as I don&rsquo;t know much about parsing or compilers, but the first half more
than made up for it. I really wish I could use all those nifty features in my
code right now. In the meantime, <a href="http://en.wikipedia.org/wiki/Pugs">Pugs</a>
will have to quench the thirst of those like me who are excited about Perl 6.</p>
<h2 id="perl-6-now">Perl 6 Now</h2>
<figure class="book-pic">
<img src="http://blog.nix.is/books/perl6now.png" alt="Perl 6 Now" />
</figure>
<p><em>By Scott Walters</em></p>
<p>I didn&rsquo;t get much out of this book. For
one, many things are explained which most Perl users already know, so I
flipped over many pages while reading it. Secondly, many of the Perl 6 ideas
are illustrated with modules that use source filters, which are known to be
unreliable, so you definitely wouldn&rsquo;t use them in production code.</p>
<h2 id="catalyst">Catalyst</h2>
<figure class="book-pic">
<img src="http://blog.nix.is/books/catalyst.png" alt="Catalyst" />
</figure>
<p><em>By Jonathan Rockway</em></p>
<p>The chapters in this book follow a very <em>practical</em> approach, diving right in
and showing you various common ways of using Catalyst. I haven&rsquo;t written much
using this fine web framework, so there isn&rsquo;t really much more I can say about
the content of the book. I can say something about the <em>lack</em> of content
though. This book is missing an overview of Catalyst, and a better explanation
of how all the parts fit together. Some discussion of the MVC architecture
and how Catalyst implements it would have been nice as well. Seeing as the
book is quite short, there would be plenty of room for these things.</p>
<h2 id="perl-medic">Perl Medic</h2>
<figure class="book-pic">
<img src="http://blog.nix.is/books/perlmedic.png" alt="Perl Medic" />
</figure>
<p><em>By Peter J. Scott</em></p>
<p>I was not disappointed by this book. It delivers what is promised. Having
recently started working on an old codebase, I found the book very useful.
Testing, refactoring, benchmarking, debugging; all these things are well laid
out by the author. It also has a useful chapter on upgrading code written for
older versions of Perl, and discusses how to make use of newer features.</p>IRC and character encodinghttp://blog.nix.is/irc-and-character-encoding
Thu, 01 May 2008 01:44:00 +0000http://blog.nix.is/irc-and-character-encoding<p>A while ago, I wrote an IRC logger for
<a href="http://search.cpan.org/dist/POE-Component-IRC/">POE::Component::IRC</a>, which
is an IRC client module for Perl. The main challenge I faced was the issue of
character encodings. Since IRC is rife with clients that use different
encodings, messages must be reliably decoded before they are written to a
file.
</p>
<p>You see, <a href="http://www.faqs.org/rfcs/rfc1459.html">RFC 1459</a>, the standards
document describing the IRC protocol, does not regulate the use of character
encodings:</p>
<pre><code>2.2 Character codes
No specific character set is specified. The protocol is based on a
set of codes which are composed of eight (8) bits, making up an
octet. Each message may be composed of any number of these octets;
however, some octet values are used for control codes which act as
message delimiters.
Regardless of being an 8-bit protocol, the delimiters and keywords
are such that protocol is mostly usable from USASCII terminal and a
telnet connection.
</code></pre>
<p>ASCII only uses the lower 7 bits of each octet. So, from the looks of it, you
can only rely on octets with the eighth bit unset representing ASCII
characters; what octets with the eighth bit set mean is anyone&rsquo;s guess. That&rsquo;s bad.</p>
<p>For most of IRC&rsquo;s history, the most popular IRC client has been mIRC. Until
recently, mIRC decoded incoming messages using the ANSI code page that was
currently being used on the user&rsquo;s Windows system. This meant that whenever
mIRC users wanted to communicate using anything other than ASCII characters,
they&rsquo;d better be using the same code page. In later versions, mIRC decodes
incoming messages as UTF-8 if they look UTF-8 encoded, and otherwise as code
page 1252 (used by most Westerners). As for <em>how</em> it does this, I can&rsquo;t
tell, since mIRC is closed-source.</p>
<p>The open-source client irssi handles the situation similarly. It uses GLib&rsquo;s
<a href="http://library.gnome.org/devel/glib/2.16/glib-Unicode-Manipulation.html#g-utf8-validate">g_utf8_validate()</a>
function to check if the incoming message is UTF-8 encoded, otherwise it falls
back to CP1252 by default. As for XChat, it uses the same GLib function, but
if it determines that the message is not UTF-8, XChat decodes the message in a
rather novel way. Here is an excerpt from its
<a href="http://xchat.cvs.sourceforge.net/xchat/xchat2/src/common/text.c?view=markup"><code>src/common/text.c</code></a>:</p>
<div class="highlight" style="background: #f8f8f8"><pre style="line-height: 125%"><span style="color: #8f5902; font-style: italic">/* converts a CP1252/ISO-8859-1(5) hybrid to UTF-8 */</span>
<span style="color: #8f5902; font-style: italic">/* Features: 1. It never fails, all 00-FF chars are converted to valid UTF-8 */</span>
<span style="color: #8f5902; font-style: italic">/* 2. Uses CP1252 in the range 80-9f because ISO doesn&#39;t have any- */</span>
<span style="color: #8f5902; font-style: italic">/* thing useful in this range and it helps us receive from mIRC */</span>
<span style="color: #8f5902; font-style: italic">/* 3. The five undefined chars in CP1252 80-9f are replaced with */</span>
<span style="color: #8f5902; font-style: italic">/* ISO-8859-15 control codes. */</span>
<span style="color: #8f5902; font-style: italic">/* 4. Handles 0xa4 as a Euro symbol ala ISO-8859-15. */</span>
<span style="color: #8f5902; font-style: italic">/* 5. Uses ISO-8859-1 (which matches CP1252) for everything else. */</span>
<span style="color: #8f5902; font-style: italic">/* 6. This routine measured 3x faster than g_convert :) */</span>
</pre></div>
<p>How would I handle this in Perl? I don&rsquo;t want to depend on GLib, and I don&rsquo;t
want to write any C code (requiring the user to have a C compiler). At first I
tried using <a href="http://search.cpan.org/dist/Encode-Detect/">Encode::Detect</a>, but
there are two problems with it. It&rsquo;s an extra dependency, and more
importantly, it works heuristically, deciding which character set is being
used based on the number of occurrences of each character code. As such, it&rsquo;s
only reliable when large amounts of data are involved. Like a whole web page,
for example, which is what the code was written for. Then I learned of
<a href="http://perldoc.perl.org/Encode/Guess.html">Encode::Guess</a>, which is included
with Perl since the 5.8 series. The following decodes <code>$line</code> as UTF-8 if
Encode::Guess is sure that it&rsquo;s UTF-8. Otherwise it decodes it as CP1252.</p>
<div class="highlight" style="background: #f8f8f8"><pre style="line-height: 125%"><span style="color: #204a87; font-weight: bold">use</span> <span style="color: #000000">Encode</span> <span style="color: #4e9a06">qw(decode)</span><span style="color: #000000; font-weight: bold">;</span>
<span style="color: #204a87; font-weight: bold">use</span> <span style="color: #000000">Encode::Guess</span><span style="color: #000000; font-weight: bold">;</span>
<span style="color: #204a87; font-weight: bold">my</span> <span style="color: #000000">$utf8</span> <span style="color: #ce5c00; font-weight: bold">=</span> <span style="color: #000000">guess_encoding</span><span style="color: #000000; font-weight: bold">(</span><span style="color: #000000">$line</span><span style="color: #000000; font-weight: bold">,</span> <span style="color: #4e9a06">&#39;utf8&#39;</span><span style="color: #000000; font-weight: bold">);</span>
<span style="color: #000000">$line</span> <span style="color: #ce5c00; font-weight: bold">=</span> <span style="color: #204a87">ref</span> <span style="color: #000000">$utf8</span> <span style="color: #000000; font-weight: bold">?</span> <span style="color: #000000">decode</span><span style="color: #000000; font-weight: bold">(</span><span style="color: #4e9a06">&#39;utf8&#39;</span><span style="color: #000000; font-weight: bold">,</span> <span style="color: #000000">$line</span><span style="color: #000000; font-weight: bold">)</span> <span style="color: #000000; font-weight: bold">:</span> <span style="color: #000000">decode</span><span style="color: #000000; font-weight: bold">(</span><span style="color: #4e9a06">&#39;cp1252&#39;</span><span style="color: #000000; font-weight: bold">,</span> <span style="color: #000000">$line</span><span style="color: #000000; font-weight: bold">);</span>
</pre></div>
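<p>An alternative that avoids guessing altogether is to attempt a strict UTF-8 decode and fall back to CP1252 only when that fails. This is a different technique from the Encode::Guess approach above, and <code>decode_irc</code> is a hypothetical name. It relies on the fact that CP1252 or Latin-1 text containing non-ASCII characters rarely happens to form valid UTF-8.</p>

```perl
use strict;
use warnings;
use Encode qw(decode);

sub decode_irc {
    my ($octets) = @_;

    # 'UTF-8' (with the hyphen) is Encode's strict UTF-8 decoder.
    # FB_CROAK makes decode() die on malformed input; with a CHECK flag
    # decode() may also modify its source, so decode a copy.
    my $copy = $octets;
    my $text = eval { decode('UTF-8', $copy, Encode::FB_CROAK()) };
    return $text if defined $text;

    # Not valid UTF-8: fall back to CP1252, as the approach above does.
    return decode('cp1252', $octets);
}

binmode STDOUT, ':encoding(UTF-8)';
print decode_irc("caf\xc3\xa9"), "\n";   # UTF-8 input
print decode_irc("caf\xe9"), "\n";       # CP1252 input, same result
```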
<p>So far this method has worked flawlessly for me on channels with mixed
encodings. However, I don&rsquo;t know exactly how Encode::Guess works, so I&rsquo;m not
as confident in this method as I could be. Any feedback on this issue would be
quite welcome.</p>