literal thoughts
http://blog.nix.is/tags/irc/index.xml
Recent content on literal thoughts
POE::Component::IRC 6.00 is here
http://blog.nix.is/poe-component-irc-600-is-here
Thu, 05 Mar 2009 03:03:00 +0000
<p><a href="http://search.cpan.org/dist/POE-Component-IRC">POE::Component::IRC</a> version
6.00 has just been released on CPAN. I&rsquo;ve neglected to blog about PoCo::IRC
since I started contributing to it, but since a new major release has been
rolled out[1], now would be a good time. Also, as it turns out, next May will
be the tenth anniversary of the project&rsquo;s first release.
</p>
<p>For the uninitiated, POE::Component::IRC is an event-driven IRC client library
built on top of POE. People mostly use it to write bots. Some have made that
even easier by creating a simpler interface suited to that task (see
Bot::BasicBot).</p>
<p>I became involved in the project about 14 months ago, fixing bugs and adding
features. There have been about 50 releases during that time, so there&rsquo;s
something for everybody. Here is a list of the most prominent changes.</p>
<h3 id="important-squashed-bugs">Important squashed bugs</h3>
<ul>
<li>Quite a few DCC-related bugs have been fixed, and error handling and diagnostics have been improved.</li>
<li>A bug causing the NickReclaim plugin to only try to reclaim the nick once has been fixed.</li>
<li>POE::Component::IRC::State was reacting incorrectly to some WHO replies sent by IRC servers that veered from the RFCs, causing it to hold inconsistent information. This has been fixed.</li>
<li>When raw messages were enabled, the raw line was not provided with CTCP-related events. Fixed.</li>
<li>POE::Component::IRC::State would issue more WHO commands than necessary when another user would join more than one of the component&rsquo;s channels. No more.</li>
</ul>
<h3 id="new-major-features">New major features</h3>
<ul>
<li>POE::Component::IRC::Common, which provides many helper functions, now has functions for identifying and stripping color/formatting from IRC messages. It also defines IRC color constants for use in messages.</li>
<li>We now handle FreeNode&rsquo;s IDENTIFY-MSG capability, which means that you can always know whether a user had identified with NickServ when s/he wrote a particular message.</li>
<li>Sending and receiving files with spaces in them over DCC is now supported.</li>
<li>All DCC-related events now provide the IP address of the peer.</li>
<li>DCC resume support has been implemented.</li>
<li>The BotTraffic plugin now sends an event for every CTCP ACTION issued by the client.</li>
<li>We now guard against sending IRC protocol messages that are too long and might get us booted off the server.</li>
<li>The Connector plugin (takes care of maintaining the connection to the IRC server) now supports cycling through a list of servers when reconnecting.</li>
<li>The CTCP plugin can now respond to CTCP SOURCE requests for you.</li>
<li>POE::Component::IRC::State can now track the away status of users for you.</li>
<li>POE::Component::IRC::State now keeps track of a channel&rsquo;s creation time.</li>
<li>Added NICKSERV, SERVLIST, and SQUERY commands.</li>
<li>Plugins can now respond to custom events which have not been explicitly defined by POE::Component::IRC.</li>
</ul>
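<p>To give a rough idea of what stripping color and formatting from a message
involves, here is a minimal sketch in plain Perl. It is not the code from
POE::Component::IRC::Common; the function name <code>strip_irc_codes</code> is made up
for this illustration, and the control codes are the de facto mIRC conventions
(an assumption, since no RFC defines them).</p>
<pre><code>use strict;
use warnings;

# Hypothetical helper, not part of POE::Component::IRC::Common.
sub strip_irc_codes {
    my ($text) = @_;

    # Color: \x03, optionally followed by a 1-2 digit foreground
    # and an optional ",background" of 1-2 digits.
    $text =~ s/\x03(?:\d{1,2}(?:,\d{1,2})?)?//g;

    # Formatting: bold, reset, reverse, italic, underline.
    $text =~ tr/\x02\x0f\x16\x1d\x1f//d;

    return $text;
}

print strip_irc_codes("\x0304,01red\x03 \x02bold\x0f"), "\n";
</code></pre>
<p>For real bots, the helper functions shipped in POE::Component::IRC::Common are
the better choice, since they are maintained alongside the rest of the library.</p>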
<h3 id="new-plugins">New plugins</h3>
<p>I wrote 5 additional core plugins:</p>
<ul>
<li>First of all, the <strong>Logger</strong> plugin. It logs channel/private/DCC chat activity to files on disk like normal IRC clients do.</li>
<li>Then there&rsquo;s <strong>AutoJoin</strong>, which takes care of keeping you on your favorite channels, whatever happens.</li>
<li><strong>NickServID</strong> deals with identifying your user to NickServ.</li>
<li>A <strong>CycleEmpty</strong> plugin which reclaims ops on channels that become empty.</li>
<li><strong>BotCommand</strong>, which allows you to register commands that your bot handles, and get back an appropriate event when one is issued.</li>
</ul>
<h3 id="testing">Testing</h3>
<p>The test suite has been reorganized, many tests improved and more added. The
test coverage (as reported by Devel::Cover) has increased from 40% (version
5.48) to 61% (version 6.00).</p>
<h3 id="refactoring">Refactoring</h3>
<p>Much refactoring was done. The coding and indenting style has also been made
consistent across the project, and many spotty coding practices have been
eliminated (thanks, Perl::Critic).</p>
<p>POE::Filter::CTCP was merged with POE::Filter::IRC::Compat, and the former was
removed. DCC support has been moved into its own plugin, and the plugin system
itself has been ripped out in favor of POE::Component::Pluggable (which is
based on the aforementioned plugin system).</p>
<p>Using the project&rsquo;s current Perl::Critic parameters, version 6.00 has zero
policy violations in 11,791 lines of code, compared to version 5.48&rsquo;s 242
violations in 10,634 lines of code. The average
<a href="http://en.wikipedia.org/wiki/Cyclomatic_complexity">McCabe</a> score of
subroutines also dropped from 4.21 to 3.45.</p>
<h3 id="documentation">Documentation</h3>
<p>Last but not least, the Pod docs have been improved. Errors have been fixed,
much more formatting and linking has been added for easier reading and
browsing, consistency has been improved, and many sections have been expanded.</p>
<p>I also added a
<a href="http://search.cpan.org/perldoc?POE::Component::IRC::Cookbook">cookbook</a> with
a few recipes showing off some of the things one can do with
POE::Component::IRC.</p>
<h3 id="credits">Credits</h3>
<p>Thanks to all the users who provided feedback, bug reports and patches. You
helped make this happen. I also couldn&rsquo;t have done many of these things
without the help of Chris &lsquo;BinGOs&rsquo; Williams, the senior maintainer of
POE::Component::IRC.</p>
<p>Now go write some IRC bots (or clients)!</p>
<p><em>Notes:</em></p>
<ol>
<li>It&rsquo;s actually quite an insignificant release. Historically,
POE::Component::IRC versions have always passed the whole-number boundaries
naturally as part of a regular &ldquo;bump the version number up by 0.02 for the
next release&rdquo; process.</li>
</ol>
Spicing up IRC with MegaHAL
http://blog.nix.is/spicing-up-irc-with-megahal
Fri, 29 Aug 2008 04:45:00 +0000
<p>I released <a href="http://search.cpan.org/perldoc?POE::Component::IRC::Plugin::MegaHAL">POE::Component::IRC::Plugin::MegaHAL</a>
the other day, a <a href="http://search.cpan.org/perldoc?POE::Component::IRC">POE::Component::IRC</a>
plugin for the esteemed and humorous conversation simulator known as
<a href="http://megahal.alioth.debian.org/">MegaHAL</a>. This makes it very easy to add a
MegaHAL brain to an IRC bot. I wrote one and fed him IRC logs of a channel
that I frequent, then let him loose online (he&rsquo;s a &ldquo;he&rdquo; because the IRC logs
primarily include ramblings from male users). Some examples of his brilliance
follow.
</p>
<pre><code>&lt;failo&gt; Or maybe it's about sexual liberation and not perl &amp; python.
&lt;failo&gt; Yes, printing passport photos at home is surely a sign of a flock of aids infected prisoners with big sloppy dicks.
&lt;failo&gt; I heard failo is a fact of evolution that cannot be shown because it looks like he knows what electricity is
&lt;failo&gt; Fucking get your cock out and jerk each other off in proportion to how much dexter's wedding dinner is going your way
&lt;failo&gt; Sir you had me at &quot;balls.&quot;
&lt;failo&gt; Every porn dvd has a &quot;grill setting&quot;
&lt;failo&gt; So only 2% of americans also believe that the government is composed of thong-wearing oiled-up muscle boys.
&lt;failo&gt; Increased struggling from the bees made honey in the darkened cubicle, waiting for another musky man-pole to probe its way through the light to go mushroom hunting with me
&lt;failo&gt; I bet all drug users keep a human skull on their myspace page.
&lt;failo&gt; Earle: everybody's gay for a prison system in which millions of our descendants murder each other in prison.
&lt;failo&gt; Either you're gay or associated with all drugs.
&lt;failo&gt; The war on polarbears is going to cum buckets into my gaping /dev/null.
</code></pre>
<p>I&rsquo;m afraid these results say more about the quality of conversation in this
channel than they do about anything else&hellip;</p>
IRC and character encoding
http://blog.nix.is/irc-and-character-encoding
Thu, 01 May 2008 01:44:00 +0000
<p>A while ago, I wrote an IRC logger for
<a href="http://search.cpan.org/dist/POE-Component-IRC/">POE::Component::IRC</a>, which
is an IRC client module for Perl. The main challenge I faced was the issue of
character encodings. Since IRC is rife with clients that use different
encodings, messages must be reliably decoded before they are written to a
file.
</p>
<p>You see, <a href="http://www.faqs.org/rfcs/rfc1459.html">RFC 1459</a>, the standards
document describing the IRC protocol, does not regulate the use of character
encodings:</p>
<pre><code>2.2 Character codes
No specific character set is specified. The protocol is based on a
set of codes which are composed of eight (8) bits, making up an
octet. Each message may be composed of any number of these octets;
however, some octet values are used for control codes which act as
message delimiters.
Regardless of being an 8-bit protocol, the delimiters and keywords
are such that protocol is mostly usable from USASCII terminal and a
telnet connection.
</code></pre>
<p>ASCII uses only the lower 7 bits of each octet. So, from the looks of it, you
can only rely on octets in the 7-bit range representing ASCII characters, the
interpretation of any octet with the eighth bit set being anyone&rsquo;s guess. That&rsquo;s bad.</p>
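<p>As a small illustration of the problem, the sketch below decodes one and the
same octet under three legacy encodings using the core Encode module. The
choice of encodings and the helper name <code>code_point_in</code> are arbitrary
and only serve the example.</p>
<pre><code>use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: which Unicode code point does this single
# octet stand for when read in the given encoding?
sub code_point_in {
    my ($encoding, $octet) = @_;
    return ord(decode($encoding, $octet));
}

# One byte with the eighth bit set, three different readings.
for my $encoding (qw(cp1252 iso-8859-7 koi8-r)) {
    printf "%-10s U+%04X\n", $encoding, code_point_in($encoding, "\xE9");
}
</code></pre>
<p>Each encoding yields a different character for the same byte, so a logger that
picks the wrong one writes garbage without any error being raised.</p>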
<p>For most of IRC&rsquo;s history, the most popular IRC client has been mIRC. Until
recently, mIRC decoded incoming messages using the ANSI code page that was
currently being used on the user&rsquo;s Windows system. This meant that whenever
mIRC users wanted to communicate using anything other than ASCII characters,
they&rsquo;d better be using the same code page. In later versions, mIRC decodes
incoming messages as UTF-8 if they look UTF-8 encoded, or code page 1252 (used
by most Westerners). As for <em>how</em> it does this, I cannot know since mIRC is
closed-source.</p>
<p>The open-source client irssi handles the situation similarly. It uses GLib&rsquo;s
<a href="http://library.gnome.org/devel/glib/2.16/glib-Unicode-Manipulation.html#g-utf8-validate">g_utf8_validate()</a>
function to check if the incoming message is UTF-8 encoded; if it isn&rsquo;t, irssi falls
back to CP1252 by default. As for XChat, it uses the same GLib function, but
if it determines that the message is not UTF-8, XChat decodes the message in a
rather novel way. Here is an excerpt from its
<a href="http://xchat.cvs.sourceforge.net/xchat/xchat2/src/common/text.c?view=markup"><code>src/common/text.c</code></a>:</p>
<div class="highlight" style="background: #f8f8f8"><pre style="line-height: 125%"><span style="color: #8f5902; font-style: italic">/* converts a CP1252/ISO-8859-1(5) hybrid to UTF-8 */</span>
<span style="color: #8f5902; font-style: italic">/* Features: 1. It never fails, all 00-FF chars are converted to valid UTF-8 */</span>
<span style="color: #8f5902; font-style: italic">/* 2. Uses CP1252 in the range 80-9f because ISO doesn&#39;t have any- */</span>
<span style="color: #8f5902; font-style: italic">/* thing useful in this range and it helps us receive from mIRC */</span>
<span style="color: #8f5902; font-style: italic">/* 3. The five undefined chars in CP1252 80-9f are replaced with */</span>
<span style="color: #8f5902; font-style: italic">/* ISO-8859-15 control codes. */</span>
<span style="color: #8f5902; font-style: italic">/* 4. Handles 0xa4 as a Euro symbol ala ISO-8859-15. */</span>
<span style="color: #8f5902; font-style: italic">/* 5. Uses ISO-8859-1 (which matches CP1252) for everything else. */</span>
<span style="color: #8f5902; font-style: italic">/* 6. This routine measured 3x faster than g_convert :) */</span>
</pre></div>
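<p>The mapping described in that comment can be sketched in pure Perl with a
256-entry lookup table. This is one reading of the comment, not a port of
XChat&rsquo;s C code; the name <code>hybrid_decode</code> is made up, and the function
assumes its input is a string of raw octets. It returns a Perl character
string rather than UTF-8 bytes.</p>
<pre><code>use strict;
use warnings;

# Start from ISO-8859-1, where byte N is simply code point N.
my @TABLE = (0 .. 255);

# Bytes 0x80-0x9F follow CP1252. The five positions CP1252 leaves
# undefined (0x81, 0x8D, 0x8F, 0x90, 0x9D) keep their control codes.
@TABLE[0x80 .. 0x9F] = (
    0x20AC, 0x0081, 0x201A, 0x0192, 0x201E, 0x2026, 0x2020, 0x2021,
    0x02C6, 0x2030, 0x0160, 0x2039, 0x0152, 0x008D, 0x017D, 0x008F,
    0x0090, 0x2018, 0x2019, 0x201C, 0x201D, 0x2022, 0x2013, 0x2014,
    0x02DC, 0x2122, 0x0161, 0x203A, 0x0153, 0x009D, 0x017E, 0x0178,
);

# 0xA4 is treated as the Euro sign, as in ISO-8859-15.
$TABLE[0xA4] = 0x20AC;

# Hypothetical helper: every possible byte maps to a character,
# so this conversion cannot fail.
sub hybrid_decode {
    my ($octets) = @_;
    return join '', map { chr $TABLE[$_] } unpack 'C*', $octets;
}
</code></pre>
<p>Because every one of the 256 byte values has an entry, the function never has
to report an error, which matches the first feature listed in the comment.</p>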
<p>How would I handle this in Perl? I don&rsquo;t want to depend on GLib, and I don&rsquo;t
want to write any C code (requiring the user to have a C compiler). At first I
tried using <a href="http://search.cpan.org/dist/Encode-Detect/">Encode::Detect</a>, but
there are two problems with it. It&rsquo;s an extra dependency, and more
importantly, it works heuristically, deciding which character set is being
used based on the number of occurrences of each character code. As such, it&rsquo;s
only reliable when large amounts of data are involved, such as a whole web page,
which is what the code was written for. Then I learned of
<a href="http://perldoc.perl.org/Encode/Guess.html">Encode::Guess</a>, which has been included
with Perl since version 5.8.0. The following decodes <code>$line</code> as UTF-8 if
Encode::Guess is sure that it&rsquo;s UTF-8. Otherwise it decodes it as CP1252.</p>
<div class="highlight" style="background: #f8f8f8"><pre style="line-height: 125%"><span style="color: #204a87; font-weight: bold">use</span> <span style="color: #000000">Encode</span> <span style="color: #4e9a06">qw(decode)</span><span style="color: #000000; font-weight: bold">;</span>
<span style="color: #204a87; font-weight: bold">use</span> <span style="color: #000000">Encode::Guess</span><span style="color: #000000; font-weight: bold">;</span>
<span style="color: #204a87; font-weight: bold">my</span> <span style="color: #000000">$utf8</span> <span style="color: #ce5c00; font-weight: bold">=</span> <span style="color: #000000">guess_encoding</span><span style="color: #000000; font-weight: bold">(</span><span style="color: #000000">$line</span><span style="color: #000000; font-weight: bold">,</span> <span style="color: #4e9a06">&#39;utf8&#39;</span><span style="color: #000000; font-weight: bold">);</span>
<span style="color: #000000">$line</span> <span style="color: #ce5c00; font-weight: bold">=</span> <span style="color: #204a87">ref</span> <span style="color: #000000">$utf8</span> <span style="color: #000000; font-weight: bold">?</span> <span style="color: #000000">decode</span><span style="color: #000000; font-weight: bold">(</span><span style="color: #4e9a06">&#39;utf8&#39;</span><span style="color: #000000; font-weight: bold">,</span> <span style="color: #000000">$line</span><span style="color: #000000; font-weight: bold">)</span> <span style="color: #000000; font-weight: bold">:</span> <span style="color: #000000">decode</span><span style="color: #000000; font-weight: bold">(</span><span style="color: #4e9a06">&#39;cp1252&#39;</span><span style="color: #000000; font-weight: bold">,</span> <span style="color: #000000">$line</span><span style="color: #000000; font-weight: bold">);</span>
</pre></div>
<p>So far this method has worked flawlessly for me on channels with mixed
encodings. However, I don&rsquo;t know exactly how Encode::Guess works, so I&rsquo;m not
as confident in this method as I could be. Any feedback on this issue would be
quite welcome.</p>
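<p>For comparison, a common alternative that involves no guessing is to attempt
a strict UTF-8 decode and fall back to CP1252 only when that fails. The sketch
below is an illustration under that assumption, not what the logger described
above does; <code>decode_irc_line</code> is a made-up name.</p>
<pre><code>use strict;
use warnings;
use Encode qw(decode FB_CROAK LEAVE_SRC);

# Hypothetical helper: takes raw octets from the wire and returns
# a Perl character string.
sub decode_irc_line {
    my ($octets) = @_;

    # FB_CROAK makes decode() die on malformed UTF-8 instead of
    # substituting replacement characters. LEAVE_SRC keeps $octets
    # intact so the fallback below still sees the original bytes.
    my $text = eval { decode('UTF-8', $octets, FB_CROAK | LEAVE_SRC) };
    return $text if defined $text;

    # Not valid UTF-8, so assume CP1252.
    return decode('cp1252', $octets);
}
</code></pre>
<p>Short CP1252 messages that happen to form valid UTF-8 sequences would still be
misread by this approach, but such byte combinations are rare in ordinary text.</p>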