2014-08-13T03:05:16+09:00 http://rurounijones.github.com/ Octopress
2014-08-11T12:00:00+09:00 http://rurounijones.github.com/blog/2014/08/11/bayesian-filter-performance-in-ruby

A while ago I was in an online discussion regarding “Correct Tools for
the Jobs” and how Ruby was not a good language for developing, say, an
operating system.

Someone commented that it was not a good idea to use Ruby for
Bayesian filtering of things like forum posts. (Bayesian filtering is
one of the primary algorithms used to determine whether an email or forum
post is spam.)

They made some performance claims which seemed exceedingly slow, but
they also made some statements that made me suspect their application design
was not all that it could be, so I thought I would see whether Ruby was
really the guilty party.

This post is not primarily about Bayesian filtering but about
performance testing; it is probably most helpful to beginner and
intermediate Ruby developers.

TL;DR

In this post I will write about:

Investigating whether Ruby can Bayesian-filter 1,000 posts per second

Finding a Bayesian filter

Finding an appropriate data-set

Writing some code to allow us to benchmark things

Identifying performance bottlenecks using the stackprof gem and
fixing them.

Great success!

“I learned something today”

If you are here for the performance analysis with Stackprof, then
you can skip down to “The Analysis” section below.

The Claim

So first we need to identify the claims made against Ruby; these will
determine our goals.

The poster made the following claims (Paraphrased slightly):

“Implemented a Bayesian Filter to help deal with forum spam”

“Tested under load”

“Sometimes took up to 15 seconds due to all the Array structures”

“The same implementation in C never exceeded half a second”

Now, if you are a Ruby person, the third claim might have been a red
flag for you (it was for me). “All the Array Structures”… When
it comes to searching, Arrays are slow, dog slow. A quick look at
our favourite Big-O Cheat Sheet
site shows that searching an array is an O(n) operation. This means
that search time grows linearly with the size of the array.

The other thing is that Arrays shouldn’t really feature that much in
Bayesian filtering as the algorithm doesn’t really care if a word
appears once or a hundred times.

These were the things that made me wonder whether the poster was right
to blame Ruby for the slowness. I also happened to know that there
are some Bayesian filter gems in the Ruby ecosystem, which wouldn’t
make much sense if Ruby were as slow as the poster claimed.

When I suggested that the poster’s design might be at fault, they
didn’t take it very well. They set a challenge: create a userland
Ruby application that processes 1,000 unique, variable-length posts a
second. Their training data set was 100,000 posts.

The Filter

Right, sports fans! The first thing we need to do is write a Bayesian
filter in Ruby… Bwahahahahah, I make myself laugh. I am a great
believer in standing on the shoulders of giants, so instead of writing
a filter, let’s find some likely Ruby gems.

Some searching on GitHub and “The Ruby Toolbox” led me to the
Classifier gem; however,
the last commit was a bit old and there was
an open issue calling Classifier’s accuracy into question which, at the time,
had not been fixed.

Now, at this point I should confess that I only have an overview-level
knowledge of how Bayesian filtering works, and I was hoping
to spare my brain-power by not learning the nitty-gritty mathematical
details (one of the reasons I was looking for a gem in the first place).

However, the poster of that issue wrote their own gem instead, called
Ankusa, which looks pretty good.
So let’s go with that.
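Since I only have an overview-level picture of Bayesian filtering myself, here is a minimal sketch of the idea in plain Ruby, just to show what a filter like Ankusa is doing conceptually. It is not Ankusa’s implementation: the class NaiveBayesSketch, its train/classify methods, the crude tokenizer and the add-one (Laplace) smoothing are all assumptions made for illustration.

```ruby
# Hypothetical minimal naive Bayes classifier (NOT Ankusa's implementation).
# It counts words per category and picks the category with the highest
# log-probability, using add-one (Laplace) smoothing for unseen words.
class NaiveBayesSketch
  def initialize
    @word_counts = Hash.new { |hash, key| hash[key] = Hash.new(0) } # category => {word => count}
    @total_words = Hash.new(0) # category => number of words seen
    @doc_counts  = Hash.new(0) # category => number of training documents
    @vocabulary  = {}          # every distinct word seen, used as a set
  end

  def train(category, text)
    @doc_counts[category] += 1
    tokenize(text).each do |word|
      @word_counts[category][word] += 1
      @total_words[category] += 1
      @vocabulary[word] = true
    end
  end

  # Returns the most probable category, or nil if nothing has been trained.
  def classify(text)
    words = tokenize(text)
    total_docs = @doc_counts.values.sum
    @doc_counts.keys.max_by do |category|
      # log P(category) + sum over words of log P(word | category)
      prior = Math.log(@doc_counts[category].to_f / total_docs)
      denominator = @total_words[category] + @vocabulary.size
      words.sum(prior) { |word| Math.log((@word_counts[category][word] + 1).to_f / denominator) }
    end
  end

  private

  # Deliberately crude tokenizer: lowercase runs of ASCII letters and digits.
  def tokenize(text)
    text.downcase.scan(/[a-z0-9]+/)
  end
end

filter = NaiveBayesSketch.new
filter.train(:spam, "cheap pills buy now limited offer")
filter.train(:ham, "meeting notes for the quarterly report")
filter.classify("buy cheap pills") # => :spam
```

Working with summed logarithms rather than multiplied probabilities avoids the product of many tiny numbers underflowing to zero on long posts.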

The Dataset

So now we have our filter, we just need a data-set. The original poster
said that they used a training data set of 100,000 posts. I am going
to assume a roughly 50/50 split here and say they had 50,000 known
good posts (from now on called ‘ham’) and 50,000 spam posts.

After some DuckDuckGo-ing I found some
likely sources of curated spam/ham data-sets. It looks like at least
one good thing came out of the
Enron collapse: their emails were made public as part of the discovery process.

Some researchers then put them up for
download. While the numbers aren’t
quite up to the 100,000 posts the original poster mentioned, they are
in the same ballpark.

19,089 ham emails

32,989 spam emails

For a grand-total of around 50,000 emails. Due to the way the filter
is written I don’t see there being much difference in speed between
50,000 and 100,000 items in the training set.

NOTE: If you are reading this and know of a bigger training set that
is freely available I would love to hear about it.

Wow, 750 odd words into this article and we haven’t even gotten to the
beginning of the code stuff yet.

Assumptions

Before I continue with the actual code stuff I also have
to make some assumptions about how the filter was actually used (at least,
this is probably how I would have done it). I hope you will agree that
these are logical assumptions:

A dedicated Bayesian machine: If you are processing 1,000 forum
posts per second, I am going to assume that your operation is large enough
to warrant a dedicated Bayesian filtering server (probably behind
a REST API or message queue called by your front-end machines).

No online updating of the training data-set: I will assume the training
set is batch-updated at some point (Maybe a daily / weekly thing).

The training data-set is held in memory: After implementing the
training code using the Enron data-set I found that the training data only
took up about 17MB of disk-space. Since this is pretty small and we have assumed
a dedicated server (see Assumption #1), we will hold it in
memory rather than in a database, for maximum performance.

The Code

Before we can benchmark Ankusa we need some sort of test-runner code.
To that end I present to you
“Don’t Bayes Me Bro”
(DBMB: well, it made me chuckle when I named it, and I like the idea of
spammers being tasered).

All code samples from here-on-out will come from either Ankusa or DBMB.

WARNING: DBMB code is messy and not TDD’d and likely to make your
“Beautiful Code” gland rupture.

Training

DBMB has a “training” folder into which we dumped the Enron emails from
earlier. All this code does is recursively read files from
the “spam” and “ham” sub-directories and train Ankusa with them.

We then save the data to a file called “corpus”. Note that since we
are doing an operation that uses file I/O, we can speed things up by
creating one thread for “spam” and one for “ham”. The GIL gets in the
way a bit, but we still get a speed-up in this case.
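As a rough sketch of that training step (not DBMB’s actual code, and with plain word counting standing in for Ankusa’s training call), the following reads each category in its own thread, keeps only the email body (assumed here to be everything after the first blank line), and persists the result with Marshal so it can later be loaded straight into memory. The helper names email_body, count_words, train_corpus and load_corpus are hypothetical.

```ruby
# Hypothetical sketch of DBMB-style training (NOT DBMB's actual code).

# Drop the headers: assume they end at the first blank line.
def email_body(raw)
  raw.split(/\r?\n\r?\n/, 2)[1].to_s
end

# Recursively read every file under dir and count lowercase word tokens.
def count_words(dir)
  counts = Hash.new(0)
  Dir.glob(File.join(dir, "**", "*")).each do |path|
    next unless File.file?(path)
    # Old mail archives contain odd encodings; scrub invalid bytes away.
    body = email_body(File.read(path, encoding: "UTF-8").scrub(""))
    body.downcase.scan(/[a-z0-9]+/) { |word| counts[word] += 1 }
  end
  counts
end

# One thread per category; each thread fills its own Hash, so no mutable
# state is shared between threads while they run.
def train_corpus(training_dir, corpus_path)
  threads = %w[spam ham].map do |category|
    Thread.new { [category.to_sym, count_words(File.join(training_dir, category))] }
  end
  corpus = threads.map(&:value).to_h
  # Persist with Marshal so the benchmark can load it into memory quickly.
  File.binwrite(corpus_path, Marshal.dump(corpus))
  corpus
end

def load_corpus(corpus_path)
  Marshal.load(File.binread(corpus_path))
end
```

The per-thread results are only merged after Thread#value returns, which keeps the sketch free of locks. Because the work is mostly file I/O, the two threads can overlap even under the GIL.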

Since the original poster was talking about “Forum Threads” I decided
to just parse the email body rather than everything including headers
etc.

This is kicked off from a Rake task.

Benchmarking

Benchmarking is done here
and is started from another Rake task, which allows us to test 1,000
to 30,000 emails taken from the same data-set the corpus
was built from.

To run the benchmark we insert the required number of email bodies
into a queue. We then pop from the queue and run the filter. Why a queue?
Well, I was thinking ahead to when multi-threading might make
an appearance.
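A minimal sketch of such a queue-based runner, assuming the classifier is passed in as a block (run_benchmark and the shape of its return value are hypothetical, not DBMB’s API). It uses Ruby’s thread-safe Queue and closes it once filled, so pop returns nil when the work runs out instead of blocking; that same property is what would let several worker threads share the queue later.

```ruby
# Hypothetical benchmark runner in the spirit of DBMB (NOT its actual code).
# All bodies are queued up front; only the pop-and-classify loop is timed.
def run_benchmark(bodies, &classify)
  queue = Queue.new
  bodies.each { |body| queue << body }
  queue.close # pop now returns nil once the queue is drained

  results = []
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  while (body = queue.pop)
    results << classify.call(body)
  end
  elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started

  { results: results, seconds: elapsed, jobs_per_second: bodies.size / elapsed }
end
```

Usage would look something like `run_benchmark(bodies) { |body| classifier.classify(body) }`, where `classifier` stands for whatever filter object is being measured. A monotonic clock is used so that wall-clock adjustments cannot distort the timing.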

A few things to note. To avoid skewing the benchmarks we pre-initialize
two variables which are otherwise lazy-loaded by Ankusa.

To keep things deterministic we save our queue data to a file for
future reading; this also saves time over parsing thousands of emails
each time. We also skip emails with VERY short bodies (less than 100
characters) to avoid ending up with a data-set whose average
body length is overly short.
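A sketch of that caching and filtering, under the assumption that the expensive email parsing is supplied as a block; benchmark_bodies and MIN_BODY_LENGTH are hypothetical names. The block is only called on a cache miss, so every later run replays exactly the same bodies in the same order.

```ruby
MIN_BODY_LENGTH = 100 # bodies shorter than this are skipped

# Hypothetical helper (NOT DBMB's code): build the benchmark bodies once,
# drop very short ones, and cache the result with Marshal.
def benchmark_bodies(cache_path, count)
  return Marshal.load(File.binread(cache_path)) if File.exist?(cache_path)

  # Only reached on a cache miss: the block does the slow email parsing.
  bodies = yield.select { |body| body.length >= MIN_BODY_LENGTH }.first(count)
  File.binwrite(cache_path, Marshal.dump(bodies))
  bodies
end
```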

The Analysis

Ok, here it is, the moment we have all been waiting for. After lots
of waffle about setting this up, it is time for the main event, the
actual benchmarking, the thing you are actually reading this blog post
about probably, the thing that is at the end of this overly long sentence
which is driving you insane!

Note: In the interest of brevity (In a post this long?! HAH) I have
removed a lot of unnecessary output from the commands.

Here I have told stackprof to identify the methods which were taking up
the most runtime and limit the output to the top 4 methods. And holy moly!

From reading line 17 we can tell that the vast VAST VAST majority of
the runtime is taken up with a single method! This is good and bad.
Good, in that if we can optimise this then it will be a huge win, bad in
that if we cannot, we are screwed. So let us look at the offending code.

# word should be only alphanum chars at this point
def self.valid_word?(word)
  not(Ankusa::STOPWORDS.include?(word) || word.length < 3 || word.numeric?)
end

I have removed all the words from STOPWORDS here, but let me tell you it
had 544 entries. So what we have here is a 544-entry Array that is
searched for every… SINGLE… WORD! Remember what we said about
searching arrays? O(n) average complexity: as the size of the array
increases, so does the time it takes to search. We can show this using
a micro-benchmark.
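Here is one way such a micro-benchmark might look (the original benchmark code isn’t reproduced here, so the helper array_lookup_times and the generated “stopword” strings are assumptions). It times lookups of a word that is never present, the worst case for Array#include?, against arrays of increasing size, including 544 entries to match STOPWORDS.

```ruby
require "benchmark"

# Time repeated failed lookups against word arrays of increasing size.
# The word searched for is never present, so Array#include? has to compare
# it with every element; the time should grow roughly linearly with size.
def array_lookup_times(sizes, lookups: 10_000)
  sizes.to_h do |size|
    words = Array.new(size) { |i| "stopword#{i}" } # hypothetical stand-in words
    seconds = Benchmark.realtime do
      lookups.times { words.include?("notastopword") }
    end
    [size, seconds]
  end
end

array_lookup_times([10, 100, 544, 1_000]).each do |size, seconds|
  puts format("%5d entries: %.4fs", size, seconds)
end
```

Absolute numbers depend on the machine; the point is the trend as the array grows.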

You can see how the time taken to run .include? increases with the
size of the array.

So what are we to doooooo!? Well, when we have an array for the
sole purpose of calling include? on it, we do not care about
duplicate values. Therefore we can use another, underused Ruby
data structure.

Tadaaaa! Set
to the rescue! Sets are similar to Arrays with a few key differences.

But the big one, the BIG one, is that, unlike Arrays, Sets use the
same “Hash Table” data-structure as Ruby Hashes to store their data.
What does this mean? Well, another visit to the
Big-O Cheat Sheet tells us that
Hash Tables are much MUCH better for searching, with an average complexity
of O(1) and a worst case of O(n).

This means that, even as the size increases the search-time remains
relatively constant. Let us test this again with our micro-benchmark.
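A comparable sketch for Set: the same 544 hypothetical stopwords are stored once as an Array and once as a Set, and the same missing word is looked up in both. The constant names are made up for the example; for Ankusa the corresponding change would be to build STOPWORDS as a Set instead of an Array.

```ruby
require "set"
require "benchmark"

# Hypothetical stand-ins for Ankusa::STOPWORDS (the real list has 544 words).
STOPWORD_LIST = Array.new(544) { |i| "stopword#{i}" }.freeze
STOPWORD_SET  = Set.new(STOPWORD_LIST).freeze

LOOKUPS = 10_000
MISSING_WORD = "notastopword" # never present, so the Array scans every entry

array_time = Benchmark.realtime { LOOKUPS.times { STOPWORD_LIST.include?(MISSING_WORD) } }
set_time   = Benchmark.realtime { LOOKUPS.times { STOPWORD_SET.include?(MISSING_WORD) } }

puts format("Array#include?: %.4fs", array_time)
puts format("Set#include?:   %.4fs", set_time)
```

Because call sites only ever use include?, wrapping the existing list in Set.new(...) leaves every caller unchanged while the lookup becomes a hash lookup.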

So the next most expensive method is actually a method on String which checks
whether the string is numeric, and there is some exception rescuing going on
in there, which is why it takes up the top two spots. Now, String is a
Ruby core class and it does not have a numeric? method by default, so
this looks like something Ankusa has monkey-patched in.

A quick look through the source code shows this is the case (in the
appropriately named extensions.rb).

So what is wrong with this method? Well, nothing is wrong with it per se,
it is one of the standard ways to see if a String is numeric or not.

The problem is that it is slow, and it is even slower if the string
is not numeric, because then it raises an exception which has to be rescued
(sloooooooooooow). This is one of the reasons you see people saying
“Don’t use exceptions for flow-control!”.

The catch is that we cannot simply swap it out, because every other way
of checking whether a string is numeric has edge-cases where it fails.
This is the only bullet-proof way of making sure whether a String is numeric
or not.

But do we need bullet-proofness? Let’s have a nose around and see if there
is any other option.

If we go up the call chain a bit, we can see that each word is processed
in add_text:

def add_text(text)
  if text.instance_of? Array
    text.each { |t| add_text t }
  else
    # replace dashes with spaces, then get rid of non-word/non-space characters,
    # then split by space to get words
    words = TextHash.atomize text
    words.each { |word| add_word(word) if TextHash.valid_word?(word) }
  end
  self
end

To create “words” it first atomises any text passed to it. The comment
looks very interesting: “replace dashes with spaces”… well, that would
remove negative numbers for starters. Let’s have a look at the atomize
method.

Hmm, this looks interesting: this code basically replaces all dashes with
spaces and then replaces anything that is not a word or whitespace character
with a space. Let’s assume our regexp knowledge is fuzzy and we are not sure
what a “word” (\w) character is; we can fire up IRB and do some testing:

$ irb
2.1.2 :001 > # Include the monkey-patch required by atomise
2.1.2 :002 > class String
2.1.2 :003?>   def to_ascii
2.1.2 :004?>     encode("UTF-8", :invalid => :replace, :undef => :replace, :replace => "").force_encoding('UTF-8') rescue ""
2.1.2 :005?>   end
2.1.2 :006?> end
 => :to_ascii
2.1.2 :007 >
2.1.2 :008 > def atomize(text)
2.1.2 :009?>   text.downcase.to_ascii.tr('-', ' ').gsub(/[^\w\s]/, " ").split
2.1.2 :010?> end
 => :atomize
2.1.2 :011 >
2.1.2 :012 > atomize("1234")
 => ["1234"]
2.1.2 :013 > atomize("-1234")
 => ["1234"]
2.1.2 :014 > atomize("-1234.56")
 => ["1234", "56"]

Well, with some experimentation it looks like any kind of number will
always be split into a bunch of integers. This means that we don’t
really need the edge-case surety of Float(string). Let’s see how much
faster a simple regex is.
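Here is a rough sketch of that comparison. The exception-based check mirrors the Float(string) approach discussed above, and the regex /\A\d+\z/ (“the whole token is digits”) is my assumption of what a simple regex could look like; neither method is copied from Ankusa or the PR, and both method names are hypothetical.

```ruby
require "benchmark"

# Exception-based check: Float() raises ArgumentError for non-numeric
# strings, and rescuing that exception is the expensive part.
def numeric_via_float?(str)
  Float(str)
  true
rescue ArgumentError
  false
end

# Regex-based check. Since atomize has already split every number into runs
# of digits, "all digits" is the only numeric shape left to detect.
def numeric_via_regex?(str)
  /\A\d+\z/.match?(str)
end

# Half numeric, half non-numeric tokens, like the output of atomize.
tokens = %w[1234 56 spam offer 2014 viagra] * 10_000
float_time = Benchmark.realtime { tokens.each { |t| numeric_via_float?(t) } }
regex_time = Benchmark.realtime { tokens.each { |t| numeric_via_regex?(t) } }
puts format("Float(): %.3fs  regex: %.3fs", float_time, regex_time)
```

Note that the regex treats a fragment like "05e16" as a word rather than a number, which is the behaviour argued for in the next paragraph.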

Eagle-eye mathematicians might notice that we DO have an edge-case
that we are no longer covering in that numbers like 1.05e16 will
end up as ["1","05e16"]. However by the time we check if a String
is numeric this number has already been mashed up and checking for
[\d.]+(?:e?\d+) could result in us ignoring words that we would
prefer to check. All in all I think it is safer to not ignore a
string like “1e05”.

Others may cringe at having such a method as a monkey-patch on
String but do not worry, in the real PR I also moved it out of there
as evidenced here

Great Success

We are now close enough to 1000 jobs per second that I am going to call time on this
post. There are other optimisations we could probably do but we have already
done the easy stuff as evidenced by stackprof once again.

As we can see from this (full) output there is no horrendously slow
bottleneck that we can fix for a big win.

(A confession here: I got the “Need for Speed” bug at this point and did
some more tweaking that got us to about 970-980 jobs per second; you
can see the full list of changes here.)

Where Next?

So we had a challenge and I think we met it. We didn’t do much in the
way of long-running tests, and our setup might have differed from theirs, but
I think I showed that Ruby can produce respectable results. This means that
Ruby is perfect, right?

Well no, while I do believe the original poster was wrong to blame
ruby for his application’s slowness there are a few issues here.

First, using Ankusa in this manner is massively CPU-bound, and we are
stuck using a single thread. This operation would benefit hugely from
effective multi-threading, but MRI’s GIL prevents that since we are
not doing much in the way of I/O.

JRuby to the rescue! I did actually try testing on JRuby. Ankusa
actually uses a C Extension for the word-stemming and there is a JRuby
drop-in equivalent but when I ran the tests on JRuby it was
horrendously slow (Something like 40 times slower) and at that point
I was not really up for trying to figure out why.

There is always Rubinius. I have never used it, to be honest, but it
does sound ideal for this case; maybe I will write a part 2 (RBX Redux!).

I Learned Something Today

What I hope I have demonstrated is that improving performance is not
the black magic beginners might think it is. There are tools that make
it dead simple, so I highly recommend you give it a go.

2013-03-04T12:40:00+09:00 http://rurounijones.github.com/blog/2013/03/04/resources-for-ruby-developers

As someone (probably a really famous guy whose name I cannot remember) once said,
becoming a software developer means having to learn new things for the rest of
your life.

This is just as true for Ruby as for any other language so here are the
resources I like to use to keep up to date.

These are in addition to things like API documentation sites and the Ruby on Rails Guides.

Made by Ryan Bates, Railscasts is a great site with webcasts exploring new and
interesting things happening in the Ruby world. As the name suggests it is more
focused on web development and Ruby on Rails, but it also explores technology
that can be useful to other types of Ruby developers.

There are free and paid-for episodes available (the paid ones for US$9 per month) and every
episode has source code available on GitHub. Most episodes also have full
transcripts, including code, which can be read if you are not the video-watching
type.

If you do not have money to travel the world attending conferences and your
company does not stump up the cash then Confreaks is a gem of a site. They
record presentations at a lot of major development conferences.

Confreaks is very popular with Ruby conferences, so you can find a wealth of
interesting talks there. (Most of the talks are about 40-60 minutes long,
which is perfect for watching over a lunch break or on the
daily commute.)

Hacker News is a (mostly) news aggregation site that has reasonably high
standards of submissions and comments. Like all aggregation sites it has plenty
of stories that do not interest me directly but there is a lot of good technical
news / startup news there.

Requires a bit more mental filtering but still worth it. Just be aware that it
is quite start-up focused which can cause a bit of an echo chamber effect. A
healthy dose of cynicism (realism?) is useful.

This is a handy little site which lets you write and test regular expressions
as you write them. It is my go-to site when I need to use regexes (which is not
that common, hence why it is such a useful site)

Other things

If you have read this far then I thank you for your attention and would like to
use it to remind you of one thing.

Everything above is optional. What you should be doing anyway is being signed up
to the security mailing lists of all the major components in your stack.

For example, if you use Ruby on Rails with PostgreSQL and Redis datastores and a
Varnish cache running on CentOS then you should be signed up to the security
mailing lists of all of these.

A Final Warning / Bit of Encouragement

I have listed a reasonable number of resources above. This list is nowhere near
exhaustive and you should be building up your own list of go-to resources. As
well as this, you need to beware of something I call “Learner’s Paralysis”.

Following all of the above sites (plus ones you find yourself) can take up a
significant chunk of your time if you let it.

Do not let it take up so much that you end up reading / learning a lot more than
doing. This is a problem I suffer from: I find learning about these things so
interesting (at a superficial level) that I don’t actually get round to doing
anything (working on side-projects, thinking about that bootstrapped business
I want to start, etc.).

Get out there and put the stuff to use rather than just thinking “Hey, I learned
something” and leaving it at that.

2013-01-28T22:00:00+09:00 http://rurounijones.github.com/blog/2013/01/28/setting-up-japanese-input-on-kubuntu-12-dot-04

Historically, setting up Japanese input on Linux has been a trial-and-error
affair with much forum searching, dark config file incantations and other
black magic.

Luckily, with Kubuntu 12.04, getting set up with Japanese input has become a very
simple process.

I am going to make a few assumptions before we get going. I am going to assume
that:

You have installed Kubuntu 12.04 or later

You installed the English version.

You have not yet tried to set up Japanese text input.

The reason for 1. is simply that I have not tested this on any earlier versions.
Feel free to give it a go and report back, but no guarantees.

The reason for 2. is again, because I have not tested it with other languages.
Again feel free to give it a go and report back, but no guarantees.

The reason for 3. is because, historically, attempting to get Japanese input
working with one method would interfere with other attempts using different
methods. If you have already tried to setup Japanese input on your install then
by all means try the below method, but if it doesn’t work I am afraid that I
cannot help you. In this case the best (albeit annoying) option is to reinstall.

Right, with these caveats out of the way, let’s proceed.

sudo apt-get install ibus-mozc gnome-icon-theme

Run this command from the konsole. It will install the ibus-mozc software,
this has worked for me without issues. I used to use some different software but
this seems to be the way to go for 12.04 and up.

gnome-icon-theme is required by ibus-mozc (yes, even on KDE) but due to a
missing dependency it is not installed so we need to specify it here.

After running the above command there is a good chance that you will need to
restart, if prompted then please do so.

After rebooting, go to your system-settings and choose the locale option
(It is at the top under the “Common Appearance and Behavior” section).

In the left-hand list select “System Languages” and then in the right hand
window choose the “Set System Language” option.

In my case I choose “English (United Kingdom)” (Rule Britannia!) and then in
the bottom right of the window set the “Keyboard input method” to “ibus”

Clicking apply will require you to enter your root password.

Now we need to start ibus. One option is to reboot but the quicker way is to
bring up the app launcher by pressing ALT+F2 then typing in “ibus”. Doing so
will bring up the “IBus Input Method Framework” option, please click this.

Clicking the above will bring a little keyboard icon to life in
your taskbar at the bottom right of the screen.

Right click on this icon and select “Preferences”. On the window that appears
select the “Input Method” tab.

Check the “Customize active input methods” checkbox. Click “Select an input
method”, click the little arrow next to the greyed-out “Japanese” text, then
click on the very orange icon with the “Mozc” text. Once that is done, click
“Add”.

The orange icon with “Japanese - Mozc” should appear in the input method
list. At this point you can close the window.

You should now be able to input Japanese! Open a text editor like Kate or a
browser. Select an area where you can enter text then use CTRL+Space, the
keyboard icon bottom right should switch to the orange icon and you can enter
Japanese. よし！

2009-10-24T18:00:00+09:00 http://rurounijones.github.com/blog/2009/10/24/textile-filtering-with-redcloth

I wanted to use RedCloth to let users of my Rails app make Textile-formatted
posts, but I wanted to restrict the input they were allowed to use. How to do
this? I thought it would be simple.

Warning: This article was imported from an old site and is therefore itself
rather old. It may not still be accurate for current versions of RedCloth.

RedCloth has a filter_html option which I thought would do the trick, but that
only filters the HTML that users are allowed to enter directly. It doesn’t filter the
HTML created by RedCloth itself when a user uses Textile tags.

So how to filter the textile tags?

First, assuming you are using Rails 2.3 or later, create the following file.
For other frameworks please use the recommended method for adding start-up
code to that framework.

config/initializers/redcloth_filter.rb

This file will be run during the rails initialization and will contain the code
we want to override (monkey-patch). Now paste the following code into the file.

ALLOWED_TAGS is a hash of the tags that you want to allow. You can take RedCloth’s
BASIC_TAGS
as a base, strip the tags you don’t want to allow from the hash and add
others if you want to.

So we have defined the tags that we want to allow. Now we need to actually do
some stripping. This is where the after_transform method comes in. RedCloth
calls it as standard after the initial transformation, so we can
override the method and tell RedCloth to run clean_html again on the HTML string
it has just created. Step by step:

RedCloth is configured with filter_html enabled

User enters string (Textile and HTML)

RedCloth strips HTML tags from the string according to the BASIC_TAGS using
clean_html method

RedCloth converts the textile tags in the string to HTML

At this point the HTML’ised string is usually returned; however we do some
overriding so that:

RedCloth strips HTML tags from the above generated HTML string according to
our ALLOWED_TAGS using the clean_html method

RedCloth returns the twice filtered HTML string.

Thinking about it, you don’t even need filter_html, since everything will be
filtered the second time around explicitly by our code. However, I feel a little
more secure stripping all the user-generated HTML cruft first using
filter_html before stripping our Textile-generated HTML ourselves.
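To illustrate the idea of a second, allow-list pass over already generated HTML without depending on RedCloth internals, here is a deliberately simple stand-in. It is a swapped-in technique, not RedCloth’s clean_html or after_transform: it escapes everything and then re-enables only attribute-less tags from a hypothetical allow-list, which means links and images are lost. The method name filter_generated_html and the tag list are assumptions.

```ruby
require "erb"

# Hypothetical allow-list. Unlike the ALLOWED_TAGS hash in the article, this is
# just a list of tag names, re-enabled only when they carry no attributes.
ALLOWED_TAGS = %w[p br b i em strong ul ol li].freeze

# Second-pass filter over already generated HTML:
# 1. escape every character that could form markup,
# 2. un-escape only exact, attribute-less tags from the allow-list.
# Anything else (script tags, tags with attributes) stays escaped and
# renders as harmless text.
def filter_generated_html(html, allowed = ALLOWED_TAGS)
  escaped = ERB::Util.html_escape(html)
  names = allowed.map { |tag| Regexp.escape(tag) }.join("|")
  pattern = %r{&lt;(/?)(#{names})(\s*/)?&gt;}i
  escaped.gsub(pattern) { "<#{$1}#{$2.downcase}#{$3 ? ' /' : ''}>" }
end

filter_generated_html("<p>Hi <b>there</b><script>x</script></p>")
# => "<p>Hi <b>there</b>&lt;script&gt;x&lt;/script&gt;</p>"
```

Escaping first and selectively un-escaping fails safe: anything the pattern does not recognise exactly ends up as visible text rather than live markup.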

Enjoy

2009-03-17T01:24:00+09:00 http://rurounijones.github.com/blog/2009/03/17/how-to-ask-for-help-on-irc

Greetings. If you have arrived at this page of your own volition and intent,
then be aware that this is an abridged, slightly more modern, IRC-specific
version of Eric S. Raymond’s “How to ask questions the smart way”,
which is excellent, but lengthy, reading.

If, on the other hand, you have been directed to this page by someone else and
quickly want to find out what this is about, read on.

How to ask for help properly

In short:

Don’t ask to ask

Give the helpers some details

Ask suitable questions

Be polite

Wait

For the purposes of this article I will introduce “Terry” and “Gonad”
(Of “Zero punctuation” fame).
Terry shall be our long suffering helper in the “Spiffy” project’s IRC channel
and Gonad shall be the person asking for help (badly).

1 - Don’t ask to ask

This one drives people up the wall and many channels have special bots that will
print out an entire spiel about not asking to ask, so let’s get it out of the
way first.

[16:00] * Gonad has joined #spiffy-help
[16:00] * Topic is "Welcome to the spiffy help channel. Latest version is 1.5"
[16:00] <Gonad> Is it ok to ask a question about Spiffy?
[16:01] * Terry sobs quietly in the corner

Let us look at the obvious first. The entire channel is dedicated to
helping people with Spiffy; this is usually hinted at in the channel name and
stated outright in the topic. Do you think they have a personal grudge against
you that is going to stop them answering? (They might later, but right
now it is a blank slate.) Of course they can help you. Just ask the question
straight out.

What Gonad should have done was:

[16:00] * Gonad has joined #spiffy-help
[16:00] * Topic is "Welcome to the spiffy help channel. Latest version is 1.5"
[16:00] <Gonad> Hi all, I am having an issue with ....

Which leads us nicely onto the next point

2 - Give the helpers some details.

Let us follow on from the previous point and continue Gonad’s sentence.

[16:00] * Gonad has joined #spiffy-help
[16:00] * Topic is "Welcome to the spiffy help channel. Latest version is 1.5"
[16:00] <Gonad> Hi all, I am getting an error page with Spiffy, what is going
wrong?
[16:03] <Terry> Wait a moment, I just need to psychically connect with your
brain so I can figure out what you are talking about. This may
cause loss of control over your bodily functions.

Moral of the story, “Helpers are not psychic”. When you post a problem we need
something to go on. Preferably some or all of the following:

The error page (Or output thereof).

A list of steps taken which generated this error page and what you are
expecting it to do instead of the error.

The version of the software you are running along with any plugins you might
have installed. If you think it is pertinent then the OS version and Database
if used.

A note about including error pages or output. Because these things are
traditionally huge, pasting them directly into the channel will make you as
well liked as a clown at a funeral. Instead use a “paste” website like
“Pastie”, “Pastebin” or
“Gist” and then paste the URL into the channel.
These sites also offer syntax highlighting which is very useful.

If the problem is complicated, or the steps very detailed, then consider posting
a summary in IRC and more detail in your linked paste.

So once more let’s look at what Gonad should have done.

[16:00] * Gonad has joined #spiffy-help
[16:00] * Topic is "Welcome to the spiffy help channel. Latest version is 1.5"
[16:00] <Gonad> Hi all, I am getting an error page with a new Spiffy 1.5
installation when I try and create a second admin user.
Steps and Details here: http://pastie.org/417957
[16:02] <Terry> Gasp, what a well formatted cry for help, I rush, RUSH to your
aid.

And they all lived happily ever after.

3 - Ask suitable questions.

I hang around in some programming channels and every now and then we get a question illustrated by
the following:

[16:00] * Gonad has joined #programming
[16:00] * Topic is "Welcome to the New Hotness #programming channel"
[16:02] <Gonad> Hi guys, how do I build a search engine / forum / fission reactor
cooling loop fluid dynamics modelling application.
[16:07] <Terry> By simply asking that question you have betrayed your inherent
cluelessness and I shall now ignore you until the heat death
of the universe.

Let us take a real-world equivalent. Your town has a local club of burly men,
covered in engine oil, who like messing around with car engines (tuning, fixing
and the like), and they meet every night. Now imagine someone walks into
their workshop and asks “Hey guys, I want to design and build an engine, can you
quickly tell me how to do that? I have my notepad and everything.”

Be thankful that at least online you have a certain amount of anonymity and
are not vulnerable to immediate physical retribution.

We are a help channel, which means we usually answer technical questions.
We sometimes answer non-technical questions, but don’t ask anything that would
require an entire business/technical specification or a four-year education course
to answer.

Further reading on this subject can be found at the
“Help Vampire” site (which is very funny
and better illustrated than this site). Make sure you don’t end up as one.

4 - Be polite

Common sense dictates that when you put yourself in a position where you are
relying on the kindness of strangers, it behoves you to be polite to those
strangers.

Remember how much you paid for the support contract on the software? Exactly.
99.99% of the helpers are not being paid, they are volunteers and generally nice
people doing this because they like the warm fuzzy feeling they get from helping.
Acting like a prat really discourages them from continuing this noble endeavour.

Examples of bad behaviour:

Being arrogant.

Discarding/ignoring answers because they aren’t quick fixes or require
using your brain a bit

Not thinking for yourself and expecting to be spoon fed. (Not Googling your
problem before asking is a prime example).

5 - Wait

This is the internet, land of many timezones. Therefore questions might not be
answered immediately. Let us look at Gonad again.

Sometimes answers can take hours/days in a relatively quiet channel, state your
question and wait. Most people will type your nick when replying so make sure
your IRC client is set to alert you when your nick is typed.

Conclusion

If you have read this far and actually read everything then congratulations.
You should now know enough to not be a prat and get ignored whenever you ask a
question in IRC. However this is only the first step; I still recommend reading
Eric S. Raymond’s
“How to ask questions the smart way”
for a more thorough explanation of everything… go on, don’t be adequate, go
and be clued up.

Real Life Examples.

Fun competition time. Can you see which rules are being broken in the
following real-life examples? (Names changed to protect the… innocent.)

#kubuntu-offtopic on irc.freenode.net
[03:55] > Gonad has joined this channel (***).
[03:55] <Gonad> hello... pls how can i download and extract music from youtube?
[03:58] > Gonad has left this channel.
[04:02] <Terry> Hm. An ask-and-run