
Tackling L33t-Speak

My daughter and I were bantering with each other via text message this morning as we often
do, and I dropped into a sort of mock "leet speak". She wasn't
impressed, but it got me thinking about formulaic substitutions in language
and how they represent interesting programming challenges.

If you're not familiar with "leet speak", it's a variation on
English that some youthful hackers like to use, something that obscures words
sufficiently to leave everyone else confused but that still allows reasonably
coherent communication. Take the word "elite", drop the leading
"e" and change the spelling to "leet". Now replace the
vowels with digits that look kind of, sort of the same: l33t.

There's a sort of sophomoric joy in speaking (or writing) l33t. I
suppose it's similar to Pig Latin, the rhyming slang of East Londoners or
the reverse-sentence structure of Australian shopkeepers. The intent's
the same: it's us versus them and a way to share with those in the know
without everyone else understanding what
you're saying.

At their heart, however, many of these things are just substitution ciphers. For
example, "apples and pears" replaces "stairs", and "baked
bean" replaces "queen", in Cockney rhyming slang.

It turns out that l33t speak is even more formalized, and there's actually a
Wikipedia page that outlines most of its rules and structure. I'm just going
to start with word variations and letter substitutions here.

The Rules of L33t Speak

Okay, I got ahead of myself. There aren't "rules", because at its
base, leet speak is casual slang, so l33t and 733T are both valid
variations of "elite". Still, there are a lot of typical
substitutions, like dropping an initial vowel, replacing vowels with
numerical digits or symbols (think "@" for "a"), replacing a
trailing "s" with a "z", replacing "cks" with "x" (so
"sucks" becomes "sux"), and turning a suffixed "ed" into
either "'d" or just the letter "d".
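Two of those rules aren't part of the script described later, so here is a minimal sketch of how each might look as a one-line sed substitution. The function names drop_initial_vowel and apostrophe_ed are hypothetical, chosen for illustration only.

```bash
#!/bin/bash
# Hypothetical helpers sketching two of the rules above with sed.

# Drop a single leading vowel (upper- or lowercase): "elite" -> "lite"
drop_initial_vowel() {
  printf '%s\n' "$1" | sed 's/^[aeiouAEIOU]//'
}

# Turn a trailing "ed" into "'d": "owned" -> "own'd"
# (inside double quotes, \$ passes a literal $ anchor through to sed)
apostrophe_ed() {
  printf '%s\n' "$1" | sed "s/ed\$/'d/"
}

drop_initial_vowel elite
apostrophe_ed owned
```

Note that sed gets you from "elite" only as far as "lite"; the respelling to "leet" is a human judgment call that a simple pattern can't infer.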

All of this very much lends itself to a shell script, right? So let's
test some mad skillz!

For simplicity, let's parse command-line arguments for the l33t.sh
script and use some level of randomness to ensure that it's not too
normalized. How do you do that in a shell script? With the variable
$RANDOM. In bash and similar
shells, each time you reference that variable, you'll get a
different pseudo-random integer between 0 and 32767. Want to "flip a
coin"? Use $(($RANDOM % 2)), which will return 0 or 1 in reasonably
random order.
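As a sketch of how that coin flip could feed the "not too normalized" goal, the hypothetical function below randomly picks between the two "ed" variants from the rules list. Keep in mind that $RANDOM is a bash (and ksh/zsh) feature; a strictly POSIX sh such as dash doesn't provide it.

```bash
#!/bin/bash
# Sketch: a $RANDOM coin flip chooses between "'d" and plain "d"
# for a trailing "ed". random_ed is a hypothetical name.
random_ed() {
  if (( RANDOM % 2 )); then
    # Heads (1): apostrophe form, e.g. "owned" -> "own'd"
    printf '%s\n' "$1" | sed "s/ed\$/'d/"
  else
    # Tails (0): plain form, e.g. "owned" -> "ownd"
    printf '%s\n' "$1" | sed 's/ed$/d/'
  fi
}

# Run it a few times; the output varies from run to run
for i in 1 2 3 4; do
  random_ed owned
done
```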

The fast and easy way to go through these substitutions is to use
sed, the stream editor, that old mainstay of Linux and of UNIX before it.
I'm mostly using sed here because its s/pattern/replacement/
substitution command is so easy to use, like this:

word="$(echo "$word" | sed 's/ed$/d/')"

This replaces the sequence "ed" with just a "d", but
only when those are the last two letters of the word (the trailing "$"
anchors the match to the end). You wouldn't want to
change "education" to "ducation", after all.
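A quick way to see what the "$" anchor buys you is to run the anchored and unanchored patterns side by side on a word that ends in "ed" and one that merely contains it:

```bash
#!/bin/bash
# Compare the anchored substitution (ed$) with an unanchored one (ed).
for w in hacked education; do
  anchored="$(printf '%s\n' "$w" | sed 's/ed$/d/')"
  unanchored="$(printf '%s\n' "$w" | sed 's/ed/d/')"
  echo "$w: anchored=$anchored unanchored=$unanchored"
done
```

Both forms turn "hacked" into "hackd", but only the unanchored one mangles "education" into "ducation".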

The script applies its substitutions in order: a trailing "s" becomes a trailing "z"; "cks"
anywhere in a word becomes an "x", as does "cke"; all
instances of "a" are translated into "@"; all instances of "e" change to
"3"; and all instances of "o" become "0". Next, the script cleans up any words
that might start with an "a". Finally, all lowercase letters are
converted to uppercase, because, well, it looks cool.
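Here is a minimal sketch of that substitution chain as a bash function over its command-line arguments. It is not the original l33t.sh listing, and it makes two labeled assumptions: the "cks"/"cke" rules run before the trailing-"s" rule (in the order described above, "sucks" would first become "suckz" and never reach "sux"), and the cleanup of words starting with "a" is read as turning a leading "@" back into "a". The name l33tify is hypothetical.

```bash
#!/bin/bash
# l33tify: a sketch of the substitution chain, not the original script.
# Assumptions: cks/cke are replaced before the trailing "s" so that
# "sucks" -> "sux", and a leading "@" is restored to "a".
l33tify() {
  local word out=""
  for word in "$@"; do
    word="$(printf '%s\n' "$word" | sed \
      -e 's/cks/x/g' -e 's/cke/x/g' \
      -e 's/s$/z/' \
      -e 's/a/@/g' -e 's/e/3/g' -e 's/o/0/g' \
      -e 's/^@/a/')"
    # Uppercase everything; digits and "@" pass through unchanged
    word="$(printf '%s\n' "$word" | tr '[:lower:]' '[:upper:]')"
    out+="$word "
  done
  # Print the converted words without the trailing space
  printf '%s\n' "${out% }"
}

l33tify I am a master hacker with great skills
```

Because each sed -e expression runs in sequence on the same word, reordering the rules changes the result, which is exactly why the "cks" rule is placed first here.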

How does it work? Here's how this first script translates the sentence
"I am a master hacker with great skills":

That's a good start, but there's more you can do, something I'll pick up in
my next article. Meanwhile, if you consider yourself a l33t expert, hit me
up and let's talk about some additional letter, letter-combination and word
rules.

Dave Taylor has been hacking shell scripts on UNIX and Linux systems for a
really long time. He's the author of Learning Unix for Mac OS
X and Wicked Cool Shell Scripts. You can find him on Twitter
as @DaveTaylor, and you can reach him through his tech Q&A site: Ask Dave Taylor.