May 18, 2007

Accidental dropped keyboard command issuance probability

My wireless Macintosh keyboard, which was talking to my laptop, fell
off my desk at home, and as it went down and bumped against the drawer
knobs and I grabbed at it, several keys were accidentally pressed (and
a Shift key actually came off). In the active window on the laptop at
the time was a live SSH connection to the Linux machine on my desk up at
the UCSC campus, so everything that was accidentally typed was interpreted
as a sentence in the language of the tcsh Unix shell on my
Linux box two miles away. And as fate would have it, the keys that were
hit as the keyboard clattered to the floor spelled out a fully grammatical
sentence of that language, meaning that an actual executable command was
issued to the operating system of my desktop machine, and was executed.
What are the chances of that?

Well, it would take some tedious but elementary work in combinatorics
and Unix command file listing to figure it out exactly.
Unix/Linux commands are spelled mostly in lower case letters with
occasional upper-case letters and digits (I'll ignore one or two other
legal characters like the underbar and the @-sign), so first we need
the probability of all the keystrokes being contained among the 62
characters {0 1 2 3 4
5 6 7 8 9 A B C D E F G H I J K L M N O P Q R S T U V W X Y Z a b c d e f
g h i j k l m n o p q r s t u v w x y z}. That is already low, but I
won't bother to work it out; it would be different for different keyboard
design features like function keys and numerical keypads and so on.

After that, given a combination of n characters in the right
range for some positive integer n, the probability of their spelling
a command name would be the number of available n-letter commands
divided by 62^n. For example, there happen to be 55
accessible commands (that is, commands in the directories on my path)
that are spelled with 2 letters, so if just two random characters from
the correct range were typed, the probability of their forming a legal
command would be 55/62^2 = 55/3,844, roughly 0.0143, considerably
less than a 2% chance.
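
The count-and-divide calculation above can be sketched in Python. This is only an illustration under stated assumptions: the function names (command_names, counts_by_length, hit_probability) are invented here, the scan looks only at executable files in the PATH directories (so shell built-ins are not counted), and the counts will differ from machine to machine.

```python
import os
from collections import Counter

ALPHABET_SIZE = 62  # digits plus upper- and lower-case letters, as in the text


def command_names(path=None):
    """Return the set of executable file names found in the PATH directories."""
    if path is None:
        path = os.environ.get("PATH", "")
    names = set()
    for directory in path.split(os.pathsep):
        if not directory or not os.path.isdir(directory):
            continue
        try:
            entries = os.listdir(directory)
        except OSError:
            continue  # unreadable directory: skip it
        for entry in entries:
            full = os.path.join(directory, entry)
            if os.path.isfile(full) and os.access(full, os.X_OK):
                names.add(entry)
    return names


def counts_by_length(names):
    """Count names made only of ASCII letters and digits, grouped by length."""
    return Counter(
        len(name) for name in names if name.isascii() and name.isalnum()
    )


def hit_probability(n_commands, length, alphabet_size=ALPHABET_SIZE):
    """Chance that `length` random in-range characters spell a command name."""
    return n_commands / alphabet_size ** length


if __name__ == "__main__":
    # The figure quoted in the text: 55 two-letter commands.
    print(round(hit_probability(55, 2), 4))  # 0.0143
    # The same calculation for whatever is installed on this machine.
    for length, count in sorted(counts_by_length(command_names()).items()):
        print(length, count, hit_probability(count, length))
```

Run on a different system, the two-letter count will almost certainly not be 55, so the machine-specific figures are only comparable in order of magnitude.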

The chances of accidentally hitting commands with longer names get
lower and lower, of course, because there are fewer and fewer command
names as length goes up (Unix loves short, cryptic command names), and
at the same time you keep getting a larger and larger divisor (62^25

But then we have to take account of the fact that some commands require
additional words. The command ls means something on its own
("list the contents of the current working directory"), but rm
("remove") requires at least one filename (which can be any arbitrary
string of letters and/or digits and/or certain other printable characters),
and cp ("copy") requires at least two filenames. If you get a
name of a command that needs one extra word on the command line, you have
to get a space after the command name and then a sequence of characters;
and if you happen to get a name of a command that needs two extra words
on the command line, you have to get a space after the command name and
then two sequences of characters separated by a space.
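
The extra-words requirement can be made concrete with a small checker. This is a rough sketch: the MIN_ARGS table and the function name has_enough_words are hypothetical, the table covers only the three commands mentioned above, and the check splits on whitespace without modelling quoting, options, or wildcards the way a real shell would.

```python
# Hypothetical table: minimum number of extra words each command needs.
MIN_ARGS = {"ls": 0, "rm": 1, "cp": 2}


def has_enough_words(line, min_args=MIN_ARGS):
    """Return True if `line` starts with a known command name and is
    followed by at least the number of extra words that command needs."""
    words = line.split()
    if not words or words[0] not in min_args:
        return False
    return len(words) - 1 >= min_args[words[0]]


if __name__ == "__main__":
    for line in ["ls", "rm", "rm notes", "cp a", "cp a b", "zz top"]:
        print(repr(line), has_enough_words(line))
```

A randomly typed line has to pass this check as well as spell a command name, which is why commands that need arguments are harder to hit by accident than bare ones like ls.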

You do the math; it can be your Breakfast Experiment™.

As it happened, what my keyboard actually told my Linux system to
do this morning was this:

bg OP+

(plus a Return on the end, which caused the actual execution
attempt). I was lucky. This is not a very dangerous command.
It means "Take the stopped job whose identifying number is OP+
and restart it running in the background." And it turned out to be
semantically incoherent: OP+ is not a valid job number, so
the result was just an error message saying that there was no such job.

It could have been an issue, though. The following string
is of exactly the same length (seven keystrokes including the space
and the final invisible Return):

rm ~/*

But that one means "Remove all plain files in the current user's
home directory." And a Unix system will do just that if you (or your
dropped keyboard) should happen to tell it to. It won't ask "Are
you sure?" or "Delete all files?"; it will just swiftly and silently
destroy the record of their former existence. And there is a finite,
though very small, probability that it might happen simply by accident.
There but for the grace of God and the low probability of random strings
turning out to be grammatical in most kinds of language...

You may recall that I remarked on Language Log in another context that
in English nearly all strings of words are ungrammatical, in the sense
that the probability of a random string of English words being
grammatical in Standard English heads down toward zero in the limit as
string length goes up toward infinity. That's only a conjecture, but I
think it's true. (By the way, it doesn't have to be true: Chris Barker
devised, just for fun, a programming language called jot in which all
programs are expressible and in which every string is grammatical; see
this fascinating page.
But in fact jot only uses the characters 0 and 1, so there is
hardly any chance of a jot program being typed and run by accident
even if he drops his keyboard while running a jot interpreter in the
active window. Given about a 3% chance of the first keystroke being a 0
or a 1, the probability that both of the first two will be binary digits
is only 9/10,000 = 0.0009, and for the first three the probability is
only .000027, and we head downward toward zero land pretty rapidly.)
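
The arithmetic in that parenthetical is easy to reproduce. The sketch below assumes, as the text does, that each keystroke independently has about a 3% chance of being a 0 or a 1; the function name all_binary_probability is made up for the illustration.

```python
def all_binary_probability(k, p_binary=0.03):
    """Chance that the first k keystrokes are all 0s or 1s, assuming each
    keystroke is independently a binary digit with probability p_binary."""
    return p_binary ** k


if __name__ == "__main__":
    for k in (1, 2, 3):
        print(k, f"{all_binary_probability(k):.6f}")
```

For k = 2 this gives 0.0009 and for k = 3 it gives 0.000027, matching the decimal figures quoted above.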