This is not a new discovery or recently introduced vulnerability, but it’s one of the lesser discussed issues in bash scripting. ksh behaves the same way, if you account for the lack of read -p.

The shell evaluates values in an arithmetic context in several syntax constructs where the shell expects an integer. This includes: $((here)), ((here)), ${var:here:here}, ${var[here]}, var[here]=.. and on either side of any [[ numerical comparator like -eq, -gt, -le and friends.

In this context, you can use or pass literal strings like "1+1" and they will evaluated as math expressions: var="1+1"; echo $((var)) outputs "2".

However, as demonstrated, it’s not just a bc style arithmetic expression evaluator, it’s more closely integrated with the shell: a="b"; b="c+1"; c=41; echo $((a)) will show 42.

And any value used as a scalar array subscript in an arithmetic context will be evaluated and interpreted as an integer — including command substitutions.

This issue is generally trivialized or ignored. It’s not pointed out in code reviews or example code, even by those who are well aware of this issue. It would be pretty tedious to use ${var//[!0-9-]/} for every comparison, and it’s often not relevant anyways.

This isn’t necessarily unfair, because arbitrary code execution is only a security issue if it also provides escalation of privileges, like when you’re doing math using values from untrusted files, or with user input from CGI scripts.

It also frequently allows arbitrary code execution, but any use of it — safe or unsafe, on either side of said hatchway — is likely to elicit a number of comments ranging from cautionary to condescending.

"eval is evil", people say, even though no one ever said that "-eq is evil" or "$((foo+1)) considered harmful".

Why such a double standard?

I would argue that eval is considered bad not (just) because it’s unsafe, but because it’s lowbrow. Safety is just the easiest way to argue that people should use better constructs.

eval is considered lowbrow because it’s primary use case is letting newbies do higher level operations without really knowing how the shell works. You don’t have to know the difference between literal and syntactic quotes, or how to work with the order of expansion, as long as you can figure out roughly what you want to run and generate such commands.

Here’s an example of a script where you can run ./myscript 1 or ./myscript 2 to get "Foo" or "Bar" respectively. You’ve probably seen (and at some point tried) the variable-with-numeric-suffix newbie pattern:

#!/bin/bash
var1="Foo"
var2="Bar"
eval echo "\$var$1"

This example would get a pile of comments about how this is unsafe, how ./myscript '1; rm -rf /' will give you a bad time, so think of the children and use arrays instead:

#!/bin/bash
var[1]="Foo"
var[2]="Bar"
echo "${var[$1]}"

As discussed, this isn’t fundamentally more secure: ./myscript '1+a[$(rm -rf /)]' will still give you a bad time. However, since it now uses the approprate shell features in a proper way, most people will give it a pass.

Of course, this is NOT to say that anyone should feel free to use eval more liberally. There are better, easier, more readable and more correct ways to do basically anything you would ever want to do with it. If that’s not reason enough, the security argument is still valid.

It’s just worth being aware that there’s a double standard.

I have no particular goal or conclusion in mind, so what do you think? Should we complain about eval less and arithmetic contexts more? Should we ignore the issue as a lie-to-children until eval makes it clear a user is ready for The Talk about code injection? Should we all just switch to the fish shell?

Most distros have a step like this. If you don’t immediately recognize it, you might have used a different installer with different wording. For example, the graphical Ubuntu installer calls it “Encrypt the new Ubuntu installation for security”, while the text installer even more opaquely calls it “Use entire disk and set up encrypted LVM”.

Somehow, some people have gotten it into their heads that not granting the new owner access to all your data after they steal your computer is a sign of paranoia. It’s 2015, and there actually exists people who have information based jobs and spend half their lives online, who not only think disk encryption is unnecessary but that it’s a sign you’re doing something illegal.

I have no idea what kind of poorly written crime dramas or irrational prime ministers they get this ridiculous notion from.

The last time my laptop was stolen from a locked office building.

Here’s a photo from 2012, when my company laptop was taken from a locked office with an alarm system.

Was this the inevitable FBI raid I was expecting and encrypted my drive to thwart?

Or was it a junkie stealing an office computer from a company and user who, thanks to encryption, didn’t have to worry about online accounts, design documents, or the source code for their unreleased product?

Hundreds of thousands of computers are lost or stolen every year. I’m not paranoid for using disk encryption, you’re just foolish if you don’t.

If you have a new installation, you’re probably using SHA512-based passwords instead of the older MD5-based passwords described in detail in the previous post, which I’ll assume you’ve read. sha512-crypt is very similar to md5-crypt, but with some interesting differences.

Since the implementation of sha512 is really less interesting than the comparison with md5-crypt, I’ll describe it by striking out the relevant parts of the md5-crypt description and writing in what sha512-crypt does instead.

Like md5-crypt, it can be divided into three phases. Initialization, loop, and finalization.

Generate a simple md5sha512 hash based on the salt and password

Loop 10005000 times, calculating a new sha512 hash based on the previous hash concatenated with alternatingly the hash of the password and the salt. Additionally, sha512-crypt allows you to specify a custom number of rounds, from 1000 to 999999999

Use a special base64 encoding on the final hash to create the password hash string

The main differences are the higher number of rounds, which can be user selected for better (or worse) security, the use of the hashed password and salt in each round, rather than the unhashed ones, and a few tweaks of the initialization step.

Step 3.5 — which was very strange in md5-crypt — now makes a little more sense. We also calculated the S bytes and P bytes, which from here on will be used just like salt and password was in md5-crypt.

From this point on, the calculations will only involve the passwordP bytes, saltS bytes, and the Intermediate0 sum. Now we loop 5000 times (by default), to stretch the algorithm.

For i = 0 to 4999 (inclusive), compute Intermediatei+1 by concatenating and hashing the following:

If i is even, Intermediatei

If i is odd, passwordP bytes

If i is not divisible by 3, saltS bytes

If i is not divisible by 7, passwordP bytes

If i is even, passwordP bytes

If i is odd, Intermediatei

At this point you don’t need Intermediatei anymore.

You will now have ended up with Intermediate5000. Let’s call this the Final sum. Since sha512 is 512bit, this is 64 bytes long.

The bytes will be rearranged, and then encoded as 86 ascii characters using the same base64 encoding as md5-crypt.

What’s the difference in generating these hashes? Why are they different at all?

Just running md5sum on a password and storing that is just marginally more secure than storing the plaintext password.

Thanks to GPGPUs, a modern gaming rig can easily try 5 billion such passwords per second, or go over the entire 8-character alphanumeric space in a day. With rainbow tables, a beautiful time–space tradeoff, you can do pretty much the same in 15 minutes.

As an aside, these techniques were used in the original crypt from 1979, so there’s really no excuse to do naive password hashing anymore. However, at that time the salt was 12 bits and the number of rounds 25 — quite adorable in comparison with today’s absolute minimum of 64 bits and 1000 rounds.

The original crypt was DES based, but used a modified algorithm to prevent people from using existing DES cracking hardware. MD5-crypt doesn’t do any such tricks, and can be implemented in terms of any MD5 library, or even the md5sum util.

As regular reads might suspect, I’ve written a shell script to demonstrate this: md5crypt. There are a lot of workarounds for Bash’s inability to handle NUL bytes in strings. It takes 10 seconds to generate a hash, and is generally awful..ly funny!

Let’s first disect a crypt hash. man 3 crypt has some details.

If salt is a character string starting with the characters
"$id$" followed by a string terminated by "$":
$id$salt$encrypted
then instead of using the DES machine, id identifies the
encryption method used and this then determines how the
rest of the password string is interpreted. The following
values of id are supported:
ID | Method
-------------------------------------------------
1 | MD5
2a | Blowfish (on some Linux distributions)
5 | SHA-256 (since glibc 2.7)
6 | SHA-512 (since glibc 2.7)

Simple and easy. Split by $, and then your fields are Algorithm, Salt and Hash.

md5-crypt is a function that takes a plaintext password and a salt, and generate such a hash.

To set a password, you’d generate a random salt, input the user’s password, and write the hash to /etc/shadow. To check a password, you’d read the hash from /etc/shadow, extract the salt, run the algorithm on this salt and the candidate password, and then see if the resulting hash matches what you have.

md5-crypt can be divided into three phases. Initialization, loop, and finalization. Here’s a very high level description of what we’ll go through in detail:

Generate a simple md5 hash based on the salt and password

Loop 1000 times, calculating a new md5 hash based on the previous hash concatenated with alternatingly the password and the salt.

Use a special base64 encoding on the final hash to create the password hash string

Put like this, it relatively elegant. However, there are a lot of details that turn this from elegant to eyerolling.

Here’s the real initialization.

Let “password” be the user’s ascii password, “salt” the ascii salt (truncated to 8 chars), and “magic” the string “$1$”

Start by computing the Alternate sum, md5(password + salt + password)

Compute the Intermediate0 sum by hashing the concatenation of the following strings:

Password

Magic

Salt

length(password) bytes of the Alternate sum, repeated as necessary

For each bit in length(password), from low to high and stopping after the most significant set bit

If the bit is set, append a NUL byte

If it’s unset, append the first byte of the password

I know what you’re thinking, and yes, it’s very arbitrary. The latter part was most likely a bug in the original implementation, carried along as UNIX issues often are. Remember to stay tuned next week, when we’ll compare this to SHA512-crypt as used on new installations!

From this point on, the calculations will only involve the password, salt, and Intermediate0 sum. Now we loop 1000 times, to stretch the algorithm.

For i = 0 to 999 (inclusive), compute Intermediatei+1 by concatenating and hashing the following:

If i is even, Intermediatei

If i is odd, password

If i is not divisible by 3, salt

If i is not divisible by 7, password

If i is even, password

If i is odd, Intermediatei

At this point you don’t need Intermediatei anymore.

You will now have ended up with Intermediate1000. Let’s call this the Final sum. Since MD5 is 128bit, this is 16 bytes long.

The bytes will be rearranged, and then encoded as 22 ascii characters with a special base64-type encoding. This is not the same as regular base64:

For each group of 6 bits (there are 22 groups), starting with the least significant

Output the corresponding base64 character with this index

Congratulations, you now have a compatible md5-crypt hash!

As you can see, it’s quite far removed from a naive md5(password) attempt.

Fortunately, one will only ever need this algorithm for compatibility. New applications can use the standard PBKDF2 algorithm, implemented by most cryptography libraries, which does the same thing only in a standardized and parameterized way.

As if this wasn’t bad enough, the next post next week will be more of the same, but with SHA512-crypt!

Bash can seem pretty random and weird at times, but most of what people see as quirks have very logical (if not very good) explanations behind them. This series of posts looks at some of them.

Why can't bash scripts be SUID?

Bash scripts can’t run with the suid bit set. First of all, Linux doesn’t allow any scripts to be setuid, though some other OS do. Second, bash will detect being run as setuid, and immediately drop the privileges.

This is because shell script security is extremely dependent on the environment, much more so than regular C apps.

Now the script will run our grep instead of the system grep, and we have full root access.

Let’s assume the author was aware of that, had set PATH=/bin:/usr/bin as the first line in the script. What can we do now?

We can override a library used by grep

gcc -shared -o libc.so.6 myEvilLib.c
LD_LIBRARY_PATH=. addmaildomain

When grep is invoked, it’ll link with our library and run our evil code.

Ok, so let’s say LD_LIBRARY_PATH is closed up.

If the shell is statically linked, we can set LD_TRACE_LOADED_OBJECTS=true. This will cause dynamically linked executables to print out a list of library dependencies and return true. This would cause our grep to always return true, subverting the test. The rest is builtin and wouldn’t be affected.

Even if the shell is statically compiled, all variables starting with LD_* will typically be stripped by the kernel for suid executables anyways.

There is a delay between the kernel starting the interpretter, and the interpretter opening the file. We can try to race it:

But let’s assume the OS uses a /dev/fd/* mechanism for passing a fd, instead of passing the file name.

We can rename the script to confuse the interpretter:

ln /usr/bin/addmaildomain ./-i
PATH=.
-i

Now we’ve created a link, which retains suid, and named it “-i”. When running it, the interpretter will run as “/bin/sh -i”, giving us an interactive shell.

So let’s assume we actually had “#!/bin/sh –” to prevent the script from being interpretted as an option.

If we don’t know how to use the command, it helpfully lists info from the man page for us. We can compile a C app that executes the script with “$0” containing “-P /hax/evil ls”, and then man will execute our evil program instead of cat.

So let’s say “$0” is quoted. We can still set MANOPT=-H/hax/evil.

Several of these attacks were based on the inclusion of ‘man’. Is this a particularly vulnerable command?

Perhaps a bit, but a whole lot of apps can be affected by the environment in more and less dramatic ways.

POSIXLY_CORRECT can make some apps fail or change their output

LANG/LC_ALL can thwart interpretation of output

LC_CTYPE and LC_COLLATE can modify string comparisons

Some apps rely on HOME and USER

Various runtimes have their own paths, like RUBY_PATH, LUA_PATH and PYTHONPATH

Many utilities have variables for adding default options, like RUBYOPT and MANOPT

Tools invoke EDITOR, VISUAL and PAGER under various circumstances

So yes, it’s better not to write suid shell scripts. Sudo is better than you at running commands safely.

Do remember that a script can invoke itself with sudo, if need be, for a simulated suid feel.

So wait, can’t perl scripts be suid?

They can indeed, but there the interpretter will run as the normal user and detect that the file is suid. It will then run a suid interpretter to deal with it.

You probably have your own closely guarded ssh key pair. Chances are good that it’s based on RSA, the default choice in ssh-keygen.

RSA is a very simple and quite brilliant algorithm, and this article will show what a SSH RSA key pair contains, and how you can use those values to play around with and encrypt values using nothing but a calculator.

RSA is based on primes, and the difficulty of factoring large numbers. This post is not meant as an intro to RSA, but here’s a quick reminder. I’ll use mostly the same symbols as Wikipedia: you generate two large primes, p and q. Let φ = (p-1)(q-1). Pick a number e coprime to φ, and let d ≡ e^-1 mod φ.

The public key is then (e, n), while your private key is (d, n). To encrypt a number/message m, let the ciphertext c ≡ m^e mod n. Then m ≡ c^d mod n.

This is very simple modular arithmetic, but when you generate a key pair with ssh-keygen, you instead get a set of opaque and scary looking files, id_rsa and id_rsa.pub. Here’s a bit from the private key id_rsa (no passphrase):

Here, modulus is n, publicExponent is e, privateExponent is d, prime1 is p, prime2 is q, exponent1 is dP from the Wikipedia article, exponent2 is dQ and coefficient is qInv.

Only the first three are strictly required to perform encryption and decryption. The latter three are for optimization and the primes are for verification.

It’s interesting to note that even though the private key from RSA’s point of view is (d,n), the OpenSSH private key file includes e, p, q and the rest as well. This is how it can generate public keys given the private ones. Otherwise, finding e given (d,n) is just as hard as finding d given (e,n), except e is conventionally chosen to be small and easy to guess for efficiency purposes.

If we have one of these hex strings on one line, without colons, and in uppercase, then bc can work on them and optionally convert to decimal.

This is a very simple file format, but I don’t know of any tools that will decode it. Simply base64-decode the middle string, and then read 4 bytes of length, followed by that many bytes of data. Repeat three times. You will then have key type, e and n, respectively.

Here’s a bash script that will decode a private key and output variable definitions and functions for bc, so that you can play around with it without having to do the copy-paste work yourself. It decodes ASN.1, and only requires OpenSSL if the key has a passphrase.

When run, and its output pasted into bc, you will have the variables n, e, d, p, q and a few more, functions encrypt(m) and decrypt(c), plus a verify() that will return 1 if the key is valid. These functions are very simple and transparent.