31 October 2007, Wednesday

Much has been said about this brainfart of a feature, and attempts at reverting its behaviour are common for all php coders. Old versions of the php manual were giving this function in their “Best Practice” example:

… which has apparently became a bad meme in more than one way. I have already mentioned (as many others did as well way before me) the blunder of using is_numeric(), but a more alarming mistake is the way magic quotes are handled. It isn’t bad just because it’s buggy, but because it gives a seriously flawed idea.

It is also bad, as it is (was) given as an example to newbies, and even after it was replaced in the manual, one can still see the meme “in the wild” in snippets posted on sites and fora. What’s worse, the manual didn’t explain what was wrong in the code before it was replaced; as I see it, maybe they still have no idea what wrongness they have been teaching.

So, after you’ve had a paragraph of reading time to think about it, do you see it? The bug lies in the assumption that the data that is given to quote_smart() comes from $_GET, $_POST, $_COOKIES, $_REQUEST, etc. If you pass something else (a constant string, a value from a file or database, etc.) containing slashes (for a superficial example, take a smb-style path: “\\host\share\file.ext”) and you have magic_quotes on, the function will blindly run stripslashes, and thus damage the string (“\hostsharefile.ext”).

The function also doesn’t check for the magic_quotes_sybase setting, which completely changes the way magic quotes are handled.

The correct way of negating magic_quotes is of course globally, at script startup, with proper setting checks and while keeping in mind that those arrays may contain other arrays.

But enough about the bug, it is something that happens with certain inputs under certain setups, and it damages the data in a way that doesn’t really affect the security. So what’s the big deal, is it worth writing so long a rant about it? I think so, and the reason lies in that implied assumption above, about where the data comes from. What it basically says is that input comes only from one of the input superglobal arrays, which is wrong. Input comes from all kinds of sources and you never know which of them may be under the control of an attacker.

So data coming from other places, take the database for example, gets labelled as “secure” in the mind of the coder, and he readily inserts it in a dynamic SQL query and thus second-order SQL injections are born. Such are the curses of bad memes, caveat coder.