I’ve got two books sitting on my desk, PHP and MySQL for Dynamic Web Sites by Larry Ullman which is absolutely fantastic for bringing yourself up to speed if you already have rudimentary programming skills in other languages (though it could stand for an update), and the PHP Cookbook, an O’Reilly book which (so far) seems like a good reference for getting beyond the basics.

So, any guess on what I’ve been working with lately? I’ve been running this site on a pretty basic PHP foundation for a while now, and by basic I mean aside from some PHP includes, passing a couple of variables from page to page, and a slightly more complex spaghetti mess of a comment preview page, there’s not much PHP to be found here.

Relational databases, on the other hand, are an entirely new ball of wax. I’ve worked with people who use them. I’ve studied the flowcharts. I’ve formatted the data coming out of them. But I’ve never had to get my hands dirty before and actually make those connections myself. It’s been an interesting learning process as I’ve put together an application I’ve needed for a while, which really is the best way to learn. Reading a book and not applying the knowledge immediately is a good way to ensure you don’t retain it, I find. Anyway, on to the point of this article, which otherwise probably would have appeared out of left field if not for the above.

This is a particular problem I’ve run into. Up until recently, all user interaction with the database was limited to adding records. Aside from sloppy code, there were no real security risks to speak of. The new functionality I wanted to build would expose records for editing by the user, which is obviously problematic if the wrong people get access to them.

So the editing interface should be hidden behind a layer of verification. There are a few ways of going about this, a login form being the most obvious. Generating a username/password doesn’t make much sense in this context though; the ideal solution would be just a simple text link to get back into the interface and retrieve the data, which requires a unique querystring. Luckily, the data they had previously submitted provides ample doses of uniqueness, so generating one isn’t a problem. But this is also where the dilemma comes to light.

Let’s assume the unique data in this case is their name. So ‘Jane Smith’ is told to re-enter the application with a URI like so:

http://www.example.com/manager/?returnID=janesmith

Simple enough. Except guessing a different returnID value isn’t exactly difficult, so some records may potentially be exposed to the wrong people. Not good.

So it seemed the most effective way to fix this lay in using a cryptographic hash function like MD5 or SHA-1. (And yes, both have been broken by now, but collisions seem pretty much irrelevant when the hashes exist in a one-to-one relationship with only a few thousand records.) By running the name through the sha1 function:

But we’re not quite done yet. The last step is actually verifying the hash once the user returns, and this poses a choice between two equally not-very-good options. First, we can check the hash against those of each record calculated on the fly:

There’s probably better syntax for that loop, but you get the idea. Presumably, though, this won’t scale very well. I didn’t bother testing, but I could easily imagine a few thousand dynamically-calculated SHA-1 hashes bogging down the server in a hurry.

So instead, storing a pre-calculated hash value for each record seems like a good way to ease up the load. Each new record inserted gets a value in the hash column, encoded with the sha1 function of course. Verifying is now as easy as a straight up string compare:

Due to SHA-1 hashes being 40 bytes, there are minor storage space issues, but considering 10,000 records only add up to 4MB, it’s not worth obsessing over. It bothers me to have that dead weight for a somewhat minor function, but it’s the best I came up with. I’d be curious to compare notes with anyone else who’s grappled with similar problems — is SHA-1 overkill? Is there an easier way to do it that I’ve overlooked?

Maybe you’re already doing such a thing, but you only want your hash values stored in a single, user table. Just usernames, hashes and a record ID. Then you could just use the record ID as a foreign key to your data table. That way you don’t have a bunch of 40-byte hashes that would become quite redundant in your data table.

I have no clue what your DB structure is, so what I just proposed could make no sense and also be completely irrelavent to your sccenario :)

You could also just hash up the record ID so that it can’t be messed with for web form purposes, then just get the hash value (if you have a key/value pair) which would be the record ID. Then you wouldn’t have any hashes actually stored in the database.

I think that you’re overlooking the fact that anybody in the world can calculate a hash of “janesmith” & use it as a request ID. No matter how random the string looks, breaking this requires only one more step than guessing a username. You’d better concatenate it against some random data. But in that case you may as well just generate some random data to use as a request it.

“Maybe you’re already doing such a thing, but you only want your hash values stored in a single, user table.”

I probably do. I haven’t actually done it that way, but I get the nagging feeling I should.

“I think that you’re overlooking the fact that anybody in the world can calculate a hash of “janesmith” & use it as a request ID.”

Nope. That was just an example, I’m using more obscure data then that. Yes, all you need to deduce is the source data, granted — but that’s like saying to login to my Hotmail account, all you need to do is deduce my password. Any process is reversible in theory, it’s the difficulty in practice that makes it tough.

“Why don’t you take the first part of the hash? For example, the first half, that way you’ll have to store only half of the data.”

If you hash something unique together with a secret and then put the unique thing and the hash in the query string, you can verify that the URL was generated by your script by doing the hash over again, and comparing with the one the client sent in.

Adding records to a database should be limited to those you want to allow to add records. If not, someone could easily run a denial of service attack by filling up any tables with numerical, auto incrementing primary keys. And that’s just the start. If you haven’t taint-checked your input, you could be allowing attackers to run arbitrary SQL commands.

To the problem at hand: if I were you, I’d store that information in a PHP session. Create a PHP session with the username in it, and use the session instead of a custom function.

“Adding records to a database should be limited to those you want to allow to add records.”

The users don’t actually write SQL you know, they’re filling out forms and my PHP handles the rest. Or is this what you mean? Arbitrary users shouldn’t be filling out forms? Kind of hard to build an application that way.

“How do you communicate the link in the first place?”

Email. It’s a one-time only thing. If they don’t save the email, they have to re-submit the previous data. (Trust me, I want it this way.)

Its so funny when I saw this post because I have just finished Ullmans book and it was the best one I found on php. The best thing was that it was cheap…the companion site rocks too.

What led me here was I picked up your zen of css design book today…funny how the world turns… I already have my backend built and now working on the design and it seems you have gone the other way! Right on!

I’m in the midst of releasing a free blogging site featuring its own backend site management and database system. This way people can use my designs without knowing a lick of html,css,database,etc. and customize it to their liking.

Maybe I missed something, but it really sounds like you’re going about this using “security through obscurity”. While it’s a great mantra for security pedantics to chant, it actually does have some real-world applicability.

Just as an example, what if someone managing these records clicks an external link to another site? The other site now has a referrer log of the hash used to get into managing that record.

You should be able to have anyone find out the URL (querystring included) of a management page without being worried about them getting in if they are not supposed to.

I’m having a hard time trying to figure out exactly what you want to do. Which makes a little harder to offer a solution. :)

So, let me get this straight… you want to send an e-mail to a user with a URL that uniquely identifies that person. There would be no login and any time they want to access this special URL just for them, they just click on the link. Is that correct? If so, then it seems like an MD5 of SHA-1 hash should be sent to the user AND stored in the database. That way, your SQL and subsequent PHP become very straightforward as you only need to check if the record exists. Looping through all records to perform a hash is certainly a more time-consuming process. The only difference between sending the hash and sending the user’s name is guessability.

Wow, someone mentioned SQL-injection by the second comment. I was thinking of that as I was reading the article…

As a coder first and designer second, I’m really amazed at the wealth of knowledge in the php.net documentation. Highly detailed explanations of all the core commands, plus heaps of user-submitted commentary.

With regards to the situation you’re describing, I can’t completely picture it, but perhaps you could avoid having all these edit hashes just sitting there indefinitely? How about if the author wants to edit something, they click a button which emails then a new login-url (hash server time) which is then only valid for half an hour… something like that. This way the hashes are tied to user sessions, rather than to individual records.

“Just as an example, what if someone managing these records clicks an external link to another site?”

In a general sense, you’re absolutely right — a URL that allows editing of database records without further qualification is a bad idea.

It just occured to me that it would probably help to explain that this particular interface is time limited. It’s only ever triggered if an error occurs. It’s a way of saying, hey, you screwed up, fix your data. When the user fixes it, a flag is set and the link dies. You can no longer edit the data with the original link.

If this feature were to encompass wider activities, I’d certainly consider a login. But because it’s very limited in scope, and only a handful of users will ever have this kind of exposure to the management interface, and ultimately the link expires anyway, it seems like a reasonable solution.

What David Barrett was trying to say, is that users *can* use SQL injection techniques to harm your database. If you do not do some kind of find/replace to get rid of harmful characters, a user could write their way out of the data value and into your SQL statement.

While SQL Injection is well beyond the scope of this discussion, it’s easy to fix and there is quite a lot of information about it out there.

From the looks of it, you may have opened a can of worms here. Everybody with any database experience is going to tell you what you *need* to do ;)

“Perhaps you could avoid having all these edit hashes just sitting there indefinitely?”

I think I’ve accomplished this too, in a more round-about way. The script actually checks two separate variables before ever displaying the edit interface; one is the hash, the other is a status variable that is only set when the original error notification email went out. So if both are true, then display the form.

Ultimately, it’s my belief that by doing it like this, it’s impossible (or at least not possible in the context we’re considering now) to compromise data other than any error data sitting in the queue. Since this will usually be a small number of records, and it’s bad data anyway, I’m not overly worried about it. Have I covered all my bases?

Of the two approaches you detail in this entry, the second one is probably preferable, since it doesn’t require looping over the entire database in your script. (It would be better to let MySQL handle that sort of thing.) You should not be concerned about the space taken up by the hashes, at least as the common method of storing passwords would have it – you probably know that in most applications, passwords are stored in some kind of hashed or obfuscated form, in case someone who shouldn’t should gain access to them.

I was about to second the comment about instead programming more and doing a username/password sort of thing, but if you’re not concerned about people snapping up your referrers or users having problems getting the hashes correct, and the page they authenticate isn’t used for all that much or for all that long, then this is probably the way to go. Anything else introduces complexity you might not need.

This approach, using a ‘token’ for security, is used elsewhere, where more complex security isn’t possible – the Basecamp project management application does this to create ‘private’ RSS feeds.

A comment was raised stating that anyone could “calculate” the SHA1 of “janesmith” but, I don’t know if it helps, I have heard of a method whereby you “salt” the hash. Salt is a piece of text that is kept secret. The salt is added to the string such as “janesmith” + “salt” = “janesmithsalt”. Although it may be possible in some circumstances to guess “janesmith” from the context, the salt would be virtually random (it is consistent all of the time but itself appears to be random). Therefore, the “hacker” would need not only “janesmith” but also the salt. I am certain that there are good resources out there that explain this process better than I have.

PHP’s default behaviour is to automatically ‘escape’ any single quotes that arrive through $_POST or $_GET. However, between encoding characters and multiple-encoding and Lord knows what else, the safest thing is exactly what you’ve got: Only allow what you’re expecting, rather than filtering what you consider dangerous.

Typically, 99% of webform fields shouldn’t have anything going into them except letters, numbers, whitespace, and a very limited set of punctuation (don’t give ‘em the ampersand or semicolon or backslash unless you’re sure they’ll need them for legitimate data).

Re: SQL Injection…
I don’t know if I would condone limiting the number of available characters, there will undoubtedly arise a situation where a visitor will want to use said character for legitimate reasons.

Just delimit the characters you need to and forget about it. This should be performed on every single free-form text field that is inserted into the database.

Since I’m from a SQL Server/Access background, the character to watch for is the single quote ( ’ ). All you do is replace it with two single quotes in a row ( ” )when it comes time to insert/update data. It’s simple and doesn’t limit the visitor, since not allowing a single quote would be mean.

3/4 of the way through the article I linked, it talks about doubling quotes, and how that’s not secure at all.

I’d typed up a fancy little comment explaining it, but some combination of the quotes and escaping threw MT’s Perl script, and I got a bunch of errors trying to preview it. Anyhow, it’s there in the article.

Thanks Mike, didn’t realize that you could escape the single quote with a backslash (guess I’m a naive fool, as the article says). I’ve never heard of using the backslash as a delimiter in SQL Server/Access, maybe that technique only applies to MySQL.

Jarra, you don’t need to store the “secret string”. Here’s what I do to get RSS links off of my webapp that go to feeds for a logged-in user (so that the permissions and preferences of the web app also apply to the RSS feeds):

When I want to sent out a pre-authenticated link (that is, I want someone who comes in using that link to be treated as logged in as a user without having to specify username or password), I append two things: a username and a hash. These are sent as the query parameters auth-user and auth-code. The hash is calculated by concatenating the URL (up to and including the auth-user parameter, but before appending anything about the hash), and a secret key (a bunch of random data that never leaves the server). The hash is appended as a query parameter, leading to a URL like http://app.example.com/rss?auth-user=magnus&auth-code=123eba3c12

When my webapp receives a link with the auth-user and auth-code parameters set, it first strips off the auth-code part, and then appends the secret key, and hashes that. If the hash is equal to the value that was sent in as auth-code, then the link must have been generated by my server (since nothing else has access to the secret key). I therefore trust it, and log the user in as the user specified by auth-user. Since the username was part of the URL that was hashed, users cannot replace the username without the hash being invalidated.

This method can of course be extended to include expiry times, functionality subsets and what-not, all apended as auth-* parameters that are added before hashing - so all the information of how the link may be used is in plain-text, and none of it can change without the link being invalidated.

You don’t need to use a hash function here. You’re just treating the hash output as random data. If what you want is simply a unique, hard-to-guess identifier, then just use a bunch of random data (if you’re on UNIX, the ‘uuidgen’ program is an easy way to do this). You can store it in the database like you’re storing the hash output now.

You would use a hash if you wanted someone else to be able to compute the same hash value using the same source data. For example, pretend you and your friend both had a copy of some secret data, and you wanted to be sure they matched. You don’t want to send the secret data itself over insecure channels, but if you sent the hash instead and the hashes matched, you could be reasonably sure that you have the same data.

I just re-read the article, and for the sake of completeness, I’d like to add the way I did these things in the past, too.

In a webapp where we e-mailed out links to users where they could track support questions, we had a table that mapped randomly generated values to the ID of the support question. The links we sent out only contained the random value in the query section, and we purged the table now and then. No need for any hashes or other clever tricks.

This is, I believe, the standard way of doing these things, and I’d be surprised if the usual form sign-up schemes used anythingelse when doing email address verification.

If I’d do it over again, I’d use the auth-user+auth-code way (as described above), though.

I don’t know if I am the only one who noticed this but what if a malicious user provided bad data on purpose, just to gain access to the backend? Then the user would have a valid login (temporary, true, but he/she could do it as many times as needed) and access to the DB. Then there would be lots of bad things that could be done, most minor of which is SQL injection. You can easily take care of injection, however other problems inherent with database access are not so easy to take care of.

This is, however, a prickly problem and I don’t believe there is an actual solution… The only thing you could really do is use the php-generated text-as-image method so that when the user clicks the link he has to enter the text he sees in the image to get access. This prevents hammering and bots, which is at least half of the problem.

Preventing attackers from getting a valid session is the main purpose for having these checks. With sufficiently large values for the authentication data (be it a random number, a hash, or something else), these attacks can be made impossible.

To find out how large numbers you need, multiply your severer bandwidth with the duration that a sustained attack can be expected to last, and divide by the size of the smallest possible request that would make the server tell whether the attack worked or not. Then make that number a few times larger, and you have the max.

For instance, if you have a 100Mb/s link, and the smallest possible HTTP requests that would get your authorization scheme to reply is 24 bytes and an attack may go unnoticed for a week, then you need numbers up to 45 000 000 000 (<2^36) times a safety factor. So, a 40-bit value (e.g. 10 hex digits) would probably suffice.

The salt has to be quite random, and should be protected as much as possible. It can’t be anything related to other user information, although you can use the other information on top of that, such as email address, to ensure uniqueness. Don’t lose your secret salt or all your sent links will be unrecoverable duds.

Don’t forget to separate your values with a delimiter, for example jane:doe:janedoe@email.com:saltyramdomstring to avoid edge cases where two similar names/email addresses, etc, could merge together to equal the same string for different users.

You don’t have to store this information in the database, since it can be calculated on the fly, however the process of creating an SHA1 hash is quite CPU-intensive (try making 100,000 of them real fast). So depending how well you want to scale both on the email send event and the click through handler, you might want to go ahead and store the hash.

In case it wasn’t clear, the same random salt string is used for every record typically. Think of it as a randomly generated password used for all your hashes.

small note: Consider changing the regular expression to ^[a-f0-9]{40}$. the caret and dollar sign require the beginning and end of the string. Otherwise you are only verifying that a 40 char sequence is IN the given string.

“a few thousand dynamically-calculated SHA-1 hashes bogging down the server”

I don’t really believe that… SHA1-ing a string as short as “janesmith”. Even my old 3.5MHz 8-bit MSX computer can do that 300 times per second. On current 32-bit gigahertz machines, I can’t imagine it will be an issue.

Concerning storing your hashes inside the database: I’ve always thought that since storage inevitably gets bigger and cheaper, storing the results of system intensive processes is always favourable to recalculating them on the fly, when building a scaling application.

Magnus: An easier scheme would be to simply log every failed attempt, timestamp it, and save the source IP (don’t bother with browser signature – that’s too easy to spoof).

Then when a new login attempt comes through, you can check to see if there’s been more than, say, 5 failures from that IP in the last 12 hours. If so, deny them access.

It’s hard to provide further commentary without specific knowledge of what the task being accomplished is (sure does sound like comment editing, which Mr. Brill already has working), but there’s lots of good ideas in this discussion thread.

Tim: I agree with your position on this. Small redundancies in the data is far better than dynamic calculations on every query – that’s whole theory behind caching. For a completely over-simplified look at the issue, ask if you’d rather pay to double your storage space or your computational power?

I’m not sure why you’re bothing to store what is basically session data into the database, just so your users can return and chug happily away at their data. Why not just create a cookie with the session_* functions in PHP and authenticate based on those cookies?

Jon, you can absolutely use less than the standard 160 bits. Since there is no need to protect against birthday paradox attacks, 10.000 times a safey factor would be enough to prevent both collisions and random guessing.

You can safely truncate the hash to something manageable. As per the calculation above, 40 bits should be enough for most uses.

Bruce, Mark: I’d advocate prepared statements over stored procedures since there’s larger support for them. Basically, rather than executing a single string ‘The Query’, you call a prepare function which gets the query ready to go, but for every place where there’s a variable you need to insert, you just fill in a placeholder. Then when it’s time to actually run the query, you give the DB engine an array of values to populate the empty holes with.

Anit: You’d only be logging *failed* entrances as a way of clamping down on brute-force attacks. Sure, someone who knew what they were doing could go through proxies to mix it up a little, but as a first step it’s a start.

Elliott: I think it’s so that they can come back and edit something after a period of time has passed. (similar to the way craigslist operates)

dave, your approach is susceptible to man-in-the-middle attacks. a proxy could intercept your request, authenticate on behalf of the user, and do all sorts of nastiness. SSL is the usual solution to this problem.

i think for your application, the one-time-password and SSL would be a good combination. it all depends on what kinds of threats you’re willing to implement countermeasures for. :)

Once you submit a comment, you will notice a link by your name that
says “Edit Comment”. You can only edit your comment one time, then the
link is gone forever (sounds like the same approach your are taking).
I’m using session variables to store the data, that way I don’t have
to worry about people hacking the URL to edit somebody else’s comment
or something to that effect.

Me:
“Adding records to a database should be limited to those you want to allow to add records.”

Dave Shea:
The users don’t actually write SQL you know, they’re filling out forms and my PHP handles the rest. Or is this what you mean? Arbitrary users shouldn’t be filling out forms? Kind of hard to build an application that way.

Reply:
What I’m saying is this: you have an application running that allows people to add records to a database, by filling out a form.

By reading your desription of the application, I get the impression that if you just know the URL of the app, you can get to this form and through this add records. It wasn’t clear if this was a public app or not.*

If you’re running a public app, this is fine. Right now, I’m adding a record to the comment table of your database. There’s still security implications to worry about, ones that you can’t just ignore. If MovableType was poorly written, then someone could type some fragments of SQL into your comments form and run an SQL injection attack. Seriously muck with your site.

Even if the app is public, and only allows the adding of records, you STILL need to consider security. Taint check that input. Assume that each field contains a SQL injection attack. Assume that there’s a script out there trying to fill your database with crap (like a spambot would) as fast as your webserver can handle it. Find the holes in your defenses and plug them.

If the app isn’t supposed to be public, you should make sure it isn’t public.

When I said you should limit who should be allowed to add records: say you have a group of users who you want to be able to add records to the database. Now, I’m pretty sure I’m not in this group. Therefore, you probably don’t want me adding stuff to your database… and if it’s not a public app, I shouldn’t be able to.

You linked to an article on writing secure PHP. Point number 1 is the most important lesson you can learn. Legitimate users may not be writing SQL, but that doesn’t mean other people aren’t :)

Dave,
I’m more into server side dev than client side (and that’s the reason why I read your blog, to improve my poor skills). What I can tell you is that your app, as you described it above, is highly unsecure. As it was stated above, if I can guess the name of one of your users, it’s easy to calculate a few hashes for this name (md5 and sha1 being the most popular) and use the result in the query string.

If people are going to manipulate personal data, you don’t have many options: you’ll end up with an auth system based on session and/or cookie management.

I’ve come across problems similar to this before. The argument of storing the hash in the table or calculating it each time keeps cropping up, but most of the time I can rely on the fact I know approximately how many people will be using the application at one time so I know that the server can cope (closed systems).

I can’t see a way of getting around either storing the hash or calculating it each time and I’d strongly recommend against PHP’s built-in session management, I can’t see how this would solve the problem either as it would just shift the data to the servers hard drive instead of the database.

This method of not logging in leaves you with no choice other then for the client to either have a completely random hash stored and given to them secretly, or just using a password for their link. If you give them a password feild then they can do requestId=mypassword , because if sombody is trying to get data from a certain person, i’m sure it woulden’t be hard to find things like street addresses, and it would be very easy to just use their name, chances are they already know it.

From my understanding, some of the best practices, in no particular order, used to prevent SQL Injection are:

(a) Type-caste input variables
(b) Use regex to filter data
(c) Single-quote all variables in sql statements
(d) Escape quotes using mysql_real_escape_string (php version > 4.3) for input strings
e) Best to turn off all 3 magic_quotes settings in php.ini or at runtime since quotes are not the only thing coders should be escaping
(f) Log anything out of the ordinary

None of these steps by themselves can prevent SQL Injection. By this method, you can avoid using addslashes and stripslashes since they are not as robust as the alternatives. If I’ve missed anything or need to be corrected, feel free to correct or update this.