PHP tips

I'm an enthusiastic participant on Stack Overflow,
a question-and-answer site for programmers. From time to time I stumble over a problem
that I find interesting; I then try to solve it and publish a short article about the
solution here on this page.

If you have problems, questions or suggestions about the functions
below, or if you simply find them useful, don't hesitate to send me an email.

Using X-Frame-Options and Content-Security-Policy with PHP

Most browsers today will help protect your site from malicious attacks,
but you have to tell them to do so. A widely supported method is setting
the X-Frame-Options header. With this header set, the browser will not allow other
sites to display your page inside an iframe. This protects against
clickjacking attacks and should be used on all sensitive pages, such as the login
page.

// Adds X-Frame-Options to HTTP header, so that page can only be shown in an iframe of the same site.
header('X-Frame-Options: SAMEORIGIN'); // FF 3.6.9+ Chrome 4.1+ IE 8+ Safari 4+ Opera 10.5+

Users working with an up-to-date browser benefit automatically when a website
sends a Content-Security-Policy (CSP) in the HTTP header. With a CSP you can
specify from which locations you accept JavaScript, which sites are allowed to
show your page inside an iframe, and many other things. If a browser supports
CSP, this can be an effective protection against cross-site scripting.

The implementation in PHP is very straightforward, though some problems may arise
from inline JavaScript. You get the most protection if you avoid all JavaScript
inside the HTML files and always put it into separate *.js files. If this cannot be
done (because of existing source code), there is an option to allow inline scripts.
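As a rough illustration of the paragraph above, the header could be assembled like this. The helper buildCspHeader() and the chosen directives are assumptions made for this sketch, not a recommended policy for every site:

```php
<?php
/**
 * Builds a Content-Security-Policy header line from a list of directives.
 * buildCspHeader() is a hypothetical helper written for this sketch.
 */
function buildCspHeader(array $directives)
{
    $parts = array();
    foreach ($directives as $name => $sources) {
        $parts[] = $name . ' ' . implode(' ', $sources);
    }
    return 'Content-Security-Policy: ' . implode('; ', $parts);
}

$cspHeader = buildCspHeader(array(
    'default-src'     => array("'self'"),
    // Add "'unsafe-inline'" here only if inline scripts cannot be avoided.
    'script-src'      => array("'self'"),
    // Only pages of the same site may show this page inside an iframe.
    'frame-ancestors' => array("'self'"),
));

// Headers can only be sent before any output.
if (!headers_sent()) {
    header($cspHeader);
}
```

Keeping the policy in an array makes it easy to add 'unsafe-inline' for legacy pages only, instead of weakening the policy for the whole site.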

If your site is served over HTTPS only (SSL for all pages), it is a good idea
to send the Strict-Transport-Security header. The first time a user visits your site,
the browser stores this header. If the user later visits your site again, maybe
over an unsafe WLAN connection, the browser remembers to request it exclusively over HTTPS.
This protects against SSL-strip.
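A minimal sketch of sending this header follows. The helper buildHstsHeader() and the lifetime of one year are assumptions; choose a lifetime that fits your site:

```php
<?php
/**
 * Builds a Strict-Transport-Security header line.
 * buildHstsHeader() is a hypothetical helper written for this sketch.
 */
function buildHstsHeader($maxAgeSeconds, $includeSubDomains = false)
{
    $value = 'max-age=' . (int)$maxAgeSeconds;
    if ($includeSubDomains) {
        $value .= '; includeSubDomains';
    }
    return 'Strict-Transport-Security: ' . $value;
}

// Browsers ignore this header on plain HTTP responses, so send it over HTTPS only.
$isHttps = !empty($_SERVER['HTTPS']) && $_SERVER['HTTPS'] !== 'off';
if ($isHttps && !headers_sent()) {
    header(buildHstsHeader(31536000)); // 31536000 seconds = one year
}
```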

Generating password hashes with bcrypt

PHP 5.5 offers its own functions password_hash()
and password_verify()
to simplify generating BCrypt password hashes. I strongly recommend using this excellent API, or
its compatibility
pack for earlier PHP versions. The usage is very straightforward; the hash value can be
stored in a database field of type varchar(255):

// Hash a new password for storing in the database.
// The function automatically generates a cryptographically safe salt.
$hashToStoreInDb = password_hash($password, PASSWORD_DEFAULT);
// Check if the hash of the entered login password, matches the stored hash.
// The salt and the cost factor will be extracted from $existingHashFromDb.
$isPasswordCorrect = password_verify($password, $existingHashFromDb);
// This way you can define a cost factor (by default 10). Increasing the
// cost factor by 1, doubles the needed time to calculate the hash value.
$hashToStoreInDb = password_hash($password, PASSWORD_BCRYPT, array("cost" => 11));

This solves the task pretty well. If you still want to know more about PHP's crypt function,
or how to add a pepper, then read on… There are well-known best practices for (not) storing
passwords in a database. I wrote an in-depth tutorial
about hashing passwords; a short overview could look like this:

Do not store the password at all, instead use a one-way hash function and store only
the hash value of the password. If an attacker can steal your database, he cannot get
the passwords.

The hash function uses a salt, which should be random and generated
separately for every stored password. This salt has to be stored together with the hash, but is
not secret (it can be plain text). Different salts make rainbow tables impractical, because an
attacker would have to build a rainbow table for each password. Nobody will build a rainbow table
for a single password; brute-forcing is faster.

Hash functions for passwords should be slow (need some computing time). Most
hash algorithms are designed to be fast, but this makes it easier to create rainbow tables for
every salt. That may sound theoretical, but with common hardware it is possible to calculate
100 Giga MD5 hash values per second (hashcat).
So for an English dictionary with about 500'000 words, we need only a fraction of a millisecond!

Ideally, you can adjust the computing time later for new hardware, without
breaking existing hashes.
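The last point can be sketched with PHP's built-in password_needs_rehash(): old hashes keep working, and a hash is recalculated with the new cost only after a successful login. The function verifyAndUpgrade() is a hypothetical helper, and the very low cost values are chosen only so the example runs quickly:

```php
<?php
/**
 * Verifies a login and reports whether the stored hash should be replaced.
 * Returns array($isCorrect, $newHashToStore); $newHashToStore is null
 * when the stored hash can stay as it is.
 * verifyAndUpgrade() is a hypothetical helper written for this sketch.
 */
function verifyAndUpgrade($password, $storedHash, $currentCost)
{
    if (!password_verify($password, $storedHash)) {
        return array(false, null);
    }
    $options = array('cost' => $currentCost);
    if (password_needs_rehash($storedHash, PASSWORD_BCRYPT, $options)) {
        // The old hash verified successfully; recalculate it with the new cost.
        return array(true, password_hash($password, PASSWORD_BCRYPT, $options));
    }
    return array(true, null);
}

// A hash that was created earlier with a lower cost factor
// (4 and 5 are used here for speed; real code would use 10 or more) ...
$oldHash = password_hash('correct horse', PASSWORD_BCRYPT, array('cost' => 4));
// ... is upgraded at the next successful login.
list($isCorrect, $newHash) = verifyAndUpgrade('correct horse', $oldHash, 5);
```

If $newHash is not null, it would replace the stored hash in the database.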

The bcrypt hash algorithm was designed specifically to meet these demands. It has a cost parameter
that controls the time needed for the calculation. This cost parameter is stored together
with the salt in the resulting hash. The following class helps to build secure BCrypt hashes. It
is provided with comments for educational purposes.

Secure password-reset function

In the article above, we saw how to store passwords safely, but this immediately leads to the
next problem, the password-reset function. The best password hash function is worthless if we
do not handle the password reset with the same care as storing the password itself.

The usual way is to send an email with a one-time token to the registered user. The token is
stored in the database, and when the user clicks the link, we check the token and allow the user
to set a new password.

Now imagine an attacker can read the database table with the tokens through SQL-injection. He
could then demand a password reset for any e-mail address he likes, and because he can see the new
token, he could use it to set his own password. An ideal password-reset function should fulfill all
of these requirements:

The token must be unpredictable, that's accomplished best with a "really" random code which is
not based upon a timestamp or values like the user-id.

Like a password, the token should be hashed before storing it in the database. This makes the
tokens useless for an attacker, even if the database is stolen.

The reset-link should preferably be short to avoid problems with email clients, and contain
only safe characters 0-9 A-Z a-z (base62 encoded).

The token should have an expiry date. There is no advantage in a link that can still be clicked two
years later. Moreover, an attacker does not necessarily have to hack the e-mail account to read
the e-mails; consider, for example, an e-mail client left open in the office or
a lost mobile phone.

Of course the token should be marked as used, after the user has successfully set a new password.
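The requirements above can be sketched in a few lines. All function names here are hypothetical and this is not the StoPasswordReset class; random_bytes() requires PHP 7 or later, and the sketch uses hex encoding (0-9, a-f) instead of base62 to stay short:

```php
<?php
/** Returns an unpredictable token of 40 URL-safe characters (0-9, a-f). */
function generateResetToken()
{
    return bin2hex(random_bytes(20));
}

/** Only this hash is stored in the database, never the token itself. */
function hashResetToken($token)
{
    return hash('sha256', $token);
}

/** Tokens older than $maxAgeSeconds (an assumed default of one hour) are rejected. */
function isResetTokenExpired($createdAt, $now, $maxAgeSeconds = 3600)
{
    return ($now - $createdAt) > $maxAgeSeconds;
}

$token = generateResetToken();
// Store this hash together with the user id and the creation time.
$tokenHashForDb = hashResetToken($token);
// Only the e-mail contains the token itself.
$resetLink = 'https://www.example.com/reset_password.php?tok=' . $token;
```

When the link is clicked, the token from the URL is hashed again and looked up in the database; after a successful password change the row is deleted or marked as used.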

The following class StoPasswordReset helps to generate such
reset-links. The generated tokens are very strong (in contrast to weak passwords), so it is safe to
store an unsalted hash, calculated with a fast algorithm.

https://www.example.com/reset_password.php
// Validate the token
if (!isset($_GET['tok']) || !StoPasswordReset::isTokenValid($_GET['tok']))
  handleErrorAndExit('The token is invalid.');
// Search for the token hash in the database
$tokenHashFromLink = StoPasswordReset::calculateTokenHash($_GET['tok']);
if (!loadPasswordResetFromDatabase($tokenHashFromLink, $userId, $creationDate))
  handleErrorAndExit('The token does not exist or has already been used.');
// Check whether the token has expired
if (StoPasswordReset::isTokenExpired($creationDate))
  handleErrorAndExit('The token has expired.');
// Show password change form and mark token as used
letUserChangePassword($userId);

Switching between HTTP and HTTPS pages with secure session-cookie

To come to the point: every website switching between insecure HTTP and encrypted HTTPS pages
is inevitably prone to SSL-strip.
A secure HTTPS connection remains untouched by this attack, but the unaware user is
tricked into working with an HTTP connection while believing he is using an HTTPS connection.

Because one cannot expect users to recognize an SSL-strip attack, one should absolutely
think about using HTTPS for the whole site. Although this cannot prevent SSL-strip
in every case either, it helps considerably. Because the following concept can have advantages for HTTPS-only
sites too, I decided to keep the article.

The problem with the session-id

With every page request, a session-id has to be sent along, which allows the server to
recognize the user. The session-id should be stored in a cookie, because passing it in the URL
makes session-fixation much too easy.
The session on the server holds the information whether the user is already logged in or not.
The problem now is that an attacker who finds out this session-id (however he does it) can
impersonate the user, and therefore has the same privileges as the user.

To exchange sensitive data, we absolutely need an HTTPS connection with SSL
encryption. This makes sure that nobody between client and server can eavesdrop on our communication,
and it prevents a man-in-the-middle attack.
Websites which switch between HTTP and HTTPS pages now have to decide whether they:

1. send the session-cookie to HTTP and HTTPS pages, and thereby transmit the session-id
unprotected as soon as they request an HTTP page (even for requests of pictures),

2. or configure the session-cookie so that it is sent exclusively to HTTPS pages, and thereby
lose the session as soon as an HTTP page is shown.

With option 1 we can stop the discussion right now; there is no such thing as security
afterwards. Option 2 could be handled by using HTTPS only for the whole site. As already mentioned,
this should really be done; today's servers shouldn't have any problems with it. In PHP you could
then call the function session_set_cookie_params(...) and set the parameter
$secure to true.
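A minimal sketch of that call, using the positional signature that works across PHP versions. Setting $httponly as well is an addition of this sketch, not something the text above requires:

```php
<?php
// Must run before session_start() and before any output is sent.
session_set_cookie_params(
    0,      // lifetime: until the browser is closed
    '/',    // path: valid for the whole site
    '',     // domain: the current host
    true,   // $secure: send the cookie over HTTPS connections only
    true    // $httponly: hide the cookie from JavaScript (extra, see above)
);
// session_start(); // would now create a cookie with these settings

// The settings can be checked without starting a session.
$cookieParams = session_get_cookie_params();
```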

The authentication cookie

The idea of the authentication cookie is to create a second cookie in addition to the session
cookie as soon as the user increases his privileges (login). This second cookie is configured in
such a way that it is sent back exclusively to HTTPS pages. Of course the login page itself
has to use HTTPS.

Now every page (HTTPS and HTTP) can use the insecure session-cookie; its purpose is merely to
maintain the session. However, all pages with sensitive information can check for the secure authentication
cookie.

https://www.example.com/secret.php
<?php
session_start();
// Check that the authentication cookie exists, and that
// it contains the same code which is stored in the session.
$pageIsSecure = (!empty($_COOKIE['authentication']))
  && (!empty($_SESSION['authentication']))
  && ($_COOKIE['authentication'] === $_SESSION['authentication']);
if (!$pageIsSecure)
{
  // do not display the page, redirect to the login page
}
...
?>

An attacker could manipulate the session cookie, but he never has access to the authentication
cookie, which is responsible for the authentication. Only the person who entered the password can
own the authentication cookie, because it is sent exclusively over encrypted HTTPS connections.
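The check on secret.php presumes that the login page created the cookie. How that could look is sketched below; all function names are hypothetical, random_bytes() requires PHP 7, and hash_equals() (PHP 5.6+) is used as a timing-safe alternative to the === comparison:

```php
<?php
/** Creates the random code that is stored in both the session and the cookie. */
function createAuthenticationCode()
{
    return bin2hex(random_bytes(32));
}

/** Called on the HTTPS login page after the password has been verified. */
function issueAuthenticationCookie(array &$session)
{
    $code = createAuthenticationCode();
    $session['authentication'] = $code;
    // 6th argument true: the browser sends this cookie back over HTTPS only.
    // 7th argument true: the cookie is not readable from JavaScript.
    setcookie('authentication', $code, 0, '/', '', true, true);
    return $code;
}

/** The check for protected pages, written as a function. */
function isAuthenticated(array $cookies, array $session)
{
    return isset($cookies['authentication'], $session['authentication'])
        && is_string($cookies['authentication'])
        && is_string($session['authentication'])
        && $session['authentication'] !== ''
        && hash_equals($session['authentication'], $cookies['authentication']);
}

// Usage on the login page:      issueAuthenticationCookie($_SESSION);
// Usage on a protected page:    isAuthenticated($_COOKIE, $_SESSION)
```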

By separating the two concerns "maintaining the PHP session" and "authentication", we can make
the system a bit more robust. There are many ways to attack the session-cookie (server settings,
php.ini, .htaccess, PHP code, browser settings, id in the URL, ...); with the separation, such
attacks are bound to fail.

UTF-8 for PHP and MySQL

Different character encodings can cause headaches; that's something every
developer who needs to make localized software knows for sure. Maybe your page
shows UTF-8, whereas the database delivers ISO-8859-1; then you get these odd
hieroglyphics, or even worse, the user may not even be able to log in anymore.

That's why Unicode was developed. I can't go into the details of Unicode here,
but the goal is to represent the characters of all known languages, and other symbols
as well (see this font character map).
One of the most commonly used encodings for Unicode is UTF-8, because it is very
compact (only 1 byte for common characters) and is understood by all of today's web
browsers.

UTF-8 in a PHP page

First, the HTML/PHP page itself should be stored in the UTF-8 file format.
That means you need an editor which supports Unicode; fortunately most IDEs are
able to do this. Normal (ASCII) characters are then stored with 1 byte, other characters
need 2 to 4 bytes, but the editor displays the typed-in character. That means no more HTML entities
like &Auml; (!); what you see is what you typed.

You should make sure that the editor does not store the BOM header;
this header is sometimes stored at the beginning of the file with the 3 bytes ï»¿.
The editor will hide them, so if you are not sure whether your file contains these characters,
you can either use a non-interpreting editor (hex editor), or this wonderful online
W3C checker. The BOM header
is treated as output by PHP, and this can cause nasty "Cannot modify header
information - headers already sent" errors.
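If no hex editor is at hand, a few lines of PHP can check for the BOM as well. The function names are hypothetical and the file name in the usage comment is a placeholder:

```php
<?php
/** Returns true if $content starts with the 3-byte UTF-8 BOM (EF BB BF). */
function hasUtf8Bom($content)
{
    return strncmp($content, "\xEF\xBB\xBF", 3) === 0;
}

/** Returns $content without a leading BOM. */
function stripUtf8Bom($content)
{
    // The cast covers older PHP versions, where substr() can return false.
    return hasUtf8Bom($content) ? (string)substr($content, 3) : $content;
}

// Usage sketch:
// var_dump(hasUtf8Bom(file_get_contents('index.php')));
```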

Then you should add the encoding declaration to the top of the head element of
your HTML/PHP page, right after the opening <head> tag.
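Such a declaration could look like the following sketch. Sending the charset in the HTTP header in addition to the meta element is an assumption of this example; the page content is a placeholder:

```php
<?php
// The HTTP header takes precedence over the meta element in browsers.
header('Content-Type: text/html; charset=utf-8');
?>
<!DOCTYPE html>
<html>
<head>
  <meta charset="utf-8">
  <title>Example page</title>
</head>
<body>
  <p>Ä Ö Ü are typed directly, without HTML entities.</p>
</body>
</html>
```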

UTF-8 in MySQL

There is a simple way to tell the database it should deliver UTF-8 encoded
strings, so they can be used in a UTF-8 web page. Instead of fiddling with the
configuration of MySQL, just tell your connection object which character set
you expect; the database does the rest for you.
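With PDO or mysqli this could be done roughly as follows. The helper buildMysqlDsn(), the host and the database name are placeholders; utf8mb4 is MySQL's name for full 4-byte UTF-8:

```php
<?php
/**
 * Builds a PDO DSN that requests UTF-8 encoded strings from MySQL.
 * buildMysqlDsn() is a hypothetical helper written for this sketch.
 */
function buildMysqlDsn($host, $database, $charset = 'utf8mb4')
{
    return 'mysql:host=' . $host . ';dbname=' . $database . ';charset=' . $charset;
}

$dsn = buildMysqlDsn('localhost', 'exampledb');

// With PDO (the charset part of the DSN is honoured since PHP 5.3.6):
// $pdo = new PDO($dsn, $user, $password);

// With mysqli:
// $db = new mysqli('localhost', $user, $password, 'exampledb');
// $db->set_charset('utf8mb4');
```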

Queries will automatically return UTF-8 encoded strings, Ajax results can be
used without cumbersome conversions, and other applications can request different
encodings if necessary.

To get more information about the charset of your database, you can run a query like this:

SHOW VARIABLES LIKE "character%"

Equal or not equal

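What I miss most in PHP is the benefit of a strongly typed language.
Dynamic typing may have its advantages, but would you have thought that the following
comparisons return true? PHP makes it possible (since PHP 8, the first and the third
comparison return false, because string-to-number comparisons were changed)...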

('abc' == 0)

(0 == null)

(1 == '1w?z')

Of course you can use the === operator to check values and
their types. Since PHP doesn't support you well in controlling types
explicitly, I found it to be of little use. That was the point when I started
writing a class covering all the things I wished were built into the PHP
language.
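The difference between the two operators can be tried out with a short script. Two of the loose results depend on the PHP version, as the comments state:

```php
<?php
// Loose comparisons convert the operands before comparing them.
$loose = array(
    "'abc' == 0"  => ('abc' == 0),   // true before PHP 8, false since PHP 8
    "0 == null"   => (0 == null),    // true in all versions
    "1 == '1w?z'" => (1 == '1w?z'),  // true before PHP 8, false since PHP 8
);

// Strict comparisons also compare the types and never convert the operands.
$strict = array(
    "'abc' === 0"  => ('abc' === 0),
    "0 === null"   => (0 === null),
    "1 === '1w?z'" => (1 === '1w?z'),
);

foreach ($loose + $strict as $expression => $result) {
    echo $expression, ' -> ', var_export($result, true), PHP_EOL;
}
```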

So where's the problem? Everybody using this function needs prior knowledge
that he can only get by looking at the code or at the (good) documentation.

You must use the === operator to check the result (== will not work).

You must compare with either true (but not with false),
or with false (but not with true), and you are never sure which.

The return value has to be stored in a variable for later checking/printing.
It's difficult to find a descriptive name, because it can contain different
things. This makes the code more prone to misunderstandings.

// All these checks will wrongly accept the email as valid!
$result = precariousCheckEmail('nonsense');
if ($result == true)
  print('OK'); // -> OK will be printed
if ($result)
  print('OK'); // -> OK will be printed
if ($result === false)
  print($result);
else
  print('OK'); // -> OK will be printed
if ($result == false)
  print($result);
else
  print('OK'); // -> OK will be printed

Instead of just telling what is bad, I would like to give a better alternative
as well. The example below passes an additional parameter by reference. The calling
code is very readable and it's nearly impossible to use it incorrectly.
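A minimal sketch of this idea follows. The function checkEmail() and its error text are assumptions made for illustration; the validation itself is delegated to PHP's filter_var():

```php
<?php
/**
 * Returns true or false only; the error message is passed back
 * through the by-reference parameter $errorMessage.
 * checkEmail() is a hypothetical function written for this sketch.
 */
function checkEmail($email, &$errorMessage)
{
    $errorMessage = '';
    if (filter_var($email, FILTER_VALIDATE_EMAIL) === false) {
        $errorMessage = 'The email address is not valid.';
        return false;
    }
    return true;
}

if (checkEmail('nonsense', $errorMessage)) {
    print('OK');
} else {
    print($errorMessage); // -> The email address is not valid.
}
```

The return value is always a boolean, so == and === behave the same, and the message has its own descriptive variable.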

Calculating distance between points on earth

To calculate the spherical distance between two points on the earth (great-circle distance), one
can use the Haversine formula. The formula is numerically stable with respect to rounding errors,
even for small distances.
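A possible implementation of the formula is sketched below. The function name and the mean earth radius of 6371 km are assumptions; the result is only as exact as the spherical model of the earth:

```php
<?php
/**
 * Great-circle distance between two points in metres (haversine formula).
 * Coordinates are given in decimal degrees.
 * haversineDistance() is a hypothetical function written for this sketch.
 */
function haversineDistance($lat1, $lon1, $lat2, $lon2, $earthRadius = 6371000.0)
{
    $phi1 = deg2rad($lat1);
    $phi2 = deg2rad($lat2);
    $deltaPhi = deg2rad($lat2 - $lat1);
    $deltaLambda = deg2rad($lon2 - $lon1);

    // Haversine of the central angle between the two points.
    $a = pow(sin($deltaPhi / 2), 2)
       + cos($phi1) * cos($phi2) * pow(sin($deltaLambda / 2), 2);
    // Guard against rounding slightly above 1 for antipodal points.
    $a = min(1.0, $a);

    return 2 * $earthRadius * asin(sqrt($a));
}

// Usage: distance in metres between two coordinates of your choice.
$distance = haversineDistance(0.0, 0.0, 0.0, 90.0); // a quarter of the equator
```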