Part of being a good, security-conscious web developer is paranoia, and it's apparent that the RockYou.com developers could have used a little more of it. They made two mistakes, not one. The first, and most obvious, is that they had a SQL injection hole somewhere. The second was assuming that the measures they had taken to protect their data would be enough.

A healthy dose of paranoia would have led their developers to make the opposite assumption - that whatever they did to protect the data, sooner or later someone would be able to access it.

The result of this second mistake is that, rather than simply announcing a security hole has been found and closed, they have had to deal with the fact that the passwords of more than 32 million people have been exposed, in plain text, to an unknown number of people. As most people use the same password for multiple places, and most will be unaware that this has happened, we can safely assume that the access details of millions of email accounts are in the open and unchanged. That's a bad day in code-land by anyone's standards.

Hashing

The solution to the problem is to first assume that all data will be exposed at some point to an intruder of some sort. Once you assume that, it becomes important to ensure that the damage resulting from that exposure is minimal.

Which brings me on to hashes. Hashes are one-way functions that generate a representation, usually a number, of the data put into them. They always generate the same hash from the same data, and there is no simple way to reverse the process.

This makes them incredibly useful for password storage. Instead of storing a user's password, you can store the hash of the password. When a user logs in again, instead of checking the password they type in against the one you have stored, you calculate the hash of the password they type in and compare that to the stored hash.
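The store-and-compare flow described above might look something like the following. This is a minimal sketch, not a recommendation: the function names hashPassword() and checkPassword() are made up for illustration, and SHA1 is used only because it is one of the algorithms named in this article. As the following sections explain, a bare hash like this is not enough on its own.

```php
<?php
// Hypothetical helper: return the hex digest that is stored
// in place of the plain-text password.
function hashPassword(string $password): string
{
    // SHA1 is used purely for illustration here.
    return hash('sha1', $password);
}

// Hypothetical helper: hash the submitted password and compare
// the result to the hash that was stored at signup.
function checkPassword(string $submitted, string $storedHash): bool
{
    // hash_equals() compares two strings in constant time.
    return hash_equals($storedHash, hashPassword($submitted));
}

// At signup: only the hash goes into the database.
$stored = hashPassword('correct horse');

// At login: the plain-text password is never compared directly.
var_dump(checkPassword('correct horse', $stored));
var_dump(checkPassword('wrong guess', $stored));
```

The same input always produces the same hash, which is what makes the comparison at login possible.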

There are lots of different hashing algorithms, the most commonly used being MD5 and SHA1.

Are Hashes Secure?

Unfortunately, ensuring passwords are stored securely isn't as simple as just storing a simple hash of a password. Two of the strengths of hashes are also their largest potential weaknesses: they are small to store and quick to generate.

To generate SHA1 and MD5 hashes of every word in English, for example, takes moments. To store that amount of data is also trivial. To generate hashes of all combinations of letters and numbers, plus a few commonly used punctuation marks, up to say 8 characters, is much slower but still doable without any special setup or equipment.

Tables of precalculated hashes of data like this are easily found online or easily generated. If you have a hash of some data (like a password) and you want to see what that data originally was, you can compare the hash to the entries in your precalculated table. If you find a match, you have discovered the data that was originally used to generate the hash - the password you were trying to find out.

So basic password hashing is, essentially, useless for the majority of users. It is a simple process to compare hashes of basic passwords to a table of precalculated hashes and thereby "dehash" passwords en masse.

Some people recommend nesting hashes as a way to add complexity and therefore more security. Unfortunately, generating tables of nested hashes is almost as easy as generating tables of plain hashes, and the result is no more secure.
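To make the table lookup concrete, here is a small sketch. The four-word list stands in for a full dictionary, and the names buildTable() and dehash() are invented for this example. It also builds a second table of nested hashes, to show that nesting costs an attacker only one extra function call per word.

```php
<?php
// A tiny stand-in for a real word list.
$wordList = ['password', 'letmein', 'qwerty', 'dragon'];

// Precompute "hash => original word" for every word in the list.
function buildTable(array $words, callable $hashFn): array
{
    $table = [];
    foreach ($words as $word) {
        $table[$hashFn($word)] = $word;
    }
    return $table;
}

// Look a hash up in a precomputed table; null means "not found".
function dehash(string $hash, array $table): ?string
{
    return $table[$hash] ?? null;
}

// One table of plain MD5 hashes, one of nested MD5 hashes.
$plainTable  = buildTable($wordList, 'md5');
$nestedTable = buildTable($wordList, function (string $word): string {
    return md5(md5($word));
});

// Hashes as they might appear in an exposed users table (made-up data).
$leaked = [md5('qwerty'), md5('Tr0ub4dor&3')];

foreach ($leaked as $hash) {
    $found = dehash($hash, $plainTable);
    echo $hash, ' => ', $found ?? '(not in table)', PHP_EOL;
}
```

The weak password is recovered with a single array lookup, while the one that is not in the word list is not.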

Add Salt!

The solution is to hash more than just the user's password, and this process is called "salting". For example, instead of storing a hash of a user's password, you could store the hash of their email address and their password together.

This is effective because tables of hashes of generated data of more than about 10 characters start to become problematic to generate and store. At around that point, tables must be generated based upon dictionaries and known words, rather than on programmatically generated lists of all possible passwords in a range.

The average length of "email plus password" is easily in the region of 25 characters. Not only that, but if someone worked out that you were using hashes of "email plus password", they would still need to generate a new table for every password they wanted to dehash.

This level of complexity, added to a reasonably strong password policy, ensures that if (or when) your user data is exposed, the work involved in extracting usable passwords from it is going to stop all but the most determined attackers. Not only that, but even they will find extraction of data in bulk prohibitively difficult.
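A rough sketch of the "email plus password" approach follows. The helper name saltedHash(), the separator character and the choice of SHA-256 are my own assumptions for the example. One consequence of this design worth noting: if a user changes their email address, the stored hash has to be regenerated at the same time. The last few lines show PHP's built-in password_hash() and password_verify() (available since PHP 5.5), which generate a random salt for each password and are a reasonable alternative to rolling your own.

```php
<?php
// Hypothetical helper: hash the email and password together,
// so the email acts as a per-user salt.
function saltedHash(string $email, string $password): string
{
    // Lower-case the email so the hash does not depend on how it was typed.
    return hash('sha256', strtolower($email) . ':' . $password);
}

// Two users with the same weak password get different stored hashes,
// so one precomputed table cannot cover both of them.
$jack = saltedHash('jack@example.com', 'qwerty');
$jill = saltedHash('jill@example.com', 'qwerty');
var_dump($jack === $jill);

// Built-in alternative: the salt is generated and stored inside the hash.
$stored = password_hash('qwerty', PASSWORD_DEFAULT);
var_dump(password_verify('qwerty', $stored));
```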

Cross-Site Scripting (XSS)

Cross-site scripting (often abbreviated to XSS) is a form of injection, where an attacker finds a way to have the target site display code they control. In its most basic form, this can be as simple as a site that allows HTML characters in usernames, where someone can specify a username that contains a script tag.

Now, whenever someone sees my username on the target site, the script I've added to my username will run. I could potentially use this to grab the person's login information, log their keystrokes - any number of nefarious activities.

As a developer, you can combat this type of attack by encoding or removing HTML characters (watch out for character encoding issues, as outlined next). Even better than stripping out unwanted characters is to allow a whitelist of safe characters in usernames and other fields. Be especially careful with e-commerce sites where you are listing orders in a CMS - an XSS vulnerability may allow an attacker to gain administrative access to your CMS. It is also important to turn off TRACE and TRACK support on the server: if there is an XSS vulnerability (and always assume that, despite your best efforts, there will be), these methods potentially allow an attacker to steal a user's cookie.
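Encoding on output could be as simple as the sketch below. The wrapper name e() and the hostile username are invented for the example; htmlspecialchars() is the standard PHP function doing the work.

```php
<?php
// Hypothetical wrapper: encode a value for safe display inside HTML.
function e(string $value): string
{
    // ENT_QUOTES encodes both single and double quotes;
    // the character set is stated explicitly rather than left to a default.
    return htmlspecialchars($value, ENT_QUOTES, 'UTF-8');
}

// A made-up hostile username of the kind described above.
$username = 'dave<script src="http://attacker.example/x.js"></script>';

// The browser now shows the tag as text instead of running it.
echo '<li>', e($username), '</li>', PHP_EOL;
```

Encoding at the point of output, rather than when the data is saved, means the protection still applies to data that reached the database by some other route.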

As a user you are also vulnerable to this sort of attack, and it is very difficult, at the moment, to make yourself safe against it. Vigilance is key, and to that end I have released a userscript that warns you about third party scripts (for users of GreaseMonkey, Opera or Chrome).

Cross-Site Request Forgery (CSRF)

Despite the similar name, CSRF is unconnected to XSS. CSRF is a form of attack where an authenticated user performs an action on a site without knowing it.

Let's assume that Jack is logged in to his bank, and has a cookie stored on his computer. Each time he sends an HTTP request to the bank (i.e., views a page or an image on a page) his browser sends the cookie along with the request so that the bank knows that it's him making the request.

Jill, meanwhile, runs a different website and has managed to get Jack to visit it. One of the items on the page is in fact loaded from the bank, for example in an iframe. The URL of the iframe or request contains instructions to the bank to transfer money from Jack's account to Jill's. Because the request is coming from Jack's computer, and includes his cookie, the bank assumes it is a legitimate request and the money is transferred.

This type of attack is extremely dangerous and virtually untraceable. As a developer, your job is to protect against it, and the best way to do that is to remember Rule Number One: Never, Ever Trust Your Users. No matter how authenticated they are, do not assume every request was intended.

In practical PHP terms, you can combat CSRF with several relatively simple coding habits. Never let the user do anything with a GET request - always use POST. Confirm actions before performing them with a confirmation dialog on a separate page - and make sure both the original action button or link and the confirmation were clicked. Even better, have the user enter information like letters from their password on the confirmation page.

Add a randomly generated token to forms and verify that it is present and correct when a request is made. Use frame-breaking JavaScript. Time-out sessions with a short timespan (think minutes, not hours). Encourage the user to log out when they've finished. Check the HTTP_REFERER header: it can be hidden or absent, but it is still worth checking, because a referrer from a domain other than the one you expect is a strong sign of a CSRF request.
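The token and POST-only habits might be sketched like this. The function names and the csrf_token field name are assumptions made for the example, and the session is passed in as a plain array so the snippet runs on its own; in a real page you would pass $_SESSION (after session_start()), $_SERVER['REQUEST_METHOD'] and $_POST.

```php
<?php
// Hypothetical helper: create the token once per session and reuse it.
function csrfToken(array &$session): string
{
    if (empty($session['csrf_token'])) {
        // 32 random bytes, hex-encoded, is unguessable for practical purposes.
        $session['csrf_token'] = bin2hex(random_bytes(32));
    }
    return $session['csrf_token'];
}

// Hypothetical helper: accept only POST requests carrying the right token.
function isValidCsrfRequest(array $session, string $method, array $post): bool
{
    if ($method !== 'POST') {
        return false; // never act on a GET request
    }
    $expected = $session['csrf_token'] ?? '';
    $given    = $post['csrf_token'] ?? '';
    return is_string($given)
        && $expected !== ''
        && hash_equals($expected, $given);
}

// Stand-ins for $_SESSION and $_POST.
$session = [];
$token   = csrfToken($session);

// The hidden field that goes into every form.
echo '<input type="hidden" name="csrf_token" value="',
    htmlspecialchars($token, ENT_QUOTES, 'UTF-8'), '">', PHP_EOL;

var_dump(isValidCsrfRequest($session, 'POST', ['csrf_token' => $token]));
var_dump(isValidCsrfRequest($session, 'POST', []));
var_dump(isValidCsrfRequest($session, 'GET', ['csrf_token' => $token]));
```

Because Jill's page cannot read Jack's session, it has no way of knowing the token, so a forged request fails the check even though Jack's cookie is sent with it.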

Character Encoding

Character encoding in PHP and associated database systems is worthy of its own series. In any one request, there may be more different character encodings in use than you might think.

For example, a single request and response (uploading a file to a server and writing information to a database) may involve all of the following items, each with its own character encoding: the HTTP request headers, post data, PHP's default encoding, the PHP MySQL module, MySQL's default set, the set of each table being used, a file being opened and read, a new file being created and written, the response headers and the response body.

English-speaking developers generally don't have much cause to get embroiled in character encoding issues, and that results in a lot of developers with a serious lack of understanding of how character encodings work and fit together. For those that do have a reason to look at character encodings, usually that interest ends with the setting of the response's character set.

However, character sets are a fundamental part of all web development. English alone can exist in any one of a wide variety of sets, and developers are usually familiar with the most common two: ISO-8859-1 and UTF-8. Fewer are familiar with UCS-2, UTF-16 or windows-1252. Still fewer are familiar with commonly used alternative language sets (e.g., GB2312 for Chinese).

Which, in a very roundabout way, brings me on to the security pitfalls of character encodings. Where data is processed by PHP using one character set, but a database server uses a different character set, a character (or series of characters) deemed safe by PHP may in fact allow SQL injection against the database.

PHP security expert Chris Shiflett has written about this issue and included an example of how it can be exploited to allow SQL injection even where input is sanitized using addslashes().

The solution is to always use mysql_real_escape_string() rather than addslashes() (or use prepared statements / stored procedures), and to explicitly state character sets at all stages of interaction. Ideally, use the same character set throughout your system (UTF-8 is recommended) and where PHP allows you to specify a character encoding for a function (e.g., htmlspecialchars() or htmlentities()), make use of it.

It's not just SQL that's vulnerable as a result of character encoding bugs. Cross-site scripting is possible even where HTML characters are escaped if character sets are not handled properly. Fortunately, once again that is simple to avoid by properly setting character encodings at all stages of the process and specifying character encoding for functions where possible.
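As a sketch of what "explicit at every stage" can mean on the PHP side, the snippet below rejects input that is not well-formed UTF-8 and always names the encoding when escaping. The helper names are hypothetical. The database side (for example, setting the connection character set with mysqli's set_charset()) is not shown, as it needs a live connection.

```php
<?php
// Hypothetical helper: true only if the input is well-formed UTF-8.
function isValidUtf8(string $input): bool
{
    // With the /u modifier PCRE treats the subject as UTF-8, and
    // preg_match() returns false (not 0 or 1) if it is malformed.
    return preg_match('//u', $input) === 1;
}

// Hypothetical helper: escape for HTML, naming the encoding explicitly.
function escapeHtml(string $input): string
{
    // ENT_SUBSTITUTE replaces invalid byte sequences with U+FFFD
    // instead of making htmlspecialchars() return an empty string.
    return htmlspecialchars($input, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
}

// In a real page you would also declare the response encoding, e.g.
// header('Content-Type: text/html; charset=UTF-8');

$good = "caf\xC3\xA9";  // "café" encoded as UTF-8
$bad  = "caf\xE9";      // "café" encoded as ISO-8859-1: not valid UTF-8

var_dump(isValidUtf8($good));
var_dump(isValidUtf8($bad));
echo escapeHtml('<' . $bad . '>'), PHP_EOL;
```

Rejecting malformed input early means later stages never have to guess how a stray byte should be interpreted.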

Posted by Dave Child on Thu, 11 Sep 2008 12:11:14 +0000
https://www.addedbytes.com/articles/writing-secure-php/writing-secure-php-4/
Tags: code, coding, development, mysql, php, programming, security, tips, tutorial, web, webdesign, webdev

Rex Swain HTTP Viewer Bookmarklet
https://www.addedbytes.com/blog/rex-swain-http-viewer-bookmarklet/

Rex Swain's HTTP Viewer is a great tool for checking HTTP status codes, redirection, and so on. I've been unable to find a bookmarklet, though, for sending the URL I am viewing to the viewer automatically, so I put one together:

Posted by Dave Child on Fri, 26 Jan 2007 09:55:13 +0000
https://www.addedbytes.com/blog/rex-swain-http-viewer-bookmarklet/
Tags: bookmarklet, http, webdev

RSS to iCal
https://www.addedbytes.com/blog/rss-to-ical/

I have been looking for a way to convert the BBC weather feed for my area to iCal, so I can subscribe to it. It's date-based, after all, and RSS never seemed to me to be an appropriate format for subscribing to weather information. iCal always struck me as being "better" for that purpose. Of course, the BBC only have an RSS feed for local weather. What I needed was a converter.

After some hunting, I discovered that Dean Sanvitale had written a PHP script to convert RSS feeds to iCal format. However, his site (codent.com) appears to be long since abandoned and the script is no longer available from there. Fortunately, the Wayback Machine did have a copy. Dean originally released the script under a Creative Commons License which, fortunately, allows me to make the script available to download from this site (note: the script is available from this site under the same license).

So, if you're looking for a way to convert an RSS feed to iCal, this PHP script will do the job. Thanks Dean!

Posted by Dave Child on Thu, 19 Oct 2006 11:14:16 +0000
https://www.addedbytes.com/blog/rss-to-ical/
Tags: bbc, code, convert, ical, php, rss, rss2ical, tools, weather, web, webdev

Internet Country Codes
https://www.addedbytes.com/blog/code/internet-country-codes/

I recently had need to create a list of the ccTLDs, to allow someone to find out what a country's ccTLD was. Maybe it'll be useful to someone else as well!

Afghanistan: .af
Albania: .al
Algeria: .dz
American Samoa: .as
Andorra: .ad
Angola: .ao
Anguilla: .ai
Antarctica: .aq
Antigua and Barbuda: .ag
Argentina: .ar
Armenia: .am
Aruba: .aw
Australia: .au
Austria: .at
Azerbaijan: .az
Bahamas, The: .bs
Bahrain: .bh
Bangladesh: .bd
Barbados: .bb
Belarus: .by
Belgium: .be
Belize: .bz
Benin: .bj
Bermuda: .bm
Bhutan: .bt
Bolivia: .bo
Bosnia and Herzegovina: .ba
Botswana: .bw
Bouvet Island: .bv
Brazil: .br
British Indian Ocean Territory: .io
British Virgin Islands: .vg
Brunei: .bn
Bulgaria: .bg
Burkina Faso: .bf
Burma: .mm
Burundi: .bi
Cambodia: .kh
Cameroon: .cm
Canada: .ca
Cape Verde: .cv
Cayman Islands: .ky
Central African Republic: .cf
Chad: .td
Chile: .cl
China: .cn
Christmas Island: .cx
Cocos (Keeling) Islands: .cc
Colombia: .co
Comoros: .km
Congo, Democratic Republic of the: .cd
Congo, Republic of the: .cg
Cook Islands: .ck
Costa Rica: .cr
Cote d'Ivoire: .ci
Croatia: .hr
Cuba: .cu
Cyprus: .cy
Czech Republic: .cz
Denmark: .dk
Djibouti: .dj
Dominica: .dm
Dominican Republic: .do
East Timor: .tl
Ecuador: .ec
Egypt: .eg
El Salvador: .sv
Equatorial Guinea: .gq
Eritrea: .er
Estonia: .ee
Ethiopia: .et
Falkland Islands (Islas Malvinas): .fk
Faroe Islands: .fo
Fiji: .fj
Finland: .fi
France: .fr
France, Metropolitan: .fx
French Guiana: .gf
French Polynesia: .pf
French Southern and Antarctic Lands: .tf
Gabon: .ga
Gambia, The: .gm
Gaza Strip: .ps
Georgia: .ge
Germany: .de
Ghana: .gh
Gibraltar: .gi
Greece: .gr
Greenland: .gl
Grenada: .gd
Guadeloupe: .gp
Guam: .gu
Guatemala: .gt
Guernsey: .gg
Guinea: .gn
Guinea-Bissau: .gw
Guyana: .gy
Haiti: .ht
Heard Island and McDonald Islands: .hm
Holy See: .va
Honduras: .hn
Hong Kong: .hk
Hungary: .hu
Iceland: .is
India: .in
Indonesia: .id
Iran: .ir
Iraq: .iq
Ireland: .ie
Israel: .il
Italy: .it
Jamaica: .jm
Japan: .jp
Jersey: .je
Jordan: .jo
Kazakhstan: .kz
Kenya: .ke
Kiribati: .ki
Korea, North: .kp
Korea, South: .kr
Kuwait: .kw
Kyrgyzstan: .kg
Laos: .la
Latvia: .lv
Lebanon: .lb
Lesotho: .ls
Liberia: .lr
Libya: .ly
Liechtenstein: .li
Lithuania: .lt
Luxembourg: .lu
Macau: .mo
Macedonia: .mk
Madagascar: .mg
Malawi: .mw
Malaysia: .my
Maldives: .mv
Mali: .ml
Malta: .mt
Man, Isle of: .im
Marshall Islands: .mh
Martinique: .mq
Mauritania: .mr
Mauritius: .mu
Mayotte: .yt
Mexico: .mx
Micronesia, Federated States of: .fm
Moldova: .md
Monaco: .mc
Mongolia: .mn
Montserrat: .ms
Morocco: .ma
Mozambique: .mz
Namibia: .na
Nauru: .nr
Nepal: .np
Netherlands: .nl
Netherlands Antilles: .an
New Caledonia: .nc
New Zealand: .nz
Nicaragua: .ni
Niger: .ne
Nigeria: .ng
Niue: .nu
Norfolk Island: .nf
Northern Mariana Islands: .mp
Norway: .no
Oman: .om
Pakistan: .pk
Palau: .pw
Panama: .pa
Papua New Guinea: .pg
Paraguay: .py
Peru: .pe
Philippines: .ph
Pitcairn Islands: .pn
Poland: .pl
Portugal: .pt
Puerto Rico: .pr
Qatar: .qa
Reunion: .re
Romania: .ro
Russia: .ru
Rwanda: .rw
Saint Helena: .sh
Saint Kitts and Nevis: .kn
Saint Lucia: .lc
Saint Pierre and Miquelon: .pm
Saint Vincent and the Grenadines: .vc
Samoa: .ws
San Marino: .sm
Sao Tome and Principe: .st
Saudi Arabia: .sa
Senegal: .sn
Serbia and Montenegro: .cs
Seychelles: .sc
Sierra Leone: .sl
Singapore: .sg
Slovakia: .sk
Slovenia: .si
Solomon Islands: .sb
Somalia: .so
South Africa: .za
South Georgia and the Islands: .gs
Soviet Union: .su
Spain: .es
Sri Lanka: .lk
Sudan: .sd
Suriname: .sr
Svalbard: .sj
Swaziland: .sz
Sweden: .se
Switzerland: .ch
Syria: .sy
Taiwan: .tw
Tajikistan: .tj
Tanzania: .tz
Thailand: .th
Togo: .tg
Tokelau: .tk
Tonga: .to
Trinidad and Tobago: .tt
Tunisia: .tn
Turkey: .tr
Turkmenistan: .tm
Turks and Caicos Islands: .tc
Tuvalu: .tv
Uganda: .ug
Ukraine: .ua
United Arab Emirates: .ae
United Kingdom: .uk
United States: .us
United States Minor Outlying Islands: .um
Uruguay: .uy
Uzbekistan: .uz
Vanuatu: .vu
Venezuela: .ve
Vietnam: .vn
Virgin Islands: .vi
Virgin Islands (UK): .vg
Virgin Islands (US): .vi
Wallis and Futuna: .wf
West Bank: .ps
Western Sahara: .eh
Western Samoa: .ws
Zambia: .zm
Zimbabwe: .zw

Reverse Lookup

This list is ordered by the domain names, rather than countries, to allow you to find out which country a domain name represents.

.ad: Andorra
.ae: United Arab Emirates
.af: Afghanistan
.ag: Antigua and Barbuda
.ai: Anguilla
.al: Albania
.am: Armenia
.an: Netherlands Antilles
.ao: Angola
.aq: Antarctica
.ar: Argentina
.as: American Samoa
.at: Austria
.au: Australia
.aw: Aruba
.az: Azerbaijan
.ba: Bosnia and Herzegovina
.bb: Barbados
.bd: Bangladesh
.be: Belgium
.bf: Burkina Faso
.bg: Bulgaria
.bh: Bahrain
.bi: Burundi
.bj: Benin
.bm: Bermuda
.bn: Brunei
.bo: Bolivia
.br: Brazil
.bs: Bahamas, The
.bt: Bhutan
.bv: Bouvet Island
.bw: Botswana
.by: Belarus
.bz: Belize
.ca: Canada
.cc: Cocos (Keeling) Islands
.cd: Congo, Democratic Republic of the
.cf: Central African Republic
.cg: Congo, Republic of the
.ch: Switzerland
.ci: Cote d'Ivoire
.cl: Chile
.cm: Cameroon
.cn: China
.co: Colombia
.cr: Costa Rica
.cs: Serbia and Montenegro
.cu: Cuba
.cv: Cape Verde
.cx: Christmas Island
.cy: Cyprus
.cz: Czech Republic
.de: Germany
.dj: Djibouti
.dk: Denmark
.dm: Dominica
.do: Dominican Republic
.dz: Algeria
.ec: Ecuador
.ee: Estonia
.eg: Egypt
.eh: Western Sahara
.er: Eritrea
.es: Spain
.et: Ethiopia
.fi: Finland
.fj: Fiji
.fk: Falkland Islands (Islas Malvinas)
.fm: Micronesia, Federated States of
.fo: Faroe Islands
.fr: France
.fx: France, Metropolitan
.ga: Gabon
.gd: Grenada
.ge: Georgia
.gf: French Guiana
.gg: Guernsey
.gh: Ghana
.gi: Gibraltar
.gl: Greenland
.gm: Gambia, The
.gn: Guinea
.gp: Guadeloupe
.gq: Equatorial Guinea
.gr: Greece
.gs: South Georgia and the Islands
.gt: Guatemala
.gu: Guam
.gw: Guinea-Bissau
.gy: Guyana
.hk: Hong Kong
.hm: Heard Island and McDonald Islands
.hn: Honduras
.hr: Croatia
.ht: Haiti
.hu: Hungary
.id: Indonesia
.ie: Ireland
.il: Israel
.im: Man, Isle of
.in: India
.io: British Indian Ocean Territory
.iq: Iraq
.ir: Iran
.is: Iceland
.it: Italy
.je: Jersey
.jm: Jamaica
.jo: Jordan
.jp: Japan
.ke: Kenya
.kg: Kyrgyzstan
.kh: Cambodia
.ki: Kiribati
.km: Comoros
.kn: Saint Kitts and Nevis
.kp: Korea, North
.kr: Korea, South
.kw: Kuwait
.ky: Cayman Islands
.kz: Kazakhstan
.la: Laos
.lb: Lebanon
.lc: Saint Lucia
.li: Liechtenstein
.lk: Sri Lanka
.lr: Liberia
.ls: Lesotho
.lt: Lithuania
.lu: Luxembourg
.lv: Latvia
.ly: Libya
.ma: Morocco
.mc: Monaco
.md: Moldova
.mg: Madagascar
.mh: Marshall Islands
.mk: Macedonia
.ml: Mali
.mm: Burma
.mn: Mongolia
.mo: Macau
.mp: Northern Mariana Islands
.mq: Martinique
.mr: Mauritania
.ms: Montserrat
.mt: Malta
.mu: Mauritius
.mv: Maldives
.mw: Malawi
.mx: Mexico
.my: Malaysia
.mz: Mozambique
.na: Namibia
.nc: New Caledonia
.ne: Niger
.nf: Norfolk Island
.ng: Nigeria
.ni: Nicaragua
.nl: Netherlands
.no: Norway
.np: Nepal
.nr: Nauru
.nu: Niue
.nz: New Zealand
.om: Oman
.pa: Panama
.pe: Peru
.pf: French Polynesia
.pg: Papua New Guinea
.ph: Philippines
.pk: Pakistan
.pl: Poland
.pm: Saint Pierre and Miquelon
.pn: Pitcairn Islands
.pr: Puerto Rico
.ps: Gaza Strip
.ps: West Bank
.pt: Portugal
.pw: Palau
.py: Paraguay
.qa: Qatar
.re: Reunion
.ro: Romania
.ru: Russia
.rw: Rwanda
.sa: Saudi Arabia
.sb: Solomon Islands
.sc: Seychelles
.sd: Sudan
.se: Sweden
.sg: Singapore
.sh: Saint Helena
.si: Slovenia
.sj: Svalbard
.sk: Slovakia
.sl: Sierra Leone
.sm: San Marino
.sn: Senegal
.so: Somalia
.sr: Suriname
.st: Sao Tome and Principe
.su: Soviet Union
.sv: El Salvador
.sy: Syria
.sz: Swaziland
.tc: Turks and Caicos Islands
.td: Chad
.tf: French Southern and Antarctic Lands
.tg: Togo
.th: Thailand
.tj: Tajikistan
.tk: Tokelau
.tl: East Timor
.tm: Turkmenistan
.tn: Tunisia
.to: Tonga
.tr: Turkey
.tt: Trinidad and Tobago
.tv: Tuvalu
.tw: Taiwan
.tz: Tanzania
.ua: Ukraine
.ug: Uganda
.uk: United Kingdom
.um: United States Minor Outlying Islands
.us: United States
.uy: Uruguay
.uz: Uzbekistan
.va: Holy See
.vc: Saint Vincent and the Grenadines
.ve: Venezuela
.vg: British Virgin Islands
.vg: Virgin Islands (UK)
.vi: Virgin Islands
.vi: Virgin Islands (US)
.vn: Vietnam
.vu: Vanuatu
.wf: Wallis and Futuna
.ws: Samoa
.ws: Western Samoa
.yt: Mayotte
.za: South Africa
.zm: Zambia
.zw: Zimbabwe

Posted by Dave Child on Fri, 07 Oct 2005 08:06:59 +0000
https://www.addedbytes.com/blog/code/internet-country-codes/
Tags: cheatsheet, codes, country, dns, domain, domains, internet, reference, resources, tld, webdev

Block Referrer Spam (Updated)
https://www.addedbytes.com/blog/block-referrer-spam/

Log files are a useful tool for webmasters. It helps to know how people are finding your site, and what software they are using to view it, among other things. A strange decision by a small group of bloggers, though, has given unscrupulous marketers another window of opportunity to manipulate search engines to increase their traffic.

The decision made by these short-sighted bloggers was to display, on their site, a list of recent referrers to each page. I can't imagine any reason why a visitor might be in the least bit interested in seeing this, but nevertheless many sites now display referrers on every page.

As search engine spiders visit sites, they grab the contents of each page they visit. They use this snapshot in their index - meaning that although a page may change every minute or two, a search engine may be using a single copy of a page for several days, or even weeks.

So a referral URL that is on a page when the spiders come to visit can have quite a bit of value, if the search engine visiting uses link popularity in any way (Google uses link popularity, as do many others).

So marketers have started to use programs to visit pages using a fake referral header, to get their URLs listed on as many sites as possible, in the hopes that this will increase their traffic.

However, this renders log files almost completely useless. These fake visitors usually appear to arrive from search engines, having searched for a keyphrase relevant to their own site. They skew statistics relating to the number of visitors received, the countries visitors come from, the technology used to view the page, how users found the page, how long they spent on the site ... and so on.

A webmaster may find their search engine rankings dropping because of this, and they may find search engines have removed them completely. Many sites that use spam techniques are quickly identified and penalised, and penalties will often be applied to sites that link to them as well.

There are plenty of techniques available for blocking referrer spam, and everyone has their favourite. Personally, I use a combination of two techniques.

The first is fairly simple - my referrer log is not indexable. I don't display referrers on the pages of my site. My referral log is publicly available, but search engines are instructed to ignore it. This removes the main incentive for people to referrer-spam my site (the other reason for this type of spam - the hope that the site owner will themselves visit the spamming URL - is less common, because it has such a low response rate).

Second, I use an .htaccess file to block requests from whatever I've managed to identify as either a crawler designed to find URLs to spam or a spamming URL. This is a relatively simple blacklist, and though it cannot work as a long term solution to this problem, it keeps me happy for now.

To implement this technique on your own site, first make sure you are running Apache with mod_rewrite. If you are, create a file called ".htaccess" (just that, not .htaccess.txt or anything else) and paste the following into it:

Update: 14th September 2005

The list below has been expanded substantially over the last year, and now covers much more spam than before. As stated before, this is not a practical solution to the problem in the long term, as this list can only ever get longer and longer, and may become unmaintainable, or even (eventually) slow a site to a crawl as all the rules are processed. However, as of now, it is still a useful tool.

The above will block just about all of the most common referral spam that I've seen so far. I'm adding to the list constantly (last addition: 14th September 2005) so do check back and see if there are updates if you're using it.

One potential problem with this technique, other than that it will, in time, become useless as too many URLs are added, is that there is always a possibility authentic visitors will be blocked. So, on this site, instead of the last line above, I've actually used something a little more user-friendly:

And there we have it. With minimum effort (for now), referral log spamming on my site has been almost entirely removed. Before adding this set of rules and scripts, I was seeing around 200 fake referrals per day in my log files. Now, I see about 3 or 4 a week. Hopefully, this will continue until I can devise a better way of protecting against this kind of problem - before blacklists become impossible to manage.
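For anyone without access to mod_rewrite, the same blacklist idea could be approximated in PHP at the top of each page. To be clear, this is not the .htaccess approach described above, just a rough sketch of the same principle: the host names are placeholders rather than real spam domains, and isBlockedReferrer() is a name made up for this example.

```php
<?php
// Placeholder blacklist; these are not real spam domains.
$blockedHosts = ['spam-casino.example', 'cheap-pills.example'];

// Hypothetical helper: true if the referrer's host is on the blacklist.
function isBlockedReferrer(?string $referrer, array $blockedHosts): bool
{
    if ($referrer === null || $referrer === '') {
        return false; // no referrer sent: nothing to block
    }
    $host = parse_url($referrer, PHP_URL_HOST);
    if (!is_string($host)) {
        return false; // malformed URL or no host component
    }
    $host = strtolower($host);
    foreach ($blockedHosts as $blocked) {
        $suffix = '.' . $blocked;
        // Match the listed host itself, or any subdomain of it.
        if ($host === $blocked || substr($host, -strlen($suffix)) === $suffix) {
            return true;
        }
    }
    return false;
}

// Refuse the request before any page content is generated.
if (isBlockedReferrer($_SERVER['HTTP_REFERER'] ?? null, $blockedHosts)) {
    http_response_code(403);
    exit('Forbidden');
}
```

This suffers from the same long-term problem as any blacklist: the list only ever grows, and every entry is checked on every request.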

Posted by Dave Child on Wed, 14 Sep 2005 10:36:00 +0000
https://www.addedbytes.com/blog/block-referrer-spam/
Tags: admin, apache, howto, htaccess, referrer, server, spam, webdev, wordpress

Writing Secure PHP, Part 3
https://www.addedbytes.com/articles/writing-secure-php/writing-secure-php-3/

In Writing Secure PHP and Writing Secure PHP, Part 2 I covered many of the basic mistakes PHP developers make, and how to avoid common security problems. It is time to get a little deeper into security though, and begin to tackle some more advanced issues.

Context

Before I start, it is worth mentioning at this point in this series that much of what is to come is highly dependent on context. If you are running a small personal site and are regularly backing it up, the chances are that there is no real benefit to you spending weeks on advanced security issues. If an attacker can gain nothing (and cause no harm) by compromising your site, and it would only take you ten minutes to restore it, should something go wrong, then it would be a waste to spend too long on security concerns. At the other end of the scale, if you are managing an ecommerce site that processes thousands of credit cards a day, then it is negligent not to spend a lot of time researching and improving your site's security.

Database Field Lengths

Database fields (we're going to talk about MySQL here, but this is applicable to any database) are always of a specific type, and every type has its limits. In MySQL you can also limit field lengths further than they are already limited by their types.

However, to the inexperienced developer, this can present problems. If you are allowing users to post an article on your site, and adding that to a database field with type "blob", then the longest article you can store in the database is 65,535 bytes. For most articles that will be fine, but what is going to happen when a user posts an article of 100,000 characters? At best, if you have set up your site so errors are not displayed, their article will simply vanish without being added to the site.

Remember that for an attacker to be able to compromise your system, they need information about it. They need to find weaknesses. Error messages are a very powerful part of that and if you are displaying errors, then an attacker can make use of this to find out information about your database.

To fix this, check the length of all data submitted through forms and querystrings, and before you launch a site, test that entering too many characters in a form does not cause errors to be displayed.
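A length check before the database is touched could be sketched as follows. The constant and function names are hypothetical, and the 65,535 limit assumes the "blob" field from the example above.

```php
<?php
// Assumed limit: the capacity of the "blob" field, in bytes.
const MAX_ARTICLE_BYTES = 65535;

// Hypothetical validator: returns an error message for the user,
// or null if the article is safe to hand to the database.
function validateArticle(string $body): ?string
{
    if ($body === '') {
        return 'Please enter some text.';
    }
    // strlen() counts bytes, which is how the field limit is measured.
    if (strlen($body) > MAX_ARTICLE_BYTES) {
        return sprintf(
            'Your article is too long (%d bytes, the maximum is %d).',
            strlen($body),
            MAX_ARTICLE_BYTES
        );
    }
    return null;
}

// A 100,000-character article is rejected with a friendly message,
// rather than a database error or a silent failure.
$error = validateArticle(str_repeat('a', 100000));
echo $error ?? 'OK', PHP_EOL;
```

The user gets a message they can act on, and an attacker learns nothing about the database behind the form.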

Weak Passwords

Dictionaries are a useful tool for an attacker. If you have a site with a login system and your database were compromised (and there is no harm in assuming that at some point it will be), an attacker can grab a list of hashed passwords. It is difficult (practically impossible) to directly translate a hash back into a password.

However, most attackers will have databases containing lists of words and their matching hashes in common formats (e.g. a database with all words in English and their MD5 hashes). It is fairly easy, should someone gain access to your database, for them to compare a hashed password to this list of pre-hashed passwords. If a match is found in the list, the attacker then knows what the un-hashed password is.

There are ways to avoid this problem, and the best of those is to ensure that only strong passwords are ever used. Some people find gauging the strength of passwords tricky, but the general rule of thumb is: a password like "password", "admin", "god", "sex", "qwerty", "123456" or similar (i.e. easily guessable) is extremely weak; a password made up only of a word in the dictionary is weak; a password made of letters, numbers and making use of upper and lower case is strong (there is a strong usability case to be made for not using case-sensitive passwords - if you wish to use case-insensitive ones, simply perform checks to ensure people do not pick passwords like "password12345").
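The rule of thumb above could be turned into a signup check along these lines. The eight-character minimum and the short word list are my own assumptions for the sketch (a real check would use a much larger list), and passwordProblems() is a made-up name.

```php
<?php
// A tiny stand-in for a proper list of common passwords and dictionary words.
$commonWords = ['password', 'admin', 'god', 'sex', 'qwerty', '123456'];

// Hypothetical checker: returns a list of problems; empty means acceptable.
function passwordProblems(string $password, array $commonWords): array
{
    $problems = [];
    if (strlen($password) < 8) { // assumed minimum length
        $problems[] = 'is shorter than 8 characters';
    }
    if (in_array(strtolower($password), $commonWords, true)) {
        $problems[] = 'is a commonly used password';
    }
    if (!preg_match('/[a-z]/', $password) || !preg_match('/[A-Z]/', $password)) {
        $problems[] = 'does not mix upper and lower case letters';
    }
    if (!preg_match('/[0-9]/', $password)) {
        $problems[] = 'does not contain a number';
    }
    return $problems;
}

foreach (['qwerty', 'Blue42Kettle'] as $candidate) {
    $problems = passwordProblems($candidate, $commonWords);
    echo $candidate, ': ', $problems ? implode('; ', $problems) : 'looks strong', PHP_EOL;
}
```

Returning every problem at once, rather than stopping at the first, saves the user from several rounds of trial and error.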

Clients

Clients are a huge security risk, believe it or not. Some will hire a cheaper developer to make small changes six months after you're finished. Some will give out FTP details to anyone who phones and asks for them. [Out of curiosity, I decided to see how easy it is to get FTP details over the phone. I visited the site of a local company (who shall remain nameless) and found the name of their design company (who shall also remain nameless). I then phoned the local company and told them I was with the design company and needed them to send me the site's FTP details. They agreed without question or hesitation. Scary. (I told them what I was doing before they sent any sensitive data to me and they are now better educated and suitably paranoid about people asking for details over the phone).]

Some will ignore emails from people pointing out security problems (in the process of writing the previous article in this series, I found a large selection of sites with publicly available database connection scripts. I emailed the owners explaining why they are at risk, and only one has replied and had the problem fixed at the time of writing). Admittedly, many of the emails and calls they receive will be misinformation or sales pitches, but it is still worth them having someone check this out - they do not know enough to distinguish a genuine problem from the rest.

Unfortunately, this is one security problem that cannot be solved with code. This one requires education. For this reason, I have created an unbranded copy of the sheet I give to my clients, with a selection of security tips on. When we launch the site, I sit down with them and tell them how they need to treat their site, and what to consider when making decisions regarding it.

Code Injection (a.k.a. "Cross-Site Scripting")

Unlike SQL Injection, which relies on the use of delimiters in user-input text to take control of database queries, code injection relies on mistakes in the treatment of text before it is output. Or, to put it in simpler terms, code injection is where a malicious user uses a text box to add HTML that they've written to your webpage.

Let's say you have a system that allows users to register as members to your site and that they are allowed to create their own username. They fill out a form, and you insert the data they enter, once you've made it safe to use in a SQL query, into a database. Your members listing page fetches all the usernames from the database and lists them, outputting exactly what is in the database to anyone that views that page.

Now, let's say you've not added a limit to username lengths. Someone could, if they wanted, create a user with the following username:
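
For illustration, a username of that kind might look like the one below. The host and script name are invented, and the function is a hypothetical stand-in for a members page that prints whatever is stored in the database.

```php
<?php
// Hypothetical malicious username; the host and file name are made up.
$username = 'Dave<script src="https://attacker.example/x.js"></script>';

// A naive members page prints the stored value straight into the HTML.
function render_member_unsafe(string $username): string
{
    return '<li>' . $username . '</li>';
}

echo render_member_unsafe($username), "\n";
```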

Anyone that then views a page with that username on it will see a normal username, but a JavaScript file will have been loaded from another site, invisibly to the user.

There are plenty of uses for this. First and foremost, it allows attackers to add keyloggers, tracking scripts or porn banners to your site, or just stop your site working altogether. There are several ways to ensure this doesn't happen. First, if you want to allow people to use greater-than and less-than signs in their usernames, you could encode HTML in usernames. If not, you can strip these characters out, or strip out HTML tags altogether.
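
Both options can be sketched with PHP's built-in htmlspecialchars() and strip_tags() functions. The wrapper names here are hypothetical, and UTF-8 is assumed as the page encoding.

```php
<?php
// Encode HTML special characters so the browser shows them as plain text.
function render_member(string $username): string
{
    return '<li>' . htmlspecialchars($username, ENT_QUOTES, 'UTF-8') . '</li>';
}

// Alternatively, remove anything that looks like an HTML tag.
function strip_username(string $username): string
{
    return strip_tags($username);
}
```

Note that strip_tags() removes the tags themselves but keeps any text between them, so encoding on output is usually the safer default.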

Another, better way to approach this is to limit the character set that can be used in usernames. If you only allow letters and numbers, for example, you could simply use a regular expression in the signup process to validate the username and force the user to pick another if they have disallowed characters in their username. Obviously the problem is not just applicable to usernames - however, as with most other security concerns, being quite paranoid will ensure that you always check data coming from a user before outputting it, and sanitise it in an appropriate way.
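
A minimal sketch of such a check, assuming usernames of 3 to 20 letters and digits (the length limits and the function name are assumptions):

```php
<?php
// Accept only letters and digits. \A and \z anchor the pattern to the whole
// string; a bare "$" would also accept a trailing newline.
function is_valid_username(string $username): bool
{
    return preg_match('/\A[A-Za-z0-9]{3,20}\z/', $username) === 1;
}
```

The length cap also deals with the missing limit on username lengths mentioned earlier.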

Aftermath

Part of a good security strategy is the assumption that at some point everything (and I mean everything) will be deleted or destroyed. It is wise to assume that at some point any security measure you have in place will be compromised. All data may be taken (which is one reason why it is important to encrypt things like passwords and credit card numbers in databases), all files deleted and so on.

One part of PHP development, though perhaps not directly about PHP security, is ensuring that after a catastrophic failure a site can be brought back online quickly. While downtime of four hours may be acceptable with a low-traffic point-of-presence site, any ecommerce retailer is going to erupt with fury at the thought of that much lost revenue.

Dealing with the client under these circumstances is the first step. Often, your first inkling of a problem with a site may actually come from the client. They may have phoned you and could be angry, worried, or a myriad of other emotions. At moments like this, you will be very glad to have a clear contingency plan in place. Many developers panic when the client phones saying their front page has been defaced. Stick to your action plan and to your client you will seem confident and unfazed. That will relax them. The plan will also allow you to resolve the problem far faster.

First, find out what happened. Are you dealing with a security breach or has someone at the host company tripped over a power lead? Was the database compromised, or deleted, as a result of an attack or was your server simply unable to cope with too much traffic? You need to know what has happened in order to deal with it - a site going offline could be down to too many factors to just assume it is a security problem.

Assuming this is a security problem, the next step is to reassure the client. Let them know what has happened. If someone got into the database, no problem - all sensitive data is encrypted. If they've uploaded files to your server (quite possible), you'll have to delete all files and restore from a backup.

You've got to find out how the attacker broke into your system. Check log files, if you have access to them. Also, have a look at hacker and cracker web sites - many of them will list successful attacks against servers by various groups (these are often what are known as "script kiddies" - not hackers as such, but people exploiting vulnerabilities found by others). You may well find your site listed and that listing will give you invaluable information. Look at other sites brought down by the same group at around the same time - you will often spot a theme (e.g. all sites that have been attacked were running the same version of IIS or Apache, were all running phpBB, or are all file repositories running on CFML).

If you are running any third party software on the site, check the distribution site and if necessary get in touch with them, especially if other sites running the same software appear to have been compromised.

It is very important that you fix any hole there may be before you restore the site. It would be wise to add a "We are currently undergoing essential maintenance" page, but do not fully restore the site before you have found out and fixed whatever the problem was - you'll be wasting your time.

Shared Hosting

Shared hosting is much cheaper than dedicated hosting, and is where several sites are all hosted on the same server. Most sites are hosted this way, and this brings with it its own set of security issues.

First and foremost, the security of your site is, in these circumstances, almost entirely out of your hands. It is dependent on the hosting company you are with. They may be excellent, or they may be crooks. Check reviews of a company before you select them, as they will have access to all the data you store with them. There is no harm in being automatically suspicious of your hosting company.

If they are completely above board (and most are), you are still not necessarily secure with shared hosting. The security measures they put in place are generally pretty simple. Shared hosting servers should always use PHP's safe mode (which disables many of the more advanced and dangerous features of PHP; note that safe mode was deprecated in PHP 5.3 and removed in PHP 5.4). That is what it is there for. However, many don't.

Vulnerabilities associated with shared hosting are, for the most part, out of your hands. A badly set up server will allow any site on that server to access files like /etc/passwd and httpd.conf, often giving them access to all other sites on the same server. It is possible to secure yourself to some degree against the effects of this vulnerability. Storing information in a database is recommended. Of course, if you then store your database login in a file, an attacker could access this information. In order to make this inaccessible to others on the same server, you could set database login information within the httpd.conf file, using environment variables (you will need to ask your host company to add the lines to the httpd.conf file).
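
As a rough sketch of the environment variable approach: Apache's SetEnv directive can define the values, and PHP can read them with getenv(). The variable names and the helper function below are hypothetical, and this assumes PHP running as an Apache module; other setups may expose the values only through $_SERVER.

```php
<?php
// Assumed lines added by the host to your site's section of httpd.conf
// (the variable names and values are hypothetical):
//
//   SetEnv DB_USER "site_user"
//   SetEnv DB_PASS "a-long-random-secret"

function db_credentials(): array
{
    $user = getenv('DB_USER');
    $pass = getenv('DB_PASS');

    // getenv() returns false when a variable is not set.
    if ($user === false || $pass === false) {
        throw new RuntimeException('Database credentials are not configured.');
    }

    return ['user' => $user, 'pass' => $pass];
}
```

With this in place, no file inside your own account needs to contain the login details.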

Better yet is to ensure that your host, if shared, uses safe mode. While this is still not 100% secure (nothing is), it does help make these attacks more difficult. A dedicated server is another, far better, option, but the expense may be prohibitive.

Published: Wed, 27 Jul 2005 08:58:00 +0000
Link: https://www.addedbytes.com/articles/writing-secure-php/writing-secure-php-3/
Author: Dave Child
Tags: guide, php, programming, security, tips, web, webdev, work

Block Prefetching
https://www.addedbytes.com/blog/block-prefetching/
Mozilla and Google's prefetching functions are a nice addition to browser technology in many ways. They are not without flaws, however. The two main problems with the prefetching idea are that it messes with log files and it means every link on a page could potentially be followed regardless of the consequences (dangerous in a site administration context).

It appears from the FAQ that Google only intends their accelerator to prefetch specific pages, that have been specified with the <link> tag. However, many people are claiming that normal links have been prefetched.

Preventing prefetching of a page is simple: add the following PHP to the page you do not want prefetched:
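
A minimal sketch of such a check, assuming the prefetcher announces itself with an "X-moz: prefetch" request header (the helper function name is hypothetical):

```php
<?php
// Assumption: prefetch requests carry an "X-moz: prefetch" header, which PHP
// exposes as $_SERVER['HTTP_X_MOZ'].
function is_prefetch_request(array $server): bool
{
    return isset($server['HTTP_X_MOZ'])
        && strtolower($server['HTTP_X_MOZ']) === 'prefetch';
}

// Must run before any other output, or the header cannot be sent.
if (is_prefetch_request($_SERVER)) {
    header('HTTP/1.1 403 Forbidden');
    exit('403 Forbidden: prefetching is not allowed here.');
}
```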

This will serve a "forbidden" header to the prefetcher. Normal browsing should be unaffected.

Published: Wed, 20 Apr 2005 15:16:00 +0000
Link: https://www.addedbytes.com/blog/block-prefetching/
Author: Dave Child
Tags: block, google, mozilla, php, prefetching, reference, webdev

Writing Secure PHP, Part 2
https://www.addedbytes.com/articles/writing-secure-php/writing-secure-php-2/
In Writing Secure PHP, I covered a few of the most common security holes in websites. It's time to move on, though, to a few more advanced techniques for securing a website. As techniques for 'breaking into' a site or crashing a site become more advanced, so must the methods used to stop those attacks.

File Systems

Most hosting environments are very similar, and rather predictable. Many web developers are also very predictable. It doesn't take a genius to guess that a site's includes directory (and most dynamic sites use an includes directory for common files) is at www.website.com/includes/. If the site owner has allowed directory listing on the server, anyone can navigate to that folder and browse files.

Imagine for a second that you have a database connection script, and you want to connect to the database from every page on your site. You might well place that in your includes folder, and call it something like connect.inc. However, this is very predictable - many people do exactly this. Worst of all, a file with the extension ".inc" is usually rendered as text and output to the browser, rather than processed as a PHP script - meaning if someone were to visit that file in a browser, they'll be given your database login information.

Placing important files in predictable places with predictable names is a recipe for disaster. Placing them outside the web root can help to lessen the risk, but is not a foolproof solution. The best way to protect your important files from vulnerabilities is to place them outside the web root, in an unusually-named folder, and to make sure that error reporting is set to off (which should make life difficult for anyone hoping to find out where your important files are kept). You should also make sure directory listing is not allowed, and that every folder contains (at least) a file named "index.html", so that nobody can ever see the contents of a folder.
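
One way to keep file paths out of error output is sketched below. It is a slight variation on simply turning error reporting off: errors are still recorded, but they go to the server log instead of the visitor's browser.

```php
<?php
// Keep reporting errors, but send them to the log rather than the page,
// so file paths never appear in what the visitor sees.
error_reporting(E_ALL);
ini_set('display_errors', '0');
ini_set('log_errors', '1');
```

The same settings can be made in php.ini, which also covers errors raised before a script starts running.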

Never, ever, give a file the extension ".inc". If you must have ".inc" in the extension, use the extension ".inc.php", as that will ensure the file is processed by the PHP engine (meaning that anything like a username and password is not sent to the user). Always make sure your includes folder is outside your web root, and not named something obvious. Always make sure you add a blank file named "index.html" to all folders like include or image folders. Even if you deny directory listing yourself, you may one day change hosts, or someone else may alter your server configuration. If directory listing is then allowed, your index.html file will make sure the user always receives a blank page rather than the directory listing. Also, always make sure directory listing is denied on your web server (easily done with .htaccess or httpd.conf).

------

Out of sheer curiosity, shortly after writing this section of this tutorial, I decided to see how many sites I could find in a few minutes vulnerable to this type of attack. Using Google and a few obvious search phrases, I found about 30 database connection scripts, complete with usernames and passwords. A little more hunting turned up plenty more open include directories, with plenty more database connections and even FTP details. All in, it took about ten minutes to find enough information to cause serious damage to around 50 sites, without even using these vulnerabilities to see if it were possible to cause problems for other sites sharing the same server.

-----

Login Systems

Most site owners now require an online administration area or CMS (content management system), so that they can make changes to their site without needing to know how to use an FTP client. Often, these are placed in predictable locations (as covered in the last article); however, placing an administration area in a hard-to-find location isn't enough to protect it.

Most CMSes allow users to change their password to anything they choose. Many users will pick an easy-to-remember word, often the name of a loved one or something similar with special significance to them. Attackers will use something called a "dictionary attack" (a form of "brute force attack") to break this kind of protection. A dictionary attack involves entering each word from the dictionary in turn as the password until the correct one is found.

The best way to protect against this is threefold. First, you should add a Turing test (now usually called a CAPTCHA) to the login page. Have a randomly generated series of letters and numbers on the page that the user must enter to log in. Make sure this series changes each time the user tries to log in, that it is an image (rather than simple text), and that it cannot be identified by an optical character recognition script.
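
The random series itself could be produced along these lines. This is a sketch only: the function names, the six-character length and the reduced alphabet are assumptions, and turning the code into a distorted image is left out.

```php
<?php
// Generate a random challenge code. Rendering it as a distorted image
// (for example with the GD extension) is not covered by this sketch.
function make_challenge(int $length = 6): string
{
    // Look-alike characters such as 0/O and 1/I are left out (an assumption).
    $alphabet = 'ABCDEFGHJKLMNPQRSTUVWXYZ23456789';
    $code = '';
    for ($i = 0; $i < $length; $i++) {
        $code .= $alphabet[random_int(0, strlen($alphabet) - 1)];
    }
    return $code;
}

// Compare what the user typed with the code stored for this attempt.
function challenge_matches(string $expected, string $typed): bool
{
    return hash_equals($expected, strtoupper(trim($typed)));
}
```

The expected code would normally be kept in the session and replaced on every attempt, so that an old code cannot be reused.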

Second, add in a simple counter. If you detect a certain number of failed logins in a row, disable logging in to the administration area until it is reactivated by someone responsible. If you only allow each potential attacker a small number of attempts to guess a password, they will have to be very lucky indeed to gain access to the protected area. This might be inconvenient for authentic users, but it is usually a price worth paying.

Finally, make sure you track IP addresses of both those users who successfully log in and those who don't. If you spot repeated attempts from a single IP address to access the site, you may consider blocking access from that IP address altogether.
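
The counter and the IP tracking can be combined, as in the sketch below. The class name and the limit of five attempts are assumptions, and the array stands in for what would be a database table on a real site.

```php
<?php
// In-memory stand-in for a table of failed attempts, keyed by IP address.
// The class name and MAX_ATTEMPTS value are illustrative assumptions.
final class LoginThrottle
{
    private const MAX_ATTEMPTS = 5;

    /** @var array<string, int> consecutive failures per IP address */
    private array $failures = [];

    public function isBlocked(string $ip): bool
    {
        return ($this->failures[$ip] ?? 0) >= self::MAX_ATTEMPTS;
    }

    public function recordFailure(string $ip): void
    {
        $this->failures[$ip] = ($this->failures[$ip] ?? 0) + 1;
    }

    // Called after a successful login, or by an administrator to reactivate.
    public function reset(string $ip): void
    {
        unset($this->failures[$ip]);
    }
}
```

The login script would call isBlocked() before checking a password at all, and recordFailure() whenever the check fails.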

Database Users

One excellent precaution is to make sure that, even if someone who shouldn't be able to does gain access to your database, the damage they can cause is limited. Modern databases like MySQL and SQL Server allow you to control what a user can and cannot do. You can grant (or withhold) permission to create, edit and delete data, and more. Usually, I try to ensure that I only allow users to add and edit data.

If a site requires an item be deleted, I will usually set the front end of the site to only appear to delete the item. For example, you could have a numeric field called "item_deleted", and set it to 1 when an item is deleted. You can then use that to prevent users seeing these items. You can then purge these later if required, yourself, while not giving your users "delete" permissions for the database. If a user cannot delete or drop tables, neither can someone who finds out the user login to the database (though obviously they can still do damage).
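
The "item_deleted" idea might look something like this. The sketch uses PDO with an in-memory SQLite database purely so that it runs without a server; the table, its other columns and the function names are assumptions.

```php
<?php
// In-memory SQLite keeps the sketch self-contained; the "items" table and
// its columns (other than item_deleted) are assumptions.
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec('CREATE TABLE items (
    id INTEGER PRIMARY KEY,
    name TEXT,
    item_deleted INTEGER NOT NULL DEFAULT 0
)');
$db->exec("INSERT INTO items (name) VALUES ('first'), ('second')");

// "Deleting" only flags the row, so the database user needs UPDATE
// permission but never DELETE.
function soft_delete(PDO $db, int $id): void
{
    $stmt = $db->prepare('UPDATE items SET item_deleted = 1 WHERE id = :id');
    $stmt->execute([':id' => $id]);
}

// Listings simply skip flagged rows.
function visible_items(PDO $db): array
{
    return $db->query('SELECT name FROM items WHERE item_deleted = 0 ORDER BY id')
              ->fetchAll(PDO::FETCH_COLUMN);
}

soft_delete($db, 1);
```

On MySQL, the matching account could be limited with a statement along the lines of GRANT SELECT, INSERT, UPDATE ON shop.* TO 'site_user'@'localhost' (the database and user names here are hypothetical).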

Powerful Commands

PHP contains a variety of commands with access to the operating system of the server, and that can interact with other programs. Unless you need access to these specific commands, it is highly recommended that you disable them entirely.

For example, the eval() function allows you to treat a string as PHP code and execute it. This can be a useful tool on occasion. However, if you use the eval() function on any input from the user, the user could cause all sorts of problems. You could be, without careful input validation, giving the user free rein to execute whatever commands he or she wants.
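
Where eval() is tempting because the user chooses what should happen, one alternative is to map a short list of permitted names onto code you wrote yourself. The sketch below assumes a simple calculator; the operation names and the function are hypothetical.

```php
<?php
// Instead of eval()-ing text from the user, look the requested name up in a
// fixed list of operations. The names here are hypothetical.
function run_operation(string $name, float $a, float $b): float
{
    $allowed = [
        'add'      => fn(float $x, float $y): float => $x + $y,
        'subtract' => fn(float $x, float $y): float => $x - $y,
        'multiply' => fn(float $x, float $y): float => $x * $y,
    ];

    if (!isset($allowed[$name])) {
        throw new InvalidArgumentException('Operation not permitted.');
    }

    return $allowed[$name]($a, $b);
}
```

Anything the user sends that is not in the list is rejected, so no user-supplied text is ever executed as code.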

There are ways to get around this. Not using eval() is a good start. However, the php.ini file gives you a way to completely disable certain functions in PHP - "disable_functions". This directive of the php.ini file takes a comma-separated list of function names, and will completely disable these in PHP. Commonly disabled functions include ini_set(), exec(), fopen(), popen(), passthru(), readfile(), file(), shell_exec() and system().
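
As a sketch, the php.ini line is shown in the comment below, followed by a small check that reports which functions are still callable. The list of names is only an example, and the helper function is hypothetical.

```php
<?php
// Assumed php.ini line (trim the list to what your own code can live without):
//
//   disable_functions = exec,passthru,shell_exec,system,popen
//
// The check below returns whichever of the given names are still callable.
function still_enabled(array $functions): array
{
    $disabled = array_map(
        'trim',
        explode(',', (string) ini_get('disable_functions'))
    );

    return array_values(array_filter(
        $functions,
        fn(string $name): bool =>
            function_exists($name) && !in_array($name, $disabled, true)
    ));
}
```

Calling still_enabled(['exec', 'passthru', 'shell_exec', 'system', 'popen']) on a hardened server should return an empty array.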

It may be (it usually is) worth enabling safe_mode on your server. This instructs PHP to limit the use of functions and operators that can be used to cause problems. If it is possible to enable safe_mode and still have your scripts function, it is usually best to do so.

Finally, Be Completely and Utterly Paranoid

Much as I hate to bring this point up again, it still holds true (and always will). Most of the above problems can be avoided through careful input validation. Some become obvious points to address when you assume everyone is out to destroy your site. If you are prepared for the worst, you should be able to deal with anything.

Published: Tue, 22 Mar 2005 16:53:00 +0000
Link: https://www.addedbytes.com/articles/writing-secure-php/writing-secure-php-2/
Author: Dave Child
Tags: development, imported, mysql, php, programming, security, web, webdev, work

Password Protect a Directory with .htaccess
https://www.addedbytes.com/blog/code/password-protect-a-directory-with-htaccess/
Password protecting a directory can be done several ways. Many people use PHP or ASP to verify users, but if you want to protect a directory of files or images (for example), that often isn't practical. Fortunately, Apache has a built-in method for protecting directories from prying eyes, using the .htaccess file.

In order to protect your chosen directory, you will first need to create an .htaccess file. This is the file that the server will check before allowing access to anything in the same directory. That's right, the .htaccess file belongs in the directory you are protecting, and you can have one in each of as many directories as you like.

You'll need first to define a few parameters for the .htaccess file. It needs to know where to find certain information, for example a list of valid usernames and passwords. This is a sample of the few lines required in an .htaccess file to begin with, telling it where the usernames and passwords can be found, amongst other things.
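
A sample of this kind, using the standard Apache directives, might read as follows. The path and the realm name are hypothetical and should be replaced with your own.

```
AuthUserFile /full/path/to/safedir/.htpasswd
AuthName "Restricted Area"
AuthType Basic
```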

You've now defined a few basic parameters for Apache to manage the authorisation process. First, you've defined the location of the .htpasswd file. This is the file that contains all the usernames and encrypted passwords for your site. We'll cover adding information to this file shortly. It's extremely important that you place this file outside of the web root. You should only be able to access it by FTP, not over the web.

The AuthName parameter basically just defines the title of the password entry box when the user logs in. It's not exactly the most important part of the file, but should be defined. The AuthType tells the server what sort of processing is in use, and "Basic" is the most common and perfectly adequate for almost any purpose.

We've told Apache where to find files, but we've not told it who, of those people defined in the .htpasswd file, can access the directory. For that reason, we still have another line to define.

If we want to grant access to everyone in the .htpasswd file, we can add this line ("valid-user" is like a keyword, telling Apache any user will do):

require valid-user

If we want to just grant access to a single user, we can use "user" and their username instead of "valid-user":

require user dave

Now we have almost everything defined, but we are still missing an .htpasswd file. Without that, the server won't know what usernames and passwords are ok.

An .htpasswd file is made up of a series of lines, one for each valid user. Each line looks like this, with a username, then colon, then encrypted password:

username:encryptedpassword

The password encryption is the same as you'll find in PHP's crypt() function. It is not reversible, so you can't find out a password from the encrypted version. (Please note that on page 2 of this article there is a tool to help you generate an .htpasswd file, which will help you encrypt passwords.)

A user of "dave" and password of "dave" might be added with the following line:

dave:XO5UAT7ceqPvc

Each time you run an encryption function like "crypt", you will almost certainly get a different result. This is down to something called "salt", which in the above case was "XO" (the first two characters of the encrypted password). Different salt will give different encrypted values, and if not explicitly specified will be randomly generated. Don't worry though, the server is quite capable of understanding all this - if you come up with a different value for the encrypted password and replace it, everything will still work fine, as long as the password is the same.
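
The effect of the salt can be seen with PHP's crypt() function. In this sketch the second salt ("ab") and the helper function name are arbitrary choices for illustration.

```php
<?php
// Checking a password against a stored hash: passing the stored hash as the
// salt makes crypt() reuse the salt embedded at the start of it.
function password_matches(string $password, string $storedHash): bool
{
    return hash_equals($storedHash, crypt($password, $storedHash));
}

// The same password with two different two-character salts.
$hashA = crypt('dave', 'XO');
$hashB = crypt('dave', 'ab');
```

The two hashes differ, yet both still verify against the password "dave", which is why either value would work in the .htpasswd file.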

Once you've created your .htpasswd file, you need to upload it to a safe location on your server, and check you've set the .htaccess file to point to it correctly. Then, upload the .htaccess file to the directory you want to protect and you'll be all set. Simply visit the directory to check it is all working.

.htpasswd Generator

The .htpasswd file needs encrypted passwords, which can be a problem for anyone without experience with a programming language. For that reason, I've created this simple tool, which, if you enter the username and password you wish to use, will generate the appropriate line to add to your .htpasswd file.
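
For anyone who would rather generate the line themselves, a rough PHP equivalent is sketched below. The function name is hypothetical, and it assumes the traditional crypt() format with a random two-character salt, as described above.

```php
<?php
// Build one .htpasswd line using traditional crypt() with a random
// two-character salt. The function name is hypothetical.
function htpasswd_line(string $username, string $password): string
{
    if ($username === '' || strpbrk($username, ":\r\n") !== false) {
        throw new InvalidArgumentException(
            'Usernames must be non-empty and must not contain colons or line breaks.'
        );
    }

    // The 64 characters that are valid in a traditional crypt() salt.
    $chars = './0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz';
    $salt  = $chars[random_int(0, 63)] . $chars[random_int(0, 63)];

    return $username . ':' . crypt($password, $salt);
}
```

Echoing htpasswd_line('dave', 'dave') produces a line in the username:encryptedpassword format shown earlier. Bear in mind that this traditional format only takes the first eight characters of the password into account.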