How sites are attacked and what you can do to protect yourself

We all know that securing your website is important, but sometimes it can seem a distant, somewhat theoretical task.

Well, let me help you focus your minds (and perhaps instil a healthy level of fear) by making it all a little more real.

I'll describe exactly how I would go about compromising your website, using simple methods and tools that are freely available to all. You may be surprised, even alarmed, by how straightforward this is.

But don't panic! I'll also outline some of the countermeasures you can use to deter the would-be hacker and keep yourself safe.

The scenario that follows is fictional, but it's based on my years in the web industry working as an ethical hacker – employed by organisations to hack their own systems in order to uncover vulnerabilities. My remit here is to gather as much information as possible about the sites and servers of Company A. To start with, I know only the URL.

1. Passive reconnaissance

Reconnaissance refers to the preparatory phase where an attacker gathers as much information as possible about the target prior to the attack. Passive reconnaissance involves techniques such as gathering publicly available information, using search engines, social engineering and dumpster diving – going through the bins.

Active reconnaissance (stage 2) involves using tools to interact directly with the target, such as network scanning and banner grabbing. Footprinting is the term used for collating the security profile of the organisation. Information uncovered at various network levels can include domain names, network blocks, network services and applications, systems architecture, IP addresses, phone numbers, addresses, contact names and historical changes, to name but a few.

Footprinting matters because it reveals which technologies are in use. With this information, the attack can be far more focused.

Step 1: URL/networking info

Ethical hacker: "I need to gather information about the URL, so I'll use a WHOIS tool. This provides me with plenty of information, including contact names, telephone numbers and Name Servers. Excellent – I now have two name server records, which I can test later.

"I'll perform a reverse WHOIS on the IP addresses of the Name Servers now to see what I can find. I now have the IP block, and will scan the range later to see what servers are hosted on it. Just by using one simple tool, I've gathered the registrant's company name, UK address, UK limited company number, name server IP addresses and the network block."
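The manual steps above are easy to automate. As a rough sketch – the sample WHOIS text and the record layout below are illustrative, since every registrar formats its output differently – you could pull the name server records out of a WHOIS response like this:

```python
import re

# Illustrative WHOIS output; real registrars each use their own layout.
SAMPLE_WHOIS = """\
Registrant: Company A Ltd
Registrant's address: 1 Example Street, London
Name Server: ns1.companya.example
Name Server: ns2.companya.example
"""

def name_servers(whois_text):
    """Extract 'Name Server:' entries from a WHOIS response."""
    return re.findall(r"^Name Server:\s*(\S+)", whois_text, re.MULTILINE)

print(name_servers(SAMPLE_WHOIS))
# → ['ns1.companya.example', 'ns2.companya.example']
```

Feed this the real output of a `whois` query and you have your target list for the next steps.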

Countermeasures: There's very little you can do about this, I'm afraid. As an individual, you can usually opt out of having your details published in the public WHOIS; companies generally can't.

Step 2: URL gathering

Ethical hacker: "I know the URL, so I'll use serversniff.net to discover subdomains and possibly other servers. These URLs may contain details about a company's products, partners, intranet and so on. I now have a list of all the subdomains on that server – Shop and Chat. I bet there's some old software I can exploit that's been forgotten about."

Countermeasures: Lots of firms have subdomains for hosting test sites, internal development, web mail etc. It's easy to find these subdomains by querying the Name Servers we discovered in step 1. Rather than create the subdomains on internet-facing DNS, you could have them on internal DNS only.
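One way to do this, assuming a BIND 9 name server, is split-horizon DNS: the internal-only names live in a view that only internal clients can query, so the internet-facing server simply doesn't know about them. The zone and network names below are illustrative:

```
// named.conf (sketch): internal clients can resolve dev.companya.example;
// the internet-facing view doesn't carry the zone at all.
view "internal" {
    match-clients { 192.168.0.0/16; };
    zone "dev.companya.example" {
        type master;
        file "zones/internal/dev.companya.example";
    };
};

view "external" {
    match-clients { any; };
    // only the public zones are declared here
};
```

With this in place, querying your public Name Servers for test and development hosts returns nothing.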

Step 3: IP address scanning

Ethical hacker: "Now I have several IPs that I can scan for host names from the WHOIS Net Block. I can scan each of these servers for any additional domains that they're hosting. This could lead to more vulnerabilities and information leaks. Again, I'll use serversniff.net."

Countermeasures: Most IP addresses have a host name associated with them, which can be found by doing a simple reverse lookup on the IP (with the host command, for example). What this doesn't tell you is all the other domains hosted on the server: Apache and IIS both support virtual hosting, so a single server – and a single IP – can serve many websites. Finding these additional websites may lead to new sources of information, or old sites that have been forgotten. Again, there's not a lot that can be done to prevent this.
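The reverse lookup itself is a one-liner in most languages. A minimal Python sketch – run here against the loopback address so it works offline; in practice you'd loop over every IP in the netblock found in step 1:

```python
import socket

def reverse_lookup(ip):
    """Return the host name registered for an IP, or None if there isn't one."""
    try:
        name, _aliases, _addresses = socket.gethostbyaddr(ip)
        return name
    except socket.herror:
        return None

print(reverse_lookup("127.0.0.1"))  # typically 'localhost'
```

Note that this only returns the registered reverse (PTR) name – as the text says, it won't reveal the other virtually hosted sites sharing that IP.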

Step 4: Google hacking

Google hacking is the art of creating complex search engine queries in order to filter information related to the target. In its malicious form, it can be used to detect websites that are vulnerable to numerous exploits and vulnerabilities. It can also locate private, sensitive information about the target. The techniques can also be used on Bing, Yahoo etc.

To find all the pages for www.example.com that have been cached by the search engine, use "site:www.example.com"; to search within those cached pages, add your term: "site:www.example.com search term". A plethora of other searches is available to dig up information about the target without ever touching the site itself.
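These operators are just strings, so building queries up programmatically is trivial. A small sketch (the `filetype:` operator is another standard one, handy for turning up stray documents):

```python
def dork(domain, term=None, filetype=None):
    """Build a search-engine query scoped to a single site."""
    parts = [f"site:{domain}"]
    if filetype:
        parts.append(f"filetype:{filetype}")
    if term:
        parts.append(term)
    return " ".join(parts)

print(dork("www.example.com"))                     # site:www.example.com
print(dork("www.example.com", filetype="pdf"))     # site:www.example.com filetype:pdf
print(dork("www.example.com", term="case study"))  # site:www.example.com case study
```

An attacker would feed a whole list of such queries to the engine and sift the results.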

Ethical hacker: "Rather than go straight to the website, I now want to see what the search engines have cached. Server administrators often forget that the search engines are spidering their servers, and will leave valuable information exposed or, worse still, allow access to sensitive documents. I'll use Google today and see what I can find – some internal documents and case studies turn up. I'll also have a dig around archive.org to see if there's any old information about the company."

Countermeasures: You can restrict what the search engines spider with a robots.txt file. However, not all crawlers honour it. Also be aware that, as you'll see later, the robots.txt file itself can be used to find the very directories you think you're hiding.
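For example, a robots.txt like the following keeps well-behaved crawlers out of the listed paths – but it also hands an attacker a neat, machine-readable list of the directories you'd rather they didn't see (the paths here are illustrative):

```
User-agent: *
Disallow: /admin/
Disallow: /internal-docs/
Disallow: /staging/
```

Anything genuinely sensitive should sit behind authentication, not merely be left out of the crawl.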