One of the most annoying things you need to deal with when you develop websites is spam and content scrapers.

If you allow people to contribute to your site through forums, blog comments, and so on, you will find that people come to your site and try to leave links to promote their own sites. Many of these links point to sites that search engines consider bad neighborhoods, and you may be penalized for linking to them. Spam also degrades your real users' experience.

You can insert rel="nofollow" in links that users contribute, but that doesn't dissuade spammers. You may even have spam filtering capabilities in your software, but some spam will still get through.

Content scrapers copy the content on your sites and build websites from that copyrighted content so they can attract search engine traffic. Even though they don't get much traffic out of it and will eventually be dropped from Google's index, the little bit they make is usually enough to make it worthwhile. The reason is that they do it in large volumes, or they live in countries where what is a little bit of money to us is considered a lot.

You'll find that many of these spammers come from Asia and Russia and the bots they run tend to run on servers from places like ThePlanet.com.

Some people use Apache .htaccess files to block these types of visitors, but I prefer to use IPFilter on the servers I run. IPFilter is software that runs on many Unix and Unix-like servers (such as Solaris) and provides firewall and network address translation (NAT) capabilities.

In this post I'll provide some information on how to use IPFilter to block spammers and content scrapers from your servers, as well as a list of IP address ranges you might want to block. I'll do my best to update the list of IP address ranges when I update them on my servers.

I prefer to use IPFilter instead of .htaccess files because it doesn't just block access to the Apache web server; it can block those IP addresses from accessing any service on your server. The reasoning is that they have already shown one bad behavior, so I don't want to give them the chance to show another.

Blocking Spam from Other Servers

I don't run mail services on my web servers. I sometimes run SMTP, but only for outgoing mail if the web application needs it. So there is no need for other servers to access my servers; only real people should. When I find a server accessing my site, I block the entire IP range for that hosting provider.

This may cause problems if you offer public web service APIs that other developers can use, but for many people, that's not the case.

One of the dedicated server providers that I've seen a lot of spam and scrapers coming from is ThePlanet.com. They don't seem to do a good job keeping these types of clients out of their network. So I choose to block them using IPFilter.

To block ThePlanet.com, or any other colocation/dedicated server provider, you need to find all their IP address ranges. This is easy to do because IP addresses are distributed by only a handful of entities. Just as there are domain name registrars, there are also IP address registries.

The largest is ARIN, the American Registry for Internet Numbers. This is where I start my search. When I find the IP address of a spammer, content scraper, or anyone else doing something I don't think they should be doing, this is the process I go through to block them.

Get IP Address from Server Logs - You've gone through your logs and noticed strange activity from an IP address. Copy that IP address somewhere while you decide if you want to block it.

Do an nslookup on the IP address - When you type nslookup ip.ad.dre.ess (substituting the actual IP address), you might get a response giving you the hostname for that IP address. This is a reverse DNS lookup. If it looks like an address from a consumer ISP that provides internet connections to home users or businesses, such as Verizon, Comcast, OptimumOnline, or RoadRunner, there's really not much I can do. These addresses are usually dynamic and are used by many people, so I don't want to block everyone just because of one bad user.

Do an IP Address Lookup - nslookup may not be able to resolve a hostname from the IP address, but you're not done yet. You can do an ARIN whois lookup to find out who owns that address range. If you get more than one name associated with the IP address, it means that the person that owns it bought it from someone else that bought it from ARIN. The IP address block owner you're looking for is the one at the bottom of the list returned by ARIN.

Sometimes it's obvious, based on the name of the company, that the IP address range is used for hosting services. Sometimes you need to Google their name and visit their website to see what they do.

Find all IP Addresses for a Hosting Company - If you've determined that it's a hosting company, and you don't want traffic from other servers on your site, then you should find all their IP addresses.

You do this by going back to ARIN's whois and searching for the domain name of the provider, for instance theplanet.com, as shown in the screenshot on the right.

This will give you a list of all the IP address ranges that they own. You'll need the starting IP address and the netmask to filter each address range, and ARIN provides this information for you. The results list shows the address range, such as 12.156.0.0 - 12.156.7.255. The netmask for that range is 255.255.248.0, so you would use either 12.156.0.0/255.255.248.0 or the equivalent CIDR notation, which would be 12.156.0.0/21.

The shortest way is to use the CIDR notation, and ARIN provides that for you. Click on a link in the results to view the full details for that range; the link will look something like NET-12-156-0-0-1. Then look for the range in CIDR format. In this case it will be 12.156.0.0/21.
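If you'd rather double-check the arithmetic than trust a copy and paste, the conversion from an address range to CIDR notation and netmask can be sketched with Python's standard ipaddress module. This is only a minimal illustration; the helper names range_to_cidrs and netmask_of are my own and not part of IPFilter or ARIN's tools.

```python
import ipaddress


def range_to_cidrs(first, last):
    """Return the CIDR blocks that exactly cover first..last (inclusive)."""
    start = ipaddress.ip_address(first)
    end = ipaddress.ip_address(last)
    # summarize_address_range yields one or more networks; a range that is
    # not aligned on a power-of-two boundary needs several blocks.
    return [str(net) for net in ipaddress.summarize_address_range(start, end)]


def netmask_of(cidr):
    """Return the dotted-quad netmask for a CIDR block."""
    return str(ipaddress.ip_network(cidr).netmask)


cidrs = range_to_cidrs("12.156.0.0", "12.156.7.255")
print(cidrs)                 # ['12.156.0.0/21']
print(netmask_of(cidrs[0]))  # 255.255.248.0
```

A range that doesn't line up neatly comes back as more than one block, and each block would need its own rule.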

You now enter that information into your IPFilter config file, normally located at /etc/ipf/ipf.conf on Solaris; other Unix and Unix-like operating systems may keep it elsewhere. Edit that file and add the following line:

block in quick on if0 from 12.156.0.0/21 to any

You need to replace if0 with the appropriate interface name for your network card (NIC). To make it easier for you, I've included the full list of exploited servers below, added my updated IP address ranges for ThePlanet.com, and formatted it all for your convenience. I started with the exploited server blocklist from Wizcrafts.net. Remember to replace if0 with the appropriate network interface on your server.

# This is Wizcrafts' Exploited Servers blocklist, in iptables APF format, for use in Linux/Apache web server firewalls.
## Compiled and published by Wizcrafts Computer Services - http://www.wizcrafts.net/
### We have safer version of this file, in .htaccess format, at: http://www.wizcrafts.net/exploited-servers-blocklist.html
#### This time consuming work is supported by donations from people who use and benefit from this blocklist.
##### Please make your donation here: http://www.wizcrafts.net/payments.html - Thanks in advance :-)
###### This page was last updated on: January 7, 2009 (may include multiple updates per day)
########## READ THESE NOTES ##########
# All of the CIDRs in this list are here because they host un-secured exploited servers, or client websites.
# Some of these servers/websites are used for spamming, while others attempt hostile script redirects or scraping.
# This list includes CIDRs for German based spammers using "Schlund + Partner AG" and "1&1 Internet AG" servers.
# This list of IP CIDRs should go into a file named "deny_hosts.rules" which is managed by your APF directives.
### Be careful! Your own web host's, or dedicated server's IP may be included in this list.
# If so, you and your users will be totally blocked from HTTP, FTP and Email access!
# If your server's IP is covered by a CIDR in this list, remove it before installing this blocklist!
# You can also direct an APF firewall to allow your own IP addresses, via an included file, named "allow_hosts.rules"
## The .htaccess version of this blocklist is safer to use, as it doesn't normally lock out access to your mail or ftp servers.
# Exploited - shared, VPS and dedicated web servers, listed by the entire CIDR assigned to each hosting company.
block in quick on if0 from 24.172.171.18 to any
block in quick on if0 from 38.100.22.104/29 to any
block in quick on if0 from 38.100.22.112/28 to any
block in quick on if0 from 38.100.22.128/26 to any
block in quick on if0 from 62.21.96.0/22 to any
block in quick on if0 from 62.75.202.0/24 to any
block in quick on if0 from 62.141.48.0/20 to any
block in quick on if0 from 62.141.56.0/21 to any
block in quick on if0 from 62.149.128.0/17 to any
block in quick on if0 from 64.15.138.160/27 to any
block in quick on if0 from 64.20.32.0/19 to any
block in quick on if0 from 64.22.64.0/18 to any
block in quick on if0 from 64.27.0.0/19 to any
block in quick on if0 from 64.34.176.0/20 to any
block in quick on if0 from 64.38.0.0/18 to any
block in quick on if0 from 64.91.224.0/19 to any
block in quick on if0 from 64.92.199.0/24 to any
block in quick on if0 from 64.92.200.0/24 to any
block in quick on if0 from 64.118.80.0/20 to any
block in quick on if0 from 64.182.0.0/16 to any
block in quick on if0 from 64.185.224.0/20 to any
block in quick on if0 from 64.191.0.0/17 to any
block in quick on if0 from 65.23.153.0/24 to any
block in quick on if0 from 65.36.128.0/17 to any
block in quick on if0 from 65.98.0.0/17 to any
block in quick on if0 from 65.167.19.30 to any
block in quick on if0 from 65.182.188.0/22 to any
block in quick on if0 from 66.7.192.0/19 to any
block in quick on if0 from 66.35.39.128/25 to any
block in quick on if0 from 66.38.130.192/26 to any
block in quick on if0 from 66.49.128.0/17 to any
block in quick on if0 from 66.79.167.128/25 to any
block in quick on if0 from 66.79.168.134/32 to any
block in quick on if0 from 66.90.64.0/18 to any
block in quick on if0 from 66.116.125.0/24 to any
block in quick on if0 from 66.148.64.0/18 to any
block in quick on if0 from 66.154.0.0/18 to any
block in quick on if0 from 66.154.64.0/19 to any
block in quick on if0 from 66.160.186.0/24 to any
block in quick on if0 from 66.186.36.195 to any
block in quick on if0 from 66.197.128.0/17 to any
block in quick on if0 from 66.199.224.0/19 to any
block in quick on if0 from 66.225.212.0/22 to any
block in quick on if0 from 66.232.96.0/19 to any
block in quick on if0 from 66.235.160.0/19 to any
block in quick on if0 from 66.235.192.0/19 to any
block in quick on if0 from 67.131.248.0/24 to any
block in quick on if0 from 67.159.0.0/18 to any
block in quick on if0 from 67.205.69.32/27 to any
block in quick on if0 from 67.228.0.0/16 to any
block in quick on if0 from 69.13.0.0/16 to any
block in quick on if0 from 69.16.192.0/18 to any
block in quick on if0 from 69.31.40.0/21 to any
block in quick on if0 from 69.31.80.0/21 to any
block in quick on if0 from 69.50.160.0/19 to any
block in quick on if0 from 69.60.111.0/24 to any
block in quick on if0 from 69.64.64.0/20 to any
block in quick on if0 from 69.65.0.0/18 to any
block in quick on if0 from 69.65.20.0/22 to any
block in quick on if0 from 69.73.128.0/18 to any
block in quick on if0 from 69.93.241.192/27 to any
block in quick on if0 from 70.87.208.34 to any
block in quick on if0 from 72.18.150.0/23 to any
block in quick on if0 from 72.21.32.0/19 to any
block in quick on if0 from 72.22.64.0/19 to any
block in quick on if0 from 72.29.64.0/19 to any
block in quick on if0 from 72.32.0.0/16 to any
block in quick on if0 from 72.36.128.0/17 to any
block in quick on if0 from 72.36.168.153/29 to any
block in quick on if0 from 72.51.32.0/20 to any
block in quick on if0 from 72.52.116.40/29 to any
block in quick on if0 from 72.52.128.0/17 to any
block in quick on if0 from 72.55.128.0/18 to any
block in quick on if0 from 72.232.0.0/16 to any
block in quick on if0 from 72.233.0.0/17 to any
block in quick on if0 from 72.249.32.0/23 to any
block in quick on if0 from 74.50.0.0/20 to any
block in quick on if0 from 74.50.96.0/19 to any
block in quick on if0 from 74.63.64.0/18 to any
block in quick on if0 from 74.86.0.0/16 to any
block in quick on if0 from 74.124.192.0/24 to any
block in quick on if0 from 74.200.192.0/18 to any
block in quick on if0 from 74.208.15.0/24 to any
block in quick on if0 from 74.208.16.0/24 to any
block in quick on if0 from 74.208.64.0/19 to any
block in quick on if0 from 77.92.88.0/23 to any
block in quick on if0 from 78.46.0.0/15 to any
block in quick on if0 from 78.129.208.0/24 to any
block in quick on if0 from 79.32.0.0/15 to any
block in quick on if0 from 79.135.160.0/19 to any
block in quick on if0 from 80.67.25.0/24 to any
block in quick on if0 from 80.67.27.0/24 to any
block in quick on if0 from 80.69.92.0/25 to any
block in quick on if0 from 80.86.80.0/20 to any
block in quick on if0 from 80.92.64.0/19 to any
block in quick on if0 from 80.237.128.0/17 to any
block in quick on if0 from 81.19.183.0/27 to any
block in quick on if0 from 81.29.70.0/24 to any
block in quick on if0 from 81.169.144.0/20 to any
block in quick on if0 from 82.61.0.0/16 to any
block in quick on if0 from 82.99.30.0/25 to any
block in quick on if0 from 82.165.128.0/17 to any
block in quick on if0 from 82.208.60.0/22 to any
block in quick on if0 from 83.65.62.0/24 to any
block in quick on if0 from 83.149.90.0/24 to any
block in quick on if0 from 84.19.176.0/20 to any
block in quick on if0 from 85.8.128.0/18 to any
block in quick on if0 from 85.10.192.0/18 to any
block in quick on if0 from 85.17.0.0/16 to any
block in quick on if0 from 85.25.0.0/16 to any
block in quick on if0 from 85.88.12.0/24 to any
block in quick on if0 from 85.113.224.0/19 to any
block in quick on if0 from 85.114.140.0/22 to any
block in quick on if0 from 87.106.0.0/16 to any
block in quick on if0 from 87.118.64.0/18 to any
block in quick on if0 from 87.118.96.0/19 to any
block in quick on if0 from 82.165.0.0/16 to any
block in quick on if0 from 87.230.0.0/20 to any
block in quick on if0 from 87.237.60.64/27 to any
block in quick on if0 from 87.253.128.0/19 to any
block in quick on if0 from 87.253.176.0/21 to any
block in quick on if0 from 88.208.238.0/24 to any
block in quick on if0 from 89.138.0.0/16 to any
block in quick on if0 from 89.149.192.0/18 to any
block in quick on if0 from 89.163.128.0/17 to any
block in quick on if0 from 91.121.0.0/16 to any
block in quick on if0 from 91.186.0.0/19 to any
block in quick on if0 from 91.192.116.0/22 to any
block in quick on if0 from 92.48.64.0/18 to any
block in quick on if0 from 92.48.65.0/24 to any
block in quick on if0 from 92.48.112.64/26 to any
block in quick on if0 from 92.56.0.0/16 to any
block in quick on if0 from 92.243.8.0/21 to any
block in quick on if0 from 94.102.48.0/20 to any
block in quick on if0 from 193.164.132.0/23 to any
block in quick on if0 from 193.192.58.0/23 to any
block in quick on if0 from 193.254.184.0/24 to any
block in quick on if0 from 194.116.186.0/23 to any
block in quick on if0 from 195.56.55.0/28 to any
block in quick on if0 from 195.56.189.32/28 to any
block in quick on if0 from 195.225.176.0/22 to any
block in quick on if0 from 195.234.171.0/24 to any
block in quick on if0 from 195.242.98.0/23 to any
block in quick on if0 from 200.63.40.0/22 to any
block in quick on if0 from 204.13.64.0/21 to any
block in quick on if0 from 205.177.79.0/24 to any
block in quick on if0 from 205.178.128.0/18 to any
block in quick on if0 from 205.234.96.0/20 to any
block in quick on if0 from 205.234.132.0/24 to any
block in quick on if0 from 206.51.224.0/20 to any
block in quick on if0 from 206.188.0.0/26 to any
block in quick on if0 from 206.190.65.128/25 to any
block in quick on if0 from 207.58.128.0/18 to any
block in quick on if0 from 207.150.188.0/24 to any
block in quick on if0 from 207.234.128.0/17 to any
block in quick on if0 from 208.53.128.0/18 to any
block in quick on if0 from 208.66.68.0/22 to any
block in quick on if0 from 208.66.194.160/28 to any
block in quick on if0 from 208.71.128.0/22 to any
block in quick on if0 from 208.99.192.0/19 to any
block in quick on if0 from 208.101.0.0/18 to any
block in quick on if0 from 208.109.0.0/16 to any
block in quick on if0 from 208.112.107.20 to any
block in quick on if0 from 208.184.65.0/24 to any
block in quick on if0 from 209.2.34.112/28 to any
block in quick on if0 from 209.9.240.0/21 to any
block in quick on if0 from 209.25.128.0/17 to any
block in quick on if0 from 209.40.192.0/20 to any
block in quick on if0 from 209.59.167.50/31 to any
block in quick on if0 from 209.66.122.0/24 to any
block in quick on if0 from 209.85.0.0/17 to any
block in quick on if0 from 209.97.192.0/19 to any
block in quick on if0 from 209.126.128.0/17 to any
block in quick on if0 from 209.160.0.0/18 to any
block in quick on if0 from 209.160.64.0/20 to any
block in quick on if0 from 209.163.169.0/24 to any
block in quick on if0 from 209.172.32.0/19 to any
block in quick on if0 from 209.200.0.0/18 to any
block in quick on if0 from 209.205.0.0/18 to any
block in quick on if0 from 213.165.64.0/19 to any
block in quick on if0 from 213.194.149.0/24 to any
block in quick on if0 from 213.225.101.128/27 to any
block in quick on if0 from 216.17.96.0/20 to any
block in quick on if0 from 216.32.64.0/19 to any
block in quick on if0 from 216.93.160.0/19 to any
block in quick on if0 from 216.104.37.120/29 to any
block in quick on if0 from 216.120.224.0/19 to any
block in quick on if0 from 216.180.224.0/19 to any
block in quick on if0 from 216.182.224.0/20 to any
block in quick on if0 from 216.185.128.0/24 to any
block in quick on if0 from 216.242.44.96 to any
block in quick on if0 from 216.245.192.0/20 to any
block in quick on if0 from 216.255.176.0/20 to any
block in quick on if0 from 217.20.208.0/20 to any
block in quick on if0 from 217.70.128.0/22 to any
block in quick on if0 from 217.70.132.0/23 to any
block in quick on if0 from 217.169.46.96/28 to any
block in quick on if0 from 217.172.187.0/24 to any
block in quick on if0 from 217.197.152.0/24 to any
# Proxy servers and services and hosting companies with proxy server clients, listed by the full CIDR of the hosting company.
block in quick on if0 from 61.206.125.0/24 to any
block in quick on if0 from 62.171.194.0/23 to any
block in quick on if0 from 75.126.0.0/16 to any
block in quick on if0 from 80.33.0.0/16 to any
block in quick on if0 from 80.58.0.0/16 to any
block in quick on if0 from 81.12.0.0/17 to any
block in quick on if0 from 83.16.154.152/29 to any
block in quick on if0 from 85.10.219.104/29 to any
block in quick on if0 from 85.92.130.0/24 to any
block in quick on if0 from 85.185.0.0/16 to any
block in quick on if0 from 88.198.241.104/29 to any
block in quick on if0 from 88.198.252.144/29 to any
block in quick on if0 from 145.253.239.8/29 to any
block in quick on if0 from 150.188.0.0/15 to any
block in quick on if0 from 193.164.131.0/24 to any
block in quick on if0 from 194.112.195.202 to any
block in quick on if0 from 198.145.112.128/25 to any
block in quick on if0 from 198.145.182.0/26 to any
block in quick on if0 from 200.30.64.0/20 to any
block in quick on if0 from 200.43.108.0/24 to any
block in quick on if0 from 200.75.128.0/20 to any
block in quick on if0 from 200.126.112.0/20 to any
block in quick on if0 from 200.172.222.0/26 to any
block in quick on if0 from 200.202.192.0/18 to any
block in quick on if0 from 200.210.0.0/16 to any
block in quick on if0 from 203.160.0.0/23 to any
block in quick on if0 from 207.44.128.0/17 to any
block in quick on if0 from 207.210.192.0/18 to any
block in quick on if0 from 208.72.159.68 to any
block in quick on if0 from 208.110.68.144/29 to any
block in quick on if0 from 216.104.32.0/20 to any
# Individual Proxy Server IPs
block in quick on if0 from 64.20.205.251 to any
block in quick on if0 from 64.202.161.130 to any
block in quick on if0 from 66.6.122.130 to any
block in quick on if0 from 66.36.230.163 to any
block in quick on if0 from 66.37.153.74 to any
block in quick on if0 from 66.63.167.166 to any
block in quick on if0 from 66.79.162.102 to any
block in quick on if0 from 66.212.18.89 to any
block in quick on if0 from 69.50.208.74 to any
block in quick on if0 from 69.94.124.137 to any
block in quick on if0 from 72.55.146.175 to any
block in quick on if0 from 72.167.115.164 to any
block in quick on if0 from 74.208.16.108 to any
block in quick on if0 from 75.175.243.195 to any
block in quick on if0 from 76.76.15.73 to any
block in quick on if0 from 77.235.40.189 to any
block in quick on if0 from 85.92.130.117 to any
block in quick on if0 from 88.198.5.220 to any
block in quick on if0 from 88.214.192.24 to any
block in quick on if0 from 91.186.21.78 to any
block in quick on if0 from 206.221.184.108 to any
block in quick on if0 from 208.100.20.148 to any
# ThePlanet.com and Everyones Internet; home of many spammers, hackers and trojan horses.
block in quick on if0 from 12.156.0.0/21 to any
block in quick on if0 from 12.96.160.0/21 to any
block in quick on if0 from 207.218.192.0/18 to any
block in quick on if0 from 216.12.192.0/19 to any
block in quick on if0 from 174.132.0.0/15 to any
block in quick on if0 from 216.234.224.0/19 to any
block in quick on if0 from 216.185.96.0/19 to any
block in quick on if0 from 216.40.192.0/18 to any
block in quick on if0 from 69.41.224.0/19 to any
block in quick on if0 from 64.246.0.0/18 to any
block in quick on if0 from 216.127.64.0/19 to any
block in quick on if0 from 207.44.128.0/17 to any
block in quick on if0 from 64.5.32.0/19 to any
block in quick on if0 from 69.56.128.0/17 to any
block in quick on if0 from 66.98.128.0/17 to any
block in quick on if0 from 69.93.0.0/16 to any
block in quick on if0 from 67.15.0.0/16 to any
block in quick on if0 from 70.84.0.0/14 to any
block in quick on if0 from 209.85.0.0/17 to any
block in quick on if0 from 74.52.0.0/14 to any
block in quick on if0 from 209.62.0.0/17 to any
block in quick on if0 from 75.125.0.0/16 to any
# Rackspace - Hackers, spammers, scammers and phishers
block in quick on if0 from 69.20.0.0/17 to any
block in quick on if0 from 72.3.128.0/17 to any
block in quick on if0 from 72.32.0.0/16 to any
block in quick on if0 from 74.205.0.0/17 to any
# Performance Systems International (PSI) (Spies) (entire CIDR = 38.0.0.0/8 - blocking this is not advised)
block in quick on if0 from 38.100.41.64/26 to any
# End of file

After you make the changes, you'll need to restart the ipfilter service. I use Solaris for my web servers now, and the command to restart ipfilter is svcadm restart ipfilter. After you issue that command, run svcs ipfilter and make sure the service is still running. If it's in maintenance mode, check /etc/ipf/ipf.conf for errors.
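The notes in the blocklist warn that your own server or web host may be covered by one of the ranges. Here is a rough Python sketch of how you might check an address against a rules file before you rely on it. It assumes every rule has exactly the shape used in this post (block in quick on <interface> from <address> to any), and the names RULE_RE, blocked_networks and rules_covering are hypothetical helpers of mine, not anything IPFilter provides.

```python
import ipaddress
import re

# Matches only rules shaped like the ones in this post, e.g.
#   block in quick on if0 from 12.156.0.0/21 to any
RULE_RE = re.compile(
    r"^block\s+in\s+quick\s+on\s+\S+\s+from\s+(\S+)\s+to\s+any$"
)


def blocked_networks(conf_text):
    """Yield an ip_network for every matching block rule in conf_text."""
    for line in conf_text.splitlines():
        match = RULE_RE.match(line.strip())
        if match:
            # strict=False tolerates entries with host bits set,
            # such as 72.36.168.153/29.
            yield ipaddress.ip_network(match.group(1), strict=False)


def rules_covering(address, conf_text):
    """Return the blocked networks that contain the given address."""
    ip = ipaddress.ip_address(address)
    return [str(net) for net in blocked_networks(conf_text) if ip in net]


sample = """\
# comment lines are ignored
block in quick on if0 from 12.156.0.0/21 to any
block in quick on if0 from 24.172.171.18 to any
block in quick on if0 from 72.36.168.153/29 to any
"""

print(rules_covering("12.156.3.9", sample))  # ['12.156.0.0/21']
print(rules_covering("192.0.2.10", sample))  # []
```

In practice you would read the text from your own copy of the config file and test the addresses you care about, such as your server's IP and the one you administer it from.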

You can use the techniques above to block any webhost, dedicated server or colocation provider.

Blocking Entire Countries

The above helps you block automated scripts on servers, but in less developed countries, where scamming a couple of dollars a day means a lot, there may be real people who are trying to steal your content, leave spam, or in some cases are being paid to click on advertisements. These people sometimes click on other ads too, to try to disguise their actions.

It really sucks to have to do this, but if you run a website that doesn't do business in these countries or your advertisers do not target individuals in these countries, it's just easier to block them because the ratio of spammers and scammers is likely going to be higher than legitimate users.

Wizcrafts.net maintains a list of bad IP address ranges that I used as the starting point for my IPFilter list. I then add IP ranges that I find that weren't on the list.

To convert the iptables format to ipfilter format, I simply open the list in a text editor and use two regular expressions (regexp) to add the ipfilter specific text.

There's probably an easier way to do it in one step, but I don't use regexps much and can only remember simple ones :) First I search for ^[0-9] and replace it with block in quick on if0 from \0 to add the block directive to the beginning of every line that starts with a number. Then I search for [0-9]$ and replace it with \0 to any to add the end-of-line text.
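For what it's worth, the same conversion can be done in one pass with a short script. The Python sketch below is only an illustration: the function name to_ipf_rules is made up, and it assumes the downloaded list has one address or CIDR per line, with every other line (comments, blanks) left untouched.

```python
import re

# A line that starts with a digit; trailing spaces or tabs are dropped.
ENTRY_RE = re.compile(r"^([0-9].*?)[ \t]*$", re.MULTILINE)


def to_ipf_rules(blocklist_text, interface="if0"):
    """Wrap each line that starts with a digit in an IPFilter block rule."""
    def wrap(match):
        return f"block in quick on {interface} from {match.group(1)} to any"

    return ENTRY_RE.sub(wrap, blocklist_text)


raw = "# comment\n38.100.22.104/29\n24.172.171.18  \n\n62.21.96.0/22"
print(to_ipf_rules(raw))
# # comment
# block in quick on if0 from 38.100.22.104/29 to any
# block in quick on if0 from 24.172.171.18 to any
#
# block in quick on if0 from 62.21.96.0/22 to any
```

Passing your real interface name as the second argument saves a separate search and replace for if0.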
