Finding Causes of Heavy Usage

You can use the PuTTY client on Windows, or SSH on UNIX and UNIX-like systems such as Linux or Mac OS X. Your account must be configured for shell access in the Control Panel.

Overview

Your site may sometimes use more resources than you expect. When that happens, you may notice the site loading slowly or not at all (downtime). This may be due to heavy usage on your website.

The main causes of heavy usage are bots and inefficient scripts.

Plugins and modules that add functionality to dynamic websites can also compromise performance. Reducing the number of plugins and modules will almost always result in consuming fewer resources (unless a plugin or module is specifically designed to make the site more efficient, such as a caching plugin or an anti-spam plugin).

Since removing a plugin means losing its functionality, this wiki guides you through alternative ways of finding the causes of heavy usage and mitigating their impact.

Viewing your access.log

You can confirm exactly what is hitting your site in your access.log file.

If you have a VPS or Dedicated plan

The following command generates a list of all traffic for all domains (useful when hosting multiple domains on a VPS or Dedicated server).

You can run this command from within any directory.

tail -f -q /home/*/logs/*/http/access.log
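If you would rather summarize past traffic than watch it live, a short sketch like the following counts the most frequent client IPs. This assumes the default combined log format, where the client IP is the first field of each line; the log path is a placeholder you would replace with your own:

# count the 10 most frequent client IPs in one domain's access log
awk '{print $1}' /home/username/logs/example.com/http/access.log | sort | uniq -c | sort -rn | head -10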

You can also watch your server logs in real time to see whether the issue presents itself from a specific IP (useful for intermittent issues).

You can run this command from within any directory.
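One plausible form of such a command, assuming the same log paths as above and that the client IP is the first field of each log line, is:

tail -f -q /home/*/logs/*/http/access.log | awk '{print $1}'

This streams only the client IP of each new request, so a single IP hammering the site stands out quickly.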

Bots, spiders, and crawlers

Bots, spiders, and other crawlers hitting your dynamic pages can cause extensive resource (memory and CPU) usage. This can lead to high load on the server and slow down your site(s).

One option to reduce server load from bots, spiders, and other crawlers is to create a "robots.txt" file at the root of your site/domain. This tells search engines what content on your site they should and should not index. This can be helpful, for example, if you want to keep a portion of your site out of the Google search engine index.

If you prefer not to create this file yourself, you can have DreamHost create one for you automatically (on a per-domain basis) on the (Panel > 'Goodies' > 'Block Spiders') page.

Note:

While most of the major search engines respect robots.txt directives, this file only acts as a suggestion to compliant search engines and does not prevent search engines (or other similar tools, such as email/content scrapers) from accessing the content or making it available.

Blocking robots

The problem may be that Google, Yahoo, or another search engine bot is over-browsing your site. (This is the sort of problem that feeds on itself; if the bot is not able to complete its search because of a lack of resources, it may launch the same search over and over again.)

Blocking Googlebots

In the following example, the IP 66.249.66.167 was found in your access.log. You can check which company this IP belongs to by running the ‘host’ command:
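For example (the exact output varies; a reverse lookup of a genuine Googlebot IP resolves to a hostname ending in googlebot.com):

host 66.249.66.167
167.66.249.66.in-addr.arpa domain name pointer crawl-66-249-66-167.googlebot.com.

If you then decide to block that IP outright, a minimal .htaccess sketch follows. This assumes Apache 2.2-style access directives; Apache 2.4 uses Require instead:

# block a single crawler IP
Order Allow,Deny
Allow from all
Deny from 66.249.66.167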

Blocking Yahoo

Yahoo's crawling bots comply with the Crawl-delay rule in robots.txt, which limits their fetching activity. For example, to tell Yahoo not to fetch a page more than once every 10 seconds, you would add the following:

# slow down Yahoo
User-agent: Slurp
Crawl-delay: 10

Explanation of the fields above:

# slow down Yahoo

This is a comment, used only so you know why you created the rule.

User-agent: Slurp

Slurp is the Yahoo User-agent name. You must use this name for the rule to apply to Yahoo.

Crawl-delay

Tells the User-agent to wait 10 seconds between each request to the server.

Further information about Yahoo robots is available in Yahoo's own documentation. Once you have created an account there, you can set the crawl rate and generate a robots.txt file.

Blocking all bots

To disallow all bots:

User-agent: *
Disallow: /

To disallow all bots from a specific folder only:

User-agent: *
Disallow: /yourfolder/

Warning:

Bad bots may use this content as a list of targets.

Explanation of the fields above:

User-agent: *

Applies to all User-agents.

Disallow: /

Disallows the indexing of everything.

Disallow: /yourfolder/

Disallows the indexing of this single folder.

Use caution

Blocking all bots (User-agent: *) from your entire site (Disallow: /) will get your site de-indexed from legitimate search engines. Also note that bad bots will likely ignore your robots.txt file, so you may want to block their User-agent with an .htaccess file.

Bad bots may also use your robots.txt file as a target list, so you may want to avoid listing sensitive directories there. They may also send false or misleading User-agents, so blocking by User-agent in .htaccess may not work as well as anticipated.
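As a sketch of blocking by User-agent (assuming Apache with mod_rewrite enabled; "BadBot" here is a placeholder string, not a real crawler name):

RewriteEngine On
# return 403 Forbidden to any client whose User-agent contains "BadBot"
RewriteCond %{HTTP_USER_AGENT} BadBot [NC]
RewriteRule .* - [F,L]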

If you don't want to block anyone, this is a good default robots.txt file:

User-agent: *
Disallow:

Alternatively, you can simply remove the robots.txt file in this case, if you don't mind the 404 requests it leaves in your logs.

DreamHost recommends that you only block specific User-agents and files/directories, rather than *, unless you're 100% sure that's what you want.

Blocking bad referrers

For detailed instructions, please see the wiki article on how to block referrers.
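As a quick illustration of the idea (again assuming Apache with mod_rewrite; badreferrer.example is a placeholder domain):

RewriteEngine On
# return 403 Forbidden to any request referred from badreferrer.example
RewriteCond %{HTTP_REFERER} badreferrer\.example [NC]
RewriteRule .* - [F,L]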

My unique IP is making a lot of connections

You may find in your access.log that your site’s Unique IP is making a lot of connections. This is not an issue and can be safely ignored.

This occurs because Apache is internally generating these connections in order to shut down unneeded processes.
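If these entries clutter your log review, you can filter them out. A minimal sketch, where 203.0.113.10 is a placeholder for your site's unique IP and the log path is a placeholder as well:

# show only log lines whose client IP is not your unique IP
grep -v '^203\.0\.113\.10 ' /home/username/logs/example.com/http/access.log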