Optimus Cache Prime

Optimus Cache Prime (OCP) is a smart cache preloader for websites with XML sitemaps. It crawls all URLs in a given sitemap so the web server builds cached versions of the pages before visitors or search engine spiders arrive.
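To make the crawling step concrete, here is a minimal Go sketch that extracts every <loc> entry from a standard sitemaps.org <urlset>. Go is what OCP is built with, per the changelog. The names parseSitemap and urlSet and the sample sitemap are illustrative assumptions, not OCP's actual code, and fetching the sitemap over HTTP is left out.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"strings"
)

// urlSet maps the parts of a sitemaps.org <urlset> document used here.
type urlSet struct {
	URLs []struct {
		Loc string `xml:"loc"`
	} `xml:"url"`
}

// parseSitemap returns the trimmed <loc> value of every <url> entry.
func parseSitemap(data []byte) ([]string, error) {
	var s urlSet
	if err := xml.Unmarshal(data, &s); err != nil {
		return nil, err
	}
	locs := make([]string, 0, len(s.URLs))
	for _, u := range s.URLs {
		locs = append(locs, strings.TrimSpace(u.Loc))
	}
	return locs, nil
}

func main() {
	// Hypothetical sample sitemap; a real run would fetch it over HTTP.
	const sample = `<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://mysite.com/</loc></url>
  <url><loc>http://mysite.com/about/</loc></url>
</urlset>`

	locs, err := parseSitemap([]byte(sample))
	if err != nil {
		panic(err)
	}
	for _, loc := range locs {
		fmt.Println(loc) // each of these URLs would then be requested once
	}
}
```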

Since Google began penalizing websites with long response times in its rankings, serving all of your pages quickly has become more important than ever. Optimus Cache Prime helps you do that by making sure your cache is primed, so that random requests are served lightning fast. This works whether you use an in-memory cache like memcached or APC, or a flat-file cache like WP Super Cache or W3 Total Cache.

Prime each URL in a remote sitemap only if a cached version of the page doesn’t already exist, and stop after priming 20 (uncached) pages

./ocp --print http://mysite.com/sitemap.xml | xargs curl -I

Don’t prime, but run curl -I to get the response headers from each of the URLs in a sitemap or set of nested sitemaps

Run ./ocp without any parameters to get an overview and explanation of all available options.

Features

Cache checking

Nested sitemap/sitemap index support

Priming an arbitrary number of URLs simultaneously

HTTP KeepAlive and session re-use (when applicable)

Warns if the pages in your sitemap don’t load, return a 404 Not Found status, etc.

Customizable User-Agent (--ua flag)

Doubles as a general-purpose (nested) sitemap parser for use with external commands (--print flag)

Run OCP on any machine and point it at a target sitemap, and Optimus primes the links therein. The sitemap can be, for example, one that links to all of your pages, or one that lists only the high-priority pages.

If run locally on your web server, OCP can probe your static file cache before making any requests to the web server. This drastically reduces the number of requests and redundant log messages. Pages are only crawled if they aren’t already cached.

With Local mode enabled, OCP checks up to 10,000 pages per second if the cache is mostly primed from previous runs. Local mode was designed for use with W3 Total Cache and WP Super Cache for WordPress. It works with any system that uses a URL-relative flat-file cache, i.e. one where /about/ is cached on disk as e.g. ‘about’ or ‘about/index.html’.

FAQ

Q: How do I install and use OCP?
A: On Windows, download and extract the zip file above, then run ocp.exe in one of two ways. You can run it from a command prompt. Alternatively, right-click ocp.exe, create a shortcut, edit the shortcut’s Target field (in Properties) to e.g. “D:.exe” -c 10 http://mysite.com/sitemap.xml, and run the shortcut. On Linux, copy the link for your architecture above and run curl -s <link> | tar xvz. Then cd ocp and run e.g. ./ocp -c 10 http://mysite.com/sitemap.xml.

Q: Do I need to have WordPress, W3 Total Cache, WP Super Cache, memcached, … to use OCP?
A: No. All you need is an XML sitemap. To use Local mode, you need a cache that stores its pages under file/directory names relative to the original URLs. (Both W3 Total Cache and WP Super Cache do just that.)

Q: Can you demonstrate how to use Local mode?
A: You can run ocp with parameters like:

Translation: Look for already-cached files in /var/www/patrickmylund.com/wp-content/cache/supercache/patrickmylund.com, where the cached file for e.g. https://patrickmn.com/about/ is <path>/about/index.html

Q: How do I know if Local mode is working?
A: You shouldn’t see any requests from “Optimus Cache Prime” in your web server’s access log, and every run after the first should complete in less than a second.

Q: How do I preload the cache regularly?
A: The easiest way is to set up a cron job. On most Linux distributions, you can add one by running crontab -e. The entry can be e.g. */5 * * * * /home/patrick/ocp https://patrickmn.com/sitemap.xml, which runs OCP every five minutes. For more information, see Ubuntu’s Cron Howto.

(Note that cron’s environment and PATH are very minimal, so you might need to use full paths to your commands.)

Q: How can I make sure only one OCP process runs at a time?
A: You may want to run OCP e.g. every minute via cron without several copies of the program launching when priming takes a while. In that case, you can use one.sh to launch OCP.

Changelog

Compiled with the latest Go release, fixing some issues with priming sites using new SSL configurations

The new command-line parameter --insecure-ssl allows you to skip SSL certificate verification when priming sites using self-signed certificates

The Windows version of OCP is now 64-bit (let me know if you need a 32-bit version)

2.6 – 2012-04-23

The new command-line parameter --ua allows you to customize the User-Agent header set by OCP in each GET request. (Can be used if a site behaves differently depending on whether the requests “come from” a mobile device, Firefox, Internet Explorer, and so on.) Example: ocp -ua “Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; Q312461; SV1; .NET CLR 1.0.3705; .NET CLR 1.1.4322)” https://patrickmn.com/sitemap.xml

2.5 – 2012-02-11

Fixed a bug where OCP would crash when unable to establish a TCP connection to the server hosting a remote sitemap

Fixed a bug where OCP would crash when unable to establish a TCP connection to the host of a URL listed in a sitemap

Significant performance improvement when priming many uncached pages

Sitemap parser is now case-sensitive (per the XML standard): sitemaps must use all-lowercase tags

The log shown with -v now makes it apparent whether a GET request was sent to the web server or the cached page already existed on disk

2.3 - 2011-11-23

New command-line flag, --print (used on its own), which causes OCP to simply print all of the URLs read from the sitemap (or set of nested sitemaps), sorted by priority. This can be used with xargs to run arbitrary commands on the URLs, e.g. ocp --print sitemap.xml | xargs curl -I
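One way to produce this ordering in Go is to parse each <url> entry’s <loc> and <priority>, then sort by priority, highest first. Treating a missing <priority> as 0.5 follows the sitemaps.org default. Whether OCP does the same, and how it breaks ties, is an assumption here: a stable sort keeps sitemap order among equal priorities. sortedLocs is a hypothetical name.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"sort"
	"strconv"
	"strings"
)

type entry struct {
	Loc      string `xml:"loc"`
	Priority string `xml:"priority"`
}

type urlSet struct {
	URLs []entry `xml:"url"`
}

// priority parses <priority>, using the sitemaps.org default of 0.5 when the
// tag is missing or malformed.
func priority(e entry) float64 {
	p, err := strconv.ParseFloat(strings.TrimSpace(e.Priority), 64)
	if err != nil {
		return 0.5
	}
	return p
}

// sortedLocs returns the sitemap's URLs ordered by descending priority.
// SliceStable keeps the sitemap's own order among equal priorities.
func sortedLocs(data []byte) ([]string, error) {
	var s urlSet
	if err := xml.Unmarshal(data, &s); err != nil {
		return nil, err
	}
	sort.SliceStable(s.URLs, func(i, j int) bool {
		return priority(s.URLs[i]) > priority(s.URLs[j])
	})
	locs := make([]string, len(s.URLs))
	for i, e := range s.URLs {
		locs[i] = strings.TrimSpace(e.Loc)
	}
	return locs, nil
}

func main() {
	const sample = `<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://mysite.com/archive/</loc><priority>0.2</priority></url>
  <url><loc>http://mysite.com/</loc><priority>1.0</priority></url>
  <url><loc>http://mysite.com/about/</loc></url>
</urlset>`
	locs, err := sortedLocs([]byte(sample))
	if err != nil {
		panic(err)
	}
	// One URL per line, the format xargs expects on its standard input.
	for _, loc := range locs {
		fmt.Println(loc)
	}
}
```

Run as-is, it prints http://mysite.com/ first, then http://mysite.com/about/ (default 0.5), then http://mysite.com/archive/.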

2.2 - 2011-11-17

OCP now reports dead/broken links and pages that can’t be loaded (e.g. when the web server returns HTTP status 500). The warnings can be turned off with the --no-warn flag
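The check behind these warnings can be sketched in Go by treating transport errors and 4xx/5xx responses as problems. The message format, the >= 400 threshold and the checkPage helper are assumptions for illustration, not OCP’s actual output. A local test server supplies a working page, a 404 and a 500.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
)

// checkPage fetches u and returns a warning if the request fails or the
// server responds with a 4xx or 5xx status. An empty string means success.
func checkPage(client *http.Client, u string) string {
	resp, err := client.Get(u)
	if err != nil {
		return fmt.Sprintf("WARNING: could not load %s: %v", u, err)
	}
	defer resp.Body.Close()
	_, _ = io.Copy(io.Discard, resp.Body) // drain so the connection can be reused
	if resp.StatusCode >= 400 {
		return fmt.Sprintf("WARNING: %s returned %s", u, resp.Status)
	}
	return ""
}

// newDemoServer serves /ok normally and /error as a 500; any other path
// gets ServeMux's default 404 Not Found.
func newDemoServer() *httptest.Server {
	mux := http.NewServeMux()
	mux.HandleFunc("/ok", func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprintln(w, "fine")
	})
	mux.HandleFunc("/error", func(w http.ResponseWriter, r *http.Request) {
		http.Error(w, "something broke", http.StatusInternalServerError)
	})
	return httptest.NewServer(mux)
}

func main() {
	srv := newDemoServer()
	defer srv.Close()
	for _, p := range []string{"/ok", "/missing", "/error"} {
		if w := checkPage(srv.Client(), srv.URL+p); w != "" {
			fmt.Println(w)
		}
	}
}
```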

2.1 - 2011-11-17

Sitemap index (nested sitemap) support: OCP now primes all URLs in every sitemap listed in a sitemapindex XML file
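Supporting a sitemap index amounts to recognizing <sitemap> entries and recursing into each one’s <loc>. The Go sketch below is an assumption about how that could be done, not OCP’s code. collectURLs and the local demo server are hypothetical. The seen map is a safeguard added here against sitemaps that list each other.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
)

type locEntry struct {
	Loc string `xml:"loc"`
}

// sitemapDoc decodes either root element defined by sitemaps.org: a
// <urlset> fills URLs, a <sitemapindex> fills Sitemaps.
type sitemapDoc struct {
	URLs     []locEntry `xml:"url"`
	Sitemaps []locEntry `xml:"sitemap"`
}

// collectURLs fetches sitemapURL and returns its page URLs, recursing into
// any sitemaps it lists. seen stops a sitemap from being fetched twice.
func collectURLs(client *http.Client, sitemapURL string, seen map[string]bool) ([]string, error) {
	if seen[sitemapURL] {
		return nil, nil
	}
	seen[sitemapURL] = true

	resp, err := client.Get(sitemapURL)
	if err != nil {
		return nil, err
	}
	body, err := io.ReadAll(resp.Body)
	resp.Body.Close()
	if err != nil {
		return nil, err
	}

	var doc sitemapDoc
	if err := xml.Unmarshal(body, &doc); err != nil {
		return nil, fmt.Errorf("%s: %w", sitemapURL, err)
	}
	var urls []string
	for _, u := range doc.URLs {
		urls = append(urls, u.Loc)
	}
	for _, s := range doc.Sitemaps {
		nested, err := collectURLs(client, s.Loc, seen)
		if err != nil {
			return nil, err
		}
		urls = append(urls, nested...)
	}
	return urls, nil
}

// newDemoServer serves a sitemap index that lists two ordinary sitemaps.
func newDemoServer() *httptest.Server {
	mux := http.NewServeMux()
	mux.HandleFunc("/sitemap.xml", func(w http.ResponseWriter, r *http.Request) {
		base := "http://" + r.Host
		fmt.Fprintf(w, `<sitemapindex><sitemap><loc>%s/posts.xml</loc></sitemap><sitemap><loc>%s/pages.xml</loc></sitemap></sitemapindex>`, base, base)
	})
	mux.HandleFunc("/posts.xml", func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprint(w, `<urlset><url><loc>http://mysite.com/post-1/</loc></url><url><loc>http://mysite.com/post-2/</loc></url></urlset>`)
	})
	mux.HandleFunc("/pages.xml", func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprint(w, `<urlset><url><loc>http://mysite.com/about/</loc></url></urlset>`)
	})
	return httptest.NewServer(mux)
}

func main() {
	srv := newDemoServer()
	defer srv.Close()
	urls, err := collectURLs(srv.Client(), srv.URL+"/sitemap.xml", map[string]bool{})
	if err != nil {
		panic(err)
	}
	for _, u := range urls {
		fmt.Println(u)
	}
}
```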