
Optimization & Caching Solutions, Anyone?

We've been thinking on and off about some sort of optimisation/caching solution for our flagship production site.

It's not that it receives a massive number of page impressions; rather, many pages feature hefty database queries (we're on MySQL 3.23.xx, so no subqueries here), and customers on 56k connections using old machines (pre-2001 Macs, for example) do often moan about the speed of the service. (Oh good, so the data quality is OK then!)

I know about adding indexes to tables with many rows (our biggest table has something like 15,000-20,000), although I've read that while this can yield query-speed improvements, it also adds to overall server memory/CPU load. (That said, we're due to transfer the site and its LAMP setup to a very fast machine indeed... *rubs hands*)

I'd welcome your opinions, experiences and thoughts with regard to caching mechanisms and code/db/server optimisation solutions (of which Zend Optimizer, I think, is one of the latter?)

Some of this is a little OT for sure, but I reckon it's best not to post the same topic in multiple forums.

Indexes use slightly more space on disk and make updates slower, because the indexes have to be updated along with the data. In general, though, indexes are a good thing and could well be the root of your problem (20,000 rows is small fry). CPU load will be much reduced if the DB doesn't have to scan the data linearly.

This is my first post to the forum, but the technique is very simple. I am the programmer behind a commercial CMS, chaseMe CMS, which is one of the fastest-loading thanks to its caching system. Here is all you need to do for it: if you have access to .htaccess, then do something like this.

This will allow gzip compression and caching at the same time. All you have to do is make cachestart.php check whether the cache file exists: if it does, load it and exit; if not, generate it. The second file writes the output to the cache file if needed. It's pretty simple using ob_start(). Email me if you want the code for all of the files.
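The poster offers the full files by email; as a rough sketch of the technique described above (the filenames, cache path, and lifetime are assumptions for illustration, not the actual chaseMe code):

```php
<?php
// cachestart.php -- included at the top of a page
// (assumes the /tmp/cache/ directory already exists and is writable)
$cache_file = '/tmp/cache/' . md5($_SERVER['REQUEST_URI']) . '.html';
$cache_life = 600; // seconds before a cached copy is considered stale

if (file_exists($cache_file) && time() - filemtime($cache_file) < $cache_life) {
    readfile($cache_file); // serve the stored copy and skip page generation
    exit;
}
ob_start(); // otherwise buffer everything the page outputs

// ... the page itself runs here, producing its normal output ...

// cacheend.php -- included at the bottom of the page
$fp = fopen($cache_file, 'w');
if ($fp) {
    fwrite($fp, ob_get_contents()); // save the generated HTML for next time
    fclose($fp);
}
ob_end_flush(); // and send the buffered output to the browser as normal
?>
```

Gzip compression can then be layered on separately (via .htaccess or an ob_gzhandler output buffer), so the cache file itself stays plain HTML.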

If you have not indexed columns in your data tables then doing so will get you an immediate improvement. In general any column used in a WHERE clause should probably have an index. Otherwise the database must search every row for every query. Not good.
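To make that concrete, here is what it looks like in MySQL (the table and column names are made up for illustration):

```sql
-- a hypothetical catalogue query that runs on almost every page
SELECT * FROM products WHERE category_id = 7 AND visible = 1;

-- without an index, MySQL must scan every row on every query;
-- an index covering the WHERE columns lets it seek straight to the matches
CREATE INDEX idx_category_visible ON products (category_id, visible);
```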

As everyone will say, indexes will make a huge difference. As arborint says, you would normally index anything in the WHERE clause, plus any other fields used as criteria (fields that appear in GROUP BY, ORDER BY, etc., as well).

What you should do is find the main 'typical' SELECT queries that run on more or less every page request, and place an EXPLAIN in front of them. This returns a result set describing MySQL's query plan, with each row showing a piece of criteria used in the query. You should index most or all of the fields listed on each row. When you re-run the EXPLAIN you will then find some keys under the 'possible_keys' column, which means MySQL is using the indexes. You should also see the 'rows' column go down a lot (before, it will probably equal the number of rows in the table you are querying, whereas with indexes it will be reduced to roughly the number of rows matching your criteria).
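For example (again with a made-up table name), the check is just a prefix on the query:

```sql
-- runs the query plan rather than the query itself
EXPLAIN SELECT * FROM products WHERE category_id = 7 ORDER BY date_added DESC;
```

In the output, 'possible_keys' and 'key' show which indexes MySQL found and chose, and 'rows' estimates how many rows it must examine; a figure there close to the table's full row count is the sign that an index is missing.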

There's a load more I could waffle on about regarding how to use the results from EXPLAIN to optimise your DB.

20,000 rows is small on DB scales (I'm dealing with a site with 1.8 million records in one table, and that is still only medium-sized as far as DBs go). Even at that size, though, you will find an index brings the average row search down from 20,000 iterations to about 15, since an index lookup is logarithmic and 2^14 < 20,000 < 2^15.

You may also want to use something like Xdebug to profile your code and find the slow points in the PHP. Where you can, move logic from PHP into SQL (e.g. using ORDER BY instead of sorting in PHP).

The other thing is to use Turck MMCache or something similar. This will speed your site up a lot if the same scripts run on every request, and even more if you have a lot of small include files. This is especially true if you use a template system that compiles templates to PHP code (e.g. WACT, Smarty, etc.)

It's hard to pinpoint exactly where you should optimise, so I can't give more specific advice, but it seems your DB will certainly make the most difference.


If your long queries are used to present content that does not necessarily need to be updated in real time, you can use a content-cache solution that stores the content (HTML pages or something else) for a reasonable period, so you avoid repeating those long queries during that period.

Personally, I use this cache class to store in cache files the HTML content of the pages that result from queries that take a long time to compute.

The class allows me to set a lifetime in seconds, or an expiry date and time, after which the cached content is considered invalid when accessed, forcing its regeneration and consequently re-running the queries needed to generate it.

This class has proven to be very robust: I use it to cache most of the frequently accessed pages of the PHP Classes site, which take several database queries to generate.

The top charts page is the one that takes the most time to generate (about 30 minutes every day), as it computes all sorts of charts that require traversing millions of database rows.

The cache class employs file locking to prevent the corruption that simultaneous accesses to a cache file could otherwise cause while an expired cache entry is being regenerated.

One of the best improvements you can make is to upgrade to MySQL 4 and turn on query caching. That one step alone will GREATLY improve the performance of your website, especially if you have redundant queries that return the same information for every visitor (like building your navigation, for example). We saw a HUGE drop in database load and an incredible speed increase in web page delivery.
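For reference, on MySQL 4.0.1 or later the query cache is enabled from my.cnf (the size below is illustrative, not a recommendation):

```ini
[mysqld]
query_cache_type = 1     # cache every cacheable SELECT
query_cache_size = 32M   # memory set aside for cached result sets
```

Identical SELECT statements are then answered straight from memory until one of the underlying tables changes, which invalidates the cached results for that table.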

One word for you: "benchmarking". I did just this: taking a microtime() value (in PHP) at the start of the script and again at the end, the difference between the two was the measure of my benchmark.

One particular script with a DB-intensive query was taking 12.43s to render without indexes. I then added indexes to the columns referenced in the WHERE clauses (FKs and display on/off "switches") and was absolutely amazed at the speed improvement: down from 12.43s to 0.33s!! (I took a mean of 5 page refreshes over 10 minutes on the same static data set.)
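For anyone wanting to reproduce this, a minimal version of that harness might look like the following (in PHP 4, microtime() returns a "msec sec" string, hence the helper; the helper name is my own):

```php
<?php
// convert microtime()'s "msec sec" string into a single float
function microtime_float()
{
    list($usec, $sec) = explode(' ', microtime());
    return (float)$usec + (float)$sec;
}

$start = microtime_float();

// ... run the page or query being measured here ...

$elapsed = microtime_float() - $start;
printf("Rendered in %.2fs\n", $elapsed); // e.g. "Rendered in 0.33s"
?>
```

Averaging several runs against the same static data set, as described above, smooths out one-off fluctuations.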