Routing words to /dev/null

These are just some random musings on common problems I’ve seen over several years of web development.

1) No indexes on data (in databases or elsewhere)

It’s very easy for developers to use a small data set for their test cases, mainly because large data sets are hard to create. The result is that something that ran quickly during development, when you only had 10 records, runs slowly once you have 1 million. What’s interesting about this problem is that skipping indexes is actually a reasonable choice if you really are going to have a very small amount of data. If you aren’t, then testing against a large data set needs to be part of your process so you can be certain that real data volumes won’t crush your site.
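To make the difference concrete, here is a minimal sketch using Python’s built-in sqlite3 module. The users table, its columns, and the index name idx_users_email are made up for illustration. It asks the database how it plans to run the same lookup before and after an index exists.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
conn.executemany(
    "INSERT INTO users (email) VALUES (?)",
    ((f"user{i}@example.com",) for i in range(10_000)),
)

def query_plan(sql, params=()):
    """Return SQLite's description of how it would run the query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The human-readable detail is the last column of each plan row.
    return " | ".join(row[-1] for row in rows)

lookup = "SELECT * FROM users WHERE email = ?"
args = ("user9999@example.com",)

plan_before = query_plan(lookup, args)  # no index yet: a scan of the whole table
conn.execute("CREATE INDEX idx_users_email ON users (email)")
plan_after = query_plan(lookup, args)   # now a search that uses idx_users_email

print(plan_before)
print(plan_after)
```

The exact wording of the plan varies between SQLite versions, but the first plan should describe a scan and the second should mention the index. With 10 rows you would never notice the scan; with 1 million you would.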

2) No caching or incorrect caching

Caching needs to be part of a website’s plan, and it’s a really easy area to overlook or get wrong. There are obvious places to cache data: for instance, when you make a common database call it makes sense to store the result in local memory. But what if you are using 10 web servers and they all need access to the same cached data? Memcached seems like an obvious choice, but what if each server needs a particular piece of data 20 times per page load? In some situations it may make sense to cache to a memcached cluster and also keep a copy of the data in a local memory cache.
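A rough sketch of that two-level idea follows. FakeSharedCache is a hypothetical stand-in for a memcached client (it only counts round trips), and TwoTierCache and the loader function are names I made up, not any real library’s API.

```python
class FakeSharedCache:
    """Stand-in for a memcached cluster; counts simulated network round trips."""

    def __init__(self):
        self.store = {}
        self.round_trips = 0

    def get(self, key):
        self.round_trips += 1
        return self.store.get(key)

    def set(self, key, value):
        self.round_trips += 1
        self.store[key] = value


class TwoTierCache:
    """Check process-local memory first, then the shared cache, then the source."""

    def __init__(self, shared, load_from_source):
        self.local = {}
        self.shared = shared
        self.load_from_source = load_from_source

    def get(self, key):
        if key in self.local:
            return self.local[key]          # no network call at all
        value = self.shared.get(key)
        if value is None:                   # miss in the shared cache too
            value = self.load_from_source(key)
            self.shared.set(key, value)
        self.local[key] = value
        return value


shared = FakeSharedCache()
cache = TwoTierCache(shared, load_from_source=lambda key: f"value-for-{key}")

# One page load that needs the same item 20 times.
values = [cache.get("site-settings") for _ in range(20)]
print(shared.round_trips)  # 2: one missed get plus one set; the other 19 reads stay local
```

This sketch never expires the local copy, so a real version would need a short lifetime or some invalidation scheme to keep the 10 servers from drifting apart.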

This doesn’t only go for data that’s processed server side; resources such as javascript and css should be cached as well. The obvious place to cache these is the user’s web browser, but you can also cache them in other places. For instance, you can instruct downstream servers to cache files for you, and you can ask downstream proxies or the client software to revalidate cached items before using them. (More info)
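These instructions are mostly expressed through the HTTP Cache-Control response header. The sketch below maps a few policy names (the names and the specific lifetimes are my own, picked for illustration) to standard Cache-Control directives.

```python
def cache_headers(policy):
    """Return HTTP response headers for a few common caching policies."""
    policies = {
        # Files whose URL changes whenever their content changes can be kept for a year.
        "static_asset": "public, max-age=31536000, immutable",
        # Let shared proxies keep a copy for an hour, browsers for five minutes.
        "shared_proxy": "public, max-age=300, s-maxage=3600",
        # Caches may store it, but must check with the server before each reuse.
        "revalidate": "no-cache",
        # Per-user content: the browser may keep it, shared caches may not.
        "per_user": "private, max-age=0, must-revalidate",
    }
    return {"Cache-Control": policies[policy]}


print(cache_headers("static_asset"))
print(cache_headers("revalidate"))
```

How you attach these headers depends on your web server or framework, so treat the function as a lookup table rather than a drop-in solution.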

You can also apply the strategies you use for javascript and css to the pages themselves. Some web technologies will allow you to cache dynamically created web pages so you don’t need to reprocess the page at all; instead you keep the resulting page in memory and serve it to whoever requests it. This is great for pages that don’t have user-specific data on them and don’t change very often.
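If your framework doesn’t offer this out of the box, the idea is simple enough to sketch. PageCache, get_or_render, and the 60 second lifetime below are hypothetical choices of mine, not a particular framework’s feature.

```python
import time


class PageCache:
    """Keep rendered pages in memory and reuse them until they expire."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self.pages = {}  # path -> (expires_at, html)

    def get_or_render(self, path, render):
        now = self.clock()
        entry = self.pages.get(path)
        if entry is not None and entry[0] > now:
            return entry[1]                      # still fresh: skip rendering
        html = render(path)
        self.pages[path] = (now + self.ttl, html)
        return html


render_calls = []  # records every time the expensive render actually runs

def render_about_page(path):
    render_calls.append(path)
    return f"<html><body>About us (rendered for {path})</body></html>"


page_cache = PageCache(ttl_seconds=60)
for _ in range(1000):
    html = page_cache.get_or_render("/about", render_about_page)

print(len(render_calls))  # 1: the page was rendered once and served 1000 times
```

Keying on the path alone is only safe for pages that look the same to everyone, which is exactly the restriction mentioned above.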

3) Too much stuff on a page

A lot of websites love to cram as much as they can onto individual pages, even if you didn’t request it. Every bit of information that is included on a page should be scrutinized, and if it doesn’t add value it shouldn’t be included. “Add value” can be very tricky, as many companies and webmasters believe that if they are interested in something, adding it to a page is also good for the user. However, whenever you add anything to a webpage you may be hurting yourself in three ways:

1) There’s now an extra element competing for a user’s attention
2) You’re now using more bandwidth to deliver the page (increasing delivery time, as well as costs)
3) You’re likely degrading the user experience, since added complexity makes a page harder to use.

This isn’t to say that adding information can’t help a webpage. For instance, having a product for sale without a description is obviously a bad idea. However, having a product for sale with a form to search for houses on the same page is a bad idea because it distracts from the user’s main purpose, and this distraction comes at a cost on the user’s end (increased load time) and on your end (increased server usage, loss of a sale, etc).

Of course, if you aren’t certain whether adding something to a webpage is a good idea, you should A/B test it to see whether it gets used or hurts your website’s performance, and then actually honor the result of that test.

4) Loading too much data

A lot of frameworks nowadays make it very easy to access databases and other sources, but sometimes that convenience has a cost that’s invisible to developers. Sometimes the developers themselves will screw up their data pulls, regardless of their framework. It’s important for all developers to understand how data is accessed so they can avoid some common problems, and also understand where the best place to do certain calculations is. For instance, if you wanted the average housing value in an area, it’s far better to query your database for the average than to pull all your housing data to your web server and do the calculation there.
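Here is the housing example as a small sketch against an in-memory SQLite database. The houses table and its columns are invented for the example; the point is only the difference between the two functions.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE houses (id INTEGER PRIMARY KEY, area TEXT, value REAL)")
db.executemany(
    "INSERT INTO houses (area, value) VALUES (?, ?)",
    [("north", 200000.0), ("north", 300000.0), ("south", 150000.0)],
)

def average_value_pull_everything(area):
    """Ships every matching row to the web server, then averages there."""
    rows = db.execute("SELECT value FROM houses WHERE area = ?", (area,)).fetchall()
    return sum(value for (value,) in rows) / len(rows)

def average_value_in_database(area):
    """Lets the database do the work and return a single number."""
    (average,) = db.execute(
        "SELECT AVG(value) FROM houses WHERE area = ?", (area,)
    ).fetchone()
    return average

print(average_value_pull_everything("north"))  # 250000.0
print(average_value_in_database("north"))      # 250000.0
```

Both return the same answer, but the first transfers one row per house while the second transfers one row total. The first version also divides by zero for an area with no houses, whereas AVG simply comes back as NULL (None in Python).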

5) Making too many external calls during page creation

Some people, when they feel clever, will come up with a solution that queries their database for a list of item keys, then grabs each item by its key and places it in a shared cache (so later queries can grab the same items without having to run the exact same query). This is a good idea in principle. However, if your query gets back 1000 results and they all need to be individually pulled and cached, you’d probably be better off not caching at all, because now your optimized solution is making 1001 round trips to your database. This doesn’t only apply to database calls: making too many calls to anything that doesn’t reside inside the application itself is a bad idea (including calls to memcached). Whenever possible, external calls should be batched together. In the example above you could try to pull all the items from your original query out of the cache in one call, keep a list of the items that aren’t cached, and then make one large pull for the remaining items.
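The batched approach can be sketched like this. The items table, the plain dict standing in for a shared cache, and the get_items helper are all hypothetical. I also split very large pulls into chunks of 500, since databases typically limit how many parameters one query can take; that chunking is my addition, not part of the original idea.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT)")
db.executemany(
    "INSERT INTO items (id, name) VALUES (?, ?)",
    [(i, f"item-{i}") for i in range(1, 1001)],
)

cache = {}                      # stand-in for a shared cache with a multi-get
stats = {"db_round_trips": 0}   # counts queries sent to the database

def get_items(ids, chunk_size=500):
    """Return item names for ids, using one cache pass plus a few batched queries."""
    found = {i: cache[i] for i in ids if i in cache}
    missing = [i for i in ids if i not in found]
    for start in range(0, len(missing), chunk_size):
        chunk = missing[start:start + chunk_size]
        placeholders = ",".join("?" for _ in chunk)
        rows = db.execute(
            f"SELECT id, name FROM items WHERE id IN ({placeholders})", chunk
        ).fetchall()
        stats["db_round_trips"] += 1
        for item_id, name in rows:
            cache[item_id] = name
            found[item_id] = name
    return [found[i] for i in ids if i in found]

ids = list(range(1, 1001))
first = get_items(ids)    # cold cache: 2 batched queries instead of 1000 single ones
second = get_items(ids)   # warm cache: no database queries at all
print(stats["db_round_trips"])  # 2
```

With a real memcached client the cache lookup would be a single multi-get call rather than a loop over a dict, but the shape of the solution is the same.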

6) Making too many calls from the page itself

Building on point 5, including several dozen javascript, css, and image files will really bog down a webpage once the original HTML is delivered to the user. There are lots of technologies and techniques to combat this problem, such as Combres or CSS Sprites. Generally speaking, you want to make as few calls for resources as possible. To find out where you’re making too many calls, tools like YSlow can make all the difference.
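The simplest version of combining resources is plain concatenation at build time. The bundle function and the file names below are made up for illustration, and this is not how Combres itself works internally; it just shows the idea of turning several requests into one.

```python
import tempfile
from pathlib import Path

def bundle(paths, output_path):
    """Concatenate several css (or javascript) files into one, in the order given."""
    parts = []
    for path in paths:
        source = Path(path)
        content = source.read_text(encoding="utf-8")
        # A /* ... */ comment is valid in both css and javascript.
        parts.append(f"/* {source.name} */\n{content}")
    output = Path(output_path)
    output.write_text("\n".join(parts), encoding="utf-8")
    return output

with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    (folder / "reset.css").write_text("body { margin: 0; }\n", encoding="utf-8")
    (folder / "site.css").write_text("h1 { color: navy; }\n", encoding="utf-8")
    out = bundle([folder / "reset.css", folder / "site.css"], folder / "bundle.css")
    bundled = out.read_text(encoding="utf-8")

print(bundled)
```

The page then references bundle.css once instead of requesting each stylesheet separately. Order matters for both css and javascript, so the list passed in should match the order the files were originally included.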