We are designing a website/web application where we hope to achieve high user counts and, in general, lots of use. More specifically, we intend to use PHP as the programming/scripting language and MySQL for the relational DB needs as a start. We have not yet decided whether to use a NoSQL database as well.

Related to that, we want to design with scalability in mind. What are the most common scalability pitfalls for websites? What are the key areas that we need to take into consideration, so that the system can be easily scalable?

5 Answers

I would add one very common thing: optimizing in the wrong place. I have seen tons of articles that discuss nanosecond differences between PHP syntax constructs, but far fewer that discuss how to properly design caching infrastructure for an application. So, as already noted, test. But don't just test: profile and find out what exactly is slow. Is it CPU-bound? I/O-bound? Memory-bound? Is it database queries that bring you down, reading files, or calculations? Can you eliminate that work, or redo it so it runs faster? Do not start with "let's use NoSQL because it's faster". Start with "we want to do this and that; what would be the bottlenecks? How do we eliminate them? How would it behave if we get 100 times the users?"
Without knowing more about the workload and the app it's hard to say anything concrete, but I would start by thinking about what you can cache and how to reduce filesystem/database/etc. accesses, and especially modifications (since those also invalidate the caches).
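To make "profile first" concrete, here is a minimal sketch of ad-hoc phase timing in PHP. `fetchFromDatabase()` and `renderPage()` are hypothetical placeholders for your own code; the point is to measure each phase of a request before deciding what to optimize.

```php
<?php
// Sketch: time individual phases of a request to find the real bottleneck
// before reaching for NoSQL or syntax-level micro-optimizations.
// fetchFromDatabase() and renderPage() are hypothetical placeholders.

function timed(string $label, callable $fn)
{
    $start = microtime(true);
    $result = $fn();
    $elapsedMs = (microtime(true) - $start) * 1000;
    error_log(sprintf('%s took %.1f ms', $label, $elapsedMs));
    return $result;
}

$rows = timed('db query',  fn() => fetchFromDatabase());
$html = timed('rendering', fn() => renderPage($rows));
```

If "db query" dominates, caching or query tuning will pay off; if "rendering" dominates, no database change will help.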

The most common scalability pitfall is not doing load tests early on. If you set up tests that simulate something comparable to your expected load early during development, then you'll be able to detect and correct any technological or architectural impediments to scalability before they become too expensive to fix.
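A cheap way to start, assuming a staging copy of the site, is a command-line load generator such as ApacheBench or siege (the URL and file name below are placeholders):

```shell
# Simulate ~100 concurrent users hitting the landing page with ApacheBench.
# Point this at a staging environment, never at production.
ab -n 10000 -c 100 https://staging.example.com/

# siege can replay a list of representative URLs instead of a single endpoint:
siege -c 100 -t 2M -f urls.txt
```

Even a crude test like this will surface connection limits, slow queries, and memory exhaustion long before real users do.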

**Keep it simple!** Don't overengineer or buy into fancy vendor-specific solutions.

**Shared-nothing architecture** Keep your state in the database and off your application servers (avoid even session data on the server). This way you can easily add additional app servers as needed.
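In PHP the usual offender is file-based sessions, which tie a user to one app server. A sketch of moving them to memcached via the pecl memcached extension (the host name is a placeholder):

```ini
; php.ini: store sessions in memcached instead of local files, so any
; app server can handle any request (shared-nothing).
; Requires the pecl memcached extension; host/port are placeholders.
session.save_handler = memcached
session.save_path = "cache1.internal:11211"
```

With this in place you can add or remove app servers behind the load balancer without sticky sessions.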

**Focus on front-end (static file) caching** Use a reverse proxy and later on a CDN. Whatever doesn't have to get served from the app server is less load on that server.
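As one possible setup, an nginx reverse proxy can serve static files directly and pass only dynamic requests to PHP (paths and the upstream name below are placeholders):

```nginx
# Serve static assets straight from disk with long cache lifetimes;
# only dynamic requests reach the PHP app servers.
location /static/ {
    root /var/www/app;
    expires 30d;                      # let browsers and a CDN cache aggressively
    add_header Cache-Control "public";
}

location / {
    proxy_pass http://app_servers;    # upstream of PHP app servers
}
```

Every request answered here is one your app servers never see.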

**Measure the real system** Build in monitoring so you know where your bottlenecks are. Ensure that you can predict future load based on growth curves.

**Pay attention to your DB design** Tune your queries, use memcached to avoid querying at all, and shard your data across instances when you run out of breathing space on one DB instance (monitor to know this ahead of time).
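The standard cache-aside pattern with memcached looks roughly like this (a sketch assuming the pecl memcached extension, an existing `$pdo` connection, and a hypothetical `products` table):

```php
<?php
// Cache-aside sketch: check memcached first, fall back to MySQL on a
// miss, then populate the cache with a TTL. Host name, $pdo, and the
// products table are assumptions for illustration.

$cache = new Memcached();
$cache->addServer('cache1.internal', 11211);

function getProduct(Memcached $cache, PDO $pdo, int $id): array
{
    $key = "product:$id";
    $product = $cache->get($key);
    if ($product !== false) {
        return $product;                     // cache hit: no DB query at all
    }
    $stmt = $pdo->prepare('SELECT * FROM products WHERE id = ?');
    $stmt->execute([$id]);
    $product = $stmt->fetch(PDO::FETCH_ASSOC);
    $cache->set($key, $product, 300);        // cache for 5 minutes
    return $product;
}
```

Remember to delete or update the cached key whenever the underlying row changes, or readers will see stale data for up to the TTL.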

Some pitfalls:

**NoSQL vs SQL is a red herring.** All the big guys are running their core on SQL databases. Use NoSQL if you're sure that it makes sense, but don't use it assuming it will solve your scaling issues. It won't.

**Be careful about ORMs.** They're state-heavy on the app server (contradicting the shared-nothing architecture), and they require you to understand not just how to tune SQL queries, but how to tune the ORM on top of the SQL queries (in other words, they only simplify things if performance doesn't matter). Give preference to hand-designed queries and liberal use of memcached instead.
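A hand-written query via PDO is not much more code than the ORM equivalent, and what runs against MySQL is exactly what you wrote (DSN, credentials, the `orders` table, and `$userId` are placeholders here):

```php
<?php
// Hand-written, parameterized query via PDO: no ORM layer to tune, and
// the SQL sent to MySQL is exactly what appears below.
// DSN, credentials, table, and $userId are illustrative placeholders.

$pdo = new PDO('mysql:host=db1.internal;dbname=app', 'app', 'secret', [
    PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION,
]);

$stmt = $pdo->prepare(
    'SELECT o.id, o.total
       FROM orders o
      WHERE o.user_id = :uid
      ORDER BY o.created_at DESC
      LIMIT 10'
);
$stmt->execute(['uid' => $userId]);
$orders = $stmt->fetchAll(PDO::FETCH_ASSOC);
```

Prepared statements keep you safe from SQL injection while still letting you hand-tune the query and its indexes.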

**Don't worry about line-by-line code performance.** You can always go in and fix hotspots later on (use xdebug or similar profiling tools). Having a scalable architecture matters MUCH more than code performance, so invest your brainpower accordingly.
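For reference, enabling Xdebug's profiler on demand looks like this with Xdebug 3 settings (the output directory is a placeholder):

```ini
; php.ini: enable Xdebug's profiler per-request (Xdebug 3 settings),
; triggered on demand so normal requests aren't slowed down.
xdebug.mode = profile
xdebug.start_with_request = trigger
xdebug.output_dir = /tmp/profiles
```

The resulting cachegrind files can be inspected in a viewer such as KCachegrind to find the real hotspots.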

The only real way to tell whether you have scalability issues is to test, so test early and test often, as Michael Borgwardt says.

Other than that, a common reason systems don't scale is resource contention, and that usually shows up in the database: trying to read and write at the same time. So you might want to consider a CQRS approach that separates the read (Query) side from the write (Command) side.
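Full CQRS separates the read and write models entirely, but even the simplest version of the idea, routing writes to the MySQL primary and reads to a replica, already reduces contention. A sketch (DSNs and the `orders` table are placeholders; MySQL replication is assumed to be configured):

```php
<?php
// Minimal read/write split in the spirit of CQRS: commands go to the
// primary, queries go to a replica, so heavy reads don't contend with
// writes. DSNs and the orders table are illustrative placeholders.

$writeDb = new PDO('mysql:host=primary.internal;dbname=app', 'app', 'secret');
$readDb  = new PDO('mysql:host=replica.internal;dbname=app', 'app', 'secret');

// Command side: mutate state on the primary.
$stmt = $writeDb->prepare('INSERT INTO orders (user_id, total) VALUES (?, ?)');
$stmt->execute([42, 99.95]);

// Query side: read from the replica (which may lag slightly behind).
$report = $readDb
    ->query('SELECT user_id, COUNT(*) AS n FROM orders GROUP BY user_id')
    ->fetchAll(PDO::FETCH_ASSOC);
```

Be aware of replication lag: a user who just wrote data may not see it immediately on the read side, which your UX has to tolerate.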