Mark Maunder of No VC Required, who advocates not taking VC money lest you be turned into a frog instead of the prince (or princess) you were dreaming of, has an excellent slide deck on how to scale an early-stage startup. His blog also has some good SEO tips and a very spooky widget showing the geographical location of his readers. Perfect for Halloween! What are Mark's otherworldly scaling strategies for startups?

The Platform

The Architecture

Performance matters because being slow could cost you 20% of your revenue. The UIE guys disagree, saying this ain't necessarily so. They explain their reasoning in Usability Tools Podcast: The Truth About Page Download Time. The idea is: "There was still another surprising finding from our study: a strong correlation between perceived download time and whether users successfully completed their tasks on a site. There was, however, no correlation between actual download time and task success, causing us to discard our original hypothesis. It seems that, when people accomplish what they set out to do on a site, they perceive that site to be fast." So it might be a better use of time to improve the front end rather than the back end.

MySQL was dumped because of performance problems: it didn't handle a high number of writes and deletes on large tables, writes blew away the query cache, large numbers of small tables (over 10,000) were not well supported, it used a lot of memory to cache indexes, and it maxed out at 200 concurrent read/write queries per second with over 1 million records.

For data storage they evolved to a fixed-length, ISAM-like record scheme that allows seeking directly to the data. It still uses file-level locking and it's benchmarked at 20,000+ concurrent reads/writes/deletes. They are considering moving to Berkeley DB, which performs very well and is used by many large websites, especially when you primarily need key-value lookups. I think it might be interesting to store JSON if a lot of this data ends up being displayed on the web page.
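To make the "seek directly to the data" idea concrete, here is a minimal sketch of a fixed-length record file. It is not Mark's storage code (which is not public); the record layout, field names, and helper functions are all made up for illustration, and the file-level locking mentioned above is left out to keep it short.

```python
import os
import struct
import tempfile

# Hypothetical layout: 4-byte id, 4-byte hit counter, 24-byte name (32 bytes total).
RECORD = struct.Struct("<II24s")


def write_record(f, index, rec_id, hits, name):
    """Overwrite the record stored in slot `index`."""
    f.seek(index * RECORD.size)
    f.write(RECORD.pack(rec_id, hits, name.encode("utf-8")))


def read_record(f, index):
    """Read slot `index` with a single seek; no index lookup or scan is needed."""
    f.seek(index * RECORD.size)
    rec_id, hits, raw = RECORD.unpack(f.read(RECORD.size))
    return rec_id, hits, raw.rstrip(b"\x00").decode("utf-8")


def demo():
    """Write three records, update one in place, and read two back."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "records.dat")
        with open(path, "w+b") as f:
            for i, name in enumerate(["alpha", "beta", "gamma"]):
                write_record(f, i, 100 + i, 0, name)
            write_record(f, 1, 101, 7, "beta")  # in-place update, same offset
            return read_record(f, 1), read_record(f, 2)
```

Because every record is the same size, the byte offset of record N is simply N times the record size, which is why updates and reads cost one seek each.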

Moved to httpd.prefork for Perl. That, combined with no keepalive on the application servers, uses less RAM and works well.

Lessons Learned

Configure your DB and web server correctly. MySQL and Apache's memory usage can easily spiral out of control, which leads to grindingly slow performance as swapping increases. Here are a few resources for helping with configuration issues.

Serve only the users you care about. Block content thieves that crawl your site, using a lot of valuable resources for nothing. Monitor the number of content pages they fetch per minute. If a threshold is exceeded, do a reverse lookup on their IP address and configure your firewall to block them.
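The monitoring step above might look something like the following sketch. The class name, the threshold of 120 pages per minute, and the allowlist of crawler host-name suffixes are all assumptions for illustration; the slides do not say what threshold was used or how the reverse lookup result was interpreted. The firewall step is deliberately left out, since it depends on your setup.

```python
import socket
import time
from collections import defaultdict, deque


class CrawlerMonitor:
    """Counts content-page fetches per IP over a sliding time window."""

    def __init__(self, max_per_minute=120, window=60.0):
        self.max_per_minute = max_per_minute  # hypothetical threshold
        self.window = window
        self.hits = defaultdict(deque)

    def record(self, ip, now=None):
        """Record one fetch; return True if this IP is now over the threshold."""
        now = time.time() if now is None else now
        q = self.hits[ip]
        q.append(now)
        # Drop fetches that have fallen out of the window.
        while q and q[0] <= now - self.window:
            q.popleft()
        return len(q) > self.max_per_minute


def reverse_lookup(ip):
    """Return the reverse-DNS host name for `ip`, or None if it does not resolve."""
    try:
        return socket.gethostbyaddr(ip)[0]
    except OSError:
        return None


def should_block(hostname, allowed_suffixes=(".googlebot.com",)):
    """Assumed policy: block unless the host belongs to a crawler you want."""
    if hostname is None:
        return True
    return not hostname.endswith(allowed_suffixes)
```

In a request handler you would call `monitor.record(ip)` on each content page, and only when it returns True pay for the `reverse_lookup(ip)` and the `should_block(...)` decision.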

Cache as much DB data and static content as possible. Perl's Cache::FileCache was used to cache DB data and rendered HTML on disk.
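For readers who don't use Perl, here is a rough sketch of the same idea: cache the result of an expensive query or page render in a file, with an expiry time. This is not the Cache::FileCache API, just a small stand-in; the `FileCache` class, the `cached` helper, and the default 300-second lifetime are assumptions.

```python
import hashlib
import os
import pickle
import tempfile
import time


class FileCache:
    """Stores pickled values on disk, one file per key, with an expiry time."""

    def __init__(self, root, default_ttl=300):
        self.root = root
        self.default_ttl = default_ttl  # seconds; hypothetical default
        os.makedirs(root, exist_ok=True)

    def _path(self, key):
        # Hash the key so any string is safe to use as a file name.
        return os.path.join(self.root, hashlib.sha1(key.encode("utf-8")).hexdigest())

    def set(self, key, value, ttl=None):
        expires = time.time() + (self.default_ttl if ttl is None else ttl)
        # Write to a temporary file, then rename, so readers never see a partial file.
        with tempfile.NamedTemporaryFile("wb", dir=self.root, delete=False) as f:
            pickle.dump((expires, value), f)
        os.replace(f.name, self._path(key))

    def get(self, key):
        """Return the cached value, or None if it is missing or expired."""
        try:
            with open(self._path(key), "rb") as f:
                expires, value = pickle.load(f)
        except FileNotFoundError:
            return None
        return None if time.time() >= expires else value


def cached(cache, key, compute, ttl=None):
    """Return the cached value for `key`, calling `compute()` only on a miss."""
    value = cache.get(key)
    if value is None:
        value = compute()
        cache.set(key, value, ttl)
    return value
```

A call such as `cached(cache, "page:/about", render_about_page)` then hits the database or template engine only when the cached copy is missing or stale. Note that this sketch cannot cache a value of `None`, since `None` is used to signal a miss.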

Use two different host names in URLs to enable browser clients to load images in parallel.
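One way to do this, sketched below, is to pick the host name from a hash of the image path so that a given image always gets the same URL (otherwise the browser cache would treat it as two different files). The host names and the `image_url` helper are hypothetical; the slides don't say how the host was chosen.

```python
import zlib

# Hypothetical host names; both would point at the same static content.
IMAGE_HOSTS = ("img1.example.com", "img2.example.com")


def image_url(path, hosts=IMAGE_HOSTS):
    """Map an image path to one of the hosts, always the same one for that path."""
    index = zlib.crc32(path.encode("utf-8")) % len(hosts)
    return "http://%s/%s" % (hosts[index], path.lstrip("/"))
```

Templates would then call `image_url("/images/logo.png")` instead of hard-coding a single host.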

Make content as static as possible. Create a separate image and CSS server to serve the static content. Use keepalives on static content, since static content uses little memory per thread/process.

Leave plenty of spare memory. Spare memory allows Linux to use more memory for file system caching, which increased performance about 20 percent.
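A back-of-the-envelope way to think about this is to budget memory up front: subtract what the database needs and what you want to leave free for the file system cache, then divide the rest by the size of one web server process. The function below is only a sketch of that arithmetic; the function name and all the numbers in the usage note are made up, and you would measure the per-process size on your own servers.

```python
def max_workers(total_mb, db_mb, spare_mb, per_worker_mb):
    """Estimate how many web server processes fit without pushing the box into swap."""
    if per_worker_mb <= 0:
        raise ValueError("per_worker_mb must be positive")
    usable = total_mb - db_mb - spare_mb
    if usable <= 0:
        return 0
    return usable // per_worker_mb
```

For example, on a hypothetical 4096 MB box with 1024 MB set aside for the database, 1024 MB left spare for caching, and 40 MB per process, `max_workers(4096, 1024, 1024, 40)` gives 51, which would be a starting point for the process limit in the web server configuration.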

We built our own fast file storage routines from the ground up. It's loosely based on ISAM or MySQL's MyISAM in that it uses fixed length sequential records. It's a lot faster for certain specific operations that we require. Unfortunately it's not open source at this time but perhaps we'll release it in future.

WordPress MU creates tables for each blog, which is the system we found worked best for plugin compatibility and scaling after lots of testing and trial and error. This takes advantage of existing OS-level and MySQL query caches and also makes it infinitely easier to segment user data, which is what all services that grow beyond a single box eventually have to do. We're practical folks, so we'll use whatever works best, and for the 2.3m and counting on WordPress.com, MU has been a champ.

If you are starting a company, read this book: How to Castrate a Bull: Unexpected Lessons on Risk, Growth, and Success in Business by NetApp's founder, Dave Hitz. It provides direct, honest, thoughtful business advice, applicable to business founders and leaders throughout the growth cycle of a business. He puts special emphasis on hard choices and decision-making processes, with an understanding that comes from a lifetime of risk taking. If you are a first-time entrepreneur, read this book. If you are entering a growth phase for your company, read this book. If you failed at your first venture and want to understand why, read this book. And if you want a few good laughs, read this book. It should make scaling your company more fun on the way.