Hi, if you were to design your own highly scalable website from scratch, what technologies would you use?

Based on Web 2.0 popularity, LAMP seems to be high in the running. But would you tack on CakePHP? Drupal? Or build your framework/CMS from scratch?

What version of Linux runs best for a scalable website?

Would you consider Windows and .NET? Java? Or do you want to throw a brick at me for even suggesting such heresies?

Would you prefer Postgres, Tomcat, Perl, Python, or any of that other *NIX fancy stuff...why or why not?

Please forget, for the moment, the "use what you know" argument. I am pretty versatile, and can look for an expert in whatever platform I choose. So all skills being equal, I'm looking for the best community support, the fastest development time and, most importantly, the best scaling approach.

Let's say, for fun, that I'm planning for the website to have as many messages going back & forth as an eBay.

Reader Comments (29)

> build your framework/CMS from scratch

Definitely don't rely on a CMS, especially if you have programming skills and want to create something unique and scalable. Otherwise you'll end up spending all your time figuring out how to make your CMS do what you want to do instead of just doing what you want. It's a slow and frustrating process. Drupal and Joomla are very nice for sites like this, but roll your own if you want to create a competitive product.

Cool, thanks. Yeah, I've pretty much ruled out Drupal, although I'll probably use it for my personal blogs.

The next issue is the framework. CakePHP, Symfony (sp?), etc. Again, these seem to run into scalability issues - there are lots of sites using frameworks, but according to postings I've scoured, any framework runs into limitations once you reach a certain number of connections.

More and more I'm liking the PlentyOfFish approach - drop all modules whatsoever, and reduce everything to if statements and foreach loops. I certainly know how to program it, if I can make the time to put it all together.

Here's my next dilemma: table layouts or AJAX? Is there any reason why I should give up tables? I really don't care about trendiness, as long as the site is functional and easy to use... any truly viral growth starts with some sort of PBR/Hush Puppies renaissance crowd, so a Web 1.0 approach might even help.

Then...there's the stateless argument - in order to go truly stateless, I'd need to drop the Session objects, and I haven't figured how to xfer shopping carts etc. to a stateless state. Maybe now I'm getting too geeky and granular.. I'm just thinking aloud.

Frameworks are usually a matter of performance and not scalability. You can make most frameworks scale by using the different divide and conquer techniques described on this site. So I wouldn't worry too much about that.

> Here's my next dilemma: table layouts or AJAX?

It's not really either/or. AJAX adds asynchronous operations. You can do that within CSS or tables. A complete AJAX approach does have SEO implications, however.

> ...there's the stateless argument - in order to go truly stateless, I'd need to drop the Session objects

Store sessions in the database or in a cache. Or don't use sessions for state. Have a shopping cart database that has nothing to do with sessions.
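As a rough illustration of a shopping cart database that has nothing to do with sessions, here is a minimal Python sketch. The table layout, the function names and the idea of a `cart_token` sent by the client are all assumptions made up for this example, and SQLite only stands in for whatever database you actually run.

```python
import sqlite3


def open_cart_db(path=":memory:"):
    """Open the cart store. Hypothetical schema: one row per (cart, item)."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS cart_items ("
        " cart_token TEXT NOT NULL,"
        " sku TEXT NOT NULL,"
        " quantity INTEGER NOT NULL,"
        " PRIMARY KEY (cart_token, sku))"
    )
    return db


def add_to_cart(db, cart_token, sku, quantity=1):
    """Add an item; bump the quantity if the item is already in the cart."""
    cur = db.execute(
        "UPDATE cart_items SET quantity = quantity + ?"
        " WHERE cart_token = ? AND sku = ?",
        (quantity, cart_token, sku),
    )
    if cur.rowcount == 0:
        db.execute(
            "INSERT INTO cart_items (cart_token, sku, quantity) VALUES (?, ?, ?)",
            (cart_token, sku, quantity),
        )
    db.commit()


def get_cart(db, cart_token):
    """Return the cart as {sku: quantity}; any web server can answer this."""
    rows = db.execute(
        "SELECT sku, quantity FROM cart_items WHERE cart_token = ?",
        (cart_token,),
    )
    return dict(rows.fetchall())
```

Because the cart lives in the database and is looked up by a token the client presents on every request, no web server has to remember anything between requests.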

I realize the original post seemed like an invitation for open discussion about general architectures, and it has turned into specific tips for me. However, this site is about chronicling & building specific real-life examples, so no troll no foul.

I guess my next question is this: If I'm getting my own dedicated server, what version of Linux will work best? Is it even that big of an issue, or can I just install something popular like Red Hat and forget about it?

(my background is databases and coding.. so the LA part of LAMP will have the greatest learning curve)

PHP is pretty much de facto but misses an application server. Perl is pretty good but harder than PHP. Python is pretty good but is **** about white space.

You can write a persistent perl/python middleware that talks to the lightweight webserver but this feature is non-existent in php (except perhaps in php-enterprise-bananas and this project is very very outdated)

If you want to make your site stateless, you might want to read up on REST architectures. It's not a framework, or anything like that, it's an architectural style. It might seem a bit unclear what the whole thing really is about, but one of the things that might be of interest is that it considers cookies a no-no, since they seriously breach the stateless principle of REST. Try googling "Rest web services" or "Representational State Transfer", and you're off! Good luck with your site!

I've been off the site for a few weeks, but then again... I'm building a website.

I still don't get what REST is about. The most common thing I see is "It's just like HTTP". So why not just call it HTTP, or why not just USE HTTP?

As for the scripting language.. what would I need an application server for? If I did have a middleware need, say a box to process API calls after I distribute an API kit to 3rd party participants......can't I still code that on a separate box, also in PHP? Microsoft has this thing called Windows Services, and Java has resident-style programs...you are saying that PHP doesn't support this? Could I write a middleware app (say an email POP3 handler) in Perl or Java, and leave the webscripts all in PHP?

Next question. Memcached vs. Squid/Varnish vs. Akamai. My understanding is that Memcached is for caching dynamic content, say Database queries, and a CDN is for caching static content, say a user's profile pictures. So what is Squid or Varnish used for - they are "reverse proxies", yes? How does that fit in to the picture?
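To make the "Memcached caches database queries" part concrete, here is a minimal Python sketch of the usual look-in-the-cache-first pattern. The `TTLCache` class is an in-process stand-in invented for this example, not the Memcached client API, and `cached_query` and its parameters are likewise hypothetical names.

```python
import time


class TTLCache:
    """In-process stand-in for a cache server such as Memcached (assumption)."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.monotonic() >= expires_at:
            # Entry is stale: drop it and report a miss.
            del self._data[key]
            return None
        return value

    def set(self, key, value, ttl_seconds):
        self._data[key] = (value, time.monotonic() + ttl_seconds)


def cached_query(cache, key, run_query, ttl_seconds=60):
    """Return the cached result, or run the query once and remember it."""
    value = cache.get(key)
    if value is None:
        value = run_query()  # the expensive database call
        cache.set(key, value, ttl_seconds)
    return value
```

A reverse proxy such as Squid or Varnish works one level up: it sits in front of the web servers and caches whole HTTP responses, while this kind of cache sits behind the application code and stores pieces of data.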

An application server, for example, would be a daemon that knows how to do things. For example: create user, delete user, something else. This application server may connect to the database and knows the low-level stuff about databases, etc.

Your web application may only know how to do presentation and how to get/manipulate data coming from this application server. The web application, for example, does not need to know anything about LDAP/MySQL/any other backend.

The application server may be a FAT server (LOAD EVERYTHING AND KEEP IN MEMORY ALL THE TIME), the frontend server (webserver application) can be a thin server. Get the data from the backend and draw.
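A minimal Python sketch of that split, under loud assumptions: `UserService` and `render_profile` are names invented for this example, and the two layers run in one process here for brevity, whereas the model described above puts the application server in its own daemon.

```python
import html


class UserService:
    """Hypothetical 'fat' application layer: owns the data and backend details."""

    def __init__(self):
        # Stand-in for LDAP/MySQL/any other backend, kept in memory.
        self._users = {}

    def create_user(self, username, email):
        if username in self._users:
            raise ValueError("user already exists: " + username)
        self._users[username] = {"username": username, "email": email}
        return dict(self._users[username])

    def delete_user(self, username):
        # True if a user was removed, False if there was no such user.
        return self._users.pop(username, None) is not None

    def get_user(self, username):
        user = self._users.get(username)
        return dict(user) if user else None


def render_profile(service, username):
    """Thin frontend: get the data from the application layer and draw it."""
    user = service.get_user(username)
    if user is None:
        return "<p>No such user</p>"
    return "<h1>{}</h1><p>{}</p>".format(
        html.escape(user["username"]), html.escape(user["email"])
    )
```

The frontend function never touches the storage; swapping the in-memory dict for a real backend would only change `UserService`.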

This is also to keep the complexity in different groups. For example, your chief architect may manage the application server with a bunch of senior programmers, while the front-end services are written by web-savvy (ajax, javascript, flash, css, html) gurus.

Then again, this is just one model and it may not work for you.

Go on, do your thing, and we look forward to reading about your architecture here soon.

Being in the same shoes, I started looking at how to create a highly scalable and highly available web service, until I hit on a very clever idea from someone who managed to build a web project and succeeded: 'do not care about scalability, availability and performance until your project faces one of these problems'. This is very true, as the chance that a project will face rapid customer growth is really 1-2%, and the conclusion is: use any tools you are familiar with and concentrate on the idea. If the idea works and people hit your site/service, you will start working on reliability, scalability and performance. Even though I continue looking at approaches, I changed my priorities to delivering the idea rather than the methods of delivery.

Another thing to look at when you choose technology is the nature of your web service. Content, images, sell/buy, video, audio, instant messaging: I guess this will drive your choice of tools, as will your team's experience. You cannot run on FreeBSD if all your guys were born as NT admins.

If you want low cost and expect high traffic, Ruby on Rails is probably a no-go, since reports are that it takes quite a setup to scale Rails (think more than one server for sure). Unless this site takes off, you'd be just as fine with ASP, IIS, and Access as with the top-of-the-line technologies. Right now, you want to focus on what you know best to get the product out the door quickly. Would I worry about an IPO if I were starting a business today?

For a good combination of performance and low price, I recommend a LAMP setup:

Linux: Debian is very fast and easy to maintain. The same can be said for Ubuntu Server (with more recent "stable" software), but Ubuntu is nowhere near as stable as straight Debian. Go with Lenny, the "testing" distro, since it should reach "stable" by the time your site is getting any sort of big-time traffic.

Apache: 1.x or 2? Hell if I know which is better here.

MySQL: Great, fast server. Use PostgreSQL if you're more comfortable with it, I don't think you lose either way.

PHP: Use PHP 5 for better OOP and stability. Anyone saying PHP can't scale is uninformed, since Yahoo runs on PHP, amongst a ton of other big-time sites. For RAD, go with one of the MVC frameworks: CakePHP, Symfony, CodeIgniter, qCodo, etc. I personally like CakePHP, but none of these are known to power major websites. Use Smarty and PEAR if you want something more proven, but there's no reason to think that the previously mentioned MVC frameworks can't stand up to the task. I left Zend out since it isn't free.

Language: PHP 5 - no bloated frameworks, waste of time for me. You spend too much time trying to figure out the framework instead of getting work done.

Database - MySQL 5. I didn't consider Postgres because I've never used it. There are just a lot more tools available for MySQL.

Phase 2: Max RAM out to 64 GB, cache everything

Phase 3: Buy load balancer + 2 more servers for front end Varnish/Memcached/Lighttpd. Use original server as MySQL database server.

Phase 4: Depending on my load & usage patterns, scale out the database horizontally with an additional server. I don't expect the db to be a bottleneck for my website as only metadata info is stored there. I'll mostly be serving images stored on the file system. Possibly separate Varnish / Memcached / Lighttpd tier into separate tiers if necessary. But I'll carefully evaluate the situation at this point and scale out appropriately and use CDN for static content if necessary.

Phase 5: Max all servers to 64 GB of RAM, cache, cache, cache.

Phase 6: If I get this far then I'm a multi-millionaire already so I'll replace all of the above machines with whatever the latest and greatest is at that time and keep scaling out.

The important point is that I know how to scale each layer when/if the need arises. I'll scale the individual machines when necessary and scale horizontally too.
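For the "scale out the database horizontally" step, one common approach (my assumption; the plan above does not say which technique it would use) is to route each key to one of several database servers by hashing it. A minimal Python sketch, with hypothetical server names:

```python
import hashlib


def pick_shard(key, shards):
    """Map a key to one server from the list, the same way on every call."""
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    # Use the first 8 bytes of the hash as an integer, then wrap around
    # the number of available servers.
    index = int.from_bytes(digest[:8], "big") % len(shards)
    return shards[index]


# Hypothetical server names, for illustration only.
DB_SERVERS = ["db1.example.internal", "db2.example.internal"]
server_for_user = pick_shard("user:42", DB_SERVERS)
```

The catch with plain modulo routing is that adding a server changes where most keys land, so existing rows have to be moved; that is worth knowing before picking it.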

I have used MySQL and PostgreSQL in projects over the last 5 years and have a very strong preference for PostgreSQL. I have found it reacts better under high load and is more stable. The just-announced 8.3 release has a lot of new performance improvements, and full text search has been built into the core; its absence was always a bit of a drawback previously.

The docs on the PostgreSQL site have been more than sufficient for most of my needs, although it's true there aren't as many howtos / tutorials for Postgres as for MySQL.

You should perhaps review this excellent product/company ( http://coraid.com ). They do not provide you with a demo unit of their production system, but they do have a demo kit (a bunch of cards that you can plug together to test the technology).

I have been avoiding them for a long time, since I was seeing full-page-ads in most linux magazines that I purchase, but this is another lesson that I have learned.

We have been using them for content (SAN and also their NAS product [simply a couple of debian boxes where you just edit some files] )

If you are building something that is going to take on a pretty decent size, a table-less design will be far easier to format and facelift down the line, since you can manage each div any which way by changing only the CSS file. Tables can be formatted with CSS, but there is only so much you can do to them. You'll never be able to shift a table's position as flexibly as you can shift a div. In my opinion: in terms of scalability, table-less will age and scale better.

I think more important than choosing a language is to think about what you're looking for architecturally, then look for a language that supports you to that end. For example, my website is write-heavy. That is, I didn't want any web pages ever waiting on a database write. I designed an architecture that optimized for this, and although it could have been implemented in any language, I chose Java due to its strong support for (de)serialization, since I was going to be passing objects as little files between several different processes, per my architecture.
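Here is a rough Python sketch of the "pages never wait on a database write" idea, with objects passed between processes as little files. It is not the Java implementation described above: the spool directory, the JSON format and both function names are assumptions made for this example.

```python
import json
import os
import uuid


def enqueue_write(spool_dir, record):
    """Called by the web page: drop the record as a small file and return."""
    name = uuid.uuid4().hex + ".json"
    tmp_path = os.path.join(spool_dir, name + ".tmp")
    with open(tmp_path, "w", encoding="utf-8") as f:
        json.dump(record, f)
    final_path = os.path.join(spool_dir, name)
    # Rename only after the file is complete, so the worker never
    # picks up a half-written record.
    os.replace(tmp_path, final_path)
    return final_path


def drain_spool(spool_dir, write_to_db):
    """Called by a separate worker: apply each spooled record, then delete it."""
    processed = 0
    for name in os.listdir(spool_dir):
        if not name.endswith(".json"):
            continue  # skip files that are still being written
        path = os.path.join(spool_dir, name)
        with open(path, encoding="utf-8") as f:
            record = json.load(f)
        write_to_db(record)
        os.remove(path)
        processed += 1
    return processed
```

The page only pays for a small local file write; the slow database write happens later in the worker, at whatever pace the database can sustain.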

Since you're looking at passing messages, that strikes me as write-heavy, etc, etc. I know it's tempting to jump straight into the implementation (the fun part), but any time spent thinking about the bigger picture at the get-go will save you more time overall than ANY benefit one language has over another.

As with most "serious" things in IT, there isn't a recipe that will fit everyone's needs. This said, and agreeing with some of the replies, I'd say go with what you're more familiar with. Today, most of the web technologies can scale well enough. Actually, truth must be said, most of them can scale more than what most startups will require. Why? Because most startups have their growth expectations and optimism on steroids (http://www.karendecoster.com/blog/archives/steroids.bmp).

Have an idea? Make a plan! Deliver it! That simple!

As a software architect, I'm forced to deal with most of the issues that have been enumerated throughout this thread on a regular basis. What technology to use for this, and for that? This OS or that? Ultimately it's all a bunch of crap if you have an idea for a web 2.0 startup project that doesn't depend on a venture capitalist firm that will try to scrutinize every thinking cycle of your brain.

Back to the technology, and to some of the points discussed in particular, I must say that you make a lot of assumptions and too little investigation to support your conclusions, or thoughts.

I confess that I didn't take the time to make an exact quote on everything, but I'm pretty sure that the idea behind each point will make you remember that part of the thread.

Enough of talking. Let's get our hands dirty:

Gave up on ASP.net scalability

ASP.net is as scalable as, or more scalable, mature and robust than, most things around. Don't get fooled by the Microsoft-bashing people. Want proof that it's scalable? Visit http://www.myspace.com/ and you'll be viewing a page powered by a lot of Microsoft technologies.

Tables or AJAX

Tables and AJAX have nothing in common. The question should be table design or tableless design. The answer to that is simple: tableless design. Please do take into consideration that tables were created in HTML to represent tabular data, not to craft layouts; for that you have tableless designs that make extensive use of CSS. This will make your code much simpler and more sustainable. Talking about scalable websites, you should also take into account that a tableless design requires fewer lines of code, thus reducing the bandwidth and CPU cycles required to process each request on both the server and the client. Pretty neat, huh? :)

PlentyOfFish.com (POF) style without modules, reducing everything to if, foreach, and output statements

Why not use assembly while you're at it? /sarcasm off

Not wanting to sound like a Microsoft advocate, even if there shouldn't be anything bad with that, I must stress that POF has unique characteristics that you are unlikely to have. For instance, the owner claims, or claimed at the time of my readings about it, that he is only using one web server for all that traffic. Well, do I need to continue with my argument?

Sure (and please), don't be fooled by the marketing guys and blindly start using all the wizards and eye candy things they throw at you. But spare me, if you use .net in that way you shouldn't use it at all. There are better languages to do that kind of programming.

Web 1.0 approach

Web 1.0 style plus caching usually makes your pages load way faster. JavaScript has its overhead and drawbacks. For instance, extensive use of AJAX can make your SEO put a price on your head. It will also not help you reach people with disabilities, as most screen readers have a tough time perceiving what has changed when you do partial updates. Solution? MIX! Make some pages more 1.0 style, others more 2.0. Add a bit of AJAX here and there, and you should be fine :)

When you speak about AJAX you must consider the previous tip about not going with Microsoft's marketing guys. Don't let the hype drive you. Let your brain drive you. It's just more efficient!

Hey! Don't blame me, scalability is all about efficiency :)

Plain PHP

That would be as good as plain . Again, go with what you are more comfortable with. Go with what your project really needs.

Memcached vs whatever vs Akamai

As good as Tableless vs AJAX. Nothing to do with anything. If you're in a *nix kind of mood, go with memcached. For sure! Akamai is a Content Distribution Network (CDN). As with other CDNs, you should use it when your site hits a certain level. Imagine that you have a site that stores some files for major worldwide download. Can you picture it? Are you using a CDN? Well, then you're probably missing all the fun, bells and whistles :)

Hope this helps you. And, to conclude, let me give this post a personal touch. I'm currently developing an idea for a web project, alongside my current professional activity.

So, I have an idea... Great... Now I needed a plan!

Off I went to start crafting a plan that could materialize that idea. However, during my professional activity I mostly work with Microsoft technologies. So, this time around, and since it's a personal project, I've included in my plan the opportunity to try and install 2 distributions of Linux, PHP, Ruby, Ruby on Rails, MySQL, nginx and a couple of other things (all on Linux, no Microsoft!).

This has been the most interesting and fun [IT] thing I've done in a long time. And, as I already knew, everything can be done with ASP.net, PHP or Rails. It's a question of specifics and comfort. Budget can come into play. Sure, it's more expensive to rent a server at a datacenter that has SQL Server installed, when compared with MySQL (the web server was excluded on purpose; most decent datacenters won't charge you more for Windows up to Standard Edition, 32 or 64 bit). However, all things considered, it's a small price. By the time you need to do a major scale you're prolly rich or doing something very wrong ;)

Best regards and sorry for any errors and typos. I confess: I'm at work. Sorry boss!

We use ColdFusion to build our web sites: you can build as you go, develop incredibly quickly, and have a platform that is generally pretty stable. I believe MySpace is running on CF on ASP with BlueDragon (I think).

The beauty is that you don't need to design a massive web site from the ground up. It's very flexible and extremely fast to develop in. It is also able to call ASP.net and Java functions if needed.