So as I stated in the intro to my new untitled PHP app project, I’ve been doing a lot of research into scalable hosting solutions. Rhymes With Milk is currently hosted on Linode, which I love and which is perfect for this piddly little WordPress site. But even Linode’s high-end dedicated servers wouldn’t be able to handle the kind of load a large project might require.

As an example, at my job we have several dedicated servers (not at Linode, btw) that have really great performance specs. Just one of them is enough to host over 100 of our clients’ websites. So one year we helped create an iPhone app for the Tour de France. I think the way it worked was the Slipstream sag wagon was carrying an on-board GPS that would ping our server periodically to update an XML file that tracked its location throughout the race. That worked just fine, until about a million requests per minute (that’s a speculative number; I actually don’t know what the exact figures were) were coming into the server for that one file as users were trying to get updates on the location. The file itself was being requested directly by the app, so negligible processing on the server end went into fetching it. But Apache just couldn’t keep up with the number of connections coming in, so our server kept giving up. It pooped out.

This was where Amazon’s AWS really helped save us. We were able to create an Amazon S3 bucket, drop that one XML file in there, and they were able to easily handle the request load.

So what is Amazon AWS?

Well, for starters it stands for Amazon Web Services. It’s taken a lot of research to answer that simple question beyond figuring out what it stands for. Amazon throws around a lot of acronyms, and has a pretty atypical pricing plan compared to more traditional hosts. So here’s a breakdown of their two main services that I think I understand. I’m certainly no expert, I’m just a guy that’s done a lot of reading. I haven’t even set up an account and explored these services myself yet, so this is all what I’ve learned from the outside.

Amazon EC2

Let’s start with EC2 since that’s most like traditional hosting. EC2 is Amazon’s version of VPS hosting. You set up an account and select what size of virtual server you’d like (how much RAM, bandwidth, storage, how many cores, etc). Just like many other VPS hosts (e.g. Linode), you can totally customize your server by selecting which flavor of Linux you’d like (you can also set up Windows servers) and where you want it to be hosted (e.g. US East, US West, Europe, etc). Apparently when you go to select your Linux distribution, Amazon also allows you to select popular packages to start with, like whether it should have LAMP pre-installed or a mail server included. There is also a community-driven list of packages. So if somebody decided that it would be useful to have a server come pre-tuned for video delivery, or have Subversion installed, etc, they could add that package to Amazon’s list. You select one of these, and Amazon creates your VPS with the package you selected.

Now here’s where things get a little cloud-computy. I keep using words like “VPS” and “packages” whereas Amazon uses the words “instances” and “AMIs”. When you create a new VPS, you’re actually creating a new EC2 instance, and those packages are what Amazon calls AMIs (Amazon Machine Images).

You can create a new instance whenever you’d like using one of Amazon’s preconfigured AMIs, or one from the community marketplace like I mentioned above. It’s automatically assigned a new DNS entry, so you can point your domain to it and start delivering content. Similarly, you can turn off an instance whenever you want. This isn’t used much in a single-server setup (we’ll get to multi-instance cases in a minute), but theoretically it’s possible. You can take a snapshot (I think Amazon has a special word/acronym/tool for this, but I don’t know what it is) of your existing setup, back it up, and turn off your server. Shutting an instance down for good (Amazon calls this terminating it) destroys the data stored on the instance itself, so naturally this isn’t something you’d be doing often in a production environment.

I don’t understand multi-instance situations as much, but I think the point is to spawn new instances when the server is under strain. If you know your site was just Dugg or something, you can spawn a new instance and it will share the workload with the first (I think). In a more traditional setup, you might have a server for just routing traffic, and behind it are multiple duplicate copies of the same server, each responding to only a fraction of the requests. Each instance is one of those duplicate servers. There are even tools dedicated to monitoring your server performance and automatically spawning new instances when they notice your machine needs help (I think Amazon’s CloudWatch handles the monitoring and Auto Scaling handles the spawning, but that’s something for another post).
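
To make the routing idea a little more concrete for myself, here is a minimal sketch in Python of a router handing requests to duplicate servers in turn, plus a rough guess at how many copies a given load would need. This is not how Amazon does it; the instance names and the capacity figure are made up for illustration, and real load balancers are far smarter than this.

```python
from itertools import cycle


def assign_round_robin(requests, instances):
    """Hand each request to the next instance in turn, like a simple router."""
    assignments = {name: [] for name in instances}
    rotation = cycle(instances)
    for request in requests:
        assignments[next(rotation)].append(request)
    return assignments


def instances_needed(requests_per_minute, capacity_per_instance):
    """Smallest number of identical instances that covers the load (at least one)."""
    # Ceiling division using integers only, so there is no float rounding.
    return max(1, -(-requests_per_minute // capacity_per_instance))


# Hypothetical names and numbers, purely for illustration.
print(assign_round_robin(range(6), ["web-1", "web-2"]))
print(instances_needed(2500, 1000))
```

With two instances each one ends up with half of the requests, and a load of 2,500 requests per minute against a made-up capacity of 1,000 per instance works out to three instances.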

These multi-instance situations make it a little more clear why Amazon charges for these services hourly, not monthly. You might need your second instance for only one hour, so you get charged for two instances for that hour.
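
Here is the hourly math as a tiny Python sketch. The rate of 10 cents per instance-hour is a number I made up to keep the arithmetic easy, not Amazon’s actual pricing, and I’m working in whole cents to avoid floating point weirdness.

```python
def total_cost_cents(instances_running_each_hour, rate_cents_per_hour):
    """Add up instance-hours over the period and multiply by the hourly rate."""
    instance_hours = sum(instances_running_each_hour)
    return instance_hours * rate_cents_per_hour


# One day: a single instance running all day, plus a second one for one hour.
day = [1] * 24
day[14] = 2  # the hour the traffic spike hit (hypothetical)

HYPOTHETICAL_RATE_CENTS = 10  # assumed rate, not a real AWS price
print(total_cost_cents(day, HYPOTHETICAL_RATE_CENTS))
```

That day comes to 25 instance-hours instead of 24, so the spike costs one extra hour of one extra instance rather than a whole extra month of server.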

By the way, I think there are ways you can change your instance specs on the fly instead of spawning a whole new one. Like for example, if you realize you need more RAM all the time, but not necessarily double the processing power, I think you can increase that on your base instance. I’m not confident on that point, though.

Amazon S3

So what if you have a website with a lot of static resources? Things like images, videos, anything that might take up a lot of space, or get requested a whole lot? That’s where S3 (Simple Storage Service) comes in. It’s where you can host all of those resources without putting the strain of delivering and storing them on your EC2 instance. Remember that XML file I talked about in the beginning? The one getting requested so often it crashed our large dedicated servers? Yeah, S3 was the solution for that. We dropped it into a bucket, and only paid a minimal fee for the number of requests it got. Storage and bandwidth are both really cheap, and you only pay for what you use. I’m about 90% sure web giants like Pinterest, Netflix, and Tumblr all use S3 to store and deliver all of their goods.
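
From what I’ve read, once a file is in a bucket and marked publicly readable, an app can fetch it straight from a URL built out of the bucket name and the file’s key. Here is a small Python sketch of building that URL. The bucket and file names are ones I invented for illustration (not what we actually used at work), and this only builds the address; it doesn’t upload anything or talk to Amazon.

```python
from urllib.parse import quote


def s3_object_url(bucket, key):
    """Build the virtual-hosted-style URL for an object in an S3 bucket."""
    # quote() leaves "/" alone by default, so folder-like keys keep their
    # slashes while spaces and other unsafe characters get percent-encoded.
    return f"https://{bucket}.s3.amazonaws.com/{quote(key)}"


# Hypothetical bucket and key names.
print(s3_object_url("example-race-tracker", "live/location.xml"))
```

The app would then request that address instead of hitting our own Apache server, which is the whole point: the request load lands on Amazon’s infrastructure instead of ours.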

—

Alright, that was a lot of info. I’m not sure how to conclude after that. I’ve got a lot more learning to do, and I’m planning on actually setting up an AWS account soon to do that learning. I’ll let you know what I find.

Lastly, on a somewhat unrelated note, I’ve been very curious about where large-scale web apps live. Tools like WhoIsHostingThis.com and Netcraft.com have come in handy. A few interesting finds:

I’ve been busy lately, between getting married and spending nearly a month honeymooning in Hawaii, work, Diablo III coming out (!), and getting ready for a bike ride I’m totally not prepared for. But along with that, I’ve started working on several side projects and hopefully I’ll have enough free time to see them through.

One project I’m working on is creating a simple blank WordPress theme. But instead of being totally blank, it’ll be pre-styled with the intention of being re-styled. I’ll include all of the necessary WP-as-a-CMS functionality, and everything will be styled enough that it’s ready to go out of the box, but will also be really easy to apply custom styles to. It’s mostly something I’ll benefit from at work, since I’ll be able to use it instead of the annoying and bloated TwentyEleven as a starting point for custom themes.

There’s another project I’ve been mulling over, mostly just the seed of an idea at this point. I’ve been wanting to try my hand at building a large-scale web application, and this would definitely fit the bill. I’m not sure whether it has the potential to actually see a large user base, but I’d still like to approach it with those intentions. That way I’ll learn from it, and if it does actually gain people’s attention then it’ll be ready to handle it.

This project has forced me to start doing some research into scalable server solutions, PHP frameworks, and general concepts of PHP application development, most of which is new to me. I feel like I need somewhere to dump my newfound knowledge, and my process of discovery. This will probably be that place.

For now I’ve decided to tag these posts PHP App Development, so my progress can be followed there. Geek on!

I’m pretty fickle when it comes to My Favorite Things, so I’m sure this will change soon, but I have a new favorite WordPress breadcrumb solution. You can check it out in the RWM Sandbox. I’ll keep you updated if I find a new favorite to replace this one.

google music

This is awesome. I got a beta invite about two weeks ago, and I’ve used it nearly every day since. Here’s why it’s awesome.

I spend about 90% of my weekly computer time on my work computer, far away from the hard drive I’ve worked so hard over the years to cram full of music. Sure I have my iPhone, the Hype Machine, Pandora, and (until they made the move that turned me against them — restricting their mobile app to paid subscribers only) Last.fm, and they’ve all done a great job of keeping my musical needs covered, but all have fallen short in some way. They never have exactly what I want when I want it. It’s always hard to find or bookmark specific songs. I was missing MY music library.

Then Google comes out with what I believe is their most selfless and philanthropic use of their gigantic servers. Over the last two weeks (yeah, it took that long, which was lame, but c’mon, it’s a once-and-done process) I’ve been able to upload nearly 100GB of music. I’m sure it gets compressed heftily, but as far as streaming compressed music goes, Google Music sounds pretty darn good. Imagine if every user uploaded a similarly large music library. That’s a lot of space Google is giving out for free. Sure, sites of theirs like YouTube will always top Music’s disk-space needs, but YouTube is covered in intrusive ads. None of those on Music…yet at least.

Sites like Mashable have come out and said they’re not really excited about Music and think it leaves a lot to be desired…but what the fuck do you want from a FREE service? They cite things like the inability to purchase music from the site like you can on Amazon’s comparable service, and you surely will be able to from Apple’s forthcoming iCloud. But you know what you CAN do? Go to those other sites (that would charge you long before Google would to store the same amount of music), buy the songs you want, and upload them to Google. Now was that too difficult? And the lack of offline caching? How often are you not connected to the internet and really itching to hear that one song? Yeah, never. Mashable, get off your high horse.

Anyway, now I have my whole music library accessible from my work computer and my home computer without an external hard drive tethered to it. That is awesome right now.

github

This is mysteriously awesome. I can’t figure out why it’s useful to me, so that’s why it’s a bit of a mystery that I’m so infatuated with it right now. Maybe I like it so much because it’s the only “social network” that I’ve signed up for that requires you to enter terminal-style commands to set it up and create updates. That’s badass. [Side note: working with terminal commands is pseudo-awesome right now. I'm really excited when I type in a totally geeky command to a black and white window with mono type font and see shit happen.] Their entire site forces you to use https. It’s secure, and they flaunt it. Are you listening, Facebook?

If you care to follow me (and improve my code, I guess. Is that the point of the site? I really don’t know…what is it good for?!?), check my shiznit out at https://github.com/kohnmd. It is awesome.

being engaged

This is romantically awesome because my fiancé is awesome. I’m so excited to get married!