The product we've been working on full steam, and are now (finally) happy to announce in an open beta program, is called CloudPreservation. It's a web-based service that automatically crawls your web properties at chosen intervals, building an archive of HTML source code and resources, high-quality snapshots, and a robust full-text search index. The service makes it a breeze to go back in time with all of your websites, blogs, Facebook fan pages, and Twitter accounts to search content, preview the site, and export data.

There's a bunch of reasons why organizations need this service, foremost being regulatory and legal compliance. But another huge one that I think affects this group of folks is backup and business/site continuity. The service works great for keeping a historical archive of the content and (coming soon) the image and resource (JavaScript/CSS) files from your site.

Just like our other products, we're leveraging cloud computing to do the crawling and imaging of these sites, so we're able to scale this out enormously (and I really mean infinitely) at extremely affordable prices. There's a 30-day free trial, and we've got pricing plans that should suit the needs of any personal blogger who wants some peace of mind, or a huge corporation that can finally be compliant with regulations. We've even got a free plan for those that can get away with it.

We want you to give it a spin. Sign up for a 30-day trial of one of our paid plans, or just take us to the cleaners and pick the free one :). Since we're in beta, I really need your feedback. Lots of you are some of the most opinionated and nitpicky folks I know, and I want all the nitpicks and opinions because they're super valuable to me, and can really help improve this much-needed service. Feel free to give feedback here on the blog, send me an email, or send an email to our feedback email address.

Tuesday, March 30, 2010

This American Life had a great radio show last week about New United Motor Manufacturing, Inc. (NUMMI), which was a joint venture between Toyota and General Motors so that GM could learn the Toyota Production System and so that Toyota could learn how to apply the Toyota Production System in the United States.

Just about every week, Ben and I talk about how this is the week we're done with Facebook and Twitter. We talk about how big a distraction they are, how little great information we get from them, and how things people say can get us worked up for no great reason. We're tired of the .02% of our "friends" who just flood our information streams with useless status updates or political rants.

Don't get me wrong, I think that 1-2% of the stuff that I read there is interesting, nice to know or informative. But that's a pretty low hit rate for signal vs. noise.

There's this weird and unhealthy emotional attachment to it, which causes me to never hit the off button. Sometimes I feel like I've built up this big property, and I'm scared to just let it go (like there's a bunch of other jfiorato's out there waiting in line for that username). Then there's just the fear of missing something important, that I won't see anywhere else, or that I'll be doomed to find out later than everyone else.

But I don't like the way these attachments make me feel. As if we don't have enough to live in fear of these days. Fearing that I'm not up to date, or fearing that I'm not marketing myself as well as I could, just isn't necessary anymore. I've just been sick of having so much of my stuff owned by others. Tired of banks owning my shit. Tired of TV owning my CPU cycles. Tired of Facebook and Twitter owning my words and pictures.

I feel like I got so much more value out of reading and writing more than 140 characters, but all these 140 character "efficiencies" have ended up paralyzing me.

Still, we're not ready to fully commit to anything. So, to test things out, Ben and I made a pact to not check Twitter or Facebook for two weeks (we had our wives change our passwords), and then see where we're at. Maybe after the two weeks we'll delete the accounts, or maybe we'll just leave them there without knowing the passwords. Not sure yet.

Saturday, January 09, 2010

Why is it that development teams so rarely know what the financials look like for the product they are building? Is it that the company doesn't have that information? Unlikely. Is it that it's sensitive information that shouldn't be passed around to just anyone? Maybe. Is it that the company doesn't want to concern the team with all the gory details? Most likely.

The ultimate goal of the company is to make more money. Yes, pleasing customers and employing more people are in there, but the goal is to make more money. If all your team has for determining success is "on-time delivery" and "quality", the team will end up building processes and metrics which may hit the nail on the head for "on-time delivery" and "quality", but may be of poor value when it comes to making more money. Things like "on-time delivery" and "quality" are both obviously going to be positive things, but the company should draw the line between these metrics and the financials with the development team. The team is smart, and the company should let the team have some input on the metrics that result in making more money.

Hiding the financials is hiding the score from the team. It's like a basketball team that only knows how many times they turned over the ball, or a baseball team that only knows how many stolen bases they had.

A company can't pretend that the goal is anything other than making more money. The team understands that, and wouldn't mind seeing a big fat chart that correlates the fact that they kicked ass to get a feature out in February with a big spike in sales in July.

Sunday, December 13, 2009

Last year I decided it was time to update my home theater PC. The old one was still on Windows XP with Windows Media Center 2005. The hardware was old and I wasn't really using it much any more, opting instead to use my cable box, which had much better high definition support.

So, I decided to do the build this time on my own, and I thought I'd detail it here for reference.

My requirements for the HTPC were:

High definition playback and recording (at least two shows at once)

A/V Component Form Factor for Case w/ built in IR Receiver

HDMI output to receiver/television

Blu-Ray Player

Energy Efficient

High definition playback and recording (at least two shows at once)

For the HD support, I opted to go with the SiliconDust HDHR-US HDHomeRun Networked Digital TV Tuner. I found that this setup (among the many I tried) was the easiest to set up in Windows Media Center. Additionally, you can use the broadcast signal throughout your house on any of your other PCs with VLC player or other software. This is a fantastic device, and they've got fantastic support.

A/V Component Form Factor for Case w/ built in IR Receiver

This was important to me because I didn't want a tower sitting around in my family room. Additionally, I didn't want any more cables that the kids would jack with, so IR receiver cables were not ideal. I chose the Antec case for the form factor and IR receiver, but was also pleased with its cooling capabilities. One downside to this case is that the front display has very, very poor contrast, making it difficult to read what's on there from 8' or more away.

HDMI output to receiver/television

Again, fewer cables is better. Having the single HDMI out to the receiver, which is forwarded on to the television, just makes everything simpler. When shopping for motherboards, I wanted to make sure they had this on board with a decent audio chipset. Asus has a great track record for quality motherboards. The downside here is that at the time, there were no HDMI 1.3 motherboards, so audio formats like Dolby TrueHD and DTS-HD aren't supported. It looks like you can find HDMI 1.3 capable motherboards now.

Blu-Ray Player

Might as well, right? Internal Blu-Ray players only run $50-$60 more than other optical drives. I opted not to get a DVD writer here because I rarely do any DVD writing, and if I did, I'd do it from my laptop and not my HTPC.

Energy Efficient

There's a few energy efficient aspects to this PC. First, the motherboard itself supports intelligent standby, which drops to bare minimum power until the remote is used or a show needs to be recorded. Second, the chassis and CPU fan speeds are controlled by the motherboard. Third, I installed a 2.5" notebook hard drive for the OS drive, which is far less power hungry than a standard 3.5" disk.

So, that's the giddy-up. Happy to field any questions/comments on your own experiences.

Saturday, October 24, 2009

Scalability always seems to be the poster child for YAGNI. Lately however, the barrier to entry of scaling out is decreasing as the services that provide elastic infrastructures are wildly abundant. Scaling a website is easier now than ever for just about anyone, but for some reason upfront scalability design seems to still be a bit taboo.

I wanted to test how difficult it would be to do an upfront scalable implementation, and pay little or nothing until the scaling needed to happen. So, I decided to dive in and see how I would create a site that needed to start with serving a small number of pages, do a little bit of work, and use a little bit of storage, while paying a little bill, but with the ability to scale out the web page serving, the work, and the storage, infinitely.

Recently my Dad was asking how to download all the photos from a set in Flickr so he could burn them to a CD for my grandmother. There's a few desktop apps and Firefox extensions for it, but everything requires you to download and install software or use a certain browser. So, I thought a web application that looks at a picture site, zips up the original images in a set or album, and lets the user download them would be a good sample application for this test. You can see the application here.

Overview

The application itself can be divided into three parts: the web application, which serves as the interface for both the user and the workers; the workers, which download the images and zip them up; and storage, which stores the zipped sets for download.

Web App

The web app is a pretty vanilla Ruby on Rails web application. It serves up a single page for the user to enter the information about the set/album they want to download, and it also provides a REST interface for the workers to talk to. This is where the PostgreSQL database resides, and stores the information about the downloads that have been requested.

To get cheap scalability, it's deployed to a host called Heroku. Heroku is an infrastructure that provides process-level scaling for web based Ruby applications. When you deploy an application to Heroku, it compiles it into a "slug" and this "slug" can be started as a process on a server. Heroku uses Amazon's EC2 infrastructure and deploys the "slug" to a user-specified number of load balanced processes, with processing slots available across the virtual servers they have running at Amazon. You pay for the number of processes you want to have. 1 process is free, so you can host a low traffic website there for no money at all and move up the pay scale as you need it.

Workers

The workers are Ruby scripts that do the actual downloading and zipping of the set/album. Workers are doing periodic lightweight polling of the web app, looking for new jobs to work on. Once a job is completed the zip is moved to the shared storage, and the worker tells the web app that the job is done and gives the location of the zip file.
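Here's a rough sketch of what that polling loop could look like. This is not the actual worker code. The FakeWebApp class, the run_worker method, the job fields, and the idle-poll limit are all made up for illustration, and the in-memory fake stands in for the HTTP calls a real worker would make to the web app's REST interface.

```ruby
# Hypothetical in-memory stand-in for the web app's REST interface.
# A real worker would make HTTP requests here instead.
class FakeWebApp
  attr_reader :completed

  def initialize(jobs)
    @jobs = jobs.dup
    @completed = {}
  end

  # Returns the next pending job, or nil when the queue is empty.
  def next_job
    @jobs.shift
  end

  # Records that a job finished and where its zip file lives.
  def complete_job(job_id, zip_location)
    @completed[job_id] = zip_location
  end
end

# Polls for jobs until the queue has come back empty max_idle_polls
# times in a row. build_zip is any callable that downloads and zips
# a set and returns the location of the stored zip file.
def run_worker(web_app, build_zip, poll_interval: 5, max_idle_polls: 3)
  idle_polls = 0
  while idle_polls < max_idle_polls
    job = web_app.next_job
    if job.nil?
      idle_polls += 1
      sleep(poll_interval)
      next
    end
    idle_polls = 0
    web_app.complete_job(job[:id], build_zip.call(job))
  end
end

# Example run with a stubbed zip step and no waiting between polls.
web_app = FakeWebApp.new([{ id: 1, set_url: "http://example.com/sets/1" }])
build_zip = ->(job) { "zips/#{job[:id]}.zip" }
run_worker(web_app, build_zip, poll_interval: 0, max_idle_polls: 1)
```

Passing the web app and the zip step in as arguments keeps the loop itself easy to test without a network or any real downloads.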

To get cheap scalability, the workers run on Amazon's EC2 environment. There are two workers per server, and when the Amazon EC2 instance starts up, the code is updated from Git, and the processes are started as service daemons. The EC2 instances are started by the web application, and they shut down after an hour of inactivity. The web application determines how many EC2s to start based upon the number of jobs in the queue. At $.03 per hour, I pay nothing for no activity, and very little as the application scales up.
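To make the sizing decision concrete, here's a minimal sketch of how the instance count could be derived from the queue length. The two-workers-per-server figure and the $.03 hourly rate come from above. Everything else is an assumption for illustration: the method names, the idea that each worker handles one job at a time, and the max_instances cap are not taken from the real application.

```ruby
WORKERS_PER_INSTANCE = 2   # from the post: two workers per server
HOURLY_RATE = 0.03         # from the post: $.03 per instance-hour

# Number of instances needed so every queued job gets a worker,
# assuming one job per worker at a time. max_instances is a
# hypothetical safety cap, not something the post describes.
def instances_needed(queued_jobs, max_instances: 10)
  return 0 if queued_jobs <= 0

  needed = (queued_jobs.to_f / WORKERS_PER_INSTANCE).ceil
  [needed, max_instances].min
end

# Cost in dollars of running the given number of instances for one hour.
def hourly_cost(instance_count)
  (instance_count * HOURLY_RATE).round(2)
end

puts instances_needed(5)               # 5 jobs over 2 workers each rounds up to 3
puts hourly_cost(instances_needed(5))
```

With an empty queue the method returns zero, which matches the "pay nothing for no activity" behavior described above.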

Storage

Since the web app and the workers are physically separated, I needed shared storage that could accommodate potentially very large zip files. Additionally, I needed a network that could serve such files reliably for download.

To get cheap scalability, I used Amazon's S3 storage for this shared storage. I get great copy speed from the EC2s to the S3 storage, since they're both on Amazon's network, and again, I only pay for what I use. At a minimum, I have to pay for the storage of the EC2 image, and on top of that, I pay for the zip files that are stored there, as well as for the data transfer and requests. You can see the pricing here, and you'll see that it's very affordable. The zip files are deleted after a month, to control storage costs.
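The monthly cleanup can be sketched as a simple filter over the stored files. This is only an illustration. The StoredZip struct, the expired_zips method, and treating "a month" as 30 days are my assumptions here, and the sketch only picks out which files are due for deletion without talking to S3 at all.

```ruby
# "A month" approximated as 30 days (an assumption for this sketch).
RETENTION_SECONDS = 30 * 24 * 60 * 60

# Hypothetical record of a zip file in shared storage.
StoredZip = Struct.new(:key, :stored_at)

# Returns the zips that have been stored longer than the retention period.
def expired_zips(zips, now: Time.now)
  zips.select { |zip| now - zip.stored_at > RETENTION_SECONDS }
end

now = Time.utc(2009, 10, 24)
zips = [
  StoredZip.new("old-set.zip", Time.utc(2009, 9, 1)),
  StoredZip.new("new-set.zip", Time.utc(2009, 10, 20))
]
expired_zips(zips, now: now).each { |zip| puts "would delete #{zip.key}" }
```

Passing the current time in as an argument makes the cutoff easy to check against fixed dates.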

Conclusion

Overall, this project took me very little time to do, around 30 hours. With it, I've got a solution that costs me nearly nothing if it isn't used, and would be affordable if it were used quite a bit. That said, it's a pretty simple application, and the lines of separation between web app/workers and storage were very clear, so your mileage may vary.

Also, you probably were wondering about database scaling. In this situation the PostgreSQL scaling would happen vertically, adding more dedicated resources to the database server, rather than more databases to the pool, which Heroku supports as well.

I do need to give credit to my colleagues at the Nextpoint Lab who initially put together the scalable design for our software (which gets rave reviews, I'm just sayin'), as it was essentially the blueprint for this design.

I think my conclusion is that if the situation is right, upfront design and implementation of a scalable solution doesn't necessarily have to cost you an arm and a leg to get it done, nor does it need to be a huge gamble if the scale isn't needed immediately.