2012-12-07

Why "Taking Hadoop to the Clouds" is the talk to vote for

The Hadoop Summit vote list is up, and I have two proposals, both currently undervoted. Even though I'm on the review committee for the futures strand, not even I could push through a talk which had zero votes on it -ideally I'd like my talks to get in through popular acclaim. I could just create 400 fake email addresses and vote-stuff that way, but I'm lazy.

For that reason, I'm going to talk in detail about why my talks will be so excellent that to even think about having them left out could be detrimental to the entire conference.

The competition is "Deploying Hadoop in the Cloud", which looks at options, details and best practices. I don't see anything particularly compelling in the abstract -I assume it's got more votes as it's the one that comes up first. Or they are trying the many-email-address-vote-stuffing technique(*).

I say this not to be egocentrically smug about the quality of my presentations, but because I'm reasonably confident I know a lot about the area.

My last time at HP Labs was spent on the implementation of the "Cells" virtual infrastructure: declarative configuration of the entire cluster design. The details were presented at the 5th IEEE/ACM conference on Utility and Cloud Computing, and will no doubt be in the ACM library. This means I know about IaaS implementation details; the problems of placement, why networking behaves the way it does, image management, what UIs could look like, what the APIs could be, etc.

I've spent a lot of time publicly making Hadoop cloud-friendly. I presume that MS Azure and AWS ElasticMR have put in more hours, but unless they're going to talk about their work, Tom White and I are the next choices. Jun Ping and VMWare colleagues have done a lot too, including big patches into the codebase, but I don't see any submissions from them.

I have opinions on the matter. They aren't clear cut "cloud good/physical bad" or "physical good/cloud bad". There are arguments either way; it depends on what you want to do, what your data volume is, and where it lives.

I'm still working in the area, in Hadoop itself and the code nearby.

Recent cloud-related activities include

HADOOP-8545: a Swift Filesystem driver for OpenStack. This is something everyone running Hadoop on Rackspace or other OpenStack clusters will want. This week two different implementations have surfaced; getting them merged together is going to be the next activity.

That's why people should vote for me. The other talks will be about "how we got Hadoop to work in a virtual world" -mine will be about how we improved Hadoop to work in a virtual world.

(*) PS: for anyone planning the many-email-accounts approach, remember that the email addresses are something we reviewers can look at, and many sequential accounts all giving three votes to a single talk will show up as "statistically significant". Russ has the data, and he likes his analyses. He may even have the IP addresses.

1 comment:

I would love to see both of your talks. It's a pity that neither has had many votes. Having known the work you have been doing on the cloud + Hadoop front, you are the right person to cover Hadoop deployment on the cloud!