Last week I attended two conferences organised by O’Reilly: Strata (themed around Big Data) and Velocity (performance and administration of web applications).

Recently I have been exploring various NoSQL databases, so when I heard the Strata conference was coming to London for the first time, I decided to attend – after all, many of the NoSQL products are very closely associated with the world of Big Data. Seduced by the discount for booking both, on a whim I decided to attend the Velocity conference as well. Two days for each, so that would be 4 days of presentations. I made sure I had plenty of sleep in advance…

Strata came first. My goal here was to get a wider understanding of the whole Big Data scene – hot technologies, interesting problems that the community faces, and so on.

My first impression was that this is a field still being explored – even the problems are not yet well-defined. Several of the speakers offered their own definitions of “Big Data”. I think it was George Dyson who suggested that the Big Data era began when it became cheaper to keep all your data than to spend human effort to delete it. A more subjective definition was that you know you have Big Data when you have to start thinking about the size of it – which suggests that the threshold will rise as the state of the art moves forward.

I’ve not seen Hadoop in real deployments, so it was interesting to hear the war stories about that, but there were plenty more technologies under discussion. I heard about RDF, Clojure, Cascalog, techniques for visualizing and exploring data, and much more.

Funnily though, one of the talks that had the most impact on me was not a “deep techie” thing at all: in the last session on Tuesday, Felienne Hermans of Delft University spoke about PhD research she’d done into corporate use of spreadsheets. We all are vaguely aware that Excel gets (ab)used for all sorts of things – largely because it is a quasi-programming environment that is used by non-programmers – but do we really know the true extent? A spreadsheet can combine data, logic and presentation with a complete failure of “separation of concerns”. Felienne had worked with an investment bank where the management initially estimated there might be 10 thousand spreadsheets; the correct figure was more like 3 million. A timely reminder that while we worry about the challenge of slightly rough data in our databases, there’s a whole lot of business-critical stuff out there in users’ home directories…

Velocity followed on Wednesday and Thursday, and here my objective was to catch up with a field where I was a bit stale – my real web experience dates from 5 years ago, and of course things have moved on. There was a lot of talk about DevOps, but this isn’t so new to me; instead I tried to cast my net wide, and went to talks about queueing, monitoring, stories of real-life experience, and various new technologies.

The Velocity conference seemed slightly more “corporate” than Strata, perhaps because it seemed mostly to be about better ways of tackling well-known problems, rather than working out what the heck the problem actually is. Strata was asking, “What do I do with all this data? Is there a business model hidden in there, or knowledge that I can extract? How can I do any of that?” In contrast, Velocity mostly concentrated on more specific questions for a more mature field: “How can I monitor the performance of my app around the globe? What metrics should I track? Can I use DevOps-style agility to improve stability and deploy releases more quickly?”

At both conferences there was a good selection of exhibitors, particularly at Velocity, where the more mature problem space means there are more players with competing offerings to sell. As a fan of open-source, I find it encouraging how many of the free products now have companies to back them and sell extra support (and conversely, how many companies choose to open their core products). Most of the stands were definitely geared to the technical nature of the conferences and were able to deal with proper in-depth questioning.

The least satisfactory aspect of the whole thing was the hotel conference rooms. All of them had the same narrow chairs bolted together in rows. I’m certainly no “big guy” but I was at least an inch or two wider than the chairs, so in a well-attended talk everyone ended up very tightly wedged, or taking it in turns to lean forward. In most of the rooms the projection screens were low and you couldn’t see the bottom half of slides from the back. A plus for the hotel was the good-quality food, though this may not have helped with the narrow seating!

As you’d expect, these events have sparked a whole lot of questions and further research to do. I’ll certainly be looking into a bunch of new technologies – “Hadoop: The Definitive Guide” is first up on my reading list – but it seems that statistics is going to become a surprisingly in-demand skill as businesses try to extract the patterns from their data. “Statistics in a Nutshell” next, perhaps…

Overall, my first experience of these conferences was very positive – in both cases, it was a great way to get a survey of the scene and drill down to a few more in-depth areas too. Of course I can’t speak for anyone who had more specific objectives, but it seemed that the corridor conversations around the formal talks offered plenty of opportunities to make contacts, and get into more detailed discussions. I suspect I’ll return in the future, hopefully with some stories of my own to tell!

Gordon Banner is a sysadmin and infrastructure consultant who is interested in almost anything technological, but when forced to specialise will concentrate on supporting developers and maintaining applications at enterprise scale.