1,200 team leads, architects, project managers, and engineering directors descended upon London for the sixth annual, sold out, QCon London 2012. Over 100 speakers - seasoned practitioners themselves - presented 75 technical sessions and 12 in-depth tutorials over the course of five days. Industry luminaries such as Martin Fowler, Rich Hickey, Ola Bini, and Steve Vinoski – just to name a few – could be seen mingling with attendees between sessions and during the numerous social events held during the conference.

The increasing popularity of cross-platform mobile development, NoSQL, Big Data, and cloud computing was reflected in some of the track themes at QCon London this year. Credit Suisse’s mobile architecture, NoSQL & Grids in the finance sector, and Big Data architectures at Facebook and Twitter were just a handful of the many real-world talks presented at the event.

This article summarizes the key takeaways and highlights from QCon London 2012 as blogged and tweeted by attendees. Over the course of the coming months, InfoQ will be publishing most of the conference sessions online, including 19 video interviews that were recorded by the InfoQ editorial team. The publishing schedule can be found on the QCon London web site.

The conference started with two training days – six full-day tutorials each. From my perspective, the most interesting two tutorials were “Cloud Architecture” by Adrian Cockcroft, where he shared the architecture, best practices and decisions behind Netflix’ cloud (which Artifactory is proud to be a part of) and “Continuous Delivery” by ThoughtWorks’ Tom Sulston (that’s as close to the roots of the famous “Continuous Delivery” book as you can get). For me, the most fascinating thing in the Continuous Delivery process as ThoughtWorks sees it, is that its virtues are exactly the same as we based our Artifactory upon back in 2006: DevOps automation and rapid release cycle. We appreciated the validation of our concept.

We had several exercises where we had to discuss various topics in pairs and then call out responses. Honestly, I did not expect any exercises given the topic, but this was clever. I got to know how other people have organized teams and what problems they’ve encountered, and got to hear different opinions on roles – not just the Architect role. It was really interesting because you don’t have the opportunity to do this … well, ever… All in all, I never thought before that a Software Architect would have so many responsibilities and so many soft skills required.

Learning about how the Netflix team works with Amazon providers: no sysops team, developers pushing builds over from testing to production. Fully in-cloud QA environments. Overreaching specs, almost non-existent capacity planning. Also, the out-of-the-box account/keys management provided by Amazon wasn't good enough, so they had to implement their own….

What is interesting is to learn about the risks they had to look into before even deciding to go to any cloud. Apparently, if forced to, they can move out of Amazon in 3 months. At the moment they would move into a private cloud, simply because there is no other provider that can handle a company of that size….

Moving to the cloud took 3-4 years. The bulk of work was to actually re-factor the system so it's more manageable. It's amazing how many things were broken and needed fixing before the move….

The main problems Netflix developers tried to solve during the switch were the interaction between development teams and the kitchen sink objects (like Movie or Customer). The first one was fixed by a more service-oriented architecture, with fine-grained libraries and well-defined responsibilities; for the second they used the "facets pattern", which I'll describe later. Basically it's about understanding that objects can be represented differently when they are used differently. It gives developers a way of working with the same objects without breaking each other….

Lots of good practices on logging and monitoring. Especially on how to use hadoop/hive and AWS for that….

We also heard about how they deal with security, making sure that only the right people have access to the instances, but also limiting the ways services can call each other - only services that are part of a certain security group can call a service, which is an easy way to find out who calls whom….

An overview of current Netflix Persistence stack. It included info on Memcache, Cassandra, MongoDB and MySQL….

Cloud bring-up strategy - they used Shadow Traffic Redirection and worked in iterations, one page at a time, starting with the simplest. They managed to "sell" the cloud to all developers early on at a development boot camp. Most of the issues they faced were around key management and an early lack of experience with AWS.

The monitoring is based on logs (they log everything almost all the time). The logs are processed in Hadoop and are used to generate reports. They use AppDynamics as a portal that gives them deep look into what's running in production.

One interesting take that I got out of this tutorial is why Netflix so wholeheartedly embraced cloud computing in general and Amazon Web Services (AWS) in particular, even though they could continue to afford their own datacenters and are now actually competing with Amazon in the field of movie distribution. In most big companies, CEOs are the ones who are very careful about the transition to the cloud. But in Netflix’s case the CEO was actually the one evangelizing the cloud to the non-believers. Reed Hastings, CEO of Netflix, was originally a software engineer, as were many other senior managers at Netflix. As Adrian explained, Netflix is pretty much a company run by software engineers. Adoption of cloud computing seems to be very much dependent on internal relations between developers and operations people. If Operations people, with their traditional reliance on hardware and software vendors, have the upper hand in the company, then the transition to the cloud might be quite slow. But if the developers run the show, then the cloud looks like a very elegant way to get rid of the annoying operations people.

The session was excellent, packed with information, with lots of hands-on work as well. It definitely gave me enough to want more, and I should really start reading Programming Erlang since there’s so much to it and we only skimmed the surface…. One thing that impressed me the most was Erlang’s lightweight processes. Your man, Francesco, spun off 10,000,000 of those (each one spins off the next, sends it a message, receives a message and terminates) – it took just 12 seconds for them to finish. I thought that was really impressive. Another impressive thing demonstrated was how easy it is to create distributed processes, communicate between them, and actually monitor whether they are still alive, and it was all in just a few lines of code.

Keynotes

It was about weird and interesting things we do. Things that stuck were: Reusability is overrated, DRY has its dark side too, we love solving problems that nobody actually has, and ultimately – software is there to bring value. So we should be writing good enough software – and that was the highlight of the day actually. It was all about common sense, simplicity, back to basics.

My favorite keynotes are usually the evening ones. A beer in your hand makes any amusing talk even more enjoyable. One named “Developers Have a Mental Disorder” I couldn’t miss! Greg Young gave a great show, funny and entertaining, about serious dilemmas in software development that we, the developers, prefer to ignore. The brightest example, of course is the downside of DRY (did you ever think about one?). By removing duplication, we increase coupling, which can be much worse.

For someone like me, who doesn’t deal on a daily basis with Disaster Recovery, the session was astonishing. Looking behind the curtain of that kitchen reveals a totally different way of thinking and planning. It may be how individuals and teams have to perform during a disaster (e.g. personal heroism is bad even if successful; it sends the wrong message to the team), or simulating disasters on live production systems (I never could even dare to think about that). The most obvious, but still eye-opening advice that John gave is to learn from successes, not only from failures. It can give us a lot of information and happens much more frequently, no? The only organization with which I am familiar that embraces that technique is the Israeli Air Force.

There are a number of things that go wrong when teams are faced with a failure of a complex system. There's refusal to make decisions, caused by either lack of authority (people want to make a decision but they are not sure if they can), lack of information (people can't make a decision, they can only guess) or bureaucracy and politics (people that are able to make a decision can't make it fast or without some approvals). There is "heroism" - individuals who walk away from the team to focus on what they think is the solution for the problem. If they succeed they send the wrong message to the team and the company - that issues are solved by individuals; if they fail they have abandoned the team when facing a disaster. There are also distractions - you need to be able to cut distractions down to a minimum when dealing with a disaster. This means irrelevant emails/links/social events but can also mean isolating business owners from the team if they only add distractions when "panicking" over the outage….

Drill - practice troubleshooting under pressure, be comfortable with your tools, try to come up with new ways your system can break, and practice the ways of dealing with those. There are actions that can be taken immediately when a disaster happens; make sure those are fast and automatic for your team.

Near misses - communicate them and widely distribute them. They happen more often than actual disasters so you get much more volume of incidents to improve on. They are a reminder of the risks.

Spend as much time understanding why your team succeeded as on why the team failed. When faced with the choice of whether to analyze your success or your failure, choose to analyze both. Think about what goes right and why.

Learn from other fields, train for outages, learn from mistakes (and avoid politics and bureaucracy), learn from successes as well as failures.

The keynote [today] was given by John Allspaw and was about Resilient Response in Complex Systems. Drills on live systems, comparisons with running an aircraft carrier, near-miss events, learning from failures but also from successes, learning improvisation, Mean Time Between Failures and Recoveries – all of that was touched on, and more. Some of what was mentioned was really scary stuff, like simulating failure of components in live production. But there was also some common sense – why don’t we learn from our successes too, when we presumably succeed far more often than we fail?

He spoke about the differences between “Simple” and “Easy”. Sounds pretty similar, but in fact they are very far from being the same. Antonyms to the rescue: simple vs. complex, while easy vs. hard. Now it’s clear – we need to strive to prevent and remove complexity (go simple) without being afraid of the hard. Choosing easiness may end up creating complexity. Things which are easy, but complex, include OOP, mutable state, imperative loops and frameworks (especially ORM). Things which are simple, but not necessarily easy (at least not until you get them), are LISP-ish languages, like Clojure.

And actually that was the key thing of the talk – we need to know the trade-offs we are making when we introduce, say, a new framework – what complexity it entails for the convenience of not hand-crafting 20 lines of code, for example. At least that was my interpretation of his talk. Also, he presented a couple of tables – The Complexity Toolkit and The Simplicity Toolkit – which looked a bit biased towards functional programming. However, I took them as: if you take from the former, know the trade-offs; if you take from the latter, know the possible benefits (over the former) and make the decision for yourself. He concluded along the lines: Simplicity is a choice. No tool will do it for us – testing tools don’t care if it’s simple or not.

I was already astonished by the clarity of Rich’s thoughts in the Clojure documentation, but with the Datomic disclosure a few days ago I found a real visionary. He’s a kind of philosopher, trying to first go very deep to grasp the essence of concepts like state, identity, time, and so on, and then applying his thoughts to technology, and I’m fond of that way of thinking and doing.

I already wrote about state, identity and behavior some weeks ago. Here is a more general piece about simplicity, which was also the subject of Rich’s keynote at QCon: « Simple made easy ». Simplicity is a subject that fascinates me, as I see it as one of the ultimate goals in my design activities. So I will humbly mix my thoughts (sentences starting with « I ») on that subject with the ones from Rich (sentences starting with « Rich »)….

Why take all this time to write about simplicity in what can seem a very philosophical way? Because simplicity is a key concept and a key skill in designing our software, not only in its technological aspects but first of all in its domain aspects.

His point was that any company can have big data problems. If you have a single server and the only approach is to “buy a bigger server” when you need to scale, that will only take you so far down the curve. At some point you need to be able to scale horizontally, not just vertically, and doing so often involves a different approach to the problem solving domain (e.g. MapReduce instead of relational databases).

Martin Fowler and Rebecca Parsons opened the conference with the keynote – Data Panorama. Martin was very funny, screaming about how big data is to open the talk. It turns out that data is Growing and how we use it is Changing. Keywords were Growing, Distributed, Valuable, Urgent and Connected, and I believed every word. Who would argue with this? Nowadays, when even a fridge can tweet, data is coming from all sorts of devices, which wasn’t the case just a few years ago, so now the challenge is even to decide how much to store and for how long, let alone how to use it. The response to this change is NoSQL databases – Google with BigTable and Amazon with Dynamo started the trend out of necessity. The necessity being that they couldn’t scale up anymore and had to scale out, and relational databases couldn’t do it. So now we have a new wave of databases that offer convenience (easier to store aggregates) and distribution (sharding). The talk went on about polyglot persistence and event sourcing; then how the data sources have changed – it’s not only tables anymore, it’s text and images and video and connections … and we need to analyze it differently, with emphasis on visualization of the data. I could go on and on about this – obviously I found this highly inspirational.

The main idea was to use example of the team both speakers worked with in past 4 years to show how being agile, focused on good process and delivery can lead to "delivery zombies" - teams that only focus on delivering stories without asking how? and why?

They did the right thing at the beginning and it worked well. The team was piloting agile approach in the company, it took on new project, started off with brave decision of building up a new platform as a backend for the new site. Managed to successfully deliver but also to build a "perfect" agile environment, strong foundation for future projects, and a team that felt they work towards common goal….

Thanks to the success the team built up a reputation and a good relationship with business owners, and delivery of features became easy. Shipping became a measure of success. Some early signs of future problems started showing up: the goals the team had in front of it felt small, the technical debt started building up, with the "how?" question already answered the process felt too easy/boring, and the team started making mistakes - slipping. Tasks weren't always picked up in priority order ("This is first in the queue but it's a lot of front end, I can't work on it"). Sounds scary when I think about my team... :S

After 160 iterations the team became a "delivery zombie"….

[After] 176 iterations a new project is introduced…

190 iterations in, there are conflicts in the team, there is no common purpose, and people are not working together anymore. The team realises that delivery of features can't be its measure of success anymore - they are still delivering, but they can see that's not enough. They struggle to define new goals and values.

At the end the project is still shipped on time, and there is positive feedback from both business and users. But during the retrospective the team doesn't feel successful. The takeaway: vision is important and needs to be reinforced often and on different levels. The team needs to share common values and goals, and needs to understand (and ask) why? as well as how? Keep on asking what it means for your team to be "good", "successful" and "right".

“Crazy Talk – When 10 second builds start to make you nervous” was indeed crazy – perfectly so – and won my vote for the best talk of the conference. Daniel Worthington-Bodart was passionate to the point of benign insanity about reducing build times. How to get a 30 minute build down to 2 seconds shows awesome commitment. Measure your build! – you know to do this for your app, do it for your build as well. Don’t split up your tests! – split up your app instead. Manual test! – not just AT.

Daniel Worthington-Bodart's talk on 10-second-builds revealed another major issue with heavily tested projects: they can take ages to build (especially when checking integration points), which can become a major problem in itself. If you follow another best practice of making many small, incremental changes, then waiting for a clean build after each can seriously damage your effectiveness. Coffee run build times can be handy, but you don't want that sort of disruption for the whole day.

Stefan was talking about system boundaries, how the three-layer architecture everybody draws is too generic and how one project doesn’t necessarily mean one system. He went on to talk about system characteristics and argue that we should really have Cross System and Internal architectures, rules and guidelines (Macro and Micro architectures if you will). Macro architecture would define separation and interaction between the systems, and Micro architecture would be responsible for individual systems, where we could even have different languages between systems. …

The take is – redundancy in data is not necessarily bad, and most probably in relational databases, for various reasons we already have it. He argued that this maybe doesn’t feel right but it is better in the end.

A really entertaining, insightful and somewhat shocking talk, which is not unusual for Dan North in any way. He talks about decisions (obviously) and that each one of them is a trade-off. Sometimes we forget to weigh these trade-offs.

So what he said was that the logic wasn’t really complex; the load was the challenge – 100K+ DB operations/sec, 50K+ DB updates/sec. What they did was to rewrite their backend 4 times in 2 years, and it was exciting to see how the architecture evolved. They started with Ruby and MySQL, went on to Ruby and Redis, then introduced a stateful server with Erlang and did saves to Amazon’s S3, and finally settled on Ruby+Erlang. This is a perfect example of Polyglot Programming – Erlang is great for infrastructure kind of code, supporting sessions in a reliable and super fast manner, but Ruby has syntax that’s much more appealing to the eye (talking about readability of business logic).

Jesper outlined the journey Wooga took in terms of evolving architecture, where each new game gave them an opportunity to try something new and evolve their technology.

Starting with a traditional technology stack (MySQL/PHP/Ruby on Rails), the engineers at Wooga eliminated their database bottleneck first by using Redis and ultimately by switching from a stateless server to a stateful one.

To build a robust stateful server, they used Erlang, which brought in other problems such as code readability, testability and maintainability. Their ultimate solution to this was to use Erlang for the core parts of their backend and hand off data to small workers in Ruby using a message queue, which gave them the best of both worlds.

Jesper emphasized how Wooga's focus on small teams, collaboration, generalists, effort reduction and innovation paid off in spades in their journey to become the 2nd biggest social media games development company.

It all boiled down to iterative-incremental architecture development – small iterations with mandatory review of the architecture and some time for refactoring. Reviews and refactorings are important to prevent architecture erosion. Design for change, design for testability.

The old system, which they replaced in 2011, was a big monolith - one code base for rendering of dynamic content, third party apps data and anything else you see on a news page. The system was replaced with what they call a micro-apps framework. They divided the page into content (managed by the CMS) and other components. The components were rendered based on responses from micro-apps. This was an SSI-like solution based on the HTTP protocol (simplicity)….

The cost of new system - support for all different apps was far more complicated, maintenance was harder as the applications were written in different languages and hosted on different platforms, extra work had to be done to make it easier for people to work within the system….

Things to think about when planning for failure of a micro-app - do you always want dynamic content, or speed? Do you have peaky traffic or flat traffic? Is a small subset of functionality enough? In the case of the Guardian, speed is sometimes more important than dynamic content, so they came up with the Emergency Mode. In Emergency Mode caches do not expire and the "pressed pages" are served; if no pressed pages are available the cached version is used; if a cached version is not available, only then is the page rendered as in normal mode.

Pressed pages - fully rendered HTML stored on the disc as a flat file, served like static files, can serve up to 1k pages per second per server.
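The Emergency Mode lookup order described above can be sketched roughly as follows. This is only an illustration of the fallback chain, not the Guardian's implementation: the function name, the dictionaries standing in for pressed pages and the cache, and the `render_page` callback are all hypothetical, and normal mode is simplified to "always render".

```python
from typing import Callable, Dict


def serve_page(
    path: str,
    emergency_mode: bool,
    pressed_pages: Dict[str, str],
    cache: Dict[str, str],
    render_page: Callable[[str], str],
) -> str:
    """Return HTML for `path`, preferring speed over freshness in emergency mode."""
    if emergency_mode:
        # 1. A pre-rendered ("pressed") flat file, if one exists.
        pressed = pressed_pages.get(path)
        if pressed is not None:
            return pressed
        # 2. Otherwise any cached copy; in emergency mode it never expires.
        cached = cache.get(path)
        if cached is not None:
            return cached
    # 3. Normal mode (simplified), or nothing stored: render and cache.
    html = render_page(path)
    cache[path] = html
    return html
```

The point of the ordering is that the expensive dynamic render is only reached when both cheaper sources have nothing to offer.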

When caching be smart, cache only what's important….

Log only relevant and useful stuff, learn how to analyze your logs, and make them easy to look at and analyze with the tools you've got. Measure your Mean Time Between Failures (MTBF) and Mean Time To Recovery (MTTR). If you fail rarely but need a long time to recover, it's bad. You want to recover faster.

@bruntonspall talked about how they scaled their system and, importantly, learnt to live with failure. (He noted that an MTBF metric is in fact less important than an MTTR, citing the example that a system which fails every year but has a week of downtime is much worse than a system which fails every five minutes but has a 1ms downtime.)
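The arithmetic behind that comparison can be worked through in a few lines. This is a back-of-the-envelope sketch that assumes one failure per interval and measures the interval from the start of one failure to the start of the next; the function name is made up for the illustration.

```python
def downtime_fraction(failure_interval_s: float, recovery_time_s: float) -> float:
    """Fraction of time spent down when one failure occurs per interval."""
    return recovery_time_s / failure_interval_s


DAY_S = 24 * 60 * 60

# Fails once a year, takes a week to recover.
yearly = downtime_fraction(365 * DAY_S, 7 * DAY_S)

# Fails every five minutes, takes 1 ms to recover.
frequent = downtime_fraction(5 * 60, 0.001)

print(f"yearly failure, week-long recovery: {yearly:.4%} of time down")
print(f"failure every 5 min, 1 ms recovery: {frequent:.6%} of time down")
```

Under these assumptions the rarely failing system is down for roughly 2% of the year, while the frequently failing one is down for a few millionths of the time, which is why the recovery time dominates.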

The Guardian’s content management system uses a collection of server-side includes to build pages, so that widgets (such as the live twitter feed) can be merged in on the fly with the content about the article. These ‘micro-apps’ are mini widgets that are effectively composed by the top layer of their stack and then served/cached as a combined HTML file. A multi-stage cache and failover system performs global distribution (at the top layer) followed by load balancing in each data center; each of these goes through an HTTP cache to access content. Thus, the twitter widget may have a recently rendered view of the state of the world, and instead of the micro-app having to do a lot of rendering each time, it can delegate and cache results.

There were some additions to their cache processing; stale-if-error which permits cached (but stale) content to be served if the remote endpoint errors (e.g. if the service is transiently re-starting) and a stale-while-revalidate which permits stale content to be returned whilst checking the validity of the element. These are documented in RFC 5861, if not widely used. (Squid has stale-while-revalidate in 2.7 but not 3.x.)
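As a rough sketch of how the two RFC 5861 extensions affect a cache's decision, the function below decides whether a stored response may still be served. It is a simplification under stated assumptions: the function and parameter names are invented for the example, ages are in seconds, and a real cache also has to trigger the background revalidation itself. A response carrying both extensions might be sent with a header such as `Cache-Control: max-age=600, stale-while-revalidate=30, stale-if-error=1200` (values chosen arbitrarily).

```python
def may_serve_cached(
    age_s: int,
    max_age_s: int,
    stale_while_revalidate_s: int = 0,
    stale_if_error_s: int = 0,
    origin_failed: bool = False,
) -> bool:
    """Decide whether a cached response of the given age may be served."""
    if age_s < max_age_s:
        return True  # still fresh, no extension needed
    staleness_s = age_s - max_age_s
    if origin_failed:
        # stale-if-error: the origin returned an error, so stale content
        # may be used for a limited time after it expired.
        return staleness_s <= stale_if_error_s
    # stale-while-revalidate: serve the stale copy now while the cache
    # checks its validity in the background.
    return staleness_s <= stale_while_revalidate_s
```

With the header above, a response that is 620 seconds old could be served while being revalidated, and one that is 700 seconds old could only be served if the origin is failing.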

Their apps also have a number of flags which can be switched on and off (and in some cases, automatically). These may be used to extend the lifetime of cached data (say, because the back-end database has crashed and the ability to regenerate content has disappeared) or because a sudden spike in load has meant an extra level of delay can be optimised by not having content re-generated. Having automatic processes which monitor the state of the application and turning the flags on automatically can remove the need for human involvement – though debugging and tracing trends in the monitoring and need for flags is still required.

They did have a couple of interesting nuggets in the presentation. One of them was a data-mining/visualisation map which looks a lot more interesting than it actually is (basically the structure of the map is derived from location data of people sending messages coupled with their interconnects; needless to say, people in one location tend to know people in the same location so thus it effectively becomes a way of knowing where people are based).

The other was the growth of Facebook data in TB (compressed):

2007 - 15

2008 - 250

2009 - 800

2010 - 8000

2011 - 25000

They also mentioned that they had a couple of data-center moves in the process and that the move itself was difficult in terms of the volume of data. In the end, large trucks were used to move the PB of data on servers, and the data was then subsequently sync’d with more recent changes. There’s more information in the moving an elephant blog post on the Facebook website.

When a user tweets, the tweet gets written both to that user’s own timeline and to the timelines of all the users that follow that user. For accounts with a small number of followers, this scales well; as the user tweets, it fills every timeline that needs it and permits the lookup for each user to be constant. However, for popular accounts, this breaks down; so for really popular accounts (say, 1m or more followers) a different approach is chosen. Instead of having a massive fan-out, the timeline of a user who follows a popular account is computed as a union of that user’s own timeline and the popular account’s timeline. That way, data isn’t duplicated in large volumes, and the merge-upon-lookup is quick enough that it doesn’t affect most people. (Also, most people don’t follow that many popular accounts, but just a few.)
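The two strategies described above, fan-out on write for ordinary accounts and merge on read for popular ones, can be sketched in memory as follows. This is a toy model, not Twitter's code: the class, its method names, and the follower threshold are hypothetical, and tweets are plain tuples.

```python
from collections import defaultdict
from typing import DefaultDict, List, Set, Tuple

Tweet = Tuple[int, str, str]  # (timestamp, author, text)


class Timelines:
    def __init__(self, popular_threshold: int) -> None:
        self.popular_threshold = popular_threshold
        self.followers: DefaultDict[str, Set[str]] = defaultdict(set)
        self.following: DefaultDict[str, Set[str]] = defaultdict(set)
        self.own: DefaultDict[str, List[Tweet]] = defaultdict(list)   # tweets by author
        self.home: DefaultDict[str, List[Tweet]] = defaultdict(list)  # fanned-out copies

    def follow(self, reader: str, author: str) -> None:
        self.followers[author].add(reader)
        self.following[reader].add(author)

    def is_popular(self, author: str) -> bool:
        return len(self.followers[author]) >= self.popular_threshold

    def tweet(self, timestamp: int, author: str, text: str) -> None:
        tweet = (timestamp, author, text)
        self.own[author].append(tweet)
        if not self.is_popular(author):
            # Fan-out on write: copy the tweet into every follower's timeline.
            for reader in self.followers[author]:
                self.home[reader].append(tweet)

    def read(self, reader: str) -> List[Tweet]:
        # Merge on read: union the stored timeline with the timelines of
        # any popular accounts this reader follows. A set removes tweets
        # that were fanned out before an account became popular.
        merged = set(self.home[reader])
        for author in self.following[reader]:
            if self.is_popular(author):
                merged.update(self.own[author])
        return sorted(merged, reverse=True)  # newest first
```

The trade-off is visible in the shape of the code: `tweet` is cheap for popular authors because it skips the loop, and `read` pays a small extra cost only for readers who follow such authors.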

The other feature highlighted is the similarity between search and timeline. When a tweet is added, a background process tokenises the word and then adds it to a search index. For timelines, the user’s timeline is a union of one (or more) timelines and then returns with a set. These can be generalised into one process with ‘index’ being the lookup/data and ‘store’ being the database hit. This allows the same functionality to be replicated and new features added (although the #dickbar wasn’t mentioned …).

Interestingly, the communication between the front-end and the services was HTTP, the inter-process communication was done by RPC with Thrift, and Redis was used to persist the data.

The talk mostly covered the differences between the "traditional" databases and the new technologies that were introduced in the last decade. It provided helpful insights into what questions one should answer when deciding on which technology to use, and unsurprisingly "which one is sexier" was not one of those questions….

The situation in the data sources domain is much different than 10 years ago….

Map Reduce has been gaining popularity since 2004….

The new approach can be characterized as a bottom-up approach to accessing very large data sets that allows us to keep the data close to the code (close to where it matters) and to keep it in its natural form….

As early as 2001 some people closely connected with "traditional" solutions realised that there are problems with existing DB - that they are good tools, but they force developers to "speak different language", they need to evolve and allow more flexibility. The NoSQL movement came about because developers wanted a simple storage that works on diverse hardware, that scales, and can be talked to in developer's language of choice. "One size fits all" is not true anymore.

Additional factors: discs - older DBs were optimised for sequential access over magnetic drives not random access over SSD, growing speed of our networks. Mind that there are relational databases that leverage latest technologies - they can achieve very good results in benchmarks.

When comparing and deciding on a solution, don't think only about the size of your data set, its potential to grow, or the abilities of each tool. Think about what you need/want. Do you mind SQL? Is your data changing a lot constantly? Is the data isolated? Do you want to/can you keep your data in its natural state? Which solution can you afford?

Tobie and his team at Facebook put a lot of effort into analysing the most popular native applications and finding out what capabilities were missing in web applications to make them on par with native applications in terms of user experience.

Facebook recently launched ringmark, a test suite aimed to accelerate the adoption of HTML5 across mobile devices and provide a common bar for implementations of the mobile web standards. Ringmark provides a series of concentric rings, where each ring is a suite of tests for testing mobile web app capabilities….

Ring 0 is designed as the intersection of the current state of iOS and Android and 30% of the top 100 native mobile applications can be implemented using ring 0 capabilities.

Ring 1 includes features such as image capture, IndexedDB and AppCache. Browsers implementing ring 1 should be able to cater to 90% of the most popular native applications, most of which don't actually need to utilize advanced device capabilities such as 3D. Tobie highlighted that getting ubiquitous ring 1 support should be the short term goal for mobile browser vendors and developers to drive mobile web adoption.

Ring 2 will fill the gap with the final 10% of applications, with things like WebGL, Web Intents and permissions. Ring 2 is aimed to be a longer term goal.

Mobile Web should also be able to achieve beyond 100% of the native apps, with capabilities such as hyperlocal applications (e.g. an application tailored to a certain local event) and deep linking.

Twitter feedback on this session included:

joaogsantos: Only 30% of the top 100 apps of ring 0 (Android and iOS) could be completely built in HTML5. That's not much. @tobie #qconlondon

They are getting 7.2 million messages through per minute during the peak, and the key is Shared nothing messaging: Peer-to-Peer architecture, Parallel persistence and Single message API. …He boldly stated that queueing [as we know it] is dead, long live [publish/subscribe] queueing – btw, that’s my interpretation.

This has to be my favorite talk. These guys know their hardware well. They showed how it’s possible to create lock free algorithms which run in nanoseconds if you apply some mechanical sympathy to your code.

They showed how taking the time to construct a simple message queue between a single producer and consumer, which requires no locks could make things more efficient. But more than that, taking some time to consider the nature of the machine, the realities of exactly how a CPU handles memory, what bits inside a CPU are capable of what kinds of calculations, it is possible to make immense gains in performance (or perhaps avoid immense losses).

In their example they highlighted a couple of specific considerations. The first being that the default algorithm required a divide operation to calculate the position in the queue to use. Divide is a fairly expensive operation for processors. By making their queue length a power of 2, they could make use of the fact machines work in binary to construct a mask that allowed a much simpler operation to arrive at the same result.
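The masking trick described above is easy to show in isolation. The sketch below only demonstrates that the two index calculations agree when the capacity is a power of two; the function names are made up, and Python will not exhibit the performance difference the speakers measured on real hardware.

```python
def slot_by_modulo(sequence: int, capacity: int) -> int:
    """General approach: works for any capacity but needs a division."""
    return sequence % capacity


def slot_by_mask(sequence: int, capacity: int) -> int:
    """Power-of-two approach: a single bitwise AND replaces the division."""
    if capacity <= 0 or (capacity & (capacity - 1)) != 0:
        raise ValueError("capacity must be a power of two")
    # For capacity 8 the mask is 0b111, which keeps the low three bits.
    return sequence & (capacity - 1)
```

For a queue of 8 slots, sequence number 13 lands in slot 5 either way: 13 % 8 is 5, and 0b1101 & 0b0111 is 0b0101.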

Secondly, having given a whistle-stop tour of how CPUs work, they talked about incidental sharing, where two variables used together in their program would likely be allocated next to each other in memory. Each variable (both pointers) was a relatively small structure, a 32-bit int (4 bytes). However, CPUs access memory in 64-byte rows. This is the smallest unit they move, and when they access it, they effectively lock the use of everything in that row….
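One common way to keep two hot variables out of the same 64-byte row is to pad them apart. The sketch below is an assumption about how that might look in Java, not the code shown in the talk: the class names are hypothetical, and the JVM does not formally guarantee field layout, although placing padding in superclasses and subclasses is a widely used workaround.

```java
/** Seven unused longs (56 bytes) placed before the hot field. */
class LeftPadding {
    protected long p1, p2, p3, p4, p5, p6, p7;
}

/** The field that a single thread updates frequently. */
class HotValue extends LeftPadding {
    protected volatile long value;
}

/** Seven more unused longs after the hot field, so it should not share a 64-byte row with a neighbour. */
final class PaddedCounter extends HotValue {
    protected long p9, p10, p11, p12, p13, p14, p15;

    long get() { return value; }
    void set(long v) { value = v; }
}

/** Hypothetical demo: two threads, each the only writer of its own padded counter. */
final class FalseSharingDemo {
    static long[] run(int iterations) {
        PaddedCounter a = new PaddedCounter();
        PaddedCounter b = new PaddedCounter();
        Thread t1 = new Thread(() -> { for (int i = 0; i < iterations; i++) a.set(a.get() + 1); });
        Thread t2 = new Thread(() -> { for (int i = 0; i < iterations; i++) b.set(b.get() + 1); });
        t1.start();
        t2.start();
        try {
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return new long[] { a.get(), b.get() };
    }
}
```

The padding changes nothing about the result, only about how often the two cores invalidate each other's cached rows; whether it helps on a given machine is something to measure rather than assume.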

Under the hood they showed that the number of operations sent to the CPU was broadly similar, about 4500 operations; however, without the machine sympathy that translated to about 63 billion CPU cycles, and with the machine sympathies this came down to a little under 8 billion. This means the ratio of instructions to cycles was horrifically inefficient before, and all they did was make each clock cycle count more.

Erlang is a programming language designed to satisfy all these six rules. It is not a coincidence that some of the world's most reliable systems have been written in Erlang….

One of the highlights of the talk was this quote from Alan Kay:

"Folks -- Just a gentle reminder that I took some pains at the last OOPSLA to try to remind everyone that Smalltalk is not only NOT its syntax or the class library, it is not even about classes. I'm sorry that I long ago coined the term "objects" for this topic because it gets many people to focus on the lesser idea.

The big idea is "messaging" -- that is what the kernel of Smalltalk/ Squeak is all about (and it's something that was never quite completed in our Xerox PARC phase)...."

The key idea in OOP has always been messaging and encapsulation rather than classes or methods, which unfortunately is how OOP is generally taught.

An interesting footnote from the session: when asked for his opinion of Node.js, Joe mentioned that he is not really fond of event-based programming and that this style of programming is difficult.

The [availability] solution is based on the idea of having multiple load balancers running as internal apps between front end applications and the backend provided by Heroku. In addition to the balancer level, they have supervising applications that check on the instances and bring them up when necessary. This already shows that they use a layered design: different problems are handled on separate levels in the system, which makes it more robust and easier to monitor. Finally, they use well defined, non-app-specific messages that can pass an increasing number of parameters between layers and apps. The messages are versioned, either explicitly or handled with graceful degradation.

The session was a high level look at how maintaining a sustainable speed of delivery can be the key to success for teams working in fast-paced, high traffic, web based industries….

Sustainable speed is desirable because it gives your project responsiveness: you can react to changes in your users' behaviour, business model changes, or any external factors. It means greater returns, as you are able to deliver more features (especially in the social gaming business, where it means you can earn more from your users), and your investments are smaller: you deliver fast, so you work less (time-wise, but not only)….

His point was that SQL is very well suited for some tasks (like batch processing tabular data) but that it struggles with some of the more hierarchical data; and that the problem isn’t SQL as a language, but rather the uses to which it is put. As an example, he showed a Java NIO Grep example which was performing a filter of CSV records whose 7th field was Think; on a standard developer laptop, it buckled at 100M data files and took 1.2s to process; but on a 1G data file, the Unix command cut -d ',' -f 7 | grep Think performed the same search in under a second.
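For readers who have not seen this kind of filter, a minimal Java version of "keep the records whose 7th field is Think" might look like the sketch below. It is not the NIO Grep example from the talk: the class and method names are hypothetical, it matches the field exactly (whereas grep matches substrings), and it ignores quoted fields containing commas.

```java
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical helper; not the NIO-based example shown in the talk. */
final class CsvFieldFilter {
    /** Keeps the lines whose field at fieldIndex (0-based) equals wanted. */
    static List<String> filter(List<String> lines, int fieldIndex, String wanted) {
        return lines.stream()
                .filter(line -> {
                    String[] fields = line.split(",", -1);   // -1 keeps trailing empty fields
                    return fields.length > fieldIndex && fields[fieldIndex].equals(wanted);
                })
                .collect(Collectors.toList());
    }
}
```

The 7th field corresponds to `fieldIndex` 6, so the call would be `CsvFieldFilter.filter(lines, 6, "Think")`.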

He highlighted the uses to which grid computing is put; either as a large distributed memory cache (thereby taking a lot of memory distributed over many machines) or as a large compute farm (thereby taking a lot of CPU distributed over many machines). He also observed that invariably it’s easier to spin up a new VM to acquire this (possibly returning it when it’s not being used) than it is to go through the procurement process to acquire new hardware.

… an example of considering the way data moves over networks. It doesn’t just move in streams of bytes, it moves in packets, and packets have a maximum size. He gave an example of checking the size of basic icons on a website, which might incidentally be just slightly larger than 1 packet. Ignoring this means you require 2 packets to deliver that icon. But perhaps you can squeeze it down a little, and HALVE the cost to your network of serving that icon. We’re not talking about great leaps of engineering here, just caring enough to bother doing a little maths, and understanding the impact of your decisions….
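The "little maths" here is just a ceiling division. The sketch below assumes a payload of 1460 bytes per packet, which is typical for TCP over Ethernet (a 1500-byte MTU minus 40 bytes of IPv4 and TCP headers); the talk did not give a figure, and the sketch ignores HTTP response headers, so treat the numbers as illustrative.

```java
/** Hypothetical helper for the packet arithmetic; the 1460-byte payload is an assumption. */
final class PacketMath {
    static final int ASSUMED_PAYLOAD_BYTES = 1460;

    /** Number of packets needed to carry payloadBytes when each packet holds at most maxPayload bytes. */
    static int packetsNeeded(int payloadBytes, int maxPayload) {
        if (payloadBytes < 0 || maxPayload <= 0) {
            throw new IllegalArgumentException("sizes must be non-negative and payload positive");
        }
        // Ceiling division, done in long arithmetic to avoid int overflow.
        return (int) ((payloadBytes + (long) maxPayload - 1) / maxPayload);
    }
}
```

Under that assumption a 1500-byte icon needs 2 packets, while the same icon squeezed down to 1400 bytes needs only 1.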

When a customer hits a website, how much traffic will move over your network? How many calls to the back end will be made? How many database calls will that cause? What disk I/O? Are you monitoring all of these factors? Do you know what your system looks like when it’s healthy? If not, how the hell do you hope to understand what went wrong when it fails? (And it will fail.)

It would be enough to say this talk was delivered by a beautiful lady to a room packed with geeks, but it deserves more praise. It’s quite a technical talk on an innovative and great piece of software, or rather a framework: the Disruptor, a concurrent programming framework. If you’re unfamiliar with it, or even more so if you’re unfamiliar with concurrent programming, you might find it interesting.

A new Object() takes up 16 bytes, and a new byte[0] takes 24 bytes. In addition, an object’s memory layout is often padded to the nearest word; so a class with a single byte may take 24 bytes of space, and a subclass with another byte may take 32 bytes. Some of this is hidden in libraries, such as Apache Thrift, which performs network deserialization of data. In a class with a primitive field, a BitSet is used to determine if the primitive value has been set or not; but that BitSet adds an extra 52 to 72 bytes of space depending on the word size.

Compressed OOPs can help – and are default on newer JVMs below a 32GB heap size – but for applications which go above the 32GB heap size, the jump is an extra 30% of memory to deal with the long words.

There was a general warning against using ThreadLocal storage, which has a nasty tendency to escape the local surrounding space and thus leak memory (or connections if there are many threads).
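The talk's advice was to be wary of ThreadLocal altogether. Where one is unavoidable, a common mitigation (an assumption here, not something the summary attributes to the speaker) is to clear the value in a finally block, so that pooled threads do not carry it around indefinitely. The class and method names below are hypothetical.

```java
import java.util.function.Supplier;

/** Hypothetical per-thread scratch buffer that is always cleared after use. */
final class RequestContext {
    private static final ThreadLocal<StringBuilder> BUFFER = new ThreadLocal<>();

    /** Runs work with a fresh buffer bound to the current thread, then removes it. */
    static <T> T withBuffer(Supplier<T> work) {
        BUFFER.set(new StringBuilder());
        try {
            return work.get();
        } finally {
            BUFFER.remove();   // without this, a pooled thread keeps the value alive
        }
    }

    /** Returns the buffer bound to the current thread, or null outside withBuffer. */
    static StringBuilder buffer() {
        return BUFFER.get();
    }
}
```

A call such as `RequestContext.withBuffer(() -> RequestContext.buffer().append("hi").toString())` sees the buffer inside the lambda, and `RequestContext.buffer()` is null again once the call returns.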

There were additional options such as -XX:+UseSerialGC, +UseParallelGC, +UseParallelOldGC but probably more useful was -XX:+PrintGCDetails, +PrintHeapAtGC, +PrintTenuringDistribution which can be used to determine whether the size of the nursery (or Eden) generation is appropriate. Tuning the young generation is far more important than any other aspect, since all tuning needs to start with the initial generation.

The Java 8 release is planned for the middle of 2013; after that they plan a new release every two years, with plans being made for Java 12 already….

We proceeded to talk about the most important features planned for Java 8. The first one is Lambda Expressions - http://openjdk.java.net/projects/lambda/. They will make writing parallel map/reduce code much easier (although we'll see if it will be much easier to read). They will replace the use of inner classes, which gets a gold star from me….
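To make the comparison with inner classes concrete, here is a small sketch using the lambda syntax as it eventually shipped in Java 8, which may differ from the draft syntax discussed at the time of the talk. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical examples contrasting an anonymous inner class with a lambda. */
final class LambdaSketch {
    /** Sorting by length with an anonymous inner class. */
    static List<String> sortByLengthInnerClass(List<String> words) {
        List<String> copy = new ArrayList<>(words);
        copy.sort(new Comparator<String>() {
            @Override
            public int compare(String a, String b) {
                return Integer.compare(a.length(), b.length());
            }
        });
        return copy;
    }

    /** The same comparison expressed as a lambda. */
    static List<String> sortByLengthLambda(List<String> words) {
        List<String> copy = new ArrayList<>(words);
        copy.sort((a, b) -> Integer.compare(a.length(), b.length()));
        return copy;
    }

    /** A parallel map/reduce: square every number, then add the squares up. */
    static int sumOfSquares(List<Integer> numbers) {
        return numbers.parallelStream()
                .map(n -> n * n)
                .reduce(0, Integer::sum);
    }
}
```

Both sort methods return the same ordering; the lambda version simply drops the five lines of ceremony around the one line that matters.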

There's a new way to extend old classes with new functionality - extension methods will allow you to add a method to an interface that won't have to be implemented by its children. If a class doesn't implement the method, the default implementation defined in the interface will be used. That way Collection classes can now use map, reduce, filter and parallel methods with lambda expressions.
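A rough sketch of the idea, again using the syntax that eventually shipped in Java 8 (where extension methods became `default` methods). The `Shelf` interface and `ListShelf` class are hypothetical and exist only to show that an implementer inherits the new method without writing it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

/** Hypothetical interface that gains a filter method without breaking implementers. */
interface Shelf<T> {
    List<T> items();

    /** Default method: classes implementing Shelf inherit this unless they override it. */
    default List<T> filter(Predicate<T> keep) {
        List<T> result = new ArrayList<>();
        for (T item : items()) {
            if (keep.test(item)) {
                result.add(item);
            }
        }
        return result;
    }
}

/** Implements only items(); filter comes from the interface. */
final class ListShelf<T> implements Shelf<T> {
    private final List<T> items;

    ListShelf(List<T> items) {
        this.items = items;
    }

    @Override
    public List<T> items() {
        return items;
    }
}
```

Calling `new ListShelf<>(Arrays.asList(1, 2, 3, 4)).filter(n -> n % 2 == 0)` keeps the even numbers even though `ListShelf` never mentions `filter`.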

The module-info.java file (http://openjdk.java.net/projects/jigsaw/) is a result of a trend towards modularisation of Java apps and Java as a language. The idea is that you can get rid of the classpath and define your dependencies in a more flexible way.

There are still problems we face in application development teams: randomness (random libraries are used, there are random ways of doing things), project infrastructure takes too long to set up, we use unknown, unapproved libraries, there is no easy way to have a clear idea of which project is at what state at any given moment, there's too little automation, and there are few or no metrics on code quality, test coverage and re-use. PaaS could offer a way of fixing some if not all of those issues….

The evolution of PaaS so far has been about moving from just allowing developers to deploy applications (they still need to care about what the infrastructure looks like) through speciation and expanding services towards dealing with multi-tenancy. PaaS will obviously still be evolving.

The various efforts around HTML 5, JavaScript and the mobile web all point to an improved developer experience. The question is how soon this future will arrive. Combined with browser vendors pushing updates aggressively and consumers changing mobile phones every 1-2 years, it might not be as far off as it seems. Listening to the talks also confirmed my opinion that native mobile apps are only a stopgap solution and that the future lies in HTML 5+ and JavaScript as the platform that will power applications.

The Application Cache is generally a bad idea for real applications (there’s no control over the order or priority of elements). However, using an app cache in an iFrame can result in you firing up multiple applications (and the cache benefits therein).

There was some mention of things like onClick being broken (you can roll your own with touchstart and touchend) as well as numbers being interpreted as phone numbers (which can be disabled with format-detection telephone=no).

For higher def mobile devices (such as the iPhone 4 and the new iPad) the rendering of images can be slower when rendering at full scale. Moving to a viewport initial scale of 0.5 can speed up rendering significantly – although this needs the images to be twice as big.

Allen started off his talk by illustrating the two major eras in computing, the corporate computing era and the personal computing era….

Every computing era had a dominant application platform….

In the personal computing era, the dominant platform was the combination of Microsoft Windows and Intel PC (much lovingly called Wintel). In the emerging ambient computing era, it is becoming clear that the new application platform will be the web.

Each computing era also had a canonical programming language - COBOL/Fortran for mainframes and C/C++ for personal computing. The canonical language for the web and thus the ambient computing era appears to be JavaScript….

After the ECMAScript 4 fiasco, TC-39, the committee responsible for deciding the future of JavaScript, is moving a lot faster and is more driven and organized to improve the language. There are a lot of improvements to the JavaScript language coming with ECMAScript Harmony, which represents ECMAScript post version 5. Some might be considered controversial, such as the inclusion of classes, and are the subject of ongoing discussion. Considering the slow browser adoption rate, even ES5 is not yet mainstream and will not be for a couple more years. This unfortunately seems to be one of the biggest bottlenecks in moving the new ambient computing platform forward.

WebIDL and JavaScript have a cognitive dissonance problem. DOM was specified as an API for the browser programmers rather than the actual consumers of the API who are the JavaScript/web developers. It was also devised at a time where there were expectations that other languages than JavaScript would be consuming it, and artifacts of such an ideal still persist in the API. Moreover, DOM does not conform to normal JavaScript rules. The DOM types cannot be extended or constructed. It is not possible to do a new HTMLElement() whereas it would be very useful for many scenarios.

As web applications have increased in complexity, the disconnect between application data and the browser model has grown making web development painful. The developers have been trying to solve this using frameworks such as Backbone.js, however they are not perfect. Alex outlined two proposals to W3C that seek to make web development easier.

Shadow DOM is a way to create web components by a browser provided API. Modern browsers include native controls, such as the standard HTML form components. These built in controls are isolated from the rest of the page and are only accessible through whatever API they expose. There is currently no API to create third party components with the same strong encapsulation enjoyed by the native components.

The other proposal is Model-driven Views which reminded me a lot of how Knockout.js works. MDV provides a way to build data driven, dynamic web pages through data binding and templating via native browser support.

Having worked with Erlang for a real application, he shared several of his observations.

Erlang is simple: the core of the language is small; there are very few types, no classes and no object orientation.

Erlang is weird: It has a syntax influenced by Prolog, which nobody uses and is nothing like other programming languages.

Erlang is extremely productive: You can be very productive with it and produce small, reliable code once you come to grips with the syntax.

Erlang is built for the current reality: The Erlang model of isolated memory and processes is closer to the current reality than the shared memory space most programming languages use for the current multi core architectures.

However, it has a caveat: Erlang performance is slow. The Erlang VM, while beautiful in design is not as fast as other VMs like the JVM. Damien linked the reason for this all the way to the strange syntax Erlang has. A language needs mass adoption and investment to be fast, and for this to happen it needs to be familiar to programmers. Erlang's unfamiliar and "weird" syntax is preventing it from getting mass adoption.

Erlang is not a perfect fit for every problem, string processing being an example -- but it is perfect for distributed systems that need to be reliable.

A lot of the benefits of Erlang can be achieved in C/C++ by following certain practices; however, this will take as much as 5-10 times the coding effort, while also giving 5-10 times the performance.

Damien Katz, creator of CouchDB, explained why he found Erlang weird, simple, reliable, beautiful and, alas, slow. Dubbing it the Language from the Future – with an appropriate background picture (hint: trilogy) – he pretty much covered when you would want to use it and when you might resort to some lower level alternatives. The bottom line is that it can do anything and everything, but the slow adoption rate is hurting it, and the slow adoption rate might simply be due to the weird syntax. He strongly declared that the language is beautiful and that, once past the syntax barrier, productivity is phenomenal.

Dan’s whole presentation was called ‘There ain’t no cure for the distributed blues’ and I think it boiled down to some of the same things that Horia said.
Distributed is HARD. Keeping it together is hard. Communication is paramount.

The next talk was Ain’t no cure for the distributed blues by Dan North. A very amusing and informative talk… This wasn’t really so much about working distributed as about how to cut waste, the importance of communication and making people want to do something rather than forcing them – ah yes, all with a background Star Wars theme.

GitHub’s view is to enable creativity, and recognises that creativity is unlikely to happen between restricted hours. In addition, it also realises that people around the globe have different working hours, and as such, that not everyone is going to be in the same place at the same time. In fact, being asynchronous is the only way of solving that problem.

There were also some interesting nuggets; for example, for video conferencing they have a couple of iPads hooked up to a few flat-screen TVs for connecting others in different regions with the main office. They also have a couple of iPods hooked up to TVs dotted around the office, such that the web browser can be remotely driven and showing a web page, which can be instantly updated from anywhere to show the same information across the globe. This also extends to the music jukebox, which has an automated DJ playing a music track, allowing those in other parts of the world to listen to the same music playing – even if they aren’t in the same office as others – to build a company culture.

A consistent approach to measurement means that anyone can see how the servers are responding under load, and if there is a requirement to drill in to details then these are only a couple of clicks away. As part of their open process, Zach highlighted the recent SSH problem and showed the spike in SSH failures and failed pushes immediately following the reset and the period immediately thereafter.

The final point of the talk was that GitHub optimises for happiness, and that although you may not be able to make all of these changes in your own organisation, it is at least possible to try to enact some of them.

A great talk and visually beautiful as well. It even has some singing in it! Zach is a wonderful presenter and explains how GitHub does what it does so well. Their approach is very unusual and alien to the one we see in the corporate world, yet it works so well.

Q: What technical breakthroughs do we need to unlock the cloud and make it commonly used across companies?
Companies need to start developing software that is smart enough to understand the platform it's deployed to. Currently only 20% of applications deployed in the cloud understand that they are running on multiple and changing instances and can figure out when to expand/contract their hardware requirements. Developers, especially Java ones, need to learn how to write applications for the cloud.

Q: How big is the gap between reality and the hype for the cloud at the moment? How will it change in 5 years?
At the moment the gap is really small, but the more widespread the cloud gets, the bigger the gap will be. The hype will probably peak around 2017, and we'll see a backlash caused by disappointed adopters that didn't understand how to make the concept work for them.

The focus of this talk was the various tools and technologies used to help smooth team work despite never being in the same location. Horia himself has always worked from home, and currently works with a small team distributed over 4 continents. The first thing I noted about this example was that the entire team were remote from each other. So this is not a case where some people are co-located with others elsewhere, and I think that has an enormous impact. Where everyone is just as isolated unless they make the effort, I think everyone is trying to make those communication tools work, and ALL communication has to be made via these tools so it’s easy to keep up with what you want to know….

Horia Dragomir said that his team occasionally books a house somewhere so that the whole team can get together. Even though they work during the day just as they normally would, they make sure the rest of the time is spent being social: going for walks, going to bars, whatever builds up social experiences with the team. He even recommended conferences: if any two of your team want to go to the same conference, send them; their working relationship will only improve for the experience….

The one interesting benefit Horia pointed out for allowing very distributed, at home working, was that you can get talent from wherever talent is. If you are in a major hub (Silicon Valley) then that’s fine, there is lots of talent, but they have lots of choices and they tend to move around. If you are prepared to take talent and let them work from wherever they like, then you open yourself up to a much larger talent pool. I can certainly see the appeal of living wherever I like, and simply working from there, rather than commuting to a central location, and I can see how you could really make that flexibility work for you in terms of team productivity and commitment.

The session was a deep dive into how Riak Core implements availability, eventual consistency and partition tolerance, which are the three key aspects of any distributed system. Possibly one of the most technical sessions that I've attended at QCon, it was an inside look into how a distributed system works and how Riak Core solves many of the problems such systems encounter.

Not surprisingly, Riak Core is written in Erlang, which makes messaging across a distributed system easy, since Erlang processes communicate with each other in the same way regardless of whether they reside on the same machine or not.

I had the pleasure of listening to Ade Oshineye sharing his experiences of developing Google Buzz and Google+, and how understanding how someone is going to use your code is very important when developing a public API; you can't just expect them to know everything you know.

The talk on Netflix architecture (a heavy Eclipse shop and based on Amazon Web Services) was worth waiting for. The use of AWS means that they can scale from as high as 8000 nodes running on a Sunday evening down to 4000 nodes the following Monday at 4am. They had also been running out of space on their single-node instance, and a big database had no more room for growth, but moving the code over to the cloud meant they could scale horizontally instead.

The migration was piecemeal, with moving the base services off to the AWS cloud first of all, and then higher and higher layers until all dependencies were moved over to be able to serve content from the cloud.

To manage the scale of the system (57 independent Cassandra clusters at one point) they have built a number of tools (since open-sourced on GitHub) which can scale up and configure the nodes. Priam is used to configure the Cassandra nodes, handle tokens and provide backup and restore operations. They have also given a number of presentations before, and some of the material will be very familiar to those who have seen them. If you haven’t, they are well worth a look.

Netflix have also started a tech blog; that, together with the open-source drive, is both a result of using open-source systems and a way to recruit talented developers interested in working on open-source products.

Opinions about QCon

I've been going to QCon conferences for a number of years, at least from 2008 onwards. I've spoken on a range of things, from REST through transactions to the future of Java. And throughout those years QCon has never failed to be a stimulating place with people who want to cut through the hype and get to the real problems. In short, it has always been a great place to visit as well as present at. However, it's also been a relatively small venue, which helped to foster that almost 1-on-1 interaction with people that stimulates good discussion. Until this year! I don't know what they did, but QCon London 2012 was huge and yet still managed to retain the same feel as in previous years.

This has been my third QCon in London, and it’s getting bigger and better each year. Part of the attraction of QCon over other kinds of conferences is the diversity of the presentations over what might be seen in a language-specific or tool-specific conference. It’s possible to see a presentation on hard-core Java, go to a presentation on the Lambda calculus followed by an agile development process and round off the day with how a startup handles high volumes of data or adapts their working practices around the happiness of their employees. There isn’t another tech conference like it, and if you’ve been following the #qconlondon hashtag then you’ll have a feeling of what it’s like to be there too. See you next year!

To sum up, the conference was great by every measure: technical sessions, attendance, networking, Artifactory exposure, and after-show quality time. Thank you, InfoQ, for this wonderful event in London.

Takeaways

Each year I also hear cogent and thoughtful explanations of why the fix proposed last year or the year before is actually a prime reason why projects fail.

Way back when it was SOA (Service Oriented Architecture) that was sweeping away the mistakes of the past. Next SOA itself was the mistake of the past and we got REST (Representational State Transfer). This year I am hearing how RPC is making a comeback, or at least not going away, for example because it can be more efficient when you want to transfer as little data as possible across the WAN.

Another example is enterprise Java. Enterprise Java Beans and J2EE were the fix, and then the problem, for scalable distributed applications. Rod Johnson came up with Spring, the lightweight alternative. Now I am hearing how Spring has become bloated and complicated and developers are looking for lightweight alternatives.

Test-driven development (TDD) brings fantastic benefits to software development, making it possible to change and improve your code while defending against the introduction of bugs. Yesterday though Dan North observed that TDD also has a cost, in that you write much more code. It is not uncommon for projects to have more test code than code that is active in production. If you did not write that code, you could be doing other productive work in the time made available.

Agile methodologies like Scrum were devised to promote or even create communication and agility in software teams. Now every big enterprise vendor says it does Scrum and runs courses, but the result is a long way from the agile (with a small a) original concept.

This year I have heard a lot about over-optimisation, or creating code for situations that in fact never arise. This is the problem to which the solution is YAGNI (You Ain’t Gonna Need It). Since they apply across all the methodologies, I suggest that YAGNI, and its cousin DRY (Don’t Repeat Yourself), and the even older KISS (Keep It Simple Stupid) are the most enduring software methodologies.

That said, even DRY took a beating yesterday. Greg Young in his evening keynote said that rigorous DRY advocates can end up creating single blocks of code where really the procedure was only nearly the same. If your DRY functions are full of edge cases and special conditions, then maybe DRY has been taken to excess.

The other lesson I have learned from multiple QCons is that effective teams and smart developers count for much, much more than any specific tool or language or approach. There is no substitute.

The first and the most pronounced trend seen at the conference was a fixation with big data.

The next big trend could be called “open source everything”. Almost all case studies involved using an entirely open source technology stack.

The third trend seems to be the demise of the object-oriented approach and the rise of functional programming. While some functional programming languages such as F#, Scala, and Erlang are gaining wider traction, it will be the influence of functional languages on traditional languages that will have the most far reaching consequences.

My favourite talk was definitely the “Cloud… so much more than a tool” by Patrick Debois. Not only was it an interesting experience report on the realities of using Cloudy technology to build a highly scalable video broadcasting service, it was also the best use of LolCats I have ever seen… ever…

Dan North was as cheeky a rascal as ever, actually making the audience think! At a conference! Oh, the humanity! Colin Humphrey from UK Atlassian partner Carrenza gave an overview of the fantastic build pipeline they create for their customers, along with insight into the business drivers of using such a build pipeline with respect to IaaS, PaaS and SaaS solutions…

Another great time at QCon London and it left me with lots of new people to catch up with in the community as well as much to think about.

My takeaway message from QCon was this: never stop thinking, and never stop questioning why you're doing something - especially when somebody else tells you to do it. Good programmers follow these principles, but better programmers always understand and remember their cost.

Conclusion

The sixth annual QCon London brought together 1,090 attendees and more than 100 speakers in what was the largest ever QCon to be held in the UK. QCon's focus on practitioner-driven content is reflected in the fact that the program committee that selects the talks and speakers is itself comprised of technical practitioners from the software development community. Presentations and interviews from the event will be posted on InfoQ over the coming months.

QCon London 2012 was co-produced by InfoQ.com and Trifork – producer of the GOTO conference in Denmark. QCon will continue to run in London around March of every year. QCon also returns to Beijing and Tokyo this month and in August will be held in sunny Sao Paulo, Brazil.
