Francesco Cesarini and Viktor Klang on the Reactive Manifesto

Bio: Francesco Cesarini is the founder and Technical Director of Erlang Solutions, has used Erlang on a daily basis for almost 15 years and is the co-author of "Practical Erlang Programming" by O’Reilly.
Viktor Klang (@viktorklang) is Director of Engineering at Typesafe, former Tech Lead for Akka, has a long background on the JVM, currently has a passion for distributed, resilient programming.

Code Mesh London is an annual conference dedicated to non-mainstream technologies. In 2013 it featured talks from over 50 inventors and experts in languages, libraries, operating systems and technologies that handle the programming and business challenges of today. Programming languages discussed ranged from technologies that have been around for a while such as Haskell, Clojure or Erlang to new languages such as Elixir, Rust, Go and Julia.

Francesco: Hi I’m Francesco, I’m the founder of Erlang Solutions, I’ve been working with Erlang full time since 1995 and before founding Erlang Solutions I worked on the R1 release of OTP as well as on some of the major Telecom systems that Ericsson developed with Erlang.

Viktor: My name is Viktor Klang, I’m the Director of Engineering of Typesafe, I started Scala programming a long time ago and was one of the first committers for Akka, probably the biggest committer over time for Akka.

Francesco: When Ericsson was working on the very first Erlang projects, very early on, they realized that they needed middleware in the Erlang systems, a way to abstract all of the generic, reusable code into modules and behaviors. As a result, OTP evolved over time into a set of design patterns for processes: they took all of the code which is the same whenever you are dealing with client/servers, finite state machines and event handlers, and packaged it into modules, allowing the user to focus just on the specific code. That hid a lot of the issues you have to deal with when working with concurrent programming. One of the behaviors they packaged was a supervisor, which allows you to create fault-tolerant supervision trees. Finally, you package your whole supervision tree into another behavior called an application, and your system consists of a set of loosely coupled applications. In other words, what OTP gives you is a way to structure your program in a fault-tolerant way which by default allows you to isolate errors and focus on the business logic, not on all of the generic parts of massively concurrent systems.

Viktor: I think in general once you are doing message passing systems, there are fundamental problems that you need to solve, and what OTP did was package the generic solutions to those problems. I think when Jonas Bonér started Akka he was taking what really worked from Erlang and OTP and trying to make that work on the JVM. So for instance, Erlang and OTP have this notion of supervisors, which essentially means that you have a way to manage failure that is easy to reason about because there is a clear separation of concerns or responsibility. When we started doing Akka, we essentially copied what Erlang was doing in that regard, but what we found was that declaring supervisors wasn’t something that people on the JVM were used to doing, and since it was optional, people just didn’t do it. Adding fault tolerance after the fact is usually quite a hard thing to do, because it permeates your thinking when you reason about these problems.

Francesco: You need to add fault tolerance from day one, you need to think fault tolerance when you are designing your system, it’s not something you can plug in afterwards, absolutely.

Viktor: So what we ended up doing was to make all of our Actors, which are sort of like Erlang processes (there are some slight differences but in general they are quite the same), hierarchical. A process that spawns a process, or an Actor that spawns an Actor, is its supervisor, so you can’t really opt out: supervision is always there and now you need to deal with it. I think that was one of the things where we realized: “Here is something that we saw that could be improved as a usability thing for our users”. That was pretty powerful, because now you can reason about the entire hierarchy of the application.

Francesco: Do you configure the supervisors in Akka, or is the whole supervision structure the same?

Viktor: So there is a global structure: all Actors are children of other Actors, and there is a default strategy for managing failure, which is a one-for-one strategy. Basically the supervisor applies a rule to the failure saying “if this failure, then apply this directive”, which could be restart or stop or escalate. So now the supervisor deals with the failure management and the Actors themselves deal with the message processing, and I think that was a fairly good division of responsibility.
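[Editor's note: the "if this failure then apply this directive" rule described above can be sketched in a few lines. This is a minimal illustration in Python, not Akka's API: the `Directive` enum, the `RULES` table, the `decide` function and the choice of exception types are all assumptions made for the example, as is the choice to escalate any failure that no rule matches.]

```python
from enum import Enum


class Directive(Enum):
    """What a supervisor may do with a failed child (names are illustrative)."""
    RESTART = "restart"
    STOP = "stop"
    ESCALATE = "escalate"


# Hypothetical rule table: (failure type, directive). The first match wins.
RULES = [
    (ValueError, Directive.RESTART),  # bad input: start over with fresh state
    (KeyError, Directive.STOP),       # missing resource: give up on this child
]


def decide(failure, rules=RULES):
    """Map a failure to a directive; anything unmatched is escalated
    to the supervisor's own supervisor (an assumption of this sketch)."""
    for failure_type, directive in rules:
        if isinstance(failure, failure_type):
            return directive
    return Directive.ESCALATE
```

[The point of the sketch is the division of responsibility: the rule table lives with the supervisor, so the child's message-handling code contains no failure-handling logic at all.]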

Francesco: With OTP supervisors, you need to go and actually define what type of restart strategy you want. If a supervisor is supervising ten children, you can pick one-for-one, where if a child terminates you just restart that child; rest-for-one, where if one child terminates you terminate all the children which were started after it and restart them; and one-for-all, where you terminate all the children and restart them. It’s just a way of describing dependencies. And then another thing, which I believe Akka has taken as well, is historical data over terminations, to make the supervisor realize: there have been too many restarts, I have failed to restart the system properly and I’ve not isolated the error. At that point the supervisor itself decides to terminate and sends the signal up to its own supervisor, and so you get a propagation of termination where you escalate the problems.
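[Editor's note: the restart strategies and the "too many restarts" rule can be illustrated as follows. This is a sketch in Python under stated assumptions, not OTP code: `children_to_restart` and `RestartLimiter` are hypothetical names, children are assumed to be listed in start order, and the limiter gives up once more than `max_restarts` restarts fall inside a sliding window of `window_seconds`.]

```python
from collections import deque


def children_to_restart(strategy, children, failed):
    """Return the children a supervisor would restart when `failed` terminates.

    `children` must be in start order, since rest_for_one depends on it.
    """
    index = children.index(failed)
    if strategy == "one_for_one":
        return [failed]              # only the child that terminated
    if strategy == "rest_for_one":
        return children[index:]      # the failed child and those started after it
    if strategy == "one_for_all":
        return list(children)        # every child of this supervisor
    raise ValueError(f"unknown strategy: {strategy}")


class RestartLimiter:
    """Keeps a history of restarts so the supervisor knows when to give up."""

    def __init__(self, max_restarts, window_seconds):
        self.max_restarts = max_restarts
        self.window_seconds = window_seconds
        self._restart_times = deque()

    def record(self, now):
        """Record a restart at time `now` (in seconds).

        Returns True if restarting is still allowed, False if the supervisor
        should terminate itself and let its own supervisor handle the problem.
        """
        self._restart_times.append(now)
        # Forget restarts that have fallen out of the sliding window.
        while now - self._restart_times[0] > self.window_seconds:
            self._restart_times.popleft()
        return len(self._restart_times) <= self.max_restarts
```

[For example, with children started in the order `["db", "cache", "web"]`, a failure of `"cache"` under `rest_for_one` restarts `"cache"` and `"web"` but leaves `"db"` alone, which is how the strategy encodes that `"web"` depends on `"cache"`.]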

Francesco: Just at this conference it’s been very interesting, we have been learning a lot, and I think a lot of it is about things which don’t exist in OTP. I’m thinking in terms of Reactive Extensions or Futures or Agents, things which in certain languages are constructs and in others are just part of the framework. In Erlang we’ve not really done these things, because in many cases it’s just a few lines of code, so you just do it in your program, you don’t think about it, you just code it. One of the things which I think we’d really like to do is to start packaging some of these concepts into applications, so that you ease the bridge: someone coming over who wants to use an Agent will have a ready-made Agent application. You don’t need to tell him that it’s 20-25 lines of code and is really easy to implement, you just provide a standard API which he can start using, and in doing so you are providing all of the benefits OTP gives you in the backend. I think that is really one of the ideas which I liked and which we should be taking on in our community.
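[Editor's note: to give a feel for how small an Agent can be, here is a sketch in Python rather than Erlang. It is an illustration under assumptions, not an existing library: the `Agent` class and its `update`, `get` and `stop` methods are hypothetical names. The state is owned by a single worker thread and is only ever touched through messages placed in a mailbox, which mirrors the process-plus-messages model discussed in the interview.]

```python
import queue
import threading


class Agent:
    """Holds a piece of state that callers change only by sending functions."""

    def __init__(self, initial_state):
        self._mailbox = queue.Queue()
        worker = threading.Thread(
            target=self._loop, args=(initial_state,), daemon=True
        )
        worker.start()

    def _loop(self, state):
        # Messages are handled one at a time, in arrival order,
        # so the state itself needs no lock.
        while True:
            kind, fn, reply = self._mailbox.get()
            if kind == "update":
                state = fn(state)
            elif kind == "get":
                reply.put(fn(state))
            else:  # "stop": hand back the final state and end the loop
                reply.put(state)
                return

    def update(self, fn):
        """Asynchronously replace the state with fn(state)."""
        self._mailbox.put(("update", fn, None))

    def get(self, fn=lambda state: state):
        """Synchronously read fn(state) after all earlier messages."""
        reply = queue.Queue(maxsize=1)
        self._mailbox.put(("get", fn, reply))
        return reply.get()

    def stop(self):
        """Stop the worker and return the final state."""
        reply = queue.Queue(maxsize=1)
        self._mailbox.put(("stop", None, reply))
        return reply.get()
```

[A caller would write `counter = Agent(0)`, send a few `counter.update(lambda n: n + 1)` messages and then read the value with `counter.get()`. Note what the sketch leaves out: if an update function raises, the worker thread simply dies, and that is exactly the kind of gap that supervision in a packaged, OTP-style application is meant to cover.]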

Francesco: So I think we are used to the whole non-mutable approach and yes, you have ETS, which is destructive, probably the only destructive operation in Erlang, but ETS could easily be modeled as an Actor as well, so it’s just there for convenience and for speed. But if you look at the way that the Erlang Virtual Machine is implemented, having the whole non-mutability built into the process is a luxury and it actually makes our life much, much easier. I really admire Jonas and Viktor for the hard work to create isolation among Actors on the JVM, it’s not for the faint of heart.

Viktor: I think there is this sort of trade-off: the JVM is really good at execution, it’s really fast, it has a big ecosystem of already existing libraries, there is a lot of innovation going into the machine itself, the virtual machine improvements and stuff like that. The thing is that when you have a more specific problem space or a more specific thing you are working on, like processes for instance, if that is the only thing that you are doing, clearly you can optimize for that, and having process-local GC is a really good optimization when you have this problem space. I don’t really know if it’s solvable in the general sense for the JVM. They had this proposal for Isolates back when, but I don’t know if it’s ever going to be an alternative on the JVM, so I think in this case it’s something that we will sort of have to live with and make the best of. Clearly GC pauses are problematic in all applications that allocate a lot of memory. And even if allocations are in general cheap on the JVM, if you have a lot of processes that generate a lot of garbage then somebody is going to have to pick up that garbage. GC tuning is also one of those sort of hard things to do, so it’s still one of those things where you need to deal with it.

Werner: It’s a problem, it's something that can’t be solved because it’s a shared resource, the memory is a shared resource, and at some point you have to go and clean up no matter how clever the algorithms get.

Viktor: I mean there are some solutions to it, like you could go off-heap and do a lot of the optimizations that Erlang does but you would have to, it’s sort of a different world and if you get an object passed to you, does that live off-heap or does that live ... there are these two worlds now, in Erlang there's just one.

Francesco: I think a lot of the complexity, a lot of the coding they need to do to get the whole model to work, in Erlang you have that complexity in the Erlang Virtual Machine, and it’s highly optimized for exactly that. The Garbage Collector works on a per-process basis, so you get very, very small bursts which will only free the necessary memory. The only occasion you do a full sweep is to free all the memory for that particular Actor.

Werner: I think also GC can be very, very simple rather than the massive PhD-laden....

Francesco: It is incredibly simple, and there is a really good paper which was written back in the very early 90’s about how the Erlang Garbage Collector works. In fact, you’ve got single assignment of variables and you’ve got no shared state, which means you’ve got no recursive loops, so what your GC sees is just a tree you traverse, and either the memory can be freed or it cannot be freed. It’s as simple as that, so it’s not complex at all.

Francesco: I’m really interested in the way that they are actually looking at Reactive Extensions, Reactive Streams and Futures, I think those are areas where I really want to dive my head into and understand better how we can optimally implement those in Erlang, having the Actor model, having asynchronous communication I think will make it relatively easy, but I think those are the real things which I have taken home and which I’ve been interested in understanding.

Viktor: Just to be clear, the manifesto is sort of a collaboration and there are multiple people involved: Jonas Bonér from Typesafe was involved, Martin Odersky was involved, I think Erik Meijer, Martin Thompson and more people. I think it is a document that needs to be collaborative, that needs to be something that is sort of stewarded by the people that share the same view, and it has been improved over time as well. The general idea behind the manifesto was that we saw that there were fundamental ideas or fundamental problems that we are trying to solve, like Scalability, Fault Tolerance and all of these things, that as ideas live completely outside of any technical solution that solves the problem. It is about finding how we can build software that gets these qualities like Resilience, Responsiveness and Scalability, and it will be event driven because that is what ties these things together: communicating events, facts about things that have happened, is the underpinning of essentially the real world, like communication in itself. So the manifesto is taking these things and explaining how we can use them and combine them to get these qualities that we want. We found that this fits very well into what we’ve done with Akka, but we saw that there were tons of other things out there that had similar ideas and similar approaches, so why don’t we have a vocabulary that we can use to talk to each other about these things, so that we are not isolated islands trying to solve the same problems in similar ways for different platforms.

Francesco: When I first read the manifesto, my first reaction was “Who’s gone in and formalized everything we’ve been saying in the Erlang Community for the last 10-15 years”, “Who has gone in and actually described and found this terminology to describe what we’ve been doing”, which is in effect asynchronous, massively concurrent distributed programming. It is really technology agnostic if you go in and read it. For us who are working with these systems it does sound obvious, but it is formalizing it in a way which those who have not been working with these types of systems can easily grasp and understand, and I think they’ve done a fantastic job at doing it.

Francesco: You need to react to all of the stimuli so it’s not just Dataflow. Reacting to inbound data is one part but you need to be able to react to load, what happens when you need to, when your load regulation kicks in, you need to be able to react to failure. If an error gets propagated through error channels you need to immediately react to it, try to isolate it and try to recover. So it’s not just, I think some people might find it confusing and mix it with the Reactive Extensions, I think the Reactive Extensions is just a very, very small part of it, it’s just reacting to an external stimulus in general which is what asynchronous programming is all about.

Werner: So why has this become interesting today, I mean obviously this has been interesting for years.

Francesco: It’s not just interesting today, it’s been that way for a long, long time. Those who’ve been developing massively concurrent software systems, and have been getting it right, have been doing exactly this.

Werner: But I guess they were mostly hidden in sort of like what Erlang did.

Francesco: If you go back to 70’s - 80’s this was Telecom Systems and you go back to the IT bubble that was web scale, it was web systems and now with big data with the Internet of things, the problem is becoming more and more relevant in a lot of other verticals not just Telecoms or Web.

Viktor: I think also with smartphones, everybody can write an app today, and if you are successful you can get tens and hundreds of thousands of users in a very short time. So with these problems, if you don’t make the right decisions early you are going to sit there with an application that might not cut it anymore. I think there are hundreds of thousands of developers now that get exposed to these things, and we need to both communicate what the problem is and then…

Francesco: …try to provide them with a framework and an abstract approach so that when their dating app becomes popular, they don’t need to think about the backend, the backend will automatically scale. I mean it’s almost like being the unsung heroes providing the infrastructure, no one will ever thank you for it because it’s not pretty and you don’t see the pixels.

Francesco: Of course, obviously, you need always to blame someone and it’s obviously not the pretty app which you’ll put the blame on.

Viktor: And I think historically also the scalable or the “scalable” things were generally quite unproductive so you had to make this huge trade off between developer productivity and having an application that will take you all the way and I don’t think that is, I think it’s completely orthogonal, we just need to have tools that are productive that can deliver scalability and fault tolerance as well.

Viktor: And you do that using a set of Design Patterns and Development Patterns and I think those Patterns are very well described in the manifesto.

Werner: So I think this is a good thing simply so that developers don’t have to relearn these things, or actually rediscover them.

Francesco: You relearn, you reinvent the wheel. It’s going back to OTP: why did they make OTP? It took all of the generic code which could be reused from one system to another and put it in libraries, and that is exactly what Akka is about, and this is exactly what OTP is about. And if you get people to start thinking in terms of Akka or in terms of OTP, and if they do it right, they will get the Scalability out of the box.

Werner: It is interesting to see that this has now become such an important topic, because with Web scale suddenly the world is your customer, without the luxury of pausing, so this is a concern for everyone. In a way it’s like the industry having to grow up, because realtime behavior was always the desired behavior: if my app pauses I hate it, but if the Web server stops I lose money right away.

Viktor: There are no opening hours on apps.

Werner: My bank has opening hours.

Viktor: I still find it embarrassing when I try to log on to my internet banking, and I won’t mention which bank it is, and I get a little message telling me: “We are currently down for maintenance, will be online again in 4 hours”. What upgrade takes 4 hours? You should be able to do live upgrades without taking down the system, and even if you need to take down the system for 4 hours, it shouldn’t be that way. Systems are up 24/7 and there has been technology to make sure they are up 24/7 for a long, long time.

Werner: My bank website actually has opening hours, so after 08.00 PM it’s: “Sorry, come back tomorrow”. What are you doing, you have my money! So I won’t name the bank either, but look at the reactive website and get the job done.
I think the audience should look at the Reactive Manifesto [Editor's note: and here the link: http://www.reactivemanifesto.org/ ], should look probably at everything OTP has done, all Akka has done, what you figured out, essentially you have two worlds, you have Erlang and you have Java world and look at the Reactive Manifesto, so I think guys and girls you have homework and thank you Francesco and Viktor!

Thanks for the discussion. The 'reactive' description very much aligns with those systems I'm aware of that were built to be fast, scalable and robust based on message passing (MCS) and those built for transaction processing (TP). For the curious, I can point folk to reactive operating systems: RTOS (QNX), RSX (DEC), CTOS (Convergent) and AmigaDOS (Amiga), that largely work(-ed) in that way.

Given the performance benefits and in my view, the simplicity of each bit of 'productive code' in these systems, we are left to consider why their use went out of fashion except for specialist niches? My first thought is that a different approach is required and decision makers (especially in stressful contexts) rely on what's familiar before testing what's different. So push the reactive patterns until they become the new normal.