Ian Robinson and Jim Webber on Web-based Integration

Bio

Ian Robinson is a Principal Consultant with ThoughtWorks, where he specializes in the design and delivery of service-oriented and distributed systems. Dr. Jim Webber is the Global Head of Architecture for ThoughtWorks where he works with clients on delivering dependable service-oriented systems. Ian and Jim are currently co-authoring a book on Web-friendly enterprise integration.

About the conference

QCon is a conference that is organized by the community, for the community. The result is a high quality conference experience where a tremendous amount of attention and investment has gone into having the best content on the most important topics presented by the leaders in our community. QCon is designed with the technical depth and enterprise focus of interest to technical team leads, architects, and project managers.

JW: Sure. REST is like the trademark of the high priestesses of the RESTafarian Kingdom and unfortunately, we're not ordained in that church. We are using "web" because it's a bit more encompassing, it's a whole bunch of techniques that we see used out there on the big wide Internet, some of which aren't necessarily as pleasant or lovely or performant or scalable or sensible as REST, but have utility outside of that particular architectural style. We're a bit more broadly trawling the web for interesting techniques.

IR: I don't think either of us is a RESTauranteur, much as I think some people believe that when Jim stands up and gets very angry, but we are broader in our approach. We're just looking for some very pragmatic approaches to problems that we come across month in, month out with the kind of stuff we are doing.

IR: Yes, we use it. We use it quite a lot; or use web based approaches, some of which are more RESTful than others. It's not necessarily across the board in everything that we do, but we are introducing just very lightweight ways of working with the web with some of our clients and gradually, working our way out of that - that kind of RESTful stack, but beginning to split things up into resources and address them and then connect them and then actually start to drive applications by way of hypermedia. Yes, we are seeing an adoption in many different areas.

JW: I'd concur with that. I think in quite a reversal of fortune, if you like, from a few years ago, where REST would be laughed out of any serious austere enterprise, often now it's not at all comedic to go to a client and take a webby standpoint as your default. Indeed, my current clients, who are very interested in massively high performance systems, would potentially have gone with traditional enterprise middleware if we hadn't done some empirical experimentation and found that a simpler webby approach was actually just as well suited to their needs. Having gathered those empirical data points, it sort of emboldens you a little - you figure out that the web stuff works quite well once in one scenario, and then again in another scenario, and it emboldens you to default to that when you are talking about distributed systems integration.

JW: Sure. I thought Leonard Richardson has a brilliant scale, a chart of RESTfulness in decibels and of ROYs or something - I don't know! Leonard partitions it from level 0, which is basically tunneling, through to level 3, which is the hypermedia stuff, and I think that internally is my mental model. I tend not to share that mental model with people, because they get fixated then on the kind of "REST inside" sticker that they want to apply. Instead, when I'm designing systems and building systems, I'm just trying to think about fitness for purpose and often, at the moment, I'm finding that fitness for purpose tends to fall on the lower end of Leonard's scale. They are kind of web-aware, but not necessarily hypermedia-centric services.

IR: I think Leonard's model is actually a useful way of talking to clients about REST because he starts off saying "Take any problem - the simplest way to solve it is to break it down into smaller chunks". What do we do? - We just identify lots of resources and give them addresses. So, he's not necessarily talking about RESTful things in the first instance, he is just talking about how to break up a problem. Then he is saying "If we do the same thing over and over again, let's just do it in a uniform way" - that's his second level, just use those uniform methods.

Then his third thing is "If we are doing something interesting or specialized, then do it in a specialized way" - that's where he starts to talk about hypermedia. You can actually talk to clients, talk to other people, just about breaking problems down into simple chunks, doing the same kind of things in the same way over and over again and then specializing any way necessary. I think that's a nice way of talking about it. Then, you can layer on some very particular things about REST or about using the web. It's a useful way of getting that conversation.

JW: The angry man inside me - often not too deeply buried inside me - would like to tear your head from your shoulders at this point and insist that these are terribly bad ideas. However, in the real world, we are finding techniques like tunneling verbs and URIs are used in certain hopefully bounded contexts, sometimes less bounded, which makes them dangerous. Certainly, in a bounded context, if it enables me to get some rapid tactical solution to market quickly, I'm willing to accept them.

It's a kind of sticking plaster approach - I confess - and we always intend to go back and redress those, but I think having the kind of architectural - I call it architectgasm - and designing the world's most brilliant RESTful hypermedia cached super thing may not be the simplest thing that could work immediately. Maybe we could take those kind of ugly steps, like tunneling, to get us rolling today and then as our system volume expands, as its requirements become more sophisticated, as it encompasses more systems, as its reach grows, then we can think about migrating that to more RESTful patterns, which are demonstrably suitable for that kind of system.

IR: We really want to exploit a large installed infrastructure. Things succeed because the web is already out there, but we also begin to accept some of the constraints that are there, as well, some additional constraints that are just there in the way in which the web has grown. Those constraints exist now, like the browser tends to accept only two verbs, GET and POST, and very often, we'll end up building solutions that just have to adhere to those constraints, whether or not we're particularly fond of tunneling stuff.
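To make the tunneling Ian describes concrete, here is a minimal sketch of one common way it is done: the client sends a POST and names the verb it really means in a header. The `X-HTTP-Method-Override` header is a widely used convention rather than part of HTTP itself, and neither it nor the `effective_method` helper comes from the interview; both are assumptions for illustration.

```python
from typing import Mapping

# Verbs this hypothetical server is willing to accept when tunneled via POST.
TUNNELABLE = {"PUT", "DELETE", "PATCH"}


def effective_method(method: str, headers: Mapping[str, str]) -> str:
    """Return the verb the server should act on for an incoming request."""
    method = method.upper()
    if method != "POST":
        # Only POST may carry a tunneled verb; everything else is taken as-is.
        return method
    # Header names are case-insensitive, so normalize before the lookup.
    normalized = {name.lower(): value for name, value in headers.items()}
    override = normalized.get("x-http-method-override", "").upper()
    return override if override in TUNNELABLE else method
```

A browser form that can only POST could then still ask for a delete, while intermediaries that only understand GET and POST see nothing unusual. The cost is the one Jim points out: the real intent of the request is hidden from the rest of the web.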

JW: That applies to some of the intermediaries - the REST architectural style is preached as if the web is this perfect utopia, where everything understands the full extent of the HTTP uniform interface and pragmatically, that's not true. There are just some actors out there on the web which don't understand some verbs, even though HTTP suggests that they really should. That's limiting to our thinking about, for example, the curve of cache maturity that Mark Nottingham is such a fan of telling us about.

Those constraints pose real challenges for us at web scale because the web doesn't behave the way that REST describes that it should behave, so we have to take some pragmatic shortcuts to make systems work. There is value in REST, but there is more value in having working systems.

IR: It's "pick your path to adventure for interesting business processes on the web". We want to realize some goal having a couple of different things cooperating, we get to do that by serving up some HTML or some XML and the client or the consumer can begin, given a set of goals "I'm trying to achieve this thing or that", it can begin to pick its path through the server landscape picking up on links inside those representations and working its way towards the goal.

JW: That was really cute - I'm gonna steal that, we'll have to edit this to make it sound like it had been my idea. It's about the notion of servers leaving breadcrumbs for clients to follow. It's leading the client through the business processes the servers implement. We tend to get bogged down in the kind of uniform interface and HTTP and all that stuff and really the heart of it is the server takes you by the hand and guides you gently through a business process.
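The "pick your path" idea can be sketched in a few lines. The example below is an illustration only: the `RESOURCES` table stands in for a server, and the URIs, `rel` names, and states are all hypothetical. The client knows a goal state and which link relations it understands; everything else it learns from the representations it receives.

```python
# Hypothetical in-memory "server": each URI maps to a representation that
# carries some state plus links (rel -> URI) to possible next steps.
RESOURCES = {
    "/orders/1": {"state": "placed", "links": {"payment": "/orders/1/payment"}},
    "/orders/1/payment": {"state": "paid", "links": {"receipt": "/orders/1/receipt"}},
    "/orders/1/receipt": {"state": "complete", "links": {}},
}


def get(uri):
    """Stand-in for an HTTP GET that returns a parsed representation."""
    return RESOURCES[uri]


def follow(start, understood_rels, goal_state, max_hops=10):
    """Follow links from start until goal_state is seen; return the URIs visited."""
    path = [start]
    representation = get(start)
    while representation["state"] != goal_state:
        if len(path) - 1 >= max_hops:
            raise LookupError("goal not reached within max_hops")
        links = representation["links"]
        # Take the first link whose rel this client understands.
        next_uri = next((links[rel] for rel in understood_rels if rel in links), None)
        if next_uri is None:
            raise LookupError(f"no usable link at {path[-1]}")
        path.append(next_uri)
        representation = get(next_uri)
    return path
```

Note that the client never constructs a URI itself. If the server changed where payments live, the breadcrumb trail would change and this client would follow the new one.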

JW: Experience is the main ingredient which is missing here, although the web itself is a really mature technology. I think we are only now learning how to direct its particular characteristics towards integrated systems. The web has been brilliant as a mechanism for connecting humans, particularly in recent years, when humans have taken to the web in their millions to interact with pokes and tweets and all that kind of stuff.

As distributed system engineers we still lack that level of experience for doing the same things with computers. We haven't quite figured out yet in any robust way how to extend hypermedia, for example, between systems. For me, that would be the key thing that I'd say it's lacking. I'm happy to work around quirks in the infrastructure, differences of opinion around the community, but I think really we need to just experiment with this stuff, learn how to make it sing.

IR: Things even at the level of a client library being able to surface hypermedia in a relatively common or standard way. I'm thinking I get a representation back, but I just want a link query that allows me to identify all of the hypermedia and then, based on whatever it is I am trying to achieve right now, I can choose to dereference those URIs and pursue that hypermedia.
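As a rough illustration of the kind of "link query" Ian is asking for, the sketch below pulls every element that carries both a `rel` and an `href` attribute out of an XML representation. The `find_links` helper and the sample order document are hypothetical; real media types may mark links differently, so treat the attribute names as an assumption.

```python
import xml.etree.ElementTree as ET


def find_links(xml_text):
    """Return {rel: href} for every element that carries both attributes."""
    root = ET.fromstring(xml_text)
    links = {}
    for element in root.iter():  # iter() visits the root and all descendants
        rel, href = element.get("rel"), element.get("href")
        if rel and href:
            links[rel] = href  # a later link with the same rel overwrites an earlier one
    return links


# Hypothetical representation, using Atom-style link elements.
ORDER = """<order xmlns:atom="http://www.w3.org/2005/Atom">
  <status>placed</status>
  <atom:link rel="payment" href="/orders/1/payment"/>
  <atom:link rel="cancel" href="/orders/1"/>
</order>"""
```

With something like this in a client library, application code can ask "what can I do next?" without knowing where in the document the links happen to sit.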

IR: Whenever you want real reach for your applications, then I think REST and the web are an attractive proposition. It's a relatively low barrier to entry for anybody to be able to consume your application or work with it. Whereas if we are just working within Enterprise boundaries - whether it's a good idea or not - we are free to create our own idiom. If we are never going to have to explain that to anybody else, we could invent something from the ground up, but the moment we want to cross any of those organizational boundaries let's start looking for a sufficiently sophisticated but nonetheless lowest common denominator way of working and cooperating.

JW: I can take that a stage further and start looking at some of the architectural trade-offs that present themselves when you're considering this use of technology, and my favorite for the web is "Can you trade latency for scalability?" The web isn't a low latency system, but it's hugely scalable, particularly in the way you confederate load on the web. If you can afford latencies of seconds, minutes, possibly hours, days, weeks, the web is going to scale really well.

But if you can't afford high latency, then probably looking at a web inspired solution is the wrong thing and God will strike me for saying this, but some proprietary transport substrate with millisecond latencies or better may well be the thing you need. However, I've often found particularly techies will always insist that they need the millisecond transport substrate upfront without really holistically understanding the kind of business problem they're looking at, and the business problem may well call for something much more sensible, like seconds, in which case the web could be a sensible low ceremony way of achieving the same goal.

Geeks like us suffer terribly from the sin of pride because we always want the coolest, fastest, lowest latency, shiniest, brass-knobs-on system and the web is really not about that. The web is like "hum-drum, get on and do it". Trade latency for scalability any day of the week, and if it comes out in terms of scalability with high latency, go with the web.

IR: I think there is a more general issue for distributed systems development as well. It asks us to think a little more about our tolerance for latency, for inconsistency. We've been accommodating these things for centuries. I can send a horse galloping off from one town to another with an order and some terrible things can happen in that intervening period. We've invented business protocols that can handle all of that and I think this kind of work is forcing us to look at those and to surface those protocol semantics again instead of always depending upon the low latency substrate and trying to delegate everything to the technology.

JW: That's interesting because the web as a distributed platform absolutely insists that we deal with distribution. For so many years in computing science now, we've been told abstraction is a great thing and we should abstract away all of that hard computing science stuff to the back room boffins and forget about it, living in happy business web site land. Actually, you can't do that. Waldo told us that years ago and he's been woefully ignored by the computing community, but when you decide to build a web based distributed system, the web doesn't hide that distribution from you. In fact, it gives you useful information to coordinate distributed interactions and, for example, to use the messenger-horse metaphor, to know when your horse is being robbed by a highwayman at gunpoint and to take some corrective form of compensating activities. As a former transactions guy, I see the web as a big coordination platform - a kind of two phase consensus gone nuts.

IR: Just get over the fact you can't have a God's eye view of your success.

JW: Apart from apparently Sir Tim … he can see the whole web all the time. Seriously, you post a blog, he knows it - he is watching you. He is watching all of you right now through a webcam.

JW: No, next question. I don't think so. I think we're learning that, at the kind of scales the web works at, the classic two-phase transactions aren't really suitable. Anyone that listens to Werner Vogels talk about this eventual consistency stuff, anyone who's read some of Gregor Hohpe's stuff about how Starbucks doesn't use two-phase commit, anyone that has actually applied any fleeting thought to this understands that two-phase transactions in particular can't work on the web. You trade off consistency for scalability and the web's all about scalability, potentially eventual consistency.

If it's not too much of a blatant plug for the book, chapter 12 discusses this (it's chapter 11 now; we scotched the chapters so that we can keep up with Stefan's prolific pace of writing in his equivalent German book). We actually do a bake-off, if you like: we take WS-* techniques for things like security, transactions, reliable messaging and so on, and we show the equivalent patterns and strategies that we use in a plain old webby HTTP world. We don't claim that we're RESTful, we are just saying, for example, that you don't really need transactions because the web gives you all of this coordination all the time.

It's kind of perverse that the web, being this synchronous step-wise text-based protocol, shouldn't really work at global scale, but it does because for each interaction I have with a resource on the web, I get some metadata telling me whether or not that interaction was successful. So, I can elect to follow the adventure route - if you like - I can elect to keep going, following resources and making forward progress, or in the event of a piece of metadata that suggests that my processing is failing, I can perhaps take another route through a set of linked resources and other processes where I could make alternative progress. That, for me, is a much more sensible way of dealing with undesirable outcomes than trying to wrap everything in a big honking transaction.
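A minimal sketch of the coordination Jim describes: each interaction returns a status code, and the client uses that metadata to decide whether to carry on or try another route. The `first_successful` helper, the supplier URIs, and the `fetch` callable (which returns a `(status, body)` pair) are all hypothetical; a real client would use an HTTP library and would likely discover the alternatives from links rather than a fixed list.

```python
def first_successful(fetch, candidates):
    """Try each candidate URI in order; return (uri, body) for the first 2xx response."""
    failures = []
    for uri in candidates:
        status, body = fetch(uri)
        if 200 <= status < 300:
            # The status code is the per-interaction metadata: this route made progress.
            return uri, body
        failures.append((uri, status))
    raise RuntimeError(f"no route made progress: {failures}")


# Hypothetical responses standing in for two suppliers' order resources.
FAKE_WEB = {
    "/suppliers/a/orders": (503, ""),
    "/suppliers/b/orders": (201, "order created"),
}


def fake_fetch(uri):
    """Stand-in for an HTTP call; unknown URIs behave like a 404."""
    return FAKE_WEB.get(uri, (404, ""))
```

Nothing is locked and nothing is rolled back. The client simply learns that the first route failed and makes forward progress somewhere else, which is the alternative to a distributed transaction that the answer above is pointing at.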

IR: "You are confronted by a dwarf with an axe. What do you want to do next?" I mean, even with the WS-* protocols, I don't think we should be tempted to use them all the time to try and coordinate and involve a number of different services in some kind of transactional context. It may be that you actually want to use that behind some coarse grained boundary, in some internal implementation of a service; even if we are exposing it across the web in a RESTful manner, the internal implementation might depend upon some of those lower level protocols. I think that's fine, if we are prepared to tolerate the expense of locking a number of resources. We are seeking a coarse grained boundary where we don't necessarily have to do that at that level.

JW: That doesn't come for free, right? That takes explicit clever design decisions to get right because at the lowest levels, if you are using one of these legacy relational databases, you are going to have to think about these things - yes, I said it and I stick to it, too! - but you are going to have to design explicitly and be very wary about your abstraction boundaries so that those kinds of details don't inadvertently leak. If they leak to the web, you are screwed!

IR: Once you start giving somebody a key to your back door, they'll be in there.

JW: It is a useful capability. The notion of knowing an outcome that you want to get to and maybe some rules that will help you to get there is a fine thing. It's only when you tie it up in an inflammatory language like BPM that it raises my hackles, because that comes with a lot of baggage. We've all seen what the kind of point-and-click-ware BPM products are and we run screaming from them because they are dangerous things. The hardest point in using the web is the coordinating from the client side. If we could solve that problem, the web would be a much more amenable solution. So I completely agree that we need some kind of client-side coordination, but I don't think it should be in the same vein as the products and solutions we've seen today. Something like Prolog or a rules engine might actually be a better way of dealing with and orchestrating processes on the web.

IR: That can be an internal implementation issue for a client or for a server, whatever role they are playing at that point in time. It's not unreasonable to say in order to realize a goal, you might anticipate a few of the steps that you are going to have to go through. If your server is giving you back a representation that offers up a set of opportunities, you are applying some intelligence to that to pick your path, which does also suggest that there is an out of band mechanism as well, so that we can begin to communicate what it is that you might expect to receive. It provides some reasonably standard interpretations of things such as "rel" attributes and stuff like that.

JW: That out of band intelligence could be a microformat. It probably should be because they are low ceremony and lovely.

IR: Yes, but many processes are very simple, sequential, or driven by events. It's relatively simple to implement them in the simplest fashion. It doesn't necessarily depend upon a rules engine or some workflow engine or anything like that.

JW: I confess I'm the fondest of very simple tools. I'm currently working on some rather high performance systems and it happens to be in Java, which is fine. We have, of course, several choices in Java: we could go with Restlet, we could go with a JAX-RS implementation, both of which are substantially sophisticated frameworks that take out a lot of plumbing for us. In this case we went with servlets because they were sufficient for us to get the job done in a very low ceremony way. Flip side: if you are on the .NET platform, for example, you've got the WebInvoke and the WebGet stuff from WCF that you could use or you could just use an HTTP handler.

IR: Or an HTTP listener as well, which is actually what WCF uses under the hood if you are self-hosting HTTP. You can drop down to that and again, it's very simple to build things on top of that.

JW: The rather slippery answer is that you take your pick. If you are comfortable with using a highly abstracted framework like WCF or JAX-RS, if you can tame that in your business domain more readily than you're prepared to tame something like servlets, which is very HTTP request/response-centric, then it's your call. Use what makes best sense to you!

IR: One of the things I'm often looking at is how I'm going to communicate something around the application protocol and typically, I want to communicate it by way of tests. Tests are a useful piece of documentation. By application protocol I'm saying I want to be able to describe to you how you can expect my service to behave: if you submit this representation to this end point, invoke this method, then you might expect to get back this kind of representation, this media type, these status codes, these HTTP headers. All of those things form part of that application protocol - we are establishing some little contract between ourselves. You see a lot of this stuff in the AtomPub spec, for example.

What I'd like to be able to do is to assert all of that in a test. One of the things I'm often looking for is whether I can do that without always having to spin up an instance of my service or communicate with it over the wire, so I'm often looking for a very, very lightweight abstraction that allows me to create expectations against all of those HTTP artifacts, without actually having to start up an instance of the service. I know you've done it with some of the mock context in Spring.

JW: With servlets and some of the Spring mocks it's actually a really nice way of not having to do the full "bring up the service, wait 20 hours for Tomcat to come up" kind of thing - very lightweight, very pragmatic.

IR: Whereas what I've done occasionally is create very thin wrappers around things such as a request or response. I can test independently that they actually do delegate to whatever runtime I'm using, but then I can basically write my tests against those or mock instances of those requests and responses.
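The thin-wrapper approach Ian describes might look something like the sketch below. It is an illustration under stated assumptions, not a reconstruction of either speaker's code: the `Request` and `Response` classes, the `handle` function, and the `/orders` protocol are all hypothetical. The point is that the application protocol (method, media type, status code, headers) can be asserted against plain objects with no server running.

```python
from dataclasses import dataclass, field


@dataclass
class Request:
    """Thin wrapper over whatever request type the real runtime provides."""
    method: str
    path: str
    headers: dict = field(default_factory=dict)
    body: str = ""


@dataclass
class Response:
    """Thin wrapper over the runtime's response type."""
    status: int
    headers: dict = field(default_factory=dict)
    body: str = ""


def handle(request):
    """Hypothetical order service: POST an XML order to /orders to create it."""
    if request.path != "/orders":
        return Response(404)
    if request.method != "POST":
        return Response(405, {"Allow": "POST"})
    if request.headers.get("Content-Type") != "application/xml":
        return Response(415)
    return Response(
        201,
        {"Location": "/orders/1", "Content-Type": "application/xml"},
        "<order/>",
    )
```

A test can then build a `Request`, call `handle`, and check that a well-formed order yields 201 with a `Location` header, while a wrong media type yields 415. A separate, much smaller set of tests would confirm that the wrappers delegate correctly to the real runtime.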

JW: You have an informal description and then you have a bunch of Ian's fabulous consumer driven contracts.

IR: I'm thinking that very often the media type is expressing some kind of contract, is making some promises about the kind of representation you can expect to get back. The more interesting media types actually contain a lot of those more protocol-like rules as well. Again, I think of things like AtomPub that not only tell you what kind of stuff you are going to get back, but also tell you some of the methods that you can expect to be able to invoke and the status codes that you can expect to get back. There are contracts here, they are just being shifted around and I think we should be looking for media types that make very clear what it is that we can expect to do, how we can expect to surface or interrogate these representations for hypermedia and how it connects us to hypermedia in order to progress an application.

JW: Yes. In a nutshell, yes. In fact, a friend and former colleague of ours - George Malamidis - once said to me "The web already has a contract language - it's called HTML." I'm still scared when I say that sentence. George is a very sophisticated thinker in these circles, but I have a tendency to believe he is right. I'm just scared to make the leap to where he is.

JW: I can't evangelize it. I think it has to be about a solution to a problem within a context. One of the systems I've been involved with in the last year or so was originally penned to be based on JMS. That's great, I like JMS, it's a lovely idea, but the initial design was done without really any holistic thought to the environment in which the system was going to be deployed. JMS, lovely as it is, has its complexities. What we actually found, by doing a small spike, a few days' worth of spiking, was that for the loads that we wanted to put through the system, HTTP was quite good enough for where we needed to be.

That had so many benefits in terms of improving our software delivery. It was a lot faster and easier to write HTTP things than it was JMS, and they are easy to test with tools like Poster or curl. The delivery of that particular system was good, and there is a man at the back of the room smiling about it, because he was involved with it and it was lovely. I feel that had we gone down the JMS route we would have had to work so much harder to surface this system for testing, particularly to our QAs. The fact that our QAs could bring in Firefox with the Poster plug-in and probe the system made for some really advanced but accessible exploratory testing, and they broke us in wonderful ways that we hadn't expected because of the system surface area that was open to them, and that made me smile a lot.

IR: It opens out to a larger constituency, doesn't it?

JW: Yes, so a reach thing again.

IR: Far more people having visible insight into the way in which the system is working or the way in which it exposes itself to the world. And they are seeing it in ways with which they are very familiar - they are looking at it in a browser, things like Poster and stuff like that. It's curious: we started all of this saying that, in fact, we are more interested in talking about webby, Webbery things than REST and then we continue to talk very much about REST, and I think to evangelize REST within an organization is occasionally not the appropriate thing to do. I always get frustrated when people say "We want SOA". SOA is another one of those words that should be under erasure. We should just start talking about what it is that we are trying to do and talk about it in familiar ways because very few people aren't now familiar with the web. We can just talk about some of the simple things that we do with the web and say "Imagine if your application could also work like this."

JW: There is the danger, as happened with SOA, that it becomes bound up in products to such an extent that it becomes "I can sell you an SOA" - "No, you can't" and I think we are seeing already this REST moniker being applied to software products. It really confuses the discussion because people think they can just plug in REST, they can just buy a REST platform and they are suddenly RESTful. Then all they are doing is ticking the "REST inside" box and they haven't really given any critical thought to why that might be useful to their business. It's just the senior IT decision makers and the vendors conclude on a decision which is not necessarily in the business's best interest and it is rarely in the best interest of the development team who are trying to service that business.

IR: It's rare to be able to insert some kind of adapter and take a WS-* application and suddenly surface it as a RESTful application and expect it to be a rich and useful RESTful application.

JW: That's a dangerous REST application because the underlying implementation isn't designed to have such a surface area or to be loaded in that way; it's designed to be loaded in a message-centric or RPC-ish way.

IR: I think there is huge value, in and of itself, in thinking of things in terms of resources, and if you try to layer resources on top of something that's been designed around an entirely different paradigm, you are missing an opportunity to discover something interesting about your business, about your process. Discussing in terms of resources often surfaces the value inherent in doing something. Search results in and of themselves are useful to companies like Google. It's one of the ways in which they monetize what it is that they are doing. Surfacing a search result as a resource is a good way of thinking and talking.

JW: The primary reason why the human web doesn't support the full gamut of HTTP verbs is that HTML doesn't support it, so we are left with GET and POST support, which is a pretty limited vocabulary. I'm not too worried by this because to me the browser is already dead. It's the most frequent, but the least interesting agent on the web. I'm much more interested in what happens when computers interact rather than when humans point browsers at web servers and right now, that infrastructure creaks at the seams when humans push it, but it's good enough for them to facebook each other or whatever it is that kids do nowadays, so I'm really not worried about it. What actually worries me more is some of the future directions that some of the working groups in the W3C are heading towards, which is effectively trying to rewire the web. Right now, the web infrastructure as it is, has got this magic tipping point where it is globally available, it has global reach.

I'm concerned that if some folks at W3C come through and, for example, HTML 5.0 somehow makes it out into the wild, we'll get this weird paradox - half the web is the original web and half the web is this new web and it's all got web sockets and it's all very confusing and it's not all markup language any more and that's what troubles me most. Right now, I'm looking for the browser providers to innovate - I'm comfortable with that, I'm not passionate about it, but comfortable with it. I'm looking for the W3C to nurture the web in a more evolutionary manner and I'm not looking for someone to become Sir Tim the 2nd. Unfortunately, I'm concerned that some people in W3C are looking that way - hands off!

JW: I'm not old enough to answer that question. Ian has seen several of these cycles, so he might have a proper answer.

IR: From the point of view of nostalgia-driven development, where every text begins "Well, wouldn't it be nice if we could do it the old way?" As you were talking about simplicity and there being a drive towards simplicity, I think one of the benefits of REST evangelism - when it does take place - is not actually to insist on simplicity, but to insist on the constraints, to surface and recognize the constraints all over again. A lot of applications have been built on or around the web that abuse the web's infrastructure and the way in which it works. Good REST evangelism is surfacing and emphasizing some of those constraints and saying that if you work with or under those constraints, you will realize greater reach, better performance. That is a partial answer from me.

JW: You are right. We did put abstraction after abstraction onto our distributed system infrastructure and you know what: it hasn't worked out that well for us. Some of the largest and most sophisticated distributed systems on the planet haven't been all that large or sophisticated and then this kind of crappy protocol comes along that insists on being synchronous, and insists on being text-driven and it scales globally. That's shocking and does not make sense to us as engineers. That's the web paradox - it's the rubbishest thing on the planet, but it's scaled and for me that is what's hit the reset button because I was totally up for XML-based protocols that do all sorts of funky stuff.

I put my name to some OASIS work and some other stuff in the transactions space - God forbid! - but to be fair, we thought we had the best of intentions, we thought this stuff was going to be useful and it may still be useful in certain bounded contexts, but what the web and HTTP have shown us is that if you want to scale and reach out globally, you have to have something that's dumb. Dumb protocols are the base line through which everyone can interact and getting that interaction seems to be now what's critical in early 21st century computing. So - Yes, back to basics.