Ralph Johnson on Parallel Programming Patterns

Recorded at:

Bio: Ralph Johnson is one of the four authors of the Design Patterns book. He is currently involved with the CS Department at the University of Illinois and is the leader of the UIUC Patterns/Software Architecture Group. He is mostly interested in OOP, especially frameworks, patterns, business objects, and Smalltalk, and he researches refactoring, having been involved in creating the Smalltalk Refactoring Browser.

Starting in 1986, OOPSLA Conference has proven to be the cradle of many techniques and methodologies that have become mainstream over the years: OOP, Patterns, AOP, XP, Unit Testing, UML, Wiki, and Refactoring. Gaining its prestige with 3 academic tracks, OOPSLA Conference has managed to attract researchers, educators and developers every year. The event is sponsored by ACM.

I'll tell you a little bit about the group of people working on it. I suppose the motivation is that multi-cores are upon us and it seems like maybe not everybody, but lots of us are going to be forced to do parallel programming, and it's different, it's hard. It's not really new, there have been some people doing parallel programming for a long time, but it's new to most of us. That was part of the recipe of Design Patterns. Object oriented programming was new to most people and people wanted to learn it, and so the book turned out to be really helpful to that whole generation coming up to speed with objects. Can we do the same thing for parallel programming?

There are different people - Tim Mattson is one of the authors of the book called Patterns for Parallel Programming, which had some of the basic patterns, and Kurt Keutzer from Berkeley and I are working on this. There are actually a lot of other people who've been gathering people to work on it. There is a website with a bunch of different patterns. These are the patterns that you need to know to do parallel programming. One of the things that makes that tough is that there isn't one thing - Parallel Programming; there are always different models. Are you doing it on the GPU? Are you doing it on your multi-core? A lot of the parallel programming of the past has been on supercomputers and these people have thousands of processors.

The way you write a program for thousands of processors is different. Right now people have 4, maybe 8, and we're going to be having 16 and 32, but it's going to be a little while before we have 1,000 processors on our desktop. That's not something that we really have to worry about right now. There are always different languages, and some of them are based on shared memory, some of them are based on message passing. Those are very different ways of doing things. Some of the best languages, in terms of ease of programming, are ones where you have parallel data structures. When you write your program, it looks like it's sequential, but when you perform an operation it's going to happen in parallel over a large data structure.
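The "looks sequential, runs in parallel" style can be sketched in a few lines of Python. This is only an illustration under assumptions: the names `scale_and_shift` and `parallel_map` are hypothetical, and a thread pool stands in for a real parallel data structure. In CPython, threads will not speed up CPU-bound work like this, so the sketch shows the shape of the code rather than a speed-up.

```python
from concurrent.futures import ThreadPoolExecutor


def scale_and_shift(x):
    """Per-element operation; each call is independent of all the others."""
    return 2 * x + 1


def parallel_map(func, data, workers=4):
    """Apply func to every element, spreading the calls over a pool of workers."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map returns results in input order, whatever order they finish in.
        return list(pool.map(func, data))


data = list(range(10))
sequential = [scale_and_shift(x) for x in data]   # the ordinary sequential loop
parallel = parallel_map(scale_and_shift, data)    # same call shape, run by the pool
```

The caller writes one operation over the whole collection and never mentions threads, which is what makes this style easier to learn.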

Those are actually easier to program. It's easier for people to learn how to do that. Again, that's only one style of programming and every style has its strengths and weaknesses; some problems work better on one than on another. We all know that with GPUs you get incredible speed-ups, if your problem fits a GPU well. They never tell you about the problems where they spent 6 months trying to get it to run on a GPU and it was slower than running on a single core of the CPU, because they just couldn't get it to work. Different problems are good for different architectures and that's part of what makes it complicated. We are getting a lot more patterns than we did with design patterns and that's one of the things I worry about - that it's going to be overwhelming to people.

One of the reasons is that with object oriented programming, the algorithms stayed the same. We rearranged our code, it changed the way we structured our program, but the actual algorithms we used were just the same as they always were. But that's not true with parallel programming! You actually get different algorithms. Sometimes you use the same algorithm, some algorithms parallelize, but other algorithms just don't parallelize. You go to an algorithm that on the surface is a little bit worse, but you can parallelize it, whereas you can't parallelize the one you would prefer to use. That's one of the things that makes it complicated.

We've been putting the patterns in different categories. One category is the big parallel algorithm. You can have parallelism by just taking your data and doing some big operation on it all at once - so data parallelism. Sometimes you have to break it up into regions. It's geometric parallelism, where you break it up into regions and you can do work on this region in parallel with that region, in parallel with this one. But the problem is you come to the edge and now you have to communicate. This region has to communicate with that region. The way those algorithms work, you do stuff on the interior on a single processor, and when it gets to the edge it has to swap data with its neighbor.
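A minimal sketch of this region-and-edge structure, assuming a one-dimensional grid and a three-point averaging step chosen purely for illustration. The names `smooth_region` and `smooth_step` are hypothetical, and threads stand in for processors. Each region is handed one extra "ghost" value copied from each neighbor, so the work on the interior needs no further communication.

```python
from concurrent.futures import ThreadPoolExecutor


def smooth_region(padded):
    """Average each cell with its two neighbors.

    padded holds one region plus one ghost cell on each side.
    """
    return [(padded[i - 1] + padded[i] + padded[i + 1]) / 3
            for i in range(1, len(padded) - 1)]


def smooth_step(grid, regions=4):
    """One smoothing step over grid, split into contiguous regions."""
    n = len(grid)
    if n == 0:
        return []
    size = (n + regions - 1) // regions
    padded_regions = []
    for start in range(0, n, size):
        end = min(start + size, n)
        # Ghost cells: copies of the neighboring regions' edge values.
        # At the two ends of the grid, the boundary value is repeated.
        left = grid[start - 1] if start > 0 else grid[0]
        right = grid[end] if end < n else grid[n - 1]
        padded_regions.append([left] + grid[start:end] + [right])
    with ThreadPoolExecutor(max_workers=regions) as pool:
        pieces = list(pool.map(smooth_region, padded_regions))
    # Stitch the regions back together in their original order.
    return [value for piece in pieces for value in piece]
```

Running several steps means refreshing the ghost cells between steps, which is the communication at the edges described above.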

There is a trick where you keep a copy of the data belonging to your neighbor and he keeps a copy of your data, and when you change it you have to swap it. Sometimes you get trees, you just put stuff up in trees, and usually at the beginning there is only one processor, but then it goes down and you get 2, then you get 4, then you get 8, and before long you have more parallelism than you actually have processors, so you have to limit how many processes you create. That's more "Divide and conquer" type of parallelism. These are really looking at your algorithm and your data, not caring at all about the machines you have.
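The tree-shaped growth (1, then 2, then 4 tasks) and the need to cap it can be sketched with a merge sort. This is an illustrative sketch only: `parallel_merge_sort` and its `depth` parameter are hypothetical names, and the cap is a simple recursion-depth limit rather than anything the speaker prescribes.

```python
import threading


def merge(left, right):
    """Merge two sorted lists into one sorted list."""
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            out.append(left[i])
            i += 1
        else:
            out.append(right[j])
            j += 1
    out.extend(left[i:])
    out.extend(right[j:])
    return out


def parallel_merge_sort(items, depth=2):
    """Sort items, forking a new thread only while depth > 0."""
    if len(items) <= 1:
        return list(items)
    mid = len(items) // 2
    if depth <= 0:
        # Enough parallel tasks already exist: recurse sequentially from here.
        left = parallel_merge_sort(items[:mid], 0)
        right = parallel_merge_sort(items[mid:], 0)
        return merge(left, right)
    result = {}

    def sort_left():
        result["left"] = parallel_merge_sort(items[:mid], depth - 1)

    # Fork: the left half goes to a new thread, the right half stays here.
    worker = threading.Thread(target=sort_left)
    worker.start()
    right = parallel_merge_sort(items[mid:], depth - 1)
    worker.join()  # Join: wait for the left half before merging.
    return merge(result["left"], right)
```

With `depth=2` the recursion forks at two levels, giving at most four subproblems running at once; below that it is an ordinary sequential merge sort.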

That's one kind of pattern, ones that are pretty agnostic of your hardware. But there are other patterns that are much less so. If you are doing message passing, for example, one of the things you can do to make sure that you are not wasting time waiting for messages is to look and see when someone is going to need this data and then try to send it as soon as possible. So, you look at your program and you find places where you have to send and you move that up to the front, and you look for places where you have to wait for data and you move that as far down as you can. If you are on a shared memory system, then that's a silly pattern. You don't need to worry about it at all. In shared memory, people are worrying about locking, and again, you don't worry about that in a message passing system.
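A small sketch of "send early, receive late", using Python queues as a stand-in for a real message passing system. Everything named here (`neighbor`, `step`, the squaring done by the peer) is hypothetical and chosen only to make the ordering visible.

```python
import queue
import threading


def neighbor(inbox, outbox):
    """Hypothetical peer: squares whatever it receives and sends it back."""
    value = inbox.get()
    outbox.put(value * value)


def step(boundary_value, interior):
    to_peer, from_peer = queue.Queue(), queue.Queue()
    peer = threading.Thread(target=neighbor, args=(to_peer, from_peer))
    peer.start()
    # Send as early as possible, so the peer can start working right away.
    to_peer.put(boundary_value)
    # Do all the work that does not depend on the reply while the message is in flight.
    local = sum(x * x for x in interior)
    # Receive as late as possible: block only when the reply is really needed.
    reply = from_peer.get()
    peer.join()
    return local + reply
```

If the `get` were placed directly after the `put`, the result would be the same, but the local work could not overlap with the peer's work.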

You get different styles, and some patterns are very definitely particular to a platform and other patterns are not. Then, there are patterns in the middle, where you say "This pattern could be done in anything, but it seems to work better on one than it does on the other." Ideally, everything would be architecture independent. We all know you want to do stuff platform independent, but it's much harder because the platforms change so much. The way you design something for a multi-core is not at all the way it would be for a GPU and that's a big issue. How can we write our systems so that they are more independent of the architecture? Usually, the best thing we have now is that you get libraries, you get sort of standard libraries.

If you're doing linear algebra type of stuff, you get very good libraries, so you use that library and there is a different version of the library for every platform, so you can write your program pretty independently and you just link up with whatever library is right for your platform, and it makes your program more portable. There aren't very many areas where we have good libraries like that. When there are good libraries, you should be using them, but unfortunately, there are lots of areas where we wish we had them and don't.

Whenever you do something, you always hope it will have an impact, and so often you do something, you know it's good, but nobody else seems to think it's good and it doesn't have an impact. This had a very immediate impact. The obvious impact was just a lot of people reading it, a lot of people saying this was good stuff, but it actually took maybe a couple of years before you really started seeing it in practice. But it could make a pretty fast impact on a group when you did have an experienced person in the group who would bring that book in, tell everybody "You guys read this book!" and "I was telling you last week that we ought to do something in a certain way - that's the Observer pattern! That's what I was trying to tell you! Just read here and you'll see!"

When you had somebody who was more experienced, who could use the book, you actually could get the whole team, in weeks even, changing a bit how they were doing things, but over time, it then became a vocabulary. I would say maybe 2 years after it came out, the conversations at OOPSLA were different, because as people were talking to each other about their designs, they would use the pattern names. The experts knew those patterns already, and it wasn't like the experts learned something new from the patterns, but now they had a standard vocabulary. Even before, two guys would know the same patterns, but they didn't have standard names for them and they just sort of talked past each other.

Over time, as it's become more of a standard, I find it pretty easy to go in and read some big library. I can look at things and I can see what the patterns are, and the people who designed the library have read the book and often they'll put comments in that tell you what the patterns are. So, if you are wondering "Is that really a mediator or not?", they'll say something like that. It just helps with communication. There is also something I think was unforeseen, we didn't realize it was going to do this: I think it has helped when you start practicing the patterns. The patterns are all based on good object oriented design principles. As you are practicing the patterns, you start to intuit the principles.

One of the things I wish we had done in our book was talk about the principles more. The first chapter talks about a couple of principles, but there is more that we could have talked about. Head First Design Patterns actually does a pretty good job of explaining some of the principles behind the patterns. What I think is cool is that people learn the principles by practicing. You can tell people principles, but in fact people have learned a lot of principles by practicing, and so it's had an effect on maturing the industry. I bet half the books were sold and never read, and an awful lot of people have not read the book. You've got to be realistic about what impact anything you do really has, but it has had a pretty big impact. If we can do even a 10th as much for parallel programming, then we'll be happy.

Parallel patterns can definitely help with that. When we are going through a sequential program, there is an imaginary finger on the program and we are following down. That just doesn't work - when it's parallel, it doesn't work. It depends on the type of parallelism that you've got. If you've got data parallelism, you can still do that, you can still follow along. It's hard to tell exactly how long the program takes, because there is one step and now we are going to add up a million numbers. How long does it take to add up a million numbers?

If you've got some good parallel implementation of it, it depends on how many processes you have, but some part of the program might just be really sequential and another part parallel, and how long it will take depends on how big your data is. The correctness - what's the meaning? That will be the same when you are dealing with data parallelism. If you pick a different approach, now you have to have different parts of the program communicate with each other, you are synchronizing in some way. The more synchronization you have, the more likely you are to do something wrong, and you have to worry about what order things are happening in.
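The dependence of running time on the number of workers and on the sequential part can be put into a back-of-the-envelope estimate. The sketch below uses the idealized model usually called Amdahl's law, which the speaker does not name; the function names are hypothetical, and the model assumes the parallel part divides perfectly among workers and ignores all communication and synchronization cost.

```python
def estimated_time(sequential_seconds, parallel_seconds, workers):
    """Idealized estimate: only the parallel part shrinks as workers are added."""
    return sequential_seconds + parallel_seconds / workers


def speedup(sequential_seconds, parallel_seconds, workers):
    """How many times faster the run is with `workers` than with one worker."""
    one_worker = estimated_time(sequential_seconds, parallel_seconds, 1)
    return one_worker / estimated_time(sequential_seconds, parallel_seconds, workers)
```

For example, with 1 second of sequential work and 8 seconds of parallelizable work, 4 workers give an estimated 3 seconds instead of 9. However many workers are added, the estimate never drops below the 1 sequential second.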

That's an important part of the thinking, as you really want to be able to reason about your program in a time independent way. You don't want to say "If this thing happens first, and then this thing happens, and this thing happens ..." When you are thinking like that, it makes the programs really hard to understand. If you can pick a pattern that doesn't require that, where the pattern will let you just break things up into pieces and solve them independently and then they get merged afterwards, then you are much less likely to have problems with it. You can't always use it. You have a problem and maybe none of the patterns work, and then you have to go and do something else, and what you are going to have to do will most likely be harder. But when you can fit what you are doing into one of these standard patterns, then it makes it a lot easier.
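A sketch of the "break it into pieces, solve them independently, merge afterwards" shape, using a word count as a stand-in problem. The names `count_words` and `parallel_word_count` are hypothetical. The pieces are deliberately collected in whatever order they finish, to show that the result does not depend on timing.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor, as_completed


def count_words(lines):
    """Solve one piece on its own; it touches no shared state."""
    counts = Counter()
    for line in lines:
        counts.update(line.split())
    return counts


def parallel_word_count(lines, pieces=4):
    """Count words by splitting lines into chunks and merging the partial counts."""
    size = max(1, (len(lines) + pieces - 1) // pieces)
    chunks = [lines[i:i + size] for i in range(0, len(lines), size)]
    total = Counter()
    with ThreadPoolExecutor(max_workers=pieces) as pool:
        futures = [pool.submit(count_words, chunk) for chunk in chunks]
        # Pieces may finish in any order. Adding counts is commutative and
        # associative, so the merged result does not depend on that order.
        for future in as_completed(futures):
            total.update(future.result())
    return total
```

The only synchronization is the merge at the end, done by a single thread, so there is no "if this happens first, then that" reasoning to do.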

In general with patterns, if you can make something into a library and use it, then you don't need to call it a pattern. The patterns are the things that we aren't able to put into a library. What are we going to do? Are we going to give up? If we can't make a library are we just going to say "I'm sorry"? There is still reuse, there are still things, intellectual ideas and knowledge that an expert has, even if we can't encode it in a library. Clearly, when we put something in a library, you can have performance issues perhaps, but other than that, putting stuff in libraries is great, you can use it without having to know all the details. But life isn't like that for everything.

We can't do that for everything. With the linear algebra we actually have some pretty good libraries and it's important to know what things are in libraries so that you can know that you should use them. The patterns are really there for things you can't just look up and reuse. What that means is maybe a few years from now we'll be smarter, we'll figure out how to put something in libraries that we can't figure out how to do now. Something that now we need to think of as a pattern, we won't have to do that anymore, we can just use it in a library. We always do the best we can with what we know at the moment.

Identifying - and by creating I think you mean writing them down. How do these patterns become patterns? There is probably some very smart programmer somewhere who did it for the first time, but we usually don't know who they are. In programming you see certain ideas, then, when you go off and write your own program, you remember that in that old program you did things a certain way and you do it in a way like that. That's how these patterns get propagated without any book or without anything. You study a program, you work on a program and you learn things from it, and then you use the things you've learned the next time you go write something. When we write it down, then we can be more systematic about it.

It's not just that we can have some standard names, but actually it makes it easier to learn. If you decide you want to learn, you pick your pattern and someone says "Well, this guy did it in this program. Go off and read that program!", then you kind of figure out "Where is the pattern and what's not the pattern?" It's one example. But, if someone says "Here is a paper that describes the pattern", then, they will first of all tell you exactly what it is.

You'll probably have 3 examples, instead of just one example and so, if one of them is too hard for you, then maybe you'll figure it out from the other ones. There are advantages of writing it down, but how do you identify a pattern? I don't really know. You need experience. You look at lots of things and you say "I saw that before! Wasn't that just the same in that other program? Over here it's done!" But when you try to write it down, you have to do more than just say "Here is this thing I saw." You have to explain when do you use it.

That's what people want to know. What's the purpose of this pattern? Why would I use it? You have to figure out why the guy did it and why he didn't do it this other way instead. Under what conditions is this the right way? Once you get 10 patterns, then the question people always ask is "Which one should I use?" Now you have to explain that you could use either that one or this one: ask yourself these questions, and with these answers, that one is probably the best; with the other types of answers, this other one is probably the best. As time goes on, you can put more and more information about a pattern in the description of it, but there is no cut-and-dried method for doing it.

I really don't know about finding patterns other than through experience. With the design patterns, I had been doing object oriented programming exclusively for almost 10 years - it wasn't quite 10 years, maybe 8 years. Whereas for parallel programming, I started just a few years ago, and so I'm not nearly as much of an expert on this. I rely a lot more on finding experts and talking to them about what they do, and we've been studying lots of programs and reverse engineering them and trying to see what types of patterns are in them. Every so often we get somebody with a lot of experience in parallel programming to join in and that's always great, but the parallel programming world is pretty fragmented.

People work on particular types of applications, they work on particular types of hardware. Even someone with 10 years of experience might have a fairly limited view of the patterns. With multi-cores, nobody has 10 years of experience. They are all new! So, people are going to have a lot of parallel programming experience, but here is this new hardware. Even people at Intel have barely started programming them. How do we know that the patterns for these other machines are going to work well on these machines? We don't know. You don't know until you try them. We look and see that this has worked on several other kinds of machines, so odds are good it will work here. You try it out, you discover if it does or not. That's the best we can do.

That's what people tend to do. People do that with Design Patterns, too. You've got some software and it's too hard to change, so what are you going to do? We are writing it in C++, but it wasn't very object oriented code. How can we make it more object oriented? People read the book, look through it, they find some things that look reasonable, they try them out, and they are basically taking patterns and bolting them on. They don't really tend to get great results that way and people make jokes about it, but what else is someone supposed to do?

One of the things that I have learned, and the first time it was crystallized in my mind, was when I read a book - an English book. An English professor recommended it to me and it's called How to Read a Book. It's apparently a classic among English professors. It actually has chapters on reading a history book, reading a math book, reading different types of books, because the way you analyze them is a little bit different, but in the beginning there is a more general part on reading a book. They say "In general, there are 2 types of books. There are theory books and there are practical books. You can learn theory from a book.

You cannot learn anything practical from a book. You learn practical things by practicing." Practical books are great, you read them, they tell you some ideas, but they tell you ideas of things to practice and you have to go off and practice them. It's only after you practice that you actually know. Then you'll say "Oh, now I know what that guy meant when he said that!" Because when you were reading, it all seemed a little theoretical, but after you practice it, you know. That made me feel a lot better, because people buy this wonderful book we wrote and people read it and then they go off and do this stupid stuff that we didn't intend for them to do.

That's how you learn. You only learn that stuff by doing. It's like any program - you can't actually learn to program by reading a book. Of course you read the book, but you have to get to the system and you write little programs and you write bigger and bigger programs and after a while you know how to program in the language, but you've got to do it to learn. You are not really going to learn by just reading the book. Patterns are the same way. Patterns are extremely practical. You don't learn patterns by reading the pattern book.

Sometimes, when you read the patterns book, you say "Aha! I know that pattern already!" and other times you'll say "I think I need that pattern. That's what I ought to use!" and then you go off and you use it. Maybe you're lucky and that really was a pattern that you wanted to use and after a couple of attempts you get it and you're very happy. Odds are also reasonable that you will work on it for a week and say "No, that wasn't the pattern I should have used" and you are now wiser. You can't really learn it until after you do it. That's the way it was with design patterns and I'm sure that's the way it's going to be with the parallel programming patterns.

They are not magic, they are just what people do. It's the received wisdom of people who've gone before us. One thing is I'm getting older. When I was young I was the sort of person who listened to older people. Some young people don't, but I was a young person who did, but even so, they would tell me a lot of things and I would remember them. It wasn't like I disbelieved them, but it was just a little bit hard to believe and I wasn't sure. Now, I'm over 50 and a lot of stuff makes an awful lot more sense to me and I say "Those old guys were right when they were telling me things", but until you live it, until you go through it yourself, you really can't know it for sure.

When you get a pattern book, you need to be bold. You need to just try the stuff out and you need to expect you are going to make mistakes, too, so don't try it out when someone's life is on the line. You try it out in the privacy of your own home, where you are not going to damage anybody too much. I think that's actually a problem with industry - companies don't let people just experiment and play around.

Everything is "This is production code, we're going live next week and here is a book, read that and figure out how to put it in there." It doesn't matter what the book is on - whether it's going to be a database, or parallel programming, you have to do it 3 or 4 times until you really get comfortable with it and people are always putting this code out there that they just barely figured out how to make it run. That's not smart. Managers should give people more time to get it right before they push it out to the world. Some do, but probably yours doesn't, if yours is like most of them.

This guy seems pretty awesome. To experience a person through a book is different from seeing him in conversation. Considering how enjoyable both the book and this interview with him are, I deem him to be quite awesome. I really hope the Parallel Design Patterns book will capture as much reuse among the different categories of architectural concurrency methods as the original book captured for applying object orientation in practice. I for one am looking forward to seeing which other areas of concurrency this future book might encompass, beyond what I have practiced in my own category of concurrency, as he mentioned.

I liked the interview. I think Ralph Johnson is being down to earth, realistic about what he knows and what he doesn't. Multicore programming is very new. I just got Intel's Parallel Studio for Visual Studio last week to check out Intel's C++ Concurrent library. A good analogy is to think about "patterns" as "software karate moves" (en.wikipedia.org/wiki/Karate_kata). Ever tried to be a martial arts expert without sparring or being bruised in a real fight?

It's not about right or wrong moves, but some are more appropriate for certain situations. Like Ralph Johnson said, you get to know patterns by trying and practicing them. I look forward to the book!