A New Way To Think About Parallel Programming

Monthly Archives: May 2013

When we think of programming a computer, we automatically think of using a programming language, such as Java or C++ or (shudder) COBOL. And it’s not surprising – programming languages were considered great advancements when they were first rolled out. Unfortunately, our program development technology hasn’t kept pace with our computer technology. Our computers are millions of times faster than they were when the first “human readable” programming languages were developed, but our ability to develop programs isn’t much faster than it originally was.

A Short History of Programming Languages

Ada Lovelace is arguably the first computer programmer, working with Charles Babbage in the 1840s on his Analytical Engine. There is some discussion about how closely they worked together, but in her Notes to an article about the Analytical Engine, she describes the algorithm to calculate Bernoulli numbers. They also worked and corresponded on the “Table & Diagram” that represented the punched-card flow to be used for calculating Bernoulli numbers. The concept of punched cards was borrowed from the weaving industry, where Jacquard looms used chains of punched cards to control the patterns being woven faster and more reliably than human operators could.

Flash forward approximately 100 years, and ENIAC (Electronic Numerical Integrator and Computer) was the first general-purpose digital computer, whose initial primary goal was to calculate artillery firing tables for the United States Army’s Ballistic Research Laboratory. Data input to and output from the 30-ton beast was done using punched cards, while programming the machine originally meant physically setting switches and plugging patch cables to specify each machine-level operation it was to perform; e.g., move the contents of Accumulator A to Accumulator B, etc.

Because the numerical codes weren’t particularly meaningful to humans, assembly language was created, with mnemonic abbreviations that represented the desired activity, such as MOV, ADD, and JMP. Each assembly-language instruction was translated one-to-one into the numeric value of the corresponding machine-language instruction to be performed.

Grace Hopper is credited with developing the first compiler for a programming language and with conceptualizing machine-independent programming languages. Her contributions led to the development of COBOL (Common Business Oriented Language), one of the first modern programming languages. She is also credited with popularizing the term “debugging”. She was awarded the rank of Rear Admiral in the United States Navy for her many contributions to computer science.

It’s A Generation Thing

First-generation languages were machine code: machine-specific numeric instructions that frequently had to be entered as binary values using the switches on the front panel of the computer. Assembly languages, with their one-to-one mnemonics, are generally counted as the second generation.

COBOL, Fortran, and ALGOL are examples of what are now referred to as third-generation programming languages. They were hailed as breakthroughs because each “high-level” instruction could represent multiple machine-level instructions, enhancing the ability to develop more sophisticated computer programs. Third-generation languages also brought more structure to programming.

Later third-generation languages, such as BASIC, Pascal, and C, added refinements to make the languages more “programmer friendly.” Pascal was created to implement the concepts of “structured programming”, which tried to do away with evil “Go To” instructions and spaghetti code by forcing program modules to have a single entry point and a single exit point.

Fourth-generation languages were all the rage in discussions in the late ’80s and ’90s, but they failed to provide any significant improvements. More accurately, consensus was never reached on whether any of them provided enough improvement to be worth the trade-offs of changing over.

So “third-plus” generation languages, such as C++, Java, and Modula-3, were developed, implementing object-oriented programming (OOP). OOP was intended to encapsulate all of the functions and data associated with some conceptual chunk into one single object. By bolting object extensions onto the familiar third-generation languages, the dream was that programmers wouldn’t lose too much productivity while they learned to use the new objects.

No Silver Bullet

OOP has failed to significantly improve the productivity of developers or reduce the number of bugs in computer programs. Yes, the sophistication of programs was significantly increased, but so were the training prerequisites.

For example, consider the “hello world” program, traditionally the first program we wrote in the C language. It was just a few lines long, depending on how you placed the curly braces.

However, to write this simple hello world in a window necessitated using the standard Windows framework, which required calling a “create new window” function with roughly a dozen parameters. It’s a much more sophisticated interface, but also a lot more complicated, with more opportunities for mistakes.

The advances that we see now in program capabilities are based more on the availability of standard libraries and less on programming-language improvements. Our more sophisticated programs leverage the capabilities built into the libraries; our languages are essentially 30 years old and unlikely to change.

This series of blogs builds on the earlier posts about brain limitations and programming languages, specifically examining how programming languages fail to overcome human limitations. And until we address those human limitations, we won’t be able to improve developer productivity or reduce the number of bugs in programs. This series will also keep an eye on parallel programming and why current programming languages cannot be the foundation for the efficient development of multi-threaded and/or multi-core programs.

The point of the previous post is that the use of programming languages wasn’t a planned choice or an intentional selection of the best possible option; we just sort of grew into them.

The “hot” technology of the early 1800s was the punched cards used to control Jacquard looms in the textile mills. The sequential nature of weaving was similar enough to mechanically calculating numerical results that the cards made a good-enough solution. A pre-existing technology was adopted and modified instead of methodically analyzing the requirements and defining a best solution.

Similarly, the use of numeric computer instructions led to the adoption of human-language abbreviations to stand for the numeric operations to be performed by the computer programs. This worked well enough that pre-existing human language was adopted and modified to describe the sequence of events to be performed in computer programs.

Our use of human languages for programming languages was a huge improvement initially and has generally been successful since it was first introduced. One line of programming code could generate many machine instructions, significantly multiplying the volume that a developer could write in a single day.

But the earliest languages tended to be either too cryptic or too verbose. They also had the side-effect of encouraging developers to try larger and more ambitious projects. Reaching the practical limits of the earliest languages, new and more ambitious programming languages were developed to overcome these limitations.

And as our programs grew, our ambitions grew even larger, requiring new languages with newer features grafted on. Careful work has to be done when adding these new features so they don’t “step on” the existing functionality of the language. Ideas in one language are propagated into other languages until we now have a massive sea of roiling languages.

And yet, with all the combinations and permutations and possible approaches that have been tried with the hundreds of programming languages now available, we still have the problems that we had in the beginning:

most development projects take longer than was originally estimated. Some take so long that they end up being canceled

most programs spend as much time being tested and debugged as it took to originally code them

most programs require as much time for support (fixes, changes, updates, etc) during their lifetime of use as it took to originally code and debug the program

It seems likely that if changing the programming language was going to work, it would have worked by now.

Perhaps what we should have learned from programming languages is that sometimes a “good enough” solution isn’t the right long-term solution.

When developing programming languages based on human language, one aspect that isn’t discussed much is that human language is expressive but imprecise and ambiguous. Human language is fairly good at expressing the emotion or mood of a situation but it has trouble clearly and precisely describing actions.

For example, I think we’ve all had this conversation in a car. The passenger says, “Turn right at the next street.” And the driver says, “You mean this one? Or that one?” “The next one, not this one.”

We don’t notice it as much when we are talking to one another because we can reinforce our meaning with voice inflection, facial expressions and gestures and asking for clarifications. Unfortunately, we can’t do that with written statements.

How many on-line flame wars got started because an innocent thought was expressed in written language in a way that was misinterpreted by a reader of the statement? Many times after the flame war has died down, we realize that the combatants were both on the same side and agreed on everything but disagreed about the way it was said.

Developers are sometimes heard to remark that the first thing they do when they start on a project is to throw away the written specs because they’re worthless or too general. It is hoped that they’re joking, but there is a grain of truth in it. Most developers view their written specs with contempt or interpret them loosely because most of the written specs aren’t very good (imprecise, ambiguous or even just plain wrong). And yet every project of any significance or being developed for a customer has a written spec.

And the irony is that developers don’t view their own written documents (their programs) with as much contempt. The simple act of successfully compiling a program seems to wipe away all doubts that what they created could be obscuring just as many problems as the written spec they tossed away, and that their carefully crafted program could be just as inaccurate as the materials passed out at the project kickoff meeting.

Although engineers use emails and other written documents, when they want to precisely describe a component, an assembly, or a system, they don’t use language at all; they use engineering drawings or models. If a picture is worth 1,000 words, an engineering drawing is worth at least 5,000. Engineering drawings precisely describe things as large as the Golden Gate Bridge or the International Space Station, all the way down to features just a few microns across.

Not only do the drawings or models provide precise information, they do so unambiguously and in a language-independent way. Out on the factory floor, it’s common for people who speak completely different languages to come to an understanding by bringing out the engineering drawing, pointing and gesturing at the part on the drawing, and then pointing at the same spot on the part itself.

Can you imagine how long it would take Engineers to write out precise and unambiguous descriptions of components if they couldn’t use drawings? Not only would it take many many pages of text to describe even the simplest items, it would be very slow and error-prone, meaning a full time staff of proof-readers and editors would be required to support any engineering effort.

And even simple changes to the item would require tedious, time-consuming changes to the text describing it. Worse, the change process could easily miss correcting some text affected by the change, leading to contradictory information in the item description. It’s pretty easy to see why text descriptions aren’t used and engineering drawings are used instead.

Another interesting advantage of engineering drawings is that our eyes can jump almost instantly to the item of interest, like random-access memory. Written documentation is read serially, from beginning to end, just like the old tape drives. The information is there; it just takes a long time to get from the beginning of the document to the desired location.

Finally, humans are built for visual processing. A significant portion of the human brain (the occipital lobe) is dedicated to processing visual images. One of the arguments for using visual passwords is that we can pick out the right images on screen months later, even if we haven’t seen them or used them during that time. Even images that are shown to us for just fractions of a second can be identified as having been previously seen hours or days later. We wouldn’t be able to recall the image of the cat on the bicycle, but when it was shown to us later, we’d recognize it.

I’m not advocating that program development should be done with engineering drawings (at least not in this blog), but I’m pointing out how visual methods have successfully been used by other development groups for a long time to create extremely complicated systems. Maybe it’s time for computer program development to take advantage of some of the useful tools available to other developers.

It sounds silly to say this, but computer programs are full of words. Thousands of them, hundreds of thousands of lines of them, maybe millions of them. And out of that myriad of words, we have to find the ones associated with our intended work (a bug-fix or a behavior change). Out of all those words, the developer or maintainer must figure out what all the relevant pieces are and how they fit together.

The task is similar to that proposed by my literature professor who once held up a dictionary and remarked, “The greatest novels ever written are contained in this book; you just have to find the right words and put them in the right order.”

Every developer agrees that programs should be well-structured, but what does that really mean? Does it mean that every method or structure should have one entry-point and one exit-point? Does it mean that every object is normalized and factored to the nth degree? Does it mean that every object or function or method is part of a library to maximize reuse and utilization?

One thing that well-structured programs don’t contain is a Table of Contents. The word or object that you’re looking for could be anywhere in the thousands of lines of code, and there are no hints to where in the body of code it might be. Instead, we have to find the “main” module (or starting point), see what it calls, then see what those modules call, and keep recursing through the modules until we assemble the whole program in our heads and understand enough of the “program logic” to figure out where the word or object should be.

A typical technical repair manual will have, for example, chapters on different components of the system with separate sections for maintenance, repair, replacement and disposal that will all be listed in the Table of Contents. A user with a minimum of training can turn directly to the appropriate pages of the book and find the information that they need.

If our programs really were well-structured, they might have the equivalent of a Table of Contents that would define the basic structure and major operations of the program. Instead of working on files, the developer would expand the “chapters” in the Table of Contents until they reached the right section and then jump right to the code that they needed.

Consider what it would be like to read “War and Peace” in digital form, but instead of it being displayed in the published sequence, it was listed paragraph by paragraph in the order that Tolstoy wrote it. Or alphabetical order by the first word in the paragraph.

Would we ever be able to finish reading the whole book? Would any two readers ever read the same story? Following the paragraph on page 132, one reader might assume that the next paragraph was on page 417, while another might think it was on page 738. Without the correct sequence, War and Peace would be meaningless. And yet we have no similar sequencing requirements in computer programs, except that the modules and objects have to be compiled in order of their dependencies.

And what about comments? Most developers believe that programs should be “well documented,” but there is no agreement on what well-documented means. Should every object have a comment describing how the object is used and what it does? Should every method in the object have a comment that describes what that method does and how it should be used? Should every block of code in a method have a comment that describes what that block does? Or should developers just add comments on the hard parts or the confusing parts? Except which part is hard or confusing differs depending on the developer and their level of experience.

And comments visually interrupt the flow of the code. A nice big helpful comment smack dab in the middle of a block of code can interrupt the thinking of the developer as they scroll down past the comment. Not to mention that comments are time or repetition sensitive; the first few times a comment is viewed it can be helpful but then it just gets in the way. It’s too bad there isn’t some way to hide or shrink a comment based on how many times the section of code has been recently viewed.

Not to mention the time it takes to write good comments and the time it takes to maintain the comments. And what about comments that aren’t updated when the code is updated? Pity the developer who reads the out-of-date comments about a section of code. Do they trust the comments and skip over the code? Or do they always distrust the comments and only trust the code? If so, does having comments provide any help to other subsequent developers?

The point of all this is that we have been overrun by the sheer volume of words needed to write programs. More words is not necessarily better. Sometimes having other tools or techniques (for example, Engineering Drawings mentioned in the last blog) can reduce the sheer volume of words AND improve our understanding of the structure and functionality of an object or a program.

For some reason, language designers seem to love to reuse words, punctuation, and operators. For example, left and right parentheses are used to specify order of operations and grouping in mathematical statements, as well as to mark the parameter list of a function and to identify the desired resulting type of a type-cast operation. In some languages, they are also used to select an element of an array.

The asterisk (“*”) is used in mathematical statements to specify multiplication but is also used in C-style languages to declare and dereference pointers to a value or a function. The plus sign is used in math statements to specify addition but is also used to specify concatenation of non-numeric values, to specify an increment (“++”) operation, and to specify a value to add to a variable in a “+=” statement. And finally, in the Objective-C language, a plus sign at the beginning of a method declaration specifies that the method is a class method (instead of the “-” that specifies an instance method).

In most of the programming languages in common use these days, Foo and foo represent different objects. But foo and foo() also represent different things. In some languages, even two identical spellings of Foo can refer to different objects, depending on which header file is imported into the current source file.

In some languages, redundant use of words seems to have become a sought-after goal, so you’ll see statements like Manager:manager, Goal:goal, Result:result. While it does prevent polluting the namespace with additional names, reading some of these statements out loud makes one start to sound like a raving lunatic.

The net result of all these language-specific naming rules is that developers have to correctly interpret the usage of a word to correctly understand what the statement is doing. To paraphrase Shakespeare, “A rose by the exact same name could smell completely different.” All of these naming rules require interpretation which leads to slowed comprehension and unnecessary mistakes.

Special Uses

Every programming language has a list of reserved words that can only be used in specific ways. For example, the word “for” in C-based languages can only be used as part of a looping statement, such as “for (i=0; i<10; i++)”. Accidental misuse of one of the reserved words normally just causes a compiler error message, but it can also prevent one from writing natural-sounding code statements.

Every programming language has rules for properly naming variables. Most require that the variable name begin with an alphabetic character, but some will allow you to begin a variable name with one or more underscores. Some will allow only alphabetic characters in the name, most will allow alpha and numeric characters in the name, some will allow hyphens and underscores in addition to alphanumeric characters but most will not allow other punctuation in variable names, such as “@” or “%” or “(“. The list of rules goes on and on.

In C-based languages, the increment and decrement operations can be performed either before the variable is evaluated or after it is evaluated. Pre- or post- is specified by placing the increment or decrement operator either to the left or to the right of the variable name.

Complexity Is the Enemy of Reliability

Reliability Engineers spend a lot of time thinking about how to reduce the complexity in the system that they are working on. Reducing the number of parts in a system is a common way to improve reliability.

Simplifying the code in a computer program is a great way to improve its reliability, because it becomes more obvious where problems may exist. Simplifying code can also make it easier to understand what is supposed to happen in the code.

The problem with programming languages is that they can never be made absolutely simple because there are too many special cases, too many special rules, too many uses and reuses of words and punctuation and operators, to be able to write simple uncomplicated code. Instead, the rules and special uses keep piling on like interest on a credit card, making it more and more “expensive” to understand and modify a piece of code. No wonder our limited brains get bogged down when trying to translate a complicated (and sometimes not fully understood) algorithm into code.

I know that many are thinking that my criticisms just require that developers be properly trained, but that’s not really a satisfactory answer. In aircraft system design and user interface design, training is the last choice. Their goal is to make something as simple and intuitive as possible, such as showing pilots an outline of a plane and the line of the horizon to show if the plane is banking instead of giving them a positive or negative number to indicate the degree of bank. Sure, they could be trained to interpret the numbers correctly, but the limitations of the human brain guarantee that now and then they’ll miss the minus sign and bank the plane the wrong direction, sometimes with catastrophic results.

It is unlikely that programming languages will ever be able to provide a simple and intuitive way of developing computer programs, where all the developer has to think about is correctly implementing the algorithm. Instead, programming languages will likely doom us to counting parentheses, measuring indents, and trying to correctly interpret punctuation.

Did you ever find yourself nested about seven levels deep in a massive conditional and suddenly realize that you’ve been counting left-parens or right curly-braces for the past 10 minutes? And more importantly, what do all of these punctuation marks have to do with the problem that you’re trying to solve? And even more importantly, why can’t the computer figure this out!!?

I know, I know, these punctuation marks exist to explicitly describe the statements that you want the compiler to compile, blah, blah, blah. But what is the intrinsic meaning of a period? What about three periods in a row? Do three of them convey any more information than each one on its own? Their only meaning is in the context of other parts of the program being developed, such as objectDOTlocalVariable or the end-of-line or end-of-program marker. Same for semicolons; in most C-derived languages the semicolon exists simply to tell the compiler where one statement ends and the next begins. Can’t the compiler figure this out?

And yet, for all their seeming insignificance, an incorrectly placed period or semicolon can bring a program to its figurative knees, potentially preventing the release of a product, costing thousands (or even millions) of dollars, and damaging reputation and future profits. The same is true for all of the rest of the punctuation used in every programming language.

On top of their potential for causing really expensive failures, all this punctuation diverts our attention from the task that we’re trying to describe onto mundane counting tasks. As discussed earlier in the blogs about the human brain and the limits of our attention, every time we stop to deal with punctuation, we risk losing our train of thought. It is difficult enough to write code that stays on target; every time we manage the trivialities of punctuation, our brains have to dump the current coding details chunk so we can load it with the chunk that deals with punctuation. What a waste of time and mental effort.

Honestly, with all the time that you’ve spent chasing down brackets and braces and punctuation errors, don’t you think you should be able to list working as an Editor on your resume?

Using Human Languages For Development Has Failed

When we started down the path of using human languages as programming language, we inherited both the strengths and the weaknesses of human languages, meaning that we got both the potential for expressiveness as well as all the goofy punctuation marks. The choice to model programming languages after human languages served us pretty well for the first 30 years or so, but it has become increasingly problematic for the last 20 years. I believe the next big program development methodology will not be a new language but a new way of expressing or describing the desired executable program.

Up until this point, I have emphasized the failings of programming languages from two perspectives:

Physical limitations of human brain to use programming languages

Limitations of programming languages

But as we inch toward multi-processor/multi-core systems, we need to accept how inadequate programming languages are for expressing parallel solutions. On top of all of the single-threaded limitations already described, programming languages have no intrinsic way of describing parallel activities. Yeah, we have added parallelism band-aids, like locks, mutexes, and synchronization primitives, but they only exist because language cannot precisely express parallel activities.

I don’t mean just programming languages; I include all human languages in this statement. If you look at the long history of human story-telling, there has never been an example of parallel activities being expressed concurrently. Instead, the story-teller must use the technique made so famous in the Lord of the Rings (J.R.R. Tolkien); we get a chapter or two about the adventures of Frodo and Sam as they make their way to Mordor, then a couple of chapters about the simultaneous adventures of Merry and Pippin, then a few chapters of the concurrent struggles of Aragorn, Gimli, and Legolas as they try to rescue Merry and Pippin. Finally the storyline brings them all back together and the timeline of the narrative is once again synchronized.

It should be no surprise that languages are not parallel-capable; our logical minds (the cerebral cortex) are not parallel-capable. Our brains are exquisitely parallel, with multiple processors managing our vision, hearing, respiration, and heartbeat, all without supervision from our logical mind. But there was no evolutionary advantage to being able to focus our attention on more than one threat or prey at a time. Even if we were able to track multiple tigers chasing us, it wouldn’t give us the ability to outrun any of them. Even if we could watch two or three rabbits running from us, we couldn’t aim two bows at the same time; we only have two hands. But being able to single-mindedly focus on making a snare allowed us to catch several rabbits.

No Programming Equivalent of Split-Screen TV

Think about it – in the past 6,000 years or so, we have never been able to use language to express parallel activities the way that a split-screen TV can. We have had some brilliant and talented people who have told stories that contained parallel activities but they never found a way of describing concurrent activities simultaneously; they always had to use interrupted sequential descriptions. Even if they’d found a way, it is doubtful that their readers would have been able to understand it.

If we want to see reductions in parallel-programming development times, we need to find the programming equivalent of split-screen TV to precisely describe the concurrent activities of the parallel program. I am not suggesting that split-screen TV is the answer; most of us would probably just watch multiple sports events. But the model of the split-screen TV, of multiple independent sequences of tasks, is a valuable one, a model that cannot be duplicated using languages, either human or programming.

We need a new way of developing parallel programs, one that uses a model that can inherently represent parallel activity, as well as makes maximum use of the strengths of the human brain and minimizes its limitations.

Using traditional language/programming languages to represent parallel activities is doomed to failure. If you disagree, go argue with the story tellers from the past 6,000 years.

The preceding eight blogs addressed some of the issues that the use of programming languages forces us to overcome when developing computer programs. In general, we got married early to a methodology that seemed enough like us that it would be a fruitful union, and certainly it has borne us many offspring. Despite any complications, delays, and differences, we have remained faithful ever since the beginning. But perhaps we have irreconcilable differences and it is time for new relationships (it’s not you – it’s me).

Some of the information presented has been more historical than analytical, but it seemed important that we consider how computer programming got started and how that beginning set us down the path of using human-like language to describe computer programs.

In thinking about the next generation of program development methodologies, I’m reminded of that old joke about the Irish farmer who was asked for directions to Dublin. “If I were you,” the farmer replied, “I wouldn’t start from here. I’d go over to Kilkenny and I’d start from there.”

In other words, if we really want to make dramatic improvements in program development efficiency, we have to break with what was done in the past and pick a completely different starting point.

I’m also reminded of one of Dr. Phil’s famous lines, “If you keep doing what you’ve been doing, you’re going to keep getting what you’ve been getting.”

Programming languages that resemble human-language have taken us a long way, but we’re not seeing any significant improvements in the quality of programs that are delivered, nor are we seeing any significant reductions in development times. It is time for something new.