A New Way To Think About Parallel Programming

Monthly Archives: August 2013

The top-of-the-line databases are all executing highly optimized code designed to be run on the highest-performing multiple CPUs/multi-core systems. The basic architecture of database software is all similar to operating systems, with protected sections of code that handle multi-core functionality, with lots of consideration and thoughts about how to produce the desired results, data caching, data locality, preventing data staleness, and so on.

So how could any parallel database system compete with the perfection of the top-of-the-line databases? By thinking outside the box and embracing the coming kilo-core future.

In kilo-core systems, it makes sense that each of those kilo-cores would have some local RAM or extended cache. This core-specific memory would prevent huge system delays caused by all of those kilo-cores trying to access the same shared system RAM.

So imagine each core in a 1,000 core system loaded some portion of a database into its local RAM. If each core had 1 MB available local RAM, a 1GB database could all be loaded in high-speed memory at all times, providing a significant speed improvement compared to reading data off a hard disk.

Additional performance improvements would come if each core was responsible for indexing only the data held in its memory and not some shared index. The size of the indexes that each core would have to search would be significantly smaller, producing much faster searches.

Now consider that a 1GB database broken into 1,000 pieces yields data volumes on each core that are small enough that all of the data could be indexed. Imagine retrieving data from any field at indexed speeds. Suddenly all of those off-index queries, such as searching for a name in a comment field, would be completed just as quickly as indexed queries.

Now consider what happens when submitting a SQL statement in a Avian parallel database; the SQL-interpreter bird sings (broadcasts) the SQL request to all 1,000 cores which causes all 1,000 cores to start searching their local RAM for the data that matches the SQL statement. Instead of multiple threads competing for limited system resources, all of the resources are searching simultaneously for the requested data. Instead of preprocessors optimizing requests and scheduling disk reads for optimum results, the data are just searched simultaneously and the matching values are returned.

And perhaps most importantly, the data held in the cores wouldn’t have to be homogeneous. Instead of requiring all “Name” data to be in a Name table that has one exact structure that is defined in the database, the names could come from any name resource, including database tables, XML files, JSON lists, etc. Each core would only have to know how to search it’s own data. All the problems with field sizes and data consistency would all go away.

Now imagine the capabilities of a Mega-core system running a parallel database. Even with modest local RAM, we’re talking a terabyte database that could return results on any field in relatively short times. The mind boggles at the potential.

The ideas for an Avian Parallel Database presented here illustrate how using massively parallel systems should produce significantly improved performance compared to traditional databases whenever more than 1 piece of data is required. Retrieving the name of the customer with ID = 12345 would not be faster, but retrieving the names of all customers who meet certain criteria should be significantly faster because thousands (or eventually millions) of data sources could be searched simultaneously.

One interesting potential of Avian Computing is its loosely decoupled nature. On occasion, I’ve compared an Avian program to a flock of birds landing in a field and looking for food and when they’ve found all their food (the program ends), they all fly away. But do all of them have to fly away?

What if the really basic birds didn’t fly away? What if the birds that handled keystrokes and serial ports and network communications and all the really low-level stuff remained in place? What if all of the birds that handled all the basic system behaviors remained active in memory? Suddenly, the single-threaded and highly protected kernel would disappear and all of its capabilities/functionality could be handled in parallel.

If you imagine a system with 10,000 cores, how would you process everything thru a single-threaded kernel? Or thru a tightly controlled multi-threaded kernel? Certainly they couldn’t all stop and wait for the kernel processes to complete before they continue. If we are going to derive any benefits from moving to kilo-core systems, we will need different operating system architectures to get those benefits.

What if the birds (cores) that handled system events all worked thru a standardized process, like the TupleTree(s)? All of these activities could work simply and asynchronously, without a lot of planning or coordination. It would provide a structure that enables highly parallel processing of operating system activities.

And if the “operating system” birds were auto-adjusting, like they are in Avian Computing, whenever a particular OS service needed additional processing power, the bird(s) responsible for that OS service could clone itself as many times as required to meet the need or hatch the appropriate kinds of birds.

Instead of compiling a single kernel that should handle most situations, a flock of OS birds would “fly down and land” into the computing hardware and grown/shrink to match the usage encountered for that individual system. Adding new features and capabilities to a system would be a relatively simple process of releasing a new bird with no need to install new drivers and rebooting or recompiling the kernel and then rebooting.

User-level software would have no need to know the details of how keystrokes or ethernet inputs were gathered; they would simply attempt to get the info they need from the TupleTree(s). The providing layers would look in the TupleTree(s) for the food that they need, and when they eat it, they’d store it back in the TupleTree(s) for the user-level software. The providing layers would eat food that was put in the TupleTree(s) by the operating system birds. It would be consistent from the lowest levels to the highest levels.

Security measures would need to be developed so individual systems aren’t compromised and infected with viruses and unwanted processes, but these could be developed in advance of releasing an Avian Operating System. And any updates to the kernel could be handled dynamically so there would never be a need to reboot.

One of the reasons why I am personally optimistic about Avian Computing is because it’s potential extends far beyond what can be realized in a single parallel program. Avian Computing extends parallel computing so it could include operating system functionality, so the same ideas and concepts would describe not just individual programs but all capabilities within a system. An Avian Computing Operating System would effectively be a “fractal system”, where the same behavior at the lowest level would be repeated at each higher level. It would be consistent from top to bottom, completely parallel and completely asynchronous.

It is chaos. It is unstructured and unpredictable. It behaves like real life. What’s not to love.

That’s enough. What does it look like? What shape and color is it? What does it sound like? How does it feel – is it smooth or rough? How does it taste – is it sweet or sour? What does it smell like?

Isn’t it funny that when we try to visualize a parallel program (or any kind of program), we get sort of a big blank, or maybe a flow-chart-ish kind of image. The five senses that we normally use to interact with the world around us are completely unavailable to us when we try to think about (parallel) programs. Programs are implemented in logic without any direct connections to our everyday world (unless you consider textual representations of auditory sensations a direct connection).

As developers, we have worked ourselves into a corner where the only tool we have available to us is “logic”, meaning that we have abandoned using all of our five natural human senses. Effectively, we have decided to sculpt the statue of David using only 2 fingers. Yes, logic may be the capstone and crowning achievement of human intellect, but that doesn’t mean that pure logic is necessarily the best tool to use for every task. Try using your cutting wit or incisive logic on a chunk of marble.

When I try to visualize a program (parallel or otherwise), I either end up with something that looks like a block diagram or something that looks like a spiderweb with silken strands stretching between vague amorphous blobs of functionality. And the only smell I get is that rotten smell of bad code. If I try to zoom in on my mental image, I generally get an image of text, of the code that actually implements the functionality of the program. And once I start to visualize code, I lose the rest of the program image. Logic is single-threaded and can only focus on one thing at a time. If you’re not convinced, check the rabbit/duck or candlestick/faces images in this blog.

Now visualize the Mona Lisa painting. If you mentally zoom in on the winding river or the tree-lined horizon, it’s not hard at all to back up and visualize the whole painting, to see the enigmatic half-smile of the lady. If you focus on her eyebrows, it’s not hard to zoom out and see the complete image and her quiet smile.

Now let’s imagine we’re playing Pictionary and you get the card to draw UDP – how do you draw Service Ports or Packet Structure or Checksum Calculation? And then your opponent gets the card to draw a kitten playing with a ball of yarn. Who has the easier task?

One of the goals of the Avian Computing Project is to make it easier to visualize what is happening inside parallel programs by modeling them as things that we can easily visualize. Things that we can visualize or in other ways imagine are easier to think about and talk about.

In Avian Computing, we imagine flocks of birds flying around together but think about each bird independently and asynchronously performing separate actions. Instead of thinking about mutexes and lines of code and locks and blocks, Avian Computing allows us to think and visualize the work we want each individual thread to perform. Since all of the birds automatically behave the same way and follow the same life cycle, developers can focus on the unique actions that each bird needs to perform.

For example, in an ETL (Extract – Translate – Load) program, some birds would ONLY know how to extract data from the DB, others would ONLY know how to transform the data, and others would ONLY know how to load the transformed data into the new DB. Allocating the ETL logic between three birds makes it easier to visualize, easier to code, and easier to debug.

Remember those old hierarchical databases based on 80-column records? Remember the convoluted logic we’d need to use to get the header record and then combine that info with each of the detail records so we could create complete records that could be written into one or more relational tables? Hard enough in single threaded logic, but trying to code that into a standard parallel program can be a daunting task.

But in bird logic, one type of bird knows how to get one header and all of its detail records. When it has that, it puts that info into the tree as Food A. Another type of bird only knows to eat Food A, combine the header info with the detail info, and write each completed record into the tree as Food B. Another type of bird only knows how to eat Food B and convert the info into Foods C & D and put them back into the tree. Another bird eats only Food C and puts them into the Cust_Name table. Another bird eats only Food D and puts them into Cust_Credit table.

Using only this common vocabulary of activities and events (Avian lifecycle), it becomes fairly simple to visualize and describe the major functions of a parallel program. We can mostly think about what we want the birds to eat and what we want them to do while they are digesting (processing) their food; we no longer have to manage structural requirements, such as locking or synchronizing or variable scope, etc.

Instead of thinking about code and how we’re going to prevent data corruption and deadlocks, we can visualize what we want individual birds to actually do on one piece of food, one individual object that each individual bird has absolute control over. And when they’re done with their piece of food, they put it back into the TupleTree for a different bird to eat, digest, and process.

Sure, there’s lots of other stuff going on, but the only locking in the Avian environment happens when food is eaten (removed from the TupleTree) or when food is stored (inserted into the tree). And all of that locking is invisible to the developer, so the only thing they need to worry about is what should be done to each chunk of food.

You could call it the 5-year old test. If you can explain what your program does to a 5-year old and have them understand it, then you’ll be able visualize it with sufficient clarity that the actual implementation of the program in code should be relatively straightforward. And certainly quicker to get started coding, quicker to get functional, and quicker to get into testing and production.

Programming languages may be good at implementing a computer program, but they are a lousy way to think about the intended program. Instead of thinking about what we want the program to do, we end up thinking about the features that are available in the selected programming language. And then we develop the program structure based on the features available in that language instead of what we want the program to do.

For example, if we want to build a program that will find the square root of the volume of Kim Kardashian’s butt (don’t ask why – just go with it), we first have to decide if we’re building a command line program or a GUI program. And that basic decision shapes our thinking about how we will build the program and the language that we use to implement the program. Very little of the code from a command line version of the program could be used in a GUI version of the program.

And then when we talk about how we are calculating that value, it is almost impossible to explain what we are doing without describing the code we are using to implement the calculations. What we are doing becomes all mixed up with how we are doing it. For example, if our chosen language includes a “calcButtCheekVolume” function then our code will be relatively simple; without that function, we have to calculate the volume ourselves based on butt cheek curvature and cheek width, etc, making it much harder to code. All because of one single function. It would tempt one to use an inappropriate or obsolete language just to gain access to that one function.

The Mac programming world has jumped onto the Model-View-Controller (MVC) paradigm, where the Model and the View do not directly interact but instead process all updates thru the Controller. This separation allows the programs to cleanly separate display requirements from implementation requirements, allowing developers to modify the back end processes without affecting the front end (display).

So back to Kim Kardashian’s butt, using MVC, we would find a way to Model the volume of a butt cheek and then add a way to display (View) the volume (graph, chart, photo-morphing, etc), and then the Controller code would make adjustments to the model, depending on if she is standing or sitting, gaining weight or losing weight, etc.

Unfortunately, we programmers are still functioning at a primitive level, where the program structure is still based on the language that was selected to implement the program. We have no way to deeply describe the functionality of a program without also including implementation details such as the language, the library, and how the user will interact with the program.

Programming languages need the equivalent of MVC so it would be possible to cleanly separate what we want a program to do from how we want it to do it and implement it. For example, the Model would be the data that the program uses and the View would be the user interface of the generated program, and the Controller would respond to changes in the Model or manipulate the Model as required, updating the View as required.

Developers would then be able to calculate the square root of the volume of Kim’s butt without worrying about which language (and library) was being used to implement the solution or how that solution would be displayed to the user (CLI vs GUI). Which means Fortran could be used to calculate the volume and c could be used to calculate the changes in the shape if she is jumping and Java could be used to calculate the shape if she is sitting, etc. The compiler(s) and MVC-like Programmer’s Interface would manage the code and generate the desired output of the eventual program without requiring language and structural concessions on the part of developers.

Blindly adopting the MVC model is not the answer because programming languages solve many problems beyond the MVC model. The point here is that our thinking about developing programming solutions has remained mired in a 1950’s mentality, where programs are developed in a single muddy interconnected language-feature-functionality-library lump, where changing any single component dramatically affects the implementation of the whole program.

We need to develop a programming vocabulary that can describe what a program will do that is independent of the programming language chosen if we are to ever hope to become more efficient and faster at developing computer programs. And if we can’t get better at developing relatively simple single-threaded solutions, we’ll have almost zero chance at getting better and faster at developing the comparatively more difficult parallel programs.

One of the baseline assumptions of Avian Computing Project is that the number of cores and processors available for a programming task will increase rapidly. Here’s an article published by ZDNet’s Nick Heath that supports that assumption.

University researchers in the United Kingdom are working on solutions to the growing problem of power consumption as mainstream processors are expected to contain hundreds of cores in the near future. <<emphasis added>> Power consumption outpaces performance gains when additional cores are added to processors so that, for example, a 16-core processor in an average smartphone would cut the maximum battery life to three hours. In addition to mobile devices, data centers crammed with server clusters face mounting energy demands due to the rising number of cloud services.

Left unchecked, the power consumption issue within three processor generations will require central-processing unit (CPU) designs that use as little as 50 percent of their circuitry at one time, to restrict energy use and waste heat that would ruin the chip.

The University of Southampton is part of a group of universities and companies joining in the Power-efficient, Reliable, Many-core Embedded systems (PRiME) project to explore ways that processors, operating systems, and applications could be redesigned to enable CPUs to intelligently pair power consumption with specific applications. PRiME is studying a dynamic power management model in which processors work with the operating system kernel to shut down parts of cores or modify the CPU’s clock speed and voltage based on exact application needs.