Cameron and Tracey Hughes

Dr. Dobb's Bloggers

Go Ahead, Drink the Koolaid

March 24, 2009

It really all depends on what we are referring to when we ask the software agent to solve the riddle: which tastes better, coke or pepsi? In the typical user-web-browser-search-engine interaction, the onus for all of the parallel processing is on the user.


Let's say the user poses the coke or pepsi question to the typical search engine. Caution: oversimplified explanation follows! The search engine will do a little question classification. It will then consult an index of entries that are tied to known web URLs, identify some candidates, and rank those candidates with a few mysterious probability formulas. Once the candidates are ranked, the search engine returns to the user a list of web pages or URLs that have some probability of being relevant to the string of tokens the user typed.
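For readers who want something concrete, here is a minimal Python sketch of the index-and-rank step just described. The corpus, the example.com URLs, and the scoring rule (count how many query tokens a page contains) are all assumptions for illustration; real engines use far richer, and largely undisclosed, ranking formulas, and this sketch skips question classification entirely.

```python
from collections import Counter, defaultdict

# Hypothetical toy corpus standing in for crawled pages; the URLs are made up.
PAGES = {
    "http://example.com/taste-test": "coke or pepsi blind taste test which tastes better",
    "http://example.com/coal": "coke is a fuel made from coal",
    "http://example.com/recipes": "pepsi glazed chicken recipe",
}

def tokenize(text):
    """Lowercase, split on whitespace, and strip simple punctuation."""
    return [t.strip("?.,!").lower() for t in text.split() if t.strip("?.,!")]

def build_index(pages):
    """Inverted index: map each token to the set of URLs that contain it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for token in tokenize(text):
            index[token].add(url)
    return index

def search(query, index):
    """Rank URLs by how many distinct query tokens they contain
    (an assumed stand-in for the engine's real ranking formulas)."""
    scores = Counter()
    for token in set(tokenize(query)):
        for url in index.get(token, ()):
            scores[url] += 1
    # Highest score first; ties broken alphabetically so results are stable.
    return sorted(scores, key=lambda u: (-scores[u], u))

index = build_index(PAGES)
print(search("Which tastes better coke or pepsi?", index))
```

Note that the coal page still makes the list: the engine matched the token "coke" without knowing which coke was meant, and sorting that out is left to the user.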

Are there opportunities for parallelism and multicore computers in this user-web-browser-search-engine process? Of course there are! The Cloud is becoming more parallel by the day. But the real work rests with the user. The typical search engine does not understand the meaning of the user's question. To the typical search engine, the user's query is a string of tokens consisting of (hopefully) important keywords. Perhaps the search engine has seen the pattern of keywords enough times to accurately guess what the user wants, but in general it does not understand the semantics or pragmatics of the user's query. Among the dreams of the Semantic Web crowd, though, is a cure for this particular malady.

Ah, but alas, this aforementioned keyword-search-probability-matching engine is simply not the ghost we're chasing. We would like to shift the onus from the user to the software agents, or at least make it a 50/50 proposition. We want our software agents to understand the semantics, pragmatics, and even relevance of the user's query. We want our software agents to actually understand what the user is asking. According to the jury, this takes enormous computing power. According to some jurors, we will never have enough computing power to implement such software agents. Bah, humbug! We once returned from a snipe hunt with snipe in hand, to the absolute shock of our interlocutors. Tracey and I happen to believe that, among other things, parallel programming techniques are at the heart of getting software agents to rationally and efficiently approach such questions. But herein lies the rub:

First, to the uninitiated there is nothing apparently parallel about the question. It lacks that quality of being embarrassingly parallel. On the surface there is no immediate connection between this question and parallelism, or multicore computers for that matter (appearances can be deceiving).

Second, there aren't any obligatory data structures or algorithms associated with the question. Outside of being a string of words or tokens, it doesn't suggest anything in the way of data structures, and certainly nothing reminiscent of parallel architecture hints or clues.

Third, it's not clear whether software agents can solve or answer such a question in the first place (just ask the jurors).

This all takes us to one of the ultimate challenges of parallel programming, especially when we have access to massive parallelism: how do we move from the original statement of a problem to some solution model that involves parallelism? Sure, if we're on a well-traveled path like client-server, boss-worker, or peer-to-peer, it can be rather obvious how to get from said problem to said solution. But what if we're in uncharted waters? What if we need the computer to really understand rather than pattern match? Further, what if we told you that understanding requires parallelism? The regularly traveled roads of parallelism tend to be mute when it comes to computer understanding. Only the brave dare take this journey, or at the very least only those with nothing left to lose! But we digress.

Back to the problem I would like my software agents to solve. In order for the agents to satisfactorily answer my question, there are at least five high-level tasks they will have to complete. What we've found so far is that these tasks devour as much parallelism as we can dish out. Ahem... These tasks have an almost eerie connection to the notion of AI-complete problems. We'll introduce these tasks here, but we'll have to revisit them many times as we connect search, problem solving, parallelism, multicore computers, and AI-completeness. But for now, let's present the basic outline of what our agents' problem-solving pattern looks like:

begin
fowim() // figure out what I mean
foiip() // figure out if it's possible
foitc() // figure out if they can
dapss() // deploy appropriate problem solving strategies
paaos() // present acceptable answers or solutions
end
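One way to read the outline above is as a pipeline of five stages. The Python sketch below wires them together with placeholder bodies; every return value is a hypothetical stand-in, the `Interpretation` type and `available_cores` parameter are invented for illustration, and the early exit when foiip() or foitc() fails is our assumption about how the stages connect, not something the outline specifies.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Interpretation:
    """Hypothetical result of fowim(): what the agent thinks was meant."""
    question: str
    meaning: str

def fowim(question: str) -> Interpretation:
    # Figure out what I mean. Placeholder: a real agent would run the
    # layers of linguistic analysis here.
    return Interpretation(question, meaning="compare the taste of two colas")

def foiip(interp: Interpretation) -> bool:
    # Figure out if it's possible. Placeholder: always assume it is.
    return True

def foitc(interp: Interpretation, available_cores: int) -> bool:
    # Figure out if they can, i.e. whether the agents have the resources.
    # Placeholder rule (an assumption): require at least one core.
    return available_cores >= 1

def dapss(interp: Interpretation) -> List[str]:
    # Deploy appropriate problem solving strategies. Placeholder.
    return [f"candidate answer for: {interp.meaning}"]

def paaos(candidates: List[str]) -> Optional[str]:
    # Present acceptable answers or solutions. Placeholder: first candidate.
    return candidates[0] if candidates else None

def solve(question: str, available_cores: int) -> Optional[str]:
    """Run the five stages in order, stopping early (an assumed policy)
    if the problem looks impossible or beyond the agents' resources."""
    interp = fowim(question)
    if not foiip(interp) or not foitc(interp, available_cores):
        return None
    return paaos(dapss(interp))

print(solve("Which tastes better coke or pepsi?", available_cores=4))
```

Each placeholder marks where the hard, and we argue parallel, work would go.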

It's getting past my bedtime and I can't explain all of this now, but I do keep track of IOUs.

My original question about coke or pepsi does not indicate what coke or pepsi is. The traditional search engine approach would rely on the user to pick which web pages or URLs had the appropriate coke or pepsi reference. In our case, the software agent has the task of figuring out what I mean by coke or pepsi. But before it could figure out what I meant by coke or pepsi, it would have to figure out the meaning of the string of words in the question. What does 'which' mean? What does 'tastes' mean? And so on. In fact, in the parlance of NLP (Natural Language Processing), there are several layers of analysis that at least one of our software agents would have to perform:

Lucky for us, some of the embarrassingly parallel attributes of our problem show up in these layers of linguistic analysis! Yep, our software agents have to actually try to figure out what I mean (fowim) by this string of tokens.
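To make the "embarrassingly parallel" claim concrete at the word level: looking up the candidate senses of each token depends only on that token, so the lookups can be farmed out independently. The sketch below assumes a tiny hypothetical lexicon and uses Python's `concurrent.futures`. We use `ThreadPoolExecutor` to keep the example simple; for CPU-heavy analysis in CPython, `ProcessPoolExecutor` is the usual replacement for true multicore execution. Later layers (parsing, semantics, pragmatics) couple the words together and are not this easy to split.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical miniature lexicon: each word maps to its candidate senses.
LEXICON = {
    "which": ["interrogative determiner", "relative pronoun"],
    "tastes": ["has a flavor", "samples food", "preferences"],
    "better": ["comparative of good", "to improve"],
    "coke": ["Coca-Cola", "coal fuel", "cocaine", "a chicken"],
    "or": ["inclusive disjunction", "exclusive disjunction"],
    "pepsi": ["Pepsi-Cola", "a chicken"],
}

def lexical_analysis(token):
    """Look up one token's candidate senses. Each call depends only on
    its own token, so calls can run concurrently."""
    return token, LEXICON.get(token, ["<unknown>"])

def analyze(tokens, workers=4):
    """Analyze all tokens concurrently; map() yields results in input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(lexical_analysis, tokens))

if __name__ == "__main__":
    for word, senses in analyze("which tastes better coke or pepsi".split()).items():
        print(word, senses)
```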

Ah yes... there are many places where NLP and the notions of AI-completeness meet and conspire to deprive us of a winning hand. We've tried this from every angle we could think of, but no matter what we try, having the software agents fowim() from such skimpy information leads again and again to astronomical combinatorial search. We'll explain how we end up with astronomical combinatorial search, but that's part of a future adventure. The pressing question we have at the moment is:
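A back-of-the-envelope illustration of why the search space gets astronomical: if every word carries several candidate senses, the number of ways to read the sentence is the product of those counts. The sense inventory below is hypothetical, and it counts only word-level ambiguity; syntactic and pragmatic ambiguity would multiply it further. This is not the authors' promised explanation, just a simple model of the growth.

```python
from itertools import product
from math import prod

# Hypothetical sense inventory; real lexicons list many more senses.
SENSES = {
    "which": ["interrogative", "relative"],
    "tastes": ["has a flavor", "samples", "preferences"],
    "better": ["comparative of good", "to improve"],
    "coke": ["Coca-Cola", "coal fuel", "cocaine", "a chicken"],
    "or": ["inclusive", "exclusive"],
    "pepsi": ["Pepsi-Cola", "a chicken"],
}

def count_readings(words):
    """Number of sense combinations: the product of per-word sense counts."""
    return prod(len(SENSES[w]) for w in words)

def readings(words):
    """Lazily enumerate every combination of senses, one sense per word."""
    return product(*(SENSES[w] for w in words))

words = "which tastes better coke or pepsi".split()
print(count_readings(words))  # 2 * 3 * 2 * 4 * 2 * 2 = 192
```

Six words already yield 192 readings. At three senses per word, a 20-word question would yield 3^20, roughly 3.5 billion, before any parsing or reasoning begins.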

Can our supercharged multicore computers running our software agents perform the linguistic levels of analysis fast enough for the agents to fowim() in an acceptable amount of time?

What's even harder than fowim() is figuring out whether what I mean is even possible: foiip(). That is, am I presenting the software agents with a problem that cannot realistically be solved? Am I sending the agents on fools' errands? Again, the computational power required to figure out whether a practical or acceptable solution exists is considerable. We will eventually have to open the can of computational complexity worms, because the computational complexity will tell us whether multicore computers and parallelism can help, and if so to what extent, and if not, why not. But first we have to fold a little space and do a little paradigm shifting.
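A simple model shows why the complexity of a problem decides how much extra cores can help. Assume, optimistically, perfect linear speedup on p processors and a fixed time budget T. For an exhaustive search with branching factor b to depth d the cost is c·b^d, while for a polynomial problem of size n it is c·n^k (c, b, k, and T are illustrative parameters, not measurements):

```latex
T_p = \frac{c\,b^{d}}{p} \le T
\;\Longrightarrow\;
d_{\max}(p) = \log_b \frac{pT}{c} = d_{\max}(1) + \log_b p

T_p = \frac{c\,n^{k}}{p} \le T
\;\Longrightarrow\;
n_{\max}(p) = p^{1/k}\, n_{\max}(1)
```

With b = 10 and p = 1024, log_10 1024 ≈ 3.01, so a thousand cores buy only about three more levels of exhaustive search, whereas for a quadratic problem (k = 2) the same cores allow an input 32 times larger.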

Less hard, but still tiptoeing around the AI-complete fault line, is having the agents determine whether they have enough resources (knowledge, computational power, etc.) to solve the problem or answer the question: foitc(). fowim(), foiip(), and foitc() have proved so computationally intense that Tracey and I are thoroughly convinced (at least for right now) that we absolutely need all of the processors we can get in order for our software agents to execute the problem-solving pattern in a timely manner.

The sad thing is that our current approaches to parallelism don't cut it. Our procedural paradigms of parallelism buckle under the kind of computational complexity we're facing. Even if we could initially use the old tried-and-true approaches to implement the software agents, the brittle results would simply be too costly to maintain. We will have much to say about that later, but we are presently persuaded that our current idioms of parallel programming need to evolve. At the risk of sounding too apocalyptic, we must say a paradigm shift of gothic proportions is in order before we can really mix massive parallelism and computer programming. Of course, a certain amount of drinking the koolaid will be necessary before you totally agree.

You will first have to get to the place where a computer program that answers the question:

Which tastes better coke or pepsi?

is an obvious candidate for parallel processing. But at this point Tracey and I will assume that you need more koolaid (which we will supply regularly). The flavor is not as important as the amount. The more you drink, the clearer things become. The Singularity, the Cloud, the Semantic Web, all of those new ontologies, and our chickens named pepsi and coke all raise the question of parallel computation, and yes, we suspect that the ghosts of ICOT and the Fifth Generation project conceal clues to the answers.

I'll have a grape and my friend here will take a fruit punch, large please ...
