It’s going to be a big year for real-time strategy fans, with several eagerly anticipated games slated for release.

There is, of course, Blizzard’s long “in progress” Starcraft II which still seems to be on track for a mid-year release. It is Blizzard so it will probably be a very good game. It is Blizzard, so it probably won’t be terribly innovative (and early reports have not been encouraging in that regard) because that is Blizzard’s MO: nothing particularly new, but do the tried and true better than anyone else. There are more than enough Blizzard fanbois to ensure that this will be a huge hit, although expectations are so insanely high for this game that there is no way it can possibly deliver everything to everyone.

Ubisoft is releasing Eugen’s R.U.S.E. in the first quarter. Whether or not its deception-based game mechanics will “refresh” the strategy genre remains to be seen, but it is an intriguing idea with a lot of potential. It is a considerable advance over the traditional (and usually poorly and unimaginatively implemented) “fog of war” approach. The term “fog of war” is usually misapplied, actually, when it comes to strategy games. In combat situations it refers to the wide variety of factors that sow the seeds of confusion in the minds of commanders: limited knowledge about the enemy, about the terrain, about the disposition of your own units, previously unrevealed problems with your own communications infrastructure or the capabilities of weapons and units, and so on. In gaming terms, it is simply “here is a part of the battlefield that we won’t let you see,” and more by good luck than good management this approach usually manages to reproduce the first two of those factors, but little else. R.U.S.E. takes this one step further in that some of what you see will in fact not turn out to be what you think you see (a column of light tanks rolls toward your position and is suddenly revealed to be a squadron of King Tigers, that sort of thing). While many RTS games (the Command and Conquer series, for example) have employed units with stealth and/or deception capabilities, this falls well short of R.U.S.E.’s vision of deception as a core gameplay mechanic on the virtual battlefield. Again, some suspiciously high expectations for this one, but again, that may just reflect your average gamer’s inability to keep the potential of any game in meaningful perspective once they catch the scent.

Napoleon: Total War has just been released. It has been promoted as the game that Empire: Total War should have been upon its release, with an improved battlefield AI, some long overdue campaign options (armies suffer attrition when marching through extreme cold and heat) and better eye candy. I have it, haven’t played it yet, reserving judgment.

These, then, are the heavy hitters of the RTS gaming world. However, the anticipation surrounding the release of all these titles will ensure that other titles, potentially just as innovative if not more so, will go unnoticed, simply because they come from smaller studios that don’t have the means to blanket the Internet and the gaming press with publicity.


Situational awareness is a military term that you also find used a lot in discussion of combat tactics and in various military gaming simulations (my experience with the term comes from many years playing online aerial combat simulations). The term encompasses the information processing demands faced by individuals and groups as they participate in complex, stressful, and rapidly changing combat situations. Essentially, SA involves treating the battlefield (both virtual and actual) as a dynamic environment where dispositional assessment (what is the location and status of friendly and enemy units) is coupled with short-term and long-term predictive behavior (where will friendly and enemy units be in 30 seconds? What is that enemy likely to do if I perform this action? What are my escape routes?). SA is especially challenging in a flight context that requires players to act and extend their attention and anticipation through three dimensions. Furthermore, the speed of aerial combat even in propeller-driven aircraft ensures that the change-of-state that forces ongoing updates in a pilot’s SA can be highly variable. For two aircraft engaged in the same plane (turning to try and get on one another’s tail, for example) the fight may unfold almost in slow motion. The situation with two aircraft going head to head at over 600 scale miles per hour is very different. Moreover, a fight is usually a mix of these states, with a pilot being forced to concentrate on both the slow-motion fight in which they may be engaged, while other aircraft around him or her present a dizzying and rapidly changing set of possible threats and opportunities.
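The short-term predictive side of SA can be sketched as simple dead reckoning: given a unit's last observed position and velocity, project where it will be a few seconds from now. This is a hypothetical illustration of the idea, not code from any particular simulation.

```python
# Minimal dead-reckoning sketch: project a unit's position forward in time
# from its last observed position and velocity. Names are hypothetical.

def predict_position(pos, velocity, seconds):
    """Project an (x, y) position forward assuming constant velocity."""
    x, y = pos
    vx, vy = velocity
    return (x + vx * seconds, y + vy * seconds)

# An enemy last seen at (100, 50), moving 10 units/s east and 5 units/s north:
future = predict_position((100, 50), (10, 5), 30)
# 30 seconds later we expect it near (400, 200)
```

Real SA, of course, is exactly the process of noticing when the enemy refuses to stay on that predicted line.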

In a gaming context SA is intimately bound up with a player’s perception of the sophistication and complexity of the game’s AI. SA is both predictive and iterative (requiring a repetitive process of evaluation and re-evaluation): the more consistently an AI frustrates those predictions, the greater the chance that the player will attribute human-like characteristics to it. If, on the other hand, AI enemies prove to be utterly predictable in their individual actions, and reliably similar in their overall patterns of behavior (i.e. amenable to “rinse and repeat” kill strategies), then the AI will be perceived as artificial, true, but also dumb. Therefore, one of the most important and influential ways of fooling players into attributing complex, human-like behavior to AI entities is to influence their perception of the environment.

To put it simply, attributing a human-like complexity to an AI opponent depends not simply upon the behavior of the AI itself, but upon the player’s perception of the complexity of their environment. I have pointed out in other posts that there has been a lot of bullshit distributed quite liberally concerning the supposedly precious attributes of human intelligence. However, this same issue of environmental complexity also plays into our perception of human intelligence more than we might think (or like to acknowledge). When our interactions with our fellow human beings aren’t being shaped by a massive set of assumptions, our perception of the nature and sophistication of their communication is largely an information processing challenge. To be sure, the amount of information we get when interacting with a fellow human being is extensive, and much of it is processed without our necessarily being aware of it: body posture, gestures big and small, facial expressions, even less tangible elements such as smell, tone, the tone behind someone’s tone, and so on.

It is worth considering how individuals get away with some form of trickery in situations of either face-to-face or (perhaps more useful for our purposes) mediated interaction: outright lying, misdirection, strategic omission, and all the rest. The chances of putting something over on an individual are greatly increased if that individual’s ability to assimilate information is impaired. This impairment can happen if a) the individual is overloaded with the sheer quantity of information, b) they are receiving a lot of pieces of information via different channels, each of which requires its own set of analytical and response procedures (the multi-tasking phenomenon), or c) the amount of information is so limited that the individual is forced to act largely out of ignorance. Politicians exploit one or indeed all of these strategies all the time. Little kids learn these kinds of manipulative strategies in the womb. (Come on, you must have figured out pretty early on that you had a better chance of being allowed to do something a little sketchy if you asked one of your parents while they were distracted by other things, or if you happened to leave out a few crucial details: “I just want to go over to Steve’s place and meet some friends. . .one of whom will have brought his parents’ Ferrari and is going to drive us all up to Santa Barbara where there is this kick ass Rave/DIY tattoo party going down.”)

To be sure, there are other factors that play into the likelihood that someone will be able to deceive someone else. The deceiver could be, well, just really skilled at the dark arts of deception. The person at whom the deception is aimed could be a moron. From a game design point of view, however, we can’t consider either of these factors as having any bearing on the matter. As I’ve argued in previous posts, the quest to create authentically humanly intelligent AI has been a quixotic failure to this point, and will probably be so for the foreseeable future. Nor do good games generally result from the assumption that the player is a moron. It may well be true (the mean level of stupidity in the general population is, after all, quite high) but if you design a game around that belief it tends to produce really bad games. Or a bestseller by id software. Whichever. So what we’re left with, from a game design point of view, is manipulating the environmental information available to the player. Make no mistake, the goal here is deception. Designers are trying to perpetrate a (healthy) deception on players by getting them to accept a radically limited AI capability as a sophisticated, adaptable, capable, almost human-like opponent. It is a complex game of “pay no attention to the man behind the curtain” where success depends on keeping the player too busy to part the curtain, or unable to locate the curtain in the first place.


I think we’re getting into the ever-present, ever-frustrating topic of corporate influence on game development. Maybe it’s a copout, but I can see how “Good AI” development can be pricey and thereby unappealing to developers (specifically development firms) driven by profit. As much as I would argue that games are someone’s or a group of people’s works of art, I do recognize a significant difference between artists in the traditional sense and game developers: money. Even the most famous of traditional artists starved, often surviving only on their love for what they did. Please do correct me if I’m wrong, but I at least have the impression that there are few if any starving artists in the game development community who would have enough passion and resources to invest the time and money in developing better AI, not knowing beyond a shadow of a doubt that it would make them rich.

However, the above is my belief related to the development of a perfect (or close to perfect) single artificial entity, a bot. Because of corporate interests and the easy alternatives that Twitchdoctor pointed out, I don’t think we will see development companies focusing on making the bots in their games ‘think’ rather than simply giving them more health, stronger weapons, better aim, and of course, more grenades. Twitchdoctor’s post (Good AI, Bad AI) presents a powerful alternative to adjusting the bots, though: changing the conditions of the game. As in Twitchdoctor’s example of Thief, the conditions of the game can be changed to accommodate increased difficulty and substitute for (or at least distract from) imperfect AI.


TwinHits’ comment on my previous post perhaps had an unstated question behind it: isn’t it time to stop tearing down other people’s AI tests and come up with something creative? And the answer would be, yes. . .in a minute.

Just to be clear, I think that both the kind of Turing Test represented by the Loebner prize and the Botprize started out on the right track but are confusing two issues: one of them useful, one of them less so. Turing’s original question was twofold: can a machine think? And if so, how would we know? His hypothesis was that if a machine could offer responses that were indistinguishable from those of a human, it could be said to think. And since then most Turing Tests, such as the prize contests I have discussed, have taken it as axiomatic that these two things are connected. However, I don’t think they necessarily are. Moreover, the emphasis in these tests is always on the AI itself, on its level of intelligence. Which seems to be the blindingly obvious point of the whole exercise. But so far the blindingly obvious isn’t yielding very interesting results.

The “intelligence” issue may be interesting from the pure research perspective–although as I pointed out, if you were really interested in evaluating that aspect the entire contest would be run in a less tritely comparative fashion. The end result is that these prizes now come across as little more than cheerleading exercises for our own human awesomeness: we are so smart, creative, flexible, adaptable, sophisticated and cunning that no AI can yet fool us. Even if by some miracle an AI did manage to fool people sufficiently to pass one of these tests, that still wouldn’t be that useful a result for the rest of us. By and large people do not interact with machines, much less their overall environment, in the form of a rigidly controlled testing procedure. Therefore gamers and game designers should be interested in the other side of this question: not the capabilities of the AI itself, but the ability of the AI to fool us. In other words, less emphasis on the intelligence, more on the artificial (hence the name of this blog!).

The ability of the AI to fool us into thinking that it possesses some human characteristics is going to be based in part upon the inherent capabilities of the AI, naturally. But my argument is that it has much more to do, ultimately, with the design of the environment in which the AI is to operate, and the corresponding latitude afforded the player. Especially for gaming purposes what counts is not how smart the AI really is but how smart it appears to be. That perception is heavily shaped by the context of the game in general and that of the player in particular. It is possible for a well-designed game to “fool” the player into feeling as if they have encountered some smart AI even if the pure technical capabilities of the AI may be relatively rudimentary.

At this point, then, it might be useful to start identifying some examples of AI design strategies in games that are either particularly bad or particularly effective. I’m going to start with a couple of my favorites; you’ll notice that in several of these examples some of the good and bad aspects I’m describing can be attributed to game design issues in general as much as the AI design specifically–but that’s my point.

BAD AI

Dumb Difficulty Substitutes for Smarts: One of my least favorite approaches to creating a challenging game is where the “difficulty” settings simply boost the stats of the AI (and/or correspondingly reduce those of the player). So the enemy doesn’t become smarter, it just becomes physically tougher, more accurate and its weapons deal more damage, and/or there are more of them. This approach has been around virtually as long as electronic games themselves, and in small doses and in an appropriate environment it can provide a fun challenge. But it has become the unimaginative default for game designers. And sometimes it can be so badly implemented that it produces some unexpectedly hilarious side-effects that completely destroy a player’s immersion in the game.
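The move is easy to caricature in code: a "difficulty" setting that never touches behavior, only multiplies numbers. A hypothetical sketch, not any actual game's code; all names and values are made up.

```python
# Caricature of the Dumb Difficulty move: higher settings never change the
# AI's behavior, they just scale its stats. All names are hypothetical.

DIFFICULTY_MULTIPLIERS = {"easy": 0.75, "normal": 1.0, "hard": 1.5, "veteran": 2.5}

def scale_enemy(enemy, difficulty):
    """Return a copy of an enemy's stats boosted for the chosen difficulty."""
    m = DIFFICULTY_MULTIPLIERS[difficulty]
    return {
        "health": enemy["health"] * m,
        "accuracy": min(1.0, enemy["accuracy"] * m),
        "grenades": int(enemy["grenades"] * m),  # the World at War special
        "tactics": enemy["tactics"],  # note: identical at every difficulty
    }

grunt = {"health": 100, "accuracy": 0.4, "grenades": 2, "tactics": "rush"}
veteran_grunt = scale_enemy(grunt, "veteran")
# health 250, accuracy 1.0, grenades 5 -- same dumb tactics, just tougher
```

The tell is that last field: the one thing that would actually make the enemy feel smarter is the one thing the difficulty setting never touches.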

One example is Call of Duty: World at War. As with all the other titles in the series, the game features painstaking attention to historical detail as it recreates WWII battlefields in both the Pacific and on the Eastern Front. The level design, sound design, weaponry, all make for a pretty immersive experience. . .until you crank it up to the highest difficulty level. The actual quality and tactics of the AI don’t change significantly at any difficulty level. But at the highest level, the Japanese AI suddenly starts up a passionate love affair with the grenade.

Picture this. A scenario has you as a marine fighting to clear a Japanese-held island late in the war. The defenders have been bombed, shelled, and strafed to buggery. Their supply lines have been cut, they’ve been reduced to eating rodents and the more substantial examples of the local insect population. They are so low on ammunition that they frequently resort to Banzai charges. But crank the difficulty up to max, and suddenly they discover a bottomless supply crate of grenades. Which they proceed to rain down on you with all the precision and frequency of an automated launcher. My “Sod this for a joke” moment came when I watched no fewer than six grenades land in a neat circle around me, after I had dodged no fewer than ten in the previous two minutes. This is not “creating a challenge,” it is covering up the limited abilities of your AI, and it is a fundamentally unimaginative game design strategy (see “Change the Player”, below).

The One-Trick Pony: This is where AI entities have a single mode of operating, which they deploy every time you meet them. Sometimes this mode is shared by many entities that are supposed to be functionally distinct. This again tends to be a game design default, but the example that springs immediately to mind is Doom 3. Yes, I know id software is responsible for all this: Wolfenstein 3D and Doom helped to get me hooked on the whole games thing in the first place. And yes, I know their focus is really on creating great multiplayer combat. But their core game design strategy has never changed since then: put all your effort into high-end graphics at the expense of narrative and challenging AI. Doom 3 is no exception.

While the game does a great job riffing on (or ripping off, depending on your point of view) the original Doom, the AI is amongst the least challenging and least interesting out there. Most AI entities pretty much have one form of attack and one attack only. After the 300th time that an imp jumped out from the shadows with blinding superhuman speed and then simply stood there ripping at me while I poured lead into it, I recognized the game for what it was. Not smart. Not challenging. Simply a chew toy for the slavering OCD crowd. Never finished it. Glad I didn’t pay too much for it (thank you Steam!).

GOOD AI

Limit the opportunities for the AI to be stupid: One of the most satisfying game AIs I’ve encountered was that in the original F.E.A.R. The firefights always felt very tense and organic, with the enemy soldiers maneuvering to try and get better shots on me, panicking when I had killed too many of them, taking cover and refusing to come out, using the odd grenade at a strategically appropriate moment. But the genius of the game was that the AI was made to appear smarter by giving it a very constrained environment in which to operate. Most of the combat in the game takes place indoors in very close quarters, with limited sight lines. First of all, the AI may have been behaving in some pretty questionable ways, for all I know. While I was cowering behind a piece of furniture they may have been spending all their time running headlong into walls or playing “pull my finger.” But your limited view also limited the chance that you would actually witness the kind of AI behavior that might crack the immersion for you. This strategy was enhanced by the fact that so much of your environment could be destroyed; even if you could get a clear line of sight to your enemy your view was often filled with smoke, plaster dust and swirling pieces of destroyed furniture. Secondly, the tight quarters mean, in effect, fewer opportunities for the AI to screw up. Not that I want to pretend that issues like pathfinding in a confined space don’t pose a significant challenge. But this is one game where a very nice balance of AI, environmental factors, and limited player abilities all coincide to make the AI appear relatively smart.

Unfortunately, Monolith moved away from this design in subsequent games and resorted to the Dumb Difficulty move (see above); enemy AI didn’t change in the sequels, it just became more accurate and more resilient: not smarter, just tougher.

Change the Player not the AI: What are some alternatives to the Dumb Difficulty move? While that move is unimaginative, it is responding to a couple of real issues: the need to promote replayability and players’ desire to challenge themselves. An effective response to this dilemma that doesn’t simply involve turning your AI into super-soldiers is a key part of the Thief series of games. In these games, changing the level of difficulty in the game doesn’t change your enemies at all; it changes the nature of the tasks you have to accomplish in each level and the way you must accomplish them. The amount of loot you need to steal goes up, additional objectives are added, and at the highest difficulty level you are not allowed to kill any NPCs and sometimes you aren’t even allowed to be seen by anyone. Voila. The game is now challenging and you haven’t had to resort to populating your entire game with walking tanks masquerading as humans. This is a great example of your perception of the difficulty of the game and even the enemies you meet being influenced by a change in the way that you are forced to relate to your entire environment.
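The Thief approach can be sketched the other way round from the stat-boost move: the difficulty setting leaves the AI untouched and instead tightens the player's objectives and constraints. The rule names and values below are hypothetical, not Thief's actual configuration.

```python
# Sketch of the Thief-style alternative: difficulty changes the player's
# objectives and constraints, never the AI. All values are hypothetical.

MISSION_RULES = {
    "normal": {"loot_required": 500,  "no_kills": False, "no_detection": False},
    "hard":   {"loot_required": 1000, "no_kills": False, "no_detection": False},
    "expert": {"loot_required": 1500, "no_kills": True,  "no_detection": True},
}

def mission_failed(rules, loot, kills, times_seen):
    """The same guards, the same AI -- only the win conditions tighten."""
    if loot < rules["loot_required"]:
        return True
    if rules["no_kills"] and kills > 0:
        return True
    if rules["no_detection"] and times_seen > 0:
        return True
    return False

# A run that passes on normal fails on expert for a single kill:
run = {"loot": 1600, "kills": 1, "times_seen": 0}
normal_fail = mission_failed(MISSION_RULES["normal"], **run)  # False
expert_fail = mission_failed(MISSION_RULES["expert"], **run)  # True
```

Nothing about the guards changed between those two calls; only the player's relationship to them did.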

I’m interested in other ideas people might have for smart or stupid AI and/or design strategies in games.


Gain 5 points for each correct answer. Lose 3 points for each wrong answer. Lose 4 points for each question skipped. Gain 6 points for attempting an answer then giving up in disgust.

In my previous post I argued that while the best-known instance of the Turing Test, the Loebner Prize, is ostensibly set up to evaluate the ability of an AI bot to fool a human in simulated conversation, the parameters of the competition focus more on testing a human being’s ability to differentiate between human and machine.

If we turn to the world of electronic games we find something very similar, albeit with some revealing differences. In December 2008 Aussie developer 2K Australia sponsored the inaugural Botprize, the “Turing Test for Bots;” a second iteration of the contest has just been played out in Milan. The contest, held in conjunction with the IEEE Symposium on Computational Intelligence and Games, is designed to test the ability of a bot to pass as a human player of a first-person shooter. The format is once again the classic Turing model: a judge faces off against a human and a bot in a deathmatch shootout using a modified version of Unreal Tournament (2004). The test is operating in a different ballpark than the Loebner prize (it’s more of a neighborhood sandlot, really) with its offer of a cash prize of only $7,000 and a trip to 2K’s Canberra studio. To win the major prize a team needs to fool 80% (typically 4 out of 5) of the judges. As we might expect from the long inglorious history of the Loebner prize, no one has come close to grabbing the major award, which leaves everyone fighting it out for the minor money: $2,000 and a trip to the studio for the team whose bot is judged to have the highest average “human-ness” (their word, not mine, I swear).
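The prize structure itself reduces to a pair of rules: fool at least 80% of your judges and the major award is yours; otherwise the highest average "human-ness" rating takes the minor one. A sketch of those rules as described, with entirely hypothetical data:

```python
# Sketch of the Botprize win conditions as described in the post: the major
# prize requires fooling at least 80% of judges; otherwise the best average
# "human-ness" rating takes the minor prize. All data below is hypothetical.

def major_prize_winner(bots, threshold=0.8):
    """Return the first bot that fooled at least `threshold` of its judges."""
    for name, verdicts in bots.items():  # verdicts: True = judged human
        if sum(verdicts) / len(verdicts) >= threshold:
            return name
    return None

def minor_prize_winner(ratings):
    """Return the bot with the highest mean human-ness rating."""
    return max(ratings, key=lambda name: sum(ratings[name]) / len(ratings[name]))

bots = {"botA": [True, False, False, False, False],
        "botB": [True, True, False, False, False]}
ratings = {"botA": [2.0, 3.0, 1.5, 2.5, 2.0],
           "botB": [3.5, 4.0, 2.0, 3.0, 3.5]}

winner = major_prize_winner(bots)      # None: nobody fools 4 of 5 judges
runner_up = minor_prize_winner(ratings)  # "botB"
```

As with the Loebner prize, the interesting rule is the first one, and it has so far functioned purely as a formality.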

To cut a long, but predictable, story short, the bots fail. Miserably. In 2008, two of the five bots failed to convince a single judge. Two others convinced only two of the judges. While complete results have yet to be posted for the 2009 prize, the bots as a whole did a little better, with each fooling at least one of the five judges. Woohoo.

Now on the face of it this looks like a very simple challenge. Whichever player kills you and then takes the time to teabag you, that’s the human. (There’s an idea; let’s replace the Turing Test with the Teabag Test: the winner for the Loebner prize under these rules would be the bot that convincingly spews random homophobic insults at you at the slightest provocation). But seriously folks. . .

The frenetic pace of an online deathmatch does make picking the bot in each round a daunting task for the casual gamer. (You can check out a series of short videos from the 2008 contest and try it for yourself). However, the judges’ notes indicate that there is a series of behaviors they are looking for: reliance on a single weapon, losing track of a target, or failing to pursue a target, for example, can all be telltale signs of a bot. More fundamentally, the Botprize as a whole suffers from the same weaknesses as the Loebner prize. In every round the judge always knows that one of the avatars they will be facing is nonhuman, which makes it a contest more focused on their skills at differentiating machines from humans (something that is tacitly acknowledged by a “best judge” award). Although it is entirely possible to run this test with different configurations (two humans, two bots, and the judge always in the dark) there doesn’t appear to be any interest in employing a more methodologically varied test.
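That judges' checklist reads like a scoring heuristic. As a hypothetical sketch (the tells come from the behaviors listed above, but the weights and names are invented, not the Botprize judges' actual rubric), tallying those tells over an observed match might look like:

```python
# Hypothetical sketch of the judges' checklist as a scoring heuristic:
# each observed "tell" adds to a bot-suspicion score. The tells are taken
# from the post; the weights are illustrative, not an actual rubric.

TELLS = {
    "single_weapon_only": 2,   # never switches weapons
    "lost_target": 1,          # loses track of a visible opponent
    "failed_to_pursue": 1,     # breaks off an easy chase
}

def suspicion_score(observations):
    """Sum the weights of every recognized tell observed during a match."""
    return sum(TELLS[tell] for tell in observations if tell in TELLS)

match = ["single_weapon_only", "lost_target", "lost_target", "failed_to_pursue"]
score = suspicion_score(match)  # 2 + 1 + 1 + 1 = 5
```

The weakness of any such rubric is exactly the post's point: it encodes assumptions about what humans never do, and humans do all of these things constantly.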

However, while this form of traditional Turing test applied to chatterbots produces a completely artificial and constrained conversational context that bears little relationship to real human conversation, the method does, it is true, have some marginal utility when evaluating bot/human performance in the world of multiplayer FPS games. After all, in the world of online gaming, cheating tools like speed or damage hacks are common enough that most players are likely to have experienced them firsthand or heard of them. Thus, while trying to figure out whether the entity you are facing is human or not has no relevance to everyday human conversation, wondering about the possibly enhanced or downright artificial nature of the player you are facing in a game is a distinct possibility!

It is also important to note that the AI design task in each of these Turing tests is very different. In the Loebner prize, designers are faced with the task of “smartening up” their AI to make it capable of holding the kind of relatively sophisticated conversational exchanges that are, somewhat romantically, envisaged to be the stuff of everyday human interaction. When it comes to FPS games, however, it is relatively easy to design AI characters that are all powerful super-soldiers. Many of us have played games with this kind of AI design (usually not for very long). This is the NPC that can kill you with a single headshot from 500 metres while standing on their head with the weapon clenched firmly between their butt cheeks. Gamers just love that kind of “smart” AI. The challenge for the Botprize designers, therefore, is to dumb the AI down, to make it play more like a fallible human.

Nevertheless, there remains this reluctance in either of these Turing tests to provide a more methodologically varied test and it is fair to ask why. Part of the reason is undoubtedly that the Turing Test has acquired the status of Holy Writ amongst AI geeks. Despite the fact that there is some debate as to what the parameters actually were when Alan Turing first postulated the idea of testing a machine’s ability to play the imitation game, rewriting the “rules” seems to be regarded by people as akin to rewriting the ten commandments to remove, say, that pesky adultery clause: it would make life a lot easier and more interesting but, you know, it’s just not done!

There is another, more important reason, and it is indicated by a less obvious result of the 2008 Botprize. Of the human players involved in the contest, two managed to convince only two of the judges that they were in fact human. Of the five players, only one convinced all five judges that he was human. These Turing tests are not designed around criteria for meaningfully evaluating AIs; they are instead designed around a set of criteria that is supposed to define what is believed to constitute human behavior, either in a conversational or a gaming context. What I suspect people are reluctant to acknowledge, however, is that these criteria are, at best, highly romanticized, and at worst, complete BS. Most human conversational interaction, for example, is completely unlike that imagined by the Loebner prize. Rather than being focused, intense, and driven by evaluative need, most everyday conversations are trivial, characterized by a high degree of inattention, consist mostly of filler, and have no purpose except to keep channels of communication open. Most people just don’t have much that is worth saying and they spend their time saying it badly but saying it a lot.

Were the Loebner prize and the Botprize to be run in a more methodologically sound fashion, I would hazard a guess that one immediate result would be that the number of “humans” who were determined to be machines would rise dramatically, certainly in the case of the Botprize. The patently limited parameters in both these Turing tests, in other words, are designed to prevent us from finding out how truly awful we are at attempting to affirm and enforce the criteria that supposedly render humans distinctive. More disturbingly (or intriguingly, depending on your point of view) it might show how inclined we are already to see one another as species of machine.

Please use a No. 2 pencil. You have one hour. There will be no bathroom breaks.

Incanter and I have been engaging in that popular pastime of many gamers, talking about Artificial Intelligence in games. And when I say talking, I of course mean complaining. True, compared with the dark ages of the early 90s when I began taking games seriously, there have been some notable improvements in gaming AI. For the most part FPS enemies no longer simply leap out from behind corners and stand there blazing away at you with a pistol while you saw them in half with a mini-gun (there are, of course, always unfortunate exceptions, (cough) Doom 3 (cough)). They don’t usually stand there contemplating the transcendent beauty of the texture maps while you walk up unnoticed beside them and put a cap in their arse. The quality of sidekick characters has improved markedly and there have also been dramatic improvements in the kind of unit AI you find in the best RTS games. However, I think it’s fair to say that gaming AI has really evolved only from the level of bloody annoying to that of not too aggravating. I feel about smart, satisfying game AI the way people who grew up in the 1950s must feel about personal jetpacks: what the hell happened?

It might be useful at this juncture, then, to ask whether or not some of the assumptions informing the quest to develop sophisticated game AI are in need of an overhaul. I want to start, however, by looking at a slightly different issue: the tests used to evaluate the intelligence of artificial entities.

There is, of course, the Turing Test, the best known instantiation of which is probably the Loebner prize established in 1990: a grand prize of $100,000 to be awarded to an AI that is indistinguishable from a human. Needless to say, the grand prize has never been awarded and every year people fight it out for the $3,000 consolation prize for the entity most like a human. (The 2009 contest was held on September 6 and the results have not yet been announced). The details of the contest have varied slightly over the years, but it always seems to return to the “classic” format: a judge faces off against both a human and an AI and tries to guess which is which. People obviously expend blood and treasure on this endeavour, but the abilities of even the winning chatterbots are less than inspiring and those of the losing ones are downright embarrassing. Ironically, the last place bot in the 2008 contest was programmed with a religious personality (and, according to the transcripts, Brother Jerome spent most of his time not responding at all–perhaps the bot should instead have been called God?) while the eventual winner, Elbot, apparently fooled a couple of judges. . .despite having the programmed persona of a robot. (You can judge Elbot’s conversational chops for yourself).

Now there is probably a small fortune awaiting the first person to develop a convincingly human chatterbot. That way someone could install a machine with a limited ability to speak English and an even more limited ability to understand it in customer phone support positions, dispensing with the expensive intermediary step of having to turn real human beings into unhelpful machines. But for the most part, the success or failure of this kind of Turing test is irrelevant to the concerns of designing game AI.

I am, however, interested in the test conditions used by the Loebner prize and the degree to which they stack the deck against the AI. These parameters are in fact representative of other attempts to evaluate AIs, including those more specific to gaming: they are less concerned with meaningfully evaluating the ability of an AI to imitate a human than with maintaining a commonplace (and, I would add, overly optimistic) belief in the sophistication of human social interaction.

The question we should be asking is not whether an AI can imitate a human being, but under what conditions. For example, as mediated human communication approaches more closely the condition of machine-generated gobbledygook, the likelihood of an AI fooling a human increases. If the test were based around tweets or text messages, I’d expect an AI to do pretty well. (Interestingly, the first winner of the Loebner prize won, somewhat controversially, by being able to mimic the typing and grammatical errors of a human being.)

The way the Loebner test (and others, something I will explore in a subsequent post) is set up, however, it is humans that are being tested, not the AI: what is being evaluated is not a bot’s ability to fool a human but the ability of a human to distinguish between a bot and a human. The Loebner prize test conditions, while claiming to test the ability of a bot to engage in a naturalistic conversation, therefore employ a highly artificial conversational set-up. There are only ever two possible conversation partners, and the judge converses with each, mano a mano (or mano a micro), in turn. The judge is (almost always) aware that one of them is non-human (and you can see the judge and the human partner making reference to this in many of the contest transcripts). The judge is closely scrutinizing every utterance in order to determine whether or not their conversational partner is non-human.

If this is your everyday conversational reality then you are either locked in a double wide somewhere in Kansas feverishly updating your blackhelicoptersarecoming.org blog, or have a serious Halo addiction for which you need to seek immediate help before you harm yourself and/or others. Personally, I don’t have a lot of conversations that involve me trying to determine if one of my friends is more human than the other (some, sure, but not many).

If you were really interested in testing the ability of the AI to imitate human communication, wouldn’t you structure a less predictable test? You might, for example, mix it up a bit. Sometimes the human judge would face one human and one bot; sometimes two humans; sometimes two bots; and they would never know which combination they were facing. Perhaps the judges would occasionally be faced with three entities. Or you could even make the test a really high-stakes one: you, the judge, are interacting with only one entity; tell me whether it is human or not. You can see how all these combinations might complicate things.
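One way to see how much these variations complicate things is to look at the chance baseline: how often a judge who simply guesses at random would be scored as correct under each format. Here is a back-of-the-envelope simulation (purely illustrative; the trial formats are my own simplification of the setups described above, not the Loebner contest’s actual scoring rules):

```python
import random

def classic_trial():
    # Classic format: the judge knows exactly one of the two
    # partners is a bot and picks which one. A coin flip is
    # right half the time.
    return random.random() < 0.5

def mixed_trial():
    # Unpredictable format: each partner is independently human
    # or bot, and the judge must label both correctly without
    # knowing the mix. Random guessing succeeds far less often.
    actual = [random.choice(["human", "bot"]) for _ in range(2)]
    guess = [random.choice(["human", "bot"]) for _ in range(2)]
    return actual == guess

def success_rate(trial, n=100_000):
    # Estimate how often random guessing passes a given format.
    return sum(trial() for _ in range(n)) / n

print(f"classic baseline: {success_rate(classic_trial):.2f}")  # ~0.50
print(f"mixed baseline:   {success_rate(mixed_trial):.2f}")    # ~0.25
```

The point of the sketch is simply that hiding the human/bot mix halves the judge’s guessing baseline, which is exactly why the unknown-combination format would be a harder, more meaningful test.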

What the Loebner contest focuses on is a model of human communication that is content-rich but context-limited. AIs fail this kind of Turing test with monotonous regularity because they are expected to provide full and satisfying responses on a wide variety of potential conversational topics, and to do so in a fashion that indicates attentiveness to the needs of their conversational partner. This is what most people would probably think of as the basis of real human communication. However, this expectation of subject-oriented (in two senses) sophistication is purchased only by creating a restrictive, artificial conversational framework. In everyday human conversation, how many of the following apply?

The stakes are high; a lot rides on the outcome of the particular conversation;

Your conversational partner has your fierce, undivided attention and you treat their utterances as if you have (or should have) a similar degree of attention from them;

The purpose of the conversation is to compare their utterances with those from someone else;

The comparison is, furthermore, based not on the truth or usefulness of the information imparted by your conversational partners but on the degree to which their utterances qualify as syntactically, logically, and situationally valid.

Obviously this represents a highly idealized view of “standard” human conversation. Indeed, most human conversations would probably fail such a Turing test.

In my next post I want to look at how this kind of Turing test compares with one method for evaluating game AI: the Botprize.