The Real Challenge w/ Robotics is Interface, not AI

Going to try to write (or at least start to write) something once a week during lab meeting for the remainder of my time in grad school, since it's just a wasted block of time every Tues otherwise.

AI, especially deep-learning, has been a hot topic in the tech blogosphere recently, especially w/ the dominant performance of AlphaGo. This comment via Hacker News does (imo) a really nice and quick job of summarizing just why deep-learning is a big deal, and how it's iterating on traditional machine-learning. I've always (perhaps simplistically) viewed machine-learning as just another form of statistical analysis that tries to find correlations between features extracted from some dataset, and then uses those correlations to try to make further predictions. Figuring out which features to correlate then becomes the primary challenge. Deep-learning attempts to determine these features automatically, and the resulting behavior can appear to emulate human intuition.
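To make that simplistic view concrete, here's a toy sketch (entirely my own illustration, not how any real system is built): a linear rule on raw coordinates can't separate points inside a circle from points outside it, but one hand-crafted feature (squared radius) makes the problem trivial. Deep-learning's pitch is that it discovers features like this on its own instead of waiting for a human to engineer them.

```python
import random

random.seed(0)

# Toy dataset: label = 1 if the point lies inside the unit circle.
data = [(random.uniform(-2, 2), random.uniform(-2, 2)) for _ in range(200)]
labels = [1 if x * x + y * y < 1 else 0 for x, y in data]

def accuracy(predict):
    return sum(predict(p) == lab for p, lab in zip(data, labels)) / len(data)

# Raw features: a single linear threshold on x does barely better than chance.
raw_acc = accuracy(lambda p: 1 if p[0] > 0 else 0)

# Hand-crafted feature: squared radius makes the classes trivially separable.
feat_acc = accuracy(lambda p: 1 if p[0] ** 2 + p[1] ** 2 < 1 else 0)

print(raw_acc, feat_acc)  # the engineered feature wins by a wide margin
```

The point isn't the classifier; it's that the entire difficulty lives in choosing the feature, which is exactly the step deep-learning tries to automate.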

Super cool. Great work. I'm not impressed.

It doesn't surprise me that a computer is better at pattern-recognition and pattern-matching in a structured game with high dimensionality, and I don't see why that should surprise anyone. I've read that Go's complexity forces its players to make moves based on "feel" or "intuition", as if those were magical qualities exclusive to humans. Maybe a seasoned Go player can set me straight, but it seems to me that both the human player and AlphaGo merely analyzed the board, compared it to layouts they had seen before, and then made a move that had proved advantageous in past scenarios. I understand the argument that in the deep-learning case, AlphaGo was determining on its own how to evaluate advantageous versus non-advantageous moves, but like Chess, Go has a clearly-defined endgame goal.

This is not to say that AlphaGo's achievement wasn't monumental. I just think the focus on AI as the measure of robotic prowess is misplaced. I don't think it's a stretch to say that computational brute force has far exceeded human capabilities, so I don't think that's the challenge in robotics anymore. I think the main task going forward is figuring out how to interface robotic agents with the real world so that they can interact with it, both physically and virtually.

Physical Interface (99 problems, but AI ain't one)

Certainly a little biased here, since manipulation is the focus of my PhD, but robots are still ridiculously far from interacting with the physical world outside of structured research or manufacturing settings. I think it may surprise people how difficult it is for a robot to autonomously pick up an object it hasn't previously tried grasping. We're really good at designing hardware for a single task, or even a narrow range of tasks, but we're currently struggling a great deal to accommodate structural variability, whether in the manipulated objects or the environment.

As complex as assembly lines nowadays can be, each robot (or mechanized component) along the line only has to deal with limited variability. For most intents/purposes, each machine can run open-loop, without any sort of feedback. Mobile robotic agents have been mostly limited to vacuum cleaners like the Roomba or glorified, motorized shopping carts like Kiva's (now Amazon Robotics') fleet. Neither of these deals particularly well w/ sudden changes in terrain, and both work best with a particular subset of environmental obstacles (as in, none).

Manipulator arms have only (imo) recently begun to creep out of the research space, thanks to efforts by Universal Robots, Rethink, and Kinova. UR's use cases seem to mirror those in the manufacturing space, where its arm serves as a more affordable drop-in for an industrial arm performing some particular repetitive task. I've mostly seen Kinova deployed in teleoperation scenarios, where it serves more as an extension of human-driven intentions. Rethink, from what I've seen at expos, seems to be actually trying to deal w/ uncertainty and operational errors through compliant control. Every one of their "success cases" that I've seen tries to align its motion to some consistent environmental feature (like a wall or table edge). The assumption here is (I'm guessing) that variability is lower in force space than in configuration space alone.

Robotic hands that can grasp objects robustly are still limited (which hopefully means it'll be easier for me to find a job). I think it's a little telling that most of the entries in the 2015 Amazon Picking Challenge used vacuums instead of grippers with fingers. Google's recently thrown the whole deep-learning toolkit at the problem, but I don't see it having the same level of success here as in Go. The analogy I like throwing out is the following: if you see someone struggling to eat spaghetti, it may not be because he/she is bad at eating; it may be that he/she's using a spoon.

AI is generally restricted to a limited number of output options, especially when deployed on hardware, even if the problem itself has high dimensionality. When the range of possibilities is known explicitly, AI generally has no problem finding an optimal solution, and if the rules for evaluating solutions are good enough, AI should succeed. But now imagine that the AI has only limited knowledge of its world. Imagine further that whatever decision it settles on (one that may no longer actually be optimal, given that limited knowledge) isn't even guaranteed to execute as expected. That's the physical interaction problem I'm seeing in the real world, and one that I'm not seeing very many companies or groups trying to address.
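Here's a toy illustration of what I mean (the grasp model and all the numbers are made up for the sketch, not from any real planner): even a planner that always picks the "optimal" action for its belief fails regularly once perception is noisy and execution is unreliable.

```python
import random

random.seed(1)

TRUE_WIDTH = 0.05  # actual object width (m); the robot never observes this directly

def plan_grasp(est_width):
    """Pick the 'optimal' gripper aperture for the *estimated* width."""
    return est_width + 0.01  # small clearance over the estimate

def execute(aperture):
    """Physical execution: succeeds only if the aperture actually clears
    the true object, and even then the object slips some of the time."""
    fits = aperture > TRUE_WIDTH
    return fits and random.random() > 0.1  # 10% slip even on a good plan

trials = 1000
successes = 0
for _ in range(trials):
    est = TRUE_WIDTH + random.gauss(0, 0.01)  # noisy perception
    successes += execute(plan_grasp(est))

print(successes / trials)  # well below 1.0 despite an always-'optimal' planner
```

No amount of cleverness in `plan_grasp` fixes this; the losses come from the interface between the decision and the world, which is exactly where I'd rather see the effort go.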

For me, this raises the question: at what point do you stop trying to improve the AI and instead focus on modifying the constraints of your system? What if you could change the rules of the game? Why develop a robot to interact with existing logistics warehouses when you have the opportunity to change the entire setup? If self-driving cars get their own lanes, how many (metaphorical) corners can you cut to get a truly viable solution?

Virtual Interface: Human2Bot

The deployment of AI has seemingly found much more success virtually, especially when the interface has some structure. Enter: the recent obsession with chatbots (here's a great podcast on the topic). Creating a chatbot that understands human conversation is actually a pretty old problem, and I don't think that machine-learning has produced an AI that "understands" humans any better. I think the recent trend of building chatbots with a specific purpose (instead of just generalized conversation) has allowed developers to "cheat", in the sense that each chatbot can focus on a specific subset of the lexicon, as well as have the luxury of dealing with a human participant who has a vested interest in the bot's success.

I think that should be the real story here. The human-bot interface for chatbots, textual natural language conversation, hasn't changed, but the manner in which humans are using them, as a common platform from which to do very specific and well-established things (like ordering dinner or setting appointments), has made it easier for AI to succeed. There's far more structure here, and developers can give the AI the freedom to ignore irrelevant topics and also demand formulaic triggers (like Amazon Alexa's "wake word"). If you think about it, much of what even the most successful current chatbots do could be replaced by your typical series of traditional apps, which I still think are really just stylized forms.
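A minimal sketch of the "cheat" (the wake word and intent lists here are hypothetical, not any real bot's): restrict the lexicon to the bot's one job, ignore everything that lacks the trigger, and the problem becomes far easier than open-ended conversation.

```python
WAKE_WORD = "alexa"  # formulaic trigger, a la Amazon Alexa's wake word

# A purpose-built bot only needs a tiny lexicon for its narrow job.
INTENTS = {
    "order": ["order", "pizza", "dinner", "food"],
    "schedule": ["appointment", "schedule", "meeting", "calendar"],
}

def respond(utterance):
    words = utterance.lower().split()
    if WAKE_WORD not in words:
        return None  # the freedom to ignore anything off-trigger
    for intent, keywords in INTENTS.items():
        if any(w in keywords for w in words):
            return intent
    return "fallback"

print(respond("alexa order me some dinner"))     # matches the 'order' intent
print(respond("what do you think of the game"))  # ignored: no wake word
```

Keyword matching this crude obviously isn't what production bots run, but the structural advantage (a trigger plus a small closed lexicon) is the same.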

In a way, the chatbot isn't some independent intermediary between the human and the desired service. In their current, perhaps overhyped (for I am forever the pessimist), form, they're more like extensions of human intent, more similar to the teleoperated Kinova arm described above than a truly autonomous entity. The chatbot, in my view, has the primary goal of making human use of virtual services more seamless, which I would argue has been a common goal among all major apps.

Virtual Interface: Bot2Bot

In the podcast on chatbots (linked above), the guest brings up a good point: with chatbots acting on behalf of humans, what happens when chatbots start interacting with each other, not knowing that the other is a bot? Or, if the bots can figure out that they're in fact interacting with other bots, how does that interaction change? APIs have long been a way for different apps to utilize each other's services, and I think the burgeoning IoT ecosystem has developers asking some important questions regarding how to standardize decentralized messaging protocols for simple, autonomous entities. I think IFTTT is a glimpse of what will eventually be a fully integrated bot network.

That said, whether this results in something truly useful, I don't know. No one seems to be really using chatbots to do anything more than what apps already do. Making that next step to connect to the physical world seems most interesting. Maybe that's actually through wearables (please God no). Maybe it's continuing to work on bringing robotics out of the warehouse and into the personal space (please Jibo no). Or maybe something else entirely. What do I know, I just write shit during lab meeting.