Progress in artificial intelligence causes some people to worry that software will take jobs such as driving trucks away from humans. Now leading researchers are finding that they can make software that can learn to do one of the trickiest parts of their own jobs—the task of designing machine-learning software.

In one experiment, researchers at the Google Brain artificial intelligence research group had software design a machine-learning system to take a test used to benchmark software that processes language. What it came up with surpassed previously published results from software designed by humans.

In recent months several other groups have also reported progress on getting learning software to make learning software. They include researchers at the nonprofit research institute OpenAI (which was cofounded by Elon Musk), MIT, the University of California, Berkeley, and Google’s other artificial intelligence research group, DeepMind.

If self-starting AI techniques become practical, they could increase the pace at which machine-learning software is implemented across the economy. Companies must currently pay a premium for machine-learning experts, who are in short supply.

MIT researchers are employing novel machine-learning techniques to improve the quality of life for patients by reducing toxic chemotherapy and radiotherapy dosing for glioblastoma, the most aggressive form of brain cancer.

Glioblastoma is a malignant tumor that appears in the brain or spinal cord, and the prognosis for adults is survival of no more than five years. Patients must endure a combination of radiation therapy and multiple drugs taken every month. Medical professionals generally administer maximum safe drug doses to shrink the tumor as much as possible, but these strong pharmaceuticals still cause debilitating side effects.

In a paper presented at the 2018 Machine Learning for Healthcare conference at Stanford University, MIT Media Lab researchers detail a model that could make dosing regimens less toxic but still effective. Powered by a “self-learning” machine-learning technique, the model looks at treatment regimens currently in use, and iteratively adjusts the doses. Eventually, it finds an optimal treatment plan, with the lowest possible potency and frequency of doses that should still reduce tumor sizes to a degree comparable to that of traditional regimens.


In simulated trials of 50 patients, the machine-learning model designed treatment cycles that reduced nearly all doses to a quarter or half of their usual potency while maintaining the same tumor-shrinking potential. In many cases, it skipped doses altogether, scheduling administrations only twice a year instead of monthly.

“We kept the goal, where we have to help patients by reducing tumor sizes but, at the same time, we want to make sure the quality of life — the dosing toxicity — doesn’t lead to overwhelming sickness and harmful side effects,” says Pratik Shah, a principal investigator at the Media Lab who supervised this research.

The paper’s first author is Media Lab researcher Gregory Yauney.

Rewarding good choices

The researchers’ model uses a technique called reinforcement learning (RL), a method inspired by behavioral psychology, in which a model learns to favor behavior that leads to a desired outcome.

The technique involves artificially intelligent “agents” that complete “actions” in an unpredictable, complex environment to reach a desired “outcome.” Whenever an agent completes an action, it receives a “reward” or “penalty,” depending on whether the action works toward the outcome. The agent then adjusts its actions accordingly to achieve that outcome.

Rewards and penalties are basically positive and negative numbers, say +1 or -1. Their values vary with the action taken and are calculated from the probability of succeeding or failing at the outcome, among other factors. The agent is essentially trying to numerically optimize its actions, based on these reward and penalty values, to reach the maximum outcome score for a given task.
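The reward-and-penalty loop described above can be illustrated with tabular Q-learning, one common RL algorithm. The sketch below is a generic toy, not the MIT model: the five-cell corridor, the +1/-1 rewards, and the learning settings (`alpha`, `gamma`, `epsilon`) are all illustrative assumptions. The agent tries actions, receives +1 for reaching the goal or -1 for falling off the far end, and nudges its value estimates until moving toward the goal scores highest.

```python
import random

# Hypothetical toy task: a corridor of cells 0..4, starting in cell 2.
# Reaching cell 4 earns +1; stepping left off cell 0 earns -1. Both end the episode.
N_CELLS = 5
START = 2
ACTIONS = (-1, +1)  # move left or move right


def step(state, action):
    """Apply an action and return (next_state, reward, done)."""
    nxt = state + action
    if nxt < 0:
        return state, -1.0, True   # penalty: fell off the left end
    if nxt == N_CELLS - 1:
        return nxt, 1.0, True      # reward: reached the goal cell
    return nxt, 0.0, False


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning: learn a value for every (state, action) pair."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_CELLS) for a in ACTIONS}
    for _ in range(episodes):
        state, done = START, False
        while not done:
            # Epsilon-greedy: usually take the best-known action, sometimes explore.
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                action = max(ACTIONS, key=lambda a: q[(state, a)])
            nxt, reward, done = step(state, action)
            # Target = immediate reward plus discounted value of the best next action.
            target = reward if done else reward + gamma * max(q[(nxt, a)] for a in ACTIONS)
            q[(state, action)] += alpha * (target - q[(state, action)])
            state = nxt
    return q


q = train()
policy = {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(N_CELLS - 1)}
print(policy)  # greedy action per cell; +1 means "move right"
```

After training, the greedy action in the cells nearest the goal should be +1; cells the agent rarely revisits can take longer to settle, which is one reason real systems run many thousands of episodes.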

The approach was used to train AlphaGo, the DeepMind computer program that in 2016 made headlines for beating one of the world’s best human players at the game Go. It’s also used to train driverless cars in maneuvers such as merging into traffic or parking, where the vehicle practices over and over, adjusting its course, until it gets it right.

The researchers adapted an RL model for glioblastoma treatments that use a combination of the drugs temozolomide (TMZ) and procarbazine, lomustine, and vincristine (PVC), administered over weeks or months.

The model’s agent combs through traditionally administered regimens. These regimens follow protocols that have been used clinically for decades and are based on animal testing and various clinical trials. Oncologists use these established protocols to determine what dose to give each patient based on weight.

As the model explores the regimen, at each planned dosing interval — say, once a month — it decides on one of several actions. It can, first, either initiate or withhold a dose. If it does administer, it then decides if the entire dose, or only a portion, is necessary. At each action, it pings another clinical model — often used to predict a tumor’s change in size in response to treatments — to see if the action shrinks the mean tumor diameter. If it does, the model receives a reward.
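A single dosing decision of the kind just described might look like the following. Everything here is a hypothetical stand-in: the four dose fractions, the growth and kill rates in `toy_tumor_model`, and the 30 mm starting diameter are invented for illustration and are not the clinical model the researchers queried.

```python
# Hypothetical action set: the fraction of the standard dose given at one interval.
ACTIONS = (0.0, 0.25, 0.5, 1.0)  # 0.0 means "withhold the dose"


def toy_tumor_model(diameter_mm, dose_fraction, growth=0.02, kill=0.10):
    """Made-up stand-in for the clinical tumor-response model.

    Untreated, the tumor grows by `growth` per interval; a dose shrinks it
    by up to `kill`, in proportion to the fraction of the dose given.
    """
    return diameter_mm * (1 + growth - kill * dose_fraction)


def unpenalized_reward(diameter_mm, dose_fraction):
    """+1 if the action shrinks the mean tumor diameter, otherwise 0."""
    new_diameter = toy_tumor_model(diameter_mm, dose_fraction)
    return 1.0 if new_diameter < diameter_mm else 0.0


for dose in ACTIONS:
    print(dose, unpenalized_reward(30.0, dose))
```

Because every dose that shrinks the tumor earns the same +1, this signal on its own gives the agent no reason to prefer a partial dose over a full one.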

However, the researchers also had to make sure the model doesn’t just dish out the maximum number and potency of doses. The model is therefore penalized whenever it chooses to administer all full doses, which pushes it toward fewer, smaller doses. “If all we want to do is reduce the mean tumor diameter, and let it take whatever actions it wants, it will administer drugs irresponsibly,” Shah says. “Instead, we said, ‘We need to reduce the harmful actions it takes to get to that outcome.’”

This represents an “unorthodox RL model, described in the paper for the first time,” Shah says, that weighs potential negative consequences of actions (doses) against an outcome (tumor reduction). Traditional RL models work toward a single outcome, such as winning a game, and take any and all actions that maximize that outcome. By contrast, at each action the researchers’ model has the flexibility to choose a dose that does not solely maximize tumor reduction but instead balances tumor reduction against low toxicity. This technique, he adds, has various medical and clinical trial applications, where actions for treating patients must be regulated to prevent harmful side effects.
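One simple way to express the trade-off Shah describes is to subtract a toxicity penalty from the benefit of each dose. The sketch assumes a square-root (diminishing-returns) shrinkage curve and a linear penalty weight `penalty`; neither form is taken from the paper. Raising the penalty moves the best action from the full dose toward smaller doses and eventually to withholding the dose.

```python
import math

ACTIONS = (0.0, 0.25, 0.5, 1.0)  # fraction of the standard dose


def shrinkage(dose_fraction):
    """Hypothetical relative tumor shrinkage with diminishing returns (square root)."""
    return 0.1 * math.sqrt(dose_fraction)


def penalized_reward(dose_fraction, penalty):
    """Benefit (tumor shrinkage) minus a toxicity penalty proportional to the dose."""
    return shrinkage(dose_fraction) - penalty * dose_fraction


def best_action(penalty):
    """Pick the dose with the highest penalized reward for a given penalty weight."""
    return max(ACTIONS, key=lambda a: penalized_reward(a, penalty))


for penalty in (0.0, 0.1, 0.3):
    print(penalty, best_action(penalty))
```

The diminishing-returns assumption matters: with a purely linear shrinkage curve, the best choice would flip straight from full dose to no dose, with no partial doses in between.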

Optimal regimens

The researchers trained the model on 50 simulated patients, randomly selected from a large database of glioblastoma patients who had previously undergone traditional treatments. For each patient, the model conducted about 20,000 trial-and-error test runs. Through this training, the model learned parameters for optimal regimens. When given new patients, the model used those parameters to formulate new regimens based on various constraints the researchers provided.

The researchers then tested the model on 50 new simulated patients and compared the results to those of a conventional regimen using both TMZ and PVC. When given no dosage penalty, the model designed regimens nearly identical to those of human experts. Given small and large dosing penalties, however, it substantially cut the frequency and potency of doses while still reducing tumor sizes.
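The penalty experiment can be mimicked over a whole schedule. In this sketch an exhaustive search over every six-interval regimen stands in for the RL agent's trial-and-error runs (feasible only because the toy has 4^6 = 4,096 schedules); the tumor dynamics, horizon, and penalty values are assumptions chosen for illustration, not the study's settings.

```python
import itertools
import math

ACTIONS = (0.0, 0.25, 0.5, 1.0)  # fraction of the standard dose per interval
HORIZON = 6                       # hypothetical: six monthly dosing intervals


def final_diameter(regimen, d0=30.0, growth=0.02, kill=0.10):
    """Made-up tumor dynamics: steady growth, dose effect with diminishing returns."""
    d = d0
    for dose in regimen:
        d *= 1 + growth - kill * math.sqrt(dose)
    return d


def score(regimen, penalty, d0=30.0):
    """Relative tumor reduction minus a penalty on the total dose administered."""
    reduction = (d0 - final_diameter(regimen, d0)) / d0
    return reduction - penalty * sum(regimen)


def best_regimen(penalty):
    """Exhaustively search all schedules and return the highest-scoring one."""
    schedules = itertools.product(ACTIONS, repeat=HORIZON)
    return max(schedules, key=lambda r: score(r, penalty))


for label, penalty in (("none", 0.0), ("small", 0.01), ("large", 1.0)):
    regimen = best_regimen(penalty)
    print(label, regimen, "total dose:", sum(regimen))
```

With no penalty the search returns a full dose at every interval, and with a large penalty it withholds every dose; intermediate penalties land in between. Because these toy dynamics ignore timing, schedules containing the same doses in a different order tie.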

The researchers also designed the model to treat each patient individually, as well as in a single cohort, and achieved similar results (medical data for each patient was available to the researchers). Traditionally, the same dosing regimen is applied to groups of patients, but differences in tumor size, medical histories, genetic profiles, and biomarkers can all change how a patient is treated. These variables are not considered during traditional clinical trial designs and other treatments, often leading to poor responses to therapy in large populations, Shah says.

“We said [to the model], ‘Do you have to administer the same dose for all the patients?’ And it said, ‘No. I can give a quarter dose to this person, half to this person, and maybe we skip a dose for this person.’ That was the most exciting part of this work, where we are able to generate precision medicine-based treatments by conducting one-person trials using unorthodox machine-learning architectures,” Shah says.

Basketball players need lots of practice before they master the dribble, and it turns out that’s true for computer-animated players as well. By using deep reinforcement learning, players in basketball video games can glean insights from motion-capture data to sharpen their dribbling skills.


Researchers at Carnegie Mellon University and DeepMotion Inc., a California company that develops smart avatars, have for the first time developed a physics-based, real-time method for controlling animated characters that can learn dribbling skills from experience. In this case, the system learns from motion capture of the movements performed by people dribbling basketballs.

This trial-and-error learning process is time consuming, requiring millions of trials, but the results are arm movements that are closely coordinated with physically plausible ball movement. Players learn to dribble between their legs, dribble behind their backs and do crossover moves, as well as how to transition from one skill to another.

“Once the skills are learned, new motions can be simulated much faster than real-time,” said Jessica Hodgins, Carnegie Mellon professor of computer science and robotics.

Hodgins and Libin Liu, chief scientist at DeepMotion, will present the method at SIGGRAPH 2018, the Conference on Computer Graphics and Interactive Techniques, Aug. 12–18, in Vancouver.

“This research opens the door to simulating sports with skilled virtual avatars,” said Liu, the report’s first author. “The technology can be applied beyond sport simulation to create more interactive characters for gaming, animation, motion analysis, and in the future, robotics.”

Motion capture data already add realism to state-of-the-art video games. But these games also include disconcerting artifacts, Liu noted, such as balls that follow impossible trajectories or that seem to stick to a player’s hand.

A physics-based method has the potential to create more realistic games, but getting the subtle details right is difficult. That’s especially so for dribbling a basketball, because player contact with the ball is brief and finger position is critical. Some details, such as the way a ball may continue spinning briefly when it makes light contact with the player’s hands, are tough to reproduce. And once the ball is released, the player has to anticipate when and where the ball will return.

Liu and Hodgins opted to use deep reinforcement learning to enable the model to pick up these important details. Artificial intelligence programs have used this form of deep learning to figure out a variety of video games, and the AlphaGo program famously employed it to master the board game Go.

The motion capture data used as input showed people doing things such as rotating the ball around the waist, dribbling while running, and dribbling in place both with the right hand and while switching hands. This capture data did not include the ball’s movement, which Liu explained is difficult to record accurately. Instead, the researchers used trajectory optimization to calculate the ball’s most likely paths for a given hand motion.
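As a much-simplified illustration of inferring the ball's path from hand motion, the sketch below assumes the hand releases the ball at height `h` and catches it at the same height a known time later (values the hand motion would supply), and solves for the downward push speed that makes a single-bounce ballistic path fit. It uses bisection (a shooting method) rather than the full trajectory optimization the researchers used; the 0.9 m height and 0.8 restitution coefficient are assumptions, and air drag and spin are ignored.

```python
import math

G = 9.81  # gravitational acceleration, m/s^2


def return_time(v0, h, e):
    """Time for a ball pushed straight down at speed v0 (m/s) from height h (m)
    to hit the floor once and climb back up to height h.

    e is the coefficient of restitution. Returns math.inf if the rebound is
    too weak for the ball to get back to the hand.
    """
    v_impact = math.sqrt(v0 ** 2 + 2 * G * h)  # speed when the ball reaches the floor
    t_down = (v_impact - v0) / G
    v_up = e * v_impact                         # speed just after the bounce
    disc = v_up ** 2 - 2 * G * h
    if disc < 0:
        return math.inf
    t_up = (v_up - math.sqrt(disc)) / G         # first time the ball is back at height h
    return t_down + t_up


def solve_push_speed(target_time, h=0.9, e=0.8, v_max=20.0, iters=60):
    """Bisection on the push speed: harder pushes bring the ball back sooner."""
    lo, hi = 0.0, v_max
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if return_time(mid, h, e) > target_time:
            lo = mid  # ball returns too late (or never): push harder
        else:
            hi = mid  # ball returns early enough: try a softer push
    if abs(return_time(hi, h, e) - target_time) > 1e-6:
        raise ValueError("no push speed in range gives that return time")
    return hi


v0 = solve_push_speed(0.5)
print(f"push speed {v0:.3f} m/s -> ball back in hand after {return_time(v0, 0.9, 0.8):.3f} s")
```

Bisection works here because, in this simplified model, the return time falls steadily as the push gets harder; a richer model with spin and hand contact would need a genuine optimizer.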

The program learned the skills in two stages — first it mastered locomotion and then it learned how to control the arms and hands and, through them, the motion of the ball. This decoupled approach is sufficient for actions such as dribbling or perhaps juggling, where the interaction between the character and the object doesn’t have an effect on the character’s balance. Further work is required to address sports, such as soccer, where balance is tightly coupled with game maneuvers, Liu said.

More about this research is available on the Carnegie Mellon Graphics website, and DeepMotion’s blog has a step-by-step article about the AI technology.


KUCHING: In conjunction with International Youth Day, Microsoft Malaysia hosted underserved youth from various communities at its office, with the aim of giving them opportunities to learn about careers in technology, connect with Microsoft employees, and participate in hands-on technology workshops.

Known as the Microsoft pop-up Digital Skills Hub, the event brought together 120 students aged nine to 17 from low-income (B40) communities to learn digital skills, as well as the basics of coding and computational thinking through the Minecraft Hour of Code session.

Commemorating the event, Dr Shanmugasiva, director, MySkills Foundation, said, “It brings us great joy to collaborate with Microsoft, to co-create opportunities for young people, and bridge the opportunity divide, with an initiative which is aligned to our vision to enhance the lives of Malaysian youth.

“This initiative is a good platform for children from underserved communities to learn the digital skills required to prepare them for the future workforce. We do hope that our efforts can continue to align in the future, to enable much needed betterment to the livelihoods of youth from underserved communities across Malaysia.”

The event is in line with the United Nations’ goal to create ‘Safe Spaces for Youth’ as well as the Sustainable Development Goals aimed at skills development, wellbeing, and gender equality, including essential psychological and physical development of young people.

According to UNICEF’s report titled ‘Children Without’, almost seven per cent of children in Malaysia live in absolute poverty.

Initiatives like this pop-up Digital Skills Hub provide safe spaces for urban poor youth and help develop their skills for the future, allowing them to translate technology into economic opportunity.

Through this effort, Microsoft intends to continue partnering with interested parties on the creation of physical and digital safe spaces. The objective is to not only build essential cognitive and social skills but also help develop technological knowledge among the nation’s youths and equip them with skills to safely navigate digital spaces.

The key area of focus will be bridging the opportunity divide, by enabling exposure to digital skills.

The initiative was for-youth, by-youth and saw the attendees participating in a variety of fun and educational activities organized by Microsoft interns, with support from Microsoft employees and non-profit partners.

Besides the Hour of Code session, they also engaged in a Digital Safety session and heard about the journey of various speakers towards breaking the cycle of poverty, by leveraging the transformational power of education.

These sharing sessions provided insight into the positive impact of education, whilst raising awareness on the impact that Science, Technology, Engineering and Mathematics (STEM) can have in students’ lives, as it increases their future-readiness, and enables them to participate in the digital economy.

Among the speakers was Imagine Cup 2018 regional winner and global finalist Yap Xien Yin, who spoke about his journey working with Team Pine to develop a sensing device that enables farmers to test the sweetness of pineapples non-invasively.

Marking the occasion, Dr. Jasmine Begum, director, Legal, Government and Corporate Affairs for Microsoft Malaysia and Emerging Markets, said, “We live in a time where we can no longer merely be technology users.

“As we head into a digital future, it is crucial that our youth are equipped with adequate knowledge, not just on manoeuvring through cyberspace, but also on being able to create and navigate digital spaces in a safe manner.

“As we transform towards becoming a generation of technology creators, at Microsoft, we believe in the democratisation of technology to empower every person and every organisation on the planet to achieve more.”

The miniaturization of video cameras has led to an explosion in their use, including their incorporation into a range of portable devices such as headcams, used in scenarios ranging from sporting events to armed combat. To analyze tasks performed in view of such devices and provide real-time guidance to individuals using them, it would be helpful to characterize where the user is actually focusing within footage at each moment in time, but the tools available to predict this are still limited.

In a new study reported at the 15th European Conference on Computer Vision (ECCV 2018), researchers at The University of Tokyo have developed a computational tool that can learn from footage taken using a headcam, in this case of various tasks performed in the kitchen, and then accurately predict where the user’s focus will next be targeted. This new tool could help video-linked technologies predict what actions the user is currently performing and provide appropriate guidance on the next step.

Existing programs for predicting where the human gaze is likely to fall within a frame of video footage have generally been based on the concept of “visual saliency,” which uses distinctions of features such as color, intensity, and contrast within the image to predict where a person is likely to be looking. However, in footage of human subjects performing complex tasks, this visual-saliency approach is inadequate, as the individual is likely to shift their attention from one object to another in a sequential, and often predictable, manner.
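The visual-saliency baseline can be sketched as a center-surround contrast score: a pixel is salient when it differs from its surroundings. This version uses intensity only, on a tiny grayscale grid; practical saliency models also combine color and orientation features at several scales, so treat it as an illustration of the idea rather than any specific published model.

```python
def contrast_saliency(image):
    """Intensity-contrast saliency: |pixel - mean of its 3x3 neighbourhood|.

    The neighbourhood includes the pixel itself and is clipped at the borders.
    """
    rows, cols = len(image), len(image[0])
    saliency = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            patch = [image[rr][cc]
                     for rr in range(max(0, r - 1), min(rows, r + 2))
                     for cc in range(max(0, c - 1), min(cols, c + 2))]
            saliency[r][c] = abs(image[r][c] - sum(patch) / len(patch))
    return saliency


def most_salient(saliency):
    """(row, col) of the highest-saliency pixel: a naive gaze prediction."""
    cells = ((r, c) for r in range(len(saliency)) for c in range(len(saliency[0])))
    return max(cells, key=lambda rc: saliency[rc[0]][rc[1]])


# A dim 5x5 frame with one bright spot at row 2, column 3.
frame = [[0.1] * 5 for _ in range(5)]
frame[2][3] = 1.0
print(most_salient(contrast_saliency(frame)))  # -> (2, 3)
```

Nothing in this score depends on what the person was doing a moment earlier, which is exactly why it struggles with footage of sequential tasks.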

To take advantage of this predictability, the team used a novel approach combining visual saliency with “gaze prediction,” in which an artificial intelligence learns such sequences of actions from existing footage and then applies that knowledge to predict the direction of the user’s gaze in new footage.

“Our new approach involves the construction of first a ‘saliency map’ for each frame of footage, then an ‘attention map’ based on where the user was previously looking and on motion of the user’s head, and finally the combination of both of these into a ‘gaze map,’” says co-author Yoichi Sato. “Our results showed that this new tool outperformed earlier alternatives in terms of predicting where the gaze of the headcam user was actually directed.”
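The three-map pipeline Sato outlines can be sketched with hand-written pieces. Here the attention map is assumed to be a Gaussian centered on the previous gaze point shifted by head motion, and the gaze map is a fixed 50/50 weighted sum; the study learns how attention shifts from footage, so `attention_map`, `gaze_map`, and their parameters are hypothetical stand-ins.

```python
import math


def normalize(m):
    """Scale a 2-D map so its entries sum to 1 (left unchanged if all zero)."""
    total = sum(sum(row) for row in m)
    return [[v / total for v in row] for row in m] if total > 0 else m


def attention_map(rows, cols, prev_gaze, head_shift, sigma=1.0):
    """Hypothetical attention prior: a Gaussian centred on the previous gaze
    point, displaced by the head motion (both given as (row, col))."""
    cr = prev_gaze[0] + head_shift[0]
    cc = prev_gaze[1] + head_shift[1]
    return [[math.exp(-((r - cr) ** 2 + (c - cc) ** 2) / (2 * sigma ** 2))
             for c in range(cols)] for r in range(rows)]


def gaze_map(saliency, attention, weight=0.5):
    """Fixed-weight blend of the two normalized maps (an assumption, not learned)."""
    s, a = normalize(saliency), normalize(attention)
    return [[weight * s[r][c] + (1 - weight) * a[r][c] for c in range(len(s[0]))]
            for r in range(len(s))]


def argmax2d(m):
    """(row, col) of the largest entry; ties go to the first in row-major order."""
    cells = ((r, c) for r in range(len(m)) for c in range(len(m[0])))
    return max(cells, key=lambda rc: m[rc[0]][rc[1]])


# Two equally salient objects: saliency alone cannot tell them apart.
saliency = [[0.0] * 8 for _ in range(3)]
saliency[1][1] = saliency[1][6] = 1.0
# The user was looking at (1, 4) and turned their head two columns to the right.
attention = attention_map(3, 8, prev_gaze=(1, 4), head_shift=(0, 2))
print(argmax2d(gaze_map(saliency, attention)))  # -> (1, 6)
```

Saliency alone ties the two objects; the attention prior, informed by the previous gaze and the head turn, breaks the tie.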

Although the team’s results were obtained for footage of chores in a kitchen, such as boiling water on a stove, they could be extended to situations such as tasks performed in offices or factories. In fact, according to lead author Yifei Huang, “Tools for evaluating so-called egocentric videos of this kind could even be applied in a medical context, such as assessing where a surgeon is focusing and offering guidance on the most appropriate steps to be taken next in an operation.”

The article “Predicting Gaze in Egocentric Video by Learning Task-dependent Attention Transition” is published in the proceedings of the European Conference on Computer Vision (ECCV 2018) and as an arXiv paper at arxiv.org/abs/1803.09125.



(Reuters) – Researchers backed by Tesla Inc founder Elon Musk and Silicon Valley financier Sam Altman have found a novel way to use software to teach a human-like robotic hand new tasks, a discovery that could eventually make it more economical to train robots to do things that are easy for humans.

OpenAI, a nonprofit artificial intelligence research group founded in 2015, said on Monday that it had taught a robotic hand to rotate a lettered, multi-colored block until a desired side of the block faces upward.

The task was simple, but the advance lay in how the hand gained the skill: all the learning happened in a software simulation and was then transferred to the physical world with relative ease.


The work tackles a long-standing challenge for robotic hands, which look like the fist of the robot in the 1980s “Terminator” science fiction film. The hands have been commercially available for years but are difficult for engineers to program. Engineers can either write specific computer code for each new task, requiring a pricey new program each time, or equip the robots with software that lets them “learn” through physical training.

Physical training takes months or years, and has problems of its own – for example, if a robot hand drops a workpiece, a human needs to pick it up and put it back. This, too, is expensive. Researchers have sought to chop up those years of physical training and distribute them to multiple computers for a software simulation that can do the training in hours or days, without human help.

Ken Goldberg, a University of California Berkeley robotics professor who was not involved in the OpenAI research but reviewed it, called the OpenAI work released Monday “an important result” in getting closer to that goal.

“That’s the beauty of having lots of computers crunching on this,” Goldberg said. “You don’t need any robots. You just have lots of simulation.”

A key advance in the OpenAI research was transferring the robot hand’s software learning to the real world, overcoming what OpenAI researchers call the “reality gap” between the simulation and physical tasks. Researchers injected random noise into the software simulation, making the robot hand’s virtual world messy enough that it was not befuddled by the unexpected in the real world.

“Now we’re looking for more complicated tasks to conquer,” said Lilian Weng, a member of the technical staff at OpenAI who worked on the research.

(Reuters) – Researchers backed by Tesla Inc founder Elon Musk and Silicon Valley financier Sam Altman have found a novel way to use software to teach a human-like robotic hand new tasks, a discovery that could eventually make it more economical to train robots to do things that are easy for humans.

Researchers at OpenAI, a nonprofit artificial intelligence research group founded in 2015, said on Monday they had taught a robotic hand to rotate a lettered, multi-colored block until a desired side of the block faces upward.

The task is simple. But the advance was in how the hand gained the skill: All the learning happened in a software simulation and was then transferred to the physical world with relative ease.

That solves a challenge for robotic hands, which look like the fist of a robot from the 1980s “Terminator” science fiction film. The hands have been commercially available for years but are difficult for engineers to program. Engineers can write specific computer code for each new task, which requires a pricey new program each time. Or robots can be equipped with software that lets them “learn” through physical training.

Physical training takes months or years and has problems of its own – for example, if a robot hand drops a workpiece, a human needs to pick it up and put it back. That is expensive as well. Researchers have instead sought to compress those years of physical training into a software simulation that, spread across many computers, can do the training in hours or days without human help.

Ken Goldberg, a University of California Berkeley robotics professor who was not involved in the OpenAI research but reviewed it, called the OpenAI work released Monday “an important result” in getting closer to that goal.

“That’s the beauty of having lots of computers crunching on this,” Goldberg said. “You don’t need any robots. You just have lots of simulation.”

A key advance in the OpenAI research was transferring the robot hand’s software learning to the real world, overcoming what OpenAI researchers call the “reality gap” between the simulation and physical tasks. Researchers injected random noise into the software simulation, making the robot hand’s virtual world messy enough that it was not befuddled by the unexpected in the real world.
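
The article describes the fix for the reality gap only as injecting random noise into the simulation; this family of techniques is commonly called domain randomization. The short Python sketch below is a toy illustration of the idea, not OpenAI's method. It assumes a hypothetical one-dimensional "robot" whose state moves toward a target by x = x + g * k * (target - x), where k is the only parameter the controller learns and g is an unknown physical property (think motor strength or friction) that differs between simulation and reality. The names episode_cost and train, the grid search, and every number are illustrative assumptions.

```python
import random

TARGET = 1.0  # hypothetical goal state (e.g. desired block orientation)
STEPS = 20    # control steps per simulated episode


def episode_cost(k: float, gain: float) -> float:
    """Sum of squared tracking errors for one episode of a toy plant.

    Assumption: the plant obeys x <- x + gain * k * (TARGET - x), where
    `gain` stands in for physics the simulator may get wrong and `k` is
    the single parameter our toy controller learns.
    """
    x = 0.0
    cost = 0.0
    for _ in range(STEPS):
        x += gain * k * (TARGET - x)
        cost += (TARGET - x) ** 2
    return cost


def train(gains: list[float], candidates: list[float]) -> float:
    """Grid search: pick the k with the lowest mean cost over the given plants."""
    return min(candidates,
               key=lambda k: sum(episode_cost(k, g) for g in gains) / len(gains))


candidates = [i / 100 for i in range(1, 201)]  # k from 0.01 to 2.00

# 1) Train in one idealized simulator whose physics is exactly gain = 1.0.
nominal_k = train([1.0], candidates)

# 2) Domain randomization: train across many simulators with randomly
#    perturbed physics (hypothetical range 0.5 to 2.5).
rng = random.Random(0)
randomized_gains = [rng.uniform(0.5, 2.5) for _ in range(200)]
robust_k = train(randomized_gains, candidates)

# 3) The "real world": a plant gain the idealized simulator never modeled.
real_gain = 2.2
print(f"nominal k={nominal_k:.2f}  real-world cost={episode_cost(nominal_k, real_gain):.3g}")
print(f"robust  k={robust_k:.2f}  real-world cost={episode_cost(robust_k, real_gain):.3g}")
```

Tuned only in the idealized simulator, the controller picks k = 1.0, which is perfect there but overshoots ever more wildly on the unseen plant with gain 2.2. Trained across randomized simulators, it settles on a smaller, more cautious gain that still converges on that plant, which is the kind of robustness the messier simulation is meant to buy.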

“Now we’re looking for more complicated tasks to conquer,” said Lilian Weng, a member of the technical staff at OpenAI who worked on the research.