Artificial Intelligence (AI) is being developed at breakneck speed. While this is not necessarily a bad thing, many worry that we might lose control over AI if it is not properly regulated. Some organisations are keeping watch over the development of AI, such as OpenAI, the AI research organisation co-founded by Elon Musk. The company works on many subfields of AI, and most of us probably know it for besting some of the world’s best Dota 2 players. Now, the company has taken a step towards human-level dexterity in robots by training a pair of neural networks that solve the Rubik’s Cube with a single robot hand.
The robot hand developed by the company is called Dactyl, and how OpenAI managed to train it is quite interesting. The conventional way of training a neural network of this kind is to let it practise a task for the equivalent of years at an accelerated pace. For learning how to beat opponents in a virtual game, this approach is feasible because the software can run far faster than real time. However, in Dactyl’s case, letting a physical hand practise with a Rubik’s Cube for years was not an option, so the company trained it in simulation. “The system can handle situations it never saw during training, such as being prodded by a stuffed giraffe. This shows that reinforcement learning isn’t just a tool for virtual tasks, but can solve physical-world problems requiring unprecedented dexterity,” says OpenAI. While Dactyl’s dexterity and speed when solving the cube are nowhere near a human’s, the project is a great way to demonstrate how simulations can be used for training and to lay a foundation for general-purpose robots.
According to OpenAI, the biggest challenge it faced while training the networks was creating simulation environments that are sufficiently diverse. The simulations needed to capture the physics of the real world, including attributes like elasticity, dynamics and friction, as well as models of complex objects like Rubik’s Cubes and robotic hands. The company developed a technique called Automatic Domain Randomization (ADR), which it says endlessly generates progressively more difficult environments in simulation.
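To give a rough feel for the idea of generating progressively harder environments, here is a minimal Python sketch. It is not OpenAI's implementation: the names `ParamRange`, `adr_update` and `sample_environment`, the success-rate thresholds and the step sizes are all assumptions made for illustration. Each physics parameter starts with a narrow range, and the range is widened whenever the policy performs well, so the sampled environments gradually become more varied.

```python
import random
from dataclasses import dataclass


@dataclass
class ParamRange:
    """Randomisation range for one simulated physics parameter (hypothetical)."""
    low: float
    high: float
    step: float         # how far each bound moves per update
    floor: float = 0.0  # lower bound is never pushed below this value


def sample_environment(ranges, rng):
    """Draw one value per parameter, uniformly from its current range."""
    return {name: rng.uniform(r.low, r.high) for name, r in ranges.items()}


def adr_update(ranges, success_rate, widen_above=0.8, narrow_below=0.4):
    """Widen all ranges when the policy does well, narrow them when it struggles.

    The two thresholds are illustrative assumptions, not published values.
    """
    for r in ranges.values():
        if success_rate >= widen_above:
            r.low = max(r.floor, r.low - r.step)
            r.high = r.high + r.step
        elif success_rate <= narrow_below:
            new_low, new_high = r.low + r.step, r.high - r.step
            if new_low <= new_high:  # never let the bounds cross
                r.low, r.high = new_low, new_high


if __name__ == "__main__":
    rng = random.Random(0)
    # Start with every parameter fixed at a single nominal value.
    ranges = {
        "friction": ParamRange(1.0, 1.0, step=0.1),
        "cube_size_scale": ParamRange(1.0, 1.0, step=0.05),
    }
    for round_index in range(3):
        env = sample_environment(ranges, rng)
        # A real system would train and evaluate a policy in `env` here;
        # this sketch simply pretends the policy always succeeds.
        adr_update(ranges, success_rate=0.9)
        print(round_index, env, ranges["friction"])
```

In this sketch a single success rate drives every parameter at once. A fuller version would presumably track performance per parameter, but that level of detail goes beyond what is described here.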