Microsoft earlier this year acquired Canadian deep learning startup Maluuba to improve its artificial intelligence chops. This week, the company shared some of the progress it has made using a branch of AI called reinforcement learning.
As highlighted in the video above, researchers used the technique to obtain the highest score possible in Ms. Pac-Man, an early '80s arcade game that's notoriously difficult to crack. Maluuba's AI beat the best human score ever achieved by a factor of four.
Rather than attack Ms. Pac-Man by brute force, the Maluuba team used what's called Hybrid Reward Architecture. This method relied on 150 agents, each of which worked in parallel to master specific aspects of the game. Data collected by each agent was then fed to a top agent, something like a senior manager, that processed everything and made the final decision on where to move the game's character.
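To make the division of labor concrete, here is a minimal Python sketch of the idea described above: several small agents each score the possible moves from their own narrow point of view, and a top agent combines those scores and picks the move. Everything in it is an assumption made for illustration, not Maluuba's actual implementation: the names `make_agent` and `top_agent`, the grid coordinates, the distance-based scoring, and the choice to combine agents by simply summing their scores are all hypothetical.

```python
# Illustrative sketch only; not Maluuba's code. All names are hypothetical.

# The four moves available to the character, as (dx, dy) grid offsets.
MOVES = {"up": (0, -1), "down": (0, 1), "left": (-1, 0), "right": (1, 0)}


def make_agent(target, weight):
    """Build an agent that cares about a single object on the grid.

    A positive weight (e.g. a pellet) rewards moving closer to the target;
    a negative weight (e.g. a ghost) penalizes moving closer to it.
    """
    def agent(position):
        scores = {}
        for action, (dx, dy) in MOVES.items():
            nx, ny = position[0] + dx, position[1] + dy
            # Manhattan distance from the new position to this agent's target.
            distance = abs(target[0] - nx) + abs(target[1] - ny)
            scores[action] = weight / (1 + distance)
        return scores
    return agent


def top_agent(agents, position):
    """Sum every agent's per-move scores and return the best-scoring move.

    Summation is an assumed aggregation rule chosen for simplicity.
    """
    totals = {action: 0.0 for action in MOVES}
    for agent in agents:
        for action, score in agent(position).items():
            totals[action] += score
    return max(totals, key=totals.get)


if __name__ == "__main__":
    pellet = make_agent(target=(3, 0), weight=1.0)   # wants to reach (3, 0)
    ghost = make_agent(target=(1, 0), weight=-5.0)   # wants to avoid (1, 0)
    print(top_agent([pellet], (0, 0)))
    print(top_agent([pellet, ghost], (0, 0)))
```

With only the pellet agent, the top agent heads toward the pellet. Once the ghost agent is added, its strong negative score for stepping onto the ghost's square outweighs the pellet's pull, so the top agent picks a different move. The design point this illustrates is that no single agent needs to understand the whole game; the final decision emerges from the combined scores.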
Doina Precup, an associate professor of computer science at McGill University in Montreal, said the idea of having the agents work on different pieces of the puzzle to achieve a common goal is extremely interesting. In fact, she said the method is similar to some theories on how the human brain works and could have broad implications for teaching AIs to perform complex tasks with limited information.