Ubisoft La Forge is an open research and development initiative that brings together scholars and Ubisoft experts, with the driving objective of bridging the gap between academic research and videogame innovation. Experimenting with the latest technologies and techniques in videogame production, its dedicated teams investigate how emerging technology such as artificial intelligence can make games more realistic, fun, and efficient to develop.
Deep reinforcement learning is one of those potential uses: a type of machine learning in which an AI learns, through trial and error, to find efficient solutions to a variety of problems. To unravel some of its mysteries and find out how it helps create more realistic NPCs, helps them navigate complex game worlds, and produces more human-like reactions, we spoke with Joshua Romoff, a data scientist at Ubisoft La Forge and a Montreal native who turned his love of videogames into a data science PhD. Now researching the different applications of machine learning in games, he recently gave a talk at the Artificial Intelligence and Interactive Digital Entertainment (AIIDE) 2021 conference to present the breakthrough he and his team have been working on to improve NPCs’ pathfinding and navigation using machine learning.
What is deep reinforcement learning, and how does it work?
Joshua Romoff: There are a couple of terms to explain: “agent” and “action.” What we call an agent in AI is basically the main character that interacts with the world, and we use bots in that role in our research. And then for action, that is the interaction that is performed. I like to think of the player as kind of an extension of a gamepad, and every input a player puts through the gamepad results in an action.
Let’s focus on the reinforcement learning part: It’s the idea that you’re trying to reinforce some kind of behavior, similar to the classic Pavlov’s dogs experiment where a researcher rings a bell at the dogs’ feeding time and the dogs learn to associate the bell with a reward. You’re trying to encourage or discourage certain kinds of outcomes with rewards and penalties. We do the same with an AI agent, giving it points for doing something we like, or taking points away for something we don’t. My job is to design the tests and define when we give or take away rewards, and the goal of the AI is to get the highest score it can with the actions available.
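To make that concrete, here’s a minimal sketch of what such a scoring rule could look like in code; the event names and point values below are hypothetical illustrations, not taken from Ubisoft’s actual experiments.

```python
# Minimal sketch of reward design: the researcher decides which outcomes earn
# points and which lose them, and the agent learns to maximize its total score.
# Event names and values are hypothetical, purely to illustrate the idea.
def score_frame(events: set) -> float:
    score = 0.0
    if "behavior_we_like" in events:
        score += 10.0   # reinforce what we want to encourage
    if "behavior_we_dont" in events:
        score -= 5.0    # penalize what we want to discourage
    return score

print(score_frame({"behavior_we_like"}))                      # 10.0
print(score_frame({"behavior_we_like", "behavior_we_dont"}))  # 5.0
```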
For the deep part, that’s the way the agent perceives the world it’s in; a deep neural network, essentially. A screen is a complex thing, with potentially hundreds of thousands of pixels being displayed at once. So, how do you process that screen and all that input? A deep neural network takes the screen, processes it down into something of a much smaller dimension, analyzes the data, and then feeds that information into the reinforcement-learning part, which then performs actions based on that input data. That’s what we call an end-to-end system, because everything is contained, and the data loops around between these systems, one end to the other and back again. We do that each frame, assigning points based on the actions and the resulting state of the environment, and perform many iterations to train the agent to perform the actions we want it to.
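As a rough illustration of that end-to-end loop, here’s a toy PyTorch sketch under stated assumptions: the network sizes, the eight-action gamepad, the random “screen,” and the random reward are all stand-ins for a real renderer and a real game state scoring the agent.

```python
# Toy end-to-end loop: each frame, a small convolutional network compresses the
# screen into a compact feature vector, a policy head picks an action, and the
# reward for the resulting state reinforces that choice (REINFORCE-style).
# The environment here is faked with random tensors for illustration.
import torch
import torch.nn as nn

class PixelPolicy(nn.Module):
    def __init__(self, num_actions: int):
        super().__init__()
        # "Deep" part: reduce a 64x64 RGB screen to a small feature vector.
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=4, stride=2), nn.ReLU(),
            nn.Flatten(),
            nn.Linear(32 * 6 * 6, 128), nn.ReLU(),
        )
        # Reinforcement-learning part: map features to action probabilities.
        self.policy_head = nn.Linear(128, num_actions)

    def forward(self, screen: torch.Tensor) -> torch.distributions.Categorical:
        return torch.distributions.Categorical(
            logits=self.policy_head(self.encoder(screen))
        )

policy = PixelPolicy(num_actions=8)  # e.g. 8 possible gamepad inputs
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-4)

for frame in range(1000):                       # many training iterations
    screen = torch.rand(1, 3, 64, 64)           # stand-in for the rendered frame
    dist = policy(screen)                       # screen -> action distribution
    action = dist.sample()                      # the "gamepad input" this frame
    reward = torch.rand(())                     # stand-in: env scores the new state
    loss = -(dist.log_prob(action) * reward).sum()  # reinforce rewarded actions
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```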
Are there any games that particularly inspired you in your study of deep reinforcement learning?
JR: For sure. I’ve always been into open-world games where you get to run around and interact with NPCs, stuff like Far Cry, for example. One thing that always stands out to me in those kinds of games is how players interact with the AIs of the NPCs, and it’s a core factor of the experience to me. You’re interacting with the world, and the NPCs are a big part of that interaction. I always liked messing with NPCs and trying to bug out the AI as a kind of challenge, seeing how I can manipulate them. So, if I’m in a battle with an enemy and decide to climb up a nearby mountain, then I just watch the enemy crash into the mountain in front of me because it can’t climb, or see what reaction it has to different events. That’s always been something that’s driven me in my work, imagining how we can improve that and train an AI to behave more like humans do.
As an R&D scientist, what’s your day-to-day like?
JR: Day-to-day, I could be running experiments, getting my hands dirty and training what we call an “AI agent” to perform a certain task in a game. Once that experiment is set up, it’s a lot of observation: watching plots and graphs, and tweaking things to refine the results. Another big part of my role is working with master’s and PhD students who are pursuing their degrees. All our students are paid, and I work with them and their professors to define projects for them; we’ll usually have a bunch of student projects going at the same time, which helps the students, but also helps push what we’re doing forward. I mean, I can’t code up everything by myself, right? Once we have a working prototype, we put the tech inside a sandbox environment, which is basically a simplified version of an actual game engine, and we can see the results of the work we’ve been doing. If a project works out, it’s a chance for the students’ work to appear in the games we develop and for them to get some experience of what it’s like to work on games, so we always try to make sure that the projects we’re working on result in something that game teams can use in their productions.
In your AIIDE talk, you went over how you did some tests in games like Hyper Scape to create more “player-like” bots. Can you talk us through it?
JR: We did some testing in Hyper Scape – though nothing on live servers; the game just happened to present an interesting sandbox for questions we wanted to answer. The thing that is really cool about Hyper Scape is that the 3D environment is quite complex to navigate and has a lot of verticality to it. Players have a lot of tools available to them as well, things like jump pads that propel you straight up in the air, and double jumps, and you can use those to navigate to the tops of buildings. You can combine those things, too, so it’s really valuable for a game developer or tester to know that the map they have created allows players to navigate the whole thing.
Traditionally, games use what’s called a navmesh, kind of a 2D map of all the traversable areas in a world, and that data allows bots to define where they go and how they get there. But it was really hard to do tests with that method, because when you have all these crazy actions like jump pads and double jumps, plus floors stacked vertically that aren’t always connected by stairs or ramps, the number of possible combinations explodes. Using deep RL made sense, because we could throw the agent into a training loop and it would learn how to take actions to get from point A to point B by itself, without using a navmesh. So the primary use case was essentially us teaching an agent these movements, and using that to test the map and make sure everything is accessible.
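A hedged sketch of the kind of reward that could drive this point-A-to-point-B training: a big payoff for arriving, plus a small penalty each tick so faster routes score higher. The arrival radius, the values, and the coverage-test idea below are assumptions for illustration, not the team’s published setup.

```python
# Navigation without a navmesh: reward reaching point B, lightly penalize each
# tick so the agent learns the fastest route with whatever moves it has
# (double jumps, jump pads, ...). All values here are illustrative assumptions.
import math

ARRIVAL_RADIUS = 2.0    # how close (in world units) counts as "reached point B"

def navigation_reward(agent_pos, goal_pos) -> float:
    if math.dist(agent_pos, goal_pos) <= ARRIVAL_RADIUS:
        return 100.0    # reached the goal
    return -0.1         # per-tick time penalty: faster routes score higher

# Map-testing idea: sample many (A, B) pairs and flag any pair the trained
# agent never reaches within a time budget as a potentially unreachable spot.
print(navigation_reward((0.0, 0.0, 0.0), (1.0, 1.0, 0.5)))  # within radius: 100.0
```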
We understand you saw some interesting results in some of your tests with other games. Can you tell us about those?
JR: One example was a bot we trained in For Honor, actually. We wanted the agent to defend more, so we gave it a bonus reward for doing that. What ended up happening was that the agent decided to never end the fight, and kept defending forever and ever. It’s really funny, because one of the main challenges of training agents with this process is that, whatever setup you give it and action you’re trying to incentivize, it’s in theory going to learn how to do that as best as it possibly can. If you give it a reward for staying alive or defending, it will keep doing that, because you’re rewarding it for that. You don’t necessarily want the bot to just beat every player every time, right? That wouldn’t be fun, so you want to incentivize other types of behaviors, like defending, that add some variability to its actions.
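The arithmetic behind that defend-forever bot is easy to sketch: if a per-step defend bonus can accumulate without limit, stalling eventually out-scores winning. The numbers below are made up to show the failure mode, not taken from the real experiment.

```python
# Why a "defend bonus" can teach an agent to never end the fight: a small
# per-step bonus compounds forever, so stalling can beat the one-time win
# reward. All values are invented to illustrate the failure mode.
WIN_REWARD = 100.0
DEFEND_BONUS = 0.5      # awarded for every step spent defending

def total_reward(steps_defending: int, won: bool) -> float:
    return steps_defending * DEFEND_BONUS + (WIN_REWARD if won else 0.0)

print(total_reward(steps_defending=50, won=True))       # 125.0: defend, then win
print(total_reward(steps_defending=10_000, won=False))  # 5000.0: defend forever
```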
The other reason you might give these little bonus rewards is because it can speed up the training process, so it’s easy to just give it a little bonus here for defending and there for attacking – but it’s not obvious how all of these bonuses will combine, and you can end up with these really funny behaviors. Another example was in Hyper Scape, with the navigation tests. We were training the agent to get between two points as quickly as possible, but we hadn’t given it the ability to sprint yet, and it figured out that if it moved by jumping and doing these little camera twists, it was actually able to move a little faster than just walking. So it was really fun to watch it essentially learn to bunnyhop. Both of those examples are in my talk at AIIDE.
Are those kinds of results still valuable in the process?
JR: It depends on what the application is. If it’s to test the game, as our experiments were, these results are very useful, because you’ll see what the optimal behavior is based on the rewards you give. You might notice these things it’s learning and figure out that the behavior is actually helping the agent achieve its goal, which could point to something you weren’t aware of, allowing you to debug and know if your code is working as expected.
Have the latest generation of games consoles and things like cloud and streaming services opened up new possibilities that weren’t previously available for AI in games?
JR: One hundred percent, yes. Historically speaking, deep learning research started in the ‘80s and ‘90s, and researchers were definitely bottlenecked by the computing resources that were available. If you’re trying to run a deep-learning model on an older-generation console, it just wouldn’t be possible to do that locally, from a computational perspective – it would kill the framerate. The amount of computational power that people have in their homes has drastically increased, and the hardware itself has drastically improved. With the combination of those things, and the vast amounts of research being poured into this field, we’re at the point where we can solve these problems and have things like bots that navigate really complicated maps in 3D worlds with all these crazy abilities. Now it can run fairly efficiently, and act much more human-like than anything we could just hardcode, and it’s not crazy to think you can have multiple agents running around in a game doing all this complex computation. It’s no longer a question of something that could happen in 10 years; the research and hardware are there, and have been building up to where we are today for a while.
What other applications could you imagine using these techniques for?
JR: The most natural application is bots, and that’s why we’re focusing on it. My group is actually called the Smart Bots Group, so we’re very big on bots. We’re working on bots used for testing games, but you could easily imagine that if you teach a bot to navigate an environment, it could then potentially be scaled up and put in front of players as an AI enemy.
Besides bots, reinforcement learning is a very general framework with a lot of applications. So I could imagine, for example, using it for server management. When you’re hosting servers for a game, it’s a problem if you have too many servers running when you don’t need them, or the reverse, when you have a lot of players and not enough servers deployed. We could theoretically train an agent to optimize that kind of sequential decision-making: getting it to look at the number of players at certain times of day, then increasing or decreasing the number of servers online at a given moment based on demand.
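To sketch how that could be framed, here’s a toy formulation of server scaling as a reinforcement-learning problem: the state captures time of day and player count, the actions add or remove servers, and the reward balances hosting cost against players left unserved. Every name, capacity, and cost below is a hypothetical assumption, not a description of an actual Ubisoft system.

```python
# Toy RL framing of server scaling. State: time of day, player count, servers
# online. Actions: remove one, hold, or add one server. Reward: penalize both
# paying for idle servers and (more heavily) turning players away.
from dataclasses import dataclass

@dataclass
class State:
    hour: int       # time of day, 0-23
    players: int    # current concurrent players
    servers: int    # servers currently online

ACTIONS = (-1, 0, +1)           # remove a server, hold, add a server
PLAYERS_PER_SERVER = 100        # assumed capacity per server
COST_PER_SERVER = 1.0           # assumed hosting cost per server per step
PENALTY_PER_UNSERVED = 5.0      # assumed cost of a player who can't connect

def reward(state: State) -> float:
    capacity = state.servers * PLAYERS_PER_SERVER
    unserved = max(0, state.players - capacity)
    return -(state.servers * COST_PER_SERVER + unserved * PENALTY_PER_UNSERVED)

print(reward(State(hour=3, players=150, servers=1)))   # -251.0: players turned away
print(reward(State(hour=3, players=150, servers=5)))   # -5.0: slight over-provision
```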
What are your goals for the future of this technology?
JR: The goal is really to keep working on ways we can inject more realism into games, making things like NPCs and bots feel more human, and solving problems that haven’t been solvable until now. We also want to get this tech into the hands of game designers and create player-facing tools with it. It could become another tool in the repertoire available to developers, so giving them the option to customize these bots and do what they want with them is kind of the next big step, since all of the tests I’ve mentioned so far haven’t been in live environments, or in front of players yet. I think the more immediate step is getting this in front of game testers and using it to test all kinds of different scenarios, from performance issues to gameplay mechanics and more.
What are some of the implications of using AI and deep reinforcement learning in games?
JR: If we’re just using this tech to test games, then this stuff isn’t going in front of the player, and there’s no reason for anyone to worry about some of the more negative speculation people have around AI. Some might be concerned that this means there will be fewer people actually testing games, but that’s not actually true, because the types of tests that we’re going to be running with these bots are fundamentally different from the types of things we want real humans testing. The actual human interaction is not going anywhere, and people will still be testing the more interesting elements of games, like quests and other actual fun parts of a game.
In terms of actually putting AI bots into games, I think it’s really important for us to be transparent about what we’re doing. I think some people are concerned that they will start seeing bots in games and they won’t know, is this a human or a bot? It’s potentially quite controversial, which is why I think we need to be fully transparent and not try to trick players. I think the other reason to embrace AI in games is because it’s inherently this perfect little sandbox; it’s a great place to try out certain ideas and see what happens, but any unforeseen results stay within the game and don’t really affect life outside of it. So people don’t have to fear that my For Honor defense bot is going to take over the world or something; it just lives in a game, and is kind of funny.
You can check out Joshua’s full talk at AIIDE 2021 to see his work in action and learn more about deep reinforcement learning and AI. For all the latest news and updates from teams at Ubisoft, stay tuned to the Ubisoft News hub.