'Pokémon RL Edition,' a reinforcement learning project that takes on 'Pokémon Red,' clears the game with an agent about 1/60,000th the parameter count of DeepSeek-V3



The 'Pokémon RL Edition' project, which aims to use AI to clear 'Pokémon Red' (released in 1996), has reported that it succeeded in clearing the game with an AI agent that has roughly 1/60,000th as many parameters as DeepSeek-V3.

Learning Pokémon With Reinforcement Learning | Pokémon RL
https://drubinstein.github.io/pokerl/

GitHub - drubinstein/pokemonred_puffer
https://github.com/drubinstein/pokemonred_puffer

Reinforcement learning is a method in which an agent learns which actions to take in order to maximize a predefined reward, and it is commonly used in AI for games such as shogi and Go. Rather than relying on a static dataset, it learns from experience collected through trial and error in a dynamic environment.
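To make the trial-and-error idea concrete, here is a minimal sketch of tabular Q-learning, a textbook reinforcement learning algorithm, on a hypothetical five-cell corridor where the only reward comes from reaching the last cell. The environment, the `step`, `train` and `greedy_policy` names, and all hyperparameters are illustrative assumptions; this is not the Pokémon RL project's algorithm, model, or environment.

```python
import random

# Hypothetical toy environment: a corridor of N_STATES cells. The agent starts
# in cell 0 and receives a reward of 1.0 only when it reaches the last cell.
N_STATES = 5
ACTIONS = (-1, +1)  # move one cell left or right


def step(state: int, action: int) -> tuple[int, float, bool]:
    """Apply an action and return (next_state, reward, done)."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    reward = 1.0 if done else 0.0
    return next_state, reward, done


def train(episodes: int = 500, alpha: float = 0.5, gamma: float = 0.9,
          epsilon: float = 0.2, seed: int = 0) -> list[list[float]]:
    """Tabular Q-learning: estimate action values purely from trial and error."""
    rng = random.Random(seed)
    q = [[0.0] * len(ACTIONS) for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            if rng.random() < epsilon:
                # Explore: try a random action.
                a = rng.randrange(len(ACTIONS))
            else:
                # Exploit: pick the best-known action, breaking ties at random.
                best = max(q[state])
                a = rng.choice([i for i, v in enumerate(q[state]) if v == best])
            next_state, reward, done = step(state, ACTIONS[a])
            # Nudge Q(s, a) toward reward + discounted value of the best next action.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][a] += alpha * (target - q[state][a])
            state = next_state
    return q


def greedy_policy(q: list[list[float]]) -> list[int]:
    """Best-known action for every non-terminal cell."""
    return [ACTIONS[max(range(len(ACTIONS)), key=lambda i: q[s][i])]
            for s in range(N_STATES - 1)]


if __name__ == "__main__":
    print(greedy_policy(train()))
```

No labeled examples are supplied: the agent generates its own experience by acting, and after training the greedy policy should move right (+1) in every cell. A game like Pokémon needs far richer observations and a learned function approximator instead of a lookup table, but the reward-driven loop is the same in spirit.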

Playing JRPGs, including Pokémon, requires complex reasoning and decision-making over long stretches of gameplay. The project team said, 'We believe that clearing JRPGs poses an extremely difficult challenge for reinforcement learning, and we hope that JRPGs will become an excellent benchmark for improving AI.'

According to the project team, Pokémon is relatively easy to work with programmatically compared with other JRPGs, so they are developing a reinforcement learning agent to complete the game using tools from the Pokémon Reverse Engineering Team (PRET) and PyBoy, a Game Boy emulator written in Python.



The project team explained that 'supervised learning requires a rich, properly labeled dataset, as well as a large model and budget,' and said that reinforcement learning was the most attractive of the approaches they considered. Reinforcement learning was also chosen because the reward function, which determines what the AI should optimize during learning, is unclear.

With reinforcement learning, the training data is almost always up to date, so there is no need to build complex data collection systems, manage large datasets, or worry about datasets going stale. Once a system that can generate new data on the fly is in place, training can begin.



In Pokémon, the agent risks ending up in situations where the game can no longer be cleared: for example, failing to catch a Pokémon that is required to progress, not having such a Pokémon in its party, collecting so many items that there is no room left for key items, or having a party whose Pokémon know only moves that deal no damage.

Therefore, in this project, the agent is scripted to perform specific actions when it encounters specific situations. The project team commented, 'We wanted to clear the game without using scripts, but some situations require human intuition that cannot be learned directly from the game.'
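One common way to combine hand-written scripts with a learned agent is to let a script override the policy whenever its trigger condition holds. The sketch below assumes exactly that arrangement; the `GameState` fields, the `toss_item_if_bag_full` rule, and the action strings are all hypothetical and do not come from the project's code.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class GameState:
    """Hypothetical, simplified snapshot of the game (not the project's real state)."""
    location: str
    free_bag_slots: int


# A script inspects the state and returns an action, or None if it does not apply.
Script = Callable[[GameState], Optional[str]]
Policy = Callable[[GameState], str]


def toss_item_if_bag_full(state: GameState) -> Optional[str]:
    """Hypothetical rule: keep room in the bag so key items can still be picked up."""
    return "TOSS_ITEM" if state.free_bag_slots == 0 else None


def choose_action(state: GameState, policy: Policy, scripts: list[Script]) -> str:
    """Let the first matching script override the learned policy; otherwise defer to it."""
    for script in scripts:
        action = script(state)
        if action is not None:
            return action
    return policy(state)


if __name__ == "__main__":
    learned_policy: Policy = lambda s: "PRESS_A"  # stand-in for a trained agent
    full = GameState(location="ROUTE_1", free_bag_slots=0)
    roomy = GameState(location="ROUTE_1", free_bag_slots=5)
    print(choose_action(full, learned_policy, [toss_item_if_bag_full]))
    print(choose_action(roomy, learned_policy, [toss_item_if_bag_full]))
```

Keeping each rule as a separate function makes it easy to disable scripts one at a time, which matches the team's stated goal of eventually removing them.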

According to the project team, the Safari Zone was particularly difficult. The Safari Zone contains the Gold Teeth and HM03 (Surf), both of which are essential for progressing through the game. Paying the entrance fee allows unlimited retries, but finding the correct route within the limited number of steps was extremely difficult. The project team reports that after adding a script and adopting a reward proportional to the number of steps remaining in the Safari Zone, the agent was finally able to clear it after several thousand attempts.
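The article does not give the exact formula, so the following is only one plausible reading of "a reward proportional to the number of steps remaining": a bonus paid on reaching a goal, scaled by the fraction of steps still unused. It assumes the 500-step allowance per Safari Zone entry in Pokémon Red; the function name and `scale` parameter are hypothetical.

```python
SAFARI_STEP_LIMIT = 500  # steps granted per Safari Zone entry in Pokémon Red


def safari_goal_bonus(steps_remaining: int, reached_goal: bool,
                      scale: float = 1.0) -> float:
    """Hypothetical shaping term: a goal bonus proportional to the steps left.

    Reaching the goal with more steps to spare earns more, which favors
    shorter routes. This is one reading of the article, not the project's code.
    """
    if not 0 <= steps_remaining <= SAFARI_STEP_LIMIT:
        raise ValueError("steps_remaining must be between 0 and SAFARI_STEP_LIMIT")
    if not reached_goal:
        return 0.0
    return scale * steps_remaining / SAFARI_STEP_LIMIT


if __name__ == "__main__":
    # Two hypothetical runs that both reach the goal: the faster one scores higher.
    print(safari_goal_bonus(steps_remaining=400, reached_goal=True))  # 0.8
    print(safari_goal_bonus(steps_remaining=100, reached_goal=True))  # 0.2
```

Because the bonus grows with unused steps, a route that wastes steps is rewarded less than a direct one, which gives the agent a gradient toward efficient paths instead of a single pass/fail signal.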



Although the project team eventually built an agent that could clear 'Pokémon Red,' at the time of writing the system was not yet stable enough to demonstrate a clear with all scripts disabled. The agent was confirmed to clear the game with each script removed individually, but some bugs remain, so there are still issues to resolve.

The project team commented, 'We believe that JRPGs are special and can be a stepping stone to more powerful AI. However, there is a lot to learn and many scripts to remove.'

in Software, Game, Posted by log1i_yk