We provide here an environment for a predator/prey game. We explore two methods: a simple DQN architecture, and a genuinely multi-agent algorithm based on a policy-gradient approach, Multi-Agent Deep Deterministic Policy Gradient (MADDPG) (Lowe, R., Wu, Y., Tamar, A., Harb, J., Abbeel, O. P., & Mordatch, I. (2017). Multi-agent actor-critic for mixed cooperative-competitive environments. In Advances in Neural Information Processing Systems (pp. 6379-6390)).
Results after 1400 episodes of training:

| DDQN 2vs2 | MADDPG 2vs2 | DDQN 2vs1 Magic Switch |
|---|---|---|
| ![]() | ![]() | ![]() |

Blue dots represent prey and orange dots represent predators.
The action space is discrete.
Each agent chooses one of five actions: none, left, right, top, or bottom.
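
The five actions above could be encoded as follows. This is a minimal sketch, not the repository's actual code: the names `Action`, `MOVES` and `apply_action`, the integer ids, the unit step, and the convention that "top" increases `y` are all illustrative assumptions. Only the `x` and `y` components are moved here, since the text does not say how an action affects `z`.

```python
from enum import IntEnum


class Action(IntEnum):
    """The five discrete actions (the integer ids are an assumption)."""
    NONE = 0
    LEFT = 1
    RIGHT = 2
    TOP = 3
    BOTTOM = 4


# Hypothetical (dx, dy) displacement per action; the axis orientation
# ("top" increases y) and the unit step are illustrative assumptions.
MOVES = {
    Action.NONE: (0, 0),
    Action.LEFT: (-1, 0),
    Action.RIGHT: (1, 0),
    Action.TOP: (0, 1),
    Action.BOTTOM: (0, -1),
}


def apply_action(position, action, step=1.0):
    """Return the new (x, y) position after taking `action`."""
    dx, dy = MOVES[Action(action)]
    x, y = position
    return (x + step * dx, y + step * dy)
```

With this encoding, a network's output layer would have five units, one per action, and `apply_action((0.0, 0.0), Action.LEFT)` moves an agent one step along the negative `x` axis.
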
The state is fully observable: every agent knows it exactly.
It consists of the 3D coordinates (x, y, z) of every agent.
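
One way to picture this state is as a flat vector holding three numbers per agent, shared unchanged by all agents. The sketch below is an assumption about the layout, not the repository's implementation: the helper names `build_state` and `observations`, the agent names, and the alphabetical agent ordering are hypothetical.

```python
def build_state(positions):
    """Flatten per-agent (x, y, z) coordinates into one state vector."""
    state = []
    for name in sorted(positions):  # fixed ordering keeps indices stable
        x, y, z = positions[name]
        state.extend([float(x), float(y), float(z)])
    return state


def observations(positions):
    """Full observability: every agent receives the same global state."""
    state = build_state(positions)
    return {name: list(state) for name in positions}


# Hypothetical 1 predator vs 1 prey configuration.
positions = {"predator_0": (0, 0, 0), "prey_0": (1, 2, 0)}
obs = observations(positions)
```

Under this layout the state has `3 * n_agents` entries, so a 2vs2 game would give each agent a 12-dimensional input.
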


