Getting Started with Pyquaticus for MCTF
This page gives an overview of how to train agents using deep reinforcement learning within the Pyquaticus framework for the Maritime Capture-the-Flag (MCTF) competition. For evaluation of trained agents, refer to the Submit Your Entry page.
Training Agents to Play MCTF
Pyquaticus includes a complete example that trains three agents as a coordinated team:
rl_test/train_3v3.py
This script uses RLlib. If you are new to RLlib, start with the RLlib documentation.
- Ensure your virtual environment is activated.
- Run: python train_3v3.py
- Models are saved under: ray_tests/<checkpoint>/policies/<policy-name>
- The checkpoint save frequency is set in competition_train_example.py (line 112).
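To find trained policies after a run, a short stdlib-only sketch can list every directory that matches the ray_tests/<checkpoint>/policies/<policy-name> layout described above. The helper name find_policy_dirs and the policy name "agent-0-policy" are hypothetical, and the sketch assumes checkpoint folder names sort in save order (true for zero-padded names). It only locates directories; it does not load any policy.

```python
from pathlib import Path
from typing import List, Optional


def find_policy_dirs(root: str = "ray_tests", policy_name: Optional[str] = None) -> List[Path]:
    """List saved policy directories laid out as <root>/<checkpoint>/policies/<policy-name>.

    Hypothetical helper: results are sorted by path, so checkpoints with
    zero-padded names (e.g. checkpoint_000010) come out in save order.
    """
    # Match one checkpoint level, then the fixed "policies" folder, then the policy.
    pattern = f"*/policies/{policy_name}" if policy_name else "*/policies/*"
    return sorted(p for p in Path(root).glob(pattern) if p.is_dir())
```

For example, `find_policy_dirs("ray_tests", "agent-0-policy")[-1]` would give the most recent save of that policy (when at least one exists), which you can then hand to RLlib's own checkpoint-loading utilities.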
Policy Mapping to Agent IDs
Below is an excerpt from rl_test/train_3v3.py showing how policy names are mapped to individual agents. RLlib uses this mapping so each game agent receives the correct learning or opponent policy.
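As a hedged illustration of the idea, here is a mapping function in the shape RLlib expects for policy_mapping_fn: it receives an agent ID and returns a policy name. The agent ID format ("agent_0", "agent_1", ...), the three learned policy names, and the shared "opponent-policy" are all hypothetical; substitute the names defined in train_3v3.py.

```python
# Hypothetical policy names; replace with those defined in train_3v3.py.
LEARNED_POLICIES = {
    "agent_0": "agent-0-policy",
    "agent_1": "agent-1-policy",
    "agent_2": "agent-2-policy",
}
OPPONENT_POLICY = "opponent-policy"


def policy_mapping_fn(agent_id, episode=None, worker=None, **kwargs):
    """Return the policy name RLlib should use for this agent.

    Accepts integer IDs (0, 1, ...) or string IDs ("agent_0", ...). The
    extra arguments are ignored so the signature tolerates different
    RLlib versions, which pass different positional/keyword arguments.
    """
    key = f"agent_{agent_id}" if isinstance(agent_id, int) else str(agent_id)
    # Our three agents each get their own learned policy; every other
    # agent (the opposing team) shares one fixed opponent policy.
    return LEARNED_POLICIES.get(key, OPPONENT_POLICY)
```

In an RLlib configuration, a function like this is passed as the policy_mapping_fn argument of .multi_agent().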
Training Algorithm: Rollout Workers & GPUs
The following PPO configuration (from train_3v3.py) determines compute resources and associates policies with agents during training.
- Modify line 3 to adjust for your CPU/GPU resources.
- Modify line 7 to change the policy names being trained.
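The resource and policy-name values edited in those two places can be sanity-checked before a long run. The plain-Python sketch below does not call RLlib. It assumes one CPU is reserved for the driver process and one CPU per rollout worker (RLlib's defaults in Ray 2.x), and it rejects any trained policy name that is not defined. The helper name and policy names are hypothetical; the key num_rollout_workers follows the Ray 2.x .rollouts() argument, which newer Ray releases rename, so check your installed version.

```python
import os
from typing import Dict, Iterable, Optional


def build_training_settings(
    all_policies: Iterable[str],
    policies_to_train: Iterable[str],
    num_cpus: Optional[int] = None,
    num_gpus: int = 0,
) -> Dict[str, object]:
    """Collect resource and policy settings for a PPO run (hypothetical helper)."""
    trained = list(policies_to_train)
    # Every policy we plan to train must also be declared as a policy.
    unknown = set(trained) - set(all_policies)
    if unknown:
        raise ValueError(f"policies_to_train lists undefined policies: {sorted(unknown)}")
    if num_cpus is None:
        num_cpus = os.cpu_count() or 1
    # Assumption: one CPU for the driver process, one CPU per rollout worker.
    num_rollout_workers = max(0, num_cpus - 1)
    return {
        "num_rollout_workers": num_rollout_workers,
        "num_gpus": num_gpus,
        "policies_to_train": trained,
    }
```

With num_cpus=8 and num_gpus=1, for example, it suggests 7 rollout workers and 1 GPU.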
Reward Function Design
Reward shaping is crucial in multi-agent reinforcement learning. Pyquaticus includes several example reward functions in:
pyquaticus/envs/utils/rewards.py
Below is an example of a sparse reward function that uses both state and prev_state to determine transitions and assign rewards:
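The same pattern can be sketched independently of Pyquaticus: a sparse reward is nonzero only on the step where an event happens, which you detect by comparing a counter in state with the same counter in prev_state. In this sketch the state layout (cumulative per-team "captures" and "grabs" counters), the reward magnitudes, and the function name are hypothetical; Pyquaticus's own state dictionary may use different keys, so consult rewards.py for them.

```python
from typing import Mapping


def sparse_reward(team: str, opponent: str, state: Mapping, prev_state: Mapping) -> float:
    """Reward only on events, detected as changes between prev_state and state.

    Hypothetical layout: state["captures"][team] and state["grabs"][team]
    are cumulative counters for each team.
    """
    reward = 0.0
    # Flag captures: large reward when we score, large penalty when they do.
    reward += 1.0 * (state["captures"][team] - prev_state["captures"][team])
    reward -= 1.0 * (state["captures"][opponent] - prev_state["captures"][opponent])
    # Flag grabs: smaller intermediate signal in both directions.
    reward += 0.25 * (state["grabs"][team] - prev_state["grabs"][team])
    reward -= 0.25 * (state["grabs"][opponent] - prev_state["grabs"][opponent])
    return reward
```

Because every term is a difference between consecutive states, an agent receives nothing on steps where no counter changes, which is what makes the reward sparse.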