Getting Started with Pyquaticus for MCTF

This page explains how to train agents with deep reinforcement learning in the Pyquaticus framework for the Maritime Capture-the-Flag (MCTF) competition. To evaluate trained agents, see the Submit Your Entry page.

Training Agents to Play MCTF

A complete sample for training three agents as a coordinated team is included inside Pyquaticus:

This script uses RLlib. If you are new to RLlib, read the RLlib documentation first.

  • Ensure your virtual environment is activated.
  • Run: python train_3v3.py
  • Models are saved under: ray_tests/<checkpoint>/policies/<policy-name>
  • Save frequency is controlled in: competition_train_example.py, line 112
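
After a training run it can help to check which policies were actually written to disk. The following is a minimal sketch that assumes only the directory layout given above (ray_tests/<checkpoint>/policies/<policy-name>). The helper name list_saved_policies is hypothetical and is not part of Pyquaticus or RLlib.

```python
from pathlib import Path


def list_saved_policies(root="ray_tests"):
    """Map each checkpoint directory name to the policy names saved under it.

    Assumes the layout <root>/<checkpoint>/policies/<policy-name>.
    This is a hypothetical helper, not a Pyquaticus or RLlib function.
    """
    saved = {}
    root_path = Path(root)
    if not root_path.is_dir():
        return saved  # nothing has been saved yet
    for checkpoint_dir in sorted(root_path.iterdir()):
        policies_dir = checkpoint_dir / "policies"
        # Skip entries that do not contain a policies/ subdirectory.
        if policies_dir.is_dir():
            saved[checkpoint_dir.name] = sorted(
                p.name for p in policies_dir.iterdir() if p.is_dir()
            )
    return saved


if __name__ == "__main__":
    for checkpoint, policies in list_saved_policies().items():
        print(checkpoint, policies)
```

Run it from the directory that contains ray_tests. An empty result usually means no checkpoint has been written yet.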

Policy Mapping to Agent IDs

Below is an excerpt from rl_test/train_3v3.py showing how policy names are mapped to individual agents. RLlib uses this mapping so that each agent in the game is controlled by the correct policy, either one being trained or a fixed opponent.
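
As a rough illustration of the idea, the sketch below maps six agents in a 3v3 game to policy names. It makes several assumptions that may not match the actual script: agent IDs are the integers 0 to 5, agents 0 to 2 are the learners, and the policy names are hypothetical. The signature follows the (agent_id, episode, worker, **kwargs) form that RLlib has used for mapping functions, but check it against the Ray version you have installed.

```python
# Hypothetical policy names; the real script may use different ones.
LEARNER_IDS = (0, 1, 2)             # assumed: the team being trained
OPPONENT_POLICY = "opponent-policy"  # assumed: one shared fixed opponent policy


def policy_mapping_fn(agent_id, episode=None, worker=None, **kwargs):
    """Return the policy name that should control the given agent."""
    if agent_id in LEARNER_IDS:
        # Each learner gets its own policy, e.g. "agent-0-policy".
        return f"agent-{agent_id}-policy"
    # Every other agent plays with the fixed opponent policy.
    return OPPONENT_POLICY
```

Giving each learner its own policy lets the three agents specialize. Mapping all learners to one shared policy name is a common alternative when you want a single set of weights.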

Training Algorithm: Rollout Workers & GPUs

The following PPO configuration (from train_3v3.py) determines compute resources and associates policies with agents during training.

  • Modify line 3 to adjust for your CPU/GPU resources.
  • Modify line 7 to change the policy names being trained.
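
When adjusting the resource settings, one simple starting point is to derive the worker count from the machine's CPU count. The sketch below is an assumption-laden helper, not code from train_3v3.py. It returns a plain dictionary whose keys borrow RLlib parameter names (num_rollout_workers, num_gpus, policies_to_train), and how those values are passed to the PPO configuration depends on your Ray version. The function name and the policy names in the usage line are hypothetical.

```python
import os


def suggest_training_settings(policies_to_train, total_cpus=None, total_gpus=0):
    """Suggest resource settings for a single-machine training run.

    Hypothetical helper: reserves one CPU core for the driver process
    and assigns the remaining cores to rollout workers.
    """
    if total_cpus is None:
        total_cpus = os.cpu_count() or 1
    return {
        # With 0 rollout workers, sampling happens in the driver process itself.
        "num_rollout_workers": max(total_cpus - 1, 0),
        "num_gpus": total_gpus,
        # Only the policies named here are updated during training.
        "policies_to_train": list(policies_to_train),
    }


if __name__ == "__main__":
    print(suggest_training_settings(["agent-0-policy", "agent-1-policy"]))
```

Treat the result as a first guess. Memory use per worker and the cost of stepping the environment can justify a lower worker count than the number of free cores.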

Reward Function Design

Reward design strongly influences what agents learn in multi-agent RL training. Pyquaticus includes several reward function examples in:

Below is an example of a sparse reward function that uses both state and prev_state to determine transitions and assign rewards:
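
The pattern can be sketched as follows. This is a minimal illustration, not the Pyquaticus implementation: the dictionary keys team_captures and opponent_captures, the function signature, and the reward magnitudes are all assumptions chosen for the example. The key point is that a reward is issued only when a value changes between prev_state and state, so the agent is rewarded for the event itself rather than for every step after it.

```python
def sparse_reward(state, prev_state):
    """Return a reward only on steps where a capture count changed.

    Assumes hypothetical keys "team_captures" and "opponent_captures"
    holding running totals; real Pyquaticus states may be laid out differently.
    """
    reward = 0.0
    # Our team scored since the previous step.
    if state["team_captures"] > prev_state["team_captures"]:
        reward += 1.0
    # The opposing team scored since the previous step.
    if state["opponent_captures"] > prev_state["opponent_captures"]:
        reward -= 1.0
    return reward


if __name__ == "__main__":
    before = {"team_captures": 0, "opponent_captures": 0}
    after = {"team_captures": 1, "opponent_captures": 0}
    print(sparse_reward(after, before))
```

Because the reward is zero on most steps, learning from it alone can be slow. Denser shaping terms, such as progress toward the flag, are a common addition.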