Three-dimensional TAD game using deep reinforcement learning

Background

This work extends our IROS 2025 paper. We investigate a three-dimensional (3D) target-attacker-defender (TAD) game using hierarchical learning methods under different entity positions, target maneuvers, and defender policies. The attacker policy is trained via deep reinforcement learning (DRL). To make the task difficult for the attacker, the initial positions favor the defender and the target.

Methods

  1. We designed the action space, state space, and reward function for the scenario. The action space is nonlinearly mapped to yield smoother control transients, while the reward aggregates two competing objectives: hitting the target and evading the defender.

  2. A hierarchical reinforcement-learning paradigm is adopted: the attacker selects between two high-level options, evading the defender or hitting the target. All model parameters are updated via proximal policy optimization (PPO).
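The two design points above can be illustrated with a minimal Python sketch. It is not the project's implementation: the cubic action mapping, the weighted-sum reward, the safety radius, and every function and parameter name (`map_action`, `combined_reward`, `select_option`, `w_hit`, `w_evade`, `safe_radius`) are assumptions chosen for illustration. Only the PPO clipped surrogate term follows the standard published form.

```python
import math
import random


def map_action(u, max_cmd=1.0, exponent=3):
    """Map a raw policy output u in [-1, 1] to a control command.

    Assumed form: an odd power keeps the sign and flattens the response
    near zero, so small policy outputs yield small, smooth commands.
    """
    u = max(-1.0, min(1.0, u))  # clip to the valid range
    return max_cmd * math.copysign(abs(u) ** exponent, u)


def combined_reward(d_target, d_target_prev, d_defender,
                    w_hit=1.0, w_evade=0.5, safe_radius=1.0):
    """Aggregate the two competing objectives into one scalar (assumed form).

    r_hit rewards closing the distance to the target; r_evade penalizes
    being inside a hypothetical safety radius around the defender.
    """
    r_hit = d_target_prev - d_target
    r_evade = -max(0.0, safe_radius - d_defender)
    return w_hit * r_hit + w_evade * r_evade


def select_option(option_probs, rng=random):
    """High-level step: sample option 0 (evade defender) or 1 (hit target)."""
    return 0 if rng.random() < option_probs[0] else 1


def ppo_clipped_term(ratio, advantage, eps=0.2):
    """Per-sample PPO clipped surrogate: min(r*A, clip(r, 1-eps, 1+eps)*A)."""
    clipped = max(1.0 - eps, min(1.0 + eps, ratio))
    return min(ratio * advantage, clipped * advantage)
```

In this sketch, the high-level policy would supply `option_probs`, and the chosen option would decide which low-level behavior produces the raw output passed to `map_action`. The weights `w_hit` and `w_evade` set the trade-off between the two objectives and would need tuning for any real scenario.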

Visualization

The following figure shows the training reward for different algorithms.

[Figure] Training reward

The following figures show the rendered trajectories for different scenarios.

[Figure] Trajectory 1
[Figure] Trajectory 2

The following videos show the 3D TAD game process under different target maneuvers.

[Video] Scenario 1: Circular target maneuver
[Video] Scenario 2: Roller target maneuver