Assignment #2 (DRL with Reference Trajectory) [Last updated: 16:47 04/22]

**[IMPORTANT] [09:15 05/13] The submission deadline for Homework 2 was extended to May 15th during the May 8th class.**

[IMPORTANT] [15:26 04/19] For 2-1, when comparing `ref_pos[1] (=obs[11])` and `obs[1]`, make sure to compare `ref_pos[1] + 1.25` with `obs[1]` due to the reference height offset in the model.

[IMPORTANT] [15:26 04/19] Skeleton code updated (fixed a bug in `update_ref_pose`).

[IMPORTANT] [16:34 04/16] Skeleton code updated (adjusted the range of the thigh joints in `custom_walker2d_ref.xml`).

UPDATES & NOTICES
- [09:26 04/22] The evaluation criteria have been updated (the criteria for 2-2 (3D Humanoid) have been loosened).
- [18:52 04/19] Skeleton code updated (fixed a bug in `update_ref_pose`).
- [16:47 04/19] The evaluation criteria have been updated (clarified).
- [15:54 04/19] The evaluation criteria have been updated (criteria for root position).
- [15:26 04/19] [Note] For 2-1, when comparing `ref_pos[1] (=obs[11])` and `obs[1]`, make sure to compare `ref_pos[1] + 1.25` with `obs[1]` due to the reference height offset in the model.
- [15:26 04/19] Skeleton code updated (fixed a bug in `update_ref_pose`).
- [21:37 04/17] Please note that you are free to modify the RL and model parameters (e.g., learning rate, batch size, network architecture, etc.) as needed.
- [20:29 04/16] The submission method has been updated.
- [20:13 04/16] [Tip] Even if the reward graph shows a plateau for a while, it may start increasing again after some time. However, if the plateau continues for too long, we recommend tuning your reward function.
- [IMPORTANT] [16:34 04/16] Skeleton code updated (adjusted the range of the thigh joints in `custom_walker2d_ref.xml`).
- [14:00 04/15] Assignment #2 released.

In this homework, you will train 2D and 3D characters to walk or run in a simulated environment using reference trajectories. More specifically, your goal is to teach the character to follow a reference pose at each moment of walking or running, so that it ultimately learns the entire motion. The objective of this assignment is to learn how reference trajectories can help characters perform tasks in a natural way. Additionally, in the Extra section, you will train the character on tasks with the help of reference motions, to verify whether reference motions can serve as good guides for task learning.

Skeleton Code: https://github.com/snumrl/2025_SNU_HumanMotion_HW2.git

The skeleton code is based on the Walker2D & Humanoid3D environments from Gymnasium, and the default reinforcement learning algorithm is provided by Stable-Baselines3. Through this assignment, you will gain hands-on experience in controlling a simulated character and become familiar with applying reinforcement learning in physical simulation.

2-1. Making a 2D Character Walk & Run (40%)

2D Reference Motion (Walking)

2D Reference Motion (Running)

(a) (25 Points) Train a character to mimic the walking motion obtained from "assets/motion/walk.bvh". (Make the simulated character follow the posture of the red character in the simulation.) To make the character follow this motion, design your own Imitation Reward. [Hint: Create a penalty based on the posture difference between the simulated character and the red character]
(b) (15 Points) Similarly, train a character to follow the running motion obtained from "assets/motion/run.bvh". Use the motion flag (`--motion run.bvh`) to select this motion.
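As a starting point for the hint in (a), the following is a minimal sketch of an imitation reward built from a posture-difference penalty. It is not the skeleton code's implementation: the function name `imitation_reward`, the weights `w_pose` and `w_root`, and the index layout of the pose vectors are all assumptions made for illustration. Only the 1.25 height offset comes from the notice above.

```python
import math

# Height offset between the reference and the simulated model (from the 04/19 notice).
REF_HEIGHT_OFFSET = 1.25


def imitation_reward(sim_pos, ref_pos, w_pose=2.0, w_root=5.0):
    """Return a reward in (0, 1] that shrinks as the pose drifts from the reference.

    Assumed (hypothetical) layout of both vectors:
    [0] root x, [1] root height, [2:] root pitch and joint angles in radians.
    """
    # Posture penalty: summed squared angle difference.
    pose_err = sum((s - r) ** 2 for s, r in zip(sim_pos[2:], ref_pos[2:]))

    # Root-height penalty, with the reference shifted by the model offset.
    root_err = (sim_pos[1] - (ref_pos[1] + REF_HEIGHT_OFFSET)) ** 2

    # Exponentiating the negative penalty keeps the reward bounded and positive.
    return math.exp(-w_pose * pose_err - w_root * root_err)
```

A perfectly matched pose gives a reward of 1.0, and any deviation lowers it smoothly. The weights usually need tuning per motion; the values here are placeholders.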
You are allowed to modify Observation, Termination Condition, etc., as long as you don't directly change the physical values (such as velocity, position) of the simulated character.
You may implement and use a PD actuator separately. This is not mandatory, but using a PD actuator with the reference pose as the initial target may accelerate learning. However, this might require additional parameter tuning, such as scaling actions.
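One possible shape for such a PD actuator is sketched below. The helper names (`action_to_target`, `pd_torque`), the gains `kp` and `kd`, and `action_scale` are hypothetical and not part of the skeleton code; they only illustrate the idea of letting the policy output a scaled offset around the reference pose.

```python
def action_to_target(ref_joint_pos, action, action_scale=0.3):
    """Treat the policy action as a scaled offset around the reference pose."""
    return [r + action_scale * a for r, a in zip(ref_joint_pos, action)]


def pd_torque(q, qd, q_target, kp=100.0, kd=10.0):
    """Per-joint PD torque: kp * (target - position) - kd * velocity."""
    return [kp * (qt - qi) - kd * qdi for qi, qdi, qt in zip(q, qd, q_target)]
```

With a zero action the target equals the reference pose, so an untrained policy already pulls the character toward the reference. In practice the resulting torques would also need to be clipped to the actuator range of the model, and the gains tuned together with `action_scale`.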
** For convenience and readability, the code for this assignment creates a separate reference skeleton. You are allowed to modify the qpos, qvel values (probably the latter half) of this reference skeleton. (This is not the formal method.) However, when evaluating, we will compare the simulated skeleton with the reference skeleton that follows the Reference Motion. **
Evaluation:
- Pass if the character visually overlaps well with the Reference Motion for 10 seconds (500 environment steps)
- The submission passes as long as the motion visually follows the Reference. However, it fails if the motion appears noticeably different, in particular if any joint deviates from the Reference by more than 20 degrees at every frame for more than 1 second (50 environment steps). Additionally, the root position must remain within 0.5 m of the Reference for walking and within 2 m for running (excluding the first second).
  - The evaluation will begin from the initial pose implemented in the skeleton code.
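To self-check a rollout against the numeric criteria above before submitting, something like the following sketch could be used. It is not the official grader: the function name and arguments are hypothetical, and it assumes one reading of the joint criterion (a frame counts as bad when any joint is off by more than 20 degrees, and more than 50 consecutive bad frames fail). Root positions are assumed to be already expressed in the same frame, for example with the 1.25 height offset applied to the reference in 2-1.

```python
import math


def passes_evaluation(sim_joints, ref_joints, sim_root, ref_root,
                      root_limit=0.5, angle_limit_deg=20.0,
                      max_bad_steps=50, skip_steps=50):
    """Check a recorded rollout against the joint and root criteria.

    sim_joints / ref_joints: per-step lists of joint angles in radians.
    sim_root / ref_root: per-step root positions as tuples in metres.
    root_limit: 0.5 for walking, 2.0 for running.
    """
    limit = math.radians(angle_limit_deg)
    streak = 0  # number of consecutive frames with a joint over the limit
    for t, (sj, rj) in enumerate(zip(sim_joints, ref_joints)):
        bad = any(abs(s - r) > limit for s, r in zip(sj, rj))
        streak = streak + 1 if bad else 0
        if streak > max_bad_steps:
            return False
        # The root criterion is not applied during the first second.
        if t >= skip_steps and math.dist(sim_root[t], ref_root[t]) > root_limit:
            return False
    return True
```

For the running motion, pass `root_limit=2.0`. The visual check remains the primary criterion, so passing this function does not by itself guarantee a passing grade.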

2-2. Making a 3D Character Walk & Run (40%)

3D Reference Motion (Walking)

3D Reference Motion (Running)
