Assignment #3 (DRL with Musculotendon)

UPDATES & NOTICES
- [22:38 05/21] Bug fix: fixed an error that occurred when running inference with a trained model on macOS. The variable 'model' in render.py was renamed to 'trained_model'.
- [14:45 05/12] When running the trained network with render.py, you can predict either deterministically or stochastically. Use whichever works best and state your choice in your report.
  - action, _ = model.predict(obs, deterministic=True)  # or deterministic=False
- [13:00 05/08] #3 out

In this homework, you will use reinforcement learning to train a controller that drives the muscles in a musculoskeletal simulation. The task is to learn forward locomotion, and you will compare performance and motion quality with and without a reference motion (and its imitation reward). You will also analyze muscle activations during walking to understand their roles in movement. Additionally, you will simulate muscle disorders and observe how they affect gait.

Skeleton Code: https://github.com/snumrl/2025_SNU_HumanMotion_HW3.git

The skeleton code is based on the MyoLeg environment from MyoSuite, a musculoskeletal simulation toolkit built on top of MuJoCo. In this environment, the action is the muscle excitation signal, which in turn drives muscle activation. Your goal is to train a reinforcement learning controller that outputs appropriate excitation signals to achieve specific tasks. Through this assignment, you will gain hands-on experience in controlling a simulated musculoskeletal character and applying reinforcement learning to physical simulations.
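
To make the excitation-to-activation relationship concrete, here is a minimal sketch of first-order activation dynamics in the form MuJoCo documents for its muscle actuators. The function name, the explicit-Euler step, and the time constants (0.01 s rise, 0.04 s decay) are assumptions for illustration; the models in the skeleton code may use different constants, and the environment may remap actions before they reach the muscles.

```python
def step_activation(act, excitation, dt, tau_act=0.01, tau_deact=0.04):
    """Advance one muscle activation by a single explicit-Euler step.

    Illustrative only: constants are assumed, not read from the skeleton code.
    """
    excitation = min(max(excitation, 0.0), 1.0)  # excitations are clamped to [0, 1]
    scale = 0.5 + 1.5 * act
    # Activation rises faster than it decays, so the time constant
    # depends on whether the excitation is above or below the activation.
    tau = tau_act * scale if excitation > act else tau_deact / scale
    return act + dt * (excitation - act) / tau


# Hold full excitation for 200 ms with a 1 ms step.
act = 0.0
for _ in range(200):
    act = step_activation(act, excitation=1.0, dt=0.001)
```

The point for controller design is that an action does not change muscle force instantly: the activation lags the excitation, and it lags more when relaxing than when contracting.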

2-1. Making a Character Move Forward without Reference Motion (20%)

Design a reward that encourages the character to move forward. Target speed is flexible (e.g., around 1.0–1.5).
Do not use a reference motion.
It's okay if the task is not fully successful. The focus is on designing a reward and training for at least 20 million steps.
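
One possible shape for such a reward is sketched below. It is only a starting point, not the expected solution: the function name, the weights, the target speed of 1.25, and the choice of terms (speed tracking, effort penalty, alive bonus) are all assumptions, and how you obtain the forward velocity and excitations depends on the skeleton code.

```python
import math


def forward_reward(forward_vel, excitations, target_vel=1.25,
                   vel_scale=4.0, effort_weight=0.01, alive_bonus=0.1):
    """Hypothetical per-step reward for walking forward without a reference."""
    # Equals 1 when the velocity matches the target and decays smoothly otherwise.
    speed_term = math.exp(-vel_scale * (forward_vel - target_vel) ** 2)
    # Mean squared excitation discourages wasted effort and co-contraction.
    effort = sum(e * e for e in excitations) / max(len(excitations), 1)
    # The alive bonus is paid every step the episode has not terminated.
    return speed_term - effort_weight * effort + alive_bonus


r_on_target = forward_reward(1.25, [0.0, 0.0])
r_standing = forward_reward(0.0, [0.0, 0.0])
```

A bounded exponential term keeps the reward scale stable across speeds, whereas a linear velocity reward would pay the character for moving ever faster.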

2-2. Making a Character Stand Still (10%)

Train the character to remain still.
First, find a standing posture (e.g., “attention” pose), then design the reward so the character maintains that posture.
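
A posture-holding reward could be sketched as follows. The inputs (joint positions, joint velocities, root height), the scales, and the multiplicative form are assumptions for illustration; the target pose is whatever standing posture you found in the first step.

```python
import math


def standing_reward(joint_pos, target_pos, joint_vel, root_height, target_height,
                    pose_scale=2.0, vel_scale=0.1, height_scale=10.0):
    """Hypothetical per-step reward for holding a fixed standing posture."""
    pose_err = sum((q - q0) ** 2 for q, q0 in zip(joint_pos, target_pos))
    vel_err = sum(v * v for v in joint_vel)
    height_err = (root_height - target_height) ** 2
    # Multiplying the terms means the reward is high only when the pose,
    # the stillness, and the height are all satisfied at the same time.
    return (math.exp(-pose_scale * pose_err)
            * math.exp(-vel_scale * vel_err)
            * math.exp(-height_scale * height_err))


r_still = standing_reward([0.0, 0.1], [0.0, 0.1], [0.0, 0.0], 0.95, 0.95)
r_swaying = standing_reward([0.0, 0.1], [0.0, 0.1], [1.0, -1.0], 0.95, 0.95)
```

An additive combination is also common; it is more forgiving early in training but lets the policy trade one term against another.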

2-3. Making a Character Move Forward with Reference Motion (30%)

Reference Motion

Use the provided reference motion and functions to design an imitation reward.
The initial pose should match the reference motion (you may customize velocity).
Only 4 successful cycles are needed (about 4–5 seconds of motion).
There is no strict success criterion; just observe the result.
Compare the results with those from 2-1.
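
As a rough illustration of an imitation reward, the sketch below looks up a cyclic reference pose at the current time and scores how closely the character's joint angles match it. Both helpers are hypothetical stand-ins, not the functions provided in the skeleton code, and the sketch assumes the reference is a list of joint-angle frames sampled uniformly over one cycle (root translation, which is not cyclic, is ignored).

```python
import math


def reference_pose(ref_frames, t, cycle_duration):
    """Linearly interpolate a cyclic reference motion at time t (seconds)."""
    n = len(ref_frames)
    phase = (t % cycle_duration) / cycle_duration  # position in the cycle, [0, 1)
    x = phase * n
    i = int(x) % n
    j = (i + 1) % n  # wrap so the last frame blends back into the first
    w = x - int(x)
    return [(1 - w) * a + w * b for a, b in zip(ref_frames[i], ref_frames[j])]


def imitation_reward(joint_pos, ref_pos, scale=2.0):
    """Equals 1 for a perfect match and decays with squared joint-angle error."""
    err = sum((q - r) ** 2 for q, r in zip(joint_pos, ref_pos))
    return math.exp(-scale * err)


frames = [[0.0], [1.0], [0.0], [-1.0]]  # toy one-joint reference cycle
pose = reference_pose(frames, t=0.125, cycle_duration=1.0)
reward = imitation_reward([0.5], pose)
```

In practice the imitation term is usually combined with a task term such as forward velocity, and the relative weighting is worth discussing when comparing against 2-1.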

2-4. Additional Experiments and Analysis

Analysis of Muscle Activation
- Select 2–3 muscles and plot their activation levels over a walking cycle (with time normalized to one cycle).
- Write a brief explanation of why the patterns appear as they do.
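
Normalizing to one cycle can be read as resampling each recorded activation onto 0–100% of the gait cycle. The sketch below does this with linear interpolation. The function name is hypothetical, and it assumes you have already logged timestamps and activations during a rollout and picked the cycle boundaries yourself (for example, two consecutive heel strikes of the same foot).

```python
def normalize_to_cycle(times, values, t_start, t_end, n_points=101):
    """Resample values recorded over [t_start, t_end] onto 0..100% of a cycle.

    Assumes `times` is strictly increasing and covers [t_start, t_end].
    """
    resampled = []
    k = 0  # index of the left sample of the current interpolation segment
    for p in range(n_points):
        t = t_start + (t_end - t_start) * p / (n_points - 1)
        # Advance to the recorded segment that contains t.
        while k + 2 < len(times) and times[k + 1] < t:
            k += 1
        t0, t1 = times[k], times[k + 1]
        w = min(max((t - t0) / (t1 - t0), 0.0), 1.0)
        resampled.append((1 - w) * values[k] + w * values[k + 1])
    return resampled


# Toy recording: one cycle runs from t = 0.5 s to t = 1.5 s.
times = [0.0, 0.5, 1.0, 1.5, 2.0]
activation = [0.0, 1.0, 0.0, 1.0, 0.0]
curve = normalize_to_cycle(times, activation, t_start=0.5, t_end=1.5, n_points=5)
```

Plotting the resampled curve against percent of cycle, optionally averaged over several cycles, makes activation patterns from different muscles and different runs directly comparable.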