Resnet as representation for Reinforcement Learning (RRL) is a simple yet effective approach for training behaviors directly from visual inputs. We demonstrate that features learned by standard image classification models generalize across different tasks, are robust to visual distractors, and, when used in conjunction with standard Imitation Learning or Reinforcement Learning pipelines, allow behaviors to be acquired efficiently directly from proprioceptive inputs.
Final behaviors acquired using RRL on ADROIT benchmark tasks (left to right): (a) opening a door, (b) hammering a nail, (c) pen twirling, (d) object relocation.
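As a rough illustration of how image-classification features can plug into a standard pipeline, the sketch below assumes each camera image has already been passed through a frozen, ImageNet-pretrained ResNet-34 (whose global-average-pooled output is 512-dimensional) and that the policy input is the concatenation of the per-camera embeddings with the robot's proprioceptive state. The helper `fuse_observation` and the 24-dimensional proprioceptive vector are hypothetical; this is not code from the RRL repository.

```python
import numpy as np

RESNET34_FEATURE_DIM = 512  # size of ResNet-34's global-average-pooled output


def fuse_observation(camera_features, proprio):
    """Build a flat policy input from per-camera image features and proprioception.

    Hypothetical helper (not from the RRL codebase): camera_features holds one
    1-D embedding per camera, e.g. the pooled output of a frozen ResNet-34.
    """
    parts = []
    for feat in camera_features:
        feat = np.asarray(feat, dtype=np.float32).ravel()
        if feat.shape[0] != RESNET34_FEATURE_DIM:
            raise ValueError(f"expected {RESNET34_FEATURE_DIM}-d features, got {feat.shape[0]}")
        parts.append(feat)
    # Proprioceptive readings are appended after all visual features
    parts.append(np.asarray(proprio, dtype=np.float32).ravel())
    return np.concatenate(parts)


# Usage: three camera views (as in the relocate task) plus a hypothetical 24-d proprioceptive state
cams = [np.random.rand(RESNET34_FEATURE_DIM) for _ in range(3)]
obs = fuse_observation(cams, np.zeros(24))
print(obs.shape)  # (1560,)
```

Because the encoder is frozen in this sketch, each frame only needs to be encoded once, which is consistent with converting the demonstrations to the encoder feature space before training.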
The RRL codebase can be installed by cloning this repository. Note that it uses git submodules to resolve dependencies. Please follow the steps below to install it correctly.

1. Clone the repository together with its submodules:
```
git clone --recursive https://github.com/facebookresearch/RRL.git
```
2. Install the package using `conda`. The dependencies (apart from `mujoco_py`) are listed in `env.yml`:
```
conda env create -f env.yml
conda activate rrl
```
3. The environment requires MuJoCo as a dependency. You may need to obtain a license and follow the setup instructions for `mujoco_py`. Setting up `mujoco_py` with GPU support is highly recommended.
4. Install the bundled dependencies (`mjrl`, `mj_envs`) and RRL itself:
```
pip install -e mjrl/.
pip install -e mj_envs/.
pip install -e .
```
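A quick way to confirm that the packages installed above are visible to the active interpreter is to look them up with the standard library's `importlib.util.find_spec`, which locates a top-level module without importing it, so `mujoco_py` is not compiled or initialized by the check. The helper `check_modules` is hypothetical and not part of the repository; the names `mujoco_py`, `mjrl`, and `mj_envs` are assumed to be the import names of the packages installed above.

```python
import importlib.util

# Import names of the dependencies installed above (assumed; adjust if your setup differs)
REQUIRED = ["mujoco_py", "mjrl", "mj_envs"]


def check_modules(names):
    """Return {module_name: True/False} without actually importing the modules."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


# Print a short report; a MISSING entry means the corresponding install step did not take effect
for name, found in check_modules(REQUIRED).items():
    print(f"{name:10s} {'ok' if found else 'MISSING'}")
```

Run it inside the activated `rrl` environment, since `find_spec` only searches the current interpreter's import path.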
The first step is to convert the observations of the demonstrations provided by `hand_dapg` into the encoder feature space. An example script, `convertDemos.py`, is provided for this. Note that the script saves the converted demonstrations in `.pickle` format.

Commands for the `mj_envs` tasks:
```
python convertDemos.py --env_name hammer-v0 --encoder_type resnet34 -c top -d <path-to-the-demo-file>
python convertDemos.py --env_name door-v0 --encoder_type resnet34 -c top -d <path-to-the-demo-file>
python convertDemos.py --env_name pen-v0 --encoder_type resnet34 -c vil_camera -d <path-to-the-demo-file>
python convertDemos.py --env_name relocate-v0 --encoder_type resnet34 -c cam1 -c cam2 -c cam3 -d <path-to-the-demo-file>
```
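The four conversion commands above differ only in the environment name and the camera flags (one `-c` per camera, three for relocate). A minimal sketch for building and running them from a single table follows. The camera table is copied from the commands above; the helpers `convert_command` and `convert_all` and the `demo_files` mapping are hypothetical, and the script is assumed to run from the directory containing `convertDemos.py`.

```python
import subprocess

# Camera flags per task, copied from the conversion commands above
CAMERAS = {
    "hammer-v0": ["top"],
    "door-v0": ["top"],
    "pen-v0": ["vil_camera"],
    "relocate-v0": ["cam1", "cam2", "cam3"],
}


def convert_command(env_name, demo_file, encoder_type="resnet34"):
    """Build the convertDemos.py argument list for one task (one -c flag per camera)."""
    cmd = ["python", "convertDemos.py", "--env_name", env_name, "--encoder_type", encoder_type]
    for cam in CAMERAS[env_name]:
        cmd += ["-c", cam]
    return cmd + ["-d", demo_file]


def convert_all(demo_files, dry_run=True):
    """Return the commands for every task in demo_files; execute them when dry_run is False."""
    cmds = [convert_command(env, path) for env, path in demo_files.items()]
    if not dry_run:
        for cmd in cmds:
            subprocess.run(cmd, check=True)  # stop at the first failing conversion
    return cmds
```

For example, `convert_all({"relocate-v0": "<path-to-the-demo-file>"}, dry_run=False)` runs the relocate conversion, with the placeholder replaced by your actual demo path. Passing the arguments as a list avoids shell quoting issues with paths that contain spaces.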
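RRL experiments using DAPG

An example launching script, `job_script.py`, is provided in the `examples/` directory, and the configs used are stored in the `examples/config/` directory. Note: Hydra configs are used. Pass the converted demonstration file via `demo_file` and select the task config with `--config-name`: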
```
python job_script.py demo_file=<path-to-new-demo-file> --config-name hammer_dapg
python job_script.py demo_file=<path-to-new-demo-file> --config-name door_dapg
python job_script.py demo_file=<path-to-new-demo-file> --config-name pen_dapg
python job_script.py demo_file=<path-to-new-demo-file> --config-name relocate_dapg
```
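The four launch commands above differ only in the task name: `demo_file=...` is a Hydra command-line override, while `--config-name` selects `<task>_dapg` from `examples/config/`. A minimal sketch for running the tasks one after another follows; the helpers `dapg_command` and `launch_all` and the `demo_files` mapping are hypothetical, and the jobs are assumed to be launched from the repository root with `examples/` as the working directory.

```python
import subprocess

TASKS = ["hammer", "door", "pen", "relocate"]


def dapg_command(task, demo_file):
    # demo_file=... is a Hydra override; --config-name picks the <task>_dapg config
    return ["python", "job_script.py", f"demo_file={demo_file}", "--config-name", f"{task}_dapg"]


def launch_all(demo_files, dry_run=True):
    """Return the launch commands; when dry_run is False, run them sequentially from examples/."""
    cmds = [dapg_command(task, demo_files[task]) for task in TASKS]
    if not dry_run:
        for cmd in cmds:
            subprocess.run(cmd, check=True, cwd="examples")
    return cmds
```

Each task needs its own converted demonstration file, which is why `demo_files` maps task names to paths rather than sharing a single path.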
Hi, I was wondering where I can find the code for the Grad-CAM visualizations. Is it somewhere in the current repository? Thanks!
Hi Rutav,

The plots provided in the paper show the number of samples vs. the success rate. I also see that the code logs the success rate after every iteration, as done here: https://github.com/ShahRutav/mjrl/blob/6cdb8b8c72279abe8d9d8b8a800f8ac396413e42/mjrl/utils/train_agent.py#L119. According to the default configuration file (https://github.com/facebookresearch/RRL/blob/main/examples/config/hammer_dapg.yaml#L40), the code runs for 200 iterations, and I also see here (https://github.com/facebookresearch/RRL/blob/main/examples/config/hammer_dapg.yaml#L40) that the number of trajectories is 200. So, if I am not wrong, the horizon length, say h, has to be 100, assuming that training runs for 4 × 10^6 samples. I have the following doubts:

1. Here (https://github.com/ShahRutav/mjrl/blob/6cdb8b8c72279abe8d9d8b8a800f8ac396413e42/mjrl/algos/batch_reinforce.py#L64) I see that the horizon length is 1e6. Am I looking at the right place, or are there other parameters that I am missing?
2. Is eval_success logged at each iteration, with the plots reported in the paper simply having their x-axis scaled by an appropriate factor, in our case something like 2 × 10^X (X depends on the answer to the point above)? In other words, is the total number of logged eval_success values 200?
Please do let me know if my understanding is right in this setting. Thanks, Megh
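The sample-count reasoning in the question can be checked with a few lines of arithmetic. This sketch takes the numbers the question assumes (200 iterations, 200 trajectories per iteration, a total budget of 4 × 10^6 samples) and further assumes every rollout runs for the full horizon; none of these values are confirmed by the repository. The helper names are hypothetical.

```python
def samples_per_iteration(num_traj, horizon):
    # One batch of num_traj rollouts, each assumed to last the full horizon
    return num_traj * horizon


def implied_horizon(total_samples, iterations, num_traj):
    # Solve total_samples = iterations * num_traj * horizon for the horizon
    horizon, remainder = divmod(total_samples, iterations * num_traj)
    if remainder:
        raise ValueError("budget is not an exact multiple of iterations * num_traj")
    return horizon


iterations, num_traj, budget = 200, 200, 4 * 10**6
h = implied_horizon(budget, iterations, num_traj)
step = samples_per_iteration(num_traj, h)
print(h, step)  # 100 20000
```

Under these assumptions, consecutive per-iteration log entries are 2 × 10^4 samples apart, so 200 logged values would span the 4 × 10^6 samples on the x-axis.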
Hi! I'm curious how we can turn on the visual distractors, such as light position and direction, object color, etc., in the Adroit environment (as mentioned in Section 7.5 of the appendix of your paper).
Thanks a lot for your help!