Code for RRL (https://sites.google.com/view/abstractions4rl)

facebookresearch, updated 2022-01-21 04:20:08

Quick Links

Website | Paper | Video

RRL: Resnet as representation for Reinforcement Learning

Resnet as representation for Reinforcement Learning (RRL) is a simple yet effective approach for training behaviors directly from visual inputs. We demonstrate that features learned by standard image classification models generalize across different tasks, are robust to visual distractors, and, when used in conjunction with standard Imitation Learning or Reinforcement Learning pipelines, can efficiently acquire behaviors directly from proprioceptive inputs.
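
The core idea can be sketched in a few lines of PyTorch: image observations are passed through a frozen, ImageNet-pre-trained ResNet, and the resulting features (concatenated with proprioception) become the state fed to a standard RL or IL algorithm. The snippet below is only a minimal illustration of that idea, not the repository's actual encoder implementation; the class name `ResNetEncoder` and the exact preprocessing are assumptions.

```
# Minimal sketch of the RRL idea (hypothetical helper, not the repo's own code):
# a frozen, ImageNet-pre-trained ResNet turns image observations into a compact
# feature vector that is concatenated with proprioceptive state and handed to
# any standard RL / IL algorithm.
import torch
import torchvision.models as models
import torchvision.transforms as T

class ResNetEncoder(torch.nn.Module):
    def __init__(self):
        super().__init__()
        resnet = models.resnet34(pretrained=True)
        # Drop the classification head; keep everything up to global pooling.
        self.backbone = torch.nn.Sequential(*list(resnet.children())[:-1])
        self.backbone.eval()
        for p in self.backbone.parameters():
            p.requires_grad = False          # features stay frozen
        self.preprocess = T.Compose([
            T.Resize(256), T.CenterCrop(224),
            T.Normalize(mean=[0.485, 0.456, 0.406],
                        std=[0.229, 0.224, 0.225]),
        ])

    @torch.no_grad()
    def forward(self, image, proprio):
        # image: (B, 3, H, W) float tensor in [0, 1]; proprio: (B, D)
        feat = self.backbone(self.preprocess(image)).flatten(1)  # (B, 512)
        return torch.cat([feat, proprio], dim=1)                 # RL state
```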

Final behaviors acquired using RRL on the ADROIT benchmark tasks (left to right): (a) opening a door, (b) hammering a nail, (c) pen twirling, (d) object relocation.

Setup

The RRL codebase can be installed by cloning this repository. Note that it uses git submodules to resolve dependencies. Please follow the steps below to install it correctly.

  1. Clone this repository along with the submodules: `git clone --recursive https://github.com/facebookresearch/RRL.git`
  2. Install the package using conda. The dependencies (apart from mujoco_py) are listed in env.yml:
     ```
     conda env create -f env.yml
     conda activate rrl
     ```
  3. The environment requires MuJoCo as a dependency. You may need to obtain a license and follow the setup instructions for mujoco_py. Setting up mujoco_py with GPU support is highly recommended.

  4. Install the mj_envs and mjrl repositories:
     ```
     cd RRL
     pip install -e mjrl/.
     pip install -e mj_envs/.
     pip install -e .
     ```

  5. Additionally, RRL requires the demonstrations published by hand_dapg. (A quick check that the installation succeeded is sketched below.)
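
After the steps above, one way to confirm the install is to import the key packages and instantiate one of the Adroit environments used in the running instructions. This check is only a suggestion; it assumes mujoco_py, mj_envs, and gym were installed as described above.

```
# Optional sanity check for the installation (assumes the steps above succeeded).
import mujoco_py      # fails here if the MuJoCo license / mujoco_py setup is wrong
import mj_envs        # registers the Adroit tasks (hammer-v0, door-v0, ...)
import gym

env = gym.make("hammer-v0")
obs = env.reset()
print("observation dim:", obs.shape)
```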

Running Instructions

  1. The first step is to convert the observations of the demonstrations provided by hand_dapg to the encoder feature space. An example script is provided here. Note that the script saves the demonstrations in .pickle format inside the rrl/demonstrations directory.

    For the mj_envs tasks (a conceptual sketch of this conversion step is shown after this list):
    ```
    python convertDemos.py --env_name hammer-v0 --encoder_type resnet34 -c top -d <path-to-the-demo-file>
    python convertDemos.py --env_name door-v0 --encoder_type resnet34 -c top -d <path-to-the-demo-file>
    python convertDemos.py --env_name pen-v0 --encoder_type resnet34 -c vil_camera -d <path-to-the-demo-file>
    python convertDemos.py --env_name relocate-v0 --encoder_type resnet34 -c cam1 -c cam2 -c cam3 -d <path-to-the-demo-file>
    ```

  2. Launching RRL experiments using DAPG.

    An example launch script, job_script.py, is provided in the examples/ directory, and the configs used are stored in the examples/config/ directory. Note: Hydra configs are used.
    ```
    python job_script.py demo_file=<path-to-new-demo-file> --config-name hammer_dapg
    python job_script.py demo_file=<path-to-new-demo-file> --config-name door_dapg
    python job_script.py demo_file=<path-to-new-demo-file> --config-name pen_dapg
    python job_script.py demo_file=<path-to-new-demo-file> --config-name relocate_dapg
    ```
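
Conceptually, the conversion step in item 1 replaces each demonstration's observations with their representation in the encoder's feature space and pickles the result into rrl/demonstrations. The sketch below illustrates that idea only; it assumes the hand_dapg demo format of a list of path dictionaries, and the `encode_observations` helper is a hypothetical stand-in for replaying the demonstration, rendering the chosen camera(s), and running the frames through the frozen ResNet. It is not the repository's actual convertDemos.py code.

```
# Hypothetical sketch of what the demo-conversion step does conceptually.
# Assumes the hand_dapg pickle format (a list of path dicts); encode_observations()
# stands in for camera rendering + frozen-ResNet encoding and is not real repo code.
import pickle
import numpy as np

def convert_demo_file(demo_path, out_path, encode_observations):
    with open(demo_path, "rb") as f:
        paths = pickle.load(f)                     # list of demonstration paths
    for path in paths:
        # encoder features (plus proprioception) replace the original observations
        path["observations"] = np.stack(encode_observations(path))
    with open(out_path, "wb") as f:
        pickle.dump(paths, f)                      # e.g. saved under rrl/demonstrations/
```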

Issues

GradCAM Visualization Code

opened on 2023-03-01 04:44:05 by meghbhalerao

Hi - I was wondering where I can find the code for the Grad-CAM visualizations? Is it somewhere in the current repository? Thanks!

Relationship between number of samples and number of iterations

opened on 2023-02-02 05:35:34 by meghbhalerao

Hi Rutav,

The plots provided in the paper (example below) show the number of samples vs. the success rate. [screenshot of the plot from the paper] I also see that the code logs the success rate after every iteration, as mentioned here - https://github.com/ShahRutav/mjrl/blob/6cdb8b8c72279abe8d9d8b8a800f8ac396413e42/mjrl/utils/train_agent.py#L119 - and, according to the default configuration file here - https://github.com/facebookresearch/RRL/blob/main/examples/config/hammer_dapg.yaml#L40 - the code is run for 200 iterations. I also see here - https://github.com/facebookresearch/RRL/blob/main/examples/config/hammer_dapg.yaml#L40 - that the number of trajectories is 200, so I think the horizon length, say h, has to be 100, assuming the training runs for 4 x 10e6 samples, if I am not wrong. I have the following doubts:

  1. Here - https://github.com/ShahRutav/mjrl/blob/6cdb8b8c72279abe8d9d8b8a800f8ac396413e42/mjrl/algos/batch_reinforce.py#L64 - I see that the horizon length is 1e6. Am I looking at the right place for that, or is there some other parameter that I am missing?
  2. Is eval_success being logged at each iteration according to the code, while the plots reported in the paper simply have their axis scaled by an appropriate factor, in our case something like 2 x 10eX (X depends on the answer to the above point), so that essentially the total number of logged eval_success values is 200?

Please do let me know if my understanding is right in this setting. Thanks, Megh
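
For reference, the arithmetic assumed in the question can be written out explicitly. Every number here (200 trajectories per iteration, a 100-step horizon, 200 iterations) is taken from the question itself and is not verified against the repository's configs:

```
# Arithmetic implied by the question above (numbers taken from the question,
# not verified against the repository's configuration files).
num_traj = 200        # trajectories sampled per iteration
horizon = 100         # assumed episode length
iterations = 200      # training iterations

samples_per_iteration = num_traj * horizon          # 20,000
total_samples = samples_per_iteration * iterations  # 4,000,000 = 4 x 10^6
print(samples_per_iteration, total_samples)
```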

How to turn on visual distractions?

opened on 2021-12-16 09:32:05 by watchernyu

Hi! I'm curious how we can turn on the visual distractions, such as light position, direction, object color, etc., in the Adroit environment (as mentioned in Section 7.5 in the appendix of your paper)?

Thanks a lot for your help!
