RL CIRL Research

shaktikshri, updated 2022-12-08 06:10:17

Adaptive Systems

Project Description is available at my university page under Proactive Troubleshooting [Aug2018-Present].

I will update this README as I get more time; the individual files, however, are extensively documented.

dqn.py

This file gives a template for constructing a Deep Q-Learning network. You can specify the number of hidden units, but for the moment it supports only one hidden layer.
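As a rough illustration of what a single-hidden-layer Q-network computes, here is a minimal NumPy sketch. It is not the code in dqn.py: the class name `OneHiddenLayerQNet`, the helper `td_targets`, the ReLU activation, and the initialisation scale are all assumptions made for this example.

```python
import numpy as np


class OneHiddenLayerQNet:
    """Hypothetical Q-network: state -> one hidden ReLU layer -> one Q-value per action."""

    def __init__(self, n_state_dims, n_hidden, n_actions, seed=0):
        rng = np.random.default_rng(seed)
        # Small random weights, zero biases (an arbitrary choice for this sketch).
        self.w1 = rng.normal(0.0, 0.1, size=(n_state_dims, n_hidden))
        self.b1 = np.zeros(n_hidden)
        self.w2 = rng.normal(0.0, 0.1, size=(n_hidden, n_actions))
        self.b2 = np.zeros(n_actions)

    def forward(self, states):
        """Return Q-values of shape (batch, n_actions) for a batch of states."""
        hidden = np.maximum(0.0, states @ self.w1 + self.b1)
        return hidden @ self.w2 + self.b2


def td_targets(net, rewards, next_states, dones, gamma=0.99):
    """One-step Q-learning targets: r + gamma * max_a' Q(s', a'), with no bootstrap at terminal states."""
    next_q = net.forward(next_states).max(axis=1)
    return rewards + gamma * next_q * (1.0 - dones)
```

The only architectural knob here is `n_hidden`, which mirrors the "number of hidden units" option described above.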

IRL/irl_finite_space.py

This file implements finite state space IRL as put forth by Andrew Ng and Stuart Russell in Algorithms for Inverse Reinforcement Learning. I used the PuLP linear program solver, though many people prefer the cvxopt package. For a sample reward structure the following result was obtained:

[Figure: IRL in Finite Space]
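The linear program in that paper is built around one feasibility condition: if the observed policy takes action a1 in every state, a reward vector R is consistent with it exactly when (P_a1 - P_a)(I - γ P_a1)^(-1) R ≥ 0 holds componentwise for every other action a. The sketch below only checks that condition with NumPy; it does not solve the LP and is not the code in irl_finite_space.py. The function name and the array layout are assumptions for this example.

```python
import numpy as np


def is_consistent_reward(P, policy_action, R, gamma=0.9, tol=1e-9):
    """Check the Ng-Russell condition for a policy that always plays `policy_action`.

    P: array of shape (n_actions, n_states, n_states), P[a][s, s'] = transition probability.
    R: array of shape (n_states,), a candidate state reward.
    """
    n_states = R.shape[0]
    P_opt = P[policy_action]
    # V = (I - gamma * P_opt)^(-1) R is the value function of the observed policy.
    V = np.linalg.solve(np.eye(n_states) - gamma * P_opt, R)
    for a in range(P.shape[0]):
        if a == policy_action:
            continue
        # Deviating to action `a` for one step must not improve on the policy in any state.
        if np.any((P_opt - P[a]) @ V < -tol):
            return False
    return True
```

The LP then picks, among all rewards that pass this check, one that maximises the margin between the observed action and the alternatives, usually with an L1 penalty on R.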

dqn_sin_stability.py

This is the most recent code I am stuck on (among many other codes :P). Ideally, this should be a DQN implementation that stabilizes a sin function, or a general function. Given continuous values of a noisy sin output, the agent should choose a noise correction scheme that smoothly approximates the sin function, or whichever function is under consideration. Both the noise and the correction can take continuous real values, which makes this problem non-trivial.

acla_with_approxq.py

Lately I realized that the function stabilization problem I was trying to handle couldn't be solved without considering a continuous action space. As the first step towards the Continuous Actor Critic Learning Automaton (CACLA) proposed by van Hasselt and Wiering in Reinforcement Learning in Continuous Action Spaces, I tried the ACLA algorithm, which as you'd expect is just a slight variation of the DQN form. The file acla_with_approxq.py is an implementation of ACLA with value function approximation. I tried out the following configurations:

1. Actor and Critic with experience replay. [This training was by far the slowest I ever saw: 180 episodes of progress over 8 hours on an AMD Ryzen Threadripper 2990WX, 128 GB.]
2. Combinations of fixing Q targets in Critic and Actor
3. Using dropout in the Actor
4. Combinations of stochastic/batch updates of critic/actor

Out of all these, updating the Critic with experience replay without target scaling, and the Actor in batch mode with target scaling, gave the best performance. The environment was CartPole-v1. Episodes of 500 timesteps were achieved within 100 episodes; a further learning rate decay for both actor and critic is expected to speed up this convergence.

[Figure: Experience Replay Critic + Batch Update Actor]

The actor still has a high variance even though I used a one-step TD error as the advantage. However, this variance seems to die off once convergence is achieved, something I'm still trying to explain to myself.
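To make the "one-step TD error as the advantage" idea concrete, here is a small pure-Python sketch. The first function is the standard TD(0) error. The second is a deliberately simplified sign-based actor update in the spirit of ACLA, not the full update from the paper and not the code in acla_with_approxq.py; both function names and the learning rate are assumptions.

```python
def td_error(reward, value, next_value, done, gamma=0.99):
    """One-step TD error: r + gamma * V(s') - V(s), with no bootstrap at terminal states."""
    bootstrap = 0.0 if done else gamma * next_value
    return reward + bootstrap - value


def acla_style_update(prefs, action, delta, lr=0.1):
    """Simplified sign-based actor step (illustrative only).

    Only the sign of the TD error is used: the chosen action's preference
    moves towards 1 when delta > 0 and towards 0 otherwise.
    """
    target = 1.0 if delta > 0 else 0.0
    updated = list(prefs)
    updated[action] += lr * (target - updated[action])
    return updated
```

Because only the sign of the TD error reaches the actor, a noisy critic can flip the direction of individual updates, which is one plausible source of the actor variance noted above.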

acla_with_mc_returns.py

This file performs actor-critic learning with Monte Carlo estimates of the returns. Several experiments were performed and were found consistent with stochastic behaviors of the gradients. The stochastic parameter updates worked best with SGD with learning rate scheduling and Nesterov accelerated gradient. However, full-batch gradient descent beat SGD by a large margin and converged within 500 episodes on CartPole-v1.
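For reference, the Monte Carlo return of a finished episode can be computed in one backward pass over the rewards. This is a generic sketch, not the code in acla_with_mc_returns.py; the function names and the discount factor are assumptions.

```python
def discounted_returns(rewards, gamma=0.99):
    """Return G_t = r_t + gamma * r_{t+1} + gamma^2 * r_{t+2} + ... for every step t."""
    returns = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each return reuses the one that follows it.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def mc_advantages(returns, values):
    """Advantage estimate G_t - V(s_t), using the critic's values as a baseline."""
    return [g - v for g, v in zip(returns, values)]
```

Unlike the one-step TD error, these targets are only available once the episode has ended, which fits naturally with the full-batch updates described above.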

cacla.py

So finally I was able to write the Continuous Actor Critic Learning Automaton (CACLA). I benchmarked it against a continuous CartPole environment. The training started pretty low on enthusiasm, but to my surprise the algorithm hit 1189 timesteps in the 420th episode! The number of updates towards an action was computed from var_t, the running variance of the TD(0) error, and δ_t, the TD(0) error at time t. Gaussian exploration was used. The Actor was trained in full-batch mode; the Critic used experience replay with fixed targets updated every copy_epochs episodes.

[Figure: CACLA]

It is worth appreciating the reduction in the Actor's variance over time and the corresponding increase in the timesteps.
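The variance-scaled update count can be sketched as follows. The rule shown is the CACLA+Var variant from the van Hasselt and Wiering paper, where the actor is updated ceil(δ_t / sqrt(var_t)) times when δ_t > 0 and var_t is an exponential running average of δ_t². Whether cacla.py uses exactly this rule is an assumption, as are the class name, `beta`, the initial variance, and the `explore` helper.

```python
import math
import random


class CaclaVarCounter:
    """Tracks a running variance of the TD error and derives an actor update count."""

    def __init__(self, beta=0.001, initial_variance=1.0):
        self.beta = beta                  # smoothing rate (assumed value)
        self.variance = initial_variance  # var_0 (assumed value)

    def n_updates(self, delta):
        """Return how many times to move the actor towards the explored action."""
        # Count with the current variance; CACLA only updates the actor when delta > 0.
        count = math.ceil(delta / math.sqrt(self.variance)) if delta > 0 else 0
        # var_{t+1} = (1 - beta) * var_t + beta * delta_t^2
        self.variance = (1.0 - self.beta) * self.variance + self.beta * delta ** 2
        return count


def explore(actor_action, sigma, rng):
    """Gaussian exploration: perturb the actor's output with N(0, sigma^2) noise."""
    return actor_action + rng.gauss(0.0, sigma)
```

An unusually large positive TD error, measured in running standard deviations, therefore pulls the actor towards the explored action several times instead of once.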

Preliminary Results

I ran an experiment in which a noisy sin function was to be stabilised. The noise came from another time-dependent function with 4 unique levels. The environment is defined in env_definition.py. This can be interpreted as filtering a function out of a convolution. The file experiment_sin_stability.py gives the continuous actor-critic learning algorithm we used here. The following results were obtained:

[Figure: Training_status]
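To give a feel for the setup, here is a purely hypothetical stand-in for such an environment. The actual definition lives in env_definition.py and may differ in every detail: the four noise values, the switching period, the step size, and all function names below are invented for illustration.

```python
import math

# Hypothetical noise levels; the source only says there are 4 unique levels.
NOISE_LEVELS = (-0.3, -0.1, 0.1, 0.3)


def noise_at(step, period=25):
    """Time-dependent noise that cycles through the four levels every `period` steps."""
    return NOISE_LEVELS[(step // period) % len(NOISE_LEVELS)]


def noisy_sin(step, dt=0.1):
    """Observed signal: the clean sin value plus the current noise level."""
    return math.sin(step * dt) + noise_at(step)


def corrected_value(step, correction, dt=0.1):
    """Signal after the agent applies a continuous, real-valued correction."""
    return noisy_sin(step, dt) + correction
```

In this toy version the ideal correction at each step is simply `-noise_at(step)`; the agent has to learn it from the noisy observations alone.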

The hits + partial hits figure is the percentage of points in the domain where the algorithm brought the noisy curve within a [-0.1, +0.1] deviation of the actual value after correction. A maximum hit rate of 75.5% was observed.

[Figure: prelim_results]
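The metric above can be sketched as a simple tolerance count. This version does not distinguish hits from partial hits, since the source does not define the split; the function name and signature are assumptions.

```python
def hit_percentage(corrected, actual, tolerance=0.1):
    """Percentage of points where the corrected curve is within +/- tolerance of the actual value."""
    if len(corrected) != len(actual) or not actual:
        raise ValueError("curves must be non-empty and of equal length")
    hits = sum(1 for c, a in zip(corrected, actual) if abs(c - a) <= tolerance)
    return 100.0 * hits / len(actual)
```

For example, if three of four corrected points land inside the tolerance band, the function returns 75.0.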

The original curve is the one observed with convolution; the corrected one is the one the agent gives out after correction.

Issues

Bump certifi from 2019.6.16 to 2022.12.7

opened on 2022-12-08 06:10:17 by dependabot[bot]

Bumps certifi from 2019.6.16 to 2022.12.7.

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


Dependabot commands and options

You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
- `@dependabot ignore this major version` will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
- `@dependabot use these labels` will set the current labels as the default for future PRs for this repo and language
- `@dependabot use these reviewers` will set the current reviewers as the default for future PRs for this repo and language
- `@dependabot use these assignees` will set the current assignees as the default for future PRs for this repo and language
- `@dependabot use this milestone` will set the current milestone as the default for future PRs for this repo and language

You can disable automated security fix PRs for this repo from the [Security Alerts page](https://github.com/shaktikshri/adaptiveSystems/network/alerts).

Bump lief from 0.9.0 to 0.12.2

opened on 2022-10-06 20:01:41 by dependabot[bot]

Bumps lief from 0.9.0 to 0.12.2.

Release notes

Sourced from lief's releases.

0.12.2

No release notes provided.

0.12.1

See: https://lief-project.github.io/doc/stable/changelog.html#april-08-2022

0.12.0

Changelog is here: https://lief-project.github.io/doc/latest/changelog.html#march-25-2022

0.11.5

No release notes provided.

0.11.4

No release notes provided.

0.11.3

No release notes provided.

0.11.2

See: https://lief.quarkslab.com/doc/stable/changelog.html#february-22-2021

0.11.1

See: https://lief.quarkslab.com/blog/2021-02-22-lief-0-11-1/

0.11.0

See: https://lief.quarkslab.com/doc/stable/changelog.html#v0.11.0

0.10.1

No release notes provided.

0.10.0

No release notes provided.


Bump joblib from 0.13.2 to 1.2.0

opened on 2022-09-30 19:47:47 by dependabot[bot]

Bumps joblib from 0.13.2 to 1.2.0.

Changelog

Sourced from joblib's changelog.

Release 1.2.0

  • Fix a security issue where eval(pre_dispatch) could potentially run arbitrary code. Now only basic numerics are supported. joblib/joblib#1327

  • Make sure that joblib works even when multiprocessing is not available, for instance with Pyodide joblib/joblib#1256

  • Avoid unnecessary warnings when workers and main process delete the temporary memmap folder contents concurrently. joblib/joblib#1263

  • Fix memory alignment bug for pickles containing numpy arrays. This is especially important when loading the pickle with mmap_mode != None as the resulting numpy.memmap object would not be able to correct the misalignment without performing a memory copy. This bug would cause invalid computation and segmentation faults with native code that would directly access the underlying data buffer of a numpy array, for instance C/C++/Cython code compiled with older GCC versions or some old OpenBLAS written in platform specific assembly. joblib/joblib#1254

  • Vendor cloudpickle 2.2.0 which adds support for PyPy 3.8+.

  • Vendor loky 3.3.0 which fixes several bugs including:

    • robustly forcibly terminating worker processes in case of a crash (joblib/joblib#1269);

    • avoiding leaking worker processes in case of nested loky parallel calls;

    • reliability spawn the correct number of reusable workers.

Release 1.1.0

  • Fix byte order inconsistency issue during deserialization using joblib.load in cross-endian environment: the numpy arrays are now always loaded to use the system byte order, independently of the byte order of the system that serialized the pickle. joblib/joblib#1181

  • Fix joblib.Memory bug with the ignore parameter when the cached function is a decorated function.

... (truncated)

Commits
  • 5991350 Release 1.2.0
  • 3fa2188 MAINT cleanup numpy warnings related to np.matrix in tests (#1340)
  • cea26ff CI test the future loky-3.3.0 branch (#1338)
  • 8aca6f4 MAINT: remove pytest.warns(None) warnings in pytest 7 (#1264)
  • 067ed4f XFAIL test_child_raises_parent_exits_cleanly with multiprocessing (#1339)
  • ac4ebd5 MAINT add back pytest warnings plugin (#1337)
  • a23427d Test child raises parent exits cleanly more reliable on macos (#1335)
  • ac09691 [MAINT] various test updates (#1334)
  • 4a314b1 Vendor loky 3.2.0 (#1333)
  • bdf47e9 Make test_parallel_with_interactively_defined_functions_default_backend timeo...
  • Additional commits viewable in compare view


Bump nbconvert from 5.5.0 to 6.5.1

opened on 2022-08-23 17:52:10 by dependabot[bot]

Bumps nbconvert from 5.5.0 to 6.5.1.

Release notes

Sourced from nbconvert's releases.

Release 6.5.1

No release notes provided.

6.5.0

Full Changelog: https://github.com/jupyter/nbconvert/compare/6.4.5...6.5

6.4.3

Full Changelog: https://github.com/jupyter/nbconvert/compare/6.4.2...6.4.3

6.4.0

... (truncated)


Bump mistune from 0.8.4 to 2.0.3

opened on 2022-07-29 22:56:50 by dependabot[bot]

Bumps mistune from 0.8.4 to 2.0.3.

Release notes

Sourced from mistune's releases.

Version 2.0.2

Fix escape_url via lepture/mistune#295

Version 2.0.1

Fix XSS for image link syntax.

Version 2.0.0

First release of Mistune v2.

Version 2.0.0 RC1

In this release, we have a Security Fix for harmful links.

Version 2.0.0 Alpha 1

This is the first release of v2. An alpha version for users to have a preview of the new mistune.

Changelog

Sourced from mistune's changelog.

Changelog

Here is the full history of mistune v2.

Version 2.0.4


Released on Jul 15, 2022
  • Fix url plugin in <a> tag
  • Fix * formatting

Version 2.0.3

Released on Jun 27, 2022

  • Fix table plugin
  • Security fix for CVE-2022-34749

Version 2.0.2


Released on Jan 14, 2022

Fix escape_url

Version 2.0.1

Released on Dec 30, 2021

XSS fix for image link syntax.

Version 2.0.0


Released on Dec 5, 2021

This is the first non-alpha release of mistune v2.

Version 2.0.0rc1

Released on Feb 16, 2021

Version 2.0.0a6


... (truncated)

Commits
  • 3f422f1 Version bump 2.0.3
  • a6d4321 Fix asteris emphasis regex CVE-2022-34749
  • 5638e46 Merge pull request #307 from jieter/patch-1
  • 0eba471 Fix typo in guide.rst
  • 61e9337 Fix table plugin
  • 76dec68 Add documentation for renderer heading when TOC enabled
  • 799cd11 Version bump 2.0.2
  • babb0cf Merge pull request #295 from dairiki/bug.escape_url
  • fc2cd53 Make mistune.util.escape_url less aggressive
  • 3e8d352 Version bump 2.0.1
  • Additional commits viewable in compare view


Bump distributed from 2.1.0 to 2021.10.0

opened on 2022-07-15 21:59:18 by dependabot[bot]

Bumps distributed from 2.1.0 to 2021.10.0.

Shakti Kumar
GitHub Repository

reinforcement-learning inverse-reinforcement-learning cirl deep-reinforcement-learning