Project Description is available at my university page under Proactive Troubleshooting [Aug2018-Present].
I will update this readme as I get more time, the individual files however have an extensive documentation.
dqn.py
This file gives a template for constructing a Deep Q Learning network. You can specify the no. of hidden units you want but for the moment it takes only one hidden layer.
IRL/irl_finite_space.py
This file implements the Finite Space IRL as put forth in Andrew Ng and Stuart Russel in Algorithms for Inverse Reinforcement Learning. I used pulp based linear program solver but many people prefer using cvxopt package as well.
For a sample reward structure the following result was obtained,
dqn_sin_stability.py
This is the most recent code I am stuck on (among many other codes :P ). This should ideally be a DQN implementation of stabilizing a sin function, or a general function. For any given continuous values of a noisy sin output, the agent should choose a noise correction scheme which smoothly approximates the sin function, or the function in consideration. Both the noise and the correction values can be continuous real values which makes this problem non trivial.
acla_with_approxq.py
Lately i realized that the function stabilization problem I was trying to handle couldnt be done without a continuos
action space consideration. As the first step towards Continuous Actor Critic Learning Algorithm (CACLA) proposed by Hasselt and Wiering in Reinforcement Learning in Continuous Action Spaces
I tried the the ACLA algorithm, which as you'd expect is just a slight variation of the DQN form. The file acla_with_approxq.py
is an implementation of the ACLA with value function approximation. I tried out the following configurations,
1. Actor and Critic with experience replays [This training was by far the slowest I ever saw. There was a progress of 180 episodes over 8hours on an AMD Ryzen Threadripper 2990WX 128 GB]
2. Combinations of fixing Q targets in Critic and Actor
3. Using dropouts in Actor
4. Combinations of stochastic/batch updates of critic/actor
Out of all these, updating Critic with experience replay without target scaling and actor in batch along with target scaling gave the best performance.
The environment was CartPole-v1. Timesteps of 500 were achieved within 100 episodes, a further learning rate decay for both actor and critic is expected to speed up this convergence.
The actor still has a high variance even though I used a one step TD error as the advantage. However this variance seems to die off once the convergence is achieved, something I'm still trying to explain to myself.
acla_with_mc_returns.py
This file performs an actor critic learning algorithm with monte carlo estimates of the returns. Several experiments were performed and were found consistent with stochastic behaviors of the gradients. The stochastic parameter updates were best with an SGD with learning rate scheduling and nesterov accelerated gradient. However, a full batch gradient descent beat the sgd by a large margin and converged within 500 episodes for cartpole v1.
cacla.py
So finally I was able to write the Continuous Actor Critic Learning Algorithm. I benchmarked this against cartpole continuous environment.
The training started pretty low on enthusiasm, but to my surprise the algorithm hit 1189 timesteps in the 420th episode!
I used number of updates towards an action where variancet is the running variance of the TD(0) error and δt is the TD(0) error at time t.
A gaussian exploration was used. The Actor was trained in a full batch mode, the Critic used an experience replay with fixed targets updated every copy_epochs episodes.
It is worth appreciating the reduction in Actor's variance over time and the corresponding increase in the timesteps.
I ran an experiment where a noisy sin function was to be stabilised. The noise came from
a another time dependent function with 4 unique levels. The environment is defined in env_definition.py.
This can be interpreted as a function filtering from a convolution. The file experiment_sin_stability.py
gives the continuous actor critic learning algorithm we used here. The following results were obtained,
The hits+partial hits were the percentage of points in the domain where the algorithm brought the noisy curve within [-0.1, +0.1] deviation of the actual value
after correction. A maximum hit of 75.5% was observed.
The original curve is the one observed with convolution, the corrected one if the one which the agent gives out
after correction.
Bumps certifi from 2019.6.16 to 2022.12.7.
9e9e840
2022.12.07b81bdb2
2022.09.24939a28f
2022.09.14aca828a
2022.06.15.2de0eae1
Only use importlib.resources's new files() / Traversable API on Python ≥3.11 ...b8eb5e9
2022.06.15.147fb7ab
Fix deprecation warning on Python 3.11 (#199)b0b48e0
fixes #198 -- update link in license9d514b4
2022.06.154151e88
Add py.typed to MANIFEST.in to package in sdist (#196)Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase
.
Bumps lief from 0.9.0 to 0.12.2.
Sourced from lief's releases.
0.12.2
No release notes provided.
0.12.1
See: https://lief-project.github.io/doc/stable/changelog.html#april-08-2022
0.12.0
Changelog is here: https://lief-project.github.io/doc/latest/changelog.html#march-25-2022
0.11.5
No release notes provided.
0.11.4
No release notes provided.
0.11.3
No release notes provided.
0.11.2
See: https://lief.quarkslab.com/doc/stable/changelog.html#february-22-2021
0.11.1
See: https://lief.quarkslab.com/blog/2021-02-22-lief-0-11-1/
0.11.0
See: https://lief.quarkslab.com/doc/stable/changelog.html#v0.11.0
0.10.1
No release notes provided.
0.10.0
No release notes provided.
Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase
.
Bumps joblib from 0.13.2 to 1.2.0.
Sourced from joblib's changelog.
Release 1.2.0
Fix a security issue where
eval(pre_dispatch)
could potentially run arbitrary code. Now only basic numerics are supported. joblib/joblib#1327Make sure that joblib works even when multiprocessing is not available, for instance with Pyodide joblib/joblib#1256
Avoid unnecessary warnings when workers and main process delete the temporary memmap folder contents concurrently. joblib/joblib#1263
Fix memory alignment bug for pickles containing numpy arrays. This is especially important when loading the pickle with
mmap_mode != None
as the resultingnumpy.memmap
object would not be able to correct the misalignment without performing a memory copy. This bug would cause invalid computation and segmentation faults with native code that would directly access the underlying data buffer of a numpy array, for instance C/C++/Cython code compiled with older GCC versions or some old OpenBLAS written in platform specific assembly. joblib/joblib#1254Vendor cloudpickle 2.2.0 which adds support for PyPy 3.8+.
Vendor loky 3.3.0 which fixes several bugs including:
robustly forcibly terminating worker processes in case of a crash (joblib/joblib#1269);
avoiding leaking worker processes in case of nested loky parallel calls;
reliability spawn the correct number of reusable workers.
Release 1.1.0
Fix byte order inconsistency issue during deserialization using joblib.load in cross-endian environment: the numpy arrays are now always loaded to use the system byte order, independently of the byte order of the system that serialized the pickle. joblib/joblib#1181
Fix joblib.Memory bug with the
ignore
parameter when the cached function is a decorated function.
... (truncated)
5991350
Release 1.2.03fa2188
MAINT cleanup numpy warnings related to np.matrix in tests (#1340)cea26ff
CI test the future loky-3.3.0 branch (#1338)8aca6f4
MAINT: remove pytest.warns(None) warnings in pytest 7 (#1264)067ed4f
XFAIL test_child_raises_parent_exits_cleanly with multiprocessing (#1339)ac4ebd5
MAINT add back pytest warnings plugin (#1337)a23427d
Test child raises parent exits cleanly more reliable on macos (#1335)ac09691
[MAINT] various test updates (#1334)4a314b1
Vendor loky 3.2.0 (#1333)bdf47e9
Make test_parallel_with_interactively_defined_functions_default_backend timeo...Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase
.
Bumps nbconvert from 5.5.0 to 6.5.1.
Sourced from nbconvert's releases.
Release 6.5.1
No release notes provided.
6.5.0
What's Changed
- Drop dependency on testpath. by
@anntzer
in jupyter/nbconvert#1723- Adopt pre-commit by
@blink1073
in jupyter/nbconvert#1744- Add pytest settings and handle warnings by
@blink1073
in jupyter/nbconvert#1745- Apply Autoformatters by
@blink1073
in jupyter/nbconvert#1746- Add git-blame-ignore-revs by
@blink1073
in jupyter/nbconvert#1748- Update flake8 config by
@blink1073
in jupyter/nbconvert#1749- support bleach 5, add packaging and tinycss2 dependencies by
@bollwyvl
in jupyter/nbconvert#1755- [pre-commit.ci] pre-commit autoupdate by
@pre-commit-ci
in jupyter/nbconvert#1752- update cli example by
@leahecole
in jupyter/nbconvert#1753- Clean up pre-commit by
@blink1073
in jupyter/nbconvert#1757- Clean up workflows by
@blink1073
in jupyter/nbconvert#1750New Contributors
@pre-commit-ci
made their first contribution in jupyter/nbconvert#1752Full Changelog: https://github.com/jupyter/nbconvert/compare/6.4.5...6.5
6.4.3
What's Changed
- Add section to
customizing
showing how to use template inheritance by@stefanv
in jupyter/nbconvert#1719- Remove ipython genutils by
@rgs258
in jupyter/nbconvert#1727- Update changelog for 6.4.3 by
@blink1073
in jupyter/nbconvert#1728New Contributors
@stefanv
made their first contribution in jupyter/nbconvert#1719@rgs258
made their first contribution in jupyter/nbconvert#1727Full Changelog: https://github.com/jupyter/nbconvert/compare/6.4.2...6.4.3
6.4.0
What's Changed
- Optionally speed up validation by
@gwincr11
in jupyter/nbconvert#1672- Adding missing div compared to JupyterLab DOM structure by
@SylvainCorlay
in jupyter/nbconvert#1678- Allow passing extra args to code highlighter by
@yuvipanda
in jupyter/nbconvert#1683- Prevent page breaks in outputs when printing by
@SylvainCorlay
in jupyter/nbconvert#1679- Add collapsers to template by
@SylvainCorlay
in jupyter/nbconvert#1689- Fix recent pandoc latex tables by adding calc and array (#1536, #1566) by
@cgevans
in jupyter/nbconvert#1686- Add an invalid notebook error by
@gwincr11
in jupyter/nbconvert#1675- Fix typos in execute.py by
@TylerAnderson22
in jupyter/nbconvert#1692- Modernize latex greek math handling (partially fixes #1673) by
@cgevans
in jupyter/nbconvert#1687- Fix use of deprecated API and update test matrix by
@blink1073
in jupyter/nbconvert#1696- Update nbconvert_library.ipynb by
@letterphile
in jupyter/nbconvert#1695- Changelog for 6.4 by
@blink1073
in jupyter/nbconvert#1697New Contributors
... (truncated)
7471b75
Release 6.5.1c1943e0
Fix pre-commit8685e93
Fix tests0abf290
Run black and prettier418d545
Run test on 6.x branchbef65d7
Convert input to string prior to escape HTML0818628
Check input type before escapingb206470
GHSL-2021-1017, GHSL-2021-1020, GHSL-2021-1021a03cbb8
GHSL-2021-1026, GHSL-2021-102548fe71e
GHSL-2021-1024Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase
.
Bumps mistune from 0.8.4 to 2.0.3.
Sourced from mistune's releases.
Version 2.0.2
Fix
escape_url
via lepture/mistune#295Version 2.0.1
Fix XSS for image link syntax.
Version 2.0.0
First release of Mistune v2.
Version 2.0.0 RC1
In this release, we have a Security Fix for harmful links.
Version 2.0.0 Alpha 1
This is the first release of v2. An alpha version for users to have a preview of the new mistune.
Sourced from mistune's changelog.
Changelog
Here is the full history of mistune v2.
Version 2.0.4
Released on Jul 15, 2022
- Fix
url
plugin in<a>
tag- Fix
*
formattingVersion 2.0.3
Released on Jun 27, 2022
- Fix
table
plugin- Security fix for CVE-2022-34749
Version 2.0.2
Released on Jan 14, 2022
Fix
escape_url
Version 2.0.1
Released on Dec 30, 2021
XSS fix for image link syntax.
Version 2.0.0
Released on Dec 5, 2021
This is the first non-alpha release of mistune v2.
Version 2.0.0rc1
Released on Feb 16, 2021
Version 2.0.0a6
</tr></table>
... (truncated)
3f422f1
Version bump 2.0.3a6d4321
Fix asteris emphasis regex CVE-2022-347495638e46
Merge pull request #307 from jieter/patch-10eba471
Fix typo in guide.rst61e9337
Fix table plugin76dec68
Add documentation for renderer heading when TOC enabled799cd11
Version bump 2.0.2babb0cf
Merge pull request #295 from dairiki/bug.escape_urlfc2cd53
Make mistune.util.escape_url less aggressive3e8d352
Version bump 2.0.1Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase
.
Bumps distributed from 2.1.0 to 2021.10.0.
63ebaea
bump version to 2021.10.01670cf8
Ensure resumed flight tasks are still fetched (#5426)cdc68cc
AMM high level documentation (#5456)33d83bc
Provide stack for suspended coro in test timeout (#5446)918e3fb
Handle UCXNotConnected
error (#5449)3afc670
AMM: Don't schedule tasks to paused workers (#5431)f3aa9d1
Use pip install .
instead of calling setup.py
(#5442)a8151a6
Increase latency for stealing (#5390)1585f85
Type annotations for Worker and gen_cluster (#5438)7d2516a
Ensure reconnecting workers do not loose required data (#5436)Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase
.
reinforcement-learning inverse-reinforcement-learning cirl deep-reinforcement-learning