A PyTorch Knowledge Distillation library for benchmarking and extending works in the domains of Knowledge Distillation, Pruning, and Quantization.

SforAiDl, updated 2023-03-01 21:06:37

KD-Lib

A PyTorch model compression library containing easy-to-use methods for knowledge distillation, pruning, and quantization

[![Downloads](https://pepy.tech/badge/kd-lib)](https://pepy.tech/project/kd-lib) [![Tests](https://github.com/SforAiDl/KD_Lib/actions/workflows/python-package-test.yml/badge.svg)](https://github.com/SforAiDl/KD_Lib/actions/workflows/python-package-test.yml) [![Docs](https://readthedocs.org/projects/kd-lib/badge/?version=latest)](https://kd-lib.readthedocs.io/en/latest/?badge=latest) **[Documentation](https://kd-lib.readthedocs.io/en/latest/)** | **[Tutorials](https://kd-lib.readthedocs.io/en/latest/usage/tutorials/index.html)**

Installation

From source (recommended)

```shell
git clone https://github.com/SforAiDl/KD_Lib.git
cd KD_Lib
python setup.py install
```

From PyPI

```shell

pip install KD-Lib

```

Example usage

To implement the most basic version of knowledge distillation from [Distilling the Knowledge in a Neural Network](https://arxiv.org/abs/1503.02531) and plot loss curves:

```python
import torch
import torch.optim as optim
from torchvision import datasets, transforms
from KD_Lib.KD import VanillaKD

# This part is where you define your datasets, dataloaders, models and optimizers

train_loader = torch.utils.data.DataLoader(
    datasets.MNIST(
        "mnist_data",
        train=True,
        download=True,
        transform=transforms.Compose(
            [transforms.ToTensor(), transforms.Normalize((0.1307,), (0.3081,))]
        ),
    ),
    batch_size=32,
    shuffle=True,
)

test_loader = torch.utils.data.DataLoader(
    datasets.MNIST(
        "mnist_data",
        train=False,
        transform=transforms.Compose(
            [transforms.ToTensor(), transforms.Normalize((0.1307,), (0.3081,))]
        ),
    ),
    batch_size=32,
    shuffle=True,
)

teacher_model = ...  # placeholder: supply your teacher model here
student_model = ...  # placeholder: supply your student model here

teacher_optimizer = optim.SGD(teacher_model.parameters(), 0.01)
student_optimizer = optim.SGD(student_model.parameters(), 0.01)

# Now, this is where KD_Lib comes into the picture

distiller = VanillaKD(teacher_model, student_model, train_loader, test_loader,
                      teacher_optimizer, student_optimizer)
distiller.train_teacher(epochs=5, plot_losses=True, save_model=True)  # Train the teacher network
distiller.train_student(epochs=5, plot_losses=True, save_model=True)  # Train the student network
distiller.evaluate(teacher=False)                                     # Evaluate the student network
distiller.get_parameters()  # A utility function to get the number of
                            # parameters in the teacher and the student network
```
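For readers curious about what such a distiller optimizes, the loss described in the paper can be sketched roughly as below. This is an illustrative sketch, not KD_Lib's actual implementation: the function name `vanilla_kd_loss` and the defaults `temperature=4.0` and `alpha=0.5` are assumptions chosen for the example, and random logits stand in for real model outputs.

```python
import torch
import torch.nn.functional as F


def vanilla_kd_loss(student_logits, teacher_logits, labels, temperature=4.0, alpha=0.5):
    """Soft-target KL term plus hard-label cross-entropy (hypothetical helper)."""
    # Soften both distributions with the same temperature.
    soft_student = F.log_softmax(student_logits / temperature, dim=1)
    soft_teacher = F.softmax(teacher_logits / temperature, dim=1)
    # KL(teacher || student); the T^2 factor keeps gradient scales comparable
    # across temperatures, as suggested in the paper.
    distill = F.kl_div(soft_student, soft_teacher, reduction="batchmean") * temperature**2
    # Ordinary supervised loss on the ground-truth labels.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * distill + (1.0 - alpha) * hard


# Random logits stand in for the outputs of a real teacher and student.
torch.manual_seed(0)
student_logits = torch.randn(8, 10, requires_grad=True)
teacher_logits = torch.randn(8, 10)
labels = torch.randint(0, 10, (8,))

loss = vanilla_kd_loss(student_logits, teacher_logits, labels)
loss.backward()  # gradients flow only into the student logits
```

Setting `alpha=1.0` trains on the teacher's soft targets alone, while `alpha=0.0` reduces to plain supervised training.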

To train a cohort of models (two in this example) in an online fashion using the framework in [Deep Mutual Learning](https://arxiv.org/abs/1706.00384) and log training details to TensorBoard:

```python
import torch
import torch.optim as optim
from torchvision import datasets, transforms
from KD_Lib.KD import DML
from KD_Lib.models import ResNet18, ResNet50  # To use models packaged in KD_Lib

# Define your datasets, dataloaders, models and optimizers

train_loader = torch.utils.data.DataLoader(
    datasets.MNIST(
        "mnist_data",
        train=True,
        download=True,
        transform=transforms.Compose(
            [transforms.ToTensor(), transforms.Normalize((0.1307,), (0.3081,))]
        ),
    ),
    batch_size=32,
    shuffle=True,
)

test_loader = torch.utils.data.DataLoader(
    datasets.MNIST(
        "mnist_data",
        train=False,
        transform=transforms.Compose(
            [transforms.ToTensor(), transforms.Normalize((0.1307,), (0.3081,))]
        ),
    ),
    batch_size=32,
    shuffle=True,
)

student_params = [4, 4, 4, 4, 4]
student_model_1 = ResNet50(student_params, 1, 10)
student_model_2 = ResNet18(student_params, 1, 10)

student_cohort = [student_model_1, student_model_2]

student_optimizer_1 = optim.SGD(student_model_1.parameters(), 0.01)
student_optimizer_2 = optim.SGD(student_model_2.parameters(), 0.01)

student_optimizers = [student_optimizer_1, student_optimizer_2]

# Now, this is where KD_Lib comes into the picture

distiller = DML(student_cohort, train_loader, test_loader, student_optimizers,
                log=True, logdir="./logs")

distiller.train_students(epochs=5)
distiller.evaluate()
distiller.get_parameters()
```
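The mutual-learning objective itself can be sketched as follows: each student minimizes its own cross-entropy plus the average KL divergence from its peers' predictions. This is a minimal sketch under stated assumptions, not KD_Lib's `DML` code; `dml_losses` is a hypothetical helper, and random logits replace real model outputs.

```python
import torch
import torch.nn.functional as F


def dml_losses(logits_list, labels):
    """Return one loss per student: cross-entropy plus mean KL to its peers."""
    losses = []
    for i, logits_i in enumerate(logits_list):
        loss = F.cross_entropy(logits_i, labels)
        log_p_i = F.log_softmax(logits_i, dim=1)
        peers = [z for j, z in enumerate(logits_list) if j != i]
        for logits_j in peers:
            # Peers act as fixed targets for this student's update, so detach them.
            p_j = F.softmax(logits_j.detach(), dim=1)
            loss = loss + F.kl_div(log_p_i, p_j, reduction="batchmean") / len(peers)
        losses.append(loss)
    return losses


# Random logits stand in for the outputs of two students on one batch.
torch.manual_seed(0)
labels = torch.randint(0, 10, (8,))
cohort_logits = [torch.randn(8, 10, requires_grad=True) for _ in range(2)]

losses = dml_losses(cohort_logits, labels)
for loss in losses:
    loss.backward()  # each student is updated from its own loss
```

Detaching the peer predictions means every student treats the others as fixed teachers within a step, which is why one optimizer per student is passed in the example above.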

Methods Implemented

Some benchmark results can be found in the logs file.

| Paper / Method | Link | Repository (KD_Lib/) |
| --- | --- | --- |
| Distilling the Knowledge in a Neural Network | https://arxiv.org/abs/1503.02531 | KD/vision/vanilla |
| Improved Knowledge Distillation via Teacher Assistant | https://arxiv.org/abs/1902.03393 | KD/vision/TAKD |
| Relational Knowledge Distillation | https://arxiv.org/abs/1904.05068 | KD/vision/RKD |
| Distilling Knowledge from Noisy Teachers | https://arxiv.org/abs/1610.09650 | KD/vision/noisy |
| Paying More Attention To The Attention | https://arxiv.org/abs/1612.03928 | KD/vision/attention |
| Revisit Knowledge Distillation: a Teacher-free Framework | https://arxiv.org/abs/1909.11723 | KD/vision/teacher_free |
| Mean Teachers are Better Role Models | https://arxiv.org/abs/1703.01780 | KD/vision/mean_teacher |
| Knowledge Distillation via Route Constrained Optimization | https://arxiv.org/abs/1904.09149 | KD/vision/RCO |
| Born Again Neural Networks | https://arxiv.org/abs/1805.04770 | KD/vision/BANN |
| Preparing Lessons: Improve Knowledge Distillation with Better Supervision | https://arxiv.org/abs/1911.07471 | KD/vision/KA |
| Improving Generalization Robustness with Noisy Collaboration in Knowledge Distillation | https://arxiv.org/abs/1910.05057 | KD/vision/noisy |
| Distilling Task-Specific Knowledge from BERT into Simple Neural Networks | https://arxiv.org/abs/1903.12136 | KD/text/BERT2LSTM |
| Deep Mutual Learning | https://arxiv.org/abs/1706.00384 | KD/vision/DML |
| The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks | https://arxiv.org/abs/1803.03635 | Pruning/lottery_tickets |
| Regularizing Class-wise Predictions via Self-knowledge Distillation | https://arxiv.org/abs/2003.13964 | KD/vision/CSDK |


Please cite our pre-print if you find KD-Lib useful in any way :)

```bibtex
@misc{shah2020kdlib,
  title={KD-Lib: A PyTorch library for Knowledge Distillation, Pruning and Quantization},
  author={Het Shah and Avishree Khare and Neelay Shah and Khizir Siddiqui},
  year={2020},
  eprint={2011.14691},
  archivePrefix={arXiv},
  primaryClass={cs.LG}
}
```

Issues

Bump torch from 1.5.0 to 1.13.1

opened on 2023-03-01 21:03:57 by dependabot[bot]

Bumps torch from 1.5.0 to 1.13.1.

Release notes

Sourced from torch's releases.

PyTorch 1.13.1 Release, small bug fix release

This release is meant to fix the following issues (regressions / silent correctness):

  • RuntimeError by torch.nn.modules.activation.MultiheadAttention with bias=False and batch_first=True #88669
  • Installation via pip on Amazon Linux 2, regression #88869
  • Installation using poetry on Mac M1, failure #88049
  • Missing masked tensor documentation #89734
  • torch.jit.annotations.parse_type_line is not safe (command injection) #88868
  • Use the Python frame safely in _pythonCallstack #88993
  • Double-backward with full_backward_hook causes RuntimeError #88312
  • Fix logical error in get_default_qat_qconfig #88876
  • Fix cuda/cpu check on NoneType and unit test #88854 and #88970
  • Onnx ATen Fallback for BUILD_CAFFE2=0 for ONNX-only ops #88504
  • Onnx operator_export_type on the new registry #87735
  • torchrun AttributeError caused by file_based_local_timer on Windows #85427

The release tracker should contain all relevant pull requests related to this release as well as links to related issues

PyTorch 1.13: beta versions of functorch and improved support for Apple’s new M1 chips are now available

PyTorch 1.13 Release Notes

  • Highlights
  • Backwards Incompatible Changes
  • New Features
  • Improvements
  • Performance
  • Documentation
  • Developers

Highlights

We are excited to announce the release of PyTorch 1.13! This includes stable versions of BetterTransformer. We deprecated CUDA 10.2 and 11.3 and completed migration of CUDA 11.6 and 11.7. Beta includes improved support for Apple M1 chips and functorch, a library that offers composable vmap (vectorization) and autodiff transforms, being included in-tree with the PyTorch release. This release is composed of over 3,749 commits and 467 contributors since 1.12.1. We want to sincerely thank our dedicated community for your contributions.

Summary:

  • The BetterTransformer feature set supports fastpath execution for common Transformer models during Inference out-of-the-box, without the need to modify the model. Additional improvements include accelerated add+matmul linear algebra kernels for sizes commonly used in Transformer models and Nested Tensors is now enabled by default.

  • Timely deprecating older CUDA versions allows us to proceed with introducing the latest CUDA version as they are introduced by Nvidia®, and hence allows support for C++17 in PyTorch and new NVIDIA Open GPU Kernel Modules.

  • Previously, functorch was released out-of-tree in a separate package. After installing PyTorch, a user will be able to import functorch and use functorch without needing to install another package.

  • PyTorch is offering native builds for Apple® silicon machines that use Apple's new M1 chip as a beta feature, providing improved support across PyTorch's APIs.

| Stable | Beta | Prototype |
| --- | --- | --- |
| Better Transformer | Enable Intel® VTune™ Profiler's Instrumentation and Tracing Technology APIs | Arm® Compute Library backend support for AWS Graviton |
| CUDA 10.2 and 11.3 CI/CD Deprecation | Extend NNC to support channels last and bf16 | CUDA Sanitizer |
| | Functorch now in PyTorch Core Library | |
| | Beta Support for M1 devices | |

You can check the blogpost that shows the new features here.

Backwards Incompatible changes

... (truncated)

Changelog

Sourced from torch's changelog.

Releasing PyTorch

Release Compatibility Matrix

Following is the Release Compatibility Matrix for PyTorch releases:

| PyTorch version | Python | Stable CUDA | Experimental CUDA |
| --- | --- | --- | --- |
| 2.0 | >=3.8, <=3.11 | CUDA 11.7, CUDNN 8.5.0.96 | CUDA 11.8, CUDNN 8.7.0.84 |
| 1.13 | >=3.7, <=3.10 | CUDA 11.6, CUDNN 8.3.2.44 | CUDA 11.7, CUDNN 8.5.0.96 |
| 1.12 | >=3.7, <=3.10 | CUDA 11.3, CUDNN 8.3.2.44 | CUDA 11.6, CUDNN 8.3.2.44 |

General Overview

Releasing a new version of PyTorch generally entails 3 major steps:

... (truncated)


Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


Dependabot commands and options
You can trigger Dependabot actions by commenting on this PR:

- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
- `@dependabot ignore this major version` will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
- `@dependabot use these labels` will set the current labels as the default for future PRs for this repo and language
- `@dependabot use these reviewers` will set the current reviewers as the default for future PRs for this repo and language
- `@dependabot use these assignees` will set the current assignees as the default for future PRs for this repo and language
- `@dependabot use this milestone` will set the current milestone as the default for future PRs for this repo and language

You can disable automated security fix PRs for this repo from the [Security Alerts page](https://github.com/SforAiDl/KD_Lib/network/alerts).

Bump torch from 1.5.0 to 1.13.1 in /docs

opened on 2023-03-01 21:03:52 by dependabot[bot]

Bumps torch from 1.5.0 to 1.13.1.

Relational KD

opened on 2023-02-15 12:46:47 by GeryAdhane

Does RKD work as it is, or do we have to modify the code? I would love to get some help on how we can modify this library.

Thank you.

Bump numpy from 1.21.0 to 1.22.0

opened on 2022-06-22 04:10:13 by dependabot[bot]

Bumps numpy from 1.21.0 to 1.22.0.

Release notes

Sourced from numpy's releases.

v1.22.0

NumPy 1.22.0 Release Notes

NumPy 1.22.0 is a big release featuring the work of 153 contributors spread over 609 pull requests. There have been many improvements, highlights are:

  • Annotations of the main namespace are essentially complete. Upstream is a moving target, so there will likely be further improvements, but the major work is done. This is probably the most user visible enhancement in this release.
  • A preliminary version of the proposed Array-API is provided. This is a step in creating a standard collection of functions that can be used across application such as CuPy and JAX.
  • NumPy now has a DLPack backend. DLPack provides a common interchange format for array (tensor) data.
  • New methods for quantile, percentile, and related functions. The new methods provide a complete set of the methods commonly found in the literature.
  • A new configurable allocator for use by downstream projects.

These are in addition to the ongoing work to provide SIMD support for commonly used functions, improvements to F2PY, and better documentation.
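As a rough illustration of the new quantile methods mentioned in the list above, the `method` keyword of `np.quantile` selects how a quantile that falls between two samples is resolved. This sketch assumes NumPy 1.22 or later; the sample data is made up for the example.

```python
import numpy as np

# Four samples, so the median falls between 2.0 and 3.0.
data = np.array([1.0, 2.0, 3.0, 4.0])

median_linear = np.quantile(data, 0.5, method="linear")  # interpolate between neighbours
median_lower = np.quantile(data, 0.5, method="lower")    # take the lower neighbour
median_higher = np.quantile(data, 0.5, method="higher")  # take the higher neighbour

print(median_linear, median_lower, median_higher)  # 2.5 2.0 3.0
```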

The Python versions supported in this release are 3.8-3.10, Python 3.7 has been dropped. Note that 32 bit wheels are only provided for Python 3.8 and 3.9 on Windows, all other wheels are 64 bits on account of Ubuntu, Fedora, and other Linux distributions dropping 32 bit support. All 64 bit wheels are also linked with 64 bit integer OpenBLAS, which should fix the occasional problems encountered by folks using truly huge arrays.

Expired deprecations

Deprecated numeric style dtype strings have been removed

Using the strings "Bytes0", "Datetime64", "Str0", "Uint32", and "Uint64" as a dtype will now raise a TypeError.

(gh-19539)

Expired deprecations for loads, ndfromtxt, and mafromtxt in npyio

numpy.loads was deprecated in v1.15, with the recommendation that users use pickle.loads instead. ndfromtxt and mafromtxt were both deprecated in v1.17 - users should use numpy.genfromtxt instead with the appropriate value for the usemask parameter.

(gh-19615)
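The replacements recommended above might look like the following in practice. This is a small sketch, assuming an in-memory CSV invented for the example: `pickle.loads` stands in for the removed `numpy.loads`, and `numpy.genfromtxt` with `usemask=True` covers what `mafromtxt` did (`usemask=False`, the default, corresponds to `ndfromtxt`).

```python
import io
import pickle

import numpy as np

# numpy.loads has been removed; round-trip arrays with pickle directly instead.
original = np.arange(6).reshape(2, 3)
restored = pickle.loads(pickle.dumps(original))

# genfromtxt with usemask=True returns a masked array, as mafromtxt used to.
text = io.StringIO("1,2,3\n4,,6")  # hypothetical CSV with one missing field
masked = np.genfromtxt(text, delimiter=",", usemask=True)

print(masked.mask)  # True only where the field was missing
```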

... (truncated)

Test BERT2LSTM with mock data

opened on 2022-05-22 18:37:28 by NeelayS

Consider potential name change to 'kdlib'

opened on 2022-05-14 11:13:46 by NeelayS

Releases

v0.0.32 2022-05-18 08:31:17

v0.0.31 2022-05-15 19:41:23

2022-03-23 09:32:54

Society for Artificial Intelligence and Deep Learning

knowledge-distillation model-compression pruning quantization pytorch deep-learning-library machine-learning data-science benchmarking algorithm-implementations