NXTProduct, updated 🕥 2022-06-22 04:43:33

TUNet - Official Implementation

TUNet: A Block-online Bandwidth Extension Model based on Transformers and Self-supervised Pretraining - ICASSP 2022

License and citation

This code is available for academic research only. If you use our software, please cite as below. For commercial applications, please contact [email protected].

Copyright © 2021 FPT Software, Inc. All rights reserved.

@inproceedings{Nguyen_2022,
  doi = {10.1109/icassp43922.2022.9747699},
  url = {https://doi.org/10.1109%2Ficassp43922.2022.9747699},
  year = 2022,
  month = {may},
  publisher = {{IEEE}},
  author = {Viet-Anh Nguyen and Anh H. T. Nguyen and Andy W. H. Khong},
  title = {{TUNet}: A Block-Online Bandwidth Extension Model Based On Transformers And Self-Supervised Pretraining},
  booktitle = {{ICASSP} 2022 - 2022 {IEEE} International Conference on Acoustics, Speech and Signal Processing ({ICASSP})}
}

1. Results

Our model outperforms the baselines. Here, we report the predicted mean opinion score (MOS) obtained with Microsoft's DNSMOS Azure service: TUNet scores 3.3896, against 3.2760 for the strongest baseline, NU-Wave. Please refer to our paper for more benchmarks.

| Model | DNSMOS |
| -------- | -------- |
| Input | 3.0951 |
| TFiLM-UNet | 3.1026 |
| WSRGlow | 3.2053 |
| NU-Wave | 3.2760 |
| TUNet | 3.3896 |

We also provide several audio samples in audio_samples for comparison. The spectrogram visualizations show that the high frequencies generated by our model are more accurate than those generated by the baselines.
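
To make the size of the gain concrete, the short sketch below recomputes the DNSMOS differences from the table. The scores are copied from the table above; the helper name `gain_over` is hypothetical and not part of the repository.

```python
# DNSMOS scores copied from the results table above.
DNSMOS = {
    "Input": 3.0951,
    "TFiLM-UNet": 3.1026,
    "WSRGlow": 3.2053,
    "NU-Wave": 3.2760,
    "TUNet": 3.3896,
}


def gain_over(reference, scores=DNSMOS):
    """Return each model's DNSMOS minus the reference entry, rounded to 4 decimals."""
    base = scores[reference]
    return {
        name: round(value - base, 4)
        for name, value in scores.items()
        if name != reference
    }


if __name__ == "__main__":
    # Gain of every model over the unprocessed (band-limited) input.
    for name, delta in gain_over("Input").items():
        print(f"{name:<12} {delta:+.4f}")
```

Calling `gain_over("NU-Wave")` instead gives the margin of each model over the strongest baseline.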

2. Installation

Setup

Clone the repo

$ git clone https://github.com/NXTProduct/TUNet.git
$ cd TUNet

Install dependencies

  • Our implementation requires the libsndfile and libsamplerate libraries for the Python packages soundfile and samplerate, respectively. On Ubuntu, they can be installed using apt-get:
    $ apt-get update && apt-get install libsndfile-dev libsamplerate-dev
  • Create a Python 3.8 environment. Conda is recommended:
    $ conda create -n tunet python=3.8
    $ conda activate tunet
  • Install the requirements:
    $ pip install -r requirements.txt -f https://download.pytorch.org/whl/cu113/torch_stable.html

Note: the argument -f https://download.pytorch.org/whl/cu113/torch_stable.html is provided so that pip can install torch==1.10.0+cu113 (PyTorch 1.10, CUDA 11.3), which is pinned in requirements.txt. Choose a CUDA version appropriate for your GPUs and change or remove the argument according to the PyTorch documentation.
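
After installation, a quick sanity check can confirm that the packages named above are visible in the active environment. The following is a minimal sketch, not part of the repository: the helper name `missing_packages` is hypothetical, and the check only looks the packages up without loading the native libsndfile or libsamplerate libraries, so a passing check does not guarantee that those libraries load correctly.

```python
import importlib.util

# Import names of the packages mentioned in the installation steps.
REQUIRED = ("torch", "soundfile", "samplerate")


def missing_packages(names=REQUIRED):
    """Return the names that cannot be found in the active environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        import torch  # only imported once we know it is installed

        print("torch", torch.__version__)
        print("CUDA available:", torch.cuda.is_available())
```

If `torch.cuda.is_available()` reports `False` on a GPU machine, the CUDA build selected by the -f argument is the first thing to revisit.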

3. Data preparation

In our paper, we conduct experiments on the VCTK and VIVOS datasets. You may use either one or both.

  • Download and extract the datasets:
    $ wget http://www.udialogue.org/download/VCTK-Corpus.tar.gz -O data/vctk/VCTK-Corpus.tar.gz
    $ wget https://ailab.hcmus.edu.vn/assets/vivos.tar.gz -O data/vivos/vivos.tar.gz
    $ tar -zxvf data/vctk/VCTK-Corpus.tar.gz -C data/vctk/ --strip-components=1
    $ tar -zxvf data/vivos/vivos.tar.gz -C data/vivos/ --strip-components=1

After extracting the datasets, your ./data directory should look like this:

```
.
|--data
    |--vctk
        |--wav48
            |--p225
                |--p225_001.wav
                ...
        |--train.txt   
        |--test.txt
    |--vivos
        |--train
            |--waves
                |--VIVOSSPK01
                    |--VIVOSSPK01_R001.wav
                    ...                
        |--test
            |--waves
                |--VIVOSDEV01
                    |--VIVOSDEV01_R001.wav
                    ...      
        |--train.txt   
        |--test.txt
```
  • In order to load the datasets, text files that contain training and testing audio paths are required. We have prepared train.txt and test.txt files in ./data/vctk and ./data/vivos directories.
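
Once the archives are extracted, it can be useful to confirm that the provided lists match the files on disk. The sketch below is one way to do that, under two assumptions that should be checked against config.py: each line of train.txt and test.txt is a path relative to the dataset root, and the roots are ./data/vctk and ./data/vivos. The helper name `missing_audio` is hypothetical.

```python
from pathlib import Path


def missing_audio(root, list_file):
    """Return the entries of list_file that do not resolve to a file under root."""
    root = Path(root)
    entries = [line.strip() for line in Path(list_file).read_text().splitlines()]
    # Blank lines are skipped; everything else must exist as root/entry.
    return [entry for entry in entries if entry and not (root / entry).is_file()]


if __name__ == "__main__":
    # Assumed dataset roots; adjust them to match your configuration.
    for name in ("vctk", "vivos"):
        root = Path("data") / name
        for split in ("train.txt", "test.txt"):
            list_file = root / split
            if list_file.is_file():
                print(list_file, "missing:", len(missing_audio(root, list_file)))
```

A non-zero count usually points to an extraction problem, for example a different --strip-components value than the one shown above.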

4. Run the code

Configuration

config.py is the most important file. Here, you can find all the configurations related to experiment setups, datasets, models, training, testing, etc. Although the config file has been explained thoroughly, we recommend reading our paper to fully understand each parameter.

Training

  • Adjust training hyperparameters in config.py

Note: batch_size in this implementation differs from the batch size in the paper. In the paper, "batch size" refers to the number of frames per batch, whereas in this repo, batch_size is the number of audio files per batch. The DataLoader loads batches of audio files and then chunks them into frames on the fly. Since audio duration is variable, the number of frames per batch varies around 12*batch_size.

  • Run main.py:
    $ python main.py --mode train
  • Each run creates a version in ./lightning_logs, where the model checkpoint and hyperparameters are saved. To continue training from one of these versions, set the --version argument of the above command to the desired version number. For example:
    # resume from version 5
    $ python main.py --mode train --version 5
  • To monitor the training curves and inspect visualizations of the model output, run TensorBoard:
    $ tensorboard --logdir=./lightning_logs --bind_all
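
The relation between files per batch and frames per batch described in the note on batch_size can be illustrated with a small sketch. It is not the repository's DataLoader: the window of 8192 samples, the stride of 4096 samples, the padding of the last frame, and the example file lengths are all assumptions made for illustration. Only the factor of roughly 12 frames per file comes from the note.

```python
import math


def num_frames(num_samples, window, stride):
    """Number of sliding windows needed to cover a signal, padding the last one."""
    if num_samples <= window:
        return 1
    return 1 + math.ceil((num_samples - window) / stride)


def frames_in_batch(lengths, window=8192, stride=4096):
    """Total frame count for a batch of audio files given their lengths in samples."""
    return sum(num_frames(n, window, stride) for n in lengths)


def files_for_frame_budget(target_frames, frames_per_file=12):
    """Approximate batch_size (in files) that yields about target_frames frames."""
    return max(1, round(target_frames / frames_per_file))


if __name__ == "__main__":
    # Hypothetical file lengths in samples; real batches vary with audio duration.
    batch = [40000, 52000, 61000]
    print("frames in this batch:", frames_in_batch(batch))
    print("files for about 96 frames:", files_for_frame_budget(96))
```

Because file lengths differ, two batches with the same batch_size generally contain different numbers of frames, which is why the note gives an approximate factor rather than an exact one.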

Evaluation

  • Modify config.py to change evaluation setup if necessary.
  • Run main.py with the version number to be evaluated:
    $ python main.py --mode eval --version 5
    This reports the mean and standard deviation of LSD, LSD-HF, and SI-SDR, respectively. During evaluation, several output samples are saved to CONFIG.LOG.sample_path for sanity checking.
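
For readers unfamiliar with SI-SDR, the sketch below shows the commonly used definition together with the mean and standard deviation aggregation mentioned above. It is a plain-Python illustration, not the repository's evaluation code: the repository may differ in details such as mean removal, the epsilon value, or the use of sample versus population standard deviation (population is assumed here). The function names are hypothetical.

```python
import math
import statistics


def si_sdr(reference, estimate, eps=1e-8):
    """Scale-invariant signal-to-distortion ratio in dB for two equal-length signals."""
    if len(reference) != len(estimate):
        raise ValueError("reference and estimate must have the same length")
    # Project the estimate onto the reference to obtain the scaled target.
    dot = sum(r * e for r, e in zip(reference, estimate))
    ref_energy = sum(r * r for r in reference)
    scale = dot / (ref_energy + eps)
    target = [scale * r for r in reference]
    # Everything in the estimate that is not the scaled target counts as distortion.
    noise = [e - t for e, t in zip(estimate, target)]
    target_energy = sum(t * t for t in target)
    noise_energy = sum(n * n for n in noise)
    return 10.0 * math.log10((target_energy + eps) / (noise_energy + eps))


def summarize(values):
    """Mean and (population) standard deviation of per-file scores."""
    return statistics.mean(values), statistics.pstdev(values)


if __name__ == "__main__":
    reference = [1.0, 0.0, -1.0, 0.0]
    estimate = [1.0, 0.5, -1.0, 0.0]
    scores = [si_sdr(reference, estimate), si_sdr(reference, [3 * x for x in estimate])]
    print("per-file SI-SDR (dB):", scores)
    print("mean, std:", summarize(scores))
```

Rescaling the estimate leaves the score essentially unchanged, which is the property that distinguishes SI-SDR from plain SDR.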

Configure a new dataset

Our implementation currently works with the VCTK and VIVOS datasets but can easily be extended to a new one.

  • Firstly, you need to prepare train.txt and test.txt. See ./data/vivos/train.txt and ./data/vivos/test.txt for example.
  • Secondly, add a new dictionary to CONFIG.DATA.data_dir:
    {
        'root': 'path/to/data/directory',
        'train': 'path/to/train.txt',
        'test': 'path/to/test.txt'
    }
    Important: Make sure that each line in train.txt and test.txt, when joined with 'root', is a valid path to its corresponding audio file.
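
For a new dataset, the two list files can be generated rather than written by hand. The sketch below is one possible approach and not part of the repository: it assumes the audio files are .wav files somewhere under the dataset root, writes one root-relative path per line, and uses a random split by file. The function name `write_file_lists`, the 10% test fraction, and the seed are hypothetical choices. A split by file is not speaker-disjoint, so a speaker-level split may be preferable for some experiments.

```python
import random
from pathlib import Path


def write_file_lists(root, test_fraction=0.1, seed=0, pattern="*.wav"):
    """Write train.txt and test.txt under root, one root-relative path per line."""
    root = Path(root)
    # Paths are stored relative to root so that root joined with a line is a valid file.
    files = sorted(p.relative_to(root).as_posix() for p in root.rglob(pattern))
    random.Random(seed).shuffle(files)
    n_test = max(1, int(len(files) * test_fraction)) if files else 0
    splits = {
        "test.txt": sorted(files[:n_test]),
        "train.txt": sorted(files[n_test:]),
    }
    for name, entries in splits.items():
        text = "\n".join(entries) + ("\n" if entries else "")
        (root / name).write_text(text)
    return splits


if __name__ == "__main__":
    # Placeholder path taken from the dictionary example above.
    result = write_file_lists("path/to/data/directory")
    print({name: len(entries) for name, entries in result.items()})
```

The paths of the two generated files then go into the 'train' and 'test' keys of the new dictionary, with the dataset directory as 'root'.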

5. Audio generation

  • To generate output audio, either put your input samples into ./test_samples or set CONFIG.TEST.in_dir to your input directory.
  • Run main.py:
    $ python main.py --mode test --version 5
    The generated audio files are saved to CONFIG.TEST.out_dir.

Note: checkpoint version_5 has only been trained for a few epochs for demonstration purposes. Since the code has been refactored, the checkpoint we used in the paper can no longer be loaded. To run inference with our best checkpoint, please use the ONNX model instead.

ONNX inferencing

We provide ONNX inference scripts and the best ONNX model (converted from the best checkpoint) at lightning_logs/best_model.onnx.

  • Convert a checkpoint to an ONNX model:
    $ python main.py --mode onnx --version 5
    The converted ONNX model will be saved to lightning_logs/version_5/checkpoints.
  • Put test audio files in test_samples and run inference with the converted ONNX model (see inference_onnx.py for more details):
    $ python inference_onnx.py
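
Since the name of the converted file is not stated here, a small helper can list whatever .onnx files exist in the locations mentioned above. This is a convenience sketch only: the function name `find_onnx_models` is hypothetical, and it relies solely on the two paths given in this section (lightning_logs for the provided model and lightning_logs/version_N/checkpoints for converted ones).

```python
from pathlib import Path


def find_onnx_models(log_dir="lightning_logs", version=None):
    """List .onnx files in log_dir, or in version_<n>/checkpoints when a version is given."""
    base = Path(log_dir)
    if version is not None:
        base = base / f"version_{version}" / "checkpoints"
    # A missing directory simply yields an empty list.
    return sorted(base.glob("*.onnx")) if base.is_dir() else []


if __name__ == "__main__":
    print("provided:", [p.name for p in find_onnx_models()])
    print("version 5:", [p.name for p in find_onnx_models(version=5)])
```

How the selected file is passed to inference_onnx.py depends on that script, so its contents should be checked before relying on this helper.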

6. Acknowledgement

We thank FPT Software for funding and providing GPU infrastructure. We also thank Microsoft for giving access to the DNSMOS Azure service.

Issues

Bump numpy from 1.20.3 to 1.22.0

opened on 2022-06-22 04:43:33 by dependabot[bot]

Bumps numpy from 1.20.3 to 1.22.0.

Release notes

Sourced from numpy's releases.

v1.22.0

NumPy 1.22.0 Release Notes

NumPy 1.22.0 is a big release featuring the work of 153 contributors spread over 609 pull requests. There have been many improvements, highlights are:

  • Annotations of the main namespace are essentially complete. Upstream is a moving target, so there will likely be further improvements, but the major work is done. This is probably the most user visible enhancement in this release.
  • A preliminary version of the proposed Array-API is provided. This is a step in creating a standard collection of functions that can be used across applications such as CuPy and JAX.
  • NumPy now has a DLPack backend. DLPack provides a common interchange format for array (tensor) data.
  • New methods for quantile, percentile, and related functions. The new methods provide a complete set of the methods commonly found in the literature.
  • A new configurable allocator for use by downstream projects.

These are in addition to the ongoing work to provide SIMD support for commonly used functions, improvements to F2PY, and better documentation.

The Python versions supported in this release are 3.8-3.10, Python 3.7 has been dropped. Note that 32 bit wheels are only provided for Python 3.8 and 3.9 on Windows, all other wheels are 64 bits on account of Ubuntu, Fedora, and other Linux distributions dropping 32 bit support. All 64 bit wheels are also linked with 64 bit integer OpenBLAS, which should fix the occasional problems encountered by folks using truly huge arrays.

Expired deprecations

Deprecated numeric style dtype strings have been removed

Using the strings "Bytes0", "Datetime64", "Str0", "Uint32", and "Uint64" as a dtype will now raise a TypeError.

(gh-19539)

Expired deprecations for loads, ndfromtxt, and mafromtxt in npyio

numpy.loads was deprecated in v1.15, with the recommendation that users use pickle.loads instead. ndfromtxt and mafromtxt were both deprecated in v1.17 - users should use numpy.genfromtxt instead with the appropriate value for the usemask parameter.

(gh-19615)

... (truncated)
