kapre: Keras Audio Preprocessors

keunwoochoi, updated 🕥 2022-07-04 00:10:02


Keras Audio Preprocessors - compute STFT, ISTFT, Melspectrogram, and others on GPU in real time.

Tested on Python 3.6 and 3.7

Why Kapre?

vs. Pre-computation

  • You can optimize DSP parameters
  • Your model deployment becomes much simpler and more consistent.
  • Your code and model have fewer dependencies

vs. Your own implementation

  • Quick and easy!
  • Consistent with 1D/2D tensorflow batch shapes
  • Data format agnostic (channels_first and channels_last)
  • Less error prone - Kapre layers are tested against Librosa (stft, decibel, etc) - which is (trust me) trickier than you think.
  • Kapre layers have some extended APIs from the default tf.signal implementation, such as:
      • A perfectly invertible STFT and InverseSTFT pair
      • Mel-spectrogram with more options
  • Reproducibility - Kapre is available on pip with versioning

Workflow with Kapre

  1. Preprocess your audio dataset. Resample the audio to the right sampling rate and store the audio signals (waveforms).
  2. In your ML model, add a Kapre layer, e.g. kapre.time_frequency.STFT(), as the first layer of the model.
  3. The data loader simply loads audio signals and feeds them into the model.
  4. In your hyperparameter search, include DSP parameters like n_fft to boost the performance.
  5. When deploying the final model, all you need to remember is the sampling rate of the signal. No dependency or preprocessing!


```sh
pip install kapre
```

API Documentation

Please refer to Kapre API Documentation at https://kapre.readthedocs.io

One-shot example

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, BatchNormalization, ReLU, GlobalAveragePooling2D, Dense, Softmax
from kapre import STFT, Magnitude, MagnitudeToDecibel
from kapre.composed import get_melspectrogram_layer, get_log_frequency_spectrogram_layer

# 6 channels (!), maybe 1-sec audio signal, for an example.
input_shape = (44100, 6)
sr = 44100
model = Sequential()

# A STFT layer
model.add(STFT(n_fft=2048, win_length=2018, hop_length=1024,
               window_name=None, pad_end=False,
               input_data_format='channels_last', output_data_format='channels_last',
               input_shape=input_shape))
model.add(Magnitude())
model.add(MagnitudeToDecibel())  # these three layers can be replaced with get_stft_magnitude_layer()

# Alternatively, you may want to use a melspectrogram layer
# melgram_layer = get_melspectrogram_layer()
# or log-frequency layer
# log_stft_layer = get_log_frequency_spectrogram_layer()

# add more layers as you want
model.add(Conv2D(32, (3, 3), strides=(2, 2)))
model.add(BatchNormalization())
model.add(ReLU())
model.add(GlobalAveragePooling2D())
model.add(Dense(10))
model.add(Softmax())

# Compile the model
model.compile('adam', 'categorical_crossentropy')  # if single-label classification

# train it with raw audio sample inputs
# for example, you may have functions that load your data as below.
x = load_x()  # e.g., x.shape = (10000, 44100, 6), channels last to match input_shape
y = load_y()  # e.g., y.shape = (10000, 10) if it's 10-class classification

model.fit(x, y)
```

Tflite compatibility

The STFT layer is not tflite compatible (due to tf.signal.stft). To create a tflite compatible model, first train using the normal kapre layers, then create a new model replacing STFT and Magnitude with STFTTflite and MagnitudeTflite. Tflite compatible layers are restricted to a batch size of 1, which prevents their use during training.

```python
# assumes you have run the one-shot example above.
from kapre import STFTTflite, MagnitudeTflite
model_tflite = Sequential()

model_tflite.add(STFTTflite(n_fft=2048, win_length=2018, hop_length=1024,
                            window_name=None, pad_end=False,
                            input_data_format='channels_last', output_data_format='channels_last',
                            input_shape=input_shape))
model_tflite.add(MagnitudeTflite())
model_tflite.add(MagnitudeToDecibel())
model_tflite.add(Conv2D(32, (3, 3), strides=(2, 2)))
model_tflite.add(BatchNormalization())
model_tflite.add(ReLU())
model_tflite.add(GlobalAveragePooling2D())
model_tflite.add(Dense(10))
model_tflite.add(Softmax())

# load the trained weights into the tflite compatible model.
model_tflite.set_weights(model.get_weights())
```


Please cite this paper if you use Kapre for your work.

@inproceedings{choi2017kapre,
  title={Kapre: On-GPU Audio Preprocessing Layers for a Quick Implementation of Deep Neural Network Models with Keras},
  author={Choi, Keunwoo and Joo, Deokjin and Kim, Juho},
  booktitle={Machine Learning for Music Discovery Workshop at 34th International Conference on Machine Learning},
  year={2017},
  organization={ICML}
}


Problem incorporating SpecAugment in the training process

opened on 2022-10-26 22:19:44 by nnbuainain


I'm trying to add a SpecAug layer in the training process of a CNN using the code below:




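```Python
from tensorflow.keras.models import Sequential
from kapre.composed import get_melspectrogram_layer
from kapre.augmentation import SpecAugment

# INPUT_SHAPE is defined elsewhere (not shown in this post).
melgram = get_melspectrogram_layer(input_shape = INPUT_SHAPE,
                                   n_fft = 2048,
                                   hop_length = 512,
                                   return_decibel=True,
                                   n_mels = 40,
                                   mel_f_min = 500,
                                   mel_f_max = 15000,
                                   input_data_format='channels_last',
                                   output_data_format='channels_last')

spec_augment = SpecAugment(freq_mask_param=5,
                           time_mask_param=10,
                           n_freq_masks=2,
                           n_time_masks=3,
                           mask_value=-100)

model = Sequential()
model.add(melgram)
model.add(spec_augment)
```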

The CNN summary looks like this:

```Python
Model: "sequential_2"

 Layer (type)                  Output Shape          Param #

 melspectrogram (Sequential)   (None, 397, 40, 1)    0

 spec_augment_1 (SpecAugment)  (None, 397, 40, 1)    0

=================================================================
Total params: 0
Trainable params: 0
Non-trainable params: 0
```

Compiling and fitting the model:

```Python
from tensorflow.keras.callbacks import EarlyStopping, ReduceLROnPlateau, ModelCheckpoint

model.compile(loss = 'sparse_categorical_crossentropy',
              optimizer='adam',
              metrics = 'accuracy')

early_stop = EarlyStopping(monitor='loss', patience=5)

reduce_LR = ReduceLROnPlateau(monitor="val_loss", factor=0.1, patience=4)

checkpointer = ModelCheckpoint(filepath = 'saved_models/bird_song_classification.hdf5')

model.fit(X_train, y_train,
          validation_data = (X_val, y_val),
          epochs = 50,
          batch_size = 32,
          callbacks = [early_stop, checkpointer, reduce_LR])
```

Then I get the following error:

```Python
Epoch 1/50

TypeError                                 Traceback (most recent call last)
      7 checkpointer = ModelCheckpoint(filepath = 'saved_models/bird_song_classification.hdf5')
      8
----> 9 model.fit(X_train, y_train, validation_data = (X_val, y_val), epochs = 50, batch_size = 32, callbacks = [early_stop, checkpointer, reduce_LR])

6 frames
/usr/local/lib/python3.7/dist-packages/kapre/augmentation.py in tf___apply_masks_to_axis(self, x, axis, mask_param, n_masks)
     78 try:
     79     do_return = True
---> 80     retval_ = ag__.converted_call(ag__.ld(tf).where, (ag__.ld(mask), ag__.ld(self).mask_value, ag__.ld(x)), None, fscope)
     81 except:
     82     do_return = False

TypeError: in user code:

File "/usr/local/lib/python3.7/dist-packages/keras/engine/training.py", line 1051, in train_function  *
    return step_function(self, iterator)
File "/usr/local/lib/python3.7/dist-packages/keras/engine/training.py", line 1040, in step_function  **
    outputs = model.distribute_strategy.run(run_step, args=(data,))
File "/usr/local/lib/python3.7/dist-packages/keras/engine/training.py", line 1030, in run_step  **
    outputs = model.train_step(data)
File "/usr/local/lib/python3.7/dist-packages/keras/engine/training.py", line 889, in train_step
    y_pred = self(x, training=True)
File "/usr/local/lib/python3.7/dist-packages/keras/utils/traceback_utils.py", line 67, in error_handler
    raise e.with_traceback(filtered_tb) from None
File "/tmp/__autograph_generated_filepzvfxhgz.py", line 63, in tf__call
    ag__.if_stmt((ag__.ld(training) in (None, False)), if_body_2, else_body_2, get_state_2, set_state_2, ('do_return', 'retval_'), 2)
File "/tmp/__autograph_generated_filepzvfxhgz.py", line 58, in else_body_2
    retval_ = ag__.converted_call(ag__.ld(tf).map_fn, (), dict(elems=ag__.ld(x), fn=ag__.ld(self)._apply_spec_augment, dtype=ag__.ld(tf).float32, fn_output_signature=ag__.ld(tf).float32), fscope)
File "/tmp/__autograph_generated_filef27o6c1f.py", line 44, in tf___apply_spec_augment
    ag__.if_stmt((ag__.ld(self).n_time_masks >= 1), if_body_1, else_body_1, get_state_1, set_state_1, ('x',), 1)
File "/tmp/__autograph_generated_filef27o6c1f.py", line 39, in if_body_1
    x = ag__.converted_call(ag__.ld(self)._apply_masks_to_axis, (ag__.ld(x),), dict(axis=ag__.ld(time_axis), mask_param=ag__.ld(self).time_mask_param, n_masks=ag__.ld(self).n_time_masks), fscope)
File "/tmp/__autograph_generated_file3vip8w4x.py", line 80, in tf___apply_masks_to_axis
    retval_ = ag__.converted_call(ag__.ld(tf).where, (ag__.ld(mask), ag__.ld(self).mask_value, ag__.ld(x)), None, fscope)

TypeError: Exception encountered when calling layer "spec_augment_1" (type SpecAugment).

in user code:

    File "/usr/local/lib/python3.7/dist-packages/kapre/augmentation.py", line 299, in call  *
        elems=x, fn=self._apply_spec_augment, dtype=tf.float32, fn_output_signature=tf.float32
    File "/usr/local/lib/python3.7/dist-packages/kapre/augmentation.py", line 273, in _apply_spec_augment  *
        x = self._apply_masks_to_axis(
    File "/usr/local/lib/python3.7/dist-packages/kapre/augmentation.py", line 254, in _apply_masks_to_axis  *
        return tf.where(mask, self.mask_value, x)

    TypeError: Input 'e' of 'SelectV2' Op has type float32 that does not match type int32 of argument 't'.

Call arguments received by layer "spec_augment_1" (type SpecAugment):
  • x=tf.Tensor(shape=(None, 397, 40, 1), dtype=float32)
  • training=True
  • kwargs=<class 'inspect._empty'>
```

The shape of X_train is

```Python
(2182, 205000, 1)
```

I'm using Tensorflow 2.9.2, and Python 3.7.15
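As a sanity check on the shapes above, the 397 time frames in the model summary are consistent with an unpadded STFT over 205000 samples. The following is a minimal sketch of that arithmetic, assuming the window length equals n_fft (2048) and no padding at the end of the signal; the helper name num_stft_frames is made up for illustration and is not part of kapre.

```python
def num_stft_frames(n_samples: int, win_length: int, hop_length: int) -> int:
    """Number of full frames when the signal is not padded at the end."""
    if n_samples < win_length:
        return 0
    # One frame at offset 0, then one more for every full hop that still fits.
    return 1 + (n_samples - win_length) // hop_length


# Values taken from the post: 205000 samples, n_fft = 2048, hop_length = 512.
frames = num_stft_frames(n_samples=205000, win_length=2048, hop_length=512)
print(frames)
```

With the values from the post this gives 397, which matches the (None, 397, 40, 1) output shape, so the input shape itself does not look like the source of the error.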

When I remove the SpecAug layer everything runs fine. I've tested using only the melspec + a MobileNet at the end and it runs smoothly. The problem is apparently related to the SpecAug layer.

Do you have any idea what could be going wrong here? I appreciate any guidance related to the problem. Best regards.
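A possible reading of the last line of the traceback: tf.where(mask, self.mask_value, x) receives a float32 spectrogram x, while mask_value=-100 is a Python integer that appears to be converted to int32. One thing that may be worth trying, untested here, is passing a float such as mask_value=-100.0. The sketch below is not kapre's implementation; apply_mask is a hypothetical helper that shows the same kind of masking with the value cast to the input dtype, which avoids the mismatch regardless of how the value was written.

```python
import tensorflow as tf


def apply_mask(x, mask, mask_value):
    """Replace masked positions of x with mask_value (hypothetical helper)."""
    # Cast so that both branches of tf.where share x's dtype,
    # even if mask_value was given as a Python int.
    value = tf.cast(mask_value, x.dtype)
    return tf.where(mask, value, x)


# A toy "spectrogram" with 4 time steps and 3 frequency bins.
x = tf.zeros((4, 3), dtype=tf.float32)
# Mask the first and last time steps; the mask broadcasts over frequency.
mask = tf.constant([[True], [False], [False], [True]])

masked = apply_mask(x, mask, -100)
print(masked.dtype, masked.numpy())
```

The output stays float32, with the masked rows filled with -100.0 and the other rows left untouched.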

Full-integer quantization and kapre layers

opened on 2022-02-15 17:25:29 by eppane

I am training a model which includes the mel-spectrogram block from get_melspectrogram_layer() right after the input layer. Training goes well, and I am able to change the specific mel-spec-layers to their TFLite-counterparts (STFTTflite, MagnitudeTflite) afterwards. I have checked also that the model performs as well as before.

The model also performs as expected when converted to .tflite using dynamic range quantization. However, when using full-integer quantization, the model loses its accuracy (see https://www.tensorflow.org/lite/performance/post_training_quantization#integer_only).

I suppose the mel-spectrogram starts to differ significantly because, in full-integer quantization, the input values are projected to a new range (int8). Is there any way to make it work with full-integer quantization?

I guess I need to separate the mel-spec-layer from the model as a preprocessing step in order to succeed with full-integer quantization, i.e., apply the input quantization to the output values of mel-spec layer. But then I would have to deploy two models to the edge device, where the input goes first to the mel-spec-block and then to the rest of the model (?).

I am using TensorFlow 2.7.0 and kapre 0.3.7.

Here is my code for testing the tflite-model:

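```
import numpy as np

# Assumes `interpreter` is an allocated TFLite interpreter, and that
# input_details / output_details / input_index / output_index were
# taken from it beforehand (not shown in this post).
preds = []

# Test and evaluate the TFLite-converted model on unseen test data
for i, sample in enumerate(X_test_full_scaled):
    X = sample

    # Quantize the float input if the model expects int8
    if input_details['dtype'] == np.int8:
        input_scale, input_zero_point = input_details["quantization"]
        X = sample / input_scale + input_zero_point

    X = X.reshape((1, 8000, 1)).astype(input_details["dtype"])

    interpreter.set_tensor(input_index, X)
    interpreter.invoke()  # run inference before reading the output tensor
    pred = interpreter.get_tensor(output_index)

    # Dequantize the output if the model returns int8
    output_scale, output_zero_point = output_details['quantization']
    if output_details['dtype'] == np.int8:
        pred = pred.astype(np.float32)
        pred = (pred - output_zero_point) * output_scale

    pred = np.argmax(pred, axis=1)[0]
    preds.append(pred)

preds = np.array(preds)
```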
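For reference, the affine mapping between floats and int8 that the test code relies on can be sketched in plain Python. The scale and zero point below are made-up values, not ones read from a real model, and the sketch rounds and clips explicitly, which the code in the post does not do. It only illustrates that float inputs land on 256 integer levels, so inputs that differ by less than one scale step collapse onto the same value.

```python
def quantize(x: float, scale: float, zero_point: int) -> int:
    """Map a float to int8: q = round(x / scale) + zero_point, clipped to [-128, 127]."""
    q = round(x / scale) + zero_point
    return max(-128, min(127, q))


def dequantize(q: int, scale: float, zero_point: int) -> float:
    """Map an int8 value back to a float: x = (q - zero_point) * scale."""
    return (q - zero_point) * scale


# Hypothetical parameters for inputs roughly in [-1, 1].
scale, zero_point = 1.0 / 128, 0

for x in (0.5, 0.001, 0.003):
    q = quantize(x, scale, zero_point)
    print(x, q, dequantize(q, scale, zero_point))
```

With these assumed parameters, the two small amplitudes both quantize to the same integer, which is the kind of information loss that could make a spectrogram computed after input quantization differ from the float one.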

Calling Magnitude() and Phase() simultaneously

opened on 2021-11-20 19:09:39 by HsuanYang-Wang


I am looking to call Magnitude() and Phase() simultaneously for the same STFT input and concatenate the magnitude and phase before feeding into the convolution layers in my CNN sequential Keras model.

Is this possible?



About kapre.utils

opened on 2021-11-14 12:57:44 by YiningWang2

Hi, when I used "from kapre.utils import Normalization2D", I got an error saying No module named 'kapre.utils'. I looked at your package and found that there is indeed no utils.py. I am wondering how to solve it.

Best wishes, Daisy

Function missing in updated version

opened on 2021-06-21 07:15:16 by v3551G

I noticed there is a function "kapre.utils.Normalization2D" in the old version, but I cannot find it in the updated version. Why? Are there any alternative functions?

trainable DSP parameters

opened on 2021-05-27 19:38:55 by bytosaur

Hello contributors and community.

I love your repo! It makes things so much easier for me! Although having the precomputation in the model is already great, I'd like to know how you can optimize DSP parameters. It looks like this was a feature in old versions (e.g. 0.2), and by default I don't see any trainable params in this layer.

Could you please state if this is still available and how to use it?

happy hacking Paul


Kapre-0.3.7 2022-01-21 20:07:49

Kapre-0.3.6 2021-11-14 01:11:11

  • bugfix (tflite)

Kapre-0.3.5 2021-03-18 22:13:11

  • Add tflite-compatible stft layer

2020-09-29 18:44:35

Bugfix for get_window_fn()

Kapre-0.3.3 2020-09-15 03:13:44

  • kapre.augmentation is added
  • kapre.time_frequency.ConcatenateFrequencyMap is added
  • kapre.composed.get_frequency_aware_conv2d is added
  • In STFT and InverseSTFT, keyword arg window_fn is renamed to window_name and it expects string value, not function.
  • With this update, models with Kapre layers can be loaded with h5 file format.
  • kapre.backend.get_window_fn is added

Kapre 0.3.2 2020-08-30 21:50:36

- `kapre.signal.Frame` and `kapre.signal.Energy` are added
- `kapre.signal.LogmelToMFCC` is added
- `kapre.signal.MuLawEncoder` and `kapre.signal.MuLawDecoder` are added
- `kapre.composed.get_stft_magnitude_layer()` is added 
- doc is hosted at https://kapre.readthedocs.io/

Keunwoo Choi

MIR, machine learning, music recommendation.
