Keras Audio Preprocessors - compute STFT, ISTFT, Melspectrogram, and others on GPU real-time.
Tested on Python 3.6 and 3.7
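Kapre layers support both `channels_first` and `channels_last` data formats and extend the default `tf.signal` implementation with features such as an `STFT` and `InverseSTFT` pair.

To use Kapre, add a layer such as `kapre.time_frequency.STFT()` as the first layer of the model, and tune DSP parameters such as `n_fft` to boost the performance.

Install Kapre with `pip install kapre`.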
Please refer to Kapre API Documentation at https://kapre.readthedocs.io
```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, BatchNormalization, ReLU, GlobalAveragePooling2D, Dense, Softmax
from kapre import STFT, Magnitude, MagnitudeToDecibel
from kapre.composed import get_melspectrogram_layer, get_log_frequency_spectrogram_layer

# 6-channel audio, 44100 samples per clip, in channels_last layout
input_shape = (44100, 6)
sr = 44100
model = Sequential()

# STFT front end operating directly on the raw waveform
model.add(STFT(n_fft=2048, win_length=2018, hop_length=1024,
               window_name=None, pad_end=False,
               input_data_format='channels_last', output_data_format='channels_last',
               input_shape=input_shape))
model.add(Magnitude())
model.add(MagnitudeToDecibel())  # these three layers can be replaced with get_stft_magnitude_layer()

# A small CNN classifier on top of the spectrogram
model.add(Conv2D(32, (3, 3), strides=(2, 2)))
model.add(BatchNormalization())
model.add(ReLU())
model.add(GlobalAveragePooling2D())
model.add(Dense(10))
model.add(Softmax())

model.compile('adam', 'categorical_crossentropy')  # if single-label classification

# load_x() and load_y() are placeholders for your own data loading
x = load_x()  # e.g., x.shape = (10000, 44100, 6), matching input_shape
y = load_y()  # e.g., y.shape = (10000, 10) if it's 10-class classification

model.fit(x, y)
```
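As a quick sanity check on the configuration above, the sketch below estimates the output shape of the STFT layer. It assumes the layer frames the signal the way `tf.signal.stft` does with `pad_end=False` (no padding, a window of `win_length` samples, a step of `hop_length`). The helper name `stft_output_shape` is hypothetical and not part of Kapre.

```python
def stft_output_shape(n_samples, n_fft, win_length, hop_length):
    """Estimate (n_frames, n_freq_bins) for an unpadded STFT.

    Assumes tf.signal.stft-style framing with pad_end=False.
    """
    # Number of full windows that fit into the signal
    n_frames = 1 + (n_samples - win_length) // hop_length
    # One-sided spectrum of a real-valued signal
    n_freq_bins = n_fft // 2 + 1
    return n_frames, n_freq_bins


# Settings from the example above: 44100 samples per clip
shape = stft_output_shape(44100, n_fft=2048, win_length=2018, hop_length=1024)
print(shape)
```

Under these assumptions, each channel of the input yields a time-frequency map of this size, which is what the following `Conv2D` layer receives.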
The `STFT` layer is not tflite compatible (due to `tf.signal.stft`). To create a tflite compatible model, first train using the normal kapre layers, then create a new model replacing `STFT` and `Magnitude` with `STFTTflite` and `MagnitudeTflite`. Tflite compatible layers are restricted to a batch size of 1, which prevents their use during training.
```python
from kapre import STFTTflite, MagnitudeTflite

model_tflite = Sequential()

model_tflite.add(STFTTflite(n_fft=2048, win_length=2018, hop_length=1024,
                            window_name=None, pad_end=False,
                            input_data_format='channels_last', output_data_format='channels_last',
                            input_shape=input_shape))
model_tflite.add(MagnitudeTflite())
model_tflite.add(MagnitudeToDecibel())
model_tflite.add(Conv2D(32, (3, 3), strides=(2, 2)))
model_tflite.add(BatchNormalization())
model_tflite.add(ReLU())
model_tflite.add(GlobalAveragePooling2D())
model_tflite.add(Dense(10))
model_tflite.add(Softmax())

# Copy the trained weights into the tflite compatible model
model_tflite.set_weights(model.get_weights())
```
Please cite this paper if you use Kapre for your work.
@inproceedings{choi2017kapre,
title={Kapre: On-GPU Audio Preprocessing Layers for a Quick Implementation of Deep Neural Network Models with Keras},
author={Choi, Keunwoo and Joo, Deokjin and Kim, Juho},
booktitle={Machine Learning for Music Discovery Workshop at 34th International Conference on Machine Learning},
year={2017},
organization={ICML}
}
Hi,
I'm trying to add a SpecAug layer in the training process of a CNN using the code below:
```Python
from tensorflow.keras.models import Sequential
from kapre.augmentation import SpecAugment
from kapre.composed import get_melspectrogram_layer

CLIP_DURATION = 5
SAMPLING_RATE = 41000
NUM_CHANNELS = 1

INPUT_SHAPE = ((CLIP_DURATION * SAMPLING_RATE), NUM_CHANNELS)

melgram = get_melspectrogram_layer(input_shape=INPUT_SHAPE,
                                   n_fft=2048,
                                   hop_length=512,
                                   return_decibel=True,
                                   n_mels=40,
                                   mel_f_min=500,
                                   mel_f_max=15000,
                                   input_data_format='channels_last',
                                   output_data_format='channels_last')

spec_augment = SpecAugment(freq_mask_param=5,
                           time_mask_param=10,
                           n_freq_masks=2,
                           n_time_masks=3,
                           mask_value=-100)

model = Sequential()
model.add(melgram)
model.add(spec_augment)
```
The CNN summary looks like this:
```Python
Model: "sequential_2"
 melspectrogram (Sequential)   (None, 397, 40, 1)   0
 spec_augment_1 (SpecAugment)  (None, 397, 40, 1)   0
=================================================================
Total params: 0
Trainable params: 0
Non-trainable params: 0
```
Compiling and fitting the model:
```Python
model.compile(loss='sparse_categorical_crossentropy', optimizer='adam', metrics='accuracy')

early_stop = EarlyStopping(monitor='loss', patience=5)
reduce_LR = ReduceLROnPlateau(monitor="val_loss", factor=0.1, patience=4)
checkpointer = ModelCheckpoint(filepath='saved_models/bird_song_classification.hdf5')

model.fit(X_train, y_train,
          validation_data=(X_val, y_val),
          epochs=50,
          batch_size=32,
          callbacks=[early_stop, checkpointer, reduce_LR])
```
Then I get the following error:
```Python
Epoch 1/50
TypeError                                 Traceback (most recent call last)
6 frames
/usr/local/lib/python3.7/dist-packages/kapre/augmentation.py in tf___apply_masks_to_axis(self, x, axis, mask_param, n_masks)
     78 try:
     79     do_return = True
---> 80     retval_ = ag__.converted_call(ag__.ld(tf).where, (ag__.ld(mask), ag__.ld(self).mask_value, ag__.ld(x)), None, fscope)
     81 except:
     82     do_return = False
TypeError: in user code:
File "/usr/local/lib/python3.7/dist-packages/keras/engine/training.py", line 1051, in train_function *
return step_function(self, iterator)
File "/usr/local/lib/python3.7/dist-packages/keras/engine/training.py", line 1040, in step_function **
outputs = model.distribute_strategy.run(run_step, args=(data,))
File "/usr/local/lib/python3.7/dist-packages/keras/engine/training.py", line 1030, in run_step **
outputs = model.train_step(data)
File "/usr/local/lib/python3.7/dist-packages/keras/engine/training.py", line 889, in train_step
y_pred = self(x, training=True)
File "/usr/local/lib/python3.7/dist-packages/keras/utils/traceback_utils.py", line 67, in error_handler
raise e.with_traceback(filtered_tb) from None
File "/tmp/__autograph_generated_filepzvfxhgz.py", line 63, in tf__call
ag__.if_stmt((ag__.ld(training) in (None, False)), if_body_2, else_body_2, get_state_2, set_state_2, ('do_return', 'retval_'), 2)
File "/tmp/__autograph_generated_filepzvfxhgz.py", line 58, in else_body_2
retval_ = ag__.converted_call(ag__.ld(tf).map_fn, (), dict(elems=ag__.ld(x), fn=ag__.ld(self)._apply_spec_augment, dtype=ag__.ld(tf).float32, fn_output_signature=ag__.ld(tf).float32), fscope)
File "/tmp/__autograph_generated_filef27o6c1f.py", line 44, in tf___apply_spec_augment
ag__.if_stmt((ag__.ld(self).n_time_masks >= 1), if_body_1, else_body_1, get_state_1, set_state_1, ('x',), 1)
File "/tmp/__autograph_generated_filef27o6c1f.py", line 39, in if_body_1
x = ag__.converted_call(ag__.ld(self)._apply_masks_to_axis, (ag__.ld(x),), dict(axis=ag__.ld(time_axis), mask_param=ag__.ld(self).time_mask_param, n_masks=ag__.ld(self).n_time_masks), fscope)
File "/tmp/__autograph_generated_file3vip8w4x.py", line 80, in tf___apply_masks_to_axis
retval_ = ag__.converted_call(ag__.ld(tf).where, (ag__.ld(mask), ag__.ld(self).mask_value, ag__.ld(x)), None, fscope)
TypeError: Exception encountered when calling layer "spec_augment_1" (type SpecAugment).
in user code:
File "/usr/local/lib/python3.7/dist-packages/kapre/augmentation.py", line 299, in call *
elems=x, fn=self._apply_spec_augment, dtype=tf.float32, fn_output_signature=tf.float32
File "/usr/local/lib/python3.7/dist-packages/kapre/augmentation.py", line 273, in _apply_spec_augment *
x = self._apply_masks_to_axis(
File "/usr/local/lib/python3.7/dist-packages/kapre/augmentation.py", line 254, in _apply_masks_to_axis *
return tf.where(mask, self.mask_value, x)
TypeError: Input 'e' of 'SelectV2' Op has type float32 that does not match type int32 of argument 't'.
Call arguments received by layer "spec_augment_1" (type SpecAugment):
• x=tf.Tensor(shape=(None, 397, 40, 1), dtype=float32)
• training=True
• kwargs=<class 'inspect._empty'>
```
The shape of X_train is `(2182, 205000, 1)`.
I'm using Tensorflow 2.9.2, and Python 3.7.15
When I remove the SpecAug layer, everything runs fine. I've also tested using only the melspec layer with a MobileNet at the end, and it runs smoothly. The problem appears to be related to the SpecAug layer.
Do you have any idea what could be going wrong here? I appreciate any guidance related to the problem. Best regards.
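One possible cause, judging from the last `TypeError` in the traceback: `tf.where(mask, self.mask_value, x)` receives an integer `mask_value` (`-100`) alongside a `float32` spectrogram. The sketch below performs the same operation in plain TensorFlow on made-up tensors and shows that casting the mask value to the input dtype avoids the mismatch. Passing `mask_value=-100.0` to `SpecAugment` would presumably have the same effect, though that is an assumption and is not confirmed in this thread.

```python
import tensorflow as tf

# Stand-ins for the spectrogram and the mask (hypothetical values)
x = tf.zeros((4, 3), dtype=tf.float32)
mask = tf.constant([[True, False, False]] * 4)

mask_value = -100  # a Python int, as in the SpecAugment call above

# Casting the mask value to the dtype of x keeps both branches float32
masked = tf.where(mask, tf.cast(mask_value, x.dtype), x)
print(masked.dtype)
```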
I am training a model which includes the mel-spectrogram block from get_melspectrogram_layer() right after the input layer. Training goes well, and I am able to change the specific mel-spec-layers to their TFLite-counterparts (STFTTflite, MagnitudeTflite) afterwards. I have checked also that the model performs as well as before.
The model also performs as expected when converting the model to .tflite using dynamic range quantization. However, when using full-integer quantization, the model loses its accuracy (see https://www.tensorflow.org/lite/performance/post_training_quantization#integer_only).
I suppose the mel-spec output starts to differ significantly because, in full-integer quantization, the input values are projected to a new range (int8). Is there any way to make it work with full-integer quantization?
I guess I need to separate the mel-spec-layer from the model as a preprocessing step in order to succeed with full-integer quantization, i.e., apply the input quantization to the output values of mel-spec layer. But then I would have to deploy two models to the edge device, where the input goes first to the mel-spec-block and then to the rest of the model (?).
I am using TensorFlow 2.7.0 and kapre 0.3.7.
Here is my code for testing the tflite-model:
```
preds = []
for i, sample in enumerate(X_test_full_scaled):
    X = sample
    if input_details['dtype'] == np.int8:
        input_scale, input_zero_point = input_details["quantization"]
        X = sample / input_scale + input_zero_point
    X = X.reshape((1, 8000, 1)).astype(input_details["dtype"])
    interpreter.set_tensor(input_index, X)
    interpreter.invoke()
    pred = interpreter.get_tensor(output_index)
    output_scale, output_zero_point = output_details['quantization']
    if output_details['dtype'] == np.int8:
        pred = pred.astype(np.float32)
        pred = (pred - output_zero_point) * output_scale
    pred = np.argmax(pred, axis=1)[0]
    preds.append(pred)
preds = np.array(preds)
```
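The input and output handling in the loop above can be isolated into two helpers. This is a minimal NumPy sketch of the affine int8 mapping `real = (q - zero_point) * scale` used by TensorFlow Lite; the function names and the example `scale` and `zero_point` values are hypothetical. Unlike the loop above, it rounds and clips before casting, since a bare `astype(np.int8)` truncates toward zero and does not clip out-of-range values.

```python
import numpy as np


def quantize_int8(x, scale, zero_point):
    """Map float values to int8 with rounding and saturation."""
    q = np.round(x / scale) + zero_point
    return np.clip(q, -128, 127).astype(np.int8)


def dequantize_int8(q, scale, zero_point):
    """Map int8 values back to approximate float values."""
    return (q.astype(np.float32) - zero_point) * scale


# Hypothetical quantization parameters, for illustration only
scale, zero_point = 0.05, -3

x = np.array([-1.0, 0.0, 0.5, 100.0], dtype=np.float32)
q = quantize_int8(x, scale, zero_point)
x_hat = dequantize_int8(q, scale, zero_point)
print(q, x_hat)
```

In this sketch the last value lies outside the representable range and saturates at the int8 maximum, so it is not recovered after dequantization. Raw audio samples with a wide dynamic range may be affected in a similar way.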
Hi,
I am looking to call Magnitude() and Phase() simultaneously for the same STFT input and concatenate the magnitude and phase before feeding into the convolution layers in my CNN sequential Keras model.
Is this possible?
Best,
Yang
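For reference, the tensor operation being asked about can be sketched in NumPy as follows, assuming a complex STFT array of shape `(batch, time, freq, channel)` filled here with random values. This only illustrates the shapes involved. In a Keras model, a `Sequential` stack is linear, so feeding one STFT output into two branches and concatenating them would need the functional API instead.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical complex STFT output: (batch, time, freq, channel)
shape = (2, 42, 1025, 1)
stft = rng.standard_normal(shape) + 1j * rng.standard_normal(shape)

magnitude = np.abs(stft)   # same shape as stft, real-valued
phase = np.angle(stft)     # same shape as stft, in radians

# Stack magnitude and phase along the channel axis
features = np.concatenate([magnitude, phase], axis=-1)
print(features.shape)
```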
Hi, when I used "from kapre.utils import Normalization2D", I got an error saying No module named 'kapre.utils'. I looked at the package and found that there is indeed no utils.py. I am wondering how to solve this.
Best wishes, Daisy
I noticed there is a function "kapre.utils.Normalization2D" in the old version, but I cannot find it in the updated version. Why? Is there an alternative function?
Hello contributors and community.
I love your repo! It makes things so much easier for me! Although having the precomputation in the model is already great, I'd like to know how to optimize DSP parameters. It looks like this was a feature in old versions (e.g. 0.2), and by default I don't see any trainable params in this layer.
Could you please state if this is still available and how to use it?
happy hacking Paul
- Bugfix for `get_window_fn()`
- `kapre.augmentation` is added
- `kapre.time_frequency.ConcatenateFrequencyMap` is added
- `kapre.composed.get_frequency_aware_conv2d` is added
- In `STFT` and `InverseSTFT`, keyword arg `window_fn` is renamed to `window_name` and it expects a string value, not a function. This change relates to the `h5` file format.
- `kapre.backend.get_window_fn` is added
- `kapre.signal.Frame` and `kapre.signal.Energy` are added
- `kapre.signal.LogmelToMFCC` is added
- `kapre.signal.MuLawEncoder` and `kapre.signal.MuLawDecoder` are added
- `kapre.composed.get_stft_magnitude_layer()` is added
- doc is hosted at https://kapre.readthedocs.io/