This is the official repository for the paper, MidiBERT-Piano: Large-scale Pre-training for Symbolic Music Understanding.

wazenmai, updated 2023-02-05 21:02:57



Authors: Yi-Hui (Sophia) Chou, I-Chun (Bronwin) Chen



With this repository, you can:
  • pre-train a MidiBERT-Piano model on your own customized pre-training dataset
  • fine-tune and evaluate it on 4 downstream tasks
  • extract melodies (MIDI to MIDI) using a pre-trained MidiBERT-Piano

All the datasets employed in this work are publicly available.

Quick Start

For programmers

If you'd like to reproduce the MidiBERT results shown in the paper:

  1. Download the checkpoints and rename the files as follows.

    (Note: we only provide checkpoints for models in the CP representation.)

        result/
        └── finetune/
            ├── melody_default/
            │   └── model_best.ckpt
            ├── velocity_default/
            │   └── model_best.ckpt
            ├── composer_default/
            │   └── model_best.ckpt
            └── emotion_default/
                └── model_best.ckpt

  2. Run ./scripts/

    Alternatively, refer to the README in the MidiBERT folder for more details.

    No GPU is needed for evaluation.
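If you are unsure whether the renamed checkpoints ended up where evaluation expects them, a quick check like the following may help. It is a minimal sketch, not part of the repository: the helper name `missing_checkpoints` is hypothetical, and it assumes the `result/finetune/<task>_default/model_best.ckpt` layout from step 1, relative to the repository root.

```python
from pathlib import Path

# Assumed layout (from step 1): result/finetune/<task>_default/model_best.ckpt
TASKS = ("melody", "velocity", "composer", "emotion")


def missing_checkpoints(root="result"):
    """Hypothetical helper: return expected checkpoint paths that are not files under `root`."""
    finetune_dir = Path(root) / "finetune"
    expected = [finetune_dir / f"{task}_default" / "model_best.ckpt" for task in TASKS]
    return [path for path in expected if not path.is_file()]


if __name__ == "__main__":
    missing = missing_checkpoints()
    if missing:
        print("Missing checkpoints:")
        for path in missing:
            print(f"  {path}")
    else:
        print("All four fine-tuned checkpoints are in place.")
```

Run it from the repository root before evaluating; an empty result means every task folder contains its `model_best.ckpt`.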

For musicians who want to test melody extraction

Edit scripts/ and set song_path to the path of your MIDI file. The MIDI file with the predicted melody will be saved in the root folder. Then run:

    ./scripts/

Windows Users


In the scripts, change the line export PYTHONPATH='.' to the following, then print the environment variable to make sure it's working:

```
set PYTHONPATH=.
echo %PYTHONPATH%
```

I've tested melody extraction on Adele's "Hello" (piano cover), and the result is good.
For non-pop music such as a Mozart sonata, however, the model seems quite confused. This is expected: the model was trained on the POP909 dataset, so it knows very little about classical music.

Side note: I tried to make this friendlier for non-programmers. Feel free to open an issue if you run into any problems.

Installation

  • Python3
  • Install the packages MidiBERT-Piano depends on:

        git clone
        cd MIDI-BERT
        pip install -r requirements.txt


See the scripts folder, which includes bash files for:
  • data preparation
  • pre-training
  • fine-tuning
  • evaluation
  • melody extraction

You may need to change folder/file names or other config settings to suit your setup.

Repo Structure

```
Data/
├── Dataset/
│   ├── pop909/
│   └── .../
└── CP_data/
    ├── pop909_train.npy
    └── *.npy

data_creation/
├── preprocess_pop909/
└── prepare_data/       # convert midi to CP_data
    └── dict/           # CP dictionary

melody_extraction/
├── skyline/
└── midibert/

MidiBERT/
└── *py
```
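The prepare_data step converts MIDI into CP (compound word) data, where the attributes of one note are grouped into a single multi-field token instead of a sequence of separate events. The sketch below illustrates that grouping idea only and is not the repository's code: it assumes 4/4 time, a 16th-note grid, and four fields (bar, position, pitch, duration); the `Note` class and `notes_to_cp` function are hypothetical, and the real pipeline and its dictionary in dict/ may quantize and encode differently (for example, in how empty bars are handled).

```python
from dataclasses import dataclass

# Assumption for this sketch: 4/4 time quantized to a 16th-note grid.
POSITIONS_PER_BAR = 16


@dataclass
class Note:
    start: int     # onset in ticks
    duration: int  # length in ticks
    pitch: int     # MIDI pitch number


def notes_to_cp(notes, ticks_per_beat=480):
    """Hypothetical CP grouping: one (bar, position, pitch, duration) tuple per note.

    bar is "New" for the first note of a bar and "Continue" otherwise;
    position and duration are counted in 16th notes (duration at least 1).
    Empty bars are simply skipped in this simplified version.
    """
    ticks_per_position = ticks_per_beat * 4 // POSITIONS_PER_BAR
    tokens = []
    current_bar = None
    for note in sorted(notes, key=lambda n: (n.start, n.pitch)):
        grid = round(note.start / ticks_per_position)  # quantized onset
        bar, position = divmod(grid, POSITIONS_PER_BAR)
        bar_token = "New" if bar != current_bar else "Continue"
        current_bar = bar
        length = max(1, round(note.duration / ticks_per_position))
        tokens.append((bar_token, position, note.pitch, length))
    return tokens


if __name__ == "__main__":
    notes = [Note(0, 480, 60), Note(0, 480, 64), Note(480, 240, 67), Note(1920, 960, 72)]
    print(notes_to_cp(notes))
    # [('New', 0, 60, 4), ('Continue', 0, 64, 4), ('Continue', 4, 67, 2), ('New', 0, 72, 8)]
```

Because each note becomes a single token, a CP sequence is much shorter than an event-based sequence of the same music, which is one motivation for grouping.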



For more details on:
  • data preparation, go to data_creation and follow the README
  • MidiBERT pre-training, fine-tuning and evaluation, go to MidiBERT and follow the README
  • skyline, go to melody_extraction/skyline and follow the README
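Skyline is a well-known heuristic baseline for melody extraction: it assumes the melody is the highest-sounding line. Below is a minimal sketch of one common formulation, not the code in melody_extraction/skyline. It works on hypothetical `(onset, offset, pitch)` tuples rather than parsed MIDI, and the repository's version may differ in details such as tie-breaking or how overlapping notes are shortened.

```python
from collections import namedtuple

# Hypothetical note type for this sketch; times may be in any consistent unit (e.g. ticks).
MidiNote = namedtuple("MidiNote", ["onset", "offset", "pitch"])


def skyline(notes):
    """One common skyline formulation:
    1. at each onset time, keep only the highest-pitched note;
    2. shorten a kept note if the next kept note starts before it ends,
       so the extracted line is monophonic.
    """
    highest = {}
    for note in notes:
        best = highest.get(note.onset)
        if best is None or note.pitch > best.pitch:
            highest[note.onset] = note

    melody = [highest[onset] for onset in sorted(highest)]
    result = []
    for i, note in enumerate(melody):
        if i + 1 < len(melody):
            next_onset = melody[i + 1].onset
            if note.offset > next_onset:
                note = note._replace(offset=next_onset)
        result.append(note)
    return result


if __name__ == "__main__":
    chords = [MidiNote(0, 4, 60), MidiNote(0, 4, 67), MidiNote(2, 6, 64), MidiNote(4, 8, 55)]
    print(skyline(chords))
```

The highest-note assumption is also why skyline tends to fail when the melody sits below the accompaniment, a case a learned model such as MidiBERT can in principle handle.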

Note that the baseline (LSTM) and the code for the REMI versions have been removed for cleanliness, but you can find them in the main branch.


If you find this useful, please cite our paper.

    @article{midibertpiano,
      title={{MidiBERT-Piano}: Large-scale Pre-training for Symbolic Music Understanding},
      author={Yi-Hui Chou and I-Chun Chen and Chin-Jui Chang and Joann Ching and Yi-Hsuan Yang},
      journal={arXiv preprint arXiv:2107.05223},
      year={2021}
    }

