This project aims to provide a high-quality text dataset of rap music lyrics. Such datasets are then fed to a neural network to build a lyrics-generation model. The resulting word-to-word lyrics-generative model is served on raplyrics.eu.
Feel free to tweak this scraper to fit your needs. Kudos to open source.
First, create a Genius API key so that you can call the Genius API. Once done, copy your client_access_token into genius/credentials.ini.
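As a minimal sketch of how the token could be read back from that file, the snippet below uses Python's standard configparser. The helper name load_access_token and the section name "genius" are assumptions made for illustration; the real credentials.ini may use a different section header.

```python
import configparser
from pathlib import Path


def load_access_token(path="genius/credentials.ini", section="genius"):
    """Return the client_access_token stored in an INI file.

    The section name "genius" is an assumption for illustration; use
    whatever section header the real credentials.ini declares.
    """
    # Disable interpolation so a token containing '%' is read verbatim.
    parser = configparser.ConfigParser(interpolation=None)
    # parser.read() silently skips missing files, so check explicitly.
    if not Path(path).is_file():
        raise FileNotFoundError(f"credentials file not found: {path}")
    parser.read(path, encoding="utf-8")
    token = parser.get(section, "client_access_token", fallback="").strip()
    if not token:
        raise ValueError(f"client_access_token is missing or empty in {path}")
    return token
```

Failing early with an explicit error is a design choice here: an empty token would otherwise only surface later as a rejected API call.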
Get the repo - clone from GitHub
$ git clone https://github.com/fpaupier/RapLyrics-Scraper
Set up a virtualenv
This project is built on Python 3; I recommend using a virtual environment.
bash
`which python3` -m venv RapLyrics-Scraper
source RapLyrics-Scraper/bin/activate
pip install -r requirements.txt
Update the list of artists whose lyrics you want to get and the number of songs to get per artist. To do so, directly edit the artists list defined at lyrics_scraper.py:39.
To run the script, be sure to set the lyrics_dir and songs_per_artists arguments.
Run python lyrics_scraper.py --help for more information on the available arguments.
For example, to scrape 2 songs per artist and save them in the folder my_lyrics_folder with verbose output, run:
bash
python lyrics_scraper.py --verbose --lyrics_dir='my_lyrics_folder' --songs_per_artists=2
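To illustrate how the three flags in that command could be parsed, here is a small argparse sketch. It is not the parser from lyrics_scraper.py: the flag names come from the command above, while the types, help texts, and the choice to mark both arguments as required are assumptions.

```python
import argparse


def build_parser():
    """Build an illustrative parser for the flags used in the command above."""
    parser = argparse.ArgumentParser(
        description="Scrape rap lyrics (illustrative sketch)."
    )
    # Marked as required here by assumption; the real script may use defaults.
    parser.add_argument("--lyrics_dir", required=True,
                        help="folder where the lyrics files are written")
    parser.add_argument("--songs_per_artists", type=int, required=True,
                        help="number of songs to fetch per artist")
    parser.add_argument("--verbose", action="store_true",
                        help="print progress information")
    return parser


# Parse the same arguments as the example command.
args = build_parser().parse_args(
    ["--verbose", "--lyrics_dir=my_lyrics_folder", "--songs_per_artists=2"]
)
print(args.lyrics_dir, args.songs_per_artists, args.verbose)
```

Note that the shell strips the quotes around 'my_lyrics_folder' before the script sees the value, which is why the list above passes it unquoted.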
To merge all the scraped lyrics files into a single file, run:
bash
cat *_lyrics.txt > merged_lyrics.txt
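The same concatenation can be sketched in Python, which may be convenient on systems without cat. The function name merge_lyrics is hypothetical. Because the output name merged_lyrics.txt itself matches the *_lyrics.txt pattern, the sketch skips it so that a second run does not read its own output.

```python
from pathlib import Path


def merge_lyrics(lyrics_dir, output_name="merged_lyrics.txt"):
    """Concatenate every *_lyrics.txt file in lyrics_dir into one file.

    Returns the number of source files that were merged.
    """
    folder = Path(lyrics_dir)
    output = folder / output_name
    # Sort for a reproducible order, and skip the output file itself,
    # since "merged_lyrics.txt" also matches the *_lyrics.txt pattern.
    sources = sorted(
        p for p in folder.glob("*_lyrics.txt") if p.name != output_name
    )
    with output.open("w", encoding="utf-8") as merged:
        for source in sources:
            # Like cat, copy the content as-is without adding separators.
            merged.write(source.read_text(encoding="utf-8"))
    return len(sources)
```

Calling merge_lyrics("my_lyrics_folder") would write my_lyrics_folder/merged_lyrics.txt, assuming the scraped files are UTF-8 encoded.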
A toolbox is also provided to analyze some of the dataset properties.
To run a quick analysis of any .txt file, set the file to analyze in pre_processing/analysis.py, then run:
bash
python pre_processing/analysis.py
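For a sense of what a quick analysis of a lyrics file might report, here is a small sketch. It does not reproduce pre_processing/analysis.py; the function name summarize and the chosen metrics (non-empty lines, whitespace-separated words, vocabulary size, most frequent words) are assumptions for illustration.

```python
from collections import Counter
from pathlib import Path


def summarize(path, top_n=5):
    """Return basic statistics about a text file (hypothetical metrics)."""
    text = Path(path).read_text(encoding="utf-8")
    # Count only lines that contain something other than whitespace.
    lines = [line for line in text.splitlines() if line.strip()]
    # Naive tokenization: lowercase, then split on whitespace.
    words = text.lower().split()
    counts = Counter(words)
    return {
        "lines": len(lines),
        "words": len(words),
        "unique_words": len(counts),
        "most_common": counts.most_common(top_n),
    }
```

The ratio of unique_words to words gives a rough idea of how repetitive a dataset is, which can matter when training a word-level model.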
Songs are currently retrieved in decreasing order of popularity.
This project was used extensively to generate high-quality text datasets that were consumed by:
RapLyrics-Back, which trains and serves a lyrics-generative model.
RapLyrics-Front, which consumes the model trained and served by RapLyrics-Back, enabling raplyrics.eu users to generate unique and inspirational lyrics.