Download mp3 files and combine them using: cat mp3.zip.* > single_mp3.zip Extract files from subfolders: find . -mindepth 2 -type f -print -exec mv {} . \;
In general, music audio files can be accompanied by metadata related to their content such as free text description or tags. Tags are proved to be more useful as they provide a more direct description of the audio file and can be used in tasks such as classification per gender, artist, musical instrument etc in recommendation systems related to music. As not all audio files are accompanied by tags, the need of auto-tagging arises.
One approach widely used involves the usage of unsupervised feature learning such as K-means, sparse coding and Boltzmann machines. In these cases, the main concern is the capture of low leve music structures that can be used as input into some classifier.
Another approach involves supervised methods, like Deep Neural Networks (DNNs) of various architectural types (MLP, CNN, RNN), that directly map labels to audio files. In these cases, the feature extraction method could vary a lot, from spectograms to hand-engineered features like Mel Frequency Cepstral Coefficients (MFCCs).
In more detail, a spectrogram of an audio clip is a visual representation of its spectrum of frequencies as they vary with time. A common variation is the mel spectrogram, where the so-called mel scale, i.e. a perceptual scale of pitches judged by listeners to be equal in distance from one another, is used. Additionally, MFCCs are coefficients that collectively make up a Mel Frequency Cepstrum (MFC), which is a representation of the short-term power spectrum of a asound, based on a linear cosine transform of a log power spectrum on a nonlinear mel scale of frequency.
For the purpose of this project, the method that was deployed involves the transformation of the initial sound files into 2D numerical matrices using both feature extraction techniques (mel spectrograms, MFCCs). These matrices were then fed into a CNN in order to assign suitable tags to each audio clip. Multiple architectures, from shallow to deeper, were evaluated.
Among many candidates, the one finally selected was the MagnaTagATune dataset which can be found here. The dataset consists of almost 26000 audio tracks of 29 seconds in mp3 format with 32 mbps, where the 188 available binary tags are not mutually exclusive (i.e. a file can be labeled with more than one tag). The dataset was preferred among all others, since:
The available tags include, among others music genres (e.g. "pop", "alternative", "indie"), instruments (e.g. "guitar", "violin", "piano"), emotions (e.g. "mellow", "chill") and origins (e.g. "middle eastern", "India"). There are also some tags representing the absence of a characteristic (e.g. "no beat", "no drums")
The tags of the original dataset displayed two main dysfunctionalities: synonym tags and extremely skewed tag distributions, i.e. existence of tags with rare occurence. To solve the above issues, in 1_label_reduction_and_files_renaming.py and aux_label_distribution.py, synonym and similar tags were merged into a single tage and eventually the 50 most popular of the merged tags were kept.
For each audio file, the below process was applied: - Selection of the desirable sampling rate. Uniform sample rates were used to allow easy batching of the data - Selection of the desirable number and type of features (mel spectrogram, MFCCs) to be extracted per sampling window. - Extraction of features from the .wav file, i.e. generation of a 2D array where the horizontal axis represents the sampling windows and the vertical axis represents the extracted features. - Storage of the above information in numpy array format.
The result of the process is the transformation of the audio wave into a 2D matrix of features. This is displayed in the following figure for three tracks where the frequencies are shown increasing up the vertical axis, time is depicted on the horizontal axis and wave amplitude is represented as color intensity.
The general idea for the CNN construction was to use the generated 2D numpy arrays as input into a chain of two or three convolutional layers (with or without max pooling), followed by a fully connected layer prior to the class prediction output (multi-hot vectors of 50 positions, one for each tag).
The finnaly selected architecture is the following:
As mentioned previously, the auto tagging task is a multi label classification problem. Thus, th ground truth labels provided along with the MagnaTagATune data are encoded as a 50x1 binary vector (multihot vector) where each dimension corresponds to a specific tag. Accordingly, the predictions are also represented as a vector of the same size, but in this case the values are not binary anymore; instead, the represent the probability of that label being assigned to the corresponding audio clip. By setting a threshold at 0.5 the final set of labels was acquired.
To assess the network's performance the Receiver Operating Characteristic Curve (AUC) was used which represents the probability that a random classifier will rank a randomly chosen positive instance higher that a randomly chosen negative one. Regarding the Accuracy and True Positive Rate per tag, it was observed that classes with few occurrences display a higher accuracy percentage. This happens due to the high value of True Negatives. In contrast, classes with frequent appearance, such as "guitar", "techno" and "classical", may have a lower accuracy percentage but they are the ones that the network has learnt better, since they have the highest True Positive Rate (Recall).
As mentioned before, the initial tagging of the clips included in the MagnaTagATune dataset is done by humans. Consequently, in many cases not all appropriate tags were assigned to the corresponding songs. Therefore, for the easier evaluation of the appropriateness of the assigned tags to each audio clip, a visual representation of the tags was constructed.
Words were represented as vectors with the use of a word to vector model. The word embeddings model was generated with Google News word to vector pre-trained model, comprised of 300 dimensions. The Google News Model is a skip-gram model with negative sampling that is proven to capture a large number of precise syntactic and semantic word relationships. Visualization of music labels in a 2D image required the dimensionality reduction from 300 dimensions to 2. Instead of the Principal Component Analysis (PCA) method, the Distributed Stochastic Neighbor Embedding method (t-SNE) was used. The t-SNE achieves dimensionality reduction by preserving, as much as possible, the original semantic similarity of the tags.
It should be noted that the cost function used by the method is non-convex while the optimization algorithm is the gradient descent. Thus, different executions of the algorithm would lead to different results. To accomplish a more comprehensive and interactive visualization Tensorflow Projector was used. For this purpose, t-SNE algorithm was implemented twicel; once to reduce Google News model dimensions from 300 to 4 and a second time when Tensorflow Projector was used.
Studying the produced visualization, it becomes evident that the majority of the semantically close tags are clustered together. Some indicative examples of clusters include:
To observe more closely the results of the CNN some indicative examples are presented below.
It is identifies that the provided dataset annotations in specific tracks may be insufficient. For example, for clip id 15317 the annotation is "no piano" which provides no information regarding the song itself. This consequently makes training of the CNN more challenging.
Furthermore, it should be noted that the predicted labels in the majority of cases may be not be 100% accurate but are semantically close to the provided ones. For example, it is evident that in clip id 35334 the annotation "guitar" has been translated to "bass" and "violin" which are all stringed instruments.
In addition, it is observed that the model predicts suitable tags even in cases where they are not part of the initial annotations, as happens with clip id 24335.
Bumps nbconvert from 5.3.1 to 6.5.1.
Sourced from nbconvert's releases.
Release 6.5.1
No release notes provided.
6.5.0
What's Changed
- Drop dependency on testpath. by
@anntzer
in jupyter/nbconvert#1723- Adopt pre-commit by
@blink1073
in jupyter/nbconvert#1744- Add pytest settings and handle warnings by
@blink1073
in jupyter/nbconvert#1745- Apply Autoformatters by
@blink1073
in jupyter/nbconvert#1746- Add git-blame-ignore-revs by
@blink1073
in jupyter/nbconvert#1748- Update flake8 config by
@blink1073
in jupyter/nbconvert#1749- support bleach 5, add packaging and tinycss2 dependencies by
@bollwyvl
in jupyter/nbconvert#1755- [pre-commit.ci] pre-commit autoupdate by
@pre-commit-ci
in jupyter/nbconvert#1752- update cli example by
@leahecole
in jupyter/nbconvert#1753- Clean up pre-commit by
@blink1073
in jupyter/nbconvert#1757- Clean up workflows by
@blink1073
in jupyter/nbconvert#1750New Contributors
@pre-commit-ci
made their first contribution in jupyter/nbconvert#1752Full Changelog: https://github.com/jupyter/nbconvert/compare/6.4.5...6.5
6.4.3
What's Changed
- Add section to
customizing
showing how to use template inheritance by@stefanv
in jupyter/nbconvert#1719- Remove ipython genutils by
@rgs258
in jupyter/nbconvert#1727- Update changelog for 6.4.3 by
@blink1073
in jupyter/nbconvert#1728New Contributors
@stefanv
made their first contribution in jupyter/nbconvert#1719@rgs258
made their first contribution in jupyter/nbconvert#1727Full Changelog: https://github.com/jupyter/nbconvert/compare/6.4.2...6.4.3
6.4.0
What's Changed
- Optionally speed up validation by
@gwincr11
in jupyter/nbconvert#1672- Adding missing div compared to JupyterLab DOM structure by
@SylvainCorlay
in jupyter/nbconvert#1678- Allow passing extra args to code highlighter by
@yuvipanda
in jupyter/nbconvert#1683- Prevent page breaks in outputs when printing by
@SylvainCorlay
in jupyter/nbconvert#1679- Add collapsers to template by
@SylvainCorlay
in jupyter/nbconvert#1689- Fix recent pandoc latex tables by adding calc and array (#1536, #1566) by
@cgevans
in jupyter/nbconvert#1686- Add an invalid notebook error by
@gwincr11
in jupyter/nbconvert#1675- Fix typos in execute.py by
@TylerAnderson22
in jupyter/nbconvert#1692- Modernize latex greek math handling (partially fixes #1673) by
@cgevans
in jupyter/nbconvert#1687- Fix use of deprecated API and update test matrix by
@blink1073
in jupyter/nbconvert#1696- Update nbconvert_library.ipynb by
@letterphile
in jupyter/nbconvert#1695- Changelog for 6.4 by
@blink1073
in jupyter/nbconvert#1697New Contributors
... (truncated)
7471b75
Release 6.5.1c1943e0
Fix pre-commit8685e93
Fix tests0abf290
Run black and prettier418d545
Run test on 6.x branchbef65d7
Convert input to string prior to escape HTML0818628
Check input type before escapingb206470
GHSL-2021-1017, GHSL-2021-1020, GHSL-2021-1021a03cbb8
GHSL-2021-1026, GHSL-2021-102548fe71e
GHSL-2021-1024Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase
.
Bumps pyarrow from 0.12.0 to 0.15.1.
Sourced from pyarrow's changelog.
Apache Arrow 0.15.1 (25 October 2019)
Bug
- ARROW-6464 - [Java] Refactor FixedSizeListVector#splitAndTransfer with slice API
- ARROW-6728 - [C#] Support reading and writing Date32 and Date64 arrays
- ARROW-6740 - [Python] Unable to delete closed MemoryMappedFile on Windows
- ARROW-6762 - [C++] JSON reader segfaults on newline
- ARROW-6795 - [C#] Reading large Arrow files in C# results in an exception
- ARROW-6806 - [C++] Segfault deserializing ListArray containing null/empty list
- ARROW-6813 - [Ruby] Arrow::Table.load with headers=true leads to exception in Arrow 0.15
- ARROW-6834 - [C++] Pin gtest to 1.8.1 to triage failing Appveyor / MSVC build
- ARROW-6844 - [C++][Parquet][Python] List columns read broken with 0.15.0
- ARROW-6857 - [Python][C++] Segfault for dictionary_encode on empty chunked_array (edge case)
- ARROW-6860 - [Python] Only link libarrow_flight.so to pyarrow._flight
- ARROW-6861 - [Python] arrow-0.15.0 reading arrow-0.14.1-output Parquet dictionary column: Failure reading column: IOError: Arrow error: Invalid: Resize cannot downsize
- ARROW-6869 - [C++] Dictionary "delta" building logic in builder_dict.h produces invalid arrays
- ARROW-6873 - [Python] Stale CColumn reference break Cython cimport pyarrow
- ARROW-6874 - [Python] Memory leak in Table.to_pandas() when conversion to object dtype
- ARROW-6876 - [Python] Reading parquet file with many columns becomes slow for 0.15.0
- ARROW-6877 - [C++] Boost not found from the correct environment
- ARROW-6878 - [Python] pa.array() does not handle list of dicts with bytes keys correctly under python3
- ARROW-6882 - [Python] cannot create a chunked_array from dictionary_encoding result
- ARROW-6886 - [C++] arrow::io header nvcc compiler warnings
- ARROW-6898 - [Java] Fix potential memory leak in ArrowWriter and several test classes
- ARROW-6903 - [Python] Wheels broken after ARROW-6860 changes
- ARROW-6905 - [Packaging][OSX] Nightly builds on MacOS are failing because of brew compile timeouts
- ARROW-6910 - [Python] pyarrow.parquet.read_table(...) takes up lots of memory which is not released until program exits
- ARROW-6922 - [Python] Pandas master build is failing (MultiIndex.levels change)
- ARROW-6937 - [Packaging][Python] Fix conda linux and OSX wheel nightly builds
- ARROW-6938 - [Python] Windows wheel depends on zstd.dll and libbz2.dll, which are not bundled
- ARROW-6962 - [C++] [CI] Stop compiling with -Weverything
- ARROW-6977 - [C++] Only enable jemalloc background_thread if feature is supported
- ARROW-6983 - [C++] Threaded task group crashes sometimes
Improvement
- ARROW-6610 - [C++] Add ARROW_FILESYSTEM=ON/OFF CMake configuration flag
- ARROW-6777 - [GLib][CI] Unpin gobject-introspection gem
- ARROW-6852 - [C++] memory-benchmark build failed on Arm64
- ARROW-6927 - [C++] Add gRPC version check
- ARROW-6963 - [Packaging][Wheel][OSX] Use crossbow's command to deploy artifacts from travis builds
New Feature
- ARROW-6661 - [Java] Implement APIs like slice to enhance VectorSchemaRoot
Apache Arrow 0.15.0 (30 September 2019)
Bug
... (truncated)
b789226
[maven-release-plugin] prepare release apache-arrow-0.15.16687aed
[Release] Set Maven versions to 0.15.1-SNAPSHOTe9f7864
[Release] Update versions for 0.15.1c7e9fe4
[Release] Update .deb/.rpm changelogs for 0.15.16a94d72
[Release] Update CHANGELOG.md for 0.15.1d02f74e
ARROW-6962: [C++] [CI] Stop compiling with -Weverything28d6d97
ARROW-6963: [Packaging][Wheel][OSX] Use crossbow's command to deploy artifact...a83a0f0
ARROW-6983: [C++] Fix ThreadedTaskGroup lifetime issue4142ed5
ARROW-6977: [C++] Disable jemalloc background_thread on macOSc20eceb
ARROW-6910: [C++][Python] Set jemalloc default configuration to release dirty...Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase
.
Bumps mistune from 0.8.3 to 2.0.3.
Sourced from mistune's releases.
Version 2.0.2
Fix
escape_url
via lepture/mistune#295Version 2.0.1
Fix XSS for image link syntax.
Version 2.0.0
First release of Mistune v2.
Version 2.0.0 RC1
In this release, we have a Security Fix for harmful links.
Version 2.0.0 Alpha 1
This is the first release of v2. An alpha version for users to have a preview of the new mistune.
Version 0.8.4
Sourced from mistune's changelog.
Changelog
Here is the full history of mistune v2.
Version 2.0.4
Released on Jul 15, 2022
- Fix
url
plugin in<a>
tag- Fix
*
formattingVersion 2.0.3
Released on Jun 27, 2022
- Fix
table
plugin- Security fix for CVE-2022-34749
Version 2.0.2
Released on Jan 14, 2022
Fix
escape_url
Version 2.0.1
Released on Dec 30, 2021
XSS fix for image link syntax.
Version 2.0.0
Released on Dec 5, 2021
This is the first non-alpha release of mistune v2.
Version 2.0.0rc1
Released on Feb 16, 2021
Version 2.0.0a6
</tr></table>
... (truncated)
3f422f1
Version bump 2.0.3a6d4321
Fix asteris emphasis regex CVE-2022-347495638e46
Merge pull request #307 from jieter/patch-10eba471
Fix typo in guide.rst61e9337
Fix table plugin76dec68
Add documentation for renderer heading when TOC enabled799cd11
Version bump 2.0.2babb0cf
Merge pull request #295 from dairiki/bug.escape_urlfc2cd53
Make mistune.util.escape_url less aggressive3e8d352
Version bump 2.0.1Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase
.
Bumps distributed from 1.21.8 to 2021.10.0.
63ebaea
bump version to 2021.10.01670cf8
Ensure resumed flight tasks are still fetched (#5426)cdc68cc
AMM high level documentation (#5456)33d83bc
Provide stack for suspended coro in test timeout (#5446)918e3fb
Handle UCXNotConnected
error (#5449)3afc670
AMM: Don't schedule tasks to paused workers (#5431)f3aa9d1
Use pip install .
instead of calling setup.py
(#5442)a8151a6
Increase latency for stealing (#5390)1585f85
Type annotations for Worker and gen_cluster (#5438)7d2516a
Ensure reconnecting workers do not loose required data (#5436)Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase
.
Bumps lxml from 4.6.3 to 4.9.1.
Sourced from lxml's changelog.
4.9.1 (2022-07-01)
Bugs fixed
- A crash was resolved when using
iterwalk()
(orcanonicalize()
) after parsing certain incorrect input. Note thatiterwalk()
can crash on valid input parsed with the same parser after failing to parse the incorrect input.4.9.0 (2022-06-01)
Bugs fixed
- GH#341: The mixin inheritance order in
lxml.html
was corrected. Patch by xmo-odoo.Other changes
Built with Cython 0.29.30 to adapt to changes in Python 3.11 and 3.12.
Wheels include zlib 1.2.12, libxml2 2.9.14 and libxslt 1.1.35 (libxml2 2.9.12+ and libxslt 1.1.34 on Windows).
GH#343: Windows-AArch64 build support in Visual Studio. Patch by Steve Dower.
4.8.0 (2022-02-17)
Features added
GH#337: Path-like objects are now supported throughout the API instead of just strings. Patch by Henning Janssen.
The
ElementMaker
now supportsQName
values as tags, which always override the default namespace of the factory.Bugs fixed
- GH#338: In lxml.objectify, the XSI float annotation "nan" and "inf" were spelled in lower case, whereas XML Schema datatypes define them as "NaN" and "INF" respectively.
... (truncated)
d01872c
Prevent parse failure in new test from leaking into later test runs.d65e632
Prepare release of lxml 4.9.1.86368e9
Fix a crash when incorrect parser input occurs together with usages of iterwa...50c2764
Delete unused Travis CI config and reference in docs (GH-345)8f0bf2d
Try to speed up the musllinux AArch64 build by splitting the different CPytho...b9f7074
Remove debug print from test.b224e0f
Try to install 'xz' in wheel builds, if available, since it's now needed to e...897ebfa
Update macOS deployment target version from 10.14 to 10.15 since 10.14 starts...853c9e9
Prepare release of 4.9.0.d3f77e6
Add a test for https://bugs.launchpad.net/lxml/+bug/1965070 leaving out the a...Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase
.
Bumps numpy from 1.16.6 to 1.22.0.
Sourced from numpy's releases.
v1.22.0
NumPy 1.22.0 Release Notes
NumPy 1.22.0 is a big release featuring the work of 153 contributors spread over 609 pull requests. There have been many improvements, highlights are:
- Annotations of the main namespace are essentially complete. Upstream is a moving target, so there will likely be further improvements, but the major work is done. This is probably the most user visible enhancement in this release.
- A preliminary version of the proposed Array-API is provided. This is a step in creating a standard collection of functions that can be used across application such as CuPy and JAX.
- NumPy now has a DLPack backend. DLPack provides a common interchange format for array (tensor) data.
- New methods for
quantile
,percentile
, and related functions. The new methods provide a complete set of the methods commonly found in the literature.- A new configurable allocator for use by downstream projects.
These are in addition to the ongoing work to provide SIMD support for commonly used functions, improvements to F2PY, and better documentation.
The Python versions supported in this release are 3.8-3.10, Python 3.7 has been dropped. Note that 32 bit wheels are only provided for Python 3.8 and 3.9 on Windows, all other wheels are 64 bits on account of Ubuntu, Fedora, and other Linux distributions dropping 32 bit support. All 64 bit wheels are also linked with 64 bit integer OpenBLAS, which should fix the occasional problems encountered by folks using truly huge arrays.
Expired deprecations
Deprecated numeric style dtype strings have been removed
Using the strings
"Bytes0"
,"Datetime64"
,"Str0"
,"Uint32"
, and"Uint64"
as a dtype will now raise aTypeError
.(gh-19539)
Expired deprecations for
loads
,ndfromtxt
, andmafromtxt
in npyio
numpy.loads
was deprecated in v1.15, with the recommendation that users usepickle.loads
instead.ndfromtxt
andmafromtxt
were both deprecated in v1.17 - users should usenumpy.genfromtxt
instead with the appropriate value for theusemask
parameter.(gh-19615)
... (truncated)
4adc87d
Merge pull request #20685 from charris/prepare-for-1.22.0-releasefd66547
REL: Prepare for the NumPy 1.22.0 release.125304b
wipc283859
Merge pull request #20682 from charris/backport-204165399c03
Merge pull request #20681 from charris/backport-20954f9c45f8
Merge pull request #20680 from charris/backport-20663794b36f
Update armccompiler.pyd93b14e
Update test_public_api.py7662c07
Update init.py311ab52
Update armccompiler.pyDependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase
.
audio-files convolutional-neural-networks music-audio-files music