Multi-class imbalance is a common problem occurring in real-world supervised classifications tasks. While there has already been some research on the specialized methods aiming to tackle that challenging problem, most of them still lack coherent Python implementation that is simple, intuitive and easy to use. multi-imbalance is a python package tackling the problem of multi-class imbalanced datasets in machine learning.
Tha package has been tested under python 3.6, 3.7 and 3.8. It relies heavily on scikit-learn and typical scientific stack (numpy, scipy, pandas etc.). Requirements include: * numpy>=1.17.0, * scikit-learn>=0.22.0, * pandas>=0.25.1, * pytest>=5.1.2, * imbalanced-learn>=0.6.1 * IPython>=7.13.0, * seaborn>=0.10.1, * matplotlib>=3.2.1
Just type in
bash
pip install multi-imbalance
Our package includes implementation of such algorithms, as: * One-vs-One (OVO) and One-vs-all (OVA) ensembles [2], * Error-Correcting Output Codes (ECOC) [1] with dense, sparse and complete encoding [9] , * Global-CS [4], * Static-SMOTE [10], * Mahalanobis Distance Oversampling [3], * Similarity-based Oversampling and Undersampling Preprocessing (SOUP) [5], * SPIDER3 cost-sensitive pre-processing [8]. * Multi-class Roughly Balanced Bagging (MRBB) [7], * SOUP Bagging [6],
```python from multi_imbalance.resampling.mdo import MDO
mdo = MDO(k=9, k1_frac=0, seed=0)
X_train, y_train, X_test, y_test = ...
X_train_resampled, y_train_resampled = mdo.fit_transform(np.copy(X_train), np.copy(y_train))
clf_tree = DecisionTreeClassifier(random_state=0) clf_tree.fit(X_train_resampled, y_train_resampled)
y_pred = clf_tree.predict(X_test) ```
At the moment, due to some sklearn's limitations the only way to use our resampling methods is to use the pipelines implemented in imbalanced-learn. It doesn't apply to ensemble methods. ```python from imblearn.pipeline import Pipeline
X, y = load_arff_dataset('data/arff/new_ecoli.arff') X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=42)
pipeline = Pipeline([ ('scaler', StandardScaler()), ('mdo', MDO()), ('knn', KNN()) ])
pipeline.fit(X_train, y_train) y_hat = pipeline.predict(X_test)
print(classification_report(y_test, y_hat)) ```
For more examples please refer to https://multi-imbalance.readthedocs.io/en/latest/ or check examples
directory.
multi-imbalance follows sklearn's coding guideline: https://scikit-learn.org/stable/developers/contributing.html
We use pytest as our unit tests framework. To use it, simply run:
bash
pytest
If you would like to check the code coverage:
bash
coverage run -m pytest
coverage report -m # or coverage html
multi-imbalance uses reStructuredText markdown for docstrings. To build the documentation locally run:
bash
cd docs
make html -B
and open docs/_build/html/index.html
if you add a new algorithm, we would appreciate if you include references and an example of use in ./examples
or docstrings.
If you use multi-imbalance in a scientific publication, please consider including citation to the following thesis:
@InProceedings{10.1007/978-3-030-67670-4_36,
author="Grycza, Jacek and Horna, Damian and Klimczak, Hanna and Lango, Mateusz and Pluci{\'{n}}ski, Kamil and Stefanowski, Jerzy",
title="multi-imbalance: Open Source Python Toolbox for Multi-class Imbalanced Classification",
booktitle="Machine Learning and Knowledge Discovery in Databases. Applied Data Science and Demo Track",
year="2021",
publisher="Springer International Publishing",
address="Cham",
pages="546--549",
isbn="978-3-030-67670-4"
}
[1] Dietterich, T., and Bakiri, G. Solving multi-class learning problems via error-correcting output codes. Journal of Artificial Intelligence Research 2 (02 1995), 263–286.
[2] Fernández, A., López, V., Galar, M., del Jesus, M., and Herrera, F. Analysing the classification of imbalanced data-sets with multiple classes: Binarization techniques and ad-hoc approaches. Knowledge-Based Systems 42 (2013), 97 – 110.
[3] Abdi, L., and Hashemi, S. To combat multi-class imbalanced problems by means of over-sampling techniques. IEEE Transactions on Knowledge and Data Engineering 28 (January 2016), 238–251.
[4] Zhou, Z., and Liu, X. On multi-class cost-sensitive learning. In Proceedings of the 21st National Conference on Artificial Intelligence - Volume 1 (2006), AAAI’06, AAAI Press, pp. 567–572.
[5] Janicka, M., Lango, M., and Stefanowski, J. Using information on class interrelations to improve classification of multi-class imbalanced data: A new resampling algorithm. International Journal of Applied Mathematics and Computer Science 29 (December 2019).
[6] Lango, M., and Stefanowski, J. SOUP-Bagging: a new approach for multi-class imbalanced data classification. PP-RAI ’19: Polskie Porozumienie na Rzecz Sztucznej Inteligencji (2019).
[7] Lango, M., and Stefanowski, J. Multi-class and feature selection extensions of roughly balanced bagging for imbalanced data. J Intell Inf Syst 50 (2017), 97–127
[8] Wojciechowski, S., Wilk, S., and Stefanowski, J. An algorithm for selective preprocessing of multi-class imbalanced data. In Proceedings of the 10th International Conference on Computer Recognition Systems (05 2017), pp. 238–247.
[9] Kuncheva, L. Combining Pattern Classifiers: Methods and Algorithms. Wiley (2004).
[10] Fernández-Navarro, F., Hervás-Martínez, C., and Antonio Gutiérrez, P. A dynamic over-sampling procedure based on sensitivity for multi-class problems. Pattern Recognition, 44(8), 1821–1833 (2011).
Bumps ipython from 7.13.0 to 8.10.0.
Sourced from ipython's releases.
See https://pypi.org/project/ipython/
We do not use GitHub release anymore. Please see PyPI https://pypi.org/project/ipython/
15ea1ed
release 8.10.0560ad10
DOC: Update what's new for 8.10 (#13939)7557ade
DOC: Update what's new for 8.10385d693
Merge pull request from GHSA-29gw-9793-fvw7e548ee2
Swallow potential exceptions from showtraceback() (#13934)0694b08
MAINT: mock slowest test. (#13885)8655912
MAINT: mock slowest test.a011765
Isolate the attack tests with setUp and tearDown methodsc7a9470
Add some regression tests for this changefd34cf5
Swallow potential exceptions from showtraceback()Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase
.
from multi_imbalance.resampling.mdo import MDO
mdo = MDO(k=9, k1_frac=0, seed=0)
X_train_bal, y_train_bal = mdo.fit_resample(X_train, Y_train)
Error 👍 'MDO' object has no attribute '_parameter_constraints'
Bumps certifi from 2020.4.5.1 to 2022.12.7.
9e9e840
2022.12.07b81bdb2
2022.09.24939a28f
2022.09.14aca828a
2022.06.15.2de0eae1
Only use importlib.resources's new files() / Traversable API on Python ≥3.11 ...b8eb5e9
2022.06.15.147fb7ab
Fix deprecation warning on Python 3.11 (#199)b0b48e0
fixes #198 -- update link in license9d514b4
2022.06.154151e88
Add py.typed to MANIFEST.in to package in sdist (#196)Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase
.
Bumps joblib from 0.14.1 to 1.2.0.
Sourced from joblib's changelog.
Release 1.2.0
Fix a security issue where
eval(pre_dispatch)
could potentially run arbitrary code. Now only basic numerics are supported. joblib/joblib#1327Make sure that joblib works even when multiprocessing is not available, for instance with Pyodide joblib/joblib#1256
Avoid unnecessary warnings when workers and main process delete the temporary memmap folder contents concurrently. joblib/joblib#1263
Fix memory alignment bug for pickles containing numpy arrays. This is especially important when loading the pickle with
mmap_mode != None
as the resultingnumpy.memmap
object would not be able to correct the misalignment without performing a memory copy. This bug would cause invalid computation and segmentation faults with native code that would directly access the underlying data buffer of a numpy array, for instance C/C++/Cython code compiled with older GCC versions or some old OpenBLAS written in platform specific assembly. joblib/joblib#1254Vendor cloudpickle 2.2.0 which adds support for PyPy 3.8+.
Vendor loky 3.3.0 which fixes several bugs including:
robustly forcibly terminating worker processes in case of a crash (joblib/joblib#1269);
avoiding leaking worker processes in case of nested loky parallel calls;
reliability spawn the correct number of reusable workers.
Release 1.1.0
Fix byte order inconsistency issue during deserialization using joblib.load in cross-endian environment: the numpy arrays are now always loaded to use the system byte order, independently of the byte order of the system that serialized the pickle. joblib/joblib#1181
Fix joblib.Memory bug with the
ignore
parameter when the cached function is a decorated function.
... (truncated)
5991350
Release 1.2.03fa2188
MAINT cleanup numpy warnings related to np.matrix in tests (#1340)cea26ff
CI test the future loky-3.3.0 branch (#1338)8aca6f4
MAINT: remove pytest.warns(None) warnings in pytest 7 (#1264)067ed4f
XFAIL test_child_raises_parent_exits_cleanly with multiprocessing (#1339)ac4ebd5
MAINT add back pytest warnings plugin (#1337)a23427d
Test child raises parent exits cleanly more reliable on macos (#1335)ac09691
[MAINT] various test updates (#1334)4a314b1
Vendor loky 3.2.0 (#1333)bdf47e9
Make test_parallel_with_interactively_defined_functions_default_backend timeo...Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase
.
Bumps nbconvert from 5.6.1 to 6.5.1.
Sourced from nbconvert's releases.
Release 6.5.1
No release notes provided.
6.5.0
What's Changed
- Drop dependency on testpath. by
@anntzer
in jupyter/nbconvert#1723- Adopt pre-commit by
@blink1073
in jupyter/nbconvert#1744- Add pytest settings and handle warnings by
@blink1073
in jupyter/nbconvert#1745- Apply Autoformatters by
@blink1073
in jupyter/nbconvert#1746- Add git-blame-ignore-revs by
@blink1073
in jupyter/nbconvert#1748- Update flake8 config by
@blink1073
in jupyter/nbconvert#1749- support bleach 5, add packaging and tinycss2 dependencies by
@bollwyvl
in jupyter/nbconvert#1755- [pre-commit.ci] pre-commit autoupdate by
@pre-commit-ci
in jupyter/nbconvert#1752- update cli example by
@leahecole
in jupyter/nbconvert#1753- Clean up pre-commit by
@blink1073
in jupyter/nbconvert#1757- Clean up workflows by
@blink1073
in jupyter/nbconvert#1750New Contributors
@pre-commit-ci
made their first contribution in jupyter/nbconvert#1752Full Changelog: https://github.com/jupyter/nbconvert/compare/6.4.5...6.5
6.4.3
What's Changed
- Add section to
customizing
showing how to use template inheritance by@stefanv
in jupyter/nbconvert#1719- Remove ipython genutils by
@rgs258
in jupyter/nbconvert#1727- Update changelog for 6.4.3 by
@blink1073
in jupyter/nbconvert#1728New Contributors
@stefanv
made their first contribution in jupyter/nbconvert#1719@rgs258
made their first contribution in jupyter/nbconvert#1727Full Changelog: https://github.com/jupyter/nbconvert/compare/6.4.2...6.4.3
6.4.0
What's Changed
- Optionally speed up validation by
@gwincr11
in jupyter/nbconvert#1672- Adding missing div compared to JupyterLab DOM structure by
@SylvainCorlay
in jupyter/nbconvert#1678- Allow passing extra args to code highlighter by
@yuvipanda
in jupyter/nbconvert#1683- Prevent page breaks in outputs when printing by
@SylvainCorlay
in jupyter/nbconvert#1679- Add collapsers to template by
@SylvainCorlay
in jupyter/nbconvert#1689- Fix recent pandoc latex tables by adding calc and array (#1536, #1566) by
@cgevans
in jupyter/nbconvert#1686- Add an invalid notebook error by
@gwincr11
in jupyter/nbconvert#1675- Fix typos in execute.py by
@TylerAnderson22
in jupyter/nbconvert#1692- Modernize latex greek math handling (partially fixes #1673) by
@cgevans
in jupyter/nbconvert#1687- Fix use of deprecated API and update test matrix by
@blink1073
in jupyter/nbconvert#1696- Update nbconvert_library.ipynb by
@letterphile
in jupyter/nbconvert#1695- Changelog for 6.4 by
@blink1073
in jupyter/nbconvert#1697New Contributors
... (truncated)
7471b75
Release 6.5.1c1943e0
Fix pre-commit8685e93
Fix tests0abf290
Run black and prettier418d545
Run test on 6.x branchbef65d7
Convert input to string prior to escape HTML0818628
Check input type before escapingb206470
GHSL-2021-1017, GHSL-2021-1020, GHSL-2021-1021a03cbb8
GHSL-2021-1026, GHSL-2021-102548fe71e
GHSL-2021-1024Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase
.
Bumps mistune from 0.8.4 to 2.0.3.
Sourced from mistune's releases.
Version 2.0.2
Fix
escape_url
via lepture/mistune#295Version 2.0.1
Fix XSS for image link syntax.
Version 2.0.0
First release of Mistune v2.
Version 2.0.0 RC1
In this release, we have a Security Fix for harmful links.
Version 2.0.0 Alpha 1
This is the first release of v2. An alpha version for users to have a preview of the new mistune.
Sourced from mistune's changelog.
Changelog
Here is the full history of mistune v2.
Version 2.0.4
Released on Jul 15, 2022
- Fix
url
plugin in<a>
tag- Fix
*
formattingVersion 2.0.3
Released on Jun 27, 2022
- Fix
table
plugin- Security fix for CVE-2022-34749
Version 2.0.2
Released on Jan 14, 2022
Fix
escape_url
Version 2.0.1
Released on Dec 30, 2021
XSS fix for image link syntax.
Version 2.0.0
Released on Dec 5, 2021
This is the first non-alpha release of mistune v2.
Version 2.0.0rc1
Released on Feb 16, 2021
Version 2.0.0a6
</tr></table>
... (truncated)
3f422f1
Version bump 2.0.3a6d4321
Fix asteris emphasis regex CVE-2022-347495638e46
Merge pull request #307 from jieter/patch-10eba471
Fix typo in guide.rst61e9337
Fix table plugin76dec68
Add documentation for renderer heading when TOC enabled799cd11
Version bump 2.0.2babb0cf
Merge pull request #295 from dairiki/bug.escape_urlfc2cd53
Make mistune.util.escape_url less aggressive3e8d352
Version bump 2.0.1Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase
.
multi-class-imbalance class-imbalance machine-learning preprocessing ensembles smote resampling decomposition decision-trees bagging python python-package undersampling oversampling balancing