autocluster is an automated machine learning (AutoML) toolkit for performing clustering tasks.
Report and presentation slides can be found here and here.
sudo apt-get install build-essential swig
conda install gxx_linux-64 gcc_linux-64 swig
pip install smac==0.8.0
pip install autocluster
autocluster automatically optimizes the configuration of a clustering problem. By configuration, we mean
autocluster provides 3 different approaches to optimize the configuration (with increasing complexity):
List of dimension reduction algorithms in sklearn supported by autocluster's optimizer.
List of clustering models in sklearn supported by autocluster's optimizer.
Examples are available in these notebooks.
This dataset comprises 16 Gaussian clusters in 128-dimensional space with N = 1024 points. The optimal configuration obtained by autocluster (SMAC + Warmstarting) consists of a Truncated SVD dimension reduction model + Birch clustering model.
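For readers who want to see what a Truncated SVD + Birch configuration looks like in plain sklearn, here is a minimal sketch. It does not use autocluster: the data is a synthetic stand-in generated with `make_blobs` (an assumption, not the dataset from the experiment), and the hyperparameters (`n_components=16`, `n_clusters=16`) are hand-picked for illustration rather than the values the optimizer found.

```python
from sklearn.cluster import Birch
from sklearn.datasets import make_blobs
from sklearn.decomposition import TruncatedSVD
from sklearn.metrics import adjusted_rand_score
from sklearn.pipeline import make_pipeline

# Synthetic stand-in (assumption): 16 Gaussian clusters,
# 128 dimensions, N = 1024 points.
X, y_true = make_blobs(
    n_samples=1024,
    n_features=128,
    centers=16,
    cluster_std=1.0,
    random_state=0,
)

# Hand-picked hyperparameters, NOT the ones found by autocluster.
pipeline = make_pipeline(
    TruncatedSVD(n_components=16, random_state=0),  # dimension reduction
    Birch(n_clusters=16),                           # clustering
)

# Reduce the dimensionality, then assign a cluster label to each point.
labels = pipeline.fit_predict(X)

# Compare against the known generating labels of the synthetic data.
print("Adjusted Rand index:", adjusted_rand_score(y_true, labels))
```

Chaining the two steps in a pipeline mirrors the "dimension reduction model + clustering model" pairing described above.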
This dataset comprises 15 Gaussian clusters in 2-dimensional space with N = 5000 points. The optimal configuration obtained by autocluster (SMAC + Warmstarting) consists of a TSNE dimension reduction model + Agglomerative clustering model.
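A TSNE + Agglomerative clustering configuration can be sketched in plain sklearn as follows. This is an illustration only: the data is a synthetic stand-in from `make_blobs`, the sample size is reduced to 750 points (instead of 5000) so the example runs quickly, and all hyperparameters are assumptions rather than the values selected by autocluster. Because `TSNE` has no `transform` method, the two steps are called one after the other instead of being chained in a `Pipeline`.

```python
from sklearn.cluster import AgglomerativeClustering
from sklearn.datasets import make_blobs
from sklearn.manifold import TSNE

# Synthetic stand-in (assumption): 15 Gaussian clusters in 2-D.
# N is reduced from 5000 to 750 to keep the runtime short.
X, y_true = make_blobs(
    n_samples=750,
    n_features=2,
    centers=15,
    cluster_std=0.4,
    random_state=0,
)

# Step 1: embed the points with t-SNE (hand-picked settings).
embedding = TSNE(n_components=2, random_state=0).fit_transform(X)

# Step 2: cluster the embedded points (n_clusters chosen by hand).
labels = AgglomerativeClustering(n_clusters=15).fit_predict(embedding)

print("Embedding shape:", embedding.shape)
print("Number of clusters found:", len(set(labels.tolist())))
```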
The project is experimental and still under development.
I am getting this error: AttributeError: module 'pynisher' has no attribute 'enforce_limits'
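The error message indicates that the installed `pynisher` module does not expose an `enforce_limits` attribute, which usually points to a version mismatch between `pynisher` and the code calling it. The following diagnostic sketch checks for that. The helper names `has_enforce_limits` and `check_pynisher` are hypothetical, and the script only reports what it finds; it does not fix the installation.

```python
import importlib
from importlib import metadata


def has_enforce_limits(module) -> bool:
    """Return True if the given module object exposes `enforce_limits`."""
    return hasattr(module, "enforce_limits")


def check_pynisher() -> str:
    """Describe whether the installed pynisher has `enforce_limits`."""
    try:
        pynisher = importlib.import_module("pynisher")
    except ImportError:
        return "pynisher is not installed"

    # Look up the installed distribution version, if it is available.
    try:
        version = metadata.version("pynisher")
    except metadata.PackageNotFoundError:
        version = "unknown"

    if has_enforce_limits(pynisher):
        return f"pynisher {version} exposes enforce_limits"
    return f"pynisher {version} does not expose enforce_limits"


print(check_pynisher())
```

If the attribute is missing, comparing the reported version with the one pinned by a working fork is a reasonable next step.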
This is the first Google result that comes up when searching for "python automl clustering", and it is frankly a really great library. However, it is not maintained and the installation is broken.
See https://github.com/renxida/autocluster for a version that works as of Jan 23 2023.
I have also submitted pull requests in the hope that the author comes back, and will gladly close this issue if this repo gets some love.
https://automl.github.io/SMAC3/stable/installation.html
This link is dead
What does metaknowledge mean in this repository?
Bravo! Hao Wang
Also, when attempting to run the system at all, I consistently run into a core dump:
`>>> from autocluster import AutoCluster, get_evaluator
>>> X, y = datasets.make_blobs(n_samples=1000,
...                            n_features=2,
...                            centers=6,
...                            cluster_std=0.5,
...                            shuffle=True, random_state=27)
>>> dummy_df = pd.DataFrame(X)
>>> dummy_df.head(5)
          0         1
0  7.742343 -6.603815
1  8.726121  6.433689
2 -1.427522  5.393546
3  8.801468 -5.185687
4 -1.404321  9.526536
>>> cluster = AutoCluster(logger=None)
>>> fit_params = {
...     "df": dummy_df,
...     "cluster_alg_ls": [
...         'KMeans', 'GaussianMixture', 'MiniBatchKMeans'
...     ],
...     "dim_reduction_alg_ls": [
...         'NullModel'
...     ],
...     "optimizer": 'smac',
...     "n_evaluations": 40,
...     "run_obj": 'quality',
...     "seed": 27,
...     "cutoff_time": 10,
...     "preprocess_dict": {
...         "numeric_cols": list(range(2)),
...         "categorical_cols": [],
...         "ordinal_cols": [],
...         "y_col": []
...     },
...     "evaluator": get_evaluator(evaluator_ls = ['silhouetteScore',
...                                                'daviesBouldinScore',
...                                                'calinskiHarabaszScore'],
...                                weights = [1, 1, 1],
...                                clustering_num = None,
...                                min_proportion = .01,
...                                min_relative_proportion='default'),
...     "n_folds": 3,
...     "warmstart": False,
...     "verbose_level": 1,
... }
>>> result_dict = cluster.fit(**fit_params)
/home/wolvez/.local/lib/python3.8/site-packages/sklearn/ensemble/_iforest.py:252: FutureWarning: 'behaviour' is deprecated in 0.22 and will be removed in 0.24. You should not pass or set this parameter.
  warn(
664/1000 datapoints remaining after outlier removal
Truncated n_evaluations: 40
Segmentation fault (core dumped)`
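To help isolate whether a crash like this comes from autocluster's optimizer stack or from sklearn itself, the same synthetic data can be clustered and scored directly with sklearn. The sketch below assumes that the evaluator names in the report (silhouetteScore, daviesBouldinScore, calinskiHarabaszScore) correspond to the sklearn metrics of the same names. It does not reproduce how autocluster weights or combines them, and it skips the outlier-removal step seen in the log.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import (
    calinski_harabasz_score,
    davies_bouldin_score,
    silhouette_score,
)

# Same synthetic data as in the report above.
X, _ = make_blobs(
    n_samples=1000,
    n_features=2,
    centers=6,
    cluster_std=0.5,
    shuffle=True,
    random_state=27,
)

# Plain KMeans with a hand-picked number of clusters (assumption).
labels = KMeans(n_clusters=6, n_init=10, random_state=27).fit_predict(X)

# Compute the three internal validation metrics separately.
scores = {
    "silhouette": silhouette_score(X, labels),
    "davies_bouldin": davies_bouldin_score(X, labels),
    "calinski_harabasz": calinski_harabasz_score(X, labels),
}

for name, value in scores.items():
    print(f"{name}: {value:.4f}")
```

If this script runs cleanly in the same environment, the segmentation fault is more likely to originate in a compiled dependency of the optimizer than in the clustering or scoring code.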
https://github.com/matplotlib/matplotlib/issues/13555
When running a base pip install I am consistently having the same issue.
` pip3 --no-cache-dir install autocluster
Looking in indexes: https://pypi.org/simple, https://1205d49dc47b4644d672f57e74f850e6342693e3f0b8cf0b:****@packagecloud.io/agrible/internal/pypi/simple
Collecting autocluster
Downloading autocluster-0.5.2-py3-none-any.whl (35 kB)
Requirement already satisfied: six>=1.5.0 in /usr/lib/python3/dist-packages (from autocluster) (1.14.0)
Collecting matplotlib==3.0.3
Downloading matplotlib-3.0.3.tar.gz (36.6 MB)
|████████████████████████████████| 36.6 MB 3.1 MB/s
ERROR: Command errored out with exit status 1:
command: /usr/bin/python3 -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-install-ecu3m8rg/matplotlib/setup.py'"'"'; file='"'"'/tmp/pip-install-ecu3m8rg/matplotlib/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(file);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, file, '"'"'exec'"'"'))' egg_info --egg-base /tmp/pip-install-ecu3m8rg/matplotlib/pip-egg-info
cwd: /tmp/pip-install-ecu3m8rg/matplotlib/
Complete output (48 lines):
Traceback (most recent call last):
File "
BUILDING MATPLOTLIB
matplotlib: yes [3.0.3]
python: yes [3.8.2 (default, Apr 27 2020, 15:53:34) [GCC
9.3.0]]
platform: yes [linux]
REQUIRED DEPENDENCIES AND EXTENSIONS
numpy: yes [version 1.18.5]
install_requires: yes [handled by setuptools]
libagg: yes [pkg-config information for 'libagg' could not
be found. Using local copy.]
freetype: no [The C/C++ header for freetype2 (ft2build.h)
could not be found. You may need to install the
development package.]
png: no [pkg-config information for 'libpng' could not
be found.]
qhull: yes [pkg-config information for 'libqhull' could not
be found. Using local copy.]
OPTIONAL SUBPACKAGES
sample_data: yes [installing]
toolkits: yes [installing]
tests: no [skipping due to configuration]
toolkits_tests: no [skipping due to configuration]
OPTIONAL BACKEND EXTENSIONS
agg: yes [installing]
tkagg: yes [installing; run-time loading from Python Tcl /
Tk]
macosx: no [Mac OS-X only]
windowing: no [Microsoft Windows only]
OPTIONAL PACKAGE DATA
dlls: no [skipping due to configuration]
============================================================================
* The following required packages can not be built:
* freetype, png
----------------------------------------
ERROR: Command errored out with exit status 1: python setup.py egg_info Check the logs for full command output.`