A Python package to assess cluster tendency

lachhebo, updated 2022-12-27 17:39:21

pyclustertend


pyclustertend is a Python package specialized in cluster tendency. Assessing cluster tendency consists of determining whether a dataset contains meaningful cluster structure, that is, whether clustering algorithms are relevant for it.

Three methods for assessing cluster tendency are currently implemented, along with one additional method based on metrics obtained with a KMeans estimator:

  • [x] Hopkins statistic
  • [x] VAT
  • [x] iVAT
  • [x] Metric-based method (silhouette, Calinski-Harabasz, Davies-Bouldin)
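To make the metric-based idea concrete, here is a minimal sketch that fits KMeans for several values of k and computes the three scores with scikit-learn directly. It is not the package's own API: the helper name `kmeans_scores`, the candidate range of k, and the synthetic data are assumptions chosen for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import (
    calinski_harabasz_score,
    davies_bouldin_score,
    silhouette_score,
)


def kmeans_scores(X, k_values):
    """Fit KMeans for each k and collect three internal validity scores."""
    results = {}
    for k in k_values:
        labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
        results[k] = {
            "silhouette": silhouette_score(X, labels),                # higher is better
            "calinski_harabasz": calinski_harabasz_score(X, labels),  # higher is better
            "davies_bouldin": davies_bouldin_score(X, labels),        # lower is better
        }
    return results


# Synthetic data (an assumption for the demo): two tight, well-separated blobs.
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(loc=0.0, scale=0.1, size=(50, 2)),
    rng.normal(loc=5.0, scale=0.1, size=(50, 2)),
])
scores = kmeans_scores(X, k_values=range(2, 5))
best_k = max(scores, key=lambda k: scores[k]["silhouette"])
```

If no value of k yields convincing scores, that can be read as a hint that the data has little cluster structure.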

Installation

```shell
pip install pyclustertend
```

Usage

Example Hopkins

```python
>>> from sklearn import datasets
>>> from pyclustertend import hopkins
>>> from sklearn.preprocessing import scale
>>> X = scale(datasets.load_iris().data)
>>> hopkins(X, 150)
0.18950453452838564
```
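For intuition about what such a score measures, the sketch below computes a Hopkins-style ratio with NumPy only. It is an illustration, not the package's implementation: the function name `hopkins_sketch` is hypothetical, distances are not raised to the power of the dimension (a simplification), and the ratio is oriented so that values near 0 suggest clusters and values near 0.5 suggest no structure. Some references report the complementary ratio, where values near 1 indicate clusters.

```python
import numpy as np


def hopkins_sketch(X, m, rng):
    """Ratio of real nearest-neighbour distances to real plus uniform ones."""
    n, d = X.shape

    # w: distance from m sampled data points to their nearest *other* data point.
    idx = rng.choice(n, size=m, replace=False)
    dist_real = np.linalg.norm(X[idx, None, :] - X[None, :, :], axis=2)
    dist_real[np.arange(m), idx] = np.inf  # exclude each point's distance to itself
    w = dist_real.min(axis=1)

    # u: distance from m uniform points in the bounding box to their nearest data point.
    uniform_points = rng.uniform(X.min(axis=0), X.max(axis=0), size=(m, d))
    u = np.linalg.norm(uniform_points[:, None, :] - X[None, :, :], axis=2).min(axis=1)

    return w.sum() / (u.sum() + w.sum())


rng = np.random.default_rng(0)
clustered = np.vstack([
    rng.normal(loc=0.0, scale=0.1, size=(100, 2)),
    rng.normal(loc=5.0, scale=0.1, size=(100, 2)),
])
uniform = rng.uniform(size=(200, 2))

h_clustered = hopkins_sketch(clustered, m=50, rng=rng)  # expected to be close to 0
h_uniform = hopkins_sketch(uniform, m=50, rng=rng)      # expected to be around 0.5
```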

Example VAT

```python
>>> from sklearn import datasets
>>> from pyclustertend import vat
>>> from sklearn.preprocessing import scale
>>> X = scale(datasets.load_iris().data)
>>> vat(X)
```
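VAT (visual assessment of cluster tendency) displays the pairwise dissimilarity matrix after reordering the observations so that similar ones end up next to each other; clusters then show up as dark blocks along the diagonal. The sketch below shows only that reordering step, in its textbook Prim-like form. It is a slow illustration under stated assumptions, not the package's implementation, and `vat_order` is a hypothetical name.

```python
import numpy as np


def vat_order(R):
    """Return a VAT ordering of the rows/columns of a dissimilarity matrix R."""
    n = R.shape[0]
    # Start from one endpoint of the largest dissimilarity.
    i, _ = np.unravel_index(np.argmax(R), R.shape)
    order = [int(i)]
    remaining = [k for k in range(n) if k != i]
    while remaining:
        # Pick the unvisited point closest to any already visited point.
        sub = R[np.ix_(order, remaining)]
        _, col = np.unravel_index(np.argmin(sub), sub.shape)
        order.append(remaining.pop(int(col)))
    return np.array(order)


# Synthetic demo data (an assumption): two tight blobs, rows shuffled.
rng = np.random.default_rng(0)
points = np.vstack([
    rng.normal(loc=0.0, scale=0.1, size=(20, 2)),
    rng.normal(loc=10.0, scale=0.1, size=(20, 2)),
])
labels = np.repeat([0, 1], 20)
perm = rng.permutation(len(points))
points, labels = points[perm], labels[perm]

R = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=2)
order = vat_order(R)
R_star = R[np.ix_(order, order)]  # reordered matrix, ready to be shown as an image

# Number of times the true label changes along the ordering.
switches = int(np.count_nonzero(np.diff(labels[order])))
```

With well-separated blobs, each cluster occupies one contiguous run of the ordering, which is what produces the diagonal blocks in the image.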

Example iVAT

```python
>>> from sklearn import datasets
>>> from pyclustertend import ivat
>>> from sklearn.preprocessing import scale
>>> X = scale(datasets.load_iris().data)
>>> ivat(X)
```
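iVAT sharpens the VAT image by replacing each dissimilarity with a minimax path distance: the smallest possible value of the largest step on any path between two points. The sketch below applies the usual recursive formulation to a matrix that is assumed to be VAT-ordered already. It is an illustration rather than the package's code, and `ivat_transform` is a hypothetical name. The demo uses sorted one-dimensional points, which are already in a valid VAT order.

```python
import numpy as np


def ivat_transform(R_star):
    """Minimax-path transform of a VAT-ordered dissimilarity matrix."""
    n = R_star.shape[0]
    out = np.zeros_like(R_star, dtype=float)
    for r in range(1, n):
        # Nearest neighbour of r among the points that precede it in the ordering.
        j = int(np.argmin(R_star[r, :r]))
        out[r, j] = R_star[r, j]
        for c in range(r):
            if c != j:
                # Largest step on the path r -> j -> ... -> c.
                out[r, c] = max(R_star[r, j], out[j, c])
        # Mirror the new row so the result stays symmetric.
        out[:r, r] = out[r, :r]
    return out


x = np.array([0.0, 0.1, 0.2, 5.0, 5.1, 5.3])
R_star = np.abs(x[:, None] - x[None, :])
D = ivat_transform(R_star)
```

In the transformed matrix every pair drawn from different groups gets the same value, the size of the gap between the groups, so the blocks in the image become more uniform.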

Notes

It is preferable to scale the data before using the hopkins or vat algorithms, as they rely on distances between observations. Moreover, the vat and ivat algorithms do not scale well to very large datasets. A first solution is to sample the data before using those algorithms.
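A minimal sketch of that preparation step, using NumPy only. The helper name `sample_and_scale`, the sample size of 500, and the synthetic data are assumptions for illustration.

```python
import numpy as np


def sample_and_scale(X, n_samples, seed=0):
    """Draw rows without replacement, then standardise each column."""
    rng = np.random.default_rng(seed)
    n_samples = min(n_samples, X.shape[0])
    idx = rng.choice(X.shape[0], size=n_samples, replace=False)
    X_small = X[idx]
    # Zero mean and unit variance per feature, so no feature dominates the distances.
    return (X_small - X_small.mean(axis=0)) / X_small.std(axis=0)


# Synthetic data with two features on very different scales.
rng = np.random.default_rng(42)
X_big = rng.normal(loc=[0.0, 100.0], scale=[1.0, 50.0], size=(100_000, 2))
X_small = sample_and_scale(X_big, n_samples=500)
# X_small can now be passed to vat or ivat as in the examples above.
```

Sampling keeps the pairwise dissimilarity matrix at 500 x 500 here instead of 100,000 x 100,000.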

Issues

build(deps): bump setuptools from 65.5.0 to 65.5.1

opened on 2022-12-27 17:39:20 by dependabot[bot]

Bumps setuptools from 65.5.0 to 65.5.1.

Changelog

Sourced from setuptools's changelog.

v65.5.1

Misc

  • #3638: Drop a test dependency on the mock package, always use unittest.mock (by hroncok)
  • #3659: Fixed REDoS vector in package_index.

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


Dependabot commands and options
You can trigger Dependabot actions by commenting on this PR:

  • `@dependabot rebase` will rebase this PR
  • `@dependabot recreate` will recreate this PR, overwriting any edits that have been made to it
  • `@dependabot merge` will merge this PR after your CI passes on it
  • `@dependabot squash and merge` will squash and merge this PR after your CI passes on it
  • `@dependabot cancel merge` will cancel a previously requested merge and block automerging
  • `@dependabot reopen` will reopen this PR if it is closed
  • `@dependabot close` will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
  • `@dependabot ignore this major version` will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
  • `@dependabot ignore this minor version` will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
  • `@dependabot ignore this dependency` will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
  • `@dependabot use these labels` will set the current labels as the default for future PRs for this repo and language
  • `@dependabot use these reviewers` will set the current reviewers as the default for future PRs for this repo and language
  • `@dependabot use these assignees` will set the current assignees as the default for future PRs for this repo and language
  • `@dependabot use this milestone` will set the current milestone as the default for future PRs for this repo and language

You can disable automated security fix PRs for this repo from the [Security Alerts page](https://github.com/lachhebo/pyclustertend/network/alerts).

build(deps): bump certifi from 2022.9.24 to 2022.12.7

opened on 2022-12-09 09:03:57 by dependabot[bot]

Bumps certifi from 2022.9.24 to 2022.12.7.


The usage criterion for the Hopkins statistic is not consistent with other sources

opened on 2022-07-01 06:49:19 by bonehead

I can see that the usage condition still contradicts both Wikipedia and the reference cited in the pyclustertend docs. Both Wikipedia and the reference indicate that a value closer to 1 indicates the presence of clusters.

Bug?

opened on 2022-06-16 12:00:32 by Muuuh219

Hello,

First of all, thank you for implementing VAT and iVAT! In the function compute_ivat_ordered_dissimilarity_matrix, the iVAT matrix is computed, but it is not symmetric in the end. Is it possible that there is a bug? I fixed it for now with the following code. I changed

```python
return re_ordered_matrix
```

to

```python
for i in range(re_ordered_matrix.shape[0]):
    for j in range(i):
        re_ordered_matrix[j, i] = re_ordered_matrix[i, j]
return re_ordered_matrix
```

Sincere regards!

Release package to conda-forge

opened on 2021-03-15 18:29:04 by radiantly

Firstly - great project! Thank you for your great work!

Coming to the issue, I use conda to manage my python dependencies, and it'd be really great if it were possible to release the package on conda-forge, so that it can easily be installed using conda.

Would love to know your thoughts on the same, thanks!

Releases

v1.7.0 2021-12-06 14:30:04

  • add numba jit compilation to speed up VAT algorithm

1.6.0 2021-07-06 18:09:59

  • fix issue: ivat was not symmetric

dev:

  • flake8
  • black

v1.5.0 2021-02-26 19:08:42

  • Switch to poetry (pyproject.toml)
  • Switch to github actions instead of travis
  • improve dependency management
  • tests against Python versions 3.8 and 3.9

1.4.9 2020-03-23 20:27:54

First scikit-learn compatible version

Release of pyclustertend 2019-10-01 09:49:37

Ismaïl Lachheb

Software Engineer & Data Scientist


clustering clustertendency cluster-analysis scikit-learn machine-learning hopkins statistics visual-assessment-cluster-tendency data-science vat cluster-tendency python ivat