HMM and DTW-based sequence machine learning algorithms in Python following an sklearn-like interface.
About · Build Status · Features · Documentation · Examples · Acknowledgments · References · Contributors · Licensing
Sequentia is a Python package that provides various classification and regression algorithms for sequential data, including methods based on hidden Markov models and dynamic time warping.
Some examples of sequence data that Sequentia can be used on include audio signals, such as the spoken digit recordings bundled in the `datasets` module, and other variable-length, multivariate time series.
The following models provided by Sequentia all support variable length sequences.
- Dynamic Time Warping k-Nearest Neighbors models (via `dtaidistance`)
- Hidden Markov Models (via `hmmlearn`): parameter estimation with the Baum-Welch algorithm and prediction with the forward algorithm [1] (see the sketch below)
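To illustrate the HMM approach described above without assuming Sequentia's exact API, here is a minimal sketch using `hmmlearn` directly: one `GaussianHMM` per class is trained with the Baum-Welch algorithm via `fit()`, and a new sequence is classified by comparing forward-algorithm log-likelihoods via `score()`. The data values are made up for illustration.

```python
import numpy as np
from hmmlearn.hmm import GaussianHMM

# Toy data: concatenated univariate sequences for two classes, with per-sequence lengths
X0, lengths0 = np.array([[0.1], [0.3], [0.2], [0.4], [0.3], [0.2], [0.1], [0.2]]), [4, 4]
X1, lengths1 = np.array([[1.2], [1.5], [1.1], [1.4], [1.3], [1.6], [1.2], [1.4]]), [4, 4]

# Fit one HMM per class with the Baum-Welch (EM) algorithm
models = {
    label: GaussianHMM(n_components=2, n_iter=10, random_state=0).fit(X, lengths)
    for label, (X, lengths) in {0: (X0, lengths0), 1: (X1, lengths1)}.items()
}

# Classify a new sequence by the highest forward-algorithm log-likelihood
X_new = np.array([[1.3], [1.2], [1.5]])
pred = max(models, key=lambda label: models[label].score(X_new))
```

Sequentia wraps this per-class HMM workflow behind a single classifier with a Scikit-Learn-like interface, as described below.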
Sequentia aims to follow the Scikit-Learn interface for estimators and transformations, and to be largely compatible with three core Scikit-Learn modules that ease model development: `preprocessing`, `model_selection` and `pipeline`.
While Scikit-Learn has many other modules, maintaining full compatibility is challenging and many of its features are inapplicable to sequential data, so we focus only on these relevant core modules.
Despite some deviation from the Scikit-Learn interface in order to accommodate sequences, the following features are currently compatible with Sequentia.
- `preprocessing`
  - `FunctionTransformer` (via an adapted class definition; see the sketch below)
- `pipeline`
  - `Pipeline` (via an adapted class definition)
  - `FeatureUnion`
- `model_selection`
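As a rough sketch of how the adapted `FunctionTransformer` might be used on its own, assuming that `IndependentFunctionTransformer` (shown in the full example below) accepts the same trailing `lengths` argument as the other estimators:

```python
import numpy as np
from sklearn.preprocessing import minmax_scale

from sequentia.preprocessing import IndependentFunctionTransformer

# Two concatenated sequences with two features (lengths 3 and 2)
X = np.array([
    [1.2, 7.9], [1.3, 6.6], [0.9, 8.1],
    [1.7, 6.2], [2.0, 5.5],
])
lengths = np.array([3, 2])

# Apply min-max scaling to each sequence independently rather than to the
# concatenated array as a whole (the trailing lengths argument is an assumption)
transform = IndependentFunctionTransformer(minmax_scale)
Xt = transform.transform(X, lengths)
```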
You can install Sequentia using `pip`.
The latest stable version of Sequentia can be installed with the following command.
```console
pip install sequentia
```
For optimal performance when using any of the k-NN based models, it is important that the `dtaidistance` C libraries are compiled correctly.
Please see the `dtaidistance` installation guide for troubleshooting if you run into C compilation issues, or if setting `use_c=True` on k-NN based models results in a warning.
You can use the following to check if the appropriate C libraries have been installed.
```python
from dtaidistance import dtw

dtw.try_import_c()
```
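If the C libraries are available, you can then request them when constructing a k-NN model. A minimal sketch, assuming `use_c` is accepted as a constructor keyword as the note above implies:

```python
from sequentia.models import KNNClassifier

# Use the compiled dtaidistance C routines for DTW computations
# (use_c as a constructor keyword is assumed from the note above)
clf = KNNClassifier(k=1, use_c=True)
```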
Pre-release versions include new features which are in active development and may change unpredictably.
The latest pre-release version can be installed with the following command.
```console
pip install --pre sequentia
```
Please see the contribution guidelines for installation instructions if you wish to contribute to Sequentia.
Documentation for the package is available on Read The Docs.
Demonstration of classifying multivariate sequences with two features into two classes using the `KNNClassifier`.
This example also shows a typical preprocessing workflow, as well as compatibility with Scikit-Learn.
```python
import numpy as np

from sklearn.preprocessing import scale
from sklearn.decomposition import PCA

from sequentia.models import KNNClassifier
from sequentia.pipeline import Pipeline
from sequentia.preprocessing import IndependentFunctionTransformer, mean_filter

# Three variable-length sequences with two features, concatenated into a single array
X = np.array([
    # Sequence 1 - Length 3
    [1.2 , 7.91],
    [1.34, 6.6 ],
    [0.92, 8.08],
    # Sequence 2 - Length 5
    [2.11, 6.97],
    [1.83, 7.06],
    [1.54, 5.98],
    [0.86, 6.37],
    [1.21, 5.8 ],
    # Sequence 3 - Length 2
    [1.7 , 6.22],
    [2.01, 5.49],
])

# Length of each sequence within X
lengths = np.array([3, 5, 2])

# Class label for each sequence
y = np.array([0, 1, 1])

# Denoise and standardize each sequence independently, reduce the data to a
# single feature with PCA, then classify with a 1-nearest neighbor DTW classifier
pipeline = Pipeline([
    ('denoise', IndependentFunctionTransformer(mean_filter)),
    ('scale', IndependentFunctionTransformer(scale)),
    ('pca', PCA(n_components=1)),
    ('knn', KNNClassifier(k=1)),
])

# Fit the pipeline, then predict classes and score accuracy on the same sequences
pipeline.fit(X, y, lengths)
y_pred = pipeline.predict(X, lengths)
acc = pipeline.score(X, y, lengths)
```
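As a follow-up, the fitted pipeline can be applied to previously unseen sequences in the same way; the sequence below is made up for illustration:

```python
# Classify a single new sequence of length 4 with the fitted pipeline
X_new = np.array([
    [1.15, 6.51],
    [1.28, 6.12],
    [1.41, 5.98],
    [1.22, 5.87],
])
y_new = pipeline.predict(X_new, np.array([4]))
```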
In earlier versions of the package, the approximate DTW implementation `fastdtw` was used in the hope of speeding up k-NN predictions, as the authors of the original FastDTW paper [2] claim that approximate DTW alignments can be computed in linear time and memory, compared with the O(N²) runtime complexity of the usual exact DTW implementation.
I was contacted by Prof. Eamonn Keogh, whose work [3] makes the surprising revelation that FastDTW is generally slower than the exact DTW algorithm it approximates. Upon switching from the `fastdtw` package to `dtaidistance` (a very solid implementation of exact DTW with fast compiled C functions), DTW k-NN prediction times were indeed reduced drastically.
I would like to thank Prof. Eamonn Keogh for directly reaching out to me regarding this finding.
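For reference, this is roughly what an exact DTW distance computation with `dtaidistance`'s compiled C routines looks like (a minimal sketch, independent of Sequentia, with made-up values):

```python
import numpy as np
from dtaidistance import dtw

s1 = np.array([0.0, 1.0, 2.0, 1.0, 0.0])
s2 = np.array([0.0, 2.0, 1.0, 0.0, 0.0])

# Exact DTW distance using the fast compiled C implementation
d = dtw.distance_fast(s1, s2)
```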
All contributions to this repository are greatly appreciated. Contribution guidelines can be found here.
eonu · Prhmma · manisci · jonnor
Sequentia is released under the MIT license.
Certain parts of the source code are heavily adapted from Scikit-Learn. Such files contain a copy of its license.
Sequentia © 2019-2023, Edwin Onuonga - Released under the MIT license.
Authored and maintained by Edwin Onuonga.
Recent changes:

- Remove `scikit-learn` validation constraints from `IndependentFunctionTransformer`. (#237)
- Set `max_nbytes=None` to fix read-only buffer source array error in `joblib.Parallel` (see https://github.com/scikit-learn/scikit-learn/issues/7981). (#235)
- Add `sequentia.preprocessing` module with `sklearn.preprocessing` compatibility. (#234)
- Add `sequentia.pipeline` module for `sklearn.pipeline` compatibility. (#234)
- Update the `sklearn` version specifier from `>=0.22` to `>=1.0`. (#234)
- Fix `CategoricalHMM` and `GaussianMixtureHMM` parameter defaults for `params`/`init_params` being modified. (#231)
- Fix `CategoricalHMM` and `GaussianMixtureHMM` `unfreeze()` calling `super().freeze()` instead of `super().unfreeze()`. (#231)
- Fix `_KNNMixin` behaviour when `weighting=None`. (#231)
- Change the `load_digits` `numbers` parameter name to `digits`. (#231)
- Change `SequentialDataset` properties to not return copies of arrays. (#231)
- Remove `SequentialDataset.__eq__`. (#231)
- Change the `HMMClassifier` `prior` default to `None`. (#231)
- Remove the `preprocessing` module (temporarily until design is finalized). (#226)
- Add a `datasets` module for sample datasets. (#226)
- Add `datasets.load_random_sequences` for generating an arbitrarily sized dataset of sequences. (#216)
- Remove `DeepGRU` and the `classifier.rnn` module. (#215)
- Add the `sequentia.datasets` module. (#214)
- Add a `return_scores` argument to `KNNClassifier.predict()` to return class scores. (#213)
- Return `self` in `fit()` functions. (#213)
- Update to `hmmlearn` v0.2.7. (#201)
- Update the `HMMClassifier` structure to match `KNNClassifier`. (#200)
- Add a `'uniform'` `KNNClassifier` weighting option. (#192)
- Fix a `KNNClassifier` label scoring bug - thanks @manisci. (#187)
- Include `digits.npz` as package data in `setup.py`. (#221)
- Update `CONTRIBUTING.md` CI instructions. (#219)
- Update the `datasets` module. (#217)
- Add `tslearn` as a core dependency. (#216)
- Remove `torchaudio`, `torchvision` and `torchfsdd` dependencies. (#214)
- Remove the `play_audio` helper. (#214)
- Update `README.md` and documentation. (#202)
- Add `Jinja2` dependency for RTD. (#188)