HMM and DTW-based sequence machine learning algorithms in Python following an sklearn-like interface.
Sequentia is a Python package that provides various classification and regression algorithms for sequential data, including methods based on hidden Markov models and dynamic time warping.
Sequentia can be applied to a wide range of tasks involving univariate or multivariate sequence data, such as time series classification and regression.
The following models provided by Sequentia all support variable-length sequences:

- Dynamic time warping k-nearest neighbors
- Hidden Markov models, with parameter estimation via the Baum-Welch algorithm and prediction via the forward algorithm
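As an illustration of the prediction step, the forward algorithm computes the likelihood of an observation sequence under an HMM with a simple dynamic program. The sketch below is the textbook recursion for discrete emissions, not Sequentia's implementation; all names are illustrative.

```python
import numpy as np

def forward_log_likelihood(obs, pi, A, B):
    """Log-likelihood of a discrete observation sequence under an HMM.

    pi: (n_states,) initial state distribution
    A:  (n_states, n_states) transition matrix
    B:  (n_states, n_symbols) emission matrix
    """
    alpha = pi * B[:, obs[0]]          # initialization
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]  # induction step
    return np.log(alpha.sum())         # termination

# A 2-state HMM where everything is uniform: P(obs) = 0.5 ** len(obs)
pi = np.array([0.5, 0.5])
A = np.full((2, 2), 0.5)
B = np.full((2, 2), 0.5)
print(forward_log_likelihood([0, 1, 0], pi, A, B))  # log(0.125) ≈ -2.0794
```

In practice the recursion is usually carried out in log space (or with per-step scaling) to avoid underflow on long sequences; the plain version above keeps the structure of the algorithm visible.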
Sequentia aims to follow the Scikit-Learn interface for estimators and transformations,
as well as to be largely compatible with three core Scikit-Learn modules to improve the ease of model development:
While there are many other modules, maintaining full compatibility with Scikit-Learn is challenging, and many of its features are inapplicable to sequential data; we therefore focus only on the relevant core modules.
Despite some deviation from the Scikit-Learn interface in order to accommodate sequences, the following features are currently compatible with Sequentia.
- `FunctionTransformer` (via an adapted class definition)
- `Pipeline` (via an adapted class definition)
The latest stable version of Sequentia can be installed with the following command:

```console
pip install sequentia
```
For optimal performance when using any of the k-NN based models, it is important that the `dtaidistance` C libraries are compiled correctly. Please see the `dtaidistance` installation guide for troubleshooting if you run into C compilation issues, or if setting `use_c=True` on k-NN based models results in a warning.
You can use the following to check if the appropriate C libraries have been installed:

```python
from dtaidistance import dtw
dtw.try_import_c()
```
Pre-release versions include new features which are in active development and may change unpredictably. The latest pre-release version can be installed with the following command:

```console
pip install --pre sequentia
```
Please see the contribution guidelines for instructions on installing Sequentia for development.
Documentation for the package is available on Read The Docs.
Demonstration of classifying multivariate sequences with two features into two classes using the `KNNClassifier`.
This example also shows a typical preprocessing workflow, as well as compatibility with Scikit-Learn.
```python
import numpy as np

from sklearn.preprocessing import scale
from sklearn.decomposition import PCA

from sequentia.models import KNNClassifier
from sequentia.pipeline import Pipeline
from sequentia.preprocessing import IndependentFunctionTransformer, mean_filter

# Input data: three multivariate sequences concatenated into one array
X = np.array([
    # Sequence 1 - Length 3
    [1.2 , 7.91],
    [1.34, 6.6 ],
    [0.92, 8.08],
    # Sequence 2 - Length 5
    [2.11, 6.97],
    [1.83, 7.06],
    [1.54, 5.98],
    [0.86, 6.37],
    [1.21, 5.8 ],
    # Sequence 3 - Length 2
    [1.7 , 6.22],
    [2.01, 5.49],
])

# Length of each sequence in X, and the class label for each sequence
lengths = np.array([3, 5, 2])
y = np.array([0, 1, 1])

# Preprocessing pipeline ending in a k-NN classifier
pipeline = Pipeline([
    ('denoise', IndependentFunctionTransformer(mean_filter)),
    ('scale', IndependentFunctionTransformer(scale)),
    ('pca', PCA(n_components=1)),
    ('knn', KNNClassifier(k=1)),
])

# Fit the pipeline, then predict and score on the same data
pipeline.fit(X, y, lengths)
y_pred = pipeline.predict(X, lengths)
acc = pipeline.score(X, y, lengths)
```
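In the example above, `X` is a single array holding all sequences concatenated together, and `lengths` marks how many rows belong to each sequence. A minimal sketch, using only NumPy, of how such a concatenated array can be split back into its individual sequences (the variable names here are illustrative, not part of Sequentia's API):

```python
import numpy as np

X = np.array([
    [1.2 , 7.91], [1.34, 6.6 ], [0.92, 8.08],   # sequence 1
    [2.11, 6.97], [1.83, 7.06], [1.54, 5.98],
    [0.86, 6.37], [1.21, 5.8 ],                  # sequence 2
    [1.7 , 6.22], [2.01, 5.49],                  # sequence 3
])
lengths = np.array([3, 5, 2])

# Split at the cumulative sequence boundaries (dropping the final offset,
# which equals the total number of rows).
sequences = np.split(X, np.cumsum(lengths)[:-1])

for i, seq in enumerate(sequences):
    print(f"Sequence {i + 1}: shape {seq.shape}")
```

This flat representation avoids ragged arrays while still letting each sequence keep its own length.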
In earlier versions of the package, an approximate DTW implementation, `fastdtw`, was used in the hope of speeding up k-NN predictions, as the authors of the original FastDTW paper claim that approximate DTW alignments can be computed in linear time and memory, compared to the O(N²) runtime complexity of the usual exact DTW implementation.
I was contacted by Prof. Eamonn Keogh, whose work makes the surprising revelation that FastDTW is generally slower than the exact DTW algorithm that it approximates. Upon switching from the `fastdtw` package to `dtaidistance` (a very solid implementation of exact DTW with fast pure C compiled functions), DTW k-NN prediction times were indeed reduced drastically.
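For context, the exact DTW computation referred to above is the classic quadratic-time dynamic program. The sketch below is purely illustrative (plain Python/NumPy, not the optimized `dtaidistance` implementation):

```python
import numpy as np

def dtw_distance(s, t):
    """Exact DTW distance between two 1-D sequences.

    Runs in O(len(s) * len(t)) time and space - the quadratic cost
    that motivated approximate methods such as FastDTW.
    """
    n, m = len(s), len(t)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(s[i - 1] - t[j - 1])  # local distance
            # Best way to reach (i, j): match, insertion, or deletion
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

# The warping path absorbs the repeated element, so the distance is zero.
print(dtw_distance([1, 2, 3], [1, 2, 2, 3]))  # 0.0
```

Production implementations such as `dtaidistance` apply windowing constraints and compiled inner loops, but the recurrence is the same.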
I would like to thank Prof. Eamonn Keogh for directly reaching out to me regarding this finding.
- Lawrence R. Rabiner. "A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition". Proceedings of the IEEE 77 (1989), no. 2, 257-86.
- Stan Salvador & Philip Chan. "FastDTW: Toward Accurate Dynamic Time Warping in Linear Time and Space". Intelligent Data Analysis 11.5 (2007), 561-580.
- Renjie Wu & Eamonn J. Keogh. "FastDTW is Approximate and Generally Slower than the Algorithm it Approximates". IEEE Transactions on Knowledge and Data Engineering (2020), 1-1.
All contributions to this repository are greatly appreciated. Contribution guidelines can be found here.
Sequentia is released under the MIT license.
Certain parts of the source code are heavily adapted from Scikit-Learn. Such files contain a copy of its license.
Sequentia © 2019-2023, Edwin Onuonga - Released under the MIT license.
Authored and maintained by Edwin Onuonga.