Bringing back uncertainty to machine learning.
A Python package to include prediction intervals in the predictions of machine learning models, to quantify their uncertainty.
If you do not already have HDF5 installed, then start by installing that. On macOS this can be done using `sudo port install hdf5` after MacPorts has been installed. On Ubuntu you can get HDF5 with `sudo apt-get install python-dev python3-dev libhdf5-serial-dev`. After that, you can install doubt with `pip install doubt`.
If you already have a model in Scikit-Learn, then you can simply wrap it in a `Boot` to enable predicting with prediction intervals:
```python
>>> from sklearn.linear_model import LinearRegression
>>> from doubt import Boot
>>> from doubt.datasets import PowerPlant
>>> X, y = PowerPlant().split()
>>> clf = Boot(LinearRegression())
>>> clf = clf.fit(X, y)
>>> clf.predict([10, 30, 1000, 50], uncertainty=0.05)
(481.9203102126274, array([473.43314309, 490.0313962 ]))
```
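Under the hood, intervals like these come from bootstrap resampling: refit (or re-evaluate) the model on resampled versions of the data and read the interval off the spread of the resulting predictions. Here is a toy sketch of that idea using only the standard library; the `bootstrap_interval` helper and the data are made up for illustration, and this is not doubt's actual implementation:

```python
import random

random.seed(0)  # fixed seed so the sketch is reproducible


def bootstrap_interval(data, predict, n_boots=1000, uncertainty=0.05):
    """Toy bootstrap prediction interval for a point prediction.

    `predict` maps a sample of data to a point prediction. This only
    illustrates the resampling idea; it is not doubt's algorithm.
    """
    preds = sorted(
        predict([random.choice(data) for _ in data]) for _ in range(n_boots)
    )
    lower = preds[int(n_boots * uncertainty / 2)]
    upper = preds[int(n_boots * (1 - uncertainty / 2)) - 1]
    return predict(data), (lower, upper)


data = [4.1, 3.9, 4.4, 4.0, 4.2, 3.8, 4.3]
point, (lo, hi) = bootstrap_interval(data, lambda xs: sum(xs) / len(xs))
# `point` is the plain prediction; (lo, hi) brackets it
```

Lower values of `uncertainty` demand more confidence and therefore produce wider intervals.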
Alternatively, you can use one of the standalone models with uncertainty outputs. For instance, a `QuantileRegressionForest`:
```python
>>> from doubt import QuantileRegressionForest as QRF
>>> from doubt.datasets import Concrete
>>> import numpy as np
>>> X, y = Concrete().split()
>>> clf = QRF(max_leaf_nodes=8)
>>> clf.fit(X, y)
>>> clf.predict(np.ones(8), uncertainty=0.25)
(16.933590347847982, array([ 8.93456428, 26.0664534 ]))
```
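What makes this possible is that a quantile regression forest stores, in every leaf, all the training targets that land in that leaf, and predicts empirical quantiles of those targets rather than just their mean. A toy sketch of the leaf-level computation (the `leaf_quantile` helper and the data are made up for illustration; this is not doubt's implementation):

```python
def leaf_quantile(targets, q):
    """Empirical q-quantile of a leaf's stored targets, with linear interpolation."""
    xs = sorted(targets)
    pos = q * (len(xs) - 1)
    idx, frac = int(pos), pos - int(pos)
    if frac == 0:
        return xs[idx]
    return xs[idx] * (1 - frac) + xs[idx + 1] * frac


# Targets that ended up in one (hypothetical) leaf during training
leaf_targets = [12.0, 15.5, 14.0, 20.0, 13.5, 18.0]

point = leaf_quantile(leaf_targets, 0.5)  # median as the point prediction
interval = (
    leaf_quantile(leaf_targets, 0.125),  # lower bound for uncertainty=0.25
    leaf_quantile(leaf_targets, 0.875),  # upper bound for uncertainty=0.25
)
```

With `uncertainty=0.25`, the bounds are the 12.5% and 87.5% quantiles, so 75% of the leaf's targets fall inside the interval.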
So far all tests are doctests, which double as explanatory examples. However, we also need unit tests that cover the edge cases of the functions.
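For illustration, here is a generic doctest (not one of doubt's own): the example in the docstring is executable documentation, and the standard library can run it as a test. The `interval_midpoint` function is made up for this sketch:

```python
import doctest


def interval_midpoint(lo, hi):
    """Midpoint of a prediction interval.

    The example below is documentation and a test at the same time:

    >>> interval_midpoint(1.0, 3.0)
    2.0
    """
    return (lo + hi) / 2


# Collect and run the doctests attached to the function
finder = doctest.DocTestFinder()
runner = doctest.DocTestRunner()
failed = sum(
    runner.run(test).failed
    for test in finder.find(interval_midpoint, globs={"interval_midpoint": interval_midpoint})
)
```

Running the whole suite is then a matter of pointing `pytest --doctest-modules` (or `python -m doctest`) at the package.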
Conformal Quantile Regression was introduced in Romano, Patterson & Candès (2019) and is a variant of quantile regression that calibrates the prediction intervals, yielding narrower intervals while preserving the theoretical coverage guarantees. This could potentially be built into `QuantileLinearRegression` via a dedicated argument.
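The calibration step of CQR is simple to sketch: score each held-out calibration point by how far it falls outside (or inside) its predicted interval, then widen or shrink all intervals by a quantile of those scores. A toy sketch of that step (the `cqr_calibrate` helper and the data are made up for illustration; this is not doubt's code):

```python
import math


def cqr_calibrate(cal_lo, cal_hi, cal_y, alpha):
    """Margin by which to widen (or shrink) quantile intervals.

    cal_lo/cal_hi are lower/upper quantile predictions on a held-out
    calibration set, cal_y the true targets, and alpha the allowed
    miscoverage rate. A toy sketch of the CQR calibration step only.
    """
    # Conformity score: how far outside (positive) or inside (negative)
    # the predicted interval each calibration target falls
    scores = [max(lo - y, y - hi) for lo, hi, y in zip(cal_lo, cal_hi, cal_y)]
    n = len(scores)
    rank = math.ceil((1 - alpha) * (n + 1))  # finite-sample correction
    return sorted(scores)[min(rank, n) - 1]


# Made-up calibration predictions and targets
cal_lo = [1.0, 2.0, 0.5, 1.5, 2.5]
cal_hi = [3.0, 4.0, 2.5, 3.5, 4.5]
cal_y = [2.9, 4.2, 1.0, 1.4, 3.0]

margin = cqr_calibrate(cal_lo, cal_hi, cal_y, alpha=0.2)
# A new predicted interval (lo, hi) then becomes (lo - margin, hi + margin)
```

When the uncalibrated intervals are too wide, the margin comes out negative and the calibrated intervals shrink, which is how CQR can yield narrower intervals without losing coverage.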
The Inductive Venn-Abers predictors (IVAPs) and Cross Venn-Abers predictors (CVAPs) were introduced in Vovk, Petej & Fedorova (2015), and can provide lower and upper bounds for probabilities in classification models. The IVAPs have theoretical guarantees, and the authors show empirically that the CVAPs might also enjoy this property.
- The quantiles returned by `QuantileRegressionForest` were the same. This has now been fixed. Thanks to @gugerlir for noticing this!
- The random seed argument of `QuantileRegressionForest` has been changed to `random_state` to be consistent with `DecisionTreeRegressor`, and to avoid an `AttributeError` when accessing the estimators of a `QuantileRegressionForest`.
- `QuantileRegressionForest` now has a […]
- The prediction methods, such as `Boot.predict`, are now parallelised, speeding up both training and prediction time a bit.
- Updated the `README` to include generalised linear models, rather than only mentioning linear regression.
- Removed the mention of `PyTorch` model support, as that has not been implemented yet.
- `QuantileRegressionForest` also displays a progress bar during inference now.
- Added a verbosity option to `QuantileRegressionForest`, which displays a progress bar during training.
- The default value of `QuantileRegressionForest.min_samples_leaf` has been changed from 1 to 5, to ensure that the quantiles can always be computed sensibly with the default setting.
- The `logkow` feature in the `FishBioconcentration` dataset is now converted into a float, rather than a string.
- `QuantileLinearRegression` has been removed; `QuantileRegressor` should be used instead.