Bringing back uncertainty to machine learning.
A Python package to include prediction intervals in the predictions of machine learning models, to quantify their uncertainty.
If you do not already have HDF5 installed, then start by installing that. On macOS this can be done using `sudo port install hdf5` after MacPorts has been installed. On Ubuntu you can get HDF5 with `sudo apt-get install python-dev python3-dev libhdf5-serial-dev`. After that, you can install doubt with `pip install doubt`.
If you already have a model in Scikit-Learn, then you can simply wrap it in a `Boot` to enable predicting with prediction intervals:
```python
>>> from sklearn.linear_model import LinearRegression
>>> from doubt import Boot
>>> from doubt.datasets import PowerPlant
>>> X, y = PowerPlant().split()
>>> clf = Boot(LinearRegression())
>>> clf = clf.fit(X, y)
>>> clf.predict([10, 30, 1000, 50], uncertainty=0.05)
(481.9203102126274, array([473.43314309, 490.0313962 ]))
```
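Under the hood, intervals like these come from bootstrap resampling: refit (or re-evaluate) the model on resampled versions of the data and read the interval off the spread of the resulting predictions. Here is a toy sketch of that idea using only the standard library; the `bootstrap_interval` helper and the data are made up for illustration, and this is not doubt's actual implementation:

```python
import random

random.seed(0)  # fixed seed so the sketch is reproducible


def bootstrap_interval(data, predict, n_boots=1000, uncertainty=0.05):
    """Toy bootstrap prediction interval for a point prediction.

    `predict` maps a sample of data to a point prediction. This only
    illustrates the resampling idea; it is not doubt's algorithm.
    """
    preds = sorted(
        predict([random.choice(data) for _ in data]) for _ in range(n_boots)
    )
    lower = preds[int(n_boots * uncertainty / 2)]
    upper = preds[int(n_boots * (1 - uncertainty / 2)) - 1]
    return predict(data), (lower, upper)


data = [4.1, 3.9, 4.4, 4.0, 4.2, 3.8, 4.3]
point, (lo, hi) = bootstrap_interval(data, lambda xs: sum(xs) / len(xs))
# `point` is the plain prediction; (lo, hi) brackets it
```

Lower values of `uncertainty` demand more confidence and therefore produce wider intervals.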
Alternatively, you can use one of the standalone models with uncertainty outputs. For instance, a `QuantileRegressionForest`:
```python
>>> from doubt import QuantileRegressionForest as QRF
>>> from doubt.datasets import Concrete
>>> import numpy as np
>>> X, y = Concrete().split()
>>> clf = QRF(max_leaf_nodes=8)
>>> clf.fit(X, y)
>>> clf.predict(np.ones(8), uncertainty=0.25)
(16.933590347847982, array([ 8.93456428, 26.0664534 ]))
```
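What makes this possible is that a quantile regression forest stores, in every leaf, all the training targets that land in that leaf, and predicts empirical quantiles of those targets rather than just their mean. A toy sketch of the leaf-level computation (the `leaf_quantile` helper and the data are made up for illustration; this is not doubt's implementation):

```python
def leaf_quantile(targets, q):
    """Empirical q-quantile of a leaf's stored targets, with linear interpolation."""
    xs = sorted(targets)
    pos = q * (len(xs) - 1)
    idx, frac = int(pos), pos - int(pos)
    if frac == 0:
        return xs[idx]
    return xs[idx] * (1 - frac) + xs[idx + 1] * frac


# Targets that ended up in one (hypothetical) leaf during training
leaf_targets = [12.0, 15.5, 14.0, 20.0, 13.5, 18.0]

point = leaf_quantile(leaf_targets, 0.5)  # median as the point prediction
interval = (
    leaf_quantile(leaf_targets, 0.125),  # lower bound for uncertainty=0.25
    leaf_quantile(leaf_targets, 0.875),  # upper bound for uncertainty=0.25
)
```

With `uncertainty=0.25`, the bounds are the 12.5% and 87.5% quantiles, so 75% of the leaf's targets fall inside the interval.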
So far all tests are doctests, which double as explanatory examples. However, we also need unit tests that cover the edge cases of the functions.
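For illustration, here is a generic doctest (not one of doubt's own): the example in the docstring is executable documentation, and the standard library can run it as a test. The `interval_midpoint` function is made up for this sketch:

```python
import doctest


def interval_midpoint(lo, hi):
    """Midpoint of a prediction interval.

    The example below is documentation and a test at the same time:

    >>> interval_midpoint(1.0, 3.0)
    2.0
    """
    return (lo + hi) / 2


# Collect and run the doctests attached to the function
finder = doctest.DocTestFinder()
runner = doctest.DocTestRunner()
failed = sum(
    runner.run(test).failed
    for test in finder.find(interval_midpoint, globs={"interval_midpoint": interval_midpoint})
)
```

Running the whole suite is then a matter of pointing `pytest --doctest-modules` (or `python -m doctest`) at the package.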
Conformal Quantile Regression was introduced in Romano, Patterson & Candès (2019) and is a variant of quantile regression that calibrates the prediction intervals, yielding narrower intervals while preserving the theoretical coverage guarantees. This could potentially be built into `QuantileLinearRegression` via a dedicated argument.
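The calibration step of CQR is simple to sketch: score each held-out calibration point by how far it falls outside (or inside) its predicted interval, then widen or shrink all intervals by a quantile of those scores. A toy sketch of that step (the `cqr_calibrate` helper and the data are made up for illustration; this is not doubt's code):

```python
import math


def cqr_calibrate(cal_lo, cal_hi, cal_y, alpha):
    """Margin by which to widen (or shrink) quantile intervals.

    cal_lo/cal_hi are lower/upper quantile predictions on a held-out
    calibration set, cal_y the true targets, and alpha the allowed
    miscoverage rate. A toy sketch of the CQR calibration step only.
    """
    # Conformity score: how far outside (positive) or inside (negative)
    # the predicted interval each calibration target falls
    scores = [max(lo - y, y - hi) for lo, hi, y in zip(cal_lo, cal_hi, cal_y)]
    n = len(scores)
    rank = math.ceil((1 - alpha) * (n + 1))  # finite-sample correction
    return sorted(scores)[min(rank, n) - 1]


# Made-up calibration predictions and targets
cal_lo = [1.0, 2.0, 0.5, 1.5, 2.5]
cal_hi = [3.0, 4.0, 2.5, 3.5, 4.5]
cal_y = [2.9, 4.2, 1.0, 1.4, 3.0]

margin = cqr_calibrate(cal_lo, cal_hi, cal_y, alpha=0.2)
# A new predicted interval (lo, hi) then becomes (lo - margin, hi + margin)
```

When the uncalibrated intervals are too wide, the margin comes out negative and the calibrated intervals shrink, which is how CQR can yield narrower intervals without losing coverage.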
The Inductive Venn-Abers predictors (IVAPs) and Cross Venn-Abers predictors (CVAPs) were introduced in Vovk, Petej & Fedorova (2015), and can provide lower and upper bounds for probabilities in classification models. The IVAPs have theoretical guarantees, and the authors show empirically that the CVAPs might also enjoy this property.
- The quantiles returned by `QuantileRegressionForest` were the same. This has now been fixed. Thanks to @gugerlir for noticing this!
- The random seed argument of `QuantileRegressionForest` has been changed to `random_state` to be consistent with `DecisionTreeRegressor`, and to avoid an `AttributeError` when accessing the estimators of a `QuantileRegressionForest`.
- `QuantileRegressionForest` now has a […]
- The prediction methods, such as `Boot.predict`, are now parallelised, speeding up both training and prediction time a bit.
- Updated the `README` to include generalised linear models, rather than only mentioning linear regression.
- Removed the mention of `PyTorch` model support, as that has not been implemented yet.
- `QuantileRegressionForest` also displays a progress bar during inference now.
- Added a verbosity option to `QuantileRegressionForest`, which displays a progress bar during training.
- The default value of `QuantileRegressionForest.min_samples_leaf` has been changed from 1 to 5, to ensure that the quantiles can always be computed sensibly with the default setting.
- The `logkow` feature in the `FishBioconcentration` dataset is now converted into a float, rather than a string.
- `QuantileLinearRegression` has been removed; `QuantileRegressor` should be used instead.