*Bringing back uncertainty to machine learning.*

A Python package to include prediction intervals in the predictions of machine learning models, to quantify their uncertainty.

If you do not already have HDF5 installed, then start by installing that. On macOS this can be done using `sudo port install hdf5` after MacPorts has been installed. On Ubuntu you can get HDF5 with `sudo apt-get install python-dev python3-dev libhdf5-serial-dev`. After that, you can install `doubt` with `pip`:

```shell
pip install doubt
```

- Bootstrap wrapper for all Scikit-Learn models
  - Can also be used to calculate usual bootstrapped statistics of a dataset
- Quantile Regression for all generalised linear models
- Quantile Regression Forests
- A uniform dataset API, with 24 regression datasets and counting
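The "usual bootstrapped statistics" mentioned in the list can be illustrated without the package. The following is a minimal NumPy sketch of a percentile bootstrap for the mean of a synthetic sample. `bootstrap_statistic` is a hypothetical helper written for this illustration and is not part of `doubt`'s API; reading `uncertainty=0.05` as the central 95% of the resamples is likewise an assumption.

```python
import numpy as np


def bootstrap_statistic(data, statistic, n_boots=1000, uncertainty=0.05, seed=0):
    """Percentile bootstrap: return (point estimate, [lower, upper])."""
    rng = np.random.default_rng(seed)
    data = np.asarray(data)
    n = data.shape[0]

    # Resample with replacement and recompute the statistic each time.
    boot_stats = np.array([
        statistic(data[rng.integers(0, n, size=n)]) for _ in range(n_boots)
    ])

    # Assumed convention: uncertainty 0.05 means the central 95% of resamples.
    lower, upper = np.quantile(boot_stats, [uncertainty / 2, 1 - uncertainty / 2])
    return statistic(data), np.array([lower, upper])


# Synthetic data, used only for this illustration.
sample = np.random.default_rng(42).normal(loc=10.0, scale=2.0, size=200)
estimate, interval = bootstrap_statistic(sample, np.mean)
print(estimate, interval)
```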

If you already have a model in Scikit-Learn, then you can simply wrap it in a `Boot` to enable predicting with prediction intervals:

```python
from sklearn.linear_model import LinearRegression
from doubt import Boot
from doubt.datasets import PowerPlant

X, y = PowerPlant().split()
clf = Boot(LinearRegression())
clf = clf.fit(X, y)
clf.predict([10, 30, 1000, 50], uncertainty=0.05)
# (481.9203102126274, array([473.43314309, 490.0313962 ]))
```
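For intuition about what a bootstrap wrapper of this kind does, here is a simplified sketch that uses only NumPy. It is not `doubt`'s implementation: it refits an ordinary least-squares model on bootstrap resamples, adds one resampled residual to each prediction to account for noise, and takes quantiles of the result. The helper names (`fit_linear`, `predict_linear`, `bootstrap_interval`) and the synthetic data are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: y = 3x + 1 plus Gaussian noise with unit variance.
X = rng.uniform(0, 10, size=(300, 1))
y = 3.0 * X[:, 0] + 1.0 + rng.normal(scale=1.0, size=300)


def fit_linear(X, y):
    """Ordinary least squares with an intercept; returns the coefficients."""
    design = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef


def predict_linear(coef, X):
    """Predict with the intercept-first coefficient vector from fit_linear."""
    return np.column_stack([np.ones(len(X)), X]) @ coef


def bootstrap_interval(X, y, x_new, n_boots=500, uncertainty=0.05):
    """Return (point prediction, [lower, upper]) for a single new sample."""
    n = len(X)
    x_new = np.atleast_2d(x_new)
    coef = fit_linear(X, y)
    point = predict_linear(coef, x_new)[0]
    residuals = y - predict_linear(coef, X)

    samples = np.empty(n_boots)
    for b in range(n_boots):
        idx = rng.integers(0, n, size=n)        # resample rows with replacement
        boot_coef = fit_linear(X[idx], y[idx])  # refit on the resample
        # Model variability plus one resampled residual for the noise term.
        samples[b] = predict_linear(boot_coef, x_new)[0] + rng.choice(residuals)

    lower, upper = np.quantile(samples, [uncertainty / 2, 1 - uncertainty / 2])
    return point, np.array([lower, upper])


point, interval = bootstrap_interval(X, y, [5.0])
print(point, interval)
```

The return value mirrors the shape of the example output above: a point prediction followed by a two-element array holding the lower and upper ends of the interval.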

Alternatively, you can use one of the standalone models with uncertainty outputs. For instance, a `QuantileRegressionForest`:

```python
from doubt import QuantileRegressionForest as QRF
from doubt.datasets import Concrete
import numpy as np

X, y = Concrete().split()
clf = QRF(max_leaf_nodes=8)
clf.fit(X, y)
clf.predict(np.ones(8), uncertainty=0.25)
# (16.933590347847982, array([ 8.93456428, 26.0664534 ]))
```
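To give a feel for where such an interval comes from: a quantile regression forest keeps the training targets that land in each leaf and reads prediction intervals off their empirical quantiles. The sketch below is a deliberately simplified stand-in, not `doubt`'s implementation. A fixed partition of one feature into eight bins plays the role of a single tree with eight leaves, the data are synthetic, and mapping `uncertainty` to the quantile levels `uncertainty / 2` and `1 - uncertainty / 2` is an assumption.

```python
import numpy as np

rng = np.random.default_rng(0)
x_train = rng.uniform(0, 10, size=2000)
# Noise grows with x, so intervals should widen for larger x.
y_train = 2.0 * x_train + rng.normal(scale=0.5 + 0.5 * x_train)

# Stand-in for a fitted tree: eight equal-width "leaves" over the feature range.
edges = np.linspace(0, 10, 9)


def leaf_of(x):
    """Index of the leaf (bin) that each value of x falls into."""
    return np.clip(np.digitize(x, edges) - 1, 0, len(edges) - 2)


def predict_with_interval(x_new, uncertainty=0.25):
    """Mean and empirical quantiles of the training targets sharing x_new's leaf."""
    in_leaf = y_train[leaf_of(x_train) == leaf_of(x_new)]
    lower, upper = np.quantile(in_leaf, [uncertainty / 2, 1 - uncertainty / 2])
    return np.mean(in_leaf), np.array([lower, upper])


print(predict_with_interval(1.0))  # low-noise region
print(predict_with_interval(9.0))  # high-noise region
```

Unlike a single global noise estimate, the interval adapts to the local spread of the targets, which is the main appeal of the approach.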

So far all tests are doctests, which double as explanatory examples. However, we also need unit tests that test the edge cases of the functions.

Conformal Quantile Regression was introduced in Romano, Patterson & Candès and is a variant of quantile regression which calibrates the prediction intervals, yielding narrower intervals, while preserving theoretical coverage guarantees.

This could potentially be built into `QuantileLinearRegression` via a `conformal` argument.
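The calibration step behind Conformal Quantile Regression can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not code from `doubt`: `conformal_correction` and `predict_quantiles` are hypothetical names, and the "quantile model" is a hard-coded stand-in whose intervals are deliberately too narrow for the 90% target.

```python
import numpy as np


def conformal_correction(y_cal, lower_cal, upper_cal, alpha=0.1):
    """Return the amount by which to move both interval ends outwards."""
    # Conformity score: distance by which y misses the interval (negative if inside).
    scores = np.maximum(lower_cal - y_cal, y_cal - upper_cal)
    n = len(scores)
    # Finite-sample rank; capped at n as a simplification for tiny calibration sets.
    k = min(n, int(np.ceil((n + 1) * (1 - alpha))))
    return np.sort(scores)[k - 1]


def predict_quantiles(x):
    """Hypothetical stand-in for a fitted quantile model (too narrow on purpose)."""
    return x - 1.0, x + 1.0


rng = np.random.default_rng(0)
x_cal, x_test = rng.uniform(0, 10, 1000), rng.uniform(0, 10, 1000)
y_cal = x_cal + rng.normal(size=1000)
y_test = x_test + rng.normal(size=1000)

# Calibrate on held-out data, then adjust the intervals on new data.
lower_cal, upper_cal = predict_quantiles(x_cal)
q = conformal_correction(y_cal, lower_cal, upper_cal, alpha=0.1)

lower_test, upper_test = predict_quantiles(x_test)
coverage = np.mean((y_test >= lower_test - q) & (y_test <= upper_test + q))
print(q, coverage)
```

Here the correction is positive because the stand-in intervals undercover. When the underlying quantile model is too conservative, the same procedure yields a negative correction and shrinks the intervals instead.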

The Inductive Venn-Abers predictors (IVAPs) and Cross Venn-Abers predictors (CVAPs) were introduced in Vovk, Petej & Fedorova (2015), and can provide lower and upper bounds for probabilities in classification models. The IVAPs have theoretical guarantees, and the authors show empirically that the CVAPs might also enjoy this property.
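As a rough illustration of how an IVAP produces a lower and an upper probability, the sketch below fits an isotonic regression on a calibration set twice, once with the test score labelled 0 and once labelled 1, and reads off the fitted value at the test score each time. The helper names `pava` and `ivap_bounds` are hypothetical, the scores and labels are synthetic, and tied scores are not handled specially.

```python
import numpy as np


def pava(labels):
    """Pool-adjacent-violators: non-decreasing least-squares fit to `labels`."""
    blocks = []  # each block is [sum of labels, number of points]
    for label in labels:
        blocks.append([float(label), 1])
        # Merge backwards while a block's mean exceeds the mean of its successor.
        while len(blocks) > 1 and (
            blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]
        ):
            total, count = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
    return np.concatenate([np.full(count, total / count) for total, count in blocks])


def ivap_bounds(cal_scores, cal_labels, test_score):
    """Return (p0, p1), the lower and upper probability of the positive class."""
    bounds = []
    for hypothetical_label in (0, 1):
        scores = np.append(cal_scores, test_score)
        labels = np.append(cal_labels, hypothetical_label)
        order = np.argsort(scores, kind="stable")
        fitted = pava(labels[order])
        # Locate the test point (appended last) in the sorted order.
        position = int(np.where(order == len(cal_scores))[0][0])
        bounds.append(fitted[position])
    return bounds[0], bounds[1]


# Synthetic calibration set: the label is positive with probability equal to the score.
rng = np.random.default_rng(0)
cal_scores = rng.uniform(0, 1, size=2000)
cal_labels = (rng.uniform(0, 1, size=2000) < cal_scores).astype(int)

p0, p1 = ivap_bounds(cal_scores, cal_labels, test_score=0.8)
print(p0, p1)
```

In a real setting the scores would come from a classifier trained on data disjoint from the calibration set.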

- Previously, all the trees in `QuantileRegressionForest` were the same. This has now been fixed. Thanks to @gugerlir for noticing this!
- The `random_seed` argument in `QuantileRegressionTree` and `QuantileRegressionForest` has been changed to `random_state` to be consistent with `DecisionTreeRegressor`, and to avoid an `AttributeError` when accessing the estimators of a `QuantileRegressionForest`.

- The `QuantileRegressionForest` now has a `feature_importances_` attribute.

- `Boot.fit` and `Boot.predict` methods are now parallelised, speeding up both training and prediction time a bit.
- Updated `README` to include generalised linear models, rather than only mentioning linear regression.

- Removed mention of `PyTorch` model support, as that has not been implemented yet.

- The `verbose` argument to `QuantileRegressionForest` also displays a progress bar during inference now.

- Fixed `QuantileRegressionForest.__repr__`.

- Added a `verbose` argument to `QuantileRegressionForest`, which displays a progress bar during training.

- The default value of `QuantileRegressionForest.min_samples_leaf` has changed from 1 to 5, to ensure that the quantiles can always be computed sensibly with the default setting.

- The `logkow` feature in the `FishBioconcentration` dataset is now converted into a float, rather than a string.
- Typo in example script in `README`.
- `QuantileLinearRegression` has been removed, and `QuantileRegressor` should be used instead.