
Doubt

Bringing back uncertainty to machine learning.



A Python package that adds prediction intervals to the predictions of machine learning models, in order to quantify their uncertainty.

Installation

If you do not already have HDF5 installed, then start by installing that. On macOS this can be done with sudo port install hdf5 once MacPorts has been installed. On Ubuntu you can get HDF5 with sudo apt-get install python-dev python3-dev libhdf5-serial-dev. After that, you can install doubt with pip:

```shell
pip install doubt
```

Features

  • Bootstrap wrapper for all Scikit-Learn models
    • Can also be used to calculate the usual bootstrapped statistics of a dataset (see the sketch after this list)
  • Quantile Regression for all generalised linear models
  • Quantile Regression Forests
  • A uniform dataset API, with 24 regression datasets and counting
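
The dataset-statistics use case mentioned above boils down to resampling with replacement and reading confidence bounds off the resampled distribution. The following is a minimal sketch of that idea in plain NumPy, independent of doubt's own wrapper, so the bootstrap_statistic helper here is purely illustrative:

```python
# Generic bootstrap sketch (plain NumPy, not doubt's API): resample a dataset
# with replacement and read confidence bounds for a statistic off the
# resampled distribution.
import numpy as np

def bootstrap_statistic(data, statistic=np.mean, n_boots=1000, uncertainty=0.05, seed=4242):
    """Return the statistic on `data` plus a (lower, upper) bootstrap interval."""
    rng = np.random.default_rng(seed)
    n = len(data)
    # Recompute the statistic on n_boots resamples drawn with replacement
    boot_stats = np.array([
        statistic(data[rng.integers(0, n, size=n)]) for _ in range(n_boots)
    ])
    lower, upper = np.quantile(boot_stats, [uncertainty / 2, 1 - uncertainty / 2])
    return statistic(data), (lower, upper)

data = np.random.default_rng(0).normal(loc=5.0, scale=2.0, size=500)
print(bootstrap_statistic(data))  # e.g. (5.03, (4.86, 5.21)), values will vary
```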

Quick Start

If you already have a model in Scikit-Learn, then you can simply wrap it in a Boot to get prediction intervals alongside its predictions:

```python
>>> from sklearn.linear_model import LinearRegression
>>> from doubt import Boot
>>> from doubt.datasets import PowerPlant
>>>
>>> X, y = PowerPlant().split()
>>> clf = Boot(LinearRegression())
>>> clf = clf.fit(X, y)
>>> clf.predict([10, 30, 1000, 50], uncertainty=0.05)
(481.9203102126274, array([473.43314309, 490.0313962 ]))
```

Alternatively, you can use one of the standalone models with uncertainty outputs. For instance, a QuantileRegressionForest:

```python
>>> from doubt import QuantileRegressionForest as QRF
>>> from doubt.datasets import Concrete
>>> import numpy as np
>>>
>>> X, y = Concrete().split()
>>> clf = QRF(max_leaf_nodes=8)
>>> clf.fit(X, y)
>>> clf.predict(np.ones(8), uncertainty=0.25)
(16.933590347847982, array([ 8.93456428, 26.0664534 ]))
```

Issues

Unit tests

opened on 2021-04-11 20:21:35 by saattrupdan

So far all tests are doctests, which double as explanatory examples. However, we also need unit tests that test the edge cases of the functions.
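
As a starting point, a unit test for one such edge case could look like the sketch below. It is written in pytest style, and the assertion that the point prediction lies inside its own interval is an assumption about reasonable behaviour rather than a documented guarantee:

```python
# Pytest-style sketch of an edge-case unit test for Boot. The expected
# behaviour asserted here is an assumption, not a documented guarantee.
import numpy as np
from sklearn.linear_model import LinearRegression
from doubt import Boot

def test_boot_interval_contains_point_prediction():
    rng = np.random.default_rng(4242)
    X = rng.normal(size=(100, 3))
    y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=100)
    clf = Boot(LinearRegression())
    clf = clf.fit(X, y)
    # Predict a single sample, as in the README example
    pred, interval = clf.predict(X[0], uncertainty=0.05)
    # The point prediction should lie inside its own prediction interval
    assert interval[0] <= pred <= interval[1]
```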

Conformal Quantile Regression

opened on 2021-04-06 15:01:44 by saattrupdan

Conformal Quantile Regression was introduced in Romano, Patterson & Candès and is a variant of quantile regression that calibrates the prediction intervals, yielding narrower intervals while preserving theoretical coverage guarantees.

This could potentially be built into QuantileLinearRegression via a conformal argument.
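
For reference, the calibration step of conformalised quantile regression can be sketched as follows. The example uses scikit-learn quantile regressors as a stand-in for doubt's models, so none of the names below are part of doubt's API:

```python
# Sketch of conformalised quantile regression (CQR): widen the raw quantile
# interval by a constant chosen on a held-out calibration set so that the
# target coverage holds.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split

alpha = 0.1  # target miscoverage, i.e. 90% prediction intervals
rng = np.random.default_rng(4242)
X = rng.uniform(-3, 3, size=(2000, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.3, size=2000)
X_train, X_calib, y_train, y_calib = train_test_split(X, y, test_size=0.5, random_state=0)

# Fit lower and upper quantile regressors on the training half
lo = GradientBoostingRegressor(loss="quantile", alpha=alpha / 2).fit(X_train, y_train)
hi = GradientBoostingRegressor(loss="quantile", alpha=1 - alpha / 2).fit(X_train, y_train)

# Conformity scores on the calibration half: how far the truth falls outside the raw interval
scores = np.maximum(lo.predict(X_calib) - y_calib, y_calib - hi.predict(X_calib))
n = len(scores)
q_hat = np.quantile(scores, np.ceil((1 - alpha) * (n + 1)) / n)

# Calibrated interval for a new point: shift both quantile predictions by q_hat
x_new = np.array([[0.5]])
interval = np.array([lo.predict(x_new)[0] - q_hat, hi.predict(x_new)[0] + q_hat])
```

The resulting interval then inherits its coverage guarantee from the calibration set rather than from the quantile regressors themselves, which is what makes the calibration step attractive as an optional argument.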

IVAPs and CVAPs

opened on 2021-04-06 14:54:55 by saattrupdan

The Inductive Venn-Abers predictors (IVAPs) and Cross Venn-Abers predictors (CVAPs) were introduced in Vovk, Petej & Fedorova (2015) and can provide lower and upper bounds for probabilities in classification models. The IVAPs have theoretical guarantees, and the authors show empirically that the CVAPs might enjoy this property as well.
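
To illustrate the idea, an IVAP can be sketched on top of any scikit-learn classifier by refitting an isotonic regression twice per test point, once for each hypothetical label. The venn_abers_bounds helper below is illustrative only and not part of doubt:

```python
# Sketch of an inductive Venn-Abers predictor (IVAP) on top of a scikit-learn
# classifier; the helper below is illustrative, not part of doubt's API.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.isotonic import IsotonicRegression
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

def venn_abers_bounds(calib_scores, calib_labels, test_score):
    """Return (p0, p1): lower and upper probability that the test label is 1."""
    bounds = []
    for hypothetical_label in (0, 1):
        # Append the test point with each hypothetical label and refit isotonic regression
        scores = np.append(calib_scores, test_score)
        labels = np.append(calib_labels, hypothetical_label)
        iso = IsotonicRegression(y_min=0, y_max=1, out_of_bounds="clip")
        iso.fit(scores, labels)
        bounds.append(iso.predict([test_score])[0])
    return tuple(bounds)  # (p0, p1)

X, y = make_classification(n_samples=1000, random_state=4242)
X_train, X_calib, y_train, y_calib = train_test_split(X, y, test_size=0.3, random_state=0)
clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
calib_scores = clf.decision_function(X_calib)
p0, p1 = venn_abers_bounds(calib_scores, y_calib, clf.decision_function(X_calib[:1])[0])
```

The pair (p0, p1) brackets the calibrated probability that the test label is 1, and the gap between the two typically shrinks as the calibration set grows.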

Releases

v4.3.1 2023-03-20 18:37:11

Fixed

  • Previously, all the trees in a QuantileRegressionForest were identical. This has now been fixed. Thanks to @gugerlir for noticing this!
  • The random_seed argument in QuantileRegressionTree and QuantileRegressionForest has been changed to random_state to be consistent with DecisionTreeRegressor, and to avoid an AttributeError when accessing the estimators of a QuantileRegressionForest.

v4.3.0 2022-07-17 17:24:24

Added

  • The QuantileRegressionForest now has a feature_importances_ attribute.

v4.2.0 2022-07-17 15:32:36

Changed

  • The Boot.fit and Boot.predict methods are now parallelised, speeding up both training and prediction.
  • Updated README to include generalised linear models, rather than only mentioning linear regression.

Fixed

  • Removed mention of PyTorch model support, as that has not been implemented yet

v4.1.0 2021-07-26 09:35:14

Changed

  • The verbose argument to QuantileRegressionForest now also displays a progress bar during inference.

Fixed

  • Fixed QuantileRegressionForest.__repr__.

v4.0.0 2021-07-26 09:15:52

Added

  • Added a verbose argument to QuantileRegressionForest, which displays a progress bar during training.

Changed

  • The default value of QuantileRegressionForest.min_samples_leaf has changed from 1 to 5, to ensure that the quantiles can always be computed sensibly with the default setting.

Fixed

  • The logkow feature in the FishBioconcentration dataset is now converted into a float, rather than a string.
  • Typo in example script in README

v3.0.0 2021-04-25 13:17:57

Removed

  • QuantileLinearRegression has been removed, and QuantileRegressor should be used instead