More examples#

Importing packages#

In the examples below, we will be using the three main classes ConformalClassifier, ConformalRegressor, and ConformalPredictiveSystem from the crepes package, as an alternative to using the classes WrapClassifier and WrapRegressor from the same package, as illustrated here. We will also be using a helper class and functions from crepes.extras, as well as NumPy, pandas, matplotlib and sklearn.

[1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.datasets import fetch_openml

from crepes import ConformalClassifier, ConformalRegressor, ConformalPredictiveSystem, __version__

from crepes.extras import hinge, margin, binning, DifficultyEstimator

print(f"crepes v. {__version__}")

np.random.seed(602211023)
crepes v. 0.6.2

Conformal classifiers (CC)#

Importing and splitting a classification dataset#

Let us import a classification dataset from www.openml.org.

[2]:
dataset = fetch_openml(name="gas-drift", parser="auto")

X = dataset.data.values.astype(float)
y = dataset.target.values

We now split the dataset into a training and a test set, and further split the training set into a proper training set and a calibration set.

[3]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5)

X_prop_train, X_cal, y_prop_train, y_cal = train_test_split(X_train, y_train,
                                                            test_size=0.25)

Standard conformal classifier#

Let us first create and fit a random forest in the usual way (using the proper training set only):

[4]:
rf = RandomForestClassifier(n_jobs=-1, n_estimators=500)

rf.fit(X_prop_train, y_prop_train)
[4]:
RandomForestClassifier(n_estimators=500, n_jobs=-1)

Let us also calculate non-conformity scores for the calibration set using the hinge function imported from crepes.extras, which takes predicted probabilities, class names and labels as input:

[5]:
alphas_cal = hinge(rf.predict_proba(X_cal), rf.classes_, y_cal)

Using the non-conformity scores, we create and fit a standard conformal classifier by:

[6]:
cc_std = ConformalClassifier()

cc_std.fit(alphas_cal)

display(cc_std)
ConformalClassifier(fitted=True, mondrian=False)

In order to make predictions for a test set, we also need non-conformity scores for the test objects; note that class names and labels should not be provided in this case:

[7]:
alphas_test = hinge(rf.predict_proba(X_test))

We obtain p-values for the test set from the non-conformity scores by:

[8]:
p_values = cc_std.predict_p(alphas_test)

display(p_values)
array([[7.11853998e-02, 6.81623163e-04, 7.51952414e-05, 7.22822111e-03,
        1.83551541e-03, 1.25354912e-03],
       [7.69995483e-04, 4.02198309e-01, 3.28246348e-03, 8.46337362e-04,
        1.03880711e-03, 1.29555038e-03],
       [1.65747617e-03, 1.16900791e-03, 2.02917868e-04, 5.02167131e-05,
        7.65836044e-01, 1.70578620e-03],
       ...,
       [1.30852359e-03, 1.38763636e-03, 2.20624808e-03, 4.54321090e-03,
        2.99559728e-03, 2.06218609e-01],
       [1.89321540e-03, 1.75937091e-03, 1.64400289e-03, 6.98791143e-03,
        2.22087512e-03, 1.16866012e-01],
       [2.90605130e-03, 1.94900613e-03, 1.74838116e-03, 3.30855703e-03,
        7.68918667e-02, 7.03350329e-03]])

We can also obtain prediction sets, i.e., binary vectors indicating the presence (1) or absence (0) of each class label, for some specified confidence level (the default is 0.95):

[9]:
cc_std.predict_set(alphas_test, confidence=0.99)
[9]:
array([[1, 0, 0, 0, 0, 0],
       [0, 1, 0, 0, 0, 0],
       [0, 0, 0, 0, 1, 0],
       ...,
       [0, 0, 0, 0, 0, 1],
       [0, 0, 0, 0, 0, 1],
       [0, 0, 0, 0, 1, 0]])

We may also evaluate the predictions; here using all available metrics, which is the default:

[10]:
cc_std.evaluate(alphas_test, rf.classes_, y_test, confidence=0.99)
[10]:
{'error': 0.005463695183321349,
 'avg_c': 1.0050323508267434,
 'one_c': 0.9929547088425593,
 'empty': 0.00100647016534867,
 'time_fit': 6.222724914550781e-05,
 'time_evaluate': 0.06952309608459473}

Alternatively, we could calculate non-conformity scores using the margin function, also imported from crepes.extras, which, similar to hinge, takes predicted probabilities, class names and labels as input:

[11]:
alphas_margin_cal = margin(rf.predict_proba(X_cal), rf.classes_, y_cal)

Using the new non-conformity scores, we create and fit a standard conformal classifier by:

[12]:
cc_margin_std = ConformalClassifier()

cc_margin_std.fit(alphas_margin_cal)

display(cc_margin_std)
ConformalClassifier(fitted=True, mondrian=False)

In order to make predictions for the test set, we again need non-conformity scores for the test objects:

[13]:
alphas_margin_test = margin(rf.predict_proba(X_test))

Let us again evaluate the predictions:

[14]:
cc_margin_std.evaluate(alphas_margin_test, rf.classes_, y_test, confidence=0.99)
[14]:
{'error': 0.004025880661394643,
 'avg_c': 1.0067577282530553,
 'one_c': 0.9945363048166787,
 'empty': 0.0,
 'time_fit': 0.00032591819763183594,
 'time_evaluate': 0.07206845283508301}

Mondrian conformal classifiers#

To control the error level across different groups of objects of interest, we may use so-called Mondrian conformal classifiers. A Mondrian conformal classifier is formed by providing the names of the categories as an additional argument, named bins, for the fit method.

We can form the Mondrian categories in any way we like, as long as we only use information that is available for both calibration and test instances; this means that we may not use the target values for this purpose, since these will typically not be available for the test instances.

For illustration, we will consider two categories formed by whether the first feature value is larger than 50 000 or not:

[15]:
bins_cal = X_cal[:,0] > 50000

cc_mond = ConformalClassifier()

cc_mond.fit(alphas_cal, bins_cal)
[15]:
ConformalClassifier(fitted=True, mondrian=True)

To obtain p-values for the test objects, we need to provide bins for these too:

[16]:
bins_test = X_test[:,0] > 50000

p_values = cc_mond.predict_p(alphas_test, bins_test)

display(p_values)
array([[6.85483538e-02, 9.56905247e-04, 9.96107122e-04, 5.49297864e-03,
        1.07263707e-03, 1.73441058e-03],
       [6.44031222e-04, 4.34053862e-01, 1.48879679e-03, 1.02445448e-03,
        1.20033689e-03, 1.15408714e-03],
       [1.26778633e-03, 1.38554492e-03, 2.05399966e-04, 1.71614694e-03,
        7.74677294e-01, 9.19270156e-04],
       ...,
       [1.45263031e-03, 1.25476462e-03, 1.09537260e-03, 1.34346158e-03,
        1.46410801e-03, 2.20306963e-01],
       [1.39891600e-03, 1.65663937e-03, 8.81856043e-04, 5.67695289e-03,
        1.44449215e-03, 1.15156779e-01],
       [1.44170455e-03, 1.25665750e-03, 1.34483024e-03, 1.39414839e-03,
        7.70479932e-02, 5.40138215e-03]])

Similarly, prediction sets are obtained by:

[17]:
prediction_sets = cc_mond.predict_set(alphas_test, bins_test, confidence=0.8)

display(prediction_sets)
array([[0, 0, 0, 0, 0, 0],
       [0, 1, 0, 0, 0, 0],
       [0, 0, 0, 0, 1, 0],
       ...,
       [0, 0, 0, 0, 0, 1],
       [0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0]])

Bins also have to be provided for the test objects when evaluating a Mondrian conformal classifier:

[18]:
cc_mond.evaluate(alphas_test, rf.classes_, y_test, bins_test, 0.9)
[18]:
{'error': 0.08396836808051766,
 'avg_c': 0.9171818835370237,
 'one_c': 0.9171818835370237,
 'empty': 0.08281811646297628,
 'time_fit': 0.00011992454528808594,
 'time_evaluate': 0.10257625579833984}

Class-conditional conformal classifiers#

Class-conditional conformal classifiers are a special type of Mondrian conformal classifier, where the categories are simply the class labels. The fitting is hence straightforward:

[19]:
cc_class_cond = ConformalClassifier()

cc_class_cond.fit(alphas_cal, y_cal)

display(cc_class_cond)
ConformalClassifier(fitted=True, mondrian=True)

However, the test objects need special treatment since we do not know to which categories they belong. Below we show how to obtain the prediction sets:

[20]:
# For each class, compute prediction sets as if all test objects belonged to
# that class, and keep only the column corresponding to that class
prediction_set = np.array([
    cc_class_cond.predict_set(alphas_test,
                              np.full(len(alphas_test), rf.classes_[c]),
                              confidence=0.9)[:, c]
    for c in range(len(rf.classes_))]).T

display(prediction_set)
array([[0, 0, 0, 0, 0, 0],
       [0, 1, 0, 0, 0, 0],
       [0, 0, 0, 0, 1, 0],
       ...,
       [0, 0, 0, 0, 0, 1],
       [0, 0, 0, 0, 0, 1],
       [0, 0, 0, 0, 0, 0]])

For easy handling of class-conditional conformal classifiers, you are advised to consider using the class WrapClassifier, which is illustrated here.
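
A minimal sketch of that alternative, reusing the splits and learner settings above and assuming that the calibrate method of WrapClassifier accepts a class_cond argument as described in the linked documentation:

from crepes import WrapClassifier

# wrap and fit the underlying learner on the proper training set
rf_w = WrapClassifier(RandomForestClassifier(n_jobs=-1, n_estimators=500))
rf_w.fit(X_prop_train, y_prop_train)

# calibrate class-conditionally, i.e., one Mondrian category per class label
rf_w.calibrate(X_cal, y_cal, class_cond=True)

# prediction sets at the 90% confidence level
prediction_sets_w = rf_w.predict_set(X_test, confidence=0.9)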

Conformal classifiers without a separate calibration set#

For conformal classifiers that employ learners that use bagging, like random forests, we may consider an alternative strategy to dividing the original training set into a proper training and calibration set; we may use the out-of-bag (OOB) predictions, which allow us to use the full training set for both model building and calibration. It should be noted that this strategy does not come with the theoretical validity guarantee of the above (inductive) conformal classifiers, since calibration and test instances are not handled in exactly the same way. In practice, however, conformal classifiers based on out-of-bag predictions rarely fail to meet the coverage requirements.

Standard conformal classifiers with out-of-bag calibration#

Let us first generate a model from the full training set, making sure the learner has an attribute oob_decision_function_, which, e.g., is the case for a RandomForestClassifier if oob_score is set to True when it is created.

[21]:
rf = RandomForestClassifier(n_jobs=-1, n_estimators=500, oob_score=True)

rf.fit(X_train, y_train)
[21]:
RandomForestClassifier(n_estimators=500, n_jobs=-1, oob_score=True)

We may now obtain a standard conformal classifier using OOB predictions:

[22]:
alphas_oob = hinge(rf.oob_decision_function_, rf.classes_, y_train)

cc_std_oob = ConformalClassifier()

cc_std_oob.fit(alphas_oob)

display(cc_std_oob)
ConformalClassifier(fitted=True, mondrian=False)

… and use it to get prediction sets for the test set:

[23]:
alphas_test = hinge(rf.predict_proba(X_test))

prediction_sets_oob = cc_std_oob.predict_set(alphas_test)

display(prediction_sets_oob)
array([[1, 0, 0, 0, 0, 0],
       [0, 1, 0, 0, 0, 0],
       [0, 0, 0, 0, 1, 0],
       ...,
       [0, 0, 0, 0, 0, 1],
       [0, 0, 0, 0, 0, 1],
       [0, 0, 0, 0, 1, 0]])

Mondrian conformal classifiers with out-of-bag calibration#

Using out-of-bag calibration works equally well for Mondrian conformal classifiers:

[24]:
bins_oob = X_train[:,0] > 50000

cc_mond_oob = ConformalClassifier()

cc_mond_oob.fit(alphas_oob, bins_oob)

display(cc_mond_oob)
ConformalClassifier(fitted=True, mondrian=True)

Prediction sets for the test objects are obtained in the following way:

[25]:
prediction_sets_mond_oob = cc_mond_oob.predict_set(alphas_test, bins=bins_test)

display(prediction_sets_oob)
array([[1, 0, 0, 0, 0, 0],
       [0, 1, 0, 0, 0, 0],
       [0, 0, 0, 0, 1, 0],
       ...,
       [0, 0, 0, 0, 0, 1],
       [0, 0, 0, 0, 0, 1],
       [0, 0, 0, 0, 1, 0]])

We may of course evaluate the Mondrian conformal classifier using out-of-bag predictions too:

[26]:
results = cc_mond_oob.evaluate(alphas_test, rf.classes_, y_test, bins=bins_test)

display(results)
{'error': 0.04457225017972677,
 'avg_c': 0.9565780014378146,
 'one_c': 0.9565780014378146,
 'empty': 0.043421998562185475,
 'time_fit': 0.00042557716369628906,
 'time_evaluate': 0.10801219940185547}

Conformal regressors (CR)#

Importing and splitting a regression dataset#

Let us import a regression dataset from www.openml.org and min-max normalize the targets; the latter is not really necessary, but it is useful, as it allows us to directly compare the size of a prediction interval to the whole target range, which becomes 1.0 in this case.

[27]:
dataset = fetch_openml(name="house_sales", version=3, parser="auto")

X = dataset.data.values.astype(float)
y = dataset.target.values.astype(float)

y = (y - y.min()) / (y.max() - y.min())

We now split the dataset into a training and a test set, and further split the training set into a proper training set and a calibration set. Let us fit a random forest to the proper training set.

[28]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5)

X_prop_train, X_cal, y_prop_train, y_cal = train_test_split(X_train, y_train,
                                                            test_size=0.25)

learner_prop = RandomForestRegressor(n_jobs=-1, n_estimators=500, oob_score=True)

learner_prop.fit(X_prop_train, y_prop_train)
[28]:
RandomForestRegressor(n_estimators=500, n_jobs=-1, oob_score=True)

Standard conformal regressors#

Let us create a conformal regressor.

[29]:
cr_std = ConformalRegressor()

We may display the object, e.g., to see whether it has been fitted or not.

[30]:
display(cr_std)
ConformalRegressor(fitted=False)

We will use the residuals from the calibration set to fit the conformal regressor.

[31]:
y_hat_cal = learner_prop.predict(X_cal)

residuals_cal = y_cal - y_hat_cal

cr_std.fit(residuals_cal)
[31]:
ConformalRegressor(fitted=True, normalized=False, mondrian=False)

We may now obtain prediction intervals from the point predictions for the test set; here using a confidence level of 99%.

[32]:
y_hat_test = learner_prop.predict(X_test)

intervals = cr_std.predict(y_hat_test, confidence=0.99)

display(intervals)
array([[-0.03113533,  0.11003691],
       [-0.01944353,  0.12172872],
       [ 0.01294408,  0.15411632],
       ...,
       [-0.00739189,  0.13378035],
       [ 0.0441286 ,  0.18530085],
       [-0.02895129,  0.11222096]])

We may request that the intervals are cut to exclude impossible values, in this case below 0 and above 1; below we also use the default confidence level (95%), which further tightens the intervals.

[33]:
intervals_std = cr_std.predict(y_hat_test, y_min=0, y_max=1)

display(intervals_std)
array([[0.00711806, 0.07178352],
       [0.01880986, 0.08347533],
       [0.05119747, 0.11586293],
       ...,
       [0.0308615 , 0.09552696],
       [0.08238199, 0.14704746],
       [0.0093021 , 0.07396756]])

Normalized conformal regressors#

The above intervals are not normalized, i.e., they are all of the same size (at least before they are cut). We could make the intervals more informative through normalization using difficulty estimates; more difficult instances will be assigned wider intervals. We can use a DifficultyEstimator, as imported from crepes.extras, for this purpose. It can be used to estimate the difficulty by using k-nearest neighbors in three different ways: i) by the (Euclidean) distances to the nearest neighbors, ii) by the standard deviation of the targets of the nearest neighbors, and iii) by the absolute errors of the k nearest neighbors.

A small value (beta), which may be given through a (named) argument to the fit method, is added to the estimates; we will just use the default, i.e., beta=0.01. In order for the beta value to have the same effect across different estimators, we may opt to normalize the difficulty estimates (using min-max scaling) by setting scaler=True. It should be noted that this comes with a computational cost; for estimators based on k-nearest neighbors, a leave-one-out protocol is employed to find the minimum and maximum distances used by the scaler.

We will first consider just using the first option (distances to the k-nearest neighbors) to produce normalized conformal regressors, using the default number of nearest neighbors, i.e., k=25.
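
For reference, a non-default beta (and, assuming that the number of neighbors k is also accepted as a named argument of fit, as suggested by the displayed attributes) would be given as in the following sketch; the cells below stick to the defaults:

# hypothetical non-default settings; not used in the remainder of the examples
de_alt = DifficultyEstimator()
de_alt.fit(X=X_prop_train, k=10, beta=0.05, scaler=True)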

[34]:
de_knn = DifficultyEstimator()

de_knn.fit(X=X_prop_train, scaler=True)

display(de_knn)

sigmas_cal_knn_dist = de_knn.apply(X_cal)

cr_norm_knn_dist = ConformalRegressor()

cr_norm_knn_dist.fit(residuals_cal, sigmas=sigmas_cal_knn_dist)

display(cr_norm_knn_dist)
DifficultyEstimator(fitted=True, type=knn, k=25, target=none, scaler=True, beta=0.01, oob=False)
ConformalRegressor(fitted=True, normalized=True, mondrian=False)

To generate prediction intervals for the test set, we need difficulty estimates for the latter too, which we get in the same way as for the calibration objects.

[35]:
sigmas_test_knn_dist = de_knn.apply(X_test)

intervals_norm_knn_dist = cr_norm_knn_dist.predict(y_hat_test,
                                                   sigmas=sigmas_test_knn_dist,
                                                   y_min=0, y_max=1)

display(intervals_norm_knn_dist)
array([[0.02259988, 0.0563017 ],
       [0.02659231, 0.07569289],
       [0.02196583, 0.14509456],
       ...,
       [0.02566299, 0.10072548],
       [0.07601485, 0.1534146 ],
       [0.00609388, 0.07717578]])

Alternatively, we could estimate the difficulty using the standard deviation of the targets of the nearest neighbors; we specify this by providing the targets too:

[36]:
de_knn_std = DifficultyEstimator()

de_knn_std.fit(X=X_prop_train, y=y_prop_train, scaler=True)

display(de_knn_std)

sigmas_cal_knn_std = de_knn_std.apply(X_cal)

cr_norm_knn_std = ConformalRegressor()

cr_norm_knn_std.fit(residuals_cal, sigmas=sigmas_cal_knn_std)

display(cr_norm_knn_std)
DifficultyEstimator(fitted=True, type=knn, k=25, target=labels, scaler=True, beta=0.01, oob=False)
ConformalRegressor(fitted=True, normalized=True, mondrian=False)

… and similarly for the test objects:

[37]:
sigmas_test_knn_std = de_knn_std.apply(X_test)

intervals_norm_knn_std = cr_norm_knn_std.predict(y_hat_test,
                                                 sigmas=sigmas_test_knn_std,
                                                 y_min=0, y_max=1)

display(intervals_norm_knn_std)
array([[0.02219202, 0.05670956],
       [0.03946534, 0.06281986],
       [0.017733  , 0.1493274 ],
       ...,
       [0.03790202, 0.08848645],
       [0.08780434, 0.14162511],
       [0.02370467, 0.059565  ]])

A third option is to use (absolute) residuals for the reference objects. For a model that overfits the training data, it can be a good idea to use a separate set of (reference) objects and labels from which the residuals can be calculated, rather than using the original training data. Since we have trained a random forest in this case, we opt to estimate the residuals using the out-of-bag predictions for the training instances. (This was made possible by setting oob_score=True for the RandomForestRegressor above.)

To inform the fit method that this is what we want to do, we provide a value for residuals instead of y, as we did above for the option using (the standard deviation of) the targets.

[38]:
oob_predictions = learner_prop.oob_prediction_

residuals_oob = y_prop_train - oob_predictions

de_knn_res = DifficultyEstimator()

de_knn_res.fit(X=X_prop_train, residuals=residuals_oob, scaler=True)

display(de_knn_res)

sigmas_cal_knn_res = de_knn_res.apply(X_cal)

cr_norm_knn_res = ConformalRegressor()

cr_norm_knn_res.fit(residuals_cal, sigmas=sigmas_cal_knn_res)

display(cr_norm_knn_res)
DifficultyEstimator(fitted=True, type=knn, k=25, target=residuals, scaler=True, beta=0.01, oob=False)
ConformalRegressor(fitted=True, normalized=True, mondrian=False)

… and again, the difficulty estimates are formed in the same way for the test objects:

[39]:
sigmas_test_knn_res = de_knn_res.apply(X_test)

intervals_norm_knn_res = cr_norm_knn_res.predict(y_hat_test,
                                                 sigmas=sigmas_test_knn_res,
                                                 y_min=0, y_max=1)

display(intervals_norm_knn_res)
array([[0.02223252, 0.05666906],
       [0.02612052, 0.07616468],
       [0.02236831, 0.14469208],
       ...,
       [0.03517   , 0.09121846],
       [0.0888693 , 0.14056015],
       [0.01518696, 0.0680827 ]])

In case we have trained an ensemble model, like a RandomForestRegressor, we could alternatively request DifficultyEstimator to estimate the difficulty by the variance of the predictions of the constituent models. This requires us to provide the trained model as the learner argument to fit, assuming that learner.estimators_ is a collection of base models, each implementing the predict method; this holds, e.g., for RandomForestRegressor. A set of objects (X) has to be provided only if we employ scaling (scaler=True).

[40]:
de_var = DifficultyEstimator()

de_var.fit(X=X_prop_train, learner=learner_prop, scaler=True)

display(de_var)

sigmas_cal_var = de_var.apply(X_cal)

cr_norm_var = ConformalRegressor()

cr_norm_var.fit(residuals_cal, sigmas=sigmas_cal_var)

display(cr_norm_var)
DifficultyEstimator(fitted=True, type=variance, scaler=True, beta=0.01, oob=False)
ConformalRegressor(fitted=True, normalized=True, mondrian=False)

The difficulty estimates for the test set are generated in the same way:

[41]:
sigmas_test_var = de_var.apply(X_test)

intervals_norm_var = cr_norm_var.predict(y_hat_test,
                                         sigmas=sigmas_test_var,
                                         y_min=0, y_max=1)

display(intervals_norm_var)
array([[0.0225036 , 0.05639797],
       [0.03261736, 0.06966783],
       [0.04999298, 0.11706741],
       ...,
       [0.03915709, 0.08723137],
       [0.06012566, 0.16930379],
       [0.02448275, 0.05878691]])

Mondrian conformal regressors#

An alternative way of generating prediction intervals of varying size is to divide the object space into non-overlapping so-called Mondrian categories. A Mondrian conformal regressor is formed by providing the names of the categories as an additional argument, named bins, for the fit method.

Here we employ the helper function binning, imported from crepes.extras, which given a list/array of values returns an array of the same length with the assigned bins. If the optional argument bins is an integer, the function will divide the values into equal-sized bins and return both the assigned bins and the bin boundaries. If bins instead is a set of bin boundaries, the function will just return the assigned bins.

We can form the Mondrian categories in any way we like, as long as we only use information that is available for both calibration and test instances; this means that we may not use the target values for this purpose, since these will typically not be available for the test instances. We will form categories by binning the difficulty estimates, here using the ones previously produced from the variance of the constituent model predictions (sigmas_cal_var).

[42]:
bins_cal, bin_thresholds = binning(sigmas_cal_var, bins=20)

cr_mond = ConformalRegressor()

cr_mond.fit(residuals_cal, bins=bins_cal)

display(cr_mond)
ConformalRegressor(fitted=True, normalized=False, mondrian=True)

Let us now obtain the categories for the test instances using the same Mondrian categorization, i.e., bin borders.

[43]:
bins_test = binning(sigmas_test_var, bins=bin_thresholds)

… and now we can form prediction intervals for the test instances.

[44]:
intervals_mond = cr_mond.predict(y_hat_test, bins=bins_test, y_min=0, y_max=1)

display(intervals_mond)
array([[0.02249055, 0.05641103],
       [0.03195239, 0.07033281],
       [0.04040065, 0.12665975],
       ...,
       [0.03314223, 0.09324623],
       [0.05171936, 0.17771009],
       [0.0246746 , 0.05859507]])

Investigating the prediction intervals#

Let us first put all the intervals in a dictionary.

[45]:
prediction_intervals = {
    "Std CR":intervals_std,
    "Norm CR knn dist":intervals_norm_knn_dist,
    "Norm CR knn std":intervals_norm_knn_std,
    "Norm CR knn res":intervals_norm_knn_res,
    "Norm CR var":intervals_norm_var,
    "Mond CR":intervals_mond,
}

Let us see what fraction of the intervals contain the true targets and how large the intervals are.

[46]:
coverages = []
mean_sizes = []
median_sizes = []

for name in prediction_intervals.keys():
    intervals = prediction_intervals[name]
    coverages.append(np.sum([1 if (y_test[i]>=intervals[i,0] and
                                   y_test[i]<=intervals[i,1]) else 0
                            for i in range(len(y_test))])/len(y_test))
    mean_sizes.append((intervals[:,1]-intervals[:,0]).mean())
    median_sizes.append(np.median((intervals[:,1]-intervals[:,0])))

pred_int_df = pd.DataFrame({"Coverage":coverages,
                            "Mean size":mean_sizes,
                            "Median size":median_sizes},
                           index=list(prediction_intervals.keys()))

pred_int_df.loc["Mean"] = [pred_int_df["Coverage"].mean(),
                           pred_int_df["Mean size"].mean(),
                           pred_int_df["Median size"].mean()]

display(pred_int_df.round(4))
Coverage Mean size Median size
Std CR 0.9508 0.0629 0.0647
Norm CR knn dist 0.9497 0.0593 0.0485
Norm CR knn std 0.9415 0.0555 0.0437
Norm CR knn res 0.9481 0.0552 0.0440
Norm CR var 0.9502 0.0524 0.0362
Mond CR 0.9589 0.0541 0.0379
Mean 0.9499 0.0566 0.0458

Let us look at the distribution of the interval sizes.

[47]:
interval_sizes = {}
for name in prediction_intervals.keys():
    interval_sizes[name] = prediction_intervals[name][:,1] \
    - prediction_intervals[name][:,0]

plt.figure(figsize=(6,6))
plt.ylabel("CDF")
plt.xlabel("Interval sizes")
plt.xlim(0,interval_sizes["Mond CR"].max()*1.25)

colors = ["b","r","g","y","k","m","c","orange"]

for i, name in enumerate(interval_sizes.keys()):
    if "Std" in name:
        style = "dotted"
    else:
        style = "solid"
    plt.plot(np.sort(interval_sizes[name]),
             [i/len(interval_sizes[name])
              for i in range(1,len(interval_sizes[name])+1)],
             linestyle=style, c=colors[i], label=name)

plt.legend()
plt.show()
_images/crepes_nb_110_0.png

Evaluating the conformal regressors#

Let us put the six conformal regressors above in a dictionary, together with the corresponding difficulty estimates for the test instances (if any).

[48]:
all_cr = {
    "Std CR": (cr_std, []),
    "Norm CR knn dist": (cr_norm_knn_dist, sigmas_test_knn_dist),
    "Norm CR knn std": (cr_norm_knn_std, sigmas_test_knn_std),
    "Norm CR knn res": (cr_norm_knn_res, sigmas_test_knn_res),
    "Norm CR var" : (cr_norm_var, sigmas_test_var),
    "Mond CR": (cr_mond, sigmas_test_var),
}

Let us evaluate them using three confidence levels on the test set. We could specify a subset of the metrics to use by the named metrics argument of the evaluate method; here we use all, which is the default.

Note that the arguments sigmas and bins can always be provided, but they will be ignored by conformal regressors not using them, e.g., both arguments will be ignored by the standard conformal regressors.

[49]:
confidence_levels = [0.9,0.95,0.99]

names = list(all_cr.keys())

all_results = {}

for confidence in confidence_levels:
    for name in names:
        all_results[(name,confidence)] = all_cr[name][0].evaluate(
        y_hat_test, y_test, sigmas=all_cr[name][1],
        bins=bins_test, confidence=confidence,
        y_min=0, y_max=1)

results_df = pd.DataFrame(columns=pd.MultiIndex.from_product(
    [names,confidence_levels]), index=list(list(
    all_results.values())[0].keys()))

for key in all_results.keys():
    results_df[key] = all_results[key].values()

display(results_df.round(4))
Std CR Norm CR knn dist Norm CR knn std Norm CR knn res Norm CR var Mond CR
0.90 0.95 0.99 0.90 0.95 0.99 0.90 0.95 0.99 0.90 0.95 0.99 0.90 0.95 0.99 0.90 0.95 0.99
error 0.1007 0.0492 0.0093 0.1044 0.0503 0.0119 0.1036 0.0585 0.0119 0.1012 0.0519 0.0089 0.1033 0.0498 0.0122 0.0936 0.0411 0.0066
eff_mean 0.0428 0.0629 0.1206 0.0450 0.0593 0.0913 0.0438 0.0555 0.0904 0.0425 0.0552 0.0916 0.0406 0.0524 0.0765 0.0415 0.0541 0.0895
eff_med 0.0430 0.0647 0.1215 0.0367 0.0485 0.0750 0.0344 0.0437 0.0722 0.0337 0.0440 0.0738 0.0277 0.0362 0.0543 0.0298 0.0379 0.0620
time_fit 0.0002 0.0002 0.0002 0.0001 0.0001 0.0001 0.0001 0.0001 0.0001 0.0001 0.0001 0.0001 0.0001 0.0001 0.0001 0.0002 0.0002 0.0002
time_evaluate 0.0003 0.0002 0.0002 0.0002 0.0003 0.0002 0.0002 0.0003 0.0002 0.0002 0.0002 0.0003 0.0002 0.0002 0.0002 0.0006 0.0006 0.0006
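
If only a few of these metrics are of interest, they may be requested explicitly through the metrics argument mentioned above; a minimal sketch, reusing the fitted cr_norm_var and assuming the metric names match those in the table:

results_subset = cr_norm_var.evaluate(y_hat_test, y_test,
                                      sigmas=sigmas_test_var,
                                      confidence=0.95, y_min=0, y_max=1,
                                      metrics=["error", "eff_mean"])

display(results_subset)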

Conformal regressors without a separate calibration set#

For conformal regressors that employ learners that use bagging, like random forests, we may consider an alternative strategy to dividing the original training set into a proper training and calibration set; we may use the out-of-bag (OOB) predictions, which allow us to use the full training set for both model building and calibration. It should be noted that this strategy does not come with the theoretical validity guarantee of the above (inductive) conformal regressors, since calibration and test instances are not handled in exactly the same way. In practice, however, conformal regressors based on out-of-bag predictions rarely fail to meet the coverage requirements.

Standard conformal regressors with out-of-bag calibration#

Let us first generate a model from the full training set and then get the residuals using the OOB predictions; we rely on the learner having an attribute oob_prediction_, which, e.g., is the case for a RandomForestRegressor if oob_score is set to True when it is created.

[50]:
learner_full = RandomForestRegressor(n_jobs=-1, n_estimators=500,
                                     oob_score=True)

learner_full.fit(X_train, y_train)
[50]:
RandomForestRegressor(n_estimators=500, n_jobs=-1, oob_score=True)

Now we can obtain the residuals.

[51]:
oob_predictions = learner_full.oob_prediction_

residuals_oob = y_train - oob_predictions

We may now obtain a standard conformal regressor from these OOB residuals:

[52]:
cr_std_oob = ConformalRegressor()

cr_std_oob.fit(residuals_oob)
[52]:
ConformalRegressor(fitted=True, normalized=False, mondrian=False)

… and apply it using the point predictions of the full model.

[53]:
y_hat_full = learner_full.predict(X_test)

intervals_std_oob = cr_std_oob.predict(y_hat_full, y_min=0, y_max=1)

display(intervals_std_oob)
array([[0.00814877, 0.0707599 ],
       [0.01988226, 0.08249339],
       [0.05352123, 0.11613236],
       ...,
       [0.03145775, 0.09406887],
       [0.08269424, 0.14530537],
       [0.00983793, 0.07244906]])

Normalized conformal regressors with out-of-bag calibration#

We may also generate normalized conformal regressors from the OOB predictions. The DifficultyEstimator can also be used for this purpose; for the k-nearest neighbor approaches, the difficulty of each object in the training set will be computed using a leave-one-out procedure, while for the variance-based approach the out-of-bag predictions will be employed.

By setting oob=True, we inform the fit method that we may later request difficulty estimates for the provided set of objects; these are retrieved by calling the apply method without providing any objects.

Let us start with the k-nearest neighbor approach using distances only.

[54]:
de_knn_dist_oob = DifficultyEstimator()

de_knn_dist_oob.fit(X=X_train, scaler=True, oob=True)

display(de_knn_dist_oob)

sigmas_knn_dist_oob = de_knn_dist_oob.apply()

cr_norm_knn_dist_oob = ConformalRegressor()

cr_norm_knn_dist_oob.fit(residuals_oob, sigmas=sigmas_knn_dist_oob)
DifficultyEstimator(fitted=True, type=knn, k=25, target=none, scaler=True, beta=0.01, oob=True)
[54]:
ConformalRegressor(fitted=True, normalized=True, mondrian=False)

In order to apply the normalized OOB regressors to the test set, we need to generate difficulty estimates for the latter too.

[55]:
sigmas_test_knn_dist_oob = de_knn_dist_oob.apply(X_test)

intervals_norm_knn_dist_oob = cr_norm_knn_dist_oob.predict(
    y_hat_full, sigmas=sigmas_test_knn_dist_oob, y_min=0, y_max=1)

display(intervals_norm_knn_dist_oob)
array([[0.02198504, 0.05692363],
       [0.02899582, 0.07337983],
       [0.02958131, 0.14007228],
       ...,
       [0.02659464, 0.09893198],
       [0.07550386, 0.15249575],
       [0.00640295, 0.07588404]])

For completeness, we will illustrate the use of out-of-bag calibration for the remaining approaches too. For k-nearest neighbors with labels, we do the following:

[56]:
de_knn_std_oob = DifficultyEstimator()

de_knn_std_oob.fit(X=X_train, y=y_train, scaler=True, oob=True)

display(de_knn_std_oob)

sigmas_knn_std_oob = de_knn_std_oob.apply()

cr_norm_knn_std_oob = ConformalRegressor()

cr_norm_knn_std_oob.fit(residuals=residuals_oob, sigmas=sigmas_knn_std_oob)

sigmas_test_knn_std_oob = de_knn_std_oob.apply(X_test)

intervals_norm_knn_std_oob = cr_norm_knn_std_oob.predict(
    y_hat_full, sigmas=sigmas_test_knn_std_oob, y_min=0, y_max=1)

display(intervals_norm_knn_std_oob)
DifficultyEstimator(fitted=True, type=knn, k=25, target=labels, scaler=True, beta=0.01, oob=True)
array([[0.02399253, 0.05491614],
       [0.03598183, 0.06639382],
       [0.        , 0.19467639],
       ...,
       [0.03590201, 0.08962461],
       [0.08936738, 0.13863223],
       [0.02082477, 0.06146222]])

A third option is to use k-nearest neighbors with (OOB) residuals:

[57]:
de_knn_res_oob = DifficultyEstimator()

de_knn_res_oob.fit(X=X_train, residuals=residuals_oob, scaler=True, oob=True)

display(de_knn_res_oob)

sigmas_knn_res_oob = de_knn_res_oob.apply()

cr_norm_knn_res_oob = ConformalRegressor()

cr_norm_knn_res_oob.fit(residuals_oob, sigmas=sigmas_knn_res_oob)

sigmas_test_knn_res_oob = de_knn_res_oob.apply(X_test)

intervals_norm_knn_res_oob = cr_norm_knn_res_oob.predict(
    y_hat_full, sigmas=sigmas_test_knn_res_oob, y_min=0, y_max=1)

display(intervals_norm_knn_res_oob)
DifficultyEstimator(fitted=True, type=knn, k=25, target=residuals, scaler=True, beta=0.01, oob=True)
array([[0.02662119, 0.05228748],
       [0.03111808, 0.07125757],
       [0.01159487, 0.15805872],
       ...,
       [0.03387443, 0.0916522 ],
       [0.09033022, 0.13766939],
       [0.01618463, 0.06610236]])

A fourth and final option for the normalized conformal regressors is to use variance as a difficulty estimate. We then leave labels and residuals out, but provide an (ensemble) learner. In contrast to when oob=False, we are here required to provide the (full) training set, from which the variance of the out-of-bag predictions will be computed. When applied to the test set, the full ensemble model will not be used to obtain the difficulty estimates, but instead a subset of the constituent models is used, following what could be seen as post hoc assignment of each test instance to a bag.

[58]:
de_var_oob = DifficultyEstimator()

de_var_oob.fit(X=X_train, learner=learner_full, scaler=True, oob=True)

display(de_var_oob)

sigmas_var_oob = de_var_oob.apply()

cr_norm_var_oob = ConformalRegressor()

cr_norm_var_oob.fit(residuals_oob, sigmas=sigmas_var_oob)

sigmas_test_var_oob = de_var_oob.apply(X_test)

intervals_norm_var_oob = cr_norm_var_oob.predict(y_hat_full,
                                                 sigmas=sigmas_test_var_oob,
                                                 y_min=0, y_max=1)

display(intervals_norm_var_oob)
DifficultyEstimator(fitted=True, type=variance, scaler=True, beta=0.01, oob=True)
array([[0.0229227 , 0.05598598],
       [0.03229415, 0.0700815 ],
       [0.03462977, 0.13502382],
       ...,
       [0.04168669, 0.08383993],
       [0.07739326, 0.15060635],
       [0.02446768, 0.05781931]])

Mondrian conformal regressors with out-of-bag calibration#

We may form the categories using the difficulty estimates obtained from the OOB predictions. Here we consider only the difficulty estimates produced by the fourth option above (using variance).

[59]:
bins_oob, bin_thresholds_oob = binning(sigmas_var_oob, bins=20)

cr_mond_oob = ConformalRegressor()

cr_mond_oob.fit(residuals=residuals_oob, bins=bins_oob)
[59]:
ConformalRegressor(fitted=True, normalized=False, mondrian=True)

… and assign the categories for the test instances, …

[60]:
bins_test_oob = binning(sigmas_test_var_oob, bins=bin_thresholds_oob)

… and finally generate the prediction intervals.

[61]:
intervals_mond_oob = cr_mond_oob.predict(y_hat_full,
                                         bins=bins_test_oob,
                                         y_min=0, y_max=1)

display(intervals_mond_oob)
array([[0.02499452, 0.05391415],
       [0.03045288, 0.07192277],
       [0.03226566, 0.13738793],
       ...,
       [0.03773981, 0.08778681],
       [0.06143867, 0.16656094],
       [0.02732101, 0.05496598]])

Investigating the OOB prediction intervals#

[62]:
prediction_intervals = {
    "Std CR OOB":intervals_std_oob,
    "Norm CR knn dist OOB":intervals_norm_knn_dist_oob,
    "Norm CR knn std OOB":intervals_norm_knn_std_oob,
    "Norm CR knn res OOB":intervals_norm_knn_res_oob,
    "Norm CR var OOB":intervals_norm_var_oob,
    "Mond CR OOB":intervals_mond_oob,
}

Let us see what fraction of the intervals contain the true targets and how large the intervals are.

[63]:
coverages = []
mean_sizes = []
median_sizes = []

for name in prediction_intervals.keys():
    intervals = prediction_intervals[name]
    coverages.append(np.sum([1 if (y_test[i]>=intervals[i,0] and
                                   y_test[i]<=intervals[i,1]) else 0
                            for i in range(len(y_test))])/len(y_test))
    mean_sizes.append((intervals[:,1]-intervals[:,0]).mean())
    median_sizes.append(np.median((intervals[:,1]-intervals[:,0])))

pred_int_df = pd.DataFrame({"Coverage":coverages,
                            "Mean size":mean_sizes,
                            "Median size":median_sizes},
                           index=list(prediction_intervals.keys()))

pred_int_df.loc["Mean"] = [pred_int_df["Coverage"].mean(),
                           pred_int_df["Mean size"].mean(),
                           pred_int_df["Median size"].mean()]

display(pred_int_df.round(4))
Coverage Mean size Median size
Std CR OOB 0.9506 0.0610 0.0626
Norm CR knn dist OOB 0.9486 0.0568 0.0467
Norm CR knn std OOB 0.9477 0.0585 0.0455
Norm CR knn res OOB 0.9498 0.0528 0.0418
Norm CR var OOB 0.9493 0.0499 0.0347
Mond CR OOB 0.9558 0.0524 0.0365
Mean 0.9503 0.0552 0.0446

Let us look at the distribution of the interval sizes.

[64]:
interval_sizes = {}
for name in prediction_intervals.keys():
    interval_sizes[name] = prediction_intervals[name][:,1] \
    - prediction_intervals[name][:,0]

plt.figure(figsize=(6,6))
plt.ylabel("CDF")
plt.xlabel("Interval sizes")
plt.xlim(0,interval_sizes["Mond CR OOB"].max()*1.25)

colors = ["b","r","g","y","k","m","c","orange"]

for i, name in enumerate(interval_sizes.keys()):
    if "Std" in name:
        style = "dotted"
    else:
        style = "solid"
    plt.plot(np.sort(interval_sizes[name]),
             [i/len(interval_sizes[name])
              for i in range(1,len(interval_sizes[name])+1)],
             linestyle=style, c=colors[i], label=name)

plt.legend()
plt.show()
_images/crepes_nb_150_0.png

Conformal Predictive Systems (CPS)#

Creating and fitting CPS#

Let us create and fit standard and normalized conformal predictive systems, using the residuals from the calibration set (as obtained in the previous section), as well as two conformal predictive systems using out-of-bag residuals, with and without normalization. As can be seen, the input for fitting conformal predictive systems has the same format as for the conformal regressors.

[65]:
cps_std = ConformalPredictiveSystem().fit(residuals_cal)

cps_norm = ConformalPredictiveSystem().fit(residuals_cal,
                                           sigmas=sigmas_cal_var)

cps_std_oob = ConformalPredictiveSystem().fit(residuals_oob)

cps_norm_oob = ConformalPredictiveSystem().fit(residuals_oob,
                                               sigmas=sigmas_var_oob)

Let us also create some Mondrian CPS, but in contrast to the Mondrian conformal regressors above, we here form the categories through binning of the predictions rather than binning of the difficulty estimates. We may use the latter, i.e., the sigmas, to obtain a normalized CPS for each category (bin).

[66]:
bins_cal, bin_thresholds = binning(y_hat_cal, bins=5)

cps_mond_std = ConformalPredictiveSystem().fit(residuals_cal,
                                               bins=bins_cal)

cps_mond_norm = ConformalPredictiveSystem().fit(residuals_cal,
                                                sigmas=sigmas_cal_var,
                                                bins=bins_cal)


bins_oob, bin_thresholds_oob = binning(oob_predictions, bins=5)

cps_mond_std_oob = ConformalPredictiveSystem().fit(residuals_oob,
                                                   bins=bins_oob)

cps_mond_norm_oob = ConformalPredictiveSystem().fit(residuals_oob,
                                                    sigmas=sigmas_var_oob,
                                                    bins=bins_oob)

Making predictions#

For the normalized approaches, we already have the difficulty estimates which are needed for the test instances. For the Mondrian approaches, we also need to assign the new categories to the test instances.

[67]:
bins_test = binning(y_hat_test, bins=bin_thresholds)

bins_test_oob = binning(y_hat_full, bins=bin_thresholds_oob)

The output of the predict method of a ConformalPredictiveSystem will depend on how we specify the input. If we provide specific target values (using the parameter y), the method will output a p-value for each test instance, i.e., the probability that the true target is less than or equal to the provided values. The method assumes that either one value is provided for each test instance or that the same (single) value is provided for all test instances.

Here we will obtain the p-values from cps_mond_norm for the true targets of the test set:

[68]:
p_values = cps_mond_norm.predict(y_hat_test,
                                 sigmas=sigmas_test_knn_res,
                                 bins=bins_test,
                                 y=y_test)

display(p_values)
array([0.45371406, 0.51474318, 0.65230468, ..., 0.42353045, 0.32244079,
       0.53352936])
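
The single-value form mentioned above is not shown in the cell; as a sketch, the estimated probability that each true target is at most some arbitrarily chosen threshold, say 0.1, could be obtained by:

p_values_at_0_1 = cps_mond_norm.predict(y_hat_test,
                                        sigmas=sigmas_test_knn_res,
                                        bins=bins_test,
                                        y=0.1)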

If we would instead like to get threshold values, with a specified probability that the true target is less than the threshold for each test instance, we may provide percentiles as input to the predict method. This is done through the parameter lower_percentiles, which denotes (one or more) percentiles for which a lower value will be selected in case a percentile lies between two values (similar to interpolation="lower" in numpy.percentile), or through higher_percentiles, which denotes (one or more) percentiles for which a higher value will be selected in such cases (similar to interpolation="higher" in numpy.percentile).

Here we will obtain the lowest values from cps_mond_norm such that the probability of the target values being less than these is at least 50%:

[69]:
thresholds = cps_mond_norm.predict(y_hat_test,
                                   sigmas=sigmas_test_knn_res,
                                   bins=bins_test,
                                   higher_percentiles=50)

display(thresholds)
array([0.0353742 , 0.04159038, 0.06926509, ..., 0.05569537, 0.10868666,
       0.03537305])

We can also specify both target values and percentiles; the resulting p-values will be returned in the first column, while any values corresponding to the lower percentiles will be included in the subsequent columns, followed by columns containing the values corresponding to the higher percentiles. The following call hence results in an array with five columns:

[70]:
results = cps_mond_norm.predict(y_hat_test,
                                sigmas=sigmas_test_knn_res,
                                bins=bins_test,
                                y=y_test,
                                lower_percentiles=[2.5, 5],
                                higher_percentiles=[95, 97.5])

display(results)
array([[ 0.4538964 , -0.03010919, -0.01788887,  0.08723634,  0.09914368],
       [ 0.51497133, -0.0524381 , -0.03692873,  0.15441566,  0.18511967],
       [ 0.65150093, -0.21153798, -0.15907945,  0.38112859,  0.48509289],
       ...,
       [ 0.42332248, -0.10429826, -0.06093992,  0.17236863,  0.21616749],
       [ 0.32210591, -0.00997342,  0.0121942 ,  0.24047207,  0.28440468],
       [ 0.53340096, -0.06521173, -0.04644089,  0.1150351 ,  0.1333252 ]])

In addition to p-values and threshold values, we can request that the predict method returns the full conformal predictive distribution (CPD) for each test instance, as defined by the threshold values, by setting return_cpds=True. The format of the distributions varies with the type of conformal predictive system; for a standard and normalized CPS, the output is an array with a row for each test instance and a column for each calibration instance (residual), while for a Mondrian CPS, the default output is a vector containing one CPD per test instance (since the number of values may vary between categories). If the desired output instead is an array of distributions per category, where all distributions in a category have the same number of columns (which in turn depends on the number of calibration instances in the corresponding category), then cpds_by_bins=True may be specified. In case return_cpds=True is specified together with y, lower_percentiles or higher_percentiles, the output of predict will be a pair, with the first element holding the results of the above type and the second element containing the CPDs.

For the above Mondrian CPS, the following call to predict will result in a vector of distributions, with one element for each test instance.

[71]:
cpds = cps_mond_norm.predict(y_hat_test,
                             sigmas=sigmas_test_knn_res,
                             bins=bins_test,
                             return_cpds=True)

print(f"No. of test instances: {len(y_hat_test)}")
print(f"Shape of cpds: {cpds.shape}")
No. of test instances: 10807
Shape of cpds: (10807,)

If we would instead prefer to represent these distributions by one array per category, we set cpds_by_bins=True, noting that it will then be a bit trickier to associate a test instance with a specific distribution.

[72]:
cpds = cps_mond_norm.predict(y_hat_test,
                             sigmas=sigmas_test_knn_res,
                             bins=bins_test,
                             return_cpds=True,
                             cpds_by_bins=True)

for i, cpd in enumerate(cpds):
    print(f"bin {i}: {cpd.shape[0]} test instances, {cpd.shape[1]} threshold values")

print(f"No. of test instances: {sum([c.shape[0] for c in cpds])}")
bin 0: 1922 test instances, 541 threshold values
bin 1: 2235 test instances, 540 threshold values
bin 2: 2288 test instances, 540 threshold values
bin 3: 2213 test instances, 540 threshold values
bin 4: 2149 test instances, 541 threshold values
No. of test instances: 10807

We may also plot the conformal predictive distribution for some test object. In case the calibration set is very large, you may consider plotting an approximation of the full distribution by using a grid of values for lower_percentiles or higher_percentiles, instead of setting return_cpds=True. For the Mondrian CPS, the size of the calibration set for each bin is reasonable in this case, so we may just use the distributions directly.

[73]:
cpds = cps_mond_norm_oob.predict(y_hat_full,
                                 bins=bins_test_oob,
                                 sigmas=sigmas_test_var_oob,
                                 return_cpds=True)

test_index = np.random.randint(len(y_hat_full)) # A test object is randomly selected
cpd = cpds[test_index]

p = np.array([i/len(cpd) for i in range(len(cpd))])

lower_index = np.where(p<=0.025)[0][-1]
mid_index = np.where(p>=0.50)[0][0]
upper_index = np.where(p>=0.975)[0][0]

low_percentile = cpd[lower_index]
median = cpd[mid_index]
high_percentile = cpd[upper_index]

plt.figure(figsize=(6,6))
plt.plot([y_hat_full[test_index],y_hat_full[test_index]],[0,1], color="tab:orange")
plt.plot([y_test[test_index],y_test[test_index]],[0,1], color="tab:red")
plt.xlabel("y")
plt.ylabel("Q(y)")
plt.ylim(0,1)

plt.plot([median,median],[0,1],"g--")
plt.plot([low_percentile,low_percentile],[0,1],"y--")
plt.legend(["ŷ","target","$y_{0.5}$","[$y_{0.025}$,$y_{0.975}$]"])
plt.plot([high_percentile,high_percentile],[0,1],"y--")
plt.plot(cpd,p, color="tab:blue")
rectangle = plt.Rectangle((low_percentile,0),
                          abs(high_percentile-low_percentile),1, color="y",
                          alpha=0.05)
plt.gca().add_patch(rectangle)
plt.show()
_images/crepes_nb_171_0.png
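
Had the calibration sets been larger, the grid-based approximation mentioned above could be sketched as follows, using a hypothetical grid of percentiles instead of the full distributions:

# hypothetical percentile grid; each column of the result corresponds to one percentile
percentile_grid = np.linspace(1, 99, 99)

approx_cpds = cps_mond_norm_oob.predict(y_hat_full,
                                        bins=bins_test_oob,
                                        sigmas=sigmas_test_var_oob,
                                        higher_percentiles=percentile_grid)

# approx_cpds[test_index] could then be plotted against percentile_grid/100,
# analogously to the full CPD above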

Analyzing the p-values#

Let us put all the generated CPS in a dictionary.

[74]:
all_cps = {"Std CPS":cps_std,
           "Std OOB CPS":cps_std_oob,
           "Norm CPS":cps_norm,
           "Norm OOB CPS":cps_norm_oob,
           "Mond CPS":cps_mond_std,
           "Mond OOB CPS":cps_mond_std_oob,
           "Mond norm CPS":cps_mond_norm,
           "Mond norm OOB CPS":cps_mond_norm_oob
          }

Now we will check if the p-values for the test targets seem to be uniformly distributed.

[75]:
plt.subplots(len(all_cps.keys())//2,2,figsize=(12,18))

for i, name in enumerate(all_cps.keys()):

    if "OOB" in name:
        p_values = all_cps[name].predict(y_hat_full,
                                         sigmas=sigmas_test_var_oob,
                                         bins=bins_test_oob,
                                         y=y_test)
    else:
        p_values = all_cps[name].predict(y_hat_test,
                                         sigmas=sigmas_test_var,
                                         bins=bins_test,
                                         y=y_test)

    plt.subplot(len(all_cps.keys())//2,2,i+1)

    plt.scatter(np.sort(p_values),
                [(i+1)/len(y_test) for i in range(len(y_test))],
                label=name, c="y", marker=".", alpha=0.1)

    plt.plot([0,1],[0,1],"r--")
    plt.legend()
    plt.ylabel("fraction")
    plt.xlabel("p value")

plt.show()
_images/crepes_nb_176_0.png

Investigating the coverage and size of extracted prediction intervals#

Let us investigate the extracted prediction intervals at the 95% confidence level. This is done by specifying percentiles corresponding to the interval endpoints.

[76]:
all_cps_intervals = {}

coverages = []
mean_sizes = []
median_sizes = []

for name in all_cps.keys():
    if "OOB" in name:
        intervals = all_cps[name].predict(y_hat_full,
                                          sigmas=sigmas_test_var_oob,
                                          bins=bins_test_oob,
                                          lower_percentiles=2.5,
                                          higher_percentiles=97.5,
                                          y_min=0, y_max=1)
    else:
        intervals = all_cps[name].predict(y_hat_test,
                                          sigmas=sigmas_test_var,
                                          bins=bins_test,
                                          lower_percentiles=2.5,
                                          higher_percentiles=97.5,
                                          y_min=0, y_max=1)
    all_cps_intervals[name] = intervals
    coverages.append(np.sum([1 if (y_test[i]>=intervals[i,0] and
                                   y_test[i]<=intervals[i,1]) else 0
                            for i in range(len(y_test))])/len(y_test))
    mean_sizes.append((intervals[:,1]-intervals[:,0]).mean())
    median_sizes.append(np.median((intervals[:,1]-intervals[:,0])))

pred_int_df = pd.DataFrame({"Coverage":coverages,
                            "Mean size":mean_sizes,
                            "Median size":median_sizes},
                           index=list(all_cps_intervals.keys()))

pred_int_df.loc["Mean"] = [pred_int_df["Coverage"].mean(),
                           pred_int_df["Mean size"].mean(),
                           pred_int_df["Median size"].mean()]

display(pred_int_df.round(4))
Coverage Mean size Median size
Std CPS 0.9545 0.0668 0.0683
Std OOB CPS 0.9525 0.0621 0.0631
Norm CPS 0.9485 0.0521 0.0359
Norm OOB CPS 0.9508 0.0511 0.0355
Mond CPS 0.9489 0.0589 0.0397
Mond OOB CPS 0.9502 0.0567 0.0399
Mond norm CPS 0.9514 0.0567 0.0374
Mond norm OOB CPS 0.9540 0.0540 0.0364
Mean 0.9514 0.0573 0.0445

Investigating the distributions of extracted prediction intervals#

Let us take a look at the distribution of the interval sizes.

[77]:
cps_interval_sizes = {}

for name in all_cps_intervals.keys():
    cps_interval_sizes[name] = \
    all_cps_intervals[name][:,1] - all_cps_intervals[name][:,0]

plt.figure(figsize=(6,6))
plt.ylabel("CDF")
plt.xlabel("Interval sizes")
plt.xlim(0,cps_interval_sizes["Mond OOB CPS"].max()*1.25)

colors = ["b","r","g","y","k","m", "gray", "orange"]

for i, name in enumerate(cps_interval_sizes.keys()):
    if "Std" in name:
        style = "dotted"
    else:
        style = "solid"
    plt.plot(np.sort(cps_interval_sizes[name]),
             [i/len(cps_interval_sizes[name])
              for i in range(1,len(cps_interval_sizes[name])+1)],
             linestyle=style, c=colors[i], label=name)

plt.legend()
plt.show()
_images/crepes_nb_182_0.png

Extracting medians#

Let us take a look at the medians; they can be derived using either lower or higher interpolation, but ideally the differences should be small.

[78]:
all_cps_medians = {}

for name in all_cps.keys():
    if "OOB" in name:
        medians = all_cps[name].predict(y_hat_full,
                                        sigmas=sigmas_test_var_oob,
                                        bins=bins_test_oob,
                                        lower_percentiles=50,
                                        higher_percentiles=50)
    else:
        medians = all_cps[name].predict(y_hat_test,
                                        sigmas=sigmas_test_var,
                                        bins=bins_test,
                                        lower_percentiles=50,
                                        higher_percentiles=50)
    all_cps_medians[name] = medians
    print(name)
    print("\tMean difference of the medians:    {:.6f}".format((medians[:,1]-medians[:,0]).mean()))
    print("\tLargest difference of the medians: {:.6f}".format((medians[:,1]-medians[:,0]).max()))
Std CPS
        Mean difference of the medians:    0.000002
        Largest difference of the medians: 0.000002
Std OOB CPS
        Mean difference of the medians:    0.000001
        Largest difference of the medians: 0.000001
Norm CPS
        Mean difference of the medians:    0.000014
        Largest difference of the medians: 0.000815
Norm OOB CPS
        Mean difference of the medians:    0.000003
        Largest difference of the medians: 0.000150
Mond CPS
        Mean difference of the medians:    0.000019
        Largest difference of the medians: 0.000058
Mond OOB CPS
        Mean difference of the medians:    0.000001
        Largest difference of the medians: 0.000003
Mond norm CPS
        Mean difference of the medians:    0.000013
        Largest difference of the medians: 0.000349
Mond norm OOB CPS
        Mean difference of the medians:    0.000001
        Largest difference of the medians: 0.000004

Another view of the medians and prediction intervals#

[79]:
plt.subplots(len(all_cps_intervals.keys())//2,2,figsize=(12,24))

sorted_prop_indexes = np.argsort(y_hat_test)

sorted_full_indexes = np.argsort(y_hat_full)

alpha=0.2

for i, name in enumerate(all_cps_intervals.keys()):

    plt.subplot(len(all_cps_intervals.keys())//2,2,i+1)
    if "OOB" in name:
        indexes = sorted_full_indexes
        y_hat_ = y_hat_full
    else:
        indexes = sorted_prop_indexes
        y_hat_ = y_hat_test

    plt.title(name)
    plt.plot(y_hat_[indexes], all_cps_intervals[name][indexes,0],
             color="r", alpha=alpha)
    plt.plot(y_hat_[indexes], all_cps_intervals[name][indexes,1],
             color="r", alpha=alpha)
    plt.scatter(y_hat_[indexes],y_test[indexes],
                color="b", marker="o", alpha=alpha)
    plt.scatter(y_hat_[indexes],y_hat_[indexes],
                color="y", marker=".", alpha=alpha)
    plt.xlabel("predicted")
    plt.ylabel("endpoints")

plt.show()
_images/crepes_nb_187_0.png

Evaluating the CPS using a test set#

Let us evaluate the generated CPS using three confidence levels on the test set. We could specify a subset of the metrics to use via the metrics parameter of the evaluate method; here we use all metrics, which is the default.

Note that values for the parameters sigmas and bins can always be provided, but they will be ignored by CPS that have not been fitted with such values, e.g., both arguments will be ignored by the standard CPS.

Note that CRPS takes some time to compute, in particular when the CPS have been fitted with larger calibration sets.

[80]:
confidence_levels = [0.9,0.95,0.99]

names = np.array(list(all_cps.keys()))

first_set = names[["OOB" not in name for name in names]]
second_set = names[["OOB" in name for name in names]]

for methods in [first_set, second_set]:
    all_cps_results = {}
    for confidence in confidence_levels:
        for name in methods:
            if "OOB" in name:
                all_cps_results[(name,confidence)] = all_cps[name].evaluate(
                    y_hat_full, y_test, sigmas=sigmas_test_var_oob,
                    bins=bins_test_oob, confidence=confidence,
                    y_min=0, y_max=1)
            else:
                all_cps_results[(name,confidence)] =  all_cps[name].evaluate(
                    y_hat_test, y_test, sigmas=sigmas_test_var,
                    bins=bins_test, confidence=confidence, y_min=0, y_max=1)

    cps_results_df = pd.DataFrame(columns=pd.MultiIndex.from_product(
        [methods,confidence_levels]), index=list(list(
        all_cps_results.values())[0].keys()))

    for key in all_cps_results.keys():
        cps_results_df[key] = all_cps_results[key].values()

    display(cps_results_df.round(4))
Std CPS Norm CPS Mond CPS Mond norm CPS
0.90 0.95 0.99 0.90 0.95 0.99 0.90 0.95 0.99 0.90 0.95 0.99
error 0.1010 0.0455 0.0099 0.1036 0.0515 0.0090 0.0987 0.0511 0.0087 0.0998 0.0486 0.0076
eff_mean 0.0427 0.0668 0.1339 0.0404 0.0521 0.0816 0.0438 0.0589 0.0980 0.0444 0.0567 0.0930
eff_med 0.0430 0.0683 0.1402 0.0276 0.0359 0.0576 0.0320 0.0397 0.0763 0.0298 0.0374 0.0601
CRPS 0.0076 0.0076 0.0076 0.0070 0.0070 0.0070 0.0072 0.0072 0.0072 0.0069 0.0069 0.0069
time_fit 0.0005 0.0005 0.0005 0.0005 0.0005 0.0005 0.0002 0.0002 0.0002 0.0002 0.0002 0.0002
time_evaluate 0.3766 0.3519 0.3648 0.3829 0.4043 0.3981 0.2397 0.2470 0.2466 0.2734 0.2777 0.2780
Std OOB CPS Norm OOB CPS Mond OOB CPS Mond norm OOB CPS
0.90 0.95 0.99 0.90 0.95 0.99 0.90 0.95 0.99 0.90 0.95 0.99
error 0.1050 0.0475 0.0112 0.0946 0.0492 0.0088 0.0936 0.0498 0.0085 0.0963 0.0460 0.0081
eff_mean 0.0408 0.0621 0.1241 0.0403 0.0511 0.0792 0.0435 0.0567 0.1000 0.0420 0.0540 0.0836
eff_med 0.0410 0.0631 0.1311 0.0278 0.0355 0.0560 0.0310 0.0399 0.0726 0.0288 0.0364 0.0584
CRPS 0.0073 0.0073 0.0073 0.0068 0.0068 0.0068 0.0070 0.0070 0.0070 0.0068 0.0068 0.0068
time_fit 0.0005 0.0005 0.0005 0.0005 0.0005 0.0005 0.0007 0.0007 0.0007 0.0008 0.0008 0.0008
time_evaluate 0.9068 0.7983 0.7986 0.8762 0.8774 0.8807 0.3195 0.3273 0.3224 0.3717 0.3657 0.3609