The crepes package

class crepes.WrapClassifier(learner)

A learner wrapped with a ConformalClassifier.

Methods

calibrate(X, y[, bins, oob, class_cond, nc])

Fit a ConformalClassifier using learner.

evaluate(X, y[, bins, confidence, ...])

Evaluate ConformalClassifier.

fit(X, y, **kwargs)

Fit learner.

predict(X)

Predict with learner.

predict_p(X[, bins])

Obtain (smoothed) p-values using conformal classifier.

predict_proba(X)

Predict with learner.

predict_set(X[, bins, confidence, smoothing])

Obtain prediction sets using conformal classifier.

fit(X, y, **kwargs)

Fit learner.

Parameters:
  • X (array-like of shape (n_samples, n_features),) – set of objects

  • y (array-like of shape (n_samples,),) – target values

  • kwargs (optional arguments) – any additional arguments are forwarded to the fit method of the learner object

Return type:

None

Examples

Assuming X_train and y_train to be an array and vector with training objects and labels, respectively, a random forest may be wrapped and fitted by:

from sklearn.ensemble import RandomForestClassifier
from crepes import WrapClassifier

rf = WrapClassifier(RandomForestClassifier())
rf.fit(X_train, y_train)

Note

The learner, which can be accessed by rf.learner, may be fitted before as well as after being wrapped.

Note

All arguments, including any additional keyword arguments, to fit() are forwarded to the fit method of the learner.
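
The forwarding described in the notes above can be pictured with a small delegation sketch. This is not the crepes implementation: WrapSketch and RecordingLearner are hypothetical stand-ins, and the only behaviour illustrated is that fit() passes its arguments through unchanged and returns None.

```python
class WrapSketch:
    """Hypothetical stand-in for a wrapper class; not the crepes code."""

    def __init__(self, learner):
        self.learner = learner

    def fit(self, X, y, **kwargs):
        # Forward everything, including extra keyword arguments, to the learner.
        self.learner.fit(X, y, **kwargs)


class RecordingLearner:
    """Toy learner that only records what was passed to fit."""

    def fit(self, X, y, sample_weight=None):
        self.seen = {"n": len(X), "sample_weight": sample_weight}


w = WrapSketch(RecordingLearner())
returned = w.fit([[0.0], [1.0]], [0, 1], sample_weight=[1.0, 2.0])
received = w.learner.seen
```

Here received shows that the keyword argument sample_weight reached the learner untouched.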

predict(X)

Predict with learner.

Parameters:

X (array-like of shape (n_samples, n_features),) – set of objects

Returns:

y – values predicted using the predict method of the learner object.

Return type:

array-like of shape (n_samples,),

Examples

Assuming w is a WrapClassifier object for which the wrapped learner w.learner has been fitted, (point) predictions of the learner can be obtained for a set of test objects X_test by:

y_hat = w.predict(X_test)

The above is equivalent to:

y_hat = w.learner.predict(X_test)

predict_proba(X)

Predict with learner.

Parameters:

X (array-like of shape (n_samples, n_features),) – set of objects

Returns:

y – predicted probabilities using the predict_proba method of the learner object.

Return type:

array-like of shape (n_samples, n_classes),

Examples

Assuming w is a WrapClassifier object for which the wrapped learner w.learner has been fitted, predicted probabilities of the learner can be obtained for a set of test objects X_test by:

probabilities = w.predict_proba(X_test)

The above is equivalent to:

probabilities = w.learner.predict_proba(X_test)

calibrate(X, y, bins=None, oob=False, class_cond=False, nc=<function hinge>)

Fit a ConformalClassifier using learner.

Parameters:
  • X (array-like of shape (n_samples, n_features),) – set of objects

  • y (array-like of shape (n_samples,),) – target values

  • bins (array-like of shape (n_samples,), default=None) – Mondrian categories

  • oob (bool, default=False) – use out-of-bag estimation

  • class_cond (bool, default=False) – if class_cond=True, the method fits a Mondrian ConformalClassifier using the class labels as categories

  • nc (function, default = crepes.extras.hinge()) – function to compute non-conformity scores

Returns:

self – WrapClassifier object updated with a fitted ConformalClassifier

Return type:

object

Examples

Assuming X_cal and y_cal to be an array and vector, respectively, with objects and labels for the calibration set, and w is a WrapClassifier object for which the learner has been fitted, a standard conformal classifier can be formed by:

w.calibrate(X_cal, y_cal)

Assuming that bins_cal is a vector with Mondrian categories (bin labels), a Mondrian conformal classifier can be generated by:

w.calibrate(X_cal, y_cal, bins=bins_cal)

By providing the option oob=True, the conformal classifier will be calibrated using out-of-bag predictions, allowing the full set of training objects (X_train) and labels (y_train) to be used, e.g.,

w.calibrate(X_train, y_train, oob=True)

By providing the option class_cond=True, a Mondrian conformal classifier will be formed using the class labels as categories, e.g.,

w.calibrate(X_cal, y_cal, class_cond=True)

Note

Any Mondrian categories provided with the bins argument will be ignored by calibrate(), if class_cond=True, as the latter implies that Mondrian categories are formed using the labels in y.

Note

Enabling out-of-bag calibration, i.e., setting oob=True, requires that the wrapped learner has an attribute oob_decision_function_. This is the case, e.g., for a sklearn.ensemble.RandomForestClassifier created with out-of-bag scoring enabled, i.e., RandomForestClassifier(oob_score=True).

Note

The use of out-of-bag calibration, as enabled by oob=True, does not come with the theoretical validity guarantees of the regular (inductive) conformal classifiers, because calibration and test instances are not handled in exactly the same way.
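
To make the role of the nc function and of class_cond=True more concrete, the following sketch computes hinge-style non-conformity scores for a toy calibration set and groups them by class label. It assumes the usual hinge definition (one minus the probability predicted for the label); hinge_scores is a hypothetical helper and is not claimed to match crepes.extras.hinge in every detail.

```python
def hinge_scores(probabilities, classes, labels):
    """Return 1 - P(true label) for each calibration object."""
    index = {c: i for i, c in enumerate(classes)}
    return [1.0 - row[index[label]] for row, label in zip(probabilities, labels)]


# Toy predicted probabilities for three calibration objects and two classes.
probs_cal = [[0.9, 0.1], [0.4, 0.6], [0.2, 0.8]]
classes = ["a", "b"]
y_cal = ["a", "a", "b"]

alphas_cal = hinge_scores(probs_cal, classes, y_cal)

# Class-conditional (Mondrian) calibration keeps one list of scores per label.
alphas_by_class = {}
for alpha, label in zip(alphas_cal, y_cal):
    alphas_by_class.setdefault(label, []).append(alpha)
```

A confidently correct prediction yields a score near 0, while a wrong or uncertain one yields a score closer to 1.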

predict_p(X, bins=None)

Obtain (smoothed) p-values using conformal classifier.

Parameters:
  • X (array-like of shape (n_samples, n_features),) – set of objects

  • bins (array-like of shape (n_samples,), default=None) – Mondrian categories

Returns:

p-values – p-values

Return type:

ndarray of shape (n_samples, n_classes)

Examples

Assuming that X_test is a set of test objects and w is a WrapClassifier object that has been calibrated, i.e., calibrate() has been applied, the p-values for the test objects are obtained by:

p_values = w.predict_p(X_test)

Assuming that bins_test is a vector with Mondrian categories (bin labels) for the test set and w is a WrapClassifier object that has been calibrated with bins, the following provides p-values for the test set:

p_values = w.predict_p(X_test, bins=bins_test)
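
As a rough illustration of what a conformal p-value is, the sketch below uses the textbook definitions of plain and smoothed p-values on toy non-conformity scores. The helper p_value and its rng argument are hypothetical; this is not claimed to be how crepes computes its output.

```python
import random


def p_value(alpha, alphas_cal, smoothing=False, rng=None):
    """Textbook conformal p-value for one test non-conformity score."""
    n = len(alphas_cal)
    greater = sum(a > alpha for a in alphas_cal)
    equal = sum(a == alpha for a in alphas_cal)
    if smoothing:
        # Ties (including the test object itself) are broken at random.
        theta = (rng or random).random()
        return (greater + theta * (equal + 1)) / (n + 1)
    return (greater + equal + 1) / (n + 1)


alphas_cal = [0.1, 0.2, 0.3, 0.6]
# One row per test object, one column per class.
alphas_test = [[0.05, 0.95], [0.6, 0.4]]

p_values = [[p_value(a, alphas_cal) for a in row] for row in alphas_test]
smoothed = p_value(0.6, alphas_cal, smoothing=True, rng=random.Random(0))
```

A label with a small non-conformity score receives a large p-value, and the smoothed variant never exceeds the plain one.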

predict_set(X, bins=None, confidence=0.95, smoothing=False)

Obtain prediction sets using conformal classifier.

Parameters:
  • X (array-like of shape (n_samples, n_features),) – set of objects

  • bins (array-like of shape (n_samples,), default=None) – Mondrian categories

  • confidence (float in range (0,1), default=0.95) – confidence level

  • smoothing (bool, default=False) – use smoothed p-values

Returns:

prediction sets – prediction sets, where the value 1 indicates that the class label is included, i.e., the corresponding p-value is at least 1-confidence, and the value 0 indicates that it is excluded, i.e., the corresponding p-value is less than 1-confidence

Return type:

ndarray of shape (n_samples, n_classes)

Examples

Assuming that X_test is a set of test objects and w is a WrapClassifier object that has been calibrated, i.e., calibrate() has been applied, the prediction sets for the test objects at the default confidence level (95%) are obtained by:

prediction_sets = w.predict_set(X_test)

Assuming that bins_test is a vector with Mondrian categories (bin labels) for the test set and w is a WrapClassifier object that has been calibrated with bins, the following provides prediction sets at the 99% confidence level:

prediction_sets = w.predict_set(X_test, bins=bins_test,
                                confidence=0.99)

Note

Using smoothed p-values substantially increases computation time and hardly has any effect on the prediction sets, except for when having small calibration sets.
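
The relation between p-values and prediction sets can be sketched as follows, assuming a label is kept whenever its p-value is at least 1-confidence. The function sets_from_p_values and the toy p-values are illustrative only.

```python
def sets_from_p_values(p_values, confidence=0.95):
    """Return 1 for included labels and 0 for excluded ones."""
    threshold = 1 - confidence
    return [[int(p >= threshold) for p in row] for row in p_values]


# One row per test object, one column per class.
p_values = [[0.80, 0.03], [0.30, 0.45], [0.01, 0.02]]

sets_95 = sets_from_p_values(p_values)                  # default confidence
sets_60 = sets_from_p_values(p_values, confidence=0.6)  # lower confidence
```

Lowering the confidence level raises the threshold, so the sets shrink; the last test object gets an empty set at both levels.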

evaluate(X, y, bins=None, confidence=0.95, smoothing=False, metrics=None)

Evaluate ConformalClassifier.

Parameters:
  • X (array-like of shape (n_samples, n_features)) – set of objects

  • y (array-like of shape (n_samples,)) – correct target values

  • bins (array-like of shape (n_samples,), default=None,) – Mondrian categories

  • confidence (float in range (0,1), default=0.95) – confidence level

  • smoothing (bool, default=False) – use smoothed p-values

  • metrics (a string or a list of strings, default=list of all metrics) – metrics to compute, i.e., [“error”, “avg_c”, “one_c”, “empty”, “time_fit”, “time_evaluate”]

Returns:

results – estimated performance using the metrics, where “error” is the fraction of prediction sets not containing the true class label, “avg_c” is the average no. of predicted class labels, “one_c” is the fraction of singleton prediction sets, “empty” is the fraction of empty prediction sets, “time_fit” is the time taken to fit the conformal classifier, and “time_evaluate” is the time taken for the evaluation

Return type:

dictionary with a key for each selected metric

Examples

Assuming that X_test is a set of test objects, y_test is a vector with true targets, bins_test is a vector with Mondrian categories (bin labels) for the test set, and w is a calibrated WrapClassifier object, then the latter can be evaluated at the 90% confidence level with respect to error, average prediction set size and fraction of singleton predictions by:

results = w.evaluate(X_test, y_test, bins=bins_test, confidence=0.9,
                     metrics=["error", "avg_c", "one_c"])

Note

The reported result for time_fit only considers fitting the conformal classifier; not for fitting the learner.

Note

Using smoothed p-values substantially increases computation time and hardly has any effect on the results, except for when having small calibration sets.
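
The set-based metrics described under Returns can be reproduced by hand from a matrix of prediction sets. The sketch below does this for toy data; evaluate_sets is a hypothetical helper that follows the metric definitions given above and omits the timing metrics.

```python
def evaluate_sets(prediction_sets, classes, y):
    """Compute error, avg_c, one_c and empty from 0/1 prediction sets."""
    index = {c: i for i, c in enumerate(classes)}
    n = len(y)
    sizes = [sum(row) for row in prediction_sets]
    errors = sum(row[index[label]] == 0
                 for row, label in zip(prediction_sets, y))
    return {
        "error": errors / n,                     # true label not in the set
        "avg_c": sum(sizes) / n,                 # average number of labels
        "one_c": sum(s == 1 for s in sizes) / n, # singleton sets
        "empty": sum(s == 0 for s in sizes) / n, # empty sets
    }


sets = [[1, 0], [1, 1], [0, 0], [0, 1]]
classes = ["a", "b"]
y_test = ["a", "b", "a", "a"]

results = evaluate_sets(sets, classes, y_test)
```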

class crepes.WrapRegressor(learner)

A learner wrapped with a ConformalRegressor or ConformalPredictiveSystem.

Methods

calibrate(X, y[, sigmas, bins, oob, cps])

Fit a ConformalRegressor or ConformalPredictiveSystem using learner.

evaluate(X, y[, sigmas, bins, confidence, ...])

Evaluate ConformalRegressor or ConformalPredictiveSystem.

fit(X, y, **kwargs)

Fit learner.

predict(X)

Predict with learner.

predict_cps(X[, sigmas, bins, y, ...])

Predict using ConformalPredictiveSystem.

predict_int(X[, sigmas, bins, confidence, ...])

Predict interval using fitted ConformalRegressor or ConformalPredictiveSystem.

fit(X, y, **kwargs)

Fit learner.

Parameters:
  • X (array-like of shape (n_samples, n_features),) – set of objects

  • y (array-like of shape (n_samples,),) – target values

  • kwargs (optional arguments) – any additional arguments are forwarded to the fit method of the learner object

Return type:

None

Examples

Assuming X_train and y_train to be an array and vector with training objects and labels, respectively, a random forest may be wrapped and fitted by:

from sklearn.ensemble import RandomForestRegressor
from crepes import WrapRegressor

rf = WrapRegressor(RandomForestRegressor())
rf.fit(X_train, y_train)

Note

The learner, which can be accessed by rf.learner, may be fitted before as well as after being wrapped.

Note

All arguments, including any additional keyword arguments, to fit() are forwarded to the fit method of the learner.

predict(X)

Predict with learner.

Parameters:

X (array-like of shape (n_samples, n_features),) – set of objects

Returns:

y – values predicted using the predict method of the learner object.

Return type:

array-like of shape (n_samples,),

Examples

Assuming w is a WrapRegressor object for which the wrapped learner w.learner has been fitted, (point) predictions of the learner can be obtained for a set of test objects X_test by:

y_hat = w.predict(X_test)

The above is equivalent to:

y_hat = w.learner.predict(X_test)

calibrate(X, y, sigmas=None, bins=None, oob=False, cps=False)

Fit a ConformalRegressor or ConformalPredictiveSystem using learner.

Parameters:
  • X (array-like of shape (n_samples, n_features),) – set of objects

  • y (array-like of shape (n_samples,),) – target values

  • sigmas (array-like of shape (n_samples,), default=None) – difficulty estimates

  • bins (array-like of shape (n_samples,), default=None) – Mondrian categories

  • oob (bool, default=False) – use out-of-bag estimation

  • cps (bool, default=False) – if cps=False, the method fits a ConformalRegressor and otherwise, a ConformalPredictiveSystem

Returns:

self – The WrapRegressor object is updated with a fitted ConformalRegressor or ConformalPredictiveSystem

Return type:

object

Examples

Assuming X_cal and y_cal to be an array and vector, respectively, with objects and labels for the calibration set, and w is a WrapRegressor object for which the learner has been fitted, a standard conformal regressor is formed by:

w.calibrate(X_cal, y_cal)

Assuming that sigmas_cal is a vector with difficulty estimates, a normalized conformal regressor is obtained by:

w.calibrate(X_cal, y_cal, sigmas=sigmas_cal)

Assuming that bins_cal is a vector with Mondrian categories (bin labels), a Mondrian conformal regressor is obtained by:

w.calibrate(X_cal, y_cal, bins=bins_cal)

A normalized Mondrian conformal regressor is generated in the following way:

w.calibrate(X_cal, y_cal, sigmas=sigmas_cal, bins=bins_cal)

By providing the option oob=True, the conformal regressor will be calibrated using out-of-bag predictions, allowing the full set of training objects (X_train) and labels (y_train) to be used, e.g.,

w.calibrate(X_train, y_train, oob=True)

By providing the option cps=True, each of the above calls will instead generate a ConformalPredictiveSystem, e.g.,

w.calibrate(X_cal, y_cal, sigmas=sigmas_cal, cps=True)

Note

Enabling out-of-bag calibration, i.e., setting oob=True, requires that the wrapped learner has an attribute oob_prediction_. This is the case, e.g., for a sklearn.ensemble.RandomForestRegressor created with out-of-bag scoring enabled, i.e., RandomForestRegressor(oob_score=True).

Note

The use of out-of-bag calibration, as enabled by oob=True, does not come with the theoretical validity guarantees of the regular (inductive) conformal regressors and predictive systems, because calibration and test instances are not handled in exactly the same way.

predict_int(X, sigmas=None, bins=None, confidence=0.95, y_min=-inf, y_max=inf)

Predict interval using fitted ConformalRegressor or ConformalPredictiveSystem.

Parameters:
  • X (array-like of shape (n_samples, n_features),) – set of objects

  • sigmas (array-like of shape (n_samples,), default=None) – difficulty estimates

  • bins (array-like of shape (n_samples,), default=None) – Mondrian categories

  • confidence (float in range (0,1), default=0.95) – confidence level

  • y_min (float or int, default=-numpy.inf) – minimum value to include in prediction intervals

  • y_max (float or int, default=numpy.inf) – maximum value to include in prediction intervals

Returns:

intervals – prediction intervals

Return type:

ndarray of shape (n_samples, 2)

Examples

Assuming that X_test is a set of test objects and w is a WrapRegressor object that has been calibrated, i.e., calibrate() has been applied, prediction intervals at the 99% confidence level can be obtained by:

intervals = w.predict_int(X_test, confidence=0.99)

Assuming that sigmas_test is a vector with difficulty estimates for the test set and w is a WrapRegressor object that has been calibrated with both residuals and difficulty estimates, prediction intervals at the default (95%) confidence level can be obtained by:

intervals = w.predict_int(X_test, sigmas=sigmas_test)

Assuming that bins_test is a vector with Mondrian categories (bin labels) for the test set and w is a WrapRegressor object that has been calibrated with both residuals and bins, the following provides prediction intervals at the default confidence level, where the intervals are lower-bounded by 0:

intervals = w.predict_int(X_test, bins=bins_test, y_min=0)

Note

In case the specified confidence level is too high in relation to the size of the calibration set, a warning will be issued and the output intervals will be of maximum size.

Note

The arguments sigmas and bins will be ignored by predict_int() if the WrapRegressor object has been calibrated without specifying any such values.

Note

An error will be reported if sigmas and bins are not provided to predict_int() although the WrapRegressor object has been calibrated with such values.
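
The effect of confidence, y_min and y_max can be illustrated with the standard split-conformal rule on toy residuals. This is a sketch under that assumption: conformal_intervals is a hypothetical helper, its quantile indexing is the textbook one and may differ in detail from crepes, and it silently returns maximum-size intervals where the library would also issue a warning.

```python
import math


def conformal_intervals(y_hat, residuals_cal, confidence=0.95,
                        y_min=-math.inf, y_max=math.inf):
    """Symmetric intervals from absolute calibration residuals."""
    abs_res = sorted(abs(r) for r in residuals_cal)
    n = len(abs_res)
    rank = math.ceil((n + 1) * confidence)  # 1-based rank of the quantile
    # Too few calibration residuals for this confidence: maximum-size intervals.
    half_width = abs_res[rank - 1] if rank <= n else math.inf
    return [(max(y - half_width, y_min), min(y + half_width, y_max))
            for y in y_hat]


residuals_cal = [0.5, -1.0, 2.0, -0.2, 1.5, 0.1, -0.7, 0.3, -1.2, 0.9]

intervals_80 = conformal_intervals([10.0, 0.5], residuals_cal,
                                   confidence=0.8, y_min=0)
intervals_99 = conformal_intervals([10.0], residuals_cal, confidence=0.99)
```

With ten calibration residuals, the 80% intervals have half-width 1.5 and the second one is cut off at 0, whereas 99% confidence cannot be supported and yields an unbounded interval.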

predict_cps(X, sigmas=None, bins=None, y=None, lower_percentiles=None, higher_percentiles=None, y_min=-inf, y_max=inf, return_cpds=False, cpds_by_bins=False)

Predict using ConformalPredictiveSystem.

Parameters:
  • X (array-like of shape (n_samples, n_features),) – set of objects

  • sigmas (array-like of shape (n_samples,), default=None) – difficulty estimates

  • bins (array-like of shape (n_samples,), default=None) – Mondrian categories

  • y (float, int or array-like of shape (n_samples,), default=None) – values for which p-values should be returned

  • lower_percentiles (array-like of shape (l_values,), default=None) – percentiles for which a lower value will be output in case a percentile lies between two values (similar to interpolation=“lower” in numpy.percentile)

  • higher_percentiles (array-like of shape (h_values,), default=None) – percentiles for which a higher value will be output in case a percentile lies between two values (similar to interpolation=“higher” in numpy.percentile)

  • y_min (float or int, default=-numpy.inf) – The minimum value to include in prediction intervals.

  • y_max (float or int, default=numpy.inf) – The maximum value to include in prediction intervals.

  • return_cpds (Boolean, default=False) – specifies whether conformal predictive distributions (cpds) should be output or not

  • cpds_by_bins (Boolean, default=False) – specifies whether the output cpds should be grouped by bin or not; only applicable when bins is not None and return_cpds = True

Returns:

  • results (ndarray of shape (n_samples, n_cols) or (n_samples,)) – the shape is (n_samples, n_cols) if n_cols > 1 and otherwise (n_samples,), where n_cols = p_values+l_values+h_values where p_values = 1 if y is not None and 0 otherwise, l_values are the number of lower percentiles, and h_values are the number of higher percentiles. Only returned if n_cols > 0.

  • cpds (ndarray of (n_samples, c_values), ndarray of (n_samples,), or list of ndarrays) – conformal predictive distributions. Only returned if return_cpds == True. If bins is None, the distributions are represented by a single array, where the number of columns (c_values) is determined by the number of residuals of the fitted conformal predictive system. Otherwise, the distributions are represented by a vector of arrays, if cpds_by_bins = False, or a list of arrays, with one element for each bin, if cpds_by_bins = True.

Examples

Assuming that X_test is a set of test objects, y_test is a vector with true targets, and w is a WrapRegressor object calibrated with the option cps=True, p-values for the true targets can be obtained by:

p_values = w.predict_cps(X_test, y=y_test)

P-values with respect to some specific value, e.g., 37, can be obtained by:

p_values = w.predict_cps(X_test, y=37)

Assuming that sigmas_test is a vector with difficulty estimates for the test set and w has been calibrated with such estimates, the 90th and 95th percentiles can be obtained by:

percentiles = w.predict_cps(X_test, sigmas=sigmas_test,
                            higher_percentiles=[90,95])

In the above example, the nearest higher value is returned, if there is no value that corresponds exactly to the requested percentile. If we instead would like to retrieve the nearest lower value, we should write:

percentiles = w.predict_cps(X_test, sigmas=sigmas_test,
                            lower_percentiles=[90,95])

Assuming that bins_test is a vector with Mondrian categories (bin labels) for the test set and w has been calibrated with bins, the following returns prediction intervals at the 95% confidence level, where the intervals are lower-bounded by 0:

intervals = w.predict_cps(X_test, bins=bins_test,
                          lower_percentiles=2.5,
                          higher_percentiles=97.5,
                          y_min=0)

If we would like to obtain the conformal distributions, we could write the following:

cpds = w.predict_cps(X_test, sigmas=sigmas_test, return_cpds=True)

The output of the above will be an array with a row for each test instance and a column for each calibration instance (residual). If the learner is wrapped with a Mondrian conformal predictive system, the above will instead result in a vector, in which each element is a vector, as the number of calibration instances may vary between categories. If we instead would like an array for each category, this can be obtained by:

cpds = w.predict_cps(X_test, sigmas=sigmas_test, return_cpds=True,
                     cpds_by_bins=True)

Note

This method is available only if the learner has been wrapped with a ConformalPredictiveSystem, i.e., calibrate() has been called with the option cps=True.

Note

In case the calibration set is too small for the specified lower and higher percentiles, a warning will be issued and the output will be y_min and y_max, respectively.

Note

Setting return_cpds=True may consume a lot of memory, as a matrix is generated for which the number of elements is the product of the number of calibration and test objects, unless a Mondrian approach is employed; for the latter, this number is reduced by increasing the number of bins.

Note

Setting cpds_by_bins=True has an effect only for Mondrian conformal predictive systems.
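
The difference between lower and higher percentiles, and the fallback to y_min and y_max for small calibration sets, can be sketched for a single test object. The sketch assumes a standard (non-normalized, non-Mondrian) predictive distribution formed by adding the calibration residuals to the point prediction; cps_percentile and its indexing rule are illustrative assumptions, not the crepes implementation.

```python
import math


def cps_percentile(y_hat, residuals_cal, percentile, higher=True,
                   y_min=-math.inf, y_max=math.inf):
    """Pick a percentile from the predictive distribution of one object."""
    cpd = sorted(y_hat + r for r in residuals_cal)
    n = len(cpd)
    position = percentile / 100 * (n + 1)
    # Between two values: take the upper or the lower neighbour.
    rank = math.ceil(position) if higher else math.floor(position)
    if rank < 1:
        return y_min  # calibration set too small for this lower percentile
    if rank > n:
        return y_max  # calibration set too small for this higher percentile
    return min(max(cpd[rank - 1], y_min), y_max)


residuals_cal = [-2.0, -1.0, 0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0]

lower_25 = cps_percentile(10.0, residuals_cal, 25, higher=False)
higher_25 = cps_percentile(10.0, residuals_cal, 25, higher=True)
too_high = cps_percentile(10.0, residuals_cal, 97.5)
too_low = cps_percentile(10.0, residuals_cal, 2.5, higher=False, y_min=0)
```

With nine residuals the 25th percentile falls between the second and third values, so the two variants differ, and the 2.5th and 97.5th percentiles cannot be resolved and fall back to the bounds.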

evaluate(X, y, sigmas=None, bins=None, confidence=0.95, y_min=-inf, y_max=inf, metrics=None)

Evaluate ConformalRegressor or ConformalPredictiveSystem.

Parameters:
  • X (array-like of shape (n_samples, n_features)) – set of objects

  • y (array-like of shape (n_samples,)) – correct target values

  • sigmas (array-like of shape (n_samples,), default=None,) – difficulty estimates

  • bins (array-like of shape (n_samples,), default=None,) – Mondrian categories

  • confidence (float in range (0,1), default=0.95) – confidence level

  • y_min (float or int, default=-numpy.inf) – minimum value to include in prediction intervals

  • y_max (float or int, default=numpy.inf) – maximum value to include in prediction intervals

  • metrics (a string or a list of strings, default=list of all) – metrics; for a learner wrapped with a conformal regressor these are “error”, “eff_mean”, “eff_med”, “time_fit”, and “time_evaluate”, while if wrapped with a conformal predictive system, the metrics also include “CRPS”.

Returns:

results – estimated performance using the metrics

Return type:

dictionary with a key for each selected metric

Examples

Assuming that X_test is a set of test objects, y_test is a vector with true targets, sigmas_test and bins_test are vectors with difficulty estimates and Mondrian categories (bin labels) for the test set, and w is a calibrated WrapRegressor object, then the latter can be evaluated at the 90% confidence level with respect to error, mean and median efficiency (interval size) by:

results = w.evaluate(X_test, y_test, sigmas=sigmas_test,
                     bins=bins_test, confidence=0.9,
                     metrics=["error", "eff_mean", "eff_med"])

Note

If included in the list of metrics, “CRPS” (continuous-ranked probability score) will be ignored if the WrapRegressor object has been calibrated with the (default) option cps=False, i.e., the learner is wrapped with a ConformalRegressor.

Note

The use of the metric CRPS may consume a lot of memory, as a matrix is generated for which the number of elements is the product of the number of calibration and test objects, unless a Mondrian approach is employed; for the latter, this number is reduced by increasing the number of bins.

Note

The reported result for time_fit only considers fitting the conformal regressor or predictive system; not for fitting the learner.
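
For intuition about the interval-based metrics, the sketch below computes error, mean efficiency and median efficiency from a handful of toy prediction intervals. evaluate_intervals is a hypothetical helper that assumes "error" is the fraction of intervals missing the true target and that efficiency is the interval width; timing metrics and CRPS are left out.

```python
from statistics import median


def evaluate_intervals(intervals, y):
    """Compute error, eff_mean and eff_med from (low, high) intervals."""
    n = len(y)
    widths = [high - low for low, high in intervals]
    misses = sum(not (low <= target <= high)
                 for (low, high), target in zip(intervals, y))
    return {
        "error": misses / n,
        "eff_mean": sum(widths) / n,
        "eff_med": median(widths),
    }


intervals = [(8.0, 12.0), (0.0, 2.0), (4.0, 5.0), (1.0, 7.0)]
y_test = [11.0, 2.5, 4.5, 3.0]

results = evaluate_intervals(intervals, y_test)
```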

class crepes.ConformalClassifier

A conformal classifier transforms non-conformity scores into p-values or prediction sets for a certain confidence level.

Methods

evaluate(alphas, classes, y[, bins, ...])

Evaluate conformal classifier.

fit(alphas[, bins])

Fit conformal classifier.

predict_p(alphas[, bins, confidence])

Obtain (smoothed) p-values from conformal classifier.

predict_set(alphas[, bins, confidence, ...])

Obtain prediction sets using conformal classifier.

fit(alphas, bins=None)

Fit conformal classifier.

Parameters:
  • alphas (array-like of shape (n_samples,)) – non-conformity scores

  • bins (array-like of shape (n_samples,), default=None) – Mondrian categories

Returns:

self – Fitted ConformalClassifier.

Return type:

object

Examples

Assuming that alphas_cal is a vector with non-conformity scores, then a standard conformal classifier is formed in the following way:

from crepes import ConformalClassifier

cc_std = ConformalClassifier()

cc_std.fit(alphas_cal)

Assuming that bins_cal is a vector with Mondrian categories (bin labels), then a Mondrian conformal classifier is fitted in the following way:

cc_mond = ConformalClassifier()
cc_mond.fit(alphas_cal, bins=bins_cal)

predict_p(alphas, bins=None, confidence=0.95)

Obtain (smoothed) p-values from conformal classifier.

Parameters:
  • alphas (array-like of shape (n_samples, n_classes)) – non-conformity scores

  • bins (array-like of shape (n_samples,), default=None) – Mondrian categories

  • confidence (float in range (0,1), default=0.95) – confidence level

Returns:

p-values – p-values

Return type:

ndarray of shape (n_samples, n_classes)

Examples

Assuming that alphas_test is an array with non-conformity scores for a test set and cc_std a fitted standard conformal classifier, then p-values for the test set are obtained by:

p_values = cc_std.predict_p(alphas_test)

Assuming that bins_test is a vector with Mondrian categories (bin labels) for the test set and cc_mond a fitted Mondrian conformal classifier, then the following provides p-values for the test set:

p_values = cc_mond.predict_p(alphas_test, bins=bins_test)
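
What Mondrian categories change can be sketched with plain Python: calibration scores are kept per bin, and each test object is compared only with the scores from its own bin. The helpers fit_mondrian and mondrian_p_values are hypothetical and use the textbook (non-smoothed) p-value, so they illustrate the idea rather than the crepes implementation.

```python
from collections import defaultdict


def fit_mondrian(alphas_cal, bins_cal):
    """Group calibration non-conformity scores by Mondrian category."""
    by_bin = defaultdict(list)
    for alpha, b in zip(alphas_cal, bins_cal):
        by_bin[b].append(alpha)
    return dict(by_bin)


def mondrian_p_values(alphas_test, bins_test, by_bin):
    """Compute p-values using only the scores from each object's bin."""
    p_values = []
    for row, b in zip(alphas_test, bins_test):
        cal = by_bin[b]
        p_values.append([(sum(a >= alpha for a in cal) + 1) / (len(cal) + 1)
                         for alpha in row])
    return p_values


alphas_cal = [0.1, 0.2, 0.3, 0.7, 0.8, 0.9]
bins_cal = [0, 0, 0, 1, 1, 1]
by_bin = fit_mondrian(alphas_cal, bins_cal)

# Two test objects with identical scores but different categories.
alphas_test = [[0.25, 0.75], [0.25, 0.75]]
bins_test = [0, 1]
p_values = mondrian_p_values(alphas_test, bins_test, by_bin)
```

The two test objects have the same non-conformity scores, yet they receive different p-values because their bins hold different calibration scores.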

predict_set(alphas, bins=None, confidence=0.95, smoothing=False)

Obtain prediction sets using conformal classifier.

Parameters:
  • alphas (array-like of shape (n_samples, n_classes)) – non-conformity scores

  • bins (array-like of shape (n_samples,), default=None) – Mondrian categories

  • confidence (float in range (0,1), default=0.95) – confidence level

  • smoothing (bool, default=False) – use smoothed p-values

Returns:

prediction sets – prediction sets

Return type:

ndarray of shape (n_samples, n_classes)

Examples

Assuming that alphas_test is an array with non-conformity scores for a test set and cc_std a fitted standard conformal classifier, then prediction sets at the default (95%) confidence level are obtained by:

prediction_sets = cc_std.predict_set(alphas_test)

Assuming that bins_test is a vector with Mondrian categories (bin labels) for the test set and cc_mond a fitted Mondrian conformal classifier, then the following provides prediction sets for the test set, at the 90% confidence level:

prediction_sets = cc_mond.predict_set(alphas_test,
                                      bins=bins_test,
                                      confidence=0.9)

Note

Using smoothed p-values substantially increases computation time and hardly has any effect on the prediction sets, except for when having small calibration sets.

evaluate(alphas, classes, y, bins=None, confidence=0.95, smoothing=False, metrics=None)

Evaluate conformal classifier.

Parameters:
  • alphas (array-like of shape (n_samples, n_classes)) – non-conformity scores

  • classes (array-like of shape (n_classes,)) – class names

  • y (array-like of shape (n_samples,)) – correct class labels

  • bins (array-like of shape (n_samples,), default=None) – Mondrian categories

  • confidence (float in range (0,1), default=0.95) – confidence level

  • smoothing (bool, default=False) – use smoothed p-values

  • metrics (a string or a list of strings, default=list of all metrics) – metrics to compute, i.e., [“error”, “avg_c”, “one_c”, “empty”, “time_fit”, “time_evaluate”]

Returns:

results – estimated performance using the metrics, where “error” is the fraction of prediction sets not containing the true class label, “avg_c” is the average no. of predicted class labels, “one_c” is the fraction of singleton prediction sets, “empty” is the fraction of empty prediction sets, “time_fit” is the time taken to fit the conformal classifier, and “time_evaluate” is the time taken for the evaluation

Return type:

dictionary with a key for each selected metric

Examples

Assuming that alphas is an array containing non-conformity scores for all classes for the test objects, classes and y_test are vectors with the class names and true class labels for the test set, respectively, and cc is a fitted standard conformal classifier, then the latter can be evaluated at the default confidence level with respect to error and average number of labels in the prediction sets by:

results = cc.evaluate(alphas, classes, y_test, metrics=["error", "avg_c"])

Note

Using smoothed p-values substantially increases computation time and hardly has any effect on the results, except for when having small calibration sets.

class crepes.ConformalRegressor

A conformal regressor transforms point predictions (regression values) into prediction intervals, for a certain confidence level.

Methods

evaluate(y_hat, y[, sigmas, bins, ...])

Evaluate conformal regressor.

fit(residuals[, sigmas, bins])

Fit conformal regressor.

predict(y_hat[, sigmas, bins, confidence, ...])

Predict using conformal regressor.

fit(residuals, sigmas=None, bins=None)

Fit conformal regressor.

Parameters:
  • residuals (array-like of shape (n_values,)) – true values - predicted values

  • sigmas (array-like of shape (n_values,), default=None) – difficulty estimates

  • bins (array-like of shape (n_values,), default=None) – Mondrian categories

Returns:

self – Fitted ConformalRegressor.

Return type:

object

Examples

Assuming that y_cal and y_hat_cal are vectors with true and predicted targets for some calibration set, then a standard conformal regressor can be formed from the residuals:

residuals_cal = y_cal - y_hat_cal

from crepes import ConformalRegressor

cr_std = ConformalRegressor()

cr_std.fit(residuals_cal)

Assuming that sigmas_cal is a vector with difficulty estimates, then a normalized conformal regressor can be fitted in the following way:

cr_norm = ConformalRegressor()
cr_norm.fit(residuals_cal, sigmas=sigmas_cal)

Assuming that bins_cal is a vector with Mondrian categories (bin labels), then a Mondrian conformal regressor can be fitted in the following way:

cr_mond = ConformalRegressor()
cr_mond.fit(residuals_cal, bins=bins_cal)

A normalized Mondrian conformal regressor can be fitted in the following way:

cr_norm_mond = ConformalRegressor()
cr_norm_mond.fit(residuals_cal, sigmas=sigmas_cal,
                 bins=bins_cal)

predict(y_hat, sigmas=None, bins=None, confidence=0.95, y_min=-inf, y_max=inf)

Predict using conformal regressor.

Parameters:
  • y_hat (array-like of shape (n_values,)) – predicted values

  • sigmas (array-like of shape (n_values,), default=None) – difficulty estimates

  • bins (array-like of shape (n_values,), default=None) – Mondrian categories

  • confidence (float in range (0,1), default=0.95) – confidence level

  • y_min (float or int, default=-numpy.inf) – minimum value to include in prediction intervals

  • y_max (float or int, default=numpy.inf) – maximum value to include in prediction intervals

Returns:

intervals – prediction intervals

Return type:

ndarray of shape (n_values, 2)

Examples

Assuming that y_hat_test is a vector with predicted targets for a test set and cr_std a fitted standard conformal regressor, then prediction intervals at the 99% confidence level can be obtained by:

intervals = cr_std.predict(y_hat_test, confidence=0.99)

Assuming that sigmas_test is a vector with difficulty estimates for the test set and cr_norm a fitted normalized conformal regressor, then prediction intervals at the default (95%) confidence level can be obtained by:

intervals = cr_norm.predict(y_hat_test, sigmas=sigmas_test)

Assuming that bins_test is a vector with Mondrian categories (bin labels) for the test set and cr_mond a fitted Mondrian conformal regressor, then the following provides prediction intervals at the default confidence level, where the intervals are lower-bounded by 0:

intervals = cr_mond.predict(y_hat_test, bins=bins_test,
                            y_min=0)

Note

In case the specified confidence level is too high in relation to the size of the calibration set, a warning will be issued and the output intervals will be of maximum size.
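
The behaviour described in the note can be illustrated with a minimal sketch of the textbook split-conformal construction for the standard (non-normalized, non-Mondrian) case. This is not the crepes implementation: the function name `standard_intervals` and the rank rule `ceil((n + 1) * confidence)` are assumptions, and no warning is issued in the sketch.

```python
import math

def standard_intervals(y_hat, residuals_cal, confidence=0.95,
                       y_min=-math.inf, y_max=math.inf):
    """Textbook split-conformal intervals (illustrative, not crepes code)."""
    scores = sorted(abs(r) for r in residuals_cal)
    n = len(scores)
    # Rank of the calibration score that serves as the half-width.
    rank = math.ceil((n + 1) * confidence)
    # Too few calibration residuals for this confidence level:
    # fall back to intervals of maximum size.
    half_width = scores[rank - 1] if rank <= n else math.inf
    # Clip the intervals to the range [y_min, y_max].
    return [(max(y - half_width, y_min), min(y + half_width, y_max))
            for y in y_hat]

cal = [1.0, -2.0, 3.0]
intervals_50 = standard_intervals([10.0], cal, confidence=0.5)
intervals_99 = standard_intervals([10.0], cal, confidence=0.99, y_min=0)
```

With only three calibration residuals, the 99% level cannot be supported, so the second call yields the widest interval allowed by y_min and y_max.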

evaluate(y_hat, y, sigmas=None, bins=None, confidence=0.95, y_min=-inf, y_max=inf, metrics=None)[source]#

Evaluate conformal regressor.

Parameters:
  • y_hat (array-like of shape (n_values,)) – predicted values

  • y (array-like of shape (n_values,)) – correct target values

  • sigmas (array-like of shape (n_values,), default=None) – difficulty estimates

  • bins (array-like of shape (n_values,), default=None) – Mondrian categories

  • confidence (float in range (0,1), default=0.95) – confidence level

  • y_min (float or int, default=-numpy.inf) – minimum value to include in prediction intervals

  • y_max (float or int, default=numpy.inf) – maximum value to include in prediction intervals

  • metrics (a string or a list of strings, default=list of all metrics) – metrics to evaluate; the full list is ["error", "eff_mean", "eff_med", "time_fit", "time_evaluate"]

Returns:

results – estimated performance using the metrics

Return type:

dictionary with a key for each selected metric

Examples

Assuming that y_hat_test and y_test are vectors with predicted and true targets for a test set, sigmas_test and bins_test are vectors with difficulty estimates and Mondrian categories (bin labels) for the test set, and cr_norm_mond is a fitted normalized Mondrian conformal regressor, then the latter can be evaluated at the default confidence level with respect to error and mean efficiency (interval size) by:

results = cr_norm_mond.evaluate(y_hat_test, y_test,
                                sigmas=sigmas_test, bins=bins_test,
                                metrics=["error", "eff_mean"])
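
As a rough guide to what the interval-based metrics mean, the sketch below computes them directly from a set of prediction intervals. It assumes that "error" is the fraction of true targets falling outside their intervals and that "eff_mean" and "eff_med" are the mean and median interval widths; the helper name `interval_metrics` is hypothetical and the timing metrics are omitted.

```python
from statistics import mean, median

def interval_metrics(intervals, y):
    """Error rate and interval-size summaries (illustrative, not crepes code)."""
    # An error occurs when the true target lies outside its interval.
    misses = [not (lo <= t <= hi) for (lo, hi), t in zip(intervals, y)]
    widths = [hi - lo for lo, hi in intervals]
    return {"error": sum(misses) / len(misses),
            "eff_mean": mean(widths),
            "eff_med": median(widths)}

demo = interval_metrics([(0.0, 2.0), (1.0, 5.0), (2.0, 8.0)],
                        [1.0, 6.0, 3.0])
```

For a well-calibrated regressor, the error is expected to be close to one minus the confidence level, while smaller widths indicate more informative intervals.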
class crepes.ConformalPredictiveSystem[source]#

A conformal predictive system transforms point predictions (regression values) into cumulative distribution functions (conformal predictive distributions).

Methods

evaluate(y_hat, y[, sigmas, bins, ...])

Evaluate conformal predictive system.

fit(residuals[, sigmas, bins])

Fit conformal predictive system.

predict(y_hat[, sigmas, bins, y, ...])

Predict using conformal predictive system.

fit(residuals, sigmas=None, bins=None)[source]#

Fit conformal predictive system.

Parameters:
  • residuals (array-like of shape (n_values,)) – actual values - predicted values

  • sigmas (array-like of shape (n_values,), default=None) – difficulty estimates

  • bins (array-like of shape (n_values,), default=None) – Mondrian categories

Returns:

self – Fitted ConformalPredictiveSystem.

Return type:

object

Examples

Assuming that y_cal and y_hat_cal are vectors with true and predicted targets for some calibration set, then a standard conformal predictive system can be formed from the residuals:

from crepes import ConformalPredictiveSystem

residuals_cal = y_cal - y_hat_cal
cps_std = ConformalPredictiveSystem()
cps_std.fit(residuals_cal)

Assuming that sigmas_cal is a vector with difficulty estimates, then a normalized conformal predictive system can be fitted in the following way:

cps_norm = ConformalPredictiveSystem()
cps_norm.fit(residuals_cal, sigmas=sigmas_cal)

Assuming that bins_cal is a vector with Mondrian categories (bin labels), then a Mondrian conformal predictive system can be fitted in the following way:

cps_mond = ConformalPredictiveSystem()
cps_mond.fit(residuals_cal, bins=bins_cal)

A normalized Mondrian conformal predictive system can be fitted in the following way:

cps_norm_mond = ConformalPredictiveSystem()
cps_norm_mond.fit(residuals_cal, sigmas=sigmas_cal,
                  bins=bins_cal)
predict(y_hat, sigmas=None, bins=None, y=None, lower_percentiles=None, higher_percentiles=None, y_min=-inf, y_max=inf, return_cpds=False, cpds_by_bins=False)[source]#

Predict using conformal predictive system.

Parameters:
  • y_hat (array-like of shape (n_values,)) – predicted values

  • sigmas (array-like of shape (n_values,), default=None) – difficulty estimates

  • bins (array-like of shape (n_values,), default=None) – Mondrian categories

  • y (float, int or array-like of shape (n_values,), default=None) – values for which p-values should be returned

  • lower_percentiles (array-like of shape (l_values,), default=None) – percentiles for which a lower value will be output in case a percentile lies between two values (similar to interpolation="lower" in numpy.percentile)

  • higher_percentiles (array-like of shape (h_values,), default=None) – percentiles for which a higher value will be output in case a percentile lies between two values (similar to interpolation="higher" in numpy.percentile)

  • y_min (float or int, default=-numpy.inf) – The minimum value to include in prediction intervals.

  • y_max (float or int, default=numpy.inf) – The maximum value to include in prediction intervals.

  • return_cpds (Boolean, default=False) – specifies whether conformal predictive distributions (cpds) should be output or not

  • cpds_by_bins (Boolean, default=False) – specifies whether the output cpds should be grouped by bin or not; only applicable when bins is not None and return_cpds = True

Returns:

  • results (ndarray of shape (n_values, n_cols) or (n_values,)) – the shape is (n_values, n_cols) if n_cols > 1 and (n_values,) otherwise, where n_cols = p_values + l_values + h_values; p_values = 1 if y is not None and 0 otherwise, l_values is the number of lower percentiles, and h_values is the number of higher percentiles. Only returned if n_cols > 0.

  • cpds (ndarray of shape (n_values, c_values), ndarray of shape (n_values,), or list of ndarrays) – conformal predictive distributions. Only returned if return_cpds == True. If bins is None, the distributions are represented by a single array, where the number of columns (c_values) is determined by the number of residuals of the fitted conformal predictive system. Otherwise, the distributions are represented by a vector of arrays, if cpds_by_bins = False, or a list of arrays, with one element for each bin, if cpds_by_bins = True.

Examples

Assuming that y_hat_test and y_test are vectors with predicted and true targets, respectively, for a test set and cps_std a fitted standard conformal predictive system, the p-values for the true targets can be obtained by:

p_values = cps_std.predict(y_hat_test, y=y_test)

The p-values with respect to some specific value, e.g., 37, can be obtained by:

p_values = cps_std.predict(y_hat_test, y=37)
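
To make the notion of a p-value concrete, the following sketch builds the predictive distribution for a single test object and evaluates it at a given value, following the standard textbook definition of a smoothed conformal predictive distribution. The helper names are hypothetical, and the fixed `tau` stands in for the random draw normally used for smoothing; how crepes handles ties and randomization is not reproduced here.

```python
def cpd_values(y_hat, residuals_cal):
    """Points of the predictive distribution for one test prediction."""
    # Each calibration residual shifts the point prediction.
    return sorted(y_hat + r for r in residuals_cal)

def p_value(y, cpd, tau=0.5):
    """Distribution value at y; tau in [0, 1] replaces a random draw."""
    below = sum(v < y for v in cpd)
    ties = sum(v == y for v in cpd)
    return (below + tau * (ties + 1)) / (len(cpd) + 1)

cpd = cpd_values(10.0, [-2.0, -1.0, 1.0, 3.0])
p_mid = p_value(10.5, cpd)     # value between the 2nd and 3rd points
p_far = p_value(37, cpd)       # value above all points
```

A value well above all points of the distribution, such as 37 here, receives a p-value close to 1, and a value well below them receives one close to 0.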

Assuming that sigmas_test is a vector with difficulty estimates for the test set and cps_norm a fitted normalized conformal predictive system, then the 90th and 95th percentiles can be obtained by:

percentiles = cps_norm.predict(y_hat_test, sigmas=sigmas_test,
                               higher_percentiles=[90,95])

In the above example, the nearest higher value is returned, if there is no value that corresponds exactly to the requested percentile. If we instead would like to retrieve the nearest lower value, we should write:

percentiles = cps_norm.predict(y_hat_test, sigmas=sigmas_test,
                               lower_percentiles=[90,95])

Assuming that bins_test is a vector with Mondrian categories (bin labels) for the test set and cps_mond a fitted Mondrian conformal predictive system, then the following returns prediction intervals at the 95% confidence level, where the intervals are lower-bounded by 0:

intervals = cps_mond.predict(y_hat_test, bins=bins_test,
                             lower_percentiles=2.5,
                             higher_percentiles=97.5,
                             y_min=0)

If we would like to obtain the conformal predictive distributions (cpds), we could write the following:

cpds = cps_norm.predict(y_hat_test, sigmas=sigmas_test,
                        return_cpds=True)

The output of the above will be an array with a row for each test instance and a column for each calibration instance (residual). For a Mondrian conformal predictive system, the above will instead result in a vector, in which each element is a vector, as the number of calibration instances may vary between categories. If we instead would like an array for each category, this can be obtained by:

cpds = cps_mond.predict(y_hat_test, bins=bins_test,
                        return_cpds=True, cpds_by_bins=True)

Note

In case the calibration set is too small for the specified lower and higher percentiles, a warning will be issued and the output will be y_min and y_max, respectively.
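
The difference between lower and higher percentiles, and the fallback described in the note, can be sketched as follows for a single sorted predictive distribution. The helper `pick_percentile` is hypothetical, and the position rule `percentile / 100 * (n + 1)` is an assumption made for illustration rather than a statement about crepes internals; no warning is issued in the sketch.

```python
import math

def pick_percentile(cpd, percentile, side,
                    y_min=-math.inf, y_max=math.inf):
    """Select a value from a sorted cpd (illustrative, not crepes code)."""
    n = len(cpd)
    position = percentile / 100 * (n + 1)   # 1-based, possibly fractional
    if side == "lower":
        index = math.floor(position) - 1    # nearest value at or below
        value = cpd[min(index, n - 1)] if index >= 0 else y_min
    else:
        index = math.ceil(position) - 1     # nearest value at or above
        value = cpd[max(index, 0)] if index < n else y_max
    # Clip the result to the range [y_min, y_max].
    return min(max(value, y_min), y_max)

cpd = [8.0, 9.0, 11.0, 13.0]
lower_50 = pick_percentile(cpd, 50, "lower")
higher_50 = pick_percentile(cpd, 50, "higher")
interval_95 = (pick_percentile(cpd, 2.5, "lower", y_min=0),
               pick_percentile(cpd, 97.5, "higher"))
```

With only four calibration values, neither the 2.5th nor the 97.5th percentile is available, so the interval falls back to y_min and y_max.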

Note

Setting return_cpds=True may consume a lot of memory, as a matrix is generated for which the number of elements is the product of the number of calibration and test objects, unless a Mondrian approach is employed; for the latter, this number is reduced by increasing the number of bins.
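
A back-of-the-envelope calculation shows why this matters. The sketch below counts matrix elements for the non-Mondrian and Mondrian cases; the helper names are hypothetical, and the 8 bytes per element used in the size estimate assumes 64-bit floats.

```python
from collections import Counter

def cpd_elements(n_cal, n_test):
    """Number of elements without bins: one row per test object."""
    return n_cal * n_test

def cpd_elements_mondrian(bins_cal, bins_test):
    """Number of elements when each test object only sees its own bin."""
    cal_counts = Counter(bins_cal)
    test_counts = Counter(bins_test)
    return sum(cal_counts[b] * test_counts[b] for b in test_counts)

# 10,000 calibration and 100,000 test objects, assuming 8-byte floats.
bytes_needed = cpd_elements(10_000, 100_000) * 8

# Splitting 10 calibration and 5 test objects into two bins.
flat = cpd_elements(10, 5)
binned = cpd_elements_mondrian(["a"] * 6 + ["b"] * 4,
                               ["a"] * 3 + ["b"] * 2)
```

Under these assumptions the first case needs about 8 GB, and the two-bin split roughly halves the number of elements in the small example.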

Note

Setting cpds_by_bins=True has an effect only for Mondrian conformal predictive systems.

evaluate(y_hat, y, sigmas=None, bins=None, confidence=0.95, y_min=-inf, y_max=inf, metrics=None)[source]#

Evaluate conformal predictive system.

Parameters:
  • y_hat (array-like of shape (n_values,)) – predicted values

  • y (array-like of shape (n_values,)) – correct target values

  • sigmas (array-like of shape (n_values,), default=None) – difficulty estimates

  • bins (array-like of shape (n_values,), default=None) – Mondrian categories

  • confidence (float in range (0,1), default=0.95) – confidence level

  • y_min (float or int, default=-numpy.inf) – minimum value to include in prediction intervals

  • y_max (float or int, default=numpy.inf) – maximum value to include in prediction intervals

  • metrics (a string or a list of strings, default=list of all metrics) – metrics to evaluate; the full list is ["error", "eff_mean", "eff_med", "CRPS", "time_fit", "time_evaluate"]

Returns:

results – estimated performance using the metrics

Return type:

dictionary with a key for each selected metric

Examples

Assuming that y_hat_test and y_test are vectors with predicted and true targets for a test set, that sigmas_test and bins_test are vectors with difficulty estimates and Mondrian categories (bin labels) for the test set, and that cps_norm_mond is a fitted normalized Mondrian conformal predictive system, then the latter can be evaluated at the default confidence level with respect to error, mean and median efficiency (interval size at that confidence level), and continuous ranked probability score (CRPS) by:

results = cps_norm_mond.evaluate(y_hat_test, y_test,
                                 sigmas=sigmas_test, bins=bins_test,
                                 metrics=["error", "eff_mean",
                                          "eff_med", "CRPS"])

Note

The use of the metric CRPS may consume a lot of memory, as a matrix is generated for which the number of elements is the product of the number of calibration and test objects, unless a Mondrian approach is employed; for the latter, this number is reduced by increasing the number of bins.
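
For readers unfamiliar with CRPS, the sketch below evaluates it for one test object using the standard formula for an equally weighted empirical distribution. This is a reference definition only; it is an assumption that it matches what crepes computes in spirit, and the function name `empirical_crps` is hypothetical. Lower values are better.

```python
def empirical_crps(points, y):
    """CRPS of an equally weighted empirical distribution at observation y."""
    m = len(points)
    # Average distance between the distribution's points and the observation.
    term_obs = sum(abs(x - y) for x in points) / m
    # Half the average distance between pairs of points (spread of the cpd).
    term_spread = sum(abs(a - b) for a in points for b in points) / (2 * m * m)
    return term_obs - term_spread

crps_demo = empirical_crps([8.0, 9.0, 11.0, 13.0], 10.0)
crps_point = empirical_crps([10.0], 12.0)
```

For a distribution consisting of a single point, the score reduces to the absolute error, as the second call illustrates. The pairwise term in this sketch is quadratic in the number of points, so it is suited to illustration rather than large calibration sets.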