The crepes package

class crepes.WrapClassifier(learner)

A learner wrapped with a ConformalClassifier.

Methods

calibrate([X, y, oob, class_cond, nc, mc, seed])

Fit a ConformalClassifier using the wrapped learner.

evaluate(X, y[, confidence, smoothing, ...])

Evaluate the conformal classifier.

fit(X, y, **kwargs)

Fit learner.

predict(X)

Predict with learner.

predict_p(X[, y, all_classes, smoothing, ...])

Obtain (smoothed or non-smoothed) p-values using conformal classifier.

predict_proba(X)

Predict with learner.

predict_set(X[, y, confidence, smoothing, ...])

Obtain prediction sets using conformal classifier.

fit(X, y, **kwargs)

Fit learner.

Parameters:
  • X (array-like of shape (n_samples, n_features),) – set of objects

  • y (array-like of shape (n_samples,),) – labels

  • kwargs (optional arguments) – any additional arguments are forwarded to the fit method of the learner object

Return type:

None

Examples

Assuming X_train and y_train to be an array and vector with training objects and labels, respectively, a random forest may be wrapped and fitted by:

from sklearn.ensemble import RandomForestClassifier
from crepes import WrapClassifier

rf = WrapClassifier(RandomForestClassifier())
rf.fit(X_train, y_train)

Note

The learner, which can be accessed by rf.learner, may be fitted before as well as after being wrapped.

Note

All arguments, including any additional keyword arguments, to fit() are forwarded to the fit method of the learner.
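The forwarding behaviour described in the notes above can be pictured with a small stand-in class. This is only a sketch of the wrapping pattern, not the crepes implementation; MiniWrap and MajorityLearner are hypothetical names introduced for illustration.

```python
from collections import Counter


class MajorityLearner:
    """Hypothetical toy learner: always predicts the most common training label."""

    def fit(self, X, y, **kwargs):
        self.label_ = Counter(y).most_common(1)[0][0]
        self.fit_kwargs_ = kwargs  # stored only to show what was forwarded

    def predict(self, X):
        return [self.label_ for _ in X]


class MiniWrap:
    """Hypothetical wrapper that forwards fit and predict to the learner."""

    def __init__(self, learner):
        self.learner = learner

    def fit(self, X, y, **kwargs):
        # Positional and keyword arguments go straight to the learner.
        self.learner.fit(X, y, **kwargs)

    def predict(self, X):
        return self.learner.predict(X)


w = MiniWrap(MajorityLearner())
w.fit([[0], [1], [2]], ["a", "b", "b"], sample_weight=None)
y_hat = w.predict([[3], [4]])
```

Since the wrapper holds a reference to the learner, fitting w.learner directly has the same effect as calling w.fit.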

predict(X)

Predict with learner.

Parameters:

X (array-like of shape (n_samples, n_features),) – set of objects

Returns:

y – values predicted using the predict method of the learner object.

Return type:

array-like of shape (n_samples,),

Examples

Assuming w is a WrapClassifier object for which the wrapped learner w.learner has been fitted, (point) predictions of the learner can be obtained for a set of test objects X_test by:

y_hat = w.predict(X_test)

The above is equivalent to:

y_hat = w.learner.predict(X_test)

predict_proba(X)

Predict with learner.

Parameters:

X (array-like of shape (n_samples, n_features),) – set of objects

Returns:

y – predicted probabilities using the predict_proba method of the learner object.

Return type:

array-like of shape (n_samples, n_classes),

Examples

Assuming w is a WrapClassifier object for which the wrapped learner w.learner has been fitted, predicted probabilities of the learner can be obtained for a set of test objects X_test by:

probabilities = w.predict_proba(X_test)

The above is equivalent to:

probabilities = w.learner.predict_proba(X_test)

calibrate(X=[], y=[], oob=False, class_cond=False, nc=<function hinge>, mc=None, seed=None)

Fit a ConformalClassifier using the wrapped learner.

Parameters:
  • X (array-like of shape (n_samples, n_features), default=[]) – set of objects

  • y (array-like of shape (n_samples,), default=[]) – labels

  • oob (bool, default=False) – use out-of-bag estimation

  • class_cond (bool, default=False) – if class_cond=True, the method fits a Mondrian ConformalClassifier using the class labels as categories

  • nc (function, default=crepes.extras.hinge) – function to compute non-conformity scores

  • mc (function or crepes.extras.MondrianCategorizer, default=None) – function or crepes.extras.MondrianCategorizer for computing Mondrian categories

  • seed (int, default=None) – set random seed

Returns:

self – WrapClassifier object updated with a fitted ConformalClassifier

Return type:

object

Examples

Assuming X_cal and y_cal to be an array and vector, respectively, with objects and labels for the calibration set, and w is a WrapClassifier object for which the learner has been fitted, a standard conformal classifier can be formed by:

w.calibrate(X_cal, y_cal)

Assuming that get_categories is a function that returns a vector of Mondrian categories (bin labels), a Mondrian conformal classifier can be generated by:

w.calibrate(X_cal, y_cal, mc=get_categories)

By providing the option oob=True, the conformal classifier will be calibrated using out-of-bag predictions, allowing the full set of training objects (X_train) and labels (y_train) to be used, e.g.,

w.calibrate(X_train, y_train, oob=True)

By providing the option class_cond=True, a Mondrian conformal classifier will be formed using the class labels as categories, e.g.,

w.calibrate(X_cal, y_cal, class_cond=True)

Note

Any Mondrian categorizer specified by the mc argument will be ignored by calibrate(), if class_cond=True, as the latter implies that Mondrian categories are formed using the labels in y.

Note

By providing a random seed, e.g., seed=123, the call to calibrate as well as calls to the methods predict_set, predict_p and evaluate of the WrapClassifier object will be deterministic.

Note

Enabling out-of-bag calibration, i.e., setting oob=True, requires that the wrapped learner has an attribute oob_decision_function_, which is the case, e.g., for a sklearn.ensemble.RandomForestClassifier if out-of-bag estimation was enabled when it was created, e.g., RandomForestClassifier(oob_score=True).

Note

The use of out-of-bag calibration, as enabled by oob=True, does not come with the theoretical validity guarantees of the regular (inductive) conformal classifiers, because calibration and test instances are not handled in exactly the same way.
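To make the role of the nc argument concrete, the following sketch computes hinge-style non-conformity scores for a calibration set. It assumes the usual definition, one minus the probability that the learner assigns to the given label; hinge_scores is a hypothetical helper, and the details of crepes.extras.hinge may differ.

```python
def hinge_scores(probabilities, classes, y):
    """Return 1 - predicted probability of the given label, one score per object."""
    index = {label: i for i, label in enumerate(classes)}
    return [1.0 - row[index[label]] for row, label in zip(probabilities, y)]


# Predicted class probabilities for three calibration objects (made-up values).
probabilities = [[0.875, 0.125], [0.25, 0.75], [0.5, 0.5]]
classes = ["neg", "pos"]
y_cal = ["neg", "neg", "pos"]

alphas_cal = hinge_scores(probabilities, classes, y_cal)
```

A high score means that the learner gave the label a low probability, i.e., the object looks unusual for that label.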

predict_p(X, y=None, all_classes=True, smoothing=True, seed=None, online=False, warm_start=True)

Obtain (smoothed or non-smoothed) p-values using conformal classifier.

Parameters:
  • X (array-like of shape (n_samples, n_features),) – set of objects

  • y (array-like of shape (n_samples,), default=None) – correct class labels; used only if online=True or all_classes=False

  • all_classes (bool, default=True) – return p-values for all classes

  • smoothing (bool, default=True) – use smoothed p-values

  • seed (int, default=None) – set random seed

  • online (bool, default=False) – employ online calibration

  • warm_start (bool, default=True) – extend original calibration set; used only if online=True

Returns:

p-values – p-values

Return type:

ndarray of shape (n_samples, n_classes)

Examples

Assuming that X_test is a set of test objects and w is a WrapClassifier object that has been calibrated, i.e., calibrate() has been applied, the (smoothed) p-values for the test objects are obtained by:

p_values = w.predict_p(X_test)

Assuming that y_test is a vector of correct labels for the test objects, then p-values for the test objects are obtained using online calibration by:

p_values = w.predict_p(X_test, y_test, online=True)

Note

If a value for seed is given, it will take precedence over any seed value given in the call to calibrate.
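The difference between smoothed and non-smoothed p-values, and the effect of a seed, can be illustrated with the textbook definition of a conformal p-value. This is a sketch under that assumption, not the crepes implementation; p_value is a hypothetical helper.

```python
import random


def p_value(alpha, alphas_cal, smoothing=True, rng=None):
    """Conformal p-value of a test score alpha against calibration scores."""
    n = len(alphas_cal)
    greater = sum(a > alpha for a in alphas_cal)
    equal = sum(a == alpha for a in alphas_cal)
    if smoothing:
        # Ties (including the test score itself) are broken at random.
        theta = (rng or random).random()
        return (greater + theta * (equal + 1)) / (n + 1)
    return (greater + equal + 1) / (n + 1)


alphas_cal = [0.125, 0.25, 0.5, 0.75]

p_plain = p_value(0.5, alphas_cal, smoothing=False)
p_smooth = p_value(0.5, alphas_cal, rng=random.Random(123))
p_smooth_again = p_value(0.5, alphas_cal, rng=random.Random(123))
```

With the same seed the smoothed p-value is reproduced exactly, which is why providing seed makes the output deterministic.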

predict_set(X, y=None, confidence=0.95, smoothing=True, seed=None, online=False, warm_start=True)

Obtain prediction sets using conformal classifier.

Parameters:
  • X (array-like of shape (n_samples, n_features),) – set of objects

  • y (array-like of shape (n_samples,), default=None) – correct class labels; used only if online=True

  • confidence (float in range (0,1), default=0.95) – confidence level

  • smoothing (bool, default=True) – use smoothed p-values

  • seed (int, default=None) – set random seed

  • online (bool, default=False) – employ online calibration

  • warm_start (bool, default=True) – extend original calibration set; used only if online=True

Returns:

prediction sets – prediction sets, where the value 1 (0) indicates that the class label is included (excluded), i.e., the corresponding p-value is at least (less than) 1-confidence

Return type:

ndarray of shape (n_values, n_classes)

Examples

Assuming that X_test is a set of test objects and w is a WrapClassifier object that has been calibrated, i.e., calibrate() has been applied, then prediction sets for the test objects at the 99% confidence level are obtained by:

prediction_sets = w.predict_set(X_test, confidence=0.99)

Assuming that y_test is a vector of correct labels for the test objects, then prediction sets for the test objects at the default (95%) confidence level are obtained using online calibration by:

prediction_sets = w.predict_set(X_test, y_test, online=True)

Note

The use of smoothed p-values increases computation time and typically has a minor effect on the prediction sets, except for small calibration sets.

Note

If a value for seed is given, it will take precedence over any seed value given in the call to calibrate.
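How p-values turn into prediction sets can be sketched as follows, assuming that a class label is included whenever its p-value is at least 1-confidence. The helper prediction_sets and the p-values are made up for illustration.

```python
def prediction_sets(p_values, confidence=0.95):
    """Return a 0/1 indicator per class: 1 if the label is kept in the set."""
    threshold = 1 - confidence
    return [[int(p >= threshold) for p in row] for row in p_values]


# One row per test object, one column per class.
p_values = [[0.50, 0.02], [0.30, 0.20], [0.01, 0.03]]

sets_95 = prediction_sets(p_values, confidence=0.95)
sets_75 = prediction_sets(p_values, confidence=0.75)
```

Raising the confidence level lowers the threshold, so the sets can only grow; the last object gets an empty set at both levels because neither label is plausible.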

evaluate(X, y, confidence=0.95, smoothing=True, metrics=None, seed=None, online=False, warm_start=True)

Evaluate the conformal classifier.

Parameters:
  • X (array-like of shape (n_samples, n_features)) – set of objects

  • y (array-like of shape (n_samples,)) – correct labels

  • confidence (float in range (0,1), default=0.95) – confidence level

  • smoothing (bool, default=True) – use smoothed p-values

  • metrics (a string or a list of strings, default=list of all metrics) – metrics to compute; the available metrics are “error”, “avg_c”, “one_c”, “empty”, “ks_test”, “time_fit” and “time_evaluate”

  • seed (int, default=None) – set random seed

  • online (bool, default=False) – employ online calibration

  • warm_start (bool, default=True) – extend original calibration set; used only if online=True

Returns:

results – estimated performance using the metrics, where “error” is the fraction of prediction sets not containing the true class label, “avg_c” is the average no. of predicted class labels, “one_c” is the fraction of singleton prediction sets, “empty” is the fraction of empty prediction sets, “ks_test” is the p-value for the Kolmogorov-Smirnov test of uniformity of predicted p-values, “time_fit” is the time taken to fit the conformal classifier, and “time_evaluate” is the time taken for the evaluation

Return type:

dictionary with a key for each selected metric

Examples

Assuming that X_test is a set of test objects, y_test is a vector with true targets, and w is a calibrated WrapClassifier object, then the latter can be evaluated at the 90% confidence level with respect to error, average prediction set size and fraction of singleton predictions by:

results = w.evaluate(X_test, y_test, confidence=0.9,
                     metrics=["error", "avg_c", "one_c"])

Note

The reported result for time_fit only considers fitting the conformal classifier, not fitting the learner.

Note

The use of smoothed p-values increases computation time and typically has a minor effect on the results, except for small calibration sets.

Note

If a value for seed is given, it will take precedence over any seed value given in the call to calibrate.
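The set-based metrics described under Returns can be computed directly from prediction sets. The sketch below follows those descriptions; set_metrics is a hypothetical helper and the data are made up.

```python
def set_metrics(prediction_sets, classes, y):
    """Compute error, avg_c, one_c and empty from 0/1 prediction sets."""
    n = len(y)
    index = {label: i for i, label in enumerate(classes)}
    sizes = [sum(row) for row in prediction_sets]
    # An error occurs when the true label is not in the prediction set.
    errors = sum(row[index[label]] == 0 for row, label in zip(prediction_sets, y))
    return {
        "error": errors / n,
        "avg_c": sum(sizes) / n,
        "one_c": sum(size == 1 for size in sizes) / n,
        "empty": sum(size == 0 for size in sizes) / n,
    }


sets = [[1, 0], [1, 1], [0, 0], [0, 1]]
results = set_metrics(sets, classes=["neg", "pos"], y=["neg", "pos", "pos", "neg"])
```

For a valid conformal classifier, the error is expected to be close to 1-confidence, while avg_c and one_c describe how informative the sets are.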

class crepes.WrapRegressor(learner)

A learner wrapped with a ConformalRegressor or ConformalPredictiveSystem.

Methods

calibrate([X, y, de, mc, oob, cps, seed])

Fit a ConformalRegressor or ConformalPredictiveSystem using the wrapped learner.

evaluate(X, y[, confidence, y_min, y_max, ...])

Evaluate ConformalRegressor or ConformalPredictiveSystem.

fit(X, y, **kwargs)

Fit learner.

predict(X)

Predict with learner.

predict_cpds(X[, y, seed, online, warm_start])

Obtain conformal predictive distributions from conformal predictive system.

predict_cps(X[, y, lower_percentiles, ...])

Predict using ConformalPredictiveSystem.

predict_int(X[, y, confidence, y_min, ...])

Obtain prediction intervals with fitted ConformalRegressor or ConformalPredictiveSystem.

predict_p(X[, y, t, smoothing, seed, ...])

Return (smoothed or non-smoothed) p-values for provided targets, using fitted ConformalRegressor or ConformalPredictiveSystem.

predict_percentiles(X[, y, ...])

Obtain percentiles with conformal predictive system.

fit(X, y, **kwargs)

Fit learner.

Parameters:
  • X (array-like of shape (n_samples, n_features),) – set of objects

  • y (array-like of shape (n_samples,),) – labels

  • kwargs (optional arguments) – any additional arguments are forwarded to the fit method of the learner object

Return type:

None

Examples

Assuming X_train and y_train to be an array and vector with training objects and labels, respectively, a random forest may be wrapped and fitted by:

from sklearn.ensemble import RandomForestRegressor
from crepes import WrapRegressor

rf = WrapRegressor(RandomForestRegressor())
rf.fit(X_train, y_train)

Note

The learner, which can be accessed by rf.learner, may be fitted before as well as after being wrapped.

Note

All arguments, including any additional keyword arguments, to fit() are forwarded to the fit method of the learner.

predict(X)

Predict with learner.

Parameters:

X (array-like of shape (n_samples, n_features),) – set of objects

Returns:

y – values predicted using the predict method of the learner object.

Return type:

array-like of shape (n_samples,),

Examples

Assuming w is a WrapRegressor object for which the wrapped learner w.learner has been fitted, (point) predictions of the learner can be obtained for a set of test objects X_test by:

y_hat = w.predict(X_test)

The above is equivalent to:

y_hat = w.learner.predict(X_test)

calibrate(X=[], y=[], de=None, mc=None, oob=False, cps=False, seed=None)

Fit a ConformalRegressor or ConformalPredictiveSystem using the wrapped learner.

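Parameters:
  • X (array-like of shape (n_samples, n_features), default=[]) – set of objects

  • y (array-like of shape (n_samples,), default=[]) – labels

  • de (fitted difficulty estimator, default=None) – used to form a normalized conformal regressor or predictive system

  • mc (function or crepes.extras.MondrianCategorizer, default=None) – function or crepes.extras.MondrianCategorizer for computing Mondrian categories

  • oob (bool, default=False) – use out-of-bag estimation

  • cps (bool, default=False) – if cps=True, the method fits a ConformalPredictiveSystem instead of a ConformalRegressor

  • seed (int, default=None) – set random seed

Returns: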

self – The WrapRegressor object is updated with a fitted ConformalRegressor or ConformalPredictiveSystem

Return type:

object

Examples

Assuming X_cal and y_cal to be an array and vector, respectively, with objects and labels for the calibration set, and w is a WrapRegressor object for which the learner has been fitted, a standard conformal regressor is formed by:

w.calibrate(X_cal, y_cal)

Assuming that de is a fitted difficulty estimator, a normalized conformal regressor is obtained by:

w.calibrate(X_cal, y_cal, de=de)

Assuming that get_categories is a function that returns categories (bin labels), a Mondrian conformal regressor is obtained by:

w.calibrate(X_cal, y_cal, mc=get_categories)

A normalized Mondrian conformal regressor is generated in the following way:

w.calibrate(X_cal, y_cal, de=de, mc=get_categories)

By providing the option oob=True, the conformal regressor will be calibrated using out-of-bag predictions, allowing the full set of training objects (X_train) and labels (y_train) to be used, e.g.,

w.calibrate(X_train, y_train, oob=True)

By providing the option cps=True, each of the above calls will instead generate a ConformalPredictiveSystem, e.g.,

w.calibrate(X_cal, y_cal, de=de, cps=True)

Note

By providing a random seed, e.g., seed=123, the call to calibrate as well as calls to the methods predict_int, predict_cps and evaluate of the WrapRegressor object will be deterministic.

Note

Enabling out-of-bag calibration, i.e., setting oob=True, requires that the wrapped learner has an attribute oob_prediction_, which is the case, e.g., for a sklearn.ensemble.RandomForestRegressor if out-of-bag estimation was enabled when it was created, e.g., RandomForestRegressor(oob_score=True).

Note

The use of out-of-bag calibration, as enabled by oob=True, does not come with the theoretical validity guarantees of the regular (inductive) conformal regressors and predictive systems, because calibration and test instances are not handled in exactly the same way.
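What the de and mc arguments contribute can be pictured with two small helpers. The sketch assumes that a categorizer function receives the objects and returns one bin label per object, and that difficulty estimates are used to rescale absolute residuals; both are simplifying assumptions, and get_categories and normalized_scores are hypothetical names.

```python
def get_categories(X, threshold=1.0):
    """Hypothetical categorizer: one bin label per object, based on the first feature."""
    return [int(row[0] >= threshold) for row in X]


def normalized_scores(y, y_hat, sigmas):
    """Absolute residuals divided by per-object difficulty estimates."""
    return [abs(t - p) / s for t, p, s in zip(y, y_hat, sigmas)]


X_cal = [[0.5, 3.0], [1.5, 1.0], [2.0, 0.0]]
bins_cal = get_categories(X_cal)

scores = normalized_scores(
    y=[1.0, 2.0, 4.0],
    y_hat=[1.5, 2.0, 3.0],
    sigmas=[0.5, 1.0, 2.0],  # made-up difficulty estimates
)
```

Normalization lets objects that are judged to be difficult receive wider intervals, while Mondrian categories make the calibration separate for each bin.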

predict_p(X, y=None, t=None, smoothing=True, seed=None, online=False, warm_start=True)

Return (smoothed or non-smoothed) p-values for provided targets, using fitted ConformalRegressor or ConformalPredictiveSystem.

Parameters:
  • X (array-like of shape (n_samples, n_features),) – set of objects

  • y (array-like of shape (n_samples,), default=None) – correct labels, used for online calibration if online=True, and used as targets if t=None

  • t (int, float or array-like of shape (n_samples,), default=None) – targets

  • smoothing (bool, default=True) – return smoothed p-values

  • seed (int, default=None) – set random seed

  • online (bool, default=False) – employ online calibration

  • warm_start (bool, default=True) – extend original calibration set; used only if online=True

Returns:

p_values – p_values

Return type:

ndarray of shape (n_samples,)

Examples

Assuming that X_test is a set of test objects, y_test is the set of correct labels and w is a WrapRegressor object that has been calibrated, i.e., calibrate() has been applied, then (smoothed) p-values are obtained by:

p_values = w.predict_p(X_test, y_test)

Given a single target or a vector of targets t, p-values can be obtained using online calibration by:

p_values = w.predict_p(X_test, y_test, t, online=True)

Note

If a value for seed is given, it will take precedence over any seed value given when calling calibrate.

predict_int(X, y=None, confidence=0.95, y_min=-inf, y_max=inf, seed=None, online=False, warm_start=True)

Obtain prediction intervals with fitted ConformalRegressor or ConformalPredictiveSystem.

Parameters:
  • X (array-like of shape (n_samples, n_features),) – set of objects

  • y (array-like of shape (n_samples,), default=None) – correct labels; used only if online=True

  • confidence (float in range (0,1), default=0.95) – confidence level

  • y_min (float or int, default=-numpy.inf) – minimum value to include in prediction intervals

  • y_max (float or int, default=numpy.inf) – maximum value to include in prediction intervals

  • seed (int, default=None) – set random seed

  • online (bool, default=False) – employ online calibration

  • warm_start (bool, default=True) – extend original calibration set; used only if online=True

Returns:

intervals – prediction intervals

Return type:

ndarray of shape (n_samples, 2)

Examples

Assuming that X_test is a set of test objects and w is a WrapRegressor object that has been calibrated, i.e., calibrate() has been applied, prediction intervals at the 99% confidence level can be obtained by:

intervals = w.predict_int(X_test, confidence=0.99)

The following provides prediction intervals at the default confidence level (95%), where the intervals are lower-bounded by 0:

intervals = w.predict_int(X_test, y_min=0)

Assuming y_test is a vector containing the correct labels for the test objects, intervals (at the default confidence level) are provided using online calibration by:

intervals = w.predict_int(X_test, y_test, online=True)

Note

In case the specified confidence level is too high in relation to the size of the calibration set, the output intervals will be of maximum size.

Note

If a value for seed is given, it will take precedence over any seed value given when calling calibrate.
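The interplay between confidence level, calibration set size and the bounds y_min and y_max can be sketched for a standard (non-normalized, non-Mondrian) conformal regressor. This is a simplified illustration under the usual inductive conformal rule, not the crepes code; predict_intervals is a hypothetical helper.

```python
import math


def predict_intervals(y_hat, residuals_cal, confidence=0.95,
                      y_min=-math.inf, y_max=math.inf):
    """Symmetric intervals around point predictions from calibration residuals."""
    scores = sorted((abs(r) for r in residuals_cal), reverse=True)
    # Position of the score used as half-width (scores are in descending order).
    k = int((1 - confidence) * (len(scores) + 1)) - 1
    if k < 0:
        # Too few calibration objects for this confidence level: maximal intervals.
        return [(y_min, y_max) for _ in y_hat]
    half_width = scores[k]
    return [(max(p - half_width, y_min), min(p + half_width, y_max)) for p in y_hat]


residuals_cal = [0.5, -1.0, 2.0, -0.25, 1.5, -3.0, 0.75]
y_hat = [1.0, 5.0]

intervals = predict_intervals(y_hat, residuals_cal, confidence=0.75, y_min=0)
too_high = predict_intervals(y_hat, residuals_cal, confidence=0.95)
```

With only seven calibration residuals, the 95% level cannot be supported, so the second call returns intervals of maximum size, as described in the first note above.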

predict_percentiles(X, y=None, lower_percentiles=None, higher_percentiles=None, y_min=-inf, y_max=inf, seed=None, online=False, warm_start=True)

Obtain percentiles with conformal predictive system.

Parameters:
  • X (array-like of shape (n_samples, n_features),) – set of objects

  • y (array-like of shape (n_samples,), default=None) – correct labels; used only if online=True

  • lower_percentiles (array-like of shape (l_values,), default=None) – percentiles for which a lower value will be output in case a percentile lies between two values (similar to interpolation=”lower” in numpy.percentile)

  • higher_percentiles (array-like of shape (h_values,), default=None) – percentiles for which a higher value will be output in case a percentile lies between two values (similar to interpolation=”higher” in numpy.percentile)

  • y_min (float or int, default=-numpy.inf) – The minimum value to include

  • y_max (float or int, default=numpy.inf) – The maximum value to include

  • seed (int, default=None) – set random seed

  • online (bool, default=False) – employ online calibration

  • warm_start (bool, default=True) – extend original calibration set; used only if online=True

Returns:

percentiles

Return type:

ndarray of shape (n_values, n_percentiles)

Examples

Assuming that X_test is a set of test objects and cps is a WrapRegressor object that has been calibrated while enabling the generation of a conformal predictive system, i.e., calibrate() has been called with cps=True, percentiles can be obtained by:

percentiles = cps.predict_percentiles(X_test, lower_percentiles=2.5,
                                    higher_percentiles=97.5)

Multiple (lower and higher) percentiles may be requested by:

percentiles = cps.predict_percentiles(X_test,
                                      lower_percentiles=[2.5,5],
                                      higher_percentiles=[95,97.5])

Assuming y_test is a vector containing the correct labels for the test objects, percentiles are provided using online calibration by:

percentiles = cps.predict_percentiles(X_test, y_test,
                                    higher_percentiles=[90,95,99],
                                    online=True)
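The distinction between lower and higher percentiles matters when a requested percentile falls between two values of a distribution. The sketch below mirrors the numpy-style "lower" and "higher" rules on a sorted list; it illustrates the idea only, and the index computation used by crepes may differ. Both helper names are hypothetical.

```python
import math


def lower_percentile(sorted_values, q):
    """Nearest value at or below the q-th percentile position."""
    position = q / 100 * (len(sorted_values) - 1)
    return sorted_values[math.floor(position)]


def higher_percentile(sorted_values, q):
    """Nearest value at or above the q-th percentile position."""
    position = q / 100 * (len(sorted_values) - 1)
    return sorted_values[math.ceil(position)]


values = [1.0, 2.0, 4.0, 8.0, 16.0]

low_40 = lower_percentile(values, 40)    # position 1.6 is rounded down
high_40 = higher_percentile(values, 40)  # position 1.6 is rounded up
mid_50 = (lower_percentile(values, 50), higher_percentile(values, 50))
```

When the position is exact, as for the 50th percentile here, both rules return the same value.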
predict_cpds(X, y=None, seed=None, online=False, warm_start=True)

Obtain conformal predictive distributions from conformal predictive system.

Parameters:
  • X (array-like of shape (n_samples, n_features),) – set of objects

  • y (array-like of shape (n_samples,), default=None) – correct labels; used only if online=True

  • seed (int, default=None) – set random seed

  • online (bool, default=False) – employ online calibration

  • warm_start (bool, default=True) – extend original calibration set; used only if online=True

Returns:

cpds – conformal predictive distributions. If online=False and bins is None, the distributions are represented by a single array, where the number of columns (c_values) is determined by the number of residuals of the fitted conformal predictive system. Otherwise, the output is a vector of arrays.

Return type:

ndarray of shape (n_values, c_values) or (n_values,)

Examples

Assuming that X_test is a set of test objects and cps is a WrapRegressor object that has been calibrated while enabling the generation of a conformal predictive system, i.e., calibrate() has been called with cps=True, conformal predictive distributions (cpds) can be obtained by:

cpds = cps.predict_cpds(X_test)

Assuming y_test is a vector containing the correct labels for the test objects, cpds can be generated using online calibration by:

cpds = cps.predict_cpds(X_test, y_test, online=True)

Note

The returned array may be very large as its size is the product of the number of calibration and test objects, unless a Mondrian approach is employed; for the latter, this number is reduced by increasing the number of bins. For online calibration, the largest element in the vector may be of the same size as the concatenation of the calibration and test sets.

Note

If a value for seed is given, it will take precedence over any seed value given in the call to calibrate.
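The size remark in the first note above follows from how the distributions are laid out. The sketch assumes a standard (non-normalized, non-Mondrian) conformal predictive system, where each distribution is the point prediction shifted by the sorted calibration residuals; cpds_sketch is a hypothetical helper, not the crepes implementation.

```python
def cpds_sketch(y_hat, residuals_cal):
    """One row per test object, one column per calibration residual."""
    sorted_residuals = sorted(residuals_cal)
    return [[p + r for r in sorted_residuals] for p in y_hat]


cpds = cpds_sketch(y_hat=[10.0, 20.0], residuals_cal=[0.5, -1.0, 2.0])
n_rows, n_cols = len(cpds), len(cpds[0])
```

Two test objects and three calibration residuals give a 2 x 3 array, so the number of elements grows as the product of the two set sizes.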

predict_cps(X, y=None, lower_percentiles=None, higher_percentiles=None, y_min=-inf, y_max=inf, return_cpds=False, cpds_by_bins=False, smoothing=True, seed=None)

Predict using ConformalPredictiveSystem.

Parameters:
  • X (array-like of shape (n_samples, n_features),) – set of objects

  • y (float, int or array-like of shape (n_samples,), default=None) – values for which p-values should be returned

  • lower_percentiles (array-like of shape (l_values,), default=None) – percentiles for which a lower value will be output in case a percentile lies between two values (similar to interpolation=”lower” in numpy.percentile)

  • higher_percentiles (array-like of shape (h_values,), default=None) – percentiles for which a higher value will be output in case a percentile lies between two values (similar to interpolation=”higher” in numpy.percentile)

  • y_min (float or int, default=-numpy.inf) – The minimum value to include in prediction intervals.

  • y_max (float or int, default=numpy.inf) – The maximum value to include in prediction intervals.

  • return_cpds (Boolean, default=False) – specifies whether conformal predictive distributions (cpds) should be output or not

  • cpds_by_bins (Boolean, default=False) – specifies whether the output cpds should be grouped by bin or not; only applicable when bins is not None and return_cpds = True

  • smoothing (bool, default=True) – return smoothed p-values

  • seed (int, default=None) – set random seed

Returns:

  • results (ndarray of shape (n_samples, n_cols) or (n_samples,)) – the shape is (n_samples, n_cols) if n_cols > 1 and otherwise (n_samples,), where n_cols = p_values+l_values+h_values where p_values = 1 if y is not None and 0 otherwise, l_values are the number of lower percentiles, and h_values are the number of higher percentiles. Only returned if n_cols > 0.

  • cpds (ndarray of shape (n_samples, c_values), ndarray of shape (n_samples,), or list of ndarrays) – conformal predictive distributions. Only returned if return_cpds == True. For non-Mondrian conformal predictive systems, the distributions are represented by a single array, where the number of columns (c_values) is determined by the number of residuals of the fitted conformal predictive system. For Mondrian conformal predictive systems, the distributions are represented by a vector of arrays, if cpds_by_bins = False, or a list of arrays, with one element for each Mondrian category, if cpds_by_bins = True.

Examples

Assuming that X_test is a set of test objects, y_test is a vector with true targets, and w is a WrapRegressor object calibrated with the option cps=True, p-values for the true targets can be obtained by:

p_values = w.predict_cps(X_test, y=y_test)

P-values with respect to some specific value, e.g., 37, can be obtained by:

p_values = w.predict_cps(X_test, y=37)

The 90th and 95th percentiles can be obtained by:

percentiles = w.predict_cps(X_test, higher_percentiles=[90,95])

In the above example, the nearest higher value is returned, if there is no value that corresponds exactly to the requested percentile. If we instead would like to retrieve the nearest lower value, we should write:

percentiles = w.predict_cps(X_test, lower_percentiles=[90,95])

The following returns prediction intervals at the 95% confidence level, where the intervals are lower-bounded by 0:

intervals = w.predict_cps(X_test,
                          lower_percentiles=2.5,
                          higher_percentiles=97.5,
                          y_min=0)

If we would like to obtain the conformal distributions, we could write the following:

cpds = w.predict_cps(X_test, return_cpds=True)

The output of the above will be an array with a row for each test instance and a column for each calibration instance (residual). If the learner is wrapped with a Mondrian conformal predictive system, the above will instead result in a vector, in which each element is a vector, as the number of calibration instances may vary between categories. If we instead would like an array for each category, this can be obtained by:

cpds = w.predict_cps(X_test, return_cpds=True, cpds_by_bins=True)

Note

This method is available only if the learner has been wrapped with a ConformalPredictiveSystem, i.e., calibrate() has been called with the option cps=True.

Note

In case the calibration set is too small for the specified lower and higher percentiles, a warning will be issued and the output will be y_min and y_max, respectively.

Note

Setting return_cpds=True may consume a lot of memory, as a matrix is generated for which the number of elements is the product of the number of calibration and test objects, unless a Mondrian approach is employed; for the latter, this number is reduced by increasing the number of bins.

Note

Setting cpds_by_bins=True has an effect only for Mondrian conformal predictive systems.

Note

If a value for seed is given, it will take precedence over any seed value given in the call to calibrate.

evaluate(X, y, confidence=0.95, y_min=-inf, y_max=inf, metrics=None, seed=None, online=False, warm_start=True)

Evaluate ConformalRegressor or ConformalPredictiveSystem.

Parameters:
  • X (array-like of shape (n_samples, n_features)) – set of objects

  • y (array-like of shape (n_samples,)) – correct labels

  • confidence (float in range (0,1), default=0.95) – confidence level

  • y_min (float or int, default=-numpy.inf) – minimum value to include in prediction intervals

  • y_max (float or int, default=numpy.inf) – maximum value to include in prediction intervals

  • metrics (a string or a list of strings, default=list of all metrics) – metrics to compute; for a learner wrapped with a conformal regressor these are “error”, “eff_mean”, “eff_med”, “ks_test”, “time_fit”, and “time_evaluate”, while if wrapped with a conformal predictive system, the metrics also include “CRPS”.

  • seed (int, default=None) – set random seed

  • online (bool, default=False) – employ online calibration

  • warm_start (bool, default=True) – extend original calibration set; used only if online=True

Returns:

results – estimated performance using the metrics, where “error” is the fraction of prediction intervals not containing the true label, “eff_mean” is the mean length of prediction intervals, “eff_med” is the median length of the prediction intervals, “CRPS” is the continuous ranked probability score, “ks_test” is the p-value for the Kolmogorov-Smirnov test of uniformity of predicted p-values, “time_fit” is the time taken to fit the conformal regressor/predictive system, and “time_evaluate” is the time taken for the evaluation

Return type:

dictionary with a key for each selected metric

Examples

Assuming that X_test is a set of test objects, y_test is a vector with true targets, and w is a calibrated WrapRegressor object, then the latter can be evaluated at the 90% confidence level with respect to error, mean and median efficiency (interval size) by:

results = w.evaluate(X_test, y_test, confidence=0.9,
                     metrics=["error", "eff_mean", "eff_med"])

Note

The metric CRPS is only available for batch evaluation, i.e., when online=False, and will be ignored if the WrapRegressor object has been calibrated with the (default) option cps=False, i.e., the learner is wrapped with a ConformalRegressor.

Note

The use of the metric CRPS may require a lot of memory, as a matrix is generated for which the number of elements is the product of the number of calibration and test objects, unless a Mondrian approach is employed; for the latter, this number is reduced by increasing the number of categories.

Note

The reported result for time_fit only considers fitting the conformal regressor or predictive system; not for fitting the learner.

Note

If a value for seed is given, it will take precedence over any seed value given when calling calibrate.
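The interval-based metrics described under Returns can be computed directly from prediction intervals. The sketch below follows those descriptions; interval_metrics is a hypothetical helper and the data are made up.

```python
from statistics import mean, median


def interval_metrics(intervals, y):
    """Compute error, eff_mean and eff_med from (low, high) intervals."""
    lengths = [high - low for low, high in intervals]
    # An error occurs when the true label falls outside the interval.
    misses = sum(not (low <= t <= high) for (low, high), t in zip(intervals, y))
    return {
        "error": misses / len(y),
        "eff_mean": mean(lengths),
        "eff_med": median(lengths),
    }


intervals = [(0.0, 2.0), (1.0, 5.0), (3.0, 4.0), (2.0, 10.0)]
results = interval_metrics(intervals, y=[1.0, 6.0, 3.5, 2.0])
```

The median length is less sensitive than the mean to a few very wide intervals, which is why both are reported.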

class crepes.ConformalClassifier

A conformal classifier transforms non-conformity scores into p-values or prediction sets for a certain confidence level.

Methods

evaluate(alphas, classes, y[, bins, ...])

Evaluate conformal classifier.

fit(alphas[, bins, seed])

Fit conformal classifier.

predict_p(alphas[, bins, all_classes, ...])

Obtain (smoothed or non-smoothed) p-values from conformal classifier.

predict_p_online(alphas, classes, y[, bins, ...])

Obtain (smoothed or non-smoothed) p-values from conformal classifier, computed using online calibration.

predict_set(alphas[, bins, confidence, ...])

Obtain prediction sets using conformal classifier.

predict_set_online(alphas, classes, y[, ...])

Obtain prediction sets using conformal classifier, computed using online calibration.

fit(alphas, bins=None, seed=None)

Fit conformal classifier.

Parameters:
  • alphas (array-like of shape (n_samples,)) – non-conformity scores

  • bins (array-like of shape (n_samples,), default=None) – Mondrian categories

  • seed (int, default=None) – set random seed

Returns:

self – Fitted ConformalClassifier.

Return type:

object

Examples

Assuming that alphas_cal is a vector with non-conformity scores, then a standard conformal classifier is formed in the following way:

from crepes import ConformalClassifier

cc_std = ConformalClassifier()

cc_std.fit(alphas_cal)

Assuming that bins_cal is a vector with Mondrian categories (bin labels), then a Mondrian conformal classifier is fitted in the following way:

cc_mond = ConformalClassifier()
cc_mond.fit(alphas_cal, bins=bins_cal)

Note

By providing a random seed, e.g., seed=123, calls to the methods predict_p, predict_set and evaluate of the ConformalClassifier object will be deterministic.
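Calibration scores such as alphas_cal can be derived in many ways. One common choice, sketched here with plain NumPy, is a hinge-style score: one minus the probability that the model assigns to the true class. The arrays proba_cal and y_cal_idx are made-up illustration data and are not provided by crepes.

```python
import numpy as np

# Hypothetical calibration data: predicted class probabilities (one row per
# object, one column per class) and true labels given as column indices.
proba_cal = np.array([[0.7, 0.2, 0.1],
                      [0.1, 0.6, 0.3],
                      [0.3, 0.3, 0.4]])
y_cal_idx = np.array([0, 2, 2])

# Hinge-style score: one minus the probability assigned to the true class.
alphas_cal = 1.0 - proba_cal[np.arange(len(y_cal_idx)), y_cal_idx]
print(alphas_cal)  # approximately 0.3, 0.7 and 0.6
```

The result is a vector of shape (n_samples,), which is the shape that fit expects for alphas.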

predict_p(alphas, bins=None, all_classes=True, classes=None, y=None, smoothing=True, seed=None)[source]#

Obtain (smoothed or non-smoothed) p-values from conformal classifier.

Parameters:
  • alphas (array-like of shape (n_samples, n_classes)) – non-conformity scores

  • bins (array-like of shape (n_samples,), default=None) – Mondrian categories

  • all_classes (bool, default=True) – return p-values for all classes

  • classes (array-like of shape (n_classes,), default=None) – class names, used only if all_classes=False

  • y (array-like of shape (n_samples,), default=None) – correct class labels, used only if all_classes=False

  • smoothing (bool, default=True) – return smoothed p-values

  • seed (int, default=None) – set random seed

Returns:

p-values – p-values

Return type:

ndarray of shape (n_samples, n_classes)

Examples

Assuming that alphas_test is an array with non-conformity scores for a test set and cc_std a fitted standard conformal classifier, then p-values for the test set are obtained by:

p_values = cc_std.predict_p(alphas_test)

Assuming that bins_test is a vector with Mondrian categories (bin labels) for the test set and cc_mond a fitted Mondrian conformal classifier, then the following provides (smoothed) p-values for the test set:

p_values = cc_mond.predict_p(alphas_test, bins=bins_test)

Note

If a value for seed is given, it will take precedence over any seed value given when calling fit.
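To illustrate what smoothed and non-smoothed p-values mean, the following sketch uses the textbook definition of conformal p-values. It is an assumption that this matches the internals of crepes in every detail; the function name p_values_sketch and the example scores are hypothetical.

```python
import numpy as np


def p_values_sketch(alphas_cal, alphas_test, smoothing=True, seed=None):
    """Textbook conformal p-values; not necessarily the crepes implementation."""
    alphas_cal = np.asarray(alphas_cal, dtype=float)
    alphas_test = np.asarray(alphas_test, dtype=float)
    n = len(alphas_cal)
    # Compare every test score with every calibration score.
    greater = (alphas_cal > alphas_test[..., None]).sum(axis=-1)
    equal = (alphas_cal == alphas_test[..., None]).sum(axis=-1)
    if smoothing:
        # Ties (including the test object itself) are broken at random.
        tau = np.random.default_rng(seed).uniform(size=alphas_test.shape)
        return (greater + tau * (equal + 1)) / (n + 1)
    return (greater + equal + 1) / (n + 1)


alphas_cal = [0.1, 0.2, 0.3, 0.4]
alphas_test = np.array([[0.25, 0.05],
                        [0.45, 0.30]])  # shape (n_samples, n_classes)
p_plain = p_values_sketch(alphas_cal, alphas_test, smoothing=False)
p_smooth = p_values_sketch(alphas_cal, alphas_test, seed=123)
print(p_plain)   # rows: [0.6, 1.0] and [0.2, 0.6]
print(p_smooth)  # never larger than the non-smoothed values
```

In this sketch, the random tie-breaking is the only source of randomness, which is why fixing the seed makes the smoothed p-values reproducible.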

predict_p_online(alphas, classes, y, bins=None, all_classes=True, smoothing=True, seed=None, warm_start=True)[source]#

Obtain (smoothed or non-smoothed) p-values from conformal classifier, computed using online calibration.

Parameters:
  • alphas (array-like of shape (n_samples, n_classes)) – non-conformity scores

  • classes (array-like of shape (n_classes,)) – class names

  • y (array-like of shape (n_samples,)) – correct class labels

  • bins (array-like of shape (n_samples,), default=None) – Mondrian categories

  • all_classes (bool, default=True) – return p-values for all classes

  • smoothing (bool, default=True) – return smoothed p-values

  • seed (int, default=None) – set random seed

  • warm_start (bool, default=True) – extend original calibration set

Returns:

p-values – p-values

Return type:

ndarray of shape (n_samples, n_classes)

Examples

Assuming that alphas_test is an array with non-conformity scores for a test set, classes is a vector with class names, y_test is a vector with the correct class labels for the test set, and cc_std a fitted standard conformal classifier, then p-values for the test set are obtained by:

p_values = cc_std.predict_p_online(alphas_test, classes, y_test)

Assuming that bins_test is a vector with Mondrian categories (bin labels) for the test set and cc_mond a fitted Mondrian conformal classifier, then the following provides (smoothed) p-values for the test set:

p_values = cc_mond.predict_p_online(alphas_test, classes, y_test,
                                    bins=bins_test)

Note

If a value for seed is given, it will take precedence over any seed value given in the call to fit.
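The idea behind online calibration can be sketched as a loop in which each test object is scored against the calibration set seen so far, after which the score of its true class is added to that set. The sketch below is an illustration only: it uses non-smoothed p-values, takes the true labels as column indices (y_idx) rather than class names, and assumes that warm_start=False means starting from an empty calibration set.

```python
import numpy as np


def p_values_online_sketch(alphas_cal, alphas_test, y_idx, warm_start=True):
    """Non-smoothed online p-values; a sketch, not the crepes implementation."""
    # Assumption: without warm start, the calibration set is initially empty.
    cal = list(alphas_cal) if warm_start else []
    alphas_test = np.asarray(alphas_test, dtype=float)
    p = np.empty(alphas_test.shape)
    for i, row in enumerate(alphas_test):
        cal_arr = np.asarray(cal, dtype=float)
        # p-values for all classes, using the calibration set seen so far.
        p[i] = ((cal_arr >= row[:, None]).sum(axis=1) + 1) / (len(cal) + 1)
        # Once the true label is revealed, its score joins the calibration set.
        cal.append(row[y_idx[i]])
    return p


alphas_test = np.array([[0.25, 0.05],
                        [0.45, 0.30]])
p_online = p_values_online_sketch([0.1, 0.2, 0.3, 0.4], alphas_test, [0, 1])
p_cold = p_values_online_sketch([0.1, 0.2, 0.3, 0.4], alphas_test, [0, 1],
                                warm_start=False)
print(p_online)
print(p_cold)
```

With an empty starting set, the first test object necessarily receives p-values of 1 in this sketch, since there is nothing to compare against yet.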

predict_set(alphas, bins=None, confidence=0.95, smoothing=True, seed=None)[source]#

Obtain prediction sets using conformal classifier.

Parameters:
  • alphas (array-like of shape (n_samples, n_classes)) – non-conformity scores

  • bins (array-like of shape (n_samples,), default=None) – Mondrian categories

  • confidence (float in range (0,1), default=0.95) – confidence level

  • smoothing (bool, default=True) – use smoothed p-values

  • seed (int, default=None) – set random seed

Returns:

prediction sets – prediction sets

Return type:

ndarray of shape (n_samples, n_classes)

Examples

Assuming that alphas_test is an array with non-conformity scores for a test set and cc_std a fitted standard conformal classifier, then prediction sets at the default (95%) confidence level are obtained by:

prediction_sets = cc_std.predict_set(alphas_test)

Assuming that bins_test is a vector with Mondrian categories (bin labels) for the test set and cc_mond a fitted Mondrian conformal classifier, then the following provides prediction sets for the test set, at the 90% confidence level:

prediction_sets = cc_mond.predict_set(alphas_test,
                                      bins=bins_test,
                                      confidence=0.9)

Note

The use of smoothed p-values increases computation time and typically has a minor effect on the prediction sets, except for small calibration sets.

Note

If a value for seed is given, it will take precedence over any seed value given in the call to fit.
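The relation between p-values and prediction sets can be sketched as a simple thresholding step. The sketch assumes that a class is included when its p-value is at least 1 - confidence and that sets are encoded as a binary array of shape (n_samples, n_classes); the helper name and the p-values are hypothetical.

```python
import numpy as np


def prediction_sets_from_p(p_values, confidence=0.95):
    """Binary prediction sets; assumes a class is kept when p >= 1 - confidence."""
    return (np.asarray(p_values) >= 1 - confidence).astype(int)


# Hypothetical p-values for three test objects and three classes.
p = np.array([[0.60, 0.02, 0.30],
              [0.04, 0.03, 0.01],
              [0.08, 0.50, 0.07]])
sets_95 = prediction_sets_from_p(p)                  # rows: 1 0 1 / 0 0 0 / 1 1 1
sets_90 = prediction_sets_from_p(p, confidence=0.9)  # rows: 1 0 1 / 0 0 0 / 0 1 0
print(sets_95)
print(sets_90)
```

Lowering the confidence level raises the threshold, so the sets can only shrink; the second object ends up with an empty set at both levels.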

predict_set_online(alphas, classes, y, bins=None, confidence=0.95, smoothing=True, seed=None, warm_start=True)[source]#

Obtain prediction sets using conformal classifier, computed using online calibration.

Parameters:
  • alphas (array-like of shape (n_samples, n_classes)) – non-conformity scores

  • classes (array-like of shape (n_classes,)) – class names

  • y (array-like of shape (n_samples,)) – correct class labels

  • bins (array-like of shape (n_samples,), default=None) – Mondrian categories

  • confidence (float in range (0,1), default=0.95) – confidence level

  • smoothing (bool, default=True) – use smoothed p-values

  • seed (int, default=None) – set random seed

  • warm_start (bool, default=True) – extend original calibration set

Returns:

prediction sets – prediction sets

Return type:

ndarray of shape (n_samples, n_classes)

Examples

Assuming that alphas_test is an array with non-conformity scores for a test set, classes is a vector with class names, y_test is a vector with the correct class labels for the test set, and cc_std a fitted standard conformal classifier, then prediction sets at the default (95%) confidence level are obtained using online calibration by:

prediction_sets = cc_std.predict_set_online(alphas_test, classes,
                                            y_test)

Assuming that bins_test is a vector with Mondrian categories (bin labels) for the test set and cc_mond a fitted Mondrian conformal classifier, then the following provides prediction sets for the test set, at the 90% confidence level:

prediction_sets = cc_mond.predict_set_online(alphas_test, classes, y_test,
                                             bins=bins_test, confidence=0.9)

Note

The use of smoothed p-values increases computation time and typically has a minor effect on the prediction sets, except for small calibration sets.

Note

If a value for seed is given, it will take precedence over any seed value given in the call to fit.

evaluate(alphas, classes, y, bins=None, confidence=0.95, smoothing=True, metrics=None, seed=None, online=False, warm_start=True)[source]#

Evaluate conformal classifier.

Parameters:
  • alphas (array-like of shape (n_samples, n_classes)) – non-conformity scores

  • classes (array-like of shape (n_classes,)) – class names

  • y (array-like of shape (n_samples,)) – correct class labels

  • bins (array-like of shape (n_samples,), default=None) – Mondrian categories

  • confidence (float in range (0,1), default=0.95) – confidence level

  • smoothing (bool, default=True) – use smoothed p-values

  • metrics (a string or a list of strings,) – default = list of all metrics, i.e., [“error”, “avg_c”, “one_c”, “empty”, “ks_test”, “time_fit”, “time_evaluate”]

  • seed (int, default=None) – set random seed

  • online (bool, default=False) – compute p-values using online calibration

  • warm_start (bool, default=True) – extend original calibration set; used only if online=True

Returns:

results – estimated performance using the metrics, where “error” is the fraction of prediction sets not containing the true class label, “avg_c” is the average no. of predicted class labels, “one_c” is the fraction of singleton prediction sets, “empty” is the fraction of empty prediction sets, “ks_test” is the p-value for the Kolmogorov-Smirnov test of uniformity of predicted p-values, “time_fit” is the time taken to fit the conformal classifier, and “time_evaluate” is the time taken for the evaluation

Return type:

dictionary with a key for each selected metric

Examples

Assuming that alphas is an array containing non-conformity scores for all classes for the test objects, classes and y_test are vectors with the class names and true class labels for the test set, respectively, and cc is a fitted standard conformal classifier, then the latter can be evaluated at the default confidence level with respect to error and average number of labels in the prediction sets by:

results = cc.evaluate(alphas, classes, y_test, metrics=["error", "avg_c"])

Note

The use of smoothed p-values increases computation time and typically has a minor effect on the results, except for small calibration sets.

Note

If a value for seed is given, it will take precedence over any seed value given when calling fit.
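The set-based metrics described above can be recomputed by hand from binary prediction sets, which may help when interpreting the returned dictionary. The following is a minimal sketch with made-up data; the helper name set_metrics is hypothetical, and the true labels are given as column indices rather than class names.

```python
import numpy as np


def set_metrics(prediction_sets, y_idx):
    """Recompute "error", "avg_c", "one_c" and "empty" from binary sets."""
    sets = np.asarray(prediction_sets)
    sizes = sets.sum(axis=1)
    # A set covers its object if the column of the true class is 1.
    covered = sets[np.arange(len(y_idx)), y_idx] == 1
    return {"error": 1 - covered.mean(),
            "avg_c": sizes.mean(),
            "one_c": (sizes == 1).mean(),
            "empty": (sizes == 0).mean()}


sets = np.array([[1, 0, 1],
                 [0, 0, 0],
                 [0, 1, 0],
                 [1, 1, 0]])
y_idx = np.array([0, 1, 1, 2])
metrics = set_metrics(sets, y_idx)
print(metrics)
```

For this toy data, two of the four sets miss the true class (error 0.5), the average set size is 1.25, and one set each is a singleton and empty (0.25).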

class crepes.ConformalRegressor[source]#

A conformal regressor transforms point predictions (regression values) into prediction intervals, for a certain confidence level.

Methods

evaluate(y_hat, y[, sigmas, bins, ...])

Evaluate conformal regressor.

fit(residuals[, sigmas, bins])

Fit conformal regressor.

predict_int(y_hat[, sigmas, bins, ...])

Obtain prediction intervals from conformal regressor.

predict_int_online(y_hat, y[, sigmas, bins, ...])

Obtain prediction intervals from conformal regressor, where the intervals are formed using online calibration.

predict_p(y_hat, y[, sigmas, bins, ...])

Obtain (smoothed or non-smoothed) p-values from conformal regressor.

predict_p_online(y_hat, y[, t, sigmas, ...])

Obtain (smoothed or non-smoothed) p-values from conformal regressor, computed using online calibration.

fit(residuals, sigmas=None, bins=None)[source]#

Fit conformal regressor.

Parameters:
  • residuals (array-like of shape (n_values,)) – true values - predicted values

  • sigmas (array-like of shape (n_values,), default=None) – difficulty estimates

  • bins (array-like of shape (n_values,), default=None) – Mondrian categories

Returns:

self – Fitted ConformalRegressor.

Return type:

object

Examples

Assuming that y_cal and y_hat_cal are vectors with true and predicted targets for some calibration set, then a standard conformal regressor can be formed from the residuals:

residuals_cal = y_cal - y_hat_cal

from crepes import ConformalRegressor

cr_std = ConformalRegressor()

cr_std.fit(residuals_cal)

Assuming that sigmas_cal is a vector with difficulty estimates, then a normalized conformal regressor can be fitted in the following way:

cr_norm = ConformalRegressor()
cr_norm.fit(residuals_cal, sigmas=sigmas_cal)

Assuming that bins_cal is a vector with Mondrian categories (bin labels), then a Mondrian conformal regressor can be fitted in the following way:

cr_mond = ConformalRegressor()
cr_mond.fit(residuals_cal, bins=bins_cal)

A normalized Mondrian conformal regressor can be fitted in the following way:

cr_norm_mond = ConformalRegressor()
cr_norm_mond.fit(residuals_cal, sigmas=sigmas_cal,
                 bins=bins_cal)

predict_p(y_hat, y, sigmas=None, bins=None, smoothing=True, seed=None)[source]#

Obtain (smoothed or non-smoothed) p-values from conformal regressor.

Parameters:
  • y_hat (array-like of shape (n_values,)) – predicted values

  • y (array-like of shape (n_values,)) – labels

  • sigmas (array-like of shape (n_values,), default=None) – difficulty estimates

  • bins (array-like of shape (n_values,), default=None) – Mondrian categories

  • smoothing (bool, default=True) – return smoothed p-values

  • seed (int, default=None) – set random seed

Returns:

p-values – p-values

Return type:

ndarray of shape (n_values,)

Examples

Assuming that y_hat and y_test are vectors with predicted and correct labels for a test set and cr_std a fitted standard conformal regressor, then p-values are obtained by:

p_values = cr_std.predict_p(y_hat, y_test)

Assuming that bins_test is a vector with Mondrian categories (bin labels) for the test set and cr_mond a fitted Mondrian conformal regressor, then the following provides (smoothed) p-values:

p_values = cr_mond.predict_p(y_hat, y_test, bins=bins_test)

Note

If a value for seed is given, it will take precedence over any seed value given in the call to fit.

predict_p_online(y_hat, y, t=None, sigmas=None, bins=None, smoothing=True, seed=None, warm_start=True)[source]#

Obtain (smoothed or non-smoothed) p-values from conformal regressor, computed using online calibration.

Parameters:
  • y_hat (array-like of shape (n_values,)) – predicted values

  • y (array-like of shape (n_values,)) – correct labels, used as targets if t=None

  • t (int, float or array-like of shape (n_samples,), default=None) – targets

  • sigmas (array-like of shape (n_values,), default=None) – difficulty estimates

  • bins (array-like of shape (n_samples,), default=None) – Mondrian categories

  • smoothing (bool, default=True) – return smoothed p-values

  • seed (int, default=None) – set random seed

  • warm_start (bool, default=True) – extend original calibration set

Returns:

p-values – p-values

Return type:

ndarray of shape (n_samples,)

Examples

Assuming that y_hat and y_test are vectors with predicted and correct labels for a test set and cr_std a fitted standard conformal regressor, then p-values for the correct labels are obtained using online calibration by:

p_values = cr_std.predict_p_online(y_hat, y_test)

Assuming that bins_test is a vector with Mondrian categories (bin labels) for the test set and cr_mond a fitted Mondrian conformal regressor, then the following provides (smoothed) p-values:

p_values = cr_mond.predict_p_online(y_hat, y_test, bins=bins_test)

Note

If a value for seed is given, it will take precedence over any seed value given in the call to fit.

predict_int(y_hat, sigmas=None, bins=None, confidence=0.95, y_min=-inf, y_max=inf)[source]#

Obtain prediction intervals from conformal regressor.

Parameters:
  • y_hat (array-like of shape (n_values,)) – predicted values

  • sigmas (array-like of shape (n_values,), default=None) – difficulty estimates

  • bins (array-like of shape (n_values,), default=None) – Mondrian categories

  • confidence (float in range (0,1), default=0.95) – confidence level

  • y_min (float or int, default=-numpy.inf) – minimum value to include in prediction intervals

  • y_max (float or int, default=numpy.inf) – maximum value to include in prediction intervals

Returns:

intervals – prediction intervals

Return type:

ndarray of shape (n_values, 2)

Examples

Assuming that y_hat_test is a vector with predicted targets for a test set and cr_std a fitted standard conformal regressor, then prediction intervals at the 99% confidence level can be obtained by:

intervals = cr_std.predict_int(y_hat_test, confidence=0.99)

Assuming that sigmas_test is a vector with difficulty estimates for the test set and cr_norm a fitted normalized conformal regressor, then prediction intervals at the default (95%) confidence level can be obtained by:

intervals = cr_norm.predict_int(y_hat_test, sigmas=sigmas_test)

Assuming that bins_test is a vector with Mondrian categories (bin labels) for the test set and cr_mond a fitted Mondrian conformal regressor, then the following provides prediction intervals at the default confidence level, where the intervals are lower-bounded by 0:

intervals = cr_mond.predict_int(y_hat_test, bins=bins_test,
                                y_min=0)

Note

In case the specified confidence level is too high in relation to the size of the calibration set, a warning will be issued and the output intervals will be of maximum size.
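How the confidence level interacts with the size of the calibration set can be seen in a small sketch of a standard (non-normalized) split-conformal regressor. This is an illustration under stated assumptions, not the crepes implementation: the helper name conformal_intervals is hypothetical, and the rule for choosing the calibration score is the usual textbook one.

```python
import numpy as np


def conformal_intervals(residuals_cal, y_hat, confidence=0.95,
                        y_min=-np.inf, y_max=np.inf):
    """Standard split-conformal intervals; a sketch, not the crepes code."""
    # Absolute calibration residuals, sorted in descending order.
    alphas = np.sort(np.abs(residuals_cal))[::-1]
    index = int((1 - confidence) * (len(alphas) + 1)) - 1
    # Too few calibration objects for this confidence level: unbounded intervals.
    alpha = alphas[index] if index >= 0 else np.inf
    y_hat = np.asarray(y_hat, dtype=float)
    intervals = np.column_stack([y_hat - alpha, y_hat + alpha])
    return np.clip(intervals, y_min, y_max)


residuals_cal = np.arange(1, 100)        # 99 made-up calibration residuals
y_hat_test = np.array([100.0, 50.0])
intervals_95 = conformal_intervals(residuals_cal, y_hat_test)
intervals_pos = conformal_intervals(residuals_cal, y_hat_test, y_min=0)
intervals_max = conformal_intervals(residuals_cal, y_hat_test, confidence=0.999)
print(intervals_95)   # rows: [5, 195] and [-45, 145]
print(intervals_pos)  # rows: [5, 195] and [0, 145]
print(intervals_max)  # both rows: [-inf, inf]
```

With 99 calibration objects, the 95% level selects the fifth largest absolute residual, whereas the 99.9% level cannot be supported and the intervals fall back to the widest possible range, bounded only by y_min and y_max.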

predict_int_online(y_hat, y, sigmas=None, bins=None, confidence=0.95, y_min=-inf, y_max=inf, warm_start=True)[source]#

Obtain prediction intervals from conformal regressor, where the intervals are formed using online calibration.

Parameters:
  • y_hat (array-like of shape (n_values,)) – predicted values

  • y (array-like of shape (n_values,)) – correct labels

  • sigmas (array-like of shape (n_values,), default=None) – difficulty estimates

  • bins (array-like of shape (n_values,), default=None) – Mondrian categories

  • confidence (float in range (0,1), default=0.95) – confidence level

  • y_min (float or int, default=-numpy.inf) – minimum value to include in prediction intervals

  • y_max (float or int, default=numpy.inf) – maximum value to include in prediction intervals

  • warm_start (bool, default=True) – extend original calibration set

Returns:

intervals – prediction intervals

Return type:

ndarray of shape (n_values, 2)

Examples

Assuming that y_hat_test is a vector with predicted targets and y_test is a vector with correct targets for a test set and cr_std is a fitted standard conformal regressor, then prediction intervals at the 99% confidence level can be obtained using online calibration by:

intervals = cr_std.predict_int_online(y_hat_test, y_test,
                                      confidence=0.99)

Assuming that sigmas_test is a vector with difficulty estimates for the test set and cr_norm a fitted normalized conformal regressor, then prediction intervals at the default (95%) confidence level can be obtained using online calibration by:

intervals = cr_norm.predict_int_online(y_hat_test, y_test,
                                       sigmas=sigmas_test)

Assuming that bins_test is a vector with Mondrian categories (bin labels) for the test set and cr_mond a fitted Mondrian conformal regressor, then the following provides prediction intervals at the default confidence level, where the intervals are lower-bounded by 0:

intervals = cr_mond.predict_int_online(y_hat_test, y_test,
                                       bins=bins_test, y_min=0)

Note

In case the specified confidence level is too high in relation to the size of the calibration set, the output intervals will be of maximum size.

evaluate(y_hat, y, sigmas=None, bins=None, confidence=0.95, y_min=-inf, y_max=inf, metrics=None, smoothing=True, seed=None, online=False, warm_start=True)[source]#

Evaluate conformal regressor.

Parameters:
  • y_hat (array-like of shape (n_values,)) – predicted values

  • y (array-like of shape (n_values,)) – correct labels

  • sigmas (array-like of shape (n_values,), default=None) – difficulty estimates

  • bins (array-like of shape (n_values,), default=None) – Mondrian categories

  • confidence (float in range (0,1), default=0.95) – confidence level

  • y_min (float or int, default=-numpy.inf) – minimum value to include in prediction intervals

  • y_max (float or int, default=numpy.inf) – maximum value to include in prediction intervals

  • metrics (a string or a list of strings,) – default=list of all metrics, i.e., [“error”, “eff_mean”, “eff_med”, “ks_test”, “time_fit”, “time_evaluate”]

  • smoothing (bool, default=True) – employ smoothed p-values

  • seed (int, default=None) – set random seed

  • online (bool, default=False) – employ online calibration

  • warm_start (bool, default=True) – extend original calibration set; used only if online=True

Returns:

results – estimated performance using the metrics, where “error” is the fraction of prediction intervals not containing the true label, “eff_mean” is the mean length of prediction intervals, “eff_med” is the median length of the prediction intervals, “ks_test” is the p-value for the Kolmogorov-Smirnov test of uniformity of predicted p-values, “time_fit” is the time taken to fit the conformal regressor, and “time_evaluate” is the time taken for the evaluation

Return type:

dictionary with a key for each selected metric
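The interval-based metrics can be recomputed by hand from an array of prediction intervals, which may help when checking the returned values. The sketch below uses made-up intervals and targets; the helper name interval_metrics is hypothetical.

```python
import numpy as np


def interval_metrics(intervals, y):
    """Recompute "error", "eff_mean" and "eff_med" from prediction intervals."""
    intervals = np.asarray(intervals, dtype=float)
    y = np.asarray(y, dtype=float)
    # An interval covers its target if lower <= y <= upper.
    covered = (intervals[:, 0] <= y) & (y <= intervals[:, 1])
    lengths = intervals[:, 1] - intervals[:, 0]
    return {"error": 1 - covered.mean(),
            "eff_mean": lengths.mean(),
            "eff_med": np.median(lengths)}


intervals = [[0, 10], [5, 9], [2, 4], [1, 3]]
y_true = [3, 10, 3, 2]
metrics = interval_metrics(intervals, y_true)
print(metrics)
```

Here one of the four intervals misses its target (error 0.25), while the interval lengths 10, 4, 2 and 2 give a mean of 4.5 and a median of 3.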

Examples

Assuming that y_hat_test and y_test are vectors with predicted and true targets for a test set, sigmas_test and bins_test are vectors with difficulty estimates and Mondrian categories (bin labels) for the test set, and cr_norm_mond is a fitted normalized Mondrian conformal regressor, then the latter can be evaluated at the default confidence level with respect to error and mean efficiency (interval size) by:

results = cr_norm_mond.evaluate(y_hat_test, y_test,
                                sigmas=sigmas_test, bins=bins_test,
                                metrics=["error", "eff_mean"])

class crepes.ConformalPredictiveSystem[source]#

A conformal predictive system transforms point predictions (regression values) into cumulative distribution functions (conformal predictive distributions).

Methods

evaluate(y_hat, y[, sigmas, bins, ...])

Evaluate conformal predictive system.

fit(residuals[, sigmas, bins, seed])

Fit conformal predictive system.

predict(y_hat[, sigmas, bins, y, ...])

Predict using conformal predictive system.

predict_cpds(y_hat[, sigmas, bins, cpds_by_bins])

Obtain conformal predictive distributions from conformal predictive system.

predict_cpds_online(y_hat, y[, sigmas, ...])

Obtain conformal predictive distributions from conformal predictive system, computed using online calibration.

predict_int(y_hat[, sigmas, bins, ...])

Obtain prediction intervals from conformal predictive system.

predict_int_online(y_hat, y[, sigmas, bins, ...])

Obtain prediction intervals from conformal predictive system, where the intervals are formed using online calibration.

predict_p(y_hat, y[, sigmas, bins, ...])

Obtain p-values from conformal predictive system.

predict_p_online(y_hat, y[, t, sigmas, ...])

Obtain (smoothed or non-smoothed) p-values from conformal predictive system, computed using online calibration.

predict_percentiles(y_hat[, sigmas, bins, ...])

Obtain percentiles with conformal predictive system.

predict_percentiles_online(y_hat, y[, ...])

Obtain percentiles from conformal predictive system, computed using online calibration.

fit(residuals, sigmas=None, bins=None, seed=None)[source]#

Fit conformal predictive system.

Parameters:
  • residuals (array-like of shape (n_values,)) – actual values - predicted values

  • sigmas (array-like of shape (n_values,), default=None) – difficulty estimates

  • bins (array-like of shape (n_values,), default=None) – Mondrian categories

  • seed (int, default=None) – set random seed

Returns:

self – Fitted ConformalPredictiveSystem.

Return type:

object

Examples

Assuming that y_cal and y_hat_cal are vectors with true and predicted targets for some calibration set, then a standard conformal predictive system can be formed from the residuals:

residuals_cal = y_cal - y_hat_cal

from crepes import ConformalPredictiveSystem

cps_std = ConformalPredictiveSystem()

cps_std.fit(residuals_cal)

Assuming that sigmas_cal is a vector with difficulty estimates, then a normalized conformal predictive system can be fitted in the following way:

cps_norm = ConformalPredictiveSystem()
cps_norm.fit(residuals_cal, sigmas=sigmas_cal)

Assuming that bins_cal is a vector with Mondrian categories (bin labels), then a Mondrian conformal predictive system can be fitted in the following way:

cps_mond = ConformalPredictiveSystem()
cps_mond.fit(residuals_cal, bins=bins_cal)

A normalized Mondrian conformal predictive system can be fitted in the following way:

cps_norm_mond = ConformalPredictiveSystem()
cps_norm_mond.fit(residuals_cal, sigmas=sigmas_cal,
                  bins=bins_cal)

Note

By providing a random seed, e.g., seed=123, calls to the methods predict and evaluate of the ConformalPredictiveSystem object will be deterministic.

predict_p(y_hat, y, sigmas=None, bins=None, smoothing=True, seed=None)[source]#

Obtain p-values from conformal predictive system.

Parameters:
  • y_hat (array-like of shape (n_values,)) – predicted values

  • y (int, float or array-like of shape (n_values,)) – labels

  • sigmas (array-like of shape (n_values,), default=None) – difficulty estimates

  • bins (array-like of shape (n_values,), default=None) – Mondrian categories

  • smoothing (bool, default=True) – return smoothed p-values

  • seed (int, default=None) – set random seed

Returns:

p_values

Return type:

ndarray of shape (n_values,)

Examples

Assuming that y_hat_test and y_test are vectors with predicted and true targets, respectively, for a test set and cps_std a fitted standard conformal predictive system, the p-values for the true targets can be obtained by:

p_values = cps_std.predict_p(y_hat_test, y_test)

Note

If a value for seed is given, it will take precedence over any seed value given in the call to fit.

Note

If smoothing is disabled, i.e., smoothing=False, then setting a value for seed has no effect.
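What a p-value from a conformal predictive system expresses can be sketched by building the predictive distribution explicitly: the calibration residuals are added to each point prediction, and the p-value is the (optionally smoothed) fraction of the resulting values that fall below the target. The sketch follows the textbook definition for the standard, non-normalized case; treating it as the exact crepes computation is an assumption, and the helper name cps_p_values is hypothetical.

```python
import numpy as np


def cps_p_values(residuals_cal, y_hat, y, smoothing=True, seed=None):
    """p-values from a standard CPS; a sketch, not the crepes implementation."""
    residuals = np.sort(np.asarray(residuals_cal, dtype=float))
    y_hat = np.asarray(y_hat, dtype=float)
    y = np.asarray(y, dtype=float)
    # One row of candidate target values per test object.
    cpds = y_hat[:, None] + residuals[None, :]
    below = (cpds < y[:, None]).sum(axis=1)
    equal = (cpds == y[:, None]).sum(axis=1)
    n = len(residuals)
    if smoothing:
        tau = np.random.default_rng(seed).uniform(size=len(y))
        return (below + tau * (equal + 1)) / (n + 1)
    # Without smoothing, no random numbers are drawn, so seed is irrelevant.
    return (below + equal + 1) / (n + 1)


residuals_cal = [-2.0, -1.0, 0.0, 1.0, 2.0]
y_hat_test = [10.0, 10.0]
y_test = [10.5, 7.0]
p_plain = cps_p_values(residuals_cal, y_hat_test, y_test, smoothing=False)
p_smooth = cps_p_values(residuals_cal, y_hat_test, y_test, seed=123)
print(p_plain)   # approximately 0.667 and 0.167
print(p_smooth)
```

A target well below the predicted value, such as 7.0 against a prediction of 10.0, receives a small p-value, while targets close to the prediction land near the middle of the distribution.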

predict_p_online(y_hat, y, t=None, sigmas=None, bins=None, smoothing=True, seed=None, warm_start=True)[source]#

Obtain (smoothed or non-smoothed) p-values from conformal predictive system, computed using online calibration.

Parameters:
  • y_hat (array-like of shape (n_values,)) – predicted values

  • y (array-like of shape (n_values,)) – correct labels, used as targets if t=None

  • t (int, float or array-like of shape (n_samples,), default=None) – targets

  • sigmas (array-like of shape (n_values,), default=None) – difficulty estimates

  • bins (array-like of shape (n_samples,), default=None) – Mondrian categories

  • smoothing (bool, default=True) – return smoothed p-values

  • seed (int, default=None) – set random seed

  • warm_start (bool, default=True) – extend original calibration set

Returns:

p-values – p-values

Return type:

ndarray of shape (n_samples,)

Examples

Assuming that y_hat_test and y_test are vectors with predicted and true targets, respectively, for a test set and cps_std a fitted standard conformal predictive system, the p-values for the true targets, computed using online calibration, can be obtained by:

p_values = cps_std.predict_p_online(y_hat_test, y_test)

Assuming that bins_test is a vector with Mondrian categories (bin labels) for the test set and cps_mond a fitted Mondrian conformal predictive system, then the following provides (smoothed) p-values for the test set:

p_values = cps_mond.predict_p_online(y_hat_test, y_test,
                                     bins=bins_test)

Note

If a value for seed is given, it will take precedence over any seed value given in the call to fit.

Note

If smoothing is disabled, i.e., smoothing=False, then setting a value for seed has no effect.

predict_int(y_hat, sigmas=None, bins=None, confidence=0.95, y_min=-inf, y_max=inf)[source]#

Obtain prediction intervals from conformal predictive system.

Parameters:
  • y_hat (array-like of shape (n_values,)) – predicted values

  • sigmas (array-like of shape (n_values,), default=None) – difficulty estimates

  • bins (array-like of shape (n_values,), default=None) – Mondrian categories

  • confidence (float in range (0,1), default=0.95) – confidence level

  • y_min (float or int, default=-numpy.inf) – The minimum value to include in prediction intervals.

  • y_max (float or int, default=numpy.inf) – The maximum value to include in prediction intervals.

Returns:

intervals

Return type:

ndarray of shape (n_values, 2)

Examples

Assuming that y_hat_test is a vector with predicted targets for a test set and cps_std a fitted standard conformal predictive system, then prediction intervals at the default (95%) confidence level can be obtained by:

intervals = cps_std.predict_int(y_hat_test)

Note

In case the calibration set is too small for the specified confidence level, a warning will be issued and the output will be y_min and y_max, respectively.

Note

If a value for seed is given, it will take precedence over any seed value given in the call to fit.

predict_int_online(y_hat, y, sigmas=None, bins=None, confidence=0.95, y_min=-inf, y_max=inf, warm_start=True)[source]#

Obtain prediction intervals from conformal predictive system, where the intervals are formed using online calibration.

Parameters:
  • y_hat (array-like of shape (n_values,)) – predicted values

  • y (array-like of shape (n_values,)) – correct labels

  • sigmas (array-like of shape (n_values,), default=None) – difficulty estimates

  • bins (array-like of shape (n_values,), default=None) – Mondrian categories

  • confidence (float in range (0,1), default=0.95) – confidence level

  • y_min (float or int, default=-numpy.inf) – minimum value to include in prediction intervals

  • y_max (float or int, default=numpy.inf) – maximum value to include in prediction intervals

  • warm_start (bool, default=True) – extend original calibration set

Returns:

intervals – prediction intervals

Return type:

ndarray of shape (n_values, 2)

Examples

Assuming that y_hat_test is a vector with predicted targets and y_test is a vector with the correct targets for a test set and cps_std a fitted standard conformal predictive system, then prediction intervals at the 99% confidence level can be obtained using online calibration by:

intervals = cps_std.predict_int_online(y_hat_test, y_test,
                                       confidence=0.99)

Assuming that sigmas_test is a vector with difficulty estimates for the test set and cps_norm a fitted normalized conformal predictive system, then prediction intervals at the default (95%) confidence level can be obtained using online calibration by:

intervals = cps_norm.predict_int_online(y_hat_test, y_test,
                                        sigmas=sigmas_test)

Assuming that bins_test is a vector with Mondrian categories (bin labels) for the test set and cps_mond a fitted Mondrian conformal predictive system, then the following provides prediction intervals at the default confidence level, where the intervals are lower-bounded by 0:

intervals = cps_mond.predict_int_online(y_hat_test, y_test,
                                        bins=bins_test, y_min=0)

Note

In case the specified confidence level is too high in relation to the size of the calibration set, the output intervals will be of maximum size.

predict_percentiles(y_hat, sigmas=None, bins=None, lower_percentiles=None, higher_percentiles=None, y_min=-inf, y_max=inf)[source]#

Obtain percentiles with conformal predictive system.

Parameters:
  • y_hat (array-like of shape (n_values,)) – predicted values

  • sigmas (array-like of shape (n_values,), default=None) – difficulty estimates

  • bins (array-like of shape (n_values,), default=None) – Mondrian categories

  • lower_percentiles (float, int, or array-like of shape (l_values,), default=None) – percentiles for which a lower value will be output in case a percentile lies between two values (equivalent to interpolation="lower" in numpy.percentile)

  • higher_percentiles (float, int, or array-like of shape (h_values,), default=None) – percentiles for which a higher value will be output in case a percentile lies between two values (equivalent to interpolation="higher" in numpy.percentile)

  • y_min (float or int, default=-numpy.inf) – The minimum value to include

  • y_max (float or int, default=numpy.inf) – The maximum value to include

Returns:

percentiles – percentiles

Return type:

ndarray of shape (n_values, l_values + h_values)
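The difference between lower and higher percentiles can be illustrated with numpy.percentile on a hand-built predictive distribution. This is only a rough analogue under stated assumptions: the distribution is formed by adding made-up calibration residuals to each prediction, and the keyword method is used, which replaces interpolation in NumPy 1.22 and later.

```python
import numpy as np

# Made-up calibration residuals (-50, -49, ..., 50) and two point predictions.
residuals_cal = np.arange(101) - 50.0
y_hat_test = np.array([10.0, 0.0])

# One row of candidate target values per test object.
cpds = y_hat_test[:, None] + np.sort(residuals_cal)[None, :]

# "lower" rounds down and "higher" rounds up when a percentile falls between
# two values (requires NumPy >= 1.22 for the `method` keyword).
lower = np.percentile(cpds, 2.5, axis=1, method="lower")
higher = np.percentile(cpds, 97.5, axis=1, method="higher")
percentiles = np.column_stack([lower, higher])
print(percentiles)  # rows: [-38, 58] and [-48, 48]
```

Choosing the lower value for the low percentile and the higher value for the high percentile errs on the side of wider ranges, and the result has one column per requested percentile.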

Examples

Assuming that y_hat_test is a vector with predicted targets for a test set and cps_std a fitted standard conformal predictive system, then percentiles can be obtained by:

percentiles = cps_std.predict_percentiles(y_hat_test,
                                          lower_percentiles=2.5,
                                          higher_percentiles=97.5)

Note

In case the calibration set is too small for the specified percentiles, a warning will be issued and the output will be y_min and y_max, respectively.

predict_percentiles_online(y_hat, y, sigmas=None, bins=None, lower_percentiles=None, higher_percentiles=None, y_min=-inf, y_max=inf, warm_start=True)[source]#

Obtain percentiles from conformal predictive system, computed using online calibration.

Parameters:
  • y_hat (array-like of shape (n_values,)) – predicted values

  • y (array-like of shape (n_values,)) – correct labels

  • sigmas (array-like of shape (n_values,), default=None) – difficulty estimates

  • bins (array-like of shape (n_values,), default=None) – Mondrian categories

  • lower_percentiles (float, int, or array-like of shape (l_values,), default=None) – percentiles for which a lower value will be output in case a percentile lies between two values (equivalent to interpolation="lower" in numpy.percentile)

  • higher_percentiles (float, int, or array-like of shape (h_values,), default=None) – percentiles for which a higher value will be output in case a percentile lies between two values (equivalent to interpolation="higher" in numpy.percentile)

  • y_min (float or int, default=-numpy.inf) – The minimum value to include

  • y_max (float or int, default=numpy.inf) – The maximum value to include

  • warm_start (bool, default=True) – extend original calibration set

Returns:

percentiles – percentiles

Return type:

ndarray of shape (n_values, l_values + h_values)

Examples

Assuming that y_hat_test and y_test are vectors with predicted and correct targets, respectively, for a test set and cps_std a fitted standard conformal predictive system, then percentiles computed using online calibration can be obtained by:

percentiles = cps_std.predict_percentiles_online(y_hat_test, y_test,
                                                 lower_percentiles=2.5,
                                                 higher_percentiles=97.5)

Note

In case the calibration set is too small for the specified percentiles, the output values will be y_min for the lower percentiles and y_max for the higher percentiles.
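The idea of online calibration can be sketched as follows: each test object is predicted from the residuals seen so far, and only afterwards is its own residual added to the calibration set. The sketch assumes the same percentile-to-position rule as a standard conformal predictive system (position p/100 * (n + 1), rounded down or up) and treats warm_start=False as starting from an empty calibration set. The helper sketch_percentiles_online is hypothetical and only illustrates the principle.

```python
import bisect
import math
import numpy as np

def sketch_percentiles_online(y_hat, y, residuals, lower, higher,
                              y_min=-np.inf, y_max=np.inf, warm_start=True):
    """Hypothetical helper: one lower and one higher percentile per object."""
    # Assumption: warm_start=True starts from the original residuals.
    alphas = sorted(residuals) if warm_start else []
    out = np.empty((len(y_hat), 2))
    for i, (pred, target) in enumerate(zip(y_hat, y)):
        n = len(alphas)
        lo = int(lower / 100 * (n + 1)) - 1
        hi = math.ceil(higher / 100 * (n + 1)) - 1
        out[i, 0] = pred + alphas[lo] if lo >= 0 else y_min
        out[i, 1] = pred + alphas[hi] if hi < n else y_max
        # The true target is used for calibration only after predicting.
        bisect.insort(alphas, target - pred)
    return np.clip(out, y_min, y_max)

cold = sketch_percentiles_online([10, 10, 10, 10], [11, 9, 12, 8], [],
                                 lower=25, higher=75, warm_start=False)
# cold[0] is [-inf, inf]; cold[3] is [9, 12] once three residuals are seen
warm = sketch_percentiles_online([10, 10], [11, 9],
                                 [-4, -3, -2, -1, 0, 1, 2, 3, 4],
                                 lower=25, higher=75)
# warm[0] is [7, 13]
```

Without a warm start, the first outputs equal y_min and y_max because no residuals are available yet.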

predict_cpds(y_hat, sigmas=None, bins=None, cpds_by_bins=False)[source]#

Obtain conformal predictive distributions from conformal predictive system.

Parameters:
  • y_hat (array-like of shape (n_values,)) – predicted values

  • sigmas (array-like of shape (n_values,), default=None) – difficulty estimates

  • bins (array-like of shape (n_values,), default=None) – Mondrian categories

  • cpds_by_bins (Boolean, default=False) – specifies whether the output cpds should be grouped by bin or not; only applicable when bins is not None

Returns:

cpds – conformal predictive distributions. If bins is None, the distributions are represented by a single array, where the number of columns (c_values) is determined by the number of residuals of the fitted conformal predictive system. Otherwise, the distributions are represented by a vector of arrays, if cpds_by_bins = False, or a list of arrays, with one element for each bin, if cpds_by_bins = True.

Return type:

ndarray of shape (n_values, c_values) or (n_values,), or list of ndarrays

Examples

Assuming that y_hat_test is a vector with predicted targets for a test set and cps_std a fitted standard conformal predictive system, conformal predictive distributions (cpds) can be obtained by:

cpds = cps_std.predict_cpds(y_hat_test)

Note

The returned array may be very large as its size is the product of the number of calibration and test objects, unless a Mondrian approach is employed; for the latter, this number is reduced by increasing the number of bins.

Note

Setting cpds_by_bins=True has an effect only for Mondrian conformal predictive systems.
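To get a feeling for the size of the returned array, the following sketch builds distributions for a standard (non-Mondrian) system under the assumption that each distribution is the predicted value plus the sorted calibration residuals. The residuals and predictions are randomly generated placeholders, not output from crepes.

```python
import numpy as np

rng = np.random.default_rng(0)
residuals = rng.normal(size=1000)    # hypothetical calibration residuals
y_hat_test = rng.normal(size=200)    # hypothetical test predictions

# One row per test object, one column per calibration residual.
cpds = y_hat_test[:, None] + np.sort(residuals)[None, :]

print(cpds.shape)     # (200, 1000)
print(cpds.nbytes)    # 1600000 bytes for float64 values
```

Doubling either the calibration set or the test set doubles the memory needed, which is why the product of the two sizes is the relevant quantity.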

predict_cpds_online(y_hat, y, sigmas=None, bins=None, warm_start=True)[source]#

Obtain conformal predictive distributions from conformal predictive system, computed using online calibration.

Parameters:
  • y_hat (array-like of shape (n_values,)) – predicted values

  • y (array-like of shape (n_values,)) – correct labels

  • sigmas (array-like of shape (n_values,), default=None) – difficulty estimates

  • bins (array-like of shape (n_values,), default=None) – Mondrian categories

  • warm_start (bool, default=True) – extend original calibration set

Returns:

cpds – conformal predictive distributions

Return type:

ndarray of shape (n_values,)

Examples

Assuming that y_hat_test and y_test are vectors with predicted and correct targets for a test set and cps_std a fitted standard conformal predictive system, then conformal predictive distributions can be obtained using online calibration by:

cpds = cps_std.predict_cpds_online(y_hat_test, y_test)

Note

The returned vector of vectors may be very large; the largest element may be of the same size as the concatenation of the calibration and test sets.
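Why the result is a vector of vectors of increasing length can be seen in the following sketch, which assumes that each distribution is the predicted value plus the residuals available at that point, and that the residual of each test object is added after its distribution has been formed. The helper sketch_cpds_online is hypothetical and not part of crepes.

```python
import bisect
import numpy as np

def sketch_cpds_online(y_hat, y, residuals, warm_start=True):
    """Hypothetical helper: one distribution per test object, growing in size."""
    alphas = sorted(residuals) if warm_start else []
    cpds = np.empty(len(y_hat), dtype=object)   # ragged: lengths differ
    for i, (pred, target) in enumerate(zip(y_hat, y)):
        cpds[i] = pred + np.array(alphas, dtype=float)
        bisect.insort(alphas, target - pred)    # extend the calibration set
    return cpds

cpds = sketch_cpds_online([10, 20, 30], [11, 19, 33], [-1, 0, 1])
lengths = [len(c) for c in cpds]
# lengths is [3, 4, 5]; cpds[2] is [29, 29, 30, 31, 31]
```

In this sketch the distribution for the i-th test object (counting from zero) holds the original residuals plus i new ones, so the total number of stored values grows quadratically with the size of the test set.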

predict(y_hat, sigmas=None, bins=None, y=None, lower_percentiles=None, higher_percentiles=None, y_min=-inf, y_max=inf, return_cpds=False, cpds_by_bins=False, smoothing=True, seed=None)[source]#

Predict using conformal predictive system.

Parameters:
  • y_hat (array-like of shape (n_values,)) – predicted values

  • sigmas (array-like of shape (n_values,), default=None) – difficulty estimates

  • bins (array-like of shape (n_values,), default=None) – Mondrian categories

  • y (float, int or array-like of shape (n_values,), default=None) – values for which p-values should be returned

  • lower_percentiles (array-like of shape (l_values,), default=None) – percentiles for which a lower value will be output in case a percentile lies between two values (similar to interpolation="lower" in numpy.percentile)

  • higher_percentiles (array-like of shape (h_values,), default=None) – percentiles for which a higher value will be output in case a percentile lies between two values (similar to interpolation="higher" in numpy.percentile)

  • y_min (float or int, default=-numpy.inf) – The minimum value to include in prediction intervals.

  • y_max (float or int, default=numpy.inf) – The maximum value to include in prediction intervals.

  • return_cpds (Boolean, default=False) – specifies whether conformal predictive distributions (cpds) should be output or not

  • cpds_by_bins (Boolean, default=False) – specifies whether the output cpds should be grouped by bin or not; only applicable when bins is not None and return_cpds = True

  • smoothing (bool, default=True) – return smoothed p-values

  • seed (int, default=None) – set random seed

Returns:

  • results (ndarray of shape (n_values, n_cols) or (n_values,)) – the shape is (n_values, n_cols) if n_cols > 1 and (n_values,) otherwise. Here n_cols = p_values + l_values + h_values, where p_values is 1 if y is not None and 0 otherwise, l_values is the number of lower percentiles, and h_values is the number of higher percentiles. Only returned if n_cols > 0.

  • cpds (ndarray of shape (n_values, c_values), ndarray of shape (n_values,), or list of ndarrays) – conformal predictive distributions. Only returned if return_cpds == True. If bins is None, the distributions are represented by a single array, where the number of columns (c_values) is determined by the number of residuals of the fitted conformal predictive system. Otherwise, the distributions are represented by a vector of arrays, if cpds_by_bins = False, or a list of arrays, with one element for each bin, if cpds_by_bins = True.

Examples

Assuming that y_hat_test and y_test are vectors with predicted and true targets, respectively, for a test set and cps_std a fitted standard conformal predictive system, the p-values for the true targets can be obtained by:

p_values = cps_std.predict(y_hat_test, y=y_test)

The p-values with respect to some specific value, e.g., 37, can be obtained by:

p_values = cps_std.predict(y_hat_test, y=37)
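What such a p-value expresses can be illustrated with a small sketch. It assumes one common definition for conformal predictive systems: the p-value for a value y is (number of distribution values below y + tau * (number of values equal to y + 1)) / (n + 1), where tau is drawn uniformly from [0, 1) when smoothing is used. Fixing tau at 1 without smoothing is an assumption of this sketch, and the helper sketch_p_values is hypothetical rather than the crepes implementation.

```python
import numpy as np

def sketch_p_values(y_hat, y, residuals, smoothing=True, seed=None):
    """Hypothetical helper: p-values of y under y_hat + sorted residuals."""
    rng = np.random.default_rng(seed)
    alphas = np.sort(np.asarray(residuals, dtype=float))
    n = len(alphas)
    y_hat = np.asarray(y_hat, dtype=float)
    # A scalar y is compared against every test object.
    y = np.broadcast_to(np.asarray(y, dtype=float), y_hat.shape)
    cpds = y_hat[:, None] + alphas[None, :]
    below = (cpds < y[:, None]).sum(axis=1)
    equal = (cpds == y[:, None]).sum(axis=1)
    # Assumption: tau = 1 (the conservative choice) when smoothing is off.
    tau = rng.uniform(size=len(y_hat)) if smoothing else np.ones(len(y_hat))
    return (below + tau * (equal + 1)) / (n + 1)

residuals = [-4, -3, -2, -1, 0, 1, 2, 3, 4]   # made-up calibration residuals
p_plain = sketch_p_values([10, 10], [10, 0], residuals, smoothing=False)
# p_plain is [0.6, 0.1]
p_37 = sketch_p_values([10, 10], 37, residuals, seed=1)
# every entry of p_37 lies between 0.9 and 1.0, since 37 exceeds all values
```

Passing the same seed twice gives identical smoothed p-values, which is the purpose of the seed argument in this sketch.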

Assuming that sigmas_test is a vector with difficulty estimates for the test set and cps_norm a fitted normalized conformal predictive system, then the 90th and 95th percentiles can be obtained by:

percentiles = cps_norm.predict(y_hat_test, sigmas=sigmas_test,
                               higher_percentiles=[90,95])

In the above example, the nearest higher value is returned, if there is no value that corresponds exactly to the requested percentile. If we instead would like to retrieve the nearest lower value, we should write:

percentiles = cps_norm.predict(y_hat_test, sigmas=sigmas_test,
                               lower_percentiles=[90,95])

Assuming that bins_test is a vector with Mondrian categories (bin labels) for the test set and cps_mond a fitted Mondrian conformal predictive system, then the following returns prediction intervals at the 95% confidence level, where the intervals are lower-bounded by 0:

intervals = cps_mond.predict(y_hat_test, bins=bins_test,
                             lower_percentiles=2.5,
                             higher_percentiles=97.5,
                             y_min=0)

If we would like to obtain the conformal predictive distributions, we could write the following:

cpds = cps_norm.predict(y_hat_test, sigmas=sigmas_test,
                        return_cpds=True)

The output of the above will be an array with a row for each test instance and a column for each calibration instance (residual). For a Mondrian conformal predictive system, the output will instead be a vector, in which each element is a vector, as the number of calibration instances may vary between categories. If we instead would like an array for each category, then, assuming that cps_norm_mond is a fitted normalized Mondrian conformal predictive system and bins_test a vector with Mondrian categories for the test set, this can be obtained by:

cpds = cps_norm_mond.predict(y_hat_test, sigmas=sigmas_test,
                             bins=bins_test, return_cpds=True,
                             cpds_by_bins=True)

Note

In case the calibration set is too small for the specified lower and higher percentiles, a warning will be issued and the output will be y_min and y_max, respectively.

Note

Setting return_cpds=True may consume a lot of memory, as a matrix is generated for which the number of elements is the product of the number of calibration and test objects, unless a Mondrian approach is employed; for the latter, this number is reduced by increasing the number of bins.

Note

Setting cpds_by_bins=True has an effect only for Mondrian conformal predictive systems.
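The two output layouts for a Mondrian system can be illustrated with a minimal sketch. It assumes that each category keeps its own sorted calibration residuals and that a distribution is the predicted value plus the residuals of the test object's category. The helper sketch_mondrian_cpds and all values are hypothetical; the sketch only mirrors the layouts described above.

```python
import numpy as np

def sketch_mondrian_cpds(y_hat, bins, cal_residuals, cal_bins,
                         cpds_by_bins=False):
    """Hypothetical helper: per-category distributions in two layouts."""
    y_hat, bins = np.asarray(y_hat, dtype=float), np.asarray(bins)
    cal_residuals = np.asarray(cal_residuals, dtype=float)
    cal_bins = np.asarray(cal_bins)
    categories = np.unique(cal_bins)
    alphas = {b: np.sort(cal_residuals[cal_bins == b]) for b in categories}
    if cpds_by_bins:
        # One 2-D array per category: rows are the test objects in it.
        return [y_hat[bins == b][:, None] + alphas[b][None, :]
                for b in categories]
    # One 1-D array per test object; lengths vary between categories.
    cpds = np.empty(len(y_hat), dtype=object)
    for i in range(len(y_hat)):
        cpds[i] = y_hat[i] + alphas[bins[i]]
    return cpds

cal_residuals, cal_bins = [-1, 0, 1, -2, 2], [0, 0, 0, 1, 1]
by_obj = sketch_mondrian_cpds([10, 20, 30], [0, 1, 0],
                              cal_residuals, cal_bins)
# lengths of by_obj elements are 3, 2, 3
by_bin = sketch_mondrian_cpds([10, 20, 30], [0, 1, 0],
                              cal_residuals, cal_bins, cpds_by_bins=True)
# by_bin[0] has shape (2, 3) and by_bin[1] has shape (1, 2)
```

Grouping by category gives regular arrays that are easier to process in bulk, at the cost of losing the original order of the test objects.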

Note

If a value for seed is given, it will take precedence over any seed value given in the call to fit.

evaluate(y_hat, y, sigmas=None, bins=None, confidence=0.95, y_min=-inf, y_max=inf, metrics=None, smoothing=True, seed=None, online=False, warm_start=True)[source]#

Evaluate conformal predictive system.

Parameters:
  • y_hat (array-like of shape (n_values,)) – predicted values

  • y (array-like of shape (n_values,)) – correct labels

  • sigmas (array-like of shape (n_values,), default=None) – difficulty estimates

  • bins (array-like of shape (n_values,), default=None) – Mondrian categories

  • confidence (float in range (0,1), default=0.95) – confidence level

  • y_min (float or int, default=-numpy.inf) – minimum value to include in prediction intervals

  • y_max (float or int, default=numpy.inf) – maximum value to include in prediction intervals

  • metrics (a string or a list of strings, default=list of all) – applicable metrics; ["error", "eff_mean", "eff_med", "CRPS", "ks_test", "time_fit", "time_evaluate"]

  • smoothing (bool, default=True) – employ smoothed p-values

  • seed (int, default=None) – set random seed

  • online (bool, default=False) – employ online calibration

  • warm_start (bool, default=True) – extend original calibration set; used only if online=True

Returns:

results – estimated performance using the metrics, where “error” is the fraction of prediction intervals not containing the true label, “eff_mean” is the mean length of prediction intervals, “eff_med” is the median length of the prediction intervals, “CRPS” is the continuous ranked probability score, “ks_test” is the p-value for the Kolmogorov-Smirnov test of uniformity of predicted p-values, “time_fit” is the time taken to fit the conformal predictive system, and “time_evaluate” is the time taken for the evaluation

Return type:

dictionary with a key for each selected metric

Examples

Assuming that y_hat_test and y_test are vectors with predicted and true targets for a test set, sigmas_test and bins_test are vectors with difficulty estimates and Mondrian categories (bin labels) for the test set, and cps_norm_mond is a fitted normalized Mondrian conformal predictive system, then the latter can be evaluated at the default confidence level with respect to error, mean and median efficiency (interval size, given the default confidence level) and continuous-ranked probability score (CRPS) by:

results = cps_norm_mond.evaluate(y_hat_test, y_test,
                                 sigmas=sigmas_test, bins=bins_test,
                                 metrics=["error", "eff_mean",
                                          "eff_med", "CRPS"])

Note

The use of the metric CRPS may require a lot of memory, as a matrix is generated for which the number of elements is the product of the number of calibration and test objects, unless a Mondrian approach is employed; for the latter, this number is reduced by increasing the number of bins.

Note

The metric CRPS is only available for batch evaluation, i.e., when online=False.
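As a rough illustration of what CRPS measures, the sketch below treats each conformal predictive distribution as an empirical distribution with equal weight on its values and uses the identity CRPS = E|X - y| - 0.5 * E|X - X'|. This weighting is an assumption of the sketch, so the numbers may differ from those reported by crepes; the helper sketch_crps is hypothetical.

```python
import numpy as np

def sketch_crps(cpds, y):
    """Hypothetical helper: mean CRPS over equally weighted distributions."""
    cpds = np.sort(np.asarray(cpds, dtype=float), axis=1)
    y = np.asarray(y, dtype=float)
    n = cpds.shape[1]
    # E|X - y| for each test object.
    term1 = np.abs(cpds - y[:, None]).mean(axis=1)
    # E|X - X'| from sorted values, avoiding an n-by-n matrix per object.
    weights = 2 * np.arange(n) - n + 1
    term2 = 2.0 / n**2 * (cpds * weights).sum(axis=1)
    return float(np.mean(term1 - 0.5 * term2))

spread = sketch_crps([[0, 2]], [1])     # 0.5
exact = sketch_crps([[1, 1]], [1])      # 0.0
```

The input already has one row per test object and one column per calibration residual, which is the matrix whose size makes batch evaluation memory-intensive.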

Note

If a value for seed is given, it will take precedence over any seed value given in the call to fit.