The crepes package
- class crepes.WrapClassifier(learner)
A learner wrapped with a ConformalClassifier.
Methods
calibrate(X, y[, bins, oob, class_cond, nc]) – Fit a ConformalClassifier using learner.
evaluate(X, y[, bins, confidence, ...]) – Evaluate ConformalClassifier.
fit(X, y, **kwargs) – Fit learner.
predict(X) – Predict with learner.
predict_p(X[, bins]) – Obtain (smoothed) p-values using conformal classifier.
predict_proba(X) – Predict with learner.
predict_set(X[, bins, confidence, smoothing]) – Obtain prediction sets using conformal classifier.
- fit(X, y, **kwargs)
Fit learner.
- Parameters:
X (array-like of shape (n_samples, n_features),) – set of objects
y (array-like of shape (n_samples,),) – target values
kwargs (optional arguments) – any additional arguments are forwarded to the fit method of the learner object
- Return type:
None
Examples
Assuming X_train and y_train to be an array and vector with training objects and labels, respectively, a random forest may be wrapped and fitted by:
from sklearn.ensemble import RandomForestClassifier
from crepes import WrapClassifier
rf = WrapClassifier(RandomForestClassifier())
rf.fit(X_train, y_train)
Note
The learner, which can be accessed by rf.learner, may be fitted before as well as after being wrapped.
Note
All arguments, including any additional keyword arguments, to fit() are forwarded to the fit method of the learner.
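The forwarding behaviour described in the notes above can be illustrated with a minimal sketch in plain Python. Both MiniWrap and MajorityLearner are hypothetical stand-ins invented for this illustration; they are not part of crepes and do not reproduce its implementation.

```python
class MajorityLearner:
    """Toy learner (hypothetical): predicts the label with the largest total weight."""

    def fit(self, X, y, sample_weight=None):
        weights = sample_weight if sample_weight is not None else [1.0] * len(y)
        totals = {}
        for label, weight in zip(y, weights):
            totals[label] = totals.get(label, 0.0) + weight
        self.majority_ = max(totals, key=totals.get)

    def predict(self, X):
        return [self.majority_ for _ in X]


class MiniWrap:
    """Hypothetical wrapper that only forwards calls to the wrapped learner."""

    def __init__(self, learner):
        self.learner = learner

    def fit(self, X, y, **kwargs):
        # Every argument, including keyword arguments, goes straight to the learner.
        self.learner.fit(X, y, **kwargs)

    def predict(self, X):
        return self.learner.predict(X)


X_train = [[0], [1], [2]]
y_train = ["a", "b", "b"]

w = MiniWrap(MajorityLearner())
w.fit(X_train, y_train)
unweighted = w.predict([[5]])

# The keyword argument is forwarded untouched to MajorityLearner.fit.
w.fit(X_train, y_train, sample_weight=[5.0, 1.0, 1.0])
weighted = w.predict([[5]])
```

Because the wrapper holds a reference to the learner, it makes no difference in this sketch whether the learner is fitted through the wrapper or directly through w.learner.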
- predict(X)
Predict with learner.
- Parameters:
X (array-like of shape (n_samples, n_features),) – set of objects
- Returns:
y – values predicted using the predict method of the learner object.
- Return type:
array-like of shape (n_samples,)
Examples
Assuming w is a WrapClassifier object for which the wrapped learner w.learner has been fitted, (point) predictions of the learner can be obtained for a set of test objects X_test by:
y_hat = w.predict(X_test)
The above is equivalent to:
y_hat = w.learner.predict(X_test)
- predict_proba(X)
Predict with learner.
- Parameters:
X (array-like of shape (n_samples, n_features),) – set of objects
- Returns:
y – predicted probabilities using the predict_proba method of the learner object.
- Return type:
array-like of shape (n_samples, n_classes)
Examples
Assuming w is a WrapClassifier object for which the wrapped learner w.learner has been fitted, predicted probabilities of the learner can be obtained for a set of test objects X_test by:
probabilities = w.predict_proba(X_test)
The above is equivalent to:
probabilities = w.learner.predict_proba(X_test)
- calibrate(X, y, bins=None, oob=False, class_cond=False, nc=<function hinge>)
Fit a ConformalClassifier using learner.
- Parameters:
X (array-like of shape (n_samples, n_features),) – set of objects
y (array-like of shape (n_samples,),) – target values
bins (array-like of shape (n_samples,), default=None) – Mondrian categories
oob (bool, default=False) – use out-of-bag estimation
class_cond (bool, default=False) – if class_cond=True, the method fits a Mondrian ConformalClassifier using the class labels as categories
nc (function, default=crepes.extras.hinge()) – function to compute non-conformity scores
- Returns:
self – WrapClassifier object updated with a fitted ConformalClassifier
- Return type:
object
Examples
Assuming X_cal and y_cal to be an array and vector, respectively, with objects and labels for the calibration set, and w is a WrapClassifier object for which the learner has been fitted, a standard conformal classifier can be formed by:
w.calibrate(X_cal, y_cal)
Assuming that bins_cal is a vector with Mondrian categories (bin labels), a Mondrian conformal classifier can be generated by:
w.calibrate(X_cal, y_cal, bins=bins_cal)
By providing the option oob=True, the conformal classifier will be calibrated using out-of-bag predictions, allowing the full set of training objects (X_train) and labels (y_train) to be used, e.g.,
w.calibrate(X_train, y_train, oob=True)
By providing the option class_cond=True, a Mondrian conformal classifier will be formed using the class labels as categories, e.g.,
w.calibrate(X_cal, y_cal, class_cond=True)
Note
Any Mondrian categories provided with the bins argument will be ignored by calibrate() if class_cond=True, as the latter implies that Mondrian categories are formed using the labels in y.
Note
Enabling out-of-bag calibration, i.e., setting oob=True, requires that the wrapped learner has an attribute oob_decision_function_, which is the case, e.g., for a sklearn.ensemble.RandomForestClassifier if enabled when created, e.g., RandomForestClassifier(oob_score=True)
Note
The use of out-of-bag calibration, as enabled by oob=True, does not come with the theoretical validity guarantees of the regular (inductive) conformal classifiers, because calibration and test instances are not handled in exactly the same way.
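To make the role of the nc function more concrete, the following plain-Python sketch computes hinge-style non-conformity scores for a calibration set. It assumes that the hinge score of a label is one minus the probability the learner assigns to that label; the helper hinge_scores and the toy data are hypothetical and only illustrate the idea, not the code in crepes.extras.

```python
def hinge_scores(probabilities, classes, y):
    """One score per calibration object: 1 - predicted probability of its true label."""
    column = {label: index for index, label in enumerate(classes)}
    return [1.0 - row[column[label]] for row, label in zip(probabilities, y)]


classes = ["cat", "dog"]
# Predicted class probabilities for three calibration objects (toy values).
probabilities_cal = [[0.75, 0.25], [0.5, 0.5], [0.25, 0.75]]
y_cal = ["cat", "dog", "cat"]

alphas_cal = hinge_scores(probabilities_cal, classes, y_cal)

# With class-conditional calibration, the labels themselves act as Mondrian categories.
bins_cal = list(y_cal)
```

A confidently correct prediction yields a score near 0, while a confidently wrong one yields a score near 1, so larger scores mean the object conforms less to what the learner expects.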
- predict_p(X, bins=None)
Obtain (smoothed) p-values using conformal classifier.
- Parameters:
X (array-like of shape (n_samples, n_features),) – set of objects
bins (array-like of shape (n_samples,), default=None) – Mondrian categories
- Returns:
p-values – p-values
- Return type:
ndarray of shape (n_samples, n_classes)
Examples
Assuming that X_test is a set of test objects and w is a WrapClassifier object that has been calibrated, i.e., calibrate() has been applied, the p-values for the test objects are obtained by:
p_values = w.predict_p(X_test)
Assuming that bins_test is a vector with Mondrian categories (bin labels) for the test set and w is a WrapClassifier object that has been calibrated with bins, the following provides p-values for the test set:
p_values = w.predict_p(X_test, bins=bins_test)
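As a rough illustration of what a conformal p-value is, the sketch below uses the textbook definition: the share of calibration scores at least as large as the test score, counting the test score itself. The function p_value and its smoothing rule are assumptions made for this example and may differ in detail from what crepes computes.

```python
import random


def p_value(alpha, alphas_cal, smoothing=False, rng=random):
    """Textbook conformal p-value of one test score against calibration scores."""
    greater = sum(1 for a in alphas_cal if a > alpha)
    equal = sum(1 for a in alphas_cal if a == alpha)
    if smoothing:
        # Ties (including the test score itself) are broken at random.
        tau = rng.random()
        return (greater + tau * (equal + 1)) / (len(alphas_cal) + 1)
    return (greater + equal + 1) / (len(alphas_cal) + 1)


alphas_cal = [0.1, 0.2, 0.3, 0.4]
# One row per test object, one column per class label.
alphas_test = [[0.05, 0.95], [0.3, 0.7]]

p_values = [[p_value(a, alphas_cal) for a in row] for row in alphas_test]
```

Each test object gets one p-value per class label, which matches the (n_samples, n_classes) shape of the documented return value.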
- predict_set(X, bins=None, confidence=0.95, smoothing=False)
Obtain prediction sets using conformal classifier.
- Parameters:
X (array-like of shape (n_samples, n_features),) – set of objects
bins (array-like of shape (n_samples,), default=None) – Mondrian categories
confidence (float in range (0,1), default=0.95) – confidence level
smoothing (bool, default=False) – use smoothed p-values
- Returns:
prediction sets – prediction sets, where the value 1 (0) indicates that the class label is included (excluded); a label is excluded when the corresponding p-value is less than 1-confidence
- Return type:
ndarray of shape (n_samples, n_classes)
Examples
Assuming that X_test is a set of test objects and w is a WrapClassifier object that has been calibrated, i.e., calibrate() has been applied, the prediction sets for the test objects at the default confidence level (95%) are obtained by:
prediction_sets = w.predict_set(X_test)
Assuming that bins_test is a vector with Mondrian categories (bin labels) for the test set and w is a WrapClassifier object that has been calibrated with bins, the following provides prediction sets at the 99% confidence level:
prediction_sets = w.predict_set(X_test, bins=bins_test, confidence=0.99)
Note
Using smoothed p-values substantially increases computation time and hardly has any effect on the prediction sets, except when the calibration set is small.
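The relation between p-values and prediction sets can be sketched in a few lines of plain Python. The helper prediction_set is hypothetical and assumes the rule stated for the return value: a label is excluded exactly when its p-value is less than 1-confidence.

```python
def prediction_set(p_values_row, confidence=0.95):
    """Return 1 for each label that is kept (p-value >= 1 - confidence), else 0."""
    significance = 1 - confidence
    return [1 if p >= significance else 0 for p in p_values_row]


# Toy p-values for three test objects and two class labels.
p_values = [[0.80, 0.03], [0.40, 0.30], [0.02, 0.01]]

sets_95 = [prediction_set(row) for row in p_values]
sets_50 = [prediction_set(row, confidence=0.5) for row in p_values]
```

At the 95% level the three toy objects receive a singleton set, a set with both labels, and an empty set, respectively. Lowering the confidence level can only shrink the sets.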
- evaluate(X, y, bins=None, confidence=0.95, smoothing=False, metrics=None)
Evaluate ConformalClassifier.
- Parameters:
X (array-like of shape (n_samples, n_features)) – set of objects
y (array-like of shape (n_samples,)) – correct target values
bins (array-like of shape (n_samples,), default=None,) – Mondrian categories
confidence (float in range (0,1), default=0.95) – confidence level
smoothing (bool, default=False) – use smoothed p-values
metrics (a string or a list of strings,) – default=list of all metrics, i.e., [“error”, “avg_c”, “one_c”, “empty”, “time_fit”, “time_evaluate”]
- Returns:
results – estimated performance using the metrics, where “error” is the fraction of prediction sets not containing the true class label, “avg_c” is the average no. of predicted class labels, “one_c” is the fraction of singleton prediction sets, “empty” is the fraction of empty prediction sets, “time_fit” is the time taken to fit the conformal classifier, and “time_evaluate” is the time taken for the evaluation
- Return type:
dictionary with a key for each selected metric
Examples
Assuming that X_test is a set of test objects, y_test is a vector with true targets, bins_test is a vector with Mondrian categories (bin labels) for the test set, and w is a calibrated WrapClassifier object, then the latter can be evaluated at the 90% confidence level with respect to error, average prediction set size and fraction of singleton predictions by:
results = w.evaluate(X_test, y_test, bins=bins_test, confidence=0.9, metrics=["error", "avg_c", "one_c"])
Note
The reported result for time_fit only considers fitting the conformal classifier, not fitting the learner.
Note
Using smoothed p-values substantially increases computation time and hardly has any effect on the results, except when the calibration set is small.
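The set-based metrics can be reproduced by hand from 0/1 prediction sets, which may help when interpreting the returned dictionary. The function evaluate_sets below is a hypothetical plain-Python sketch that follows the metric definitions given under Returns; it omits the two timing metrics.

```python
def evaluate_sets(prediction_sets, classes, y):
    """Compute error, avg_c, one_c and empty from 0/1 prediction sets."""
    column = {label: index for index, label in enumerate(classes)}
    n = len(y)
    sizes = [sum(row) for row in prediction_sets]
    # A miss is a prediction set that does not contain the true label.
    misses = sum(1 for row, label in zip(prediction_sets, y) if row[column[label]] == 0)
    return {
        "error": misses / n,
        "avg_c": sum(sizes) / n,
        "one_c": sum(1 for s in sizes if s == 1) / n,
        "empty": sum(1 for s in sizes if s == 0) / n,
    }


classes = ["a", "b", "c"]
prediction_sets = [[1, 0, 0], [1, 1, 0], [0, 0, 0], [0, 0, 1]]
y_test = ["a", "c", "b", "c"]

results = evaluate_sets(prediction_sets, classes, y_test)
```

Note that an empty prediction set always counts as an error, since it cannot contain the true label.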
- class crepes.WrapRegressor(learner)
A learner wrapped with a ConformalRegressor or ConformalPredictiveSystem.
Methods
calibrate(X, y[, sigmas, bins, oob, cps]) – Fit a ConformalRegressor or ConformalPredictiveSystem using learner.
evaluate(X, y[, sigmas, bins, confidence, ...]) – Evaluate ConformalRegressor or ConformalPredictiveSystem.
fit(X, y, **kwargs) – Fit learner.
predict(X) – Predict with learner.
predict_cps(X[, sigmas, bins, y, ...]) – Predict using ConformalPredictiveSystem.
predict_int(X[, sigmas, bins, confidence, ...]) – Predict interval using fitted ConformalRegressor or ConformalPredictiveSystem.
- fit(X, y, **kwargs)
Fit learner.
- Parameters:
X (array-like of shape (n_samples, n_features),) – set of objects
y (array-like of shape (n_samples,),) – target values
kwargs (optional arguments) – any additional arguments are forwarded to the fit method of the learner object
- Return type:
None
Examples
Assuming X_train and y_train to be an array and vector with training objects and labels, respectively, a random forest may be wrapped and fitted by:
from sklearn.ensemble import RandomForestRegressor
from crepes import WrapRegressor
rf = WrapRegressor(RandomForestRegressor())
rf.fit(X_train, y_train)
Note
The learner, which can be accessed by rf.learner, may be fitted before as well as after being wrapped.
Note
All arguments, including any additional keyword arguments, to fit() are forwarded to the fit method of the learner.
- predict(X)
Predict with learner.
- Parameters:
X (array-like of shape (n_samples, n_features),) – set of objects
- Returns:
y – values predicted using the predict method of the learner object.
- Return type:
array-like of shape (n_samples,)
Examples
Assuming w is a WrapRegressor object for which the wrapped learner w.learner has been fitted, (point) predictions of the learner can be obtained for a set of test objects X_test by:
y_hat = w.predict(X_test)
The above is equivalent to:
y_hat = w.learner.predict(X_test)
- calibrate(X, y, sigmas=None, bins=None, oob=False, cps=False)
Fit a ConformalRegressor or ConformalPredictiveSystem using learner.
- Parameters:
X (array-like of shape (n_samples, n_features),) – set of objects
y (array-like of shape (n_samples,),) – target values
sigmas (array-like of shape (n_samples,), default=None) – difficulty estimates
bins (array-like of shape (n_samples,), default=None) – Mondrian categories
oob (bool, default=False) – use out-of-bag estimation
cps (bool, default=False) – if cps=False, the method fits a ConformalRegressor and otherwise, a ConformalPredictiveSystem
- Returns:
self – The WrapRegressor object is updated with a fitted ConformalRegressor or ConformalPredictiveSystem
- Return type:
object
Examples
Assuming X_cal and y_cal to be an array and vector, respectively, with objects and labels for the calibration set, and w is a WrapRegressor object for which the learner has been fitted, a standard conformal regressor is formed by:
w.calibrate(X_cal, y_cal)
Assuming that sigmas_cal is a vector with difficulty estimates, a normalized conformal regressor is obtained by:
w.calibrate(X_cal, y_cal, sigmas=sigmas_cal)
Assuming that bins_cal is a vector with Mondrian categories (bin labels), a Mondrian conformal regressor is obtained by:
w.calibrate(X_cal, y_cal, bins=bins_cal)
A normalized Mondrian conformal regressor is generated in the following way:
w.calibrate(X_cal, y_cal, sigmas=sigmas_cal, bins=bins_cal)
By providing the option oob=True, the conformal regressor will be calibrated using out-of-bag predictions, allowing the full set of training objects (X_train) and labels (y_train) to be used, e.g.,
w.calibrate(X_train, y_train, oob=True)
By providing the option cps=True, each of the above calls will instead generate a ConformalPredictiveSystem, e.g.,
w.calibrate(X_cal, y_cal, sigmas=sigmas_cal, cps=True)
Note
Enabling out-of-bag calibration, i.e., setting oob=True, requires that the wrapped learner has an attribute oob_prediction_, which is the case, e.g., for a sklearn.ensemble.RandomForestRegressor if enabled when created, e.g., RandomForestRegressor(oob_score=True)
Note
The use of out-of-bag calibration, as enabled by oob=True, does not come with the theoretical validity guarantees of the regular (inductive) conformal regressors and predictive systems, because calibration and test instances are not handled in exactly the same way.
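The difference between standard, normalized and Mondrian calibration can be sketched in plain Python as follows. The helper calibration_scores is hypothetical: it assumes that calibration boils down to storing sorted absolute residuals, divided by the difficulty estimates when sigmas are given and kept separately per category when bins are given. The actual crepes internals may be organised differently.

```python
from collections import defaultdict


def calibration_scores(residuals, sigmas=None, bins=None):
    """Sorted absolute (optionally normalized) residuals, per bin if bins are given."""
    if sigmas is None:
        sigmas = [1.0] * len(residuals)
    scores = [abs(r) / s for r, s in zip(residuals, sigmas)]
    if bins is None:
        return sorted(scores, reverse=True)
    # Mondrian case: keep a separate score list for each category.
    by_bin = defaultdict(list)
    for score, b in zip(scores, bins):
        by_bin[b].append(score)
    return {b: sorted(values, reverse=True) for b, values in by_bin.items()}


residuals_cal = [1.0, -2.0, 0.5, -4.0]
sigmas_cal = [1.0, 2.0, 0.5, 2.0]
bins_cal = [0, 0, 1, 1]

standard = calibration_scores(residuals_cal)
normalized = calibration_scores(residuals_cal, sigmas=sigmas_cal)
mondrian = calibration_scores(residuals_cal, bins=bins_cal)
```

In the normalized case, a large residual on an object with a large difficulty estimate counts no more than a small residual on an easy object, which is what later allows interval widths to vary between test objects.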
- predict_int(X, sigmas=None, bins=None, confidence=0.95, y_min=-inf, y_max=inf)
Predict interval using fitted ConformalRegressor or ConformalPredictiveSystem.
- Parameters:
X (array-like of shape (n_samples, n_features),) – set of objects
sigmas (array-like of shape (n_samples,), default=None) – difficulty estimates
bins (array-like of shape (n_samples,), default=None) – Mondrian categories
confidence (float in range (0,1), default=0.95) – confidence level
y_min (float or int, default=-numpy.inf) – minimum value to include in prediction intervals
y_max (float or int, default=numpy.inf) – maximum value to include in prediction intervals
- Returns:
intervals – prediction intervals
- Return type:
ndarray of shape (n_samples, 2)
Examples
Assuming that X_test is a set of test objects and w is a WrapRegressor object that has been calibrated, i.e., calibrate() has been applied, prediction intervals at the 99% confidence level can be obtained by:
intervals = w.predict_int(X_test, confidence=0.99)
Assuming that sigmas_test is a vector with difficulty estimates for the test set and w is a WrapRegressor object that has been calibrated with both residuals and difficulty estimates, prediction intervals at the default (95%) confidence level can be obtained by:
intervals = w.predict_int(X_test, sigmas=sigmas_test)
Assuming that bins_test is a vector with Mondrian categories (bin labels) for the test set and w is a WrapRegressor object that has been calibrated with both residuals and bins, the following provides prediction intervals at the default confidence level, where the intervals are lower-bounded by 0:
intervals = w.predict_int(X_test, bins=bins_test, y_min=0)
Note
In case the specified confidence level is too high in relation to the size of the calibration set, a warning will be issued and the output intervals will be of maximum size.
Note
The arguments sigmas and bins will be ignored by predict_int() if the WrapRegressor object has been calibrated without specifying any such values.
Note
An error will be reported if sigmas and bins are not provided to predict_int() when the WrapRegressor object has been calibrated with such values.
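How a confidence level, a calibration set and the bounds y_min and y_max interact can be sketched as follows. The function predict_interval is a hypothetical plain-Python illustration of the usual inductive conformal regression rule, assuming calibration scores sorted in descending order; the exact index convention inside crepes is an assumption here.

```python
import math


def predict_interval(y_hat, alphas_sorted, sigma=1.0, confidence=0.95,
                     y_min=-math.inf, y_max=math.inf):
    """Interval for one prediction from calibration scores sorted in descending order."""
    index = int((1 - confidence) * (len(alphas_sorted) + 1)) - 1
    # Too few calibration scores for this confidence level: use the widest interval.
    half_width = alphas_sorted[index] * sigma if index >= 0 else math.inf
    return [max(y_hat - half_width, y_min), min(y_hat + half_width, y_max)]


# 19 calibration scores: 19.0, 18.0, ..., 1.0
alphas_sorted = [float(a) for a in range(19, 0, -1)]

interval = predict_interval(10.0, alphas_sorted)
bounded = predict_interval(10.0, alphas_sorted, y_min=0)
too_strict = predict_interval(10.0, alphas_sorted, confidence=0.99)
```

With 19 calibration scores, the 95% level selects the largest score, while the 99% level cannot be met and the sketch falls back to an unbounded interval, mirroring the maximum-size behaviour described in the note above.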
- predict_cps(X, sigmas=None, bins=None, y=None, lower_percentiles=None, higher_percentiles=None, y_min=-inf, y_max=inf, return_cpds=False, cpds_by_bins=False)
Predict using ConformalPredictiveSystem.
- Parameters:
X (array-like of shape (n_samples, n_features),) – set of objects
sigmas (array-like of shape (n_samples,), default=None) – difficulty estimates
bins (array-like of shape (n_samples,), default=None) – Mondrian categories
y (float, int or array-like of shape (n_samples,), default=None) – values for which p-values should be returned
lower_percentiles (array-like of shape (l_values,), default=None) – percentiles for which a lower value will be output in case a percentile lies between two values (similar to interpolation="lower" in numpy.percentile)
higher_percentiles (array-like of shape (h_values,), default=None) – percentiles for which a higher value will be output in case a percentile lies between two values (similar to interpolation="higher" in numpy.percentile)
y_min (float or int, default=-numpy.inf) – The minimum value to include in prediction intervals.
y_max (float or int, default=numpy.inf) – The maximum value to include in prediction intervals.
return_cpds (Boolean, default=False) – specifies whether conformal predictive distributions (cpds) should be output or not
cpds_by_bins (Boolean, default=False) – specifies whether the output cpds should be grouped by bin or not; only applicable when bins is not None and return_cpds = True
- Returns:
results (ndarray of shape (n_samples, n_cols) or (n_samples,)) – the shape is (n_samples, n_cols) if n_cols > 1 and otherwise (n_samples,), where n_cols = p_values+l_values+h_values where p_values = 1 if y is not None and 0 otherwise, l_values are the number of lower percentiles, and h_values are the number of higher percentiles. Only returned if n_cols > 0.
cpds (ndarray of shape (n_samples, c_values), ndarray of shape (n_samples,), or list of ndarrays) – conformal predictive distributions. Only returned if return_cpds == True. If bins is None, the distributions are represented by a single array, where the number of columns (c_values) is determined by the number of residuals of the fitted conformal predictive system. Otherwise, the distributions are represented by a vector of arrays, if cpds_by_bins = False, or a list of arrays, with one element for each bin, if cpds_by_bins = True.
Examples
Assuming that X_test is a set of test objects, y_test is a vector with true targets, and w is a WrapRegressor object calibrated with the option cps=True, p-values for the true targets can be obtained by:
p_values = w.predict_cps(X_test, y=y_test)
P-values with respect to some specific value, e.g., 37, can be obtained by:
p_values = w.predict_cps(X_test, y=37)
Assuming that sigmas_test is a vector with difficulty estimates for the test set and w has been calibrated with such estimates, the 90th and 95th percentiles can be obtained by:
percentiles = w.predict_cps(X_test, sigmas=sigmas_test, higher_percentiles=[90,95])
In the above example, the nearest higher value is returned, if there is no value that corresponds exactly to the requested percentile. If we instead would like to retrieve the nearest lower value, we should write:
percentiles = w.predict_cps(X_test, sigmas=sigmas_test, lower_percentiles=[90,95])
Assuming that bins_test is a vector with Mondrian categories (bin labels) for the test set and w has been calibrated with bins, the following returns prediction intervals at the 95% confidence level, where the intervals are lower-bounded by 0:
intervals = w.predict_cps(X_test, bins=bins_test, lower_percentiles=2.5, higher_percentiles=97.5, y_min=0)
If we would like to obtain the conformal distributions, we could write the following:
cpds = w.predict_cps(X_test, sigmas=sigmas_test, return_cpds=True)
The output of the above will be an array with a row for each test instance and a column for each calibration instance (residual). If the learner is wrapped with a Mondrian conformal predictive system, the above will instead result in a vector, in which each element is a vector, as the number of calibration instances may vary between categories. If we instead would like an array for each category, this can be obtained by:
cpds = w.predict_cps(X_test, sigmas=sigmas_test, return_cpds=True, cpds_by_bins=True)
Note
This method is available only if the learner has been wrapped with a ConformalPredictiveSystem, i.e., calibrate() has been called with the option cps=True.
Note
In case the calibration set is too small for the specified lower and higher percentiles, a warning will be issued and the output will be y_min and y_max, respectively.
Note
Setting return_cpds=True may consume a lot of memory, as a matrix is generated for which the number of elements is the product of the number of calibration and test objects, unless a Mondrian approach is employed; for the latter, this number is reduced by increasing the number of bins.
Note
Setting cpds_by_bins=True has an effect only for Mondrian conformal predictive systems.
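The distinction between lower and higher percentiles, and the fallback to y_min and y_max for small calibration sets, can be illustrated with a plain-Python sketch. The function cpd_percentiles is hypothetical, and its index convention (positions computed against n+1, where n is the number of calibration residuals) is an assumption chosen to be consistent with the notes above; it is not taken from the crepes source.

```python
import math


def cpd_percentiles(y_hat, residuals_sorted, lower=(), higher=(),
                    y_min=-math.inf, y_max=math.inf):
    """Lower/higher percentiles of the distribution y_hat + residuals (sketch)."""
    cpd = [y_hat + r for r in residuals_sorted]  # ascending order
    n = len(cpd)
    out = []
    for q in lower:
        # Round the position down; fall back to y_min if it lies before the first value.
        index = int(q / 100 * (n + 1)) - 1
        out.append(y_min if index < 0 else cpd[min(index, n - 1)])
    for q in higher:
        # Round the position up; fall back to y_max if it lies after the last value.
        index = math.ceil(q / 100 * (n + 1)) - 1
        out.append(y_max if index >= n else cpd[max(index, 0)])
    return out


residuals_sorted = [-4.0, -3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0, 4.0]

quartiles = cpd_percentiles(10.0, residuals_sorted, lower=[25, 75], higher=[25, 75])
extremes = cpd_percentiles(10.0, residuals_sorted, lower=[5], higher=[99], y_min=0)
```

With only nine residuals, the 5th and 99th percentiles cannot be located inside the distribution, so the sketch returns the bounds instead, which is the situation the note on small calibration sets warns about.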
- evaluate(X, y, sigmas=None, bins=None, confidence=0.95, y_min=-inf, y_max=inf, metrics=None)
Evaluate ConformalRegressor or ConformalPredictiveSystem.
- Parameters:
X (array-like of shape (n_samples, n_features)) – set of objects
y (array-like of shape (n_samples,)) – correct target values
sigmas (array-like of shape (n_samples,), default=None,) – difficulty estimates
bins (array-like of shape (n_samples,), default=None,) – Mondrian categories
confidence (float in range (0,1), default=0.95) – confidence level
y_min (float or int, default=-numpy.inf) – minimum value to include in prediction intervals
y_max (float or int, default=numpy.inf) – maximum value to include in prediction intervals
metrics (a string or a list of strings, default=list of all) – metrics; for a learner wrapped with a conformal regressor these are “error”, “eff_mean”, “eff_med”, “time_fit”, and “time_evaluate”, while if wrapped with a conformal predictive system, the metrics also include “CRPS”.
- Returns:
results – estimated performance using the metrics
- Return type:
dictionary with a key for each selected metric
Examples
Assuming that X_test is a set of test objects, y_test is a vector with true targets, sigmas_test and bins_test are vectors with difficulty estimates and Mondrian categories (bin labels) for the test set, and w is a calibrated WrapRegressor object, then the latter can be evaluated at the 90% confidence level with respect to error, mean and median efficiency (interval size) by:
results = w.evaluate(X_test, y_test, sigmas=sigmas_test, bins=bins_test, confidence=0.9, metrics=["error", "eff_mean", "eff_med"])
Note
If included in the list of metrics, “CRPS” (continuous-ranked probability score) will be ignored if the WrapRegressor object has been calibrated with the (default) option cps=False, i.e., the learner is wrapped with a ConformalRegressor.
Note
The use of the metric CRPS may consume a lot of memory, as a matrix is generated for which the number of elements is the product of the number of calibration and test objects, unless a Mondrian approach is employed; for the latter, this number is reduced by increasing the number of bins.
Note
The reported result for time_fit only considers fitting the conformal regressor or predictive system, not fitting the learner.
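The interval-based metrics can be checked by hand with a short plain-Python sketch. The function evaluate_intervals is hypothetical; it assumes that “error” is the fraction of intervals not containing the true target and that efficiency is the interval width, as suggested by the example above. Timing metrics and CRPS are left out.

```python
from statistics import mean, median


def evaluate_intervals(intervals, y):
    """Error rate and efficiency (interval width) for a list of [low, high] intervals."""
    widths = [high - low for low, high in intervals]
    # A miss is a true target that falls outside its interval.
    misses = sum(1 for (low, high), target in zip(intervals, y)
                 if not low <= target <= high)
    return {
        "error": misses / len(y),
        "eff_mean": mean(widths),
        "eff_med": median(widths),
    }


intervals = [[0.0, 4.0], [1.0, 3.0], [5.0, 6.0], [2.0, 10.0]]
y_test = [2.0, 3.5, 5.5, 9.0]

results = evaluate_intervals(intervals, y_test)
```

Comparing eff_mean with eff_med shows whether a few very wide intervals dominate the average, as the last interval does in this toy example.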
- class crepes.ConformalClassifier
A conformal classifier transforms non-conformity scores into p-values or prediction sets for a certain confidence level.
Methods
evaluate(alphas, classes, y[, bins, ...]) – Evaluate conformal classifier.
fit(alphas[, bins]) – Fit conformal classifier.
predict_p(alphas[, bins, confidence]) – Obtain (smoothed) p-values from conformal classifier.
predict_set(alphas[, bins, confidence, ...]) – Obtain prediction sets using conformal classifier.
- fit(alphas, bins=None)
Fit conformal classifier.
- Parameters:
alphas (array-like of shape (n_samples,)) – non-conformity scores
bins (array-like of shape (n_samples,), default=None) – Mondrian categories
- Returns:
self – Fitted ConformalClassifier.
- Return type:
object
Examples
Assuming that alphas_cal is a vector with non-conformity scores, then a standard conformal classifier is formed in the following way:
from crepes import ConformalClassifier
cc_std = ConformalClassifier()
cc_std.fit(alphas_cal)
Assuming that bins_cal is a vector with Mondrian categories (bin labels), then a Mondrian conformal classifier is fitted in the following way:
cc_mond = ConformalClassifier()
cc_mond.fit(alphas_cal, bins=bins_cal)
- predict_p(alphas, bins=None, confidence=0.95)
Obtain (smoothed) p-values from conformal classifier.
- Parameters:
alphas (array-like of shape (n_samples, n_classes)) – non-conformity scores
bins (array-like of shape (n_samples,), default=None) – Mondrian categories
confidence (float in range (0,1), default=0.95) – confidence level
- Returns:
p-values – p-values
- Return type:
ndarray of shape (n_samples, n_classes)
Examples
Assuming that alphas_test is an array with non-conformity scores for a test set and cc_std a fitted standard conformal classifier, then p-values for the test set are obtained by:
p_values = cc_std.predict_p(alphas_test)
Assuming that bins_test is a vector with Mondrian categories (bin labels) for the test set and cc_mond a fitted Mondrian conformal classifier, then the following provides p-values for the test set:
p_values = cc_mond.predict_p(alphas_test, bins=bins_test)
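What changes when bins are supplied can be shown with a plain-Python sketch of Mondrian p-values, in which each test object is compared only with calibration scores from its own category. The function mondrian_p_values is hypothetical and uses the textbook, non-smoothed p-value; it is meant as an illustration rather than a description of the crepes code.

```python
def mondrian_p_values(alphas_test, bins_test, alphas_cal, bins_cal):
    """p-values computed against the calibration scores of each test object's own bin."""
    by_bin = {}
    for alpha, b in zip(alphas_cal, bins_cal):
        by_bin.setdefault(b, []).append(alpha)
    p_values = []
    for row, b in zip(alphas_test, bins_test):
        cal = by_bin[b]
        p_values.append([(sum(1 for a in cal if a >= alpha) + 1) / (len(cal) + 1)
                         for alpha in row])
    return p_values


alphas_cal = [0.1, 0.2, 0.3, 0.6, 0.7, 0.8, 0.9]
bins_cal = [0, 0, 0, 1, 1, 1, 1]
# Two test objects with identical scores but different Mondrian categories.
alphas_test = [[0.25, 0.75], [0.25, 0.75]]
bins_test = [0, 1]

p_values = mondrian_p_values(alphas_test, bins_test, alphas_cal, bins_cal)
```

The two test objects have the same non-conformity scores yet receive different p-values, because the scores are ranked against different calibration subsets.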
- predict_set(alphas, bins=None, confidence=0.95, smoothing=False)
Obtain prediction sets using conformal classifier.
- Parameters:
alphas (array-like of shape (n_samples, n_classes)) – non-conformity scores
bins (array-like of shape (n_samples,), default=None) – Mondrian categories
confidence (float in range (0,1), default=0.95) – confidence level
smoothing (bool, default=False) – use smoothed p-values
- Returns:
prediction sets – prediction sets
- Return type:
ndarray of shape (n_samples, n_classes)
Examples
Assuming that alphas_test is an array with non-conformity scores for a test set and cc_std a fitted standard conformal classifier, then prediction sets at the default (95%) confidence level are obtained by:
prediction_sets = cc_std.predict_set(alphas_test)
Assuming that bins_test is a vector with Mondrian categories (bin labels) for the test set and cc_mond a fitted Mondrian conformal classifier, then the following provides prediction sets for the test set, at the 90% confidence level:
prediction_sets = cc_mond.predict_set(alphas_test, bins=bins_test, confidence=0.9)
Note
Using smoothed p-values substantially increases computation time and hardly has any effect on the prediction sets, except when the calibration set is small.
- evaluate(alphas, classes, y, bins=None, confidence=0.95, smoothing=False, metrics=None)
Evaluate conformal classifier.
- Parameters:
alphas (array-like of shape (n_samples, n_classes)) – non-conformity scores
classes (array-like of shape (n_classes,)) – class names
y (array-like of shape (n_samples,)) – correct class labels
bins (array-like of shape (n_samples,), default=None) – Mondrian categories
confidence (float in range (0,1), default=0.95) – confidence level
smoothing (bool, default=False) – use smoothed p-values
metrics (a string or a list of strings,) – default = list of all metrics, i.e., [“error”, “avg_c”, “one_c”, “empty”, “time_fit”, “time_evaluate”]
- Returns:
results – estimated performance using the metrics, where “error” is the fraction of prediction sets not containing the true class label, “avg_c” is the average no. of predicted class labels, “one_c” is the fraction of singleton prediction sets, “empty” is the fraction of empty prediction sets, “time_fit” is the time taken to fit the conformal classifier, and “time_evaluate” is the time taken for the evaluation
- Return type:
dictionary with a key for each selected metric
Examples
Assuming that alphas is an array containing non-conformity scores for all classes for the test objects, classes and y_test are vectors with the class names and true class labels for the test set, respectively, and cc is a fitted standard conformal classifier, then the latter can be evaluated at the default confidence level with respect to error and average number of labels in the prediction sets by:
results = cc.evaluate(alphas, classes, y_test, metrics=["error", "avg_c"])
Note
Using smoothed p-values substantially increases computation time and hardly has any effect on the results, except when the calibration set is small.
- class crepes.ConformalRegressor
A conformal regressor transforms point predictions (regression values) into prediction intervals, for a certain confidence level.
Methods
evaluate(y_hat, y[, sigmas, bins, ...]) – Evaluate conformal regressor.
fit(residuals[, sigmas, bins]) – Fit conformal regressor.
predict(y_hat[, sigmas, bins, confidence, ...]) – Predict using conformal regressor.
- fit(residuals, sigmas=None, bins=None)
Fit conformal regressor.
- Parameters:
residuals (array-like of shape (n_values,)) – true values - predicted values
sigmas (array-like of shape (n_values,), default=None) – difficulty estimates
bins (array-like of shape (n_values,), default=None) – Mondrian categories
- Returns:
self – Fitted ConformalRegressor.
- Return type:
object
Examples
Assuming that y_cal and y_hat_cal are vectors with true and predicted targets for some calibration set, then a standard conformal regressor can be formed from the residuals:
residuals_cal = y_cal - y_hat_cal
from crepes import ConformalRegressor
cr_std = ConformalRegressor()
cr_std.fit(residuals_cal)
Assuming that sigmas_cal is a vector with difficulty estimates, then a normalized conformal regressor can be fitted in the following way:
cr_norm = ConformalRegressor()
cr_norm.fit(residuals_cal, sigmas=sigmas_cal)
Assuming that bins_cal is a vector with Mondrian categories (bin labels), then a Mondrian conformal regressor can be fitted in the following way:
cr_mond = ConformalRegressor()
cr_mond.fit(residuals_cal, bins=bins_cal)
A normalized Mondrian conformal regressor can be fitted in the following way:
cr_norm_mond = ConformalRegressor()
cr_norm_mond.fit(residuals_cal, sigmas=sigmas_cal, bins=bins_cal)
- predict(y_hat, sigmas=None, bins=None, confidence=0.95, y_min=-inf, y_max=inf)
Predict using conformal regressor.
- Parameters:
y_hat (array-like of shape (n_values,)) – predicted values
sigmas (array-like of shape (n_values,), default=None) – difficulty estimates
bins (array-like of shape (n_values,), default=None) – Mondrian categories
confidence (float in range (0,1), default=0.95) – confidence level
y_min (float or int, default=-numpy.inf) – minimum value to include in prediction intervals
y_max (float or int, default=numpy.inf) – maximum value to include in prediction intervals
- Returns:
intervals – prediction intervals
- Return type:
ndarray of shape (n_values, 2)
Examples
Assuming that y_hat_test is a vector with predicted targets for a test set and cr_std a fitted standard conformal regressor, then prediction intervals at the 99% confidence level can be obtained by:
intervals = cr_std.predict(y_hat_test, confidence=0.99)
Assuming that sigmas_test is a vector with difficulty estimates for the test set and cr_norm a fitted normalized conformal regressor, then prediction intervals at the default (95%) confidence level can be obtained by:
intervals = cr_norm.predict(y_hat_test, sigmas=sigmas_test)
Assuming that bins_test is a vector with Mondrian categories (bin labels) for the test set and cr_mond a fitted Mondrian conformal regressor, then the following provides prediction intervals at the default confidence level, where the intervals are lower-bounded by 0:
intervals = cr_mond.predict(y_hat_test, bins=bins_test, y_min=0)
Note
In case the specified confidence level is too high in relation to the size of the calibration set, a warning will be issued and the output intervals will be of maximum size.
- evaluate(y_hat, y, sigmas=None, bins=None, confidence=0.95, y_min=-inf, y_max=inf, metrics=None)
Evaluate conformal regressor.
- Parameters:
y_hat (array-like of shape (n_values,)) – predicted values
y (array-like of shape (n_values,)) – correct target values
sigmas (array-like of shape (n_values,), default=None) – difficulty estimates
bins (array-like of shape (n_values,), default=None) – Mondrian categories
confidence (float in range (0,1), default=0.95) – confidence level
y_min (float or int, default=-numpy.inf) – minimum value to include in prediction intervals
y_max (float or int, default=numpy.inf) – maximum value to include in prediction intervals
metrics (a string or a list of strings, default=list of all metrics) – the metrics to evaluate; the default list is ["error", "eff_mean", "eff_med", "time_fit", "time_evaluate"]
- Returns:
results – estimated performance using the metrics
- Return type:
dictionary with a key for each selected metric
Examples
Assuming that y_hat_test and y_test are vectors with predicted and true targets for a test set, sigmas_test and bins_test are vectors with difficulty estimates and Mondrian categories (bin labels) for the test set, and cr_norm_mond is a fitted normalized Mondrian conformal regressor, then the latter can be evaluated at the default confidence level with respect to error and mean efficiency (interval size) by:
results = cr_norm_mond.evaluate(y_hat_test, y_test, sigmas=sigmas_test, bins=bins_test, metrics=["error", "eff_mean"])
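The error and efficiency metrics can be illustrated with a short sketch. This is an assumption-laden illustration rather than the crepes code: the helper interval_metrics is hypothetical, and treating a target on the interval boundary as covered is a choice made here for simplicity.

```python
from statistics import mean, median

def interval_metrics(intervals, y_true):
    """Illustrative error and efficiency metrics (hypothetical helper, not crepes code)."""
    # Error: fraction of true targets that fall outside their interval.
    misses = [not (lo <= y <= hi) for (lo, hi), y in zip(intervals, y_true)]
    # Efficiency: interval widths, summarised by their mean and median.
    widths = [hi - lo for lo, hi in intervals]
    return {
        "error": sum(misses) / len(misses),
        "eff_mean": mean(widths),
        "eff_med": median(widths),
    }

intervals = [(0.0, 2.0), (1.0, 5.0), (2.5, 3.5), (4.0, 10.0)]
y_test = [1.0, 6.0, 3.0, 4.0]
results = interval_metrics(intervals, y_test)
print(results)
```

A well-calibrated regressor should show an error close to 1 - confidence, while smaller efficiency values indicate tighter, more informative intervals.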
- class crepes.ConformalPredictiveSystem
A conformal predictive system transforms point predictions (regression values) into cumulative distribution functions (conformal predictive distributions).
Methods
evaluate(y_hat, y[, sigmas, bins, ...]) – Evaluate conformal predictive system.
fit(residuals[, sigmas, bins]) – Fit conformal predictive system.
predict(y_hat[, sigmas, bins, y, ...]) – Predict using conformal predictive system.
- fit(residuals, sigmas=None, bins=None)
Fit conformal predictive system.
- Parameters:
residuals (array-like of shape (n_values,)) – actual values - predicted values
sigmas (array-like of shape (n_values,), default=None) – difficulty estimates
bins (array-like of shape (n_values,), default=None) – Mondrian categories
- Returns:
self – Fitted ConformalPredictiveSystem.
- Return type:
object
Examples
Assuming that y_cal and y_hat_cal are vectors with true and predicted targets for some calibration set, then a standard conformal predictive system can be formed from the residuals:
from crepes import ConformalPredictiveSystem
residuals_cal = y_cal - y_hat_cal
cps_std = ConformalPredictiveSystem()
cps_std.fit(residuals_cal)
Assuming that sigmas_cal is a vector with difficulty estimates, then a normalized conformal predictive system can be fitted in the following way:
cps_norm = ConformalPredictiveSystem()
cps_norm.fit(residuals_cal, sigmas=sigmas_cal)
Assuming that bins_cal is a vector with Mondrian categories (bin labels), then a Mondrian conformal predictive system can be fitted in the following way:
cps_mond = ConformalPredictiveSystem()
cps_mond.fit(residuals_cal, bins=bins_cal)
A normalized Mondrian conformal predictive system can be fitted in the following way:
cps_norm_mond = ConformalPredictiveSystem()
cps_norm_mond.fit(residuals_cal, sigmas=sigmas_cal, bins=bins_cal)
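The difference between the four variants lies in how the calibration residuals are prepared. The sketch below shows one plausible way to do this, assuming that normalization divides each residual by its difficulty estimate and that the Mondrian variant keeps a separate set of residuals per category. The helper fit_sketch is hypothetical and is not taken from crepes.

```python
from collections import defaultdict

def fit_sketch(residuals, sigmas=None, bins=None):
    """Illustrative 'fit' step (hypothetical helper, not crepes code)."""
    # Normalized variant: scale each residual by its difficulty estimate.
    if sigmas is not None:
        residuals = [r / s for r, s in zip(residuals, sigmas)]
    # Standard or normalized variant: a single sorted list of residuals.
    if bins is None:
        return sorted(residuals)
    # Mondrian variant: one sorted list of residuals per category.
    by_bin = defaultdict(list)
    for r, b in zip(residuals, bins):
        by_bin[b].append(r)
    return {b: sorted(rs) for b, rs in by_bin.items()}

residuals_cal = [1.0, -2.0, 0.5, 3.0]
sigmas_cal = [1.0, 2.0, 0.5, 1.5]
bins_cal = [0, 1, 0, 1]

print(fit_sketch(residuals_cal))                                    # standard
print(fit_sketch(residuals_cal, sigmas=sigmas_cal))                 # normalized
print(fit_sketch(residuals_cal, sigmas=sigmas_cal, bins=bins_cal))  # normalized Mondrian
```

Note that each Mondrian category ends up with fewer residuals than the full calibration set, which is why category-wise systems need larger calibration sets to support the same percentiles.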
- predict(y_hat, sigmas=None, bins=None, y=None, lower_percentiles=None, higher_percentiles=None, y_min=-inf, y_max=inf, return_cpds=False, cpds_by_bins=False)
Predict using conformal predictive system.
- Parameters:
y_hat (array-like of shape (n_values,)) – predicted values
sigmas (array-like of shape (n_values,), default=None) – difficulty estimates
bins (array-like of shape (n_values,), default=None) – Mondrian categories
y (float, int or array-like of shape (n_values,), default=None) – values for which p-values should be returned
lower_percentiles (array-like of shape (l_values,), default=None) – percentiles for which a lower value will be output in case a percentile lies between two values (similar to method="lower", formerly interpolation="lower", in numpy.percentile)
higher_percentiles (array-like of shape (h_values,), default=None) – percentiles for which a higher value will be output in case a percentile lies between two values (similar to method="higher", formerly interpolation="higher", in numpy.percentile)
y_min (float or int, default=-numpy.inf) – minimum value to include in prediction intervals
y_max (float or int, default=numpy.inf) – maximum value to include in prediction intervals
return_cpds (Boolean, default=False) – specifies whether conformal predictive distributions (cpds) should be output or not
cpds_by_bins (Boolean, default=False) – specifies whether the output cpds should be grouped by bin or not; only applicable when bins is not None and return_cpds = True
- Returns:
results (ndarray of shape (n_values, n_cols) or (n_values,)) – the shape is (n_values, n_cols) if n_cols > 1 and otherwise (n_values,), where n_cols = p_values + l_values + h_values; here p_values = 1 if y is not None and 0 otherwise, l_values is the number of lower percentiles, and h_values is the number of higher percentiles. Only returned if n_cols > 0.
cpds (ndarray of shape (n_values, c_values), ndarray of shape (n_values,), or list of ndarrays) – conformal predictive distributions. Only returned if return_cpds == True. If bins is None, the distributions are represented by a single array, where the number of columns (c_values) is determined by the number of residuals of the fitted conformal predictive system. Otherwise, the distributions are represented by a vector of arrays if cpds_by_bins = False, or by a list of arrays, with one element for each bin, if cpds_by_bins = True.
Examples
Assuming that y_hat_test and y_test are vectors with predicted and true targets, respectively, for a test set and cps_std a fitted standard conformal predictive system, the p-values for the true targets can be obtained by:
p_values = cps_std.predict(y_hat_test, y=y_test)
The p-values with respect to some specific value, e.g., 37, can be obtained by:
p_values = cps_std.predict(y_hat_test, y=37)
Assuming that sigmas_test is a vector with difficulty estimates for the test set and cps_norm a fitted normalized conformal predictive system, then the 90th and 95th percentiles can be obtained by:
percentiles = cps_norm.predict(y_hat_test, sigmas=sigmas_test, higher_percentiles=[90,95])
In the above example, the nearest higher value is returned, if there is no value that corresponds exactly to the requested percentile. If we instead would like to retrieve the nearest lower value, we should write:
percentiles = cps_norm.predict(y_hat_test, sigmas=sigmas_test, lower_percentiles=[90,95])
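The difference between lower and higher percentiles can be shown on a single sorted distribution. The sketch below follows the lower/higher rule of numpy.percentile, to which the parameter descriptions refer. The helper percentile is hypothetical, and the exact index arithmetic used inside crepes may differ.

```python
import math

def percentile(sorted_values, q, method):
    """Pick a percentile without interpolation, following numpy's
    'lower'/'higher' rule (hypothetical helper, not crepes code)."""
    # Fractional position of the q-th percentile among the sorted values.
    pos = q / 100 * (len(sorted_values) - 1)
    # Round the position down or up instead of interpolating.
    index = math.floor(pos) if method == "lower" else math.ceil(pos)
    return sorted_values[index]

cpd = [1.0, 2.0, 4.0, 7.0, 11.0]  # one sorted predictive distribution

print(percentile(cpd, 90, "lower"))   # nearest value below the 90th percentile
print(percentile(cpd, 90, "higher"))  # nearest value above the 90th percentile
print(percentile(cpd, 50, "lower"), percentile(cpd, 50, "higher"))
```

When a percentile coincides exactly with one of the values, as for the 50th percentile here, both rules return the same value.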
Assuming that bins_test is a vector with Mondrian categories (bin labels) for the test set and cps_mond a fitted Mondrian conformal predictive system, then the following returns prediction intervals at the 95% confidence level, where the intervals are lower-bounded by 0:
intervals = cps_mond.predict(y_hat_test, bins=bins_test, lower_percentiles=2.5, higher_percentiles=97.5, y_min=0)
If we would like to obtain the conformal distributions, we could write the following:
cpds = cps_norm.predict(y_hat_test, sigmas=sigmas_test, return_cpds=True)
The output of the above will be an array with a row for each test instance and a column for each calibration instance (residual). For a Mondrian conformal predictive system, the above will instead result in a vector, in which each element is a vector, as the number of calibration instances may vary between categories. If we instead would like an array for each category, this can be obtained by:
cpds = cps_mond.predict(y_hat_test, bins=bins_test, return_cpds=True, cpds_by_bins=True)
Note
In case the calibration set is too small for the specified lower and higher percentiles, a warning will be issued and the output will be y_min and y_max, respectively.
Note
Setting return_cpds=True may consume a lot of memory, as a matrix is generated for which the number of elements is the product of the number of calibration and test objects, unless a Mondrian approach is employed; for the latter, this number is reduced by increasing the number of bins.
Note
Setting cpds_by_bins=True has an effect only for Mondrian conformal predictive systems.
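The size of the returned distributions, and how p-values relate to them, can be illustrated with a small sketch. It assumes that each distribution is obtained by adding the sorted calibration residuals, scaled by the difficulty estimate in the normalized case, to the point prediction. The helpers cpds_sketch and p_value are hypothetical, and the p-value shown ignores any smoothing or finite-sample correction that crepes may apply.

```python
def cpds_sketch(residuals_cal, y_hat, sigmas=None):
    """Illustrative conformal predictive distributions
    (hypothetical helper, not crepes code)."""
    sorted_res = sorted(residuals_cal)
    if sigmas is None:
        sigmas = [1.0] * len(y_hat)
    # One row per test object, one column per calibration residual.
    return [[y + s * r for r in sorted_res] for y, s in zip(y_hat, sigmas)]

def p_value(cpd_row, y):
    """Fraction of the distribution at or below y (no smoothing)."""
    return sum(v <= y for v in cpd_row) / len(cpd_row)

residuals_cal = [-1.0, 0.5, 2.0, -0.5]
y_hat_test = [10.0, 20.0, 30.0]

cpds = cpds_sketch(residuals_cal, y_hat_test)
print(len(cpds), len(cpds[0]))   # test objects x calibration residuals
print(cpds[0])
print(p_value(cpds[0], 10.0))

# Normalized variant: the spread of each row scales with its sigma.
cpds_norm = cpds_sketch(residuals_cal, y_hat_test, sigmas=[1.0, 2.0, 0.5])
print(cpds_norm[1])
```

The row-by-column layout makes the memory cost explicit: the number of stored values is the number of test objects times the number of calibration residuals.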
- evaluate(y_hat, y, sigmas=None, bins=None, confidence=0.95, y_min=-inf, y_max=inf, metrics=None)
Evaluate conformal predictive system.
- Parameters:
y_hat (array-like of shape (n_values,)) – predicted values
y (array-like of shape (n_values,)) – correct target values
sigmas (array-like of shape (n_values,), default=None) – difficulty estimates
bins (array-like of shape (n_values,), default=None) – Mondrian categories
confidence (float in range (0,1), default=0.95) – confidence level
y_min (float or int, default=-numpy.inf) – minimum value to include in prediction intervals
y_max (float or int, default=numpy.inf) – maximum value to include in prediction intervals
metrics (a string or a list of strings, default=list of all metrics) – the metrics to evaluate; the default list is ["error", "eff_mean", "eff_med", "CRPS", "time_fit", "time_evaluate"]
- Returns:
results – estimated performance using the metrics
- Return type:
dictionary with a key for each selected metric
Examples
Assuming that y_hat_test and y_test are vectors with predicted and true targets for a test set, sigmas_test and bins_test are vectors with difficulty estimates and Mondrian categories (bin labels) for the test set, and cps_norm_mond is a fitted normalized Mondrian conformal predictive system, then the latter can be evaluated at the default confidence level with respect to error, mean and median efficiency (interval size, given the default confidence level) and continuous ranked probability score (CRPS) by:
results = cps_norm_mond.evaluate(y_hat_test, y_test, sigmas=sigmas_test, bins=bins_test, metrics=["error", "eff_mean", "eff_med", "CRPS"])
Note
The use of the metric CRPS may consume a lot of memory, as a matrix is generated for which the number of elements is the product of the number of calibration and test objects, unless a Mondrian approach is employed; for the latter, this number is reduced by increasing the number of bins.
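For intuition about the CRPS metric, the following sketch scores one predictive distribution against a true target using the standard formula for an empirical distribution with equally weighted points: the mean distance to the target minus half the mean distance between pairs of points. The helper crps_empirical is hypothetical, and crepes may compute the score differently.

```python
def crps_empirical(cpd_row, y):
    """CRPS of an empirical distribution with equally weighted points
    (hypothetical helper, not crepes code)."""
    m = len(cpd_row)
    # Expected distance between the distribution and the observed target.
    term_obs = sum(abs(x - y) for x in cpd_row) / m
    # Expected distance between two independent draws from the distribution.
    term_spread = sum(abs(a - b) for a in cpd_row for b in cpd_row) / (m * m)
    return term_obs - 0.5 * term_spread

cpds = [[9.0, 9.5, 10.5, 12.0], [18.0, 19.0, 21.0, 24.0]]
y_test = [10.0, 25.0]

# One score per test object, averaged over the test set.
scores = [crps_empirical(row, y) for row, y in zip(cpds, y_test)]
print(sum(scores) / len(scores))

# A distribution concentrated on one value reduces to the absolute error.
print(crps_empirical([5.0], 7.0))
```

Lower scores are better. Because every test object needs its own full distribution, the metric inherits the memory cost of generating all conformal predictive distributions.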