The crepes package
- class crepes.WrapClassifier(learner)
A learner wrapped with a ConformalClassifier.
Methods
- calibrate([X, y, oob, class_cond, nc, mc, seed]) – Fit a ConformalClassifier using the wrapped learner.
- evaluate(X, y[, confidence, smoothing, ...]) – Evaluate the conformal classifier.
- fit(X, y, **kwargs) – Fit learner.
- predict(X) – Predict with learner.
- predict_p(X[, y, all_classes, smoothing, ...]) – Obtain (smoothed or non-smoothed) p-values using the conformal classifier.
- predict_proba(X) – Predict class probabilities with learner.
- predict_set(X[, y, confidence, smoothing, ...]) – Obtain prediction sets using the conformal classifier.
- fit(X, y, **kwargs)
Fit learner.
- Parameters:
X (array-like of shape (n_samples, n_features)) – set of objects
y (array-like of shape (n_samples,)) – labels
kwargs (optional arguments) – any additional arguments are forwarded to the fit method of the learner object
- Return type:
None
Examples
Assuming that X_train and y_train are an array and a vector with training objects and labels, respectively, a random forest may be wrapped and fitted by:
from sklearn.ensemble import RandomForestClassifier
from crepes import WrapClassifier
rf = WrapClassifier(RandomForestClassifier())
rf.fit(X_train, y_train)
Note
The learner, which can be accessed by rf.learner, may be fitted before as well as after being wrapped.
Note
All arguments, including any additional keyword arguments, to fit() are forwarded to the fit method of the learner.
- predict(X)
Predict with learner.
- Parameters:
X (array-like of shape (n_samples, n_features)) – set of objects
- Returns:
y – values predicted using the predict method of the learner object.
- Return type:
array-like of shape (n_samples,)
Examples
Assuming that w is a WrapClassifier object for which the wrapped learner w.learner has been fitted, (point) predictions of the learner can be obtained for a set of test objects X_test by:
y_hat = w.predict(X_test)
The above is equivalent to:
y_hat = w.learner.predict(X_test)
- predict_proba(X)
Predict class probabilities with learner.
- Parameters:
X (array-like of shape (n_samples, n_features)) – set of objects
- Returns:
y – probabilities predicted using the predict_proba method of the learner object.
- Return type:
array-like of shape (n_samples, n_classes)
Examples
Assuming that w is a WrapClassifier object for which the wrapped learner w.learner has been fitted, predicted probabilities of the learner can be obtained for a set of test objects X_test by:
probabilities = w.predict_proba(X_test)
The above is equivalent to:
probabilities = w.learner.predict_proba(X_test)
- calibrate(X=[], y=[], oob=False, class_cond=False, nc=<function hinge>, mc=None, seed=None)
Fit a ConformalClassifier using the wrapped learner.
- Parameters:
X (array-like of shape (n_samples, n_features), default=[]) – set of objects
y (array-like of shape (n_samples,), default=[]) – labels
oob (bool, default=False) – use out-of-bag estimation
class_cond (bool, default=False) – if class_cond=True, the method fits a Mondrian ConformalClassifier using the class labels as categories
nc (function, default=crepes.extras.hinge()) – function to compute non-conformity scores
mc (function or crepes.extras.MondrianCategorizer, default=None) – function or crepes.extras.MondrianCategorizer for computing Mondrian categories
seed (int, default=None) – set random seed
- Returns:
self – WrapClassifier object updated with a fitted ConformalClassifier
- Return type:
object
Examples
Assuming that X_cal and y_cal are an array and a vector, respectively, with objects and labels for the calibration set, and that w is a WrapClassifier object for which the learner has been fitted, a standard conformal classifier can be formed by:
w.calibrate(X_cal, y_cal)
Assuming that get_categories is a function that returns a vector of Mondrian categories (bin labels), a Mondrian conformal classifier can be generated by:
w.calibrate(X_cal, y_cal, mc=get_categories)
By providing the option oob=True, the conformal classifier will be calibrated using out-of-bag predictions, allowing the full set of training objects (X_train) and labels (y_train) to be used, e.g.,
w.calibrate(X_train, y_train, oob=True)
By providing the option class_cond=True, a Mondrian conformal classifier will be formed using the class labels as categories, e.g.,
w.calibrate(X_cal, y_cal, class_cond=True)
Note
Any Mondrian categorizer specified by the mc argument will be ignored by calibrate() if class_cond=True, as the latter implies that Mondrian categories are formed using the labels in y.
Note
By providing a random seed, e.g., seed=123, the call to calibrate as well as calls to the methods predict_set, predict_p and evaluate of the WrapClassifier object will be deterministic.
Note
Enabling out-of-bag calibration, i.e., setting oob=True, requires that the wrapped learner has an attribute oob_decision_function_. This is the case for, e.g., a sklearn.ensemble.RandomForestClassifier if out-of-bag estimation is enabled when it is created, e.g., RandomForestClassifier(oob_score=True).
Note
The use of out-of-bag calibration, as enabled by oob=True, does not come with the theoretical validity guarantees of the regular (inductive) conformal classifiers, since calibration and test instances are not handled in exactly the same way.
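The default non-conformity function, crepes.extras.hinge(), is a hinge score. Below is a minimal NumPy sketch of the usual hinge definition: one minus the probability that the learner assigns to the true label. The helper name hinge_scores and its arguments are hypothetical and not part of crepes, and the library's own implementation may differ in its details.

```python
import numpy as np


def hinge_scores(probabilities, classes, y):
    """Hypothetical helper: hinge non-conformity scores for labelled objects.

    probabilities: shape (n_samples, n_classes), e.g. from predict_proba
    classes: class labels in the same column order as probabilities
    y: true label of each object
    """
    probabilities = np.asarray(probabilities, dtype=float)
    class_list = list(classes)
    # Column holding the probability of each object's true label
    idx = np.array([class_list.index(label) for label in y])
    # A low probability for the true label gives a high (non-conforming) score
    return 1.0 - probabilities[np.arange(len(idx)), idx]


probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.6, 0.3]])
# Calibration objects: score only the true label
alphas_cal = hinge_scores(probs, ["a", "b", "c"], ["a", "c"])
# Test objects: score every tentative label
alphas_all = 1.0 - probs
```

At calibration time only the true label is scored, giving one score per object. At test time every tentative label is scored, giving one column per class.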
- predict_p(X, y=None, all_classes=True, smoothing=True, seed=None, online=False, warm_start=True)
Obtain (smoothed or non-smoothed) p-values using the conformal classifier.
- Parameters:
X (array-like of shape (n_samples, n_features)) – set of objects
y (array-like of shape (n_samples,), default=None) – correct class labels; used only if online=True or all_classes=False
all_classes (bool, default=True) – return p-values for all classes
smoothing (bool, default=True) – use smoothed p-values
seed (int, default=None) – set random seed
online (bool, default=False) – employ online calibration
warm_start (bool, default=True) – extend original calibration set; used only if online=True
- Returns:
p-values – p-values
- Return type:
ndarray of shape (n_samples, n_classes)
Examples
Assuming that X_test is a set of test objects and w is a WrapClassifier object that has been calibrated, i.e., calibrate() has been applied, the (smoothed) p-values for the test objects are obtained by:
p_values = w.predict_p(X_test)
Assuming that y_test is a vector of correct labels for the test objects, p-values for the test objects are obtained using online calibration by:
p_values = w.predict_p(X_test, y_test, online=True)
Note
If a value for seed is given, it will take precedence over any seed value given in the call to calibrate.
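The p-value of a tentative label compares its non-conformity score α with the n calibration scores. The sketch below uses the textbook inductive conformal formulas and assumes a non-Mondrian calibration set:

- non-smoothed: p = (#{calibration scores ≥ α} + 1) / (n + 1)
- smoothed: p = (#{> α} + τ(#{= α} + 1)) / (n + 1), with τ drawn uniformly from [0, 1)

The function conformal_p_values is hypothetical. It illustrates the idea and is not crepes' internal code.

```python
import numpy as np


def conformal_p_values(alphas_cal, alphas_test, smoothing=True, seed=None):
    """Hypothetical helper: p-values for a standard conformal classifier.

    alphas_cal: shape (n_cal,), calibration scores for the true labels
    alphas_test: shape (n_test, n_classes), scores for each tentative label
    """
    alphas_cal = np.asarray(alphas_cal, dtype=float)
    alphas_test = np.asarray(alphas_test, dtype=float)
    n = len(alphas_cal)
    # Calibration scores strictly greater than / equal to each test score
    greater = (alphas_cal[None, None, :] > alphas_test[:, :, None]).sum(axis=2)
    equal = (alphas_cal[None, None, :] == alphas_test[:, :, None]).sum(axis=2)
    if smoothing:
        rng = np.random.default_rng(seed)
        tau = rng.random(alphas_test.shape)
        return (greater + tau * (equal + 1)) / (n + 1)
    # Non-smoothed: ties count fully, and the test object counts as one more
    return (greater + equal + 1) / (n + 1)


alphas_cal = [0.1, 0.2, 0.3, 0.4]
alphas_test = [[0.25, 0.05]]
p_plain = conformal_p_values(alphas_cal, alphas_test, smoothing=False)
# p_plain -> [[0.6, 1.0]]
p_smooth = conformal_p_values(alphas_cal, alphas_test, seed=0)
```

Fixing the seed of the random generator makes the smoothed values reproducible. This mirrors the role of the seed parameter described above.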
- predict_set(X, y=None, confidence=0.95, smoothing=True, seed=None, online=False, warm_start=True)
Obtain prediction sets using the conformal classifier.
- Parameters:
X (array-like of shape (n_samples, n_features)) – set of objects
y (array-like of shape (n_samples,), default=None) – correct class labels; used only if online=True
confidence (float in range (0,1), default=0.95) – confidence level
smoothing (bool, default=True) – use smoothed p-values
seed (int, default=None) – set random seed
online (bool, default=False) – employ online calibration
warm_start (bool, default=True) – extend original calibration set; used only if online=True
- Returns:
prediction sets – prediction sets, where the value 1 (0) indicates that the class label is included (excluded); a class label is excluded when its p-value is less than 1-confidence
- Return type:
ndarray of shape (n_samples, n_classes)
Examples
Assuming that X_test is a set of test objects and w is a WrapClassifier object that has been calibrated, i.e., calibrate() has been applied, prediction sets for the test objects at the 99% confidence level are obtained by:
prediction_sets = w.predict_set(X_test, confidence=0.99)
Assuming that y_test is a vector of correct labels for the test objects, prediction sets for the test objects at the default (95%) confidence level are obtained using online calibration by:
prediction_sets = w.predict_set(X_test, y_test, online=True)
Note
The use of smoothed p-values increases computation time and typically has a minor effect on the prediction sets, except for small calibration sets.
Note
If a value for seed is given, it will take precedence over any seed value given in the call to calibrate.
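As described under Returns, a class label is excluded from a prediction set when its p-value is less than 1-confidence. Below is a minimal NumPy sketch of that thresholding step. The helper name sets_from_p_values is hypothetical.

```python
import numpy as np


def sets_from_p_values(p_values, confidence=0.95):
    """Hypothetical helper: 1 keeps a class label, 0 excludes it."""
    p_values = np.asarray(p_values, dtype=float)
    # A label is excluded exactly when its p-value is below 1 - confidence
    return (p_values >= 1 - confidence).astype(int)


p_values = np.array([[0.60, 0.02, 0.30],
                     [0.04, 0.20, 0.005]])
sets_95 = sets_from_p_values(p_values, confidence=0.95)  # [[1, 0, 1], [0, 1, 0]]
sets_99 = sets_from_p_values(p_values, confidence=0.99)  # [[1, 1, 1], [1, 1, 0]]
```

Raising the confidence level lowers the threshold 1-confidence, so the prediction sets can only grow.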
- evaluate(X, y, confidence=0.95, smoothing=True, metrics=None, seed=None, online=False, warm_start=True)
Evaluate the conformal classifier.
- Parameters:
X (array-like of shape (n_samples, n_features)) – set of objects
y (array-like of shape (n_samples,)) – correct labels
confidence (float in range (0,1), default=0.95) – confidence level
smoothing (bool, default=True) – use smoothed p-values
metrics (a string or a list of strings, default=list of all metrics) – metrics to compute, chosen from “error”, “avg_c”, “one_c”, “empty”, “ks_test”, “time_fit”, and “time_evaluate”
seed (int, default=None) – set random seed
online (bool, default=False) – employ online calibration
warm_start (bool, default=True) – extend original calibration set; used only if online=True
- Returns:
results – estimated performance using the metrics, where “error” is the fraction of prediction sets not containing the true class label, “avg_c” is the average number of class labels in the prediction sets, “one_c” is the fraction of singleton prediction sets, “empty” is the fraction of empty prediction sets, “ks_test” is the p-value for the Kolmogorov-Smirnov test of uniformity of predicted p-values, “time_fit” is the time taken to fit the conformal classifier, and “time_evaluate” is the time taken for the evaluation
- Return type:
dictionary with a key for each selected metric
Examples
Assuming that X_test is a set of test objects, y_test is a vector with the true class labels, and w is a calibrated WrapClassifier object, the latter can be evaluated at the 90% confidence level with respect to error, average prediction set size and fraction of singleton predictions by:
results = w.evaluate(X_test, y_test, confidence=0.9, metrics=["error", "avg_c", "one_c"])
Note
The reported result for time_fit only considers fitting the conformal classifier, not fitting the learner.
Note
The use of smoothed p-values increases computation time and typically has a minor effect on the results, except for small calibration sets.
Note
If a value for seed is given, it will take precedence over any seed value given in the call to calibrate.
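The set-based metrics listed under Returns can be read directly off a 0/1 prediction-set matrix. The sketch below covers only “error”, “avg_c”, “one_c” and “empty”. The helper set_metrics is hypothetical, and evaluate() may compute these values differently. It also reports “ks_test” and timings, which are not reproduced here.

```python
import numpy as np


def set_metrics(sets, classes, y):
    """Hypothetical helper: set-based metrics from 0/1 prediction sets."""
    sets = np.asarray(sets)
    class_list = list(classes)
    # Column of the true label for each test object
    idx = np.array([class_list.index(label) for label in y])
    sizes = sets.sum(axis=1)
    return {
        "error": float(1 - sets[np.arange(len(idx)), idx].mean()),  # true label missing
        "avg_c": float(sizes.mean()),                               # mean set size
        "one_c": float((sizes == 1).mean()),                        # singleton sets
        "empty": float((sizes == 0).mean()),                        # empty sets
    }


sets = [[1, 0, 0],
        [1, 1, 0],
        [0, 0, 0],
        [0, 1, 0]]
results = set_metrics(sets, classes=[0, 1, 2], y=[0, 0, 1, 2])
# results -> {"error": 0.5, "avg_c": 1.0, "one_c": 0.5, "empty": 0.25}
```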
- class crepes.WrapRegressor(learner)
A learner wrapped with a ConformalRegressor or ConformalPredictiveSystem.
Methods
- calibrate([X, y, de, mc, oob, cps, seed]) – Fit a ConformalRegressor or ConformalPredictiveSystem using the wrapped learner.
- evaluate(X, y[, confidence, y_min, y_max, ...]) – Evaluate ConformalRegressor or ConformalPredictiveSystem.
- fit(X, y, **kwargs) – Fit learner.
- predict(X) – Predict with learner.
- predict_cpds(X[, y, seed, online, warm_start]) – Obtain conformal predictive distributions from conformal predictive system.
- predict_cps(X[, y, lower_percentiles, ...]) – Predict using ConformalPredictiveSystem.
- predict_int(X[, y, confidence, y_min, ...]) – Obtain prediction intervals with fitted ConformalRegressor or ConformalPredictiveSystem.
- predict_p(X[, y, t, smoothing, seed, ...]) – Return (smoothed or non-smoothed) p-values for provided targets, using fitted ConformalRegressor or ConformalPredictiveSystem.
- predict_percentiles(X[, y, ...]) – Obtain percentiles with conformal predictive system.
- fit(X, y, **kwargs)
Fit learner.
- Parameters:
X (array-like of shape (n_samples, n_features)) – set of objects
y (array-like of shape (n_samples,)) – labels
kwargs (optional arguments) – any additional arguments are forwarded to the fit method of the learner object
- Return type:
None
Examples
Assuming that X_train and y_train are an array and a vector with training objects and labels, respectively, a random forest may be wrapped and fitted by:
from sklearn.ensemble import RandomForestRegressor
from crepes import WrapRegressor
rf = WrapRegressor(RandomForestRegressor())
rf.fit(X_train, y_train)
Note
The learner, which can be accessed by rf.learner, may be fitted before as well as after being wrapped.
Note
All arguments, including any additional keyword arguments, to fit() are forwarded to the fit method of the learner.
- predict(X)
Predict with learner.
- Parameters:
X (array-like of shape (n_samples, n_features)) – set of objects
- Returns:
y – values predicted using the predict method of the learner object.
- Return type:
array-like of shape (n_samples,)
Examples
Assuming that w is a WrapRegressor object for which the wrapped learner w.learner has been fitted, (point) predictions of the learner can be obtained for a set of test objects X_test by:
y_hat = w.predict(X_test)
The above is equivalent to:
y_hat = w.learner.predict(X_test)
- calibrate(X=[], y=[], de=None, mc=None, oob=False, cps=False, seed=None)
Fit a ConformalRegressor or ConformalPredictiveSystem using the wrapped learner.
- Parameters:
X (array-like of shape (n_samples, n_features), default=[]) – set of objects
y (array-like of shape (n_samples,), default=[]) – labels
de (crepes.extras.DifficultyEstimator, default=None) – object used for computing difficulty estimates
mc (function or crepes.extras.MondrianCategorizer, default=None) – function or crepes.extras.MondrianCategorizer for computing Mondrian categories
oob (bool, default=False) – use out-of-bag estimation
cps (bool, default=False) – if cps=False, the method fits a ConformalRegressor and otherwise, a ConformalPredictiveSystem
seed (int, default=None) – set random seed
- Returns:
self – The WrapRegressor object is updated with a fitted ConformalRegressor or ConformalPredictiveSystem
- Return type:
object
Examples
Assuming that X_cal and y_cal are an array and a vector, respectively, with objects and labels for the calibration set, and that w is a WrapRegressor object for which the learner has been fitted, a standard conformal regressor is formed by:
w.calibrate(X_cal, y_cal)
Assuming that de is a fitted difficulty estimator, a normalized conformal regressor is obtained by:
w.calibrate(X_cal, y_cal, de=de)
Assuming that get_categories is a function that returns categories (bin labels), a Mondrian conformal regressor is obtained by:
w.calibrate(X_cal, y_cal, mc=get_categories)
A normalized Mondrian conformal regressor is generated in the following way:
w.calibrate(X_cal, y_cal, de=de, mc=get_categories)
By providing the option oob=True, the conformal regressor will be calibrated using out-of-bag predictions, allowing the full set of training objects (X_train) and labels (y_train) to be used, e.g.,
w.calibrate(X_train, y_train, oob=True)
By providing the option cps=True, each of the above calls will instead generate a ConformalPredictiveSystem, e.g.,
w.calibrate(X_cal, y_cal, de=de, cps=True)
Note
By providing a random seed, e.g., seed=123, the call to calibrate as well as calls to the methods predict_int, predict_cps and evaluate of the WrapRegressor object will be deterministic.
Note
Enabling out-of-bag calibration, i.e., setting oob=True, requires that the wrapped learner has an attribute oob_prediction_. This is the case for, e.g., a sklearn.ensemble.RandomForestRegressor if out-of-bag estimation is enabled when it is created, e.g., RandomForestRegressor(oob_score=True).
Note
The use of out-of-bag calibration, as enabled by oob=True, does not come with the theoretical validity guarantees of the regular (inductive) conformal regressors and predictive systems, since calibration and test instances are not handled in exactly the same way.
- predict_p(X, y=None, t=None, smoothing=True, seed=None, online=False, warm_start=True)
Return (smoothed or non-smoothed) p-values for provided targets, using fitted ConformalRegressor or ConformalPredictiveSystem.
- Parameters:
X (array-like of shape (n_samples, n_features)) – set of objects
y (array-like of shape (n_samples,), default=None) – correct labels, used for online calibration if online=True, and used as targets if t=None
t (int, float or array-like of shape (n_samples,), default=None) – targets
smoothing (bool, default=True) – return smoothed p-values
seed (int, default=None) – set random seed
online (bool, default=False) – employ online calibration
warm_start (bool, default=True) – extend original calibration set; used only if online=True
- Returns:
p_values – p_values
- Return type:
ndarray of shape (n_samples,)
Examples
Assuming that X_test is a set of test objects, y_test is the set of correct labels and w is a WrapRegressor object that has been calibrated, i.e., calibrate() has been applied, (smoothed) p-values are obtained by:
p_values = w.predict_p(X_test, y_test)
Given a single target or a vector of targets t, p-values can be obtained using online calibration by:
p_values = w.predict_p(X_test, y_test, t, online=True)
Note
If a value for seed is given, it will take precedence over any seed value given when calling calibrate.
- predict_int(X, y=None, confidence=0.95, y_min=-inf, y_max=inf, seed=None, online=False, warm_start=True)
Obtain prediction intervals with fitted ConformalRegressor or ConformalPredictiveSystem.
- Parameters:
X (array-like of shape (n_samples, n_features)) – set of objects
y (array-like of shape (n_samples,), default=None) – correct labels; used only if online=True
confidence (float in range (0,1), default=0.95) – confidence level
y_min (float or int, default=-numpy.inf) – minimum value to include in prediction intervals
y_max (float or int, default=numpy.inf) – maximum value to include in prediction intervals
seed (int, default=None) – set random seed
online (bool, default=False) – employ online calibration
warm_start (bool, default=True) – extend original calibration set; used only if online=True
- Returns:
intervals – prediction intervals
- Return type:
ndarray of shape (n_samples, 2)
Examples
Assuming that X_test is a set of test objects and w is a WrapRegressor object that has been calibrated, i.e., calibrate() has been applied, prediction intervals at the 99% confidence level can be obtained by:
intervals = w.predict_int(X_test, confidence=0.99)
The following provides prediction intervals at the default confidence level (95%), where the intervals are lower-bounded by 0:
intervals = w.predict_int(X_test, y_min=0)
Assuming that y_test is a vector containing the correct labels for the test objects, intervals (at the default confidence level) are provided using online calibration by:
intervals = w.predict_int(X_test, y_test, online=True)
Note
In case the specified confidence level is too high in relation to the size of the calibration set, the output intervals will be of maximum size.
Note
If a value for seed is given, it will take precedence over any seed value given when calling calibrate.
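The textbook split-conformal recipe explains both the intervals and the maximum-size note. The sketch below follows that recipe and makes these assumptions, none of which is taken from crepes:

- absolute residuals are used as non-conformity scores;
- the ceil((n + 1) * confidence)-th smallest score is the half-width of the interval;
- for a normalized conformal regressor, residuals are optionally divided by difficulty estimates;
- the result is clipped to [y_min, y_max].

When ceil((n + 1) * confidence) exceeds n, the half-width becomes infinite. The helper conformal_intervals is hypothetical.

```python
import math

import numpy as np


def conformal_intervals(residuals_cal, y_hat, confidence=0.95,
                        sigmas_cal=None, sigmas_test=None,
                        y_min=-np.inf, y_max=np.inf):
    """Hypothetical helper: standard or normalized split-conformal intervals."""
    scores = np.abs(np.asarray(residuals_cal, dtype=float))
    y_hat = np.asarray(y_hat, dtype=float)
    if sigmas_cal is not None:
        # Normalized: divide calibration residuals by their difficulty estimates
        scores = scores / np.asarray(sigmas_cal, dtype=float)
    n = len(scores)
    k = math.ceil((n + 1) * confidence)
    # Too few calibration scores for this confidence level: unbounded interval
    q = np.inf if k > n else np.sort(scores)[k - 1]
    scale = 1.0 if sigmas_test is None else np.asarray(sigmas_test, dtype=float)
    half_width = q * scale
    lower = np.clip(y_hat - half_width, y_min, y_max)
    upper = np.clip(y_hat + half_width, y_min, y_max)
    return np.column_stack([lower, upper])


residuals = np.arange(1, 11)  # ten calibration residuals: 1, 2, ..., 10
iv = conformal_intervals(residuals, [0.0, 5.0], confidence=0.8)
# k = ceil(11 * 0.8) = 9, so q = 9 and iv -> [[-9, 9], [-4, 14]]
iv_max = conformal_intervals(residuals, [0.0, 5.0], confidence=0.95, y_min=0)
# k = 11 > 10: intervals become [0, inf]
```

With ten calibration residuals, 95% confidence already asks for more scores than are available. In that case the interval spans everything between y_min and y_max.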
- predict_percentiles(X, y=None, lower_percentiles=None, higher_percentiles=None, y_min=-inf, y_max=inf, seed=None, online=False, warm_start=True)
Obtain percentiles with conformal predictive system.
- Parameters:
X (array-like of shape (n_samples, n_features)) – set of objects
y (array-like of shape (n_samples,), default=None) – correct labels; used only if online=True
lower_percentiles (array-like of shape (l_values,), default=None) – percentiles for which a lower value will be output in case a percentile lies between two values (similar to interpolation="lower" in numpy.percentile)
higher_percentiles (array-like of shape (h_values,), default=None) – percentiles for which a higher value will be output in case a percentile lies between two values (similar to interpolation="higher" in numpy.percentile)
y_min (float or int, default=-numpy.inf) – minimum value to include
y_max (float or int, default=numpy.inf) – maximum value to include
seed (int, default=None) – set random seed
online (bool, default=False) – employ online calibration
warm_start (bool, default=True) – extend original calibration set; used only if online=True
- Returns:
percentiles – the requested lower and higher percentiles for each test object
- Return type:
ndarray of shape (n_samples, n_percentiles)
Examples
Assuming that X_test is a set of test objects and cps is a WrapRegressor object that has been calibrated while enabling the generation of a conformal predictive system, i.e., calibrate() has been called with cps=True, percentiles can be obtained by:
percentiles = cps.predict_percentiles(X_test, lower_percentiles=2.5, higher_percentiles=97.5)
Multiple (lower and higher) percentiles may be requested by:
percentiles = cps.predict_percentiles(X_test, lower_percentiles=[2.5,5], higher_percentiles=[95,97.5])
Assuming that y_test is a vector containing the correct labels for the test objects, percentiles are provided using online calibration by:
percentiles = cps.predict_percentiles(X_test, y_test, higher_percentiles=[90,95,99], online=True)
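The parameter descriptions compare lower_percentiles and higher_percentiles with the "lower" and "higher" options of numpy.percentile. The snippet below shows that NumPy behavior on its own. It assumes NumPy 1.22 or later, where the interpolation argument was renamed method. It illustrates the analogy only and is not how crepes computes percentiles.

```python
import numpy as np

values = np.array([9.0, 10.0, 11.0, 12.0])
# The 50th percentile falls between 10 and 11
low = np.percentile(values, 50, method="lower")    # nearest value below: 10.0
high = np.percentile(values, 50, method="higher")  # nearest value above: 11.0
```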
- predict_cpds(X, y=None, seed=None, online=False, warm_start=True)
Obtain conformal predictive distributions from conformal predictive system.
- Parameters:
X (array-like of shape (n_samples, n_features)) – set of objects
y (array-like of shape (n_samples,), default=None) – correct labels; used only if online=True
seed (int, default=None) – set random seed
online (bool, default=False) – employ online calibration
warm_start (bool, default=True) – extend original calibration set; used only if online=True
- Returns:
cpds – conformal predictive distributions. If online=False and the conformal predictive system is not Mondrian, the distributions are represented by a single array, where the number of columns (c_values) is determined by the number of residuals of the fitted conformal predictive system. Otherwise, the output is a vector of arrays.
- Return type:
ndarray of shape (n_samples, c_values) or (n_samples,)
Examples
Assuming that X_test is a set of test objects and cps is a WrapRegressor object that has been calibrated while enabling the generation of a conformal predictive system, i.e., calibrate() has been called with cps=True, conformal predictive distributions (cpds) can be obtained by:
cpds = cps.predict_cpds(X_test)
Assuming that y_test is a vector containing the correct labels for the test objects, cpds can be generated using online calibration by:
cpds = cps.predict_cpds(X_test, y_test, online=True)
Note
The returned array may be very large as its size is the product of the number of calibration and test objects, unless a Mondrian approach is employed; for the latter, this number is reduced by increasing the number of bins. For online calibration, the largest element in the vector may be of the same size as the concatenation of the calibration and test sets.
Note
If a value for seed is given, it will take precedence over any seed value given in the call to calibrate.
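A standard (non-Mondrian, non-normalized) split conformal predictive system, as commonly defined, shifts the sorted signed calibration residuals y_i - ŷ_i by each test prediction ŷ. This is consistent with the statement above that the number of columns equals the number of residuals. The sketch below builds such distributions and evaluates one at a target t with the usual smoothed formula, (#{values < t} + τ(#{values = t} + 1)) / (n + 1). Both helpers are hypothetical, and crepes may differ in details such as normalization or tie handling.

```python
import numpy as np


def standard_cpds(residuals_cal, y_hat):
    """Hypothetical helper: one row per test object, one column per residual."""
    r = np.sort(np.asarray(residuals_cal, dtype=float))  # signed y_i - y_hat_i
    return np.asarray(y_hat, dtype=float)[:, None] + r[None, :]


def cpd_p_value(cpd_row, t, tau=0.5):
    """Smoothed value of one distribution at target t; tau is normally uniform on [0, 1)."""
    cpd_row = np.asarray(cpd_row, dtype=float)
    n = len(cpd_row)
    below = np.sum(cpd_row < t)
    ties = np.sum(cpd_row == t)
    return (below + tau * (ties + 1)) / (n + 1)


cpds = standard_cpds([-1.0, 2.0, 0.0, 1.0], y_hat=[10.0, 20.0])
# cpds -> [[9, 10, 11, 12], [19, 20, 21, 22]]
p = cpd_p_value(cpds[0], t=10.5)
# p -> (2 + 0.5) / 5 = 0.5
```

The array has one row per test object and one column per calibration residual. Its size therefore grows with the product of the two set sizes, in line with the memory note above.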
- predict_cps(X, y=None, lower_percentiles=None, higher_percentiles=None, y_min=-inf, y_max=inf, return_cpds=False, cpds_by_bins=False, smoothing=True, seed=None)
Predict using ConformalPredictiveSystem.
- Parameters:
X (array-like of shape (n_samples, n_features)) – set of objects
y (float, int or array-like of shape (n_samples,), default=None) – values for which p-values should be returned
lower_percentiles (array-like of shape (l_values,), default=None) – percentiles for which a lower value will be output in case a percentile lies between two values (similar to interpolation="lower" in numpy.percentile)
higher_percentiles (array-like of shape (h_values,), default=None) – percentiles for which a higher value will be output in case a percentile lies between two values (similar to interpolation="higher" in numpy.percentile)
y_min (float or int, default=-numpy.inf) – minimum value to include in prediction intervals
y_max (float or int, default=numpy.inf) – maximum value to include in prediction intervals
return_cpds (bool, default=False) – specifies whether conformal predictive distributions (cpds) should be output or not
cpds_by_bins (bool, default=False) – specifies whether the output cpds should be grouped by Mondrian category or not; only applicable for Mondrian conformal predictive systems and when return_cpds=True
smoothing (bool, default=True) – return smoothed p-values
seed (int, default=None) – set random seed
- Returns:
results (ndarray of shape (n_samples, n_cols) or (n_samples,)) – the shape is (n_samples, n_cols) if n_cols > 1 and otherwise (n_samples,), where n_cols = p_values+l_values+h_values, p_values = 1 if y is not None and 0 otherwise, l_values is the number of lower percentiles, and h_values is the number of higher percentiles. Only returned if n_cols > 0.
cpds (ndarray of shape (n_samples, c_values), ndarray of shape (n_samples,), or list of ndarrays) – conformal predictive distributions; only returned if return_cpds=True. For non-Mondrian conformal predictive systems, the distributions are represented by a single array, where the number of columns (c_values) is determined by the number of residuals of the fitted conformal predictive system. For Mondrian conformal predictive systems, the distributions are represented by a vector of arrays if cpds_by_bins=False, or by a list of arrays, with one element for each Mondrian category, if cpds_by_bins=True.
Examples
Assuming that X_test is a set of test objects, y_test is a vector with true targets, and w is a WrapRegressor object calibrated with the option cps=True, p-values for the true targets can be obtained by:
p_values = w.predict_cps(X_test, y=y_test)
P-values with respect to some specific value, e.g., 37, can be obtained by:
p_values = w.predict_cps(X_test, y=37)
The 90th and 95th percentiles can be obtained by:
percentiles = w.predict_cps(X_test, higher_percentiles=[90,95])
In the above example, the nearest higher value is returned if no value corresponds exactly to the requested percentile. If we would instead like to retrieve the nearest lower value, we can write:
percentiles = w.predict_cps(X_test, lower_percentiles=[90,95])
The following returns prediction intervals at the 95% confidence level, where the intervals are lower-bounded by 0:
intervals = w.predict_cps(X_test, lower_percentiles=2.5, higher_percentiles=97.5, y_min=0)
If we would like to obtain the conformal predictive distributions, we can write the following:
cpds = w.predict_cps(X_test, return_cpds=True)
The output of the above will be an array with a row for each test instance and a column for each calibration instance (residual). If the learner is wrapped with a Mondrian conformal predictive system, the above will instead result in a vector in which each element is a vector, as the number of calibration instances may vary between categories. If we would instead like an array for each category, this can be obtained by:
cpds = w.predict_cps(X_test, return_cpds=True, cpds_by_bins=True)
Note
This method is available only if the learner has been wrapped with a ConformalPredictiveSystem, i.e., calibrate() has been called with the option cps=True.
Note
In case the calibration set is too small for the specified lower and higher percentiles, a warning will be issued and the output will be y_min and y_max, respectively.
Note
Setting return_cpds=True may consume a lot of memory, as a matrix is generated for which the number of elements is the product of the number of calibration and test objects, unless a Mondrian approach is employed; for the latter, this number is reduced by increasing the number of bins.
Note
Setting cpds_by_bins=True has an effect only for Mondrian conformal predictive systems.
Note
If a value for seed is given, it will take precedence over any seed value given in the call to calibrate.
- evaluate(X, y, confidence=0.95, y_min=-inf, y_max=inf, metrics=None, seed=None, online=False, warm_start=True)
Evaluate ConformalRegressor or ConformalPredictiveSystem.
- Parameters:
X (array-like of shape (n_samples, n_features)) – set of objects
y (array-like of shape (n_samples,)) – correct labels
confidence (float in range (0,1), default=0.95) – confidence level
y_min (float or int, default=-numpy.inf) – minimum value to include in prediction intervals
y_max (float or int, default=numpy.inf) – maximum value to include in prediction intervals
metrics (a string or a list of strings, default=list of all) – metrics to compute; for a learner wrapped with a conformal regressor these are “error”, “eff_mean”, “eff_med”, “ks_test”, “time_fit”, and “time_evaluate”, while for a learner wrapped with a conformal predictive system, the metrics also include “CRPS”.
seed (int, default=None) – set random seed
online (bool, default=False) – employ online calibration
warm_start (bool, default=True) – extend original calibration set; used only if online=True
- Returns:
results – estimated performance using the metrics, where “error” is the fraction of prediction intervals not containing the true label, “eff_mean” is the mean length of prediction intervals, “eff_med” is the median length of the prediction intervals, “CRPS” is the continuous ranked probability score, “ks_test” is the p-value for the Kolmogorov-Smirnov test of uniformity of predicted p-values, “time_fit” is the time taken to fit the conformal regressor/predictive system, and “time_evaluate” is the time taken for the evaluation
- Return type:
dictionary with a key for each selected metric
Examples
Assuming that X_test is a set of test objects, y_test is a vector with true targets, and w is a calibrated WrapRegressor object, the latter can be evaluated at the 90% confidence level with respect to error, mean and median efficiency (interval size) by:
results = w.evaluate(X_test, y_test, confidence=0.9, metrics=["error", "eff_mean", "eff_med"])
Note
The metric CRPS is only available for batch evaluation, i.e., when online=False, and will be ignored if the WrapRegressor object has been calibrated with the (default) option cps=False, i.e., the learner is wrapped with a ConformalRegressor.
Note
The use of the metric CRPS may require a lot of memory, as a matrix is generated for which the number of elements is the product of the number of calibration and test objects, unless a Mondrian approach is employed; for the latter, this number is reduced by increasing the number of categories.
Note
The reported result for time_fit only considers fitting the conformal regressor or predictive system, not fitting the learner.
Note
If a value for seed is given, it will take precedence over any seed value given when calling calibrate.
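The interval-based metrics under Returns (“error”, “eff_mean” and “eff_med”) can be computed from an (n_samples, 2) array of intervals. The sketch below treats interval bounds as inclusive, which is an assumption on our part. The helper interval_metrics is hypothetical and omits “CRPS”, “ks_test” and the timings.

```python
import numpy as np


def interval_metrics(intervals, y):
    """Hypothetical helper: error and efficiency of prediction intervals."""
    intervals = np.asarray(intervals, dtype=float)
    y = np.asarray(y, dtype=float)
    lengths = intervals[:, 1] - intervals[:, 0]
    covered = (intervals[:, 0] <= y) & (y <= intervals[:, 1])
    return {
        "error": float(1 - covered.mean()),    # fraction of intervals missing y
        "eff_mean": float(lengths.mean()),     # mean interval length
        "eff_med": float(np.median(lengths)),  # median interval length
    }


results = interval_metrics([[0, 2], [1, 5], [3, 4]], [1, 6, 3.5])
# One of three intervals misses its target; lengths are 2, 4 and 1
```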
- class crepes.ConformalClassifier
A conformal classifier transforms non-conformity scores into p-values or prediction sets for a certain confidence level.
Methods
- evaluate(alphas, classes, y[, bins, ...]) – Evaluate conformal classifier.
- fit(alphas[, bins, seed]) – Fit conformal classifier.
- predict_p(alphas[, bins, all_classes, ...]) – Obtain (smoothed or non-smoothed) p-values from conformal classifier.
- predict_p_online(alphas, classes, y[, bins, ...]) – Obtain (smoothed or non-smoothed) p-values from conformal classifier, computed using online calibration.
- predict_set(alphas[, bins, confidence, ...]) – Obtain prediction sets using conformal classifier.
- predict_set_online(alphas, classes, y[, ...]) – Obtain prediction sets using conformal classifier, computed using online calibration.
- fit(alphas, bins=None, seed=None)
Fit conformal classifier.
- Parameters:
alphas (array-like of shape (n_samples,)) – non-conformity scores
bins (array-like of shape (n_samples,), default=None) – Mondrian categories
seed (int, default=None) – set random seed
- Returns:
self – Fitted ConformalClassifier.
- Return type:
object
Examples
Assuming that alphas_cal is a vector with non-conformity scores, a standard conformal classifier is formed in the following way:
from crepes import ConformalClassifier
cc_std = ConformalClassifier()
cc_std.fit(alphas_cal)
Assuming that bins_cal is a vector with Mondrian categories (bin labels), a Mondrian conformal classifier is fitted in the following way:
cc_mond = ConformalClassifier()
cc_mond.fit(alphas_cal, bins=bins_cal)
Note
By providing a random seed, e.g., seed=123, calls to the methods predict_p, predict_set and evaluate of the ConformalClassifier object will be deterministic.
- predict_p(alphas, bins=None, all_classes=True, classes=None, y=None, smoothing=True, seed=None)
Obtain (smoothed or non-smoothed) p-values from conformal classifier.
- Parameters:
alphas (array-like of shape (n_samples, n_classes)) – non-conformity scores
bins (array-like of shape (n_samples,), default=None) – Mondrian categories
all_classes (bool, default=True) – return p-values for all classes
classes (array-like of shape (n_classes,), default=None) – class names, used only if all_classes=False
y (array-like of shape (n_samples,), default=None) – correct class labels, used only if all_classes=False
smoothing (bool, default=True) – return smoothed p-values
seed (int, default=None) – set random seed
- Returns:
p-values – p-values
- Return type:
ndarray of shape (n_samples, n_classes)
Examples
Assuming that alphas_test is an array with non-conformity scores for a test set (one column per class) and cc_std is a fitted standard conformal classifier, then p-values for the test set are obtained by:

    p_values = cc_std.predict_p(alphas_test)
Assuming that bins_test is a vector with Mondrian categories (bin labels) for the test set and cc_mond is a fitted Mondrian conformal classifier, then the following provides (smoothed) p-values for the test set:

    p_values = cc_mond.predict_p(alphas_test, bins=bins_test)
Note
If a value for seed is given, it will take precedence over any seed value given when calling fit.
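The p-values above follow the usual conformal construction, where a test score is ranked among the calibration scores. The sketch below uses the textbook formulas for non-smoothed and smoothed p-values with a standard (non-Mondrian) calibration set. The functions p_value and predict_p are hypothetical stand-ins, and crepes may differ in details such as vectorization and tie handling.

```python
import random


def p_value(alpha, alphas_cal, smoothing=True, rng=None):
    """Conformal p-value of one test score against the calibration scores.

    Non-smoothed: (#{a_i >= alpha} + 1) / (n + 1)
    Smoothed:     (#{a_i > alpha} + tau * (#{a_i == alpha} + 1)) / (n + 1),
                  where tau is drawn uniformly from [0, 1).
    """
    n = len(alphas_cal)
    greater = sum(1 for a in alphas_cal if a > alpha)
    equal = sum(1 for a in alphas_cal if a == alpha)
    if not smoothing:
        return (greater + equal + 1) / (n + 1)
    if rng is None:
        rng = random.Random()
    tau = rng.random()
    return (greater + tau * (equal + 1)) / (n + 1)


def predict_p(alphas_test, alphas_cal, smoothing=True, seed=None):
    """One p-value per test object and class (alphas_test is a list of rows)."""
    rng = random.Random(seed)  # a fixed seed makes smoothed p-values reproducible
    return [[p_value(a, alphas_cal, smoothing, rng) for a in row]
            for row in alphas_test]


alphas_cal = [0.1, 0.2, 0.3, 0.4]
alphas_test = [[0.05, 0.35], [0.4, 0.9]]
p_nonsmoothed = predict_p(alphas_test, alphas_cal, smoothing=False)
print(p_nonsmoothed)  # [[1.0, 0.4], [0.4, 0.2]]
```

Smoothing only redistributes the mass of ties at random, so a smoothed p-value never exceeds its non-smoothed counterpart.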
- predict_p_online(alphas, classes, y, bins=None, all_classes=True, smoothing=True, seed=None, warm_start=True)[source]#
Obtain (smoothed or non-smoothed) p-values from conformal classifier, computed using online calibration.
- Parameters:
alphas (array-like of shape (n_samples, n_classes)) – non-conformity scores
classes (array-like of shape (n_classes,)) – class names
y (array-like of shape (n_samples,)) – correct class labels
bins (array-like of shape (n_samples,), default=None) – Mondrian categories
all_classes (bool, default=True) – return p-values for all classes
smoothing (bool, default=True) – return smoothed p-values
seed (int, default=None) – set random seed
warm_start (bool, default=True) – extend original calibration set
- Returns:
p-values – p-values
- Return type:
ndarray of shape (n_samples, n_classes)
Examples
Assuming that alphas_test is an array with non-conformity scores for a test set, classes is a vector with class names, y_test is a vector with the correct class labels for the test set, and cc_std is a fitted standard conformal classifier, then p-values for the test set are obtained using online calibration by:

    p_values = cc_std.predict_p_online(alphas_test, classes, y_test)
Assuming that bins_test is a vector with Mondrian categories (bin labels) for the test set and cc_mond is a fitted Mondrian conformal classifier, then the following provides (smoothed) p-values for the test set:

    p_values = cc_mond.predict_p_online(alphas_test, classes, y_test, bins=bins_test)
Note
If a value for seed is given, it will take precedence over any seed value given in the call to fit.
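Online calibration can be pictured as processing the test objects in order. Each object is scored against the calibration scores available so far, and then its correct label is revealed and the corresponding score joins the pool. The sketch below assumes non-smoothed p-values and a single (non-Mondrian) pool. predict_p_online here is a hypothetical stand-alone function, not the crepes method, and warm_start decides whether the original calibration scores are included.

```python
def predict_p_online(alphas_test, classes, y, alphas_cal=(), warm_start=True):
    """Non-smoothed p-values computed with online calibration (hypothetical sketch)."""
    pool = list(alphas_cal) if warm_start else []
    class_index = {c: j for j, c in enumerate(classes)}
    p_values = []
    for row, label in zip(alphas_test, y):
        n = len(pool)
        # score every class against the scores seen so far
        p_values.append([(sum(1 for a in pool if a >= alpha) + 1) / (n + 1)
                         for alpha in row])
        # the correct label is revealed; its score extends the calibration pool
        pool.append(row[class_index[label]])
    return p_values


classes = ["cat", "dog"]
alphas_test = [[0.2, 0.8], [0.6, 0.3], [0.1, 0.9]]
y_test = ["cat", "dog", "cat"]
p_cold = predict_p_online(alphas_test, classes, y_test, warm_start=False)
```

With warm_start=False, the first test object has nothing to be compared against, so all of its p-values are 1.0.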
- predict_set(alphas, bins=None, confidence=0.95, smoothing=True, seed=None)[source]#
Obtain prediction sets using conformal classifier.
- Parameters:
alphas (array-like of shape (n_samples, n_classes)) – non-conformity scores
bins (array-like of shape (n_samples,), default=None) – Mondrian categories
confidence (float in range (0,1), default=0.95) – confidence level
smoothing (bool, default=True) – use smoothed p-values
seed (int, default=None) – set random seed
- Returns:
prediction sets – prediction sets
- Return type:
ndarray of shape (n_samples, n_classes)
Examples
Assuming that alphas_test is an array with non-conformity scores for a test set and cc_std is a fitted standard conformal classifier, then prediction sets at the default (95%) confidence level are obtained by:

    prediction_sets = cc_std.predict_set(alphas_test)
Assuming that bins_test is a vector with Mondrian categories (bin labels) for the test set and cc_mond is a fitted Mondrian conformal classifier, then the following provides prediction sets for the test set, at the 90% confidence level:

    prediction_sets = cc_mond.predict_set(alphas_test, bins=bins_test, confidence=0.9)
Note
The use of smoothed p-values increases computation time and typically has a minor effect on the prediction sets, except for small calibration sets.
Note
If a value for seed is given, it will take precedence over any seed value given in the call to fit.
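A prediction set contains every class whose p-value is large enough. The sketch below assumes the common rule of including a class when its p-value is at least 1 - confidence. It returns a 0/1 matrix with one column per class, which matches the documented return shape. The exact tie convention used by crepes is an assumption here, and predict_set is a hypothetical helper that takes p-values rather than non-conformity scores.

```python
def predict_set(p_values, confidence=0.95):
    """0/1 prediction sets: include a class if its p-value >= 1 - confidence."""
    threshold = 1 - confidence
    return [[int(p >= threshold) for p in row] for row in p_values]


p_values = [[0.62, 0.04, 0.31], [0.08, 0.12, 0.02]]
sets_95 = predict_set(p_values)                  # threshold 0.05
sets_90 = predict_set(p_values, confidence=0.9)  # threshold 0.1
```

Lowering the confidence level raises the threshold, so prediction sets can only shrink.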
- predict_set_online(alphas, classes, y, bins=None, confidence=0.95, smoothing=True, seed=None, warm_start=True)[source]#
Obtain prediction sets using conformal classifier, computed using online calibration.
- Parameters:
alphas (array-like of shape (n_samples, n_classes)) – non-conformity scores
classes (array-like of shape (n_classes,)) – class names
y (array-like of shape (n_samples,)) – correct class labels
bins (array-like of shape (n_samples,), default=None) – Mondrian categories
confidence (float in range (0,1), default=0.95) – confidence level
smoothing (bool, default=True) – use smoothed p-values
seed (int, default=None) – set random seed
warm_start (bool, default=True) – extend original calibration set
- Returns:
prediction sets – prediction sets
- Return type:
ndarray of shape (n_samples, n_classes)
Examples
Assuming that alphas_test is an array with non-conformity scores for a test set, classes is a vector with class names, y_test is a vector with the correct class labels for the test set, and cc_std is a fitted standard conformal classifier, then prediction sets at the default (95%) confidence level are obtained using online calibration by:

    prediction_sets = cc_std.predict_set_online(alphas_test, classes, y_test)
Assuming that bins_test is a vector with Mondrian categories (bin labels) for the test set and cc_mond is a fitted Mondrian conformal classifier, then the following provides prediction sets for the test set, at the 90% confidence level:

    prediction_sets = cc_mond.predict_set_online(alphas_test, classes, y_test, bins=bins_test, confidence=0.9)
Note
The use of smoothed p-values increases computation time and typically has a minor effect on the prediction sets, except for small calibration sets.
Note
If a value for seed is given, it will take precedence over any seed value given in the call to fit.
- evaluate(alphas, classes, y, bins=None, confidence=0.95, smoothing=True, metrics=None, seed=None, online=False, warm_start=True)[source]#
Evaluate conformal classifier.
- Parameters:
alphas (array-like of shape (n_samples, n_classes)) – non-conformity scores
classes (array-like of shape (n_classes,)) – class names
y (array-like of shape (n_samples,)) – correct class labels
bins (array-like of shape (n_samples,), default=None) – Mondrian categories
confidence (float in range (0,1), default=0.95) – confidence level
smoothing (bool, default=True) – use smoothed p-values
metrics (a string or a list of strings, default=list of all metrics) – metrics to compute, chosen from [“error”, “avg_c”, “one_c”, “empty”, “ks_test”, “time_fit”, “time_evaluate”]
seed (int, default=None) – set random seed
online (bool, default=False) – compute p-values using online calibration
warm_start (bool, default=True) – extend original calibration set; used only if online=True
- Returns:
results – estimated performance using the metrics, where “error” is the fraction of prediction sets not containing the true class label, “avg_c” is the average number of predicted class labels, “one_c” is the fraction of singleton prediction sets, “empty” is the fraction of empty prediction sets, “ks_test” is the p-value for the Kolmogorov-Smirnov test of uniformity of predicted p-values, “time_fit” is the time taken to fit the conformal classifier, and “time_evaluate” is the time taken for the evaluation
- Return type:
dictionary with a key for each selected metric
Examples
Assuming that alphas is an array containing non-conformity scores for all classes for the test objects, classes and y_test are vectors with the class names and true class labels for the test set, respectively, and cc is a fitted standard conformal classifier, then the latter can be evaluated at the default confidence level with respect to error and average number of labels in the prediction sets by:

    results = cc.evaluate(alphas, classes, y_test, metrics=["error", "avg_c"])
Note
The use of smoothed p-values increases computation time and typically has a minor effect on the results, except for small calibration sets.
Note
If a value for seed is given, it will take precedence over any seed value given when calling fit.
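The set-based metrics listed above can be computed directly from a 0/1 prediction-set matrix and the true labels. This sketch covers only "error", "avg_c", "one_c" and "empty". The Kolmogorov-Smirnov test and the timings are omitted. evaluate_sets is a hypothetical helper, not part of crepes.

```python
def evaluate_sets(prediction_sets, classes, y):
    """Compute error, avg_c, one_c and empty for 0/1 prediction sets."""
    class_index = {c: j for j, c in enumerate(classes)}
    n = len(y)
    sizes = [sum(row) for row in prediction_sets]
    # a miss is a prediction set that excludes the true label
    misses = sum(1 for row, label in zip(prediction_sets, y)
                 if row[class_index[label]] == 0)
    return {
        "error": misses / n,
        "avg_c": sum(sizes) / n,
        "one_c": sum(1 for s in sizes if s == 1) / n,
        "empty": sum(1 for s in sizes if s == 0) / n,
    }


classes = ["a", "b", "c"]
prediction_sets = [[1, 0, 0], [1, 1, 0], [0, 0, 0], [0, 1, 1]]
y_test = ["a", "c", "b", "b"]
results = evaluate_sets(prediction_sets, classes, y_test)
```

An empty prediction set always counts as an error, so "empty" is a lower bound on "error".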
- class crepes.ConformalRegressor[source]#
A conformal regressor transforms point predictions (regression values) into prediction intervals, for a certain confidence level.
Methods
evaluate(y_hat, y[, sigmas, bins, ...]) – Evaluate conformal regressor.
fit(residuals[, sigmas, bins]) – Fit conformal regressor.
predict_int(y_hat[, sigmas, bins, ...]) – Obtain prediction intervals from conformal regressor.
predict_int_online(y_hat, y[, sigmas, bins, ...]) – Obtain prediction intervals from conformal regressor, where the intervals are formed using online calibration.
predict_p(y_hat, y[, sigmas, bins, ...]) – Obtain (smoothed or non-smoothed) p-values from conformal regressor.
predict_p_online(y_hat, y[, t, sigmas, ...]) – Obtain (smoothed or non-smoothed) p-values from conformal regressor, computed using online calibration.
- fit(residuals, sigmas=None, bins=None)[source]#
Fit conformal regressor.
- Parameters:
residuals (array-like of shape (n_values,)) – true values - predicted values
sigmas (array-like of shape (n_values,), default=None) – difficulty estimates
bins (array-like of shape (n_values,), default=None) – Mondrian categories
- Returns:
self – Fitted ConformalRegressor.
- Return type:
object
Examples
Assuming that y_cal and y_hat_cal are vectors with true and predicted targets for some calibration set, then a standard conformal regressor can be formed from the residuals:

    from crepes import ConformalRegressor

    residuals_cal = y_cal - y_hat_cal

    cr_std = ConformalRegressor()
    cr_std.fit(residuals_cal)
Assuming that sigmas_cal is a vector with difficulty estimates, then a normalized conformal regressor can be fitted in the following way:

    cr_norm = ConformalRegressor()
    cr_norm.fit(residuals_cal, sigmas=sigmas_cal)
Assuming that bins_cal is a vector with Mondrian categories (bin labels), then a Mondrian conformal regressor can be fitted in the following way:

    cr_mond = ConformalRegressor()
    cr_mond.fit(residuals_cal, bins=bins_cal)
A normalized Mondrian conformal regressor can be fitted in the following way:

    cr_norm_mond = ConformalRegressor()
    cr_norm_mond.fit(residuals_cal, sigmas=sigmas_cal, bins=bins_cal)
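For the four variants above, fitting can be thought of as sorting the calibration scores |residual| / sigma, with one sorted pool per Mondrian category. The sketch below assumes that absolute residuals, optionally normalized, are the non-conformity scores. fit_regressor_scores is hypothetical and does not mirror crepes' internals.

```python
from collections import defaultdict


def fit_regressor_scores(residuals, sigmas=None, bins=None):
    """Hypothetical sketch: sorted |residual| / sigma per Mondrian category.

    sigma defaults to 1 (standard regressor); without bins, all scores
    go into a single pool stored under the key None.
    """
    n = len(residuals)
    sigmas = sigmas if sigmas is not None else [1.0] * n
    bins = bins if bins is not None else [None] * n
    groups = defaultdict(list)
    for r, s, b in zip(residuals, sigmas, bins):
        groups[b].append(abs(r) / s)
    return {b: sorted(scores) for b, scores in groups.items()}


y_cal = [3.0, 5.0, 2.0, 8.0]
y_hat_cal = [2.5, 6.0, 2.0, 6.0]
residuals_cal = [y - y_hat for y, y_hat in zip(y_cal, y_hat_cal)]
standard = fit_regressor_scores(residuals_cal)
normalized = fit_regressor_scores(residuals_cal, sigmas=[0.5, 2.0, 1.0, 4.0])
```

Dividing by sigma puts residuals from easy and hard objects on a common scale before they are compared.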
- predict_p(y_hat, y, sigmas=None, bins=None, smoothing=True, seed=None)[source]#
Obtain (smoothed or non-smoothed) p-values from conformal regressor.
- Parameters:
y_hat (array-like of shape (n_values,)) – predicted values
y (array-like of shape (n_values,)) – labels
sigmas (array-like of shape (n_values,), default=None) – difficulty estimates
bins (array-like of shape (n_values,), default=None) – Mondrian categories
smoothing (bool, default=True) – return smoothed p-values
seed (int, default=None) – set random seed
- Returns:
p-values – p-values
- Return type:
ndarray of shape (n_values,)
Examples
Assuming that y_hat and y_test are vectors with predicted and correct targets for a test set and cr_std is a fitted standard conformal regressor, then p-values are obtained by:

    p_values = cr_std.predict_p(y_hat, y_test)
Assuming that bins_test is a vector with Mondrian categories (bin labels) for the test set and cr_mond is a fitted Mondrian conformal regressor, then the following provides (smoothed) p-values:

    p_values = cr_mond.predict_p(y_hat, y_test, bins=bins_test)
Note
If a value for seed is given, it will take precedence over any seed value given in the call to fit.
- predict_p_online(y_hat, y, t=None, sigmas=None, bins=None, smoothing=True, seed=None, warm_start=True)[source]#
Obtain (smoothed or non-smoothed) p-values from conformal regressor, computed using online calibration.
- Parameters:
y_hat (array-like of shape (n_values,)) – predicted values
y (array-like of shape (n_values,)) – correct labels, used as targets if t=None
t (int, float or array-like of shape (n_samples,), default=None) – targets
sigmas (array-like of shape (n_values,), default=None) – difficulty estimates
bins (array-like of shape (n_samples,), default=None) – Mondrian categories
smoothing (bool, default=True) – return smoothed p-values
seed (int, default=None) – set random seed
warm_start (bool, default=True) – extend original calibration set
- Returns:
p-values – p-values
- Return type:
ndarray of shape (n_samples,)
Examples
Assuming that y_hat and y_test are vectors with predicted and correct targets for a test set and cr_std is a fitted standard conformal regressor, then p-values for the correct targets are obtained using online calibration by:

    p_values = cr_std.predict_p_online(y_hat, y_test)
Assuming that bins_test is a vector with Mondrian categories (bin labels) for the test set and cr_mond is a fitted Mondrian conformal regressor, then the following provides (smoothed) p-values, computed using online calibration:

    p_values = cr_mond.predict_p_online(y_hat, y_test, bins=bins_test)
Note
If a value for seed is given, it will take precedence over any seed value given in the call to fit.
- predict_int(y_hat, sigmas=None, bins=None, confidence=0.95, y_min=-inf, y_max=inf)[source]#
Obtain prediction intervals from conformal regressor.
- Parameters:
y_hat (array-like of shape (n_values,)) – predicted values
sigmas (array-like of shape (n_values,), default=None) – difficulty estimates
bins (array-like of shape (n_values,), default=None) – Mondrian categories
confidence (float in range (0,1), default=0.95) – confidence level
y_min (float or int, default=-numpy.inf) – minimum value to include in prediction intervals
y_max (float or int, default=numpy.inf) – maximum value to include in prediction intervals
- Returns:
intervals – prediction intervals
- Return type:
ndarray of shape (n_values, 2)
Examples
Assuming that y_hat_test is a vector with predicted targets for a test set and cr_std is a fitted standard conformal regressor, then prediction intervals at the 99% confidence level can be obtained by:

    intervals = cr_std.predict_int(y_hat_test, confidence=0.99)
Assuming that sigmas_test is a vector with difficulty estimates for the test set and cr_norm is a fitted normalized conformal regressor, then prediction intervals at the default (95%) confidence level can be obtained by:

    intervals = cr_norm.predict_int(y_hat_test, sigmas=sigmas_test)
Assuming that bins_test is a vector with Mondrian categories (bin labels) for the test set and cr_mond is a fitted Mondrian conformal regressor, then the following provides prediction intervals at the default confidence level, where the intervals are lower-bounded by 0:

    intervals = cr_mond.predict_int(y_hat_test, bins=bins_test, y_min=0)
Note
In case the specified confidence level is too high in relation to the size of the calibration set, a warning will be issued and the output intervals will be of maximum size.
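The warning described in the note follows from how the interval half-width is chosen. It is an order statistic of the calibration scores, and for a high confidence level with few calibration examples the required index exceeds the size of the calibration set. The sketch below uses the usual split-conformal convention: the half-width is the ceil(confidence * (n + 1))-th smallest score, scaled by sigma for normalized regressors. predict_int here is a hypothetical stand-alone function, not the crepes method.

```python
import math


def predict_int(y_hat, scores_cal, sigmas=None, confidence=0.95,
                y_min=-math.inf, y_max=math.inf):
    """Hypothetical sketch of standard/normalized conformal regression intervals.

    scores_cal must be sorted calibration scores |residual| / sigma.
    """
    n = len(scores_cal)
    k = math.ceil(confidence * (n + 1))  # 1-based index of the required score
    sigmas = sigmas if sigmas is not None else [1.0] * len(y_hat)
    intervals = []
    for yh, s in zip(y_hat, sigmas):
        if k > n:
            # calibration set too small for this confidence: maximal interval
            intervals.append((y_min, y_max))
            continue
        half_width = scores_cal[k - 1] * s
        intervals.append((max(yh - half_width, y_min), min(yh + half_width, y_max)))
    return intervals


scores_cal = sorted([0.3, 1.2, 0.5, 0.9, 0.1, 0.7, 1.5, 0.4, 0.8, 1.0])
intervals = predict_int([10.0, 0.5], scores_cal, confidence=0.8, y_min=0)
```

With 10 calibration scores, 80% confidence uses the 9th smallest score (1.2), while 95% would need the 11th score. That score does not exist, so the interval becomes maximal.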
- predict_int_online(y_hat, y, sigmas=None, bins=None, confidence=0.95, y_min=-inf, y_max=inf, warm_start=True)[source]#
Obtain prediction intervals from conformal regressor, where the intervals are formed using online calibration.
- Parameters:
y_hat (array-like of shape (n_values,)) – predicted values
y (array-like of shape (n_values,)) – correct labels
sigmas (array-like of shape (n_values,), default=None) – difficulty estimates
bins (array-like of shape (n_values,), default=None) – Mondrian categories
confidence (float in range (0,1), default=0.95) – confidence level
y_min (float or int, default=-numpy.inf) – minimum value to include in prediction intervals
y_max (float or int, default=numpy.inf) – maximum value to include in prediction intervals
warm_start (bool, default=True) – extend original calibration set
- Returns:
intervals – prediction intervals
- Return type:
ndarray of shape (n_values, 2)
Examples
Assuming that y_hat_test is a vector with predicted targets and y_test is a vector with correct targets for a test set, and cr_std is a fitted standard conformal regressor, then prediction intervals at the 99% confidence level can be obtained using online calibration by:

    intervals = cr_std.predict_int_online(y_hat_test, y_test, confidence=0.99)
Assuming that sigmas_test is a vector with difficulty estimates for the test set and cr_norm is a fitted normalized conformal regressor, then prediction intervals at the default (95%) confidence level can be obtained using online calibration by:

    intervals = cr_norm.predict_int_online(y_hat_test, y_test, sigmas=sigmas_test)
Assuming that bins_test is a vector with Mondrian categories (bin labels) for the test set and cr_mond is a fitted Mondrian conformal regressor, then the following provides prediction intervals at the default confidence level, where the intervals are lower-bounded by 0:

    intervals = cr_mond.predict_int_online(y_hat_test, y_test, bins=bins_test, y_min=0)
Note
In case the specified confidence level is too high in relation to the size of the calibration set, the output intervals will be of maximum size.
- evaluate(y_hat, y, sigmas=None, bins=None, confidence=0.95, y_min=-inf, y_max=inf, metrics=None, smoothing=True, seed=None, online=False, warm_start=True)[source]#
Evaluate conformal regressor.
- Parameters:
y_hat (array-like of shape (n_values,)) – predicted values
y (array-like of shape (n_values,)) – correct labels
sigmas (array-like of shape (n_values,), default=None) – difficulty estimates
bins (array-like of shape (n_values,), default=None) – Mondrian categories
confidence (float in range (0,1), default=0.95) – confidence level
y_min (float or int, default=-numpy.inf) – minimum value to include in prediction intervals
y_max (float or int, default=numpy.inf) – maximum value to include in prediction intervals
metrics (a string or a list of strings, default=list of all metrics) – metrics to compute, chosen from [“error”, “eff_mean”, “eff_med”, “ks_test”, “time_fit”, “time_evaluate”]
smoothing (bool, default=True) – employ smoothed p-values
seed (int, default=None) – set random seed
online (bool, default=False) – employ online calibration
warm_start (bool, default=True) – extend original calibration set; used only if online=True
- Returns:
results – estimated performance using the metrics, where “error” is the fraction of prediction intervals not containing the true label, “eff_mean” is the mean length of prediction intervals, “eff_med” is the median length of the prediction intervals, “ks_test” is the p-value for the Kolmogorov-Smirnov test of uniformity of predicted p-values, “time_fit” is the time taken to fit the conformal regressor, and “time_evaluate” is the time taken for the evaluation
- Return type:
dictionary with a key for each selected metric
Examples
Assuming that y_hat_test and y_test are vectors with predicted and true targets for a test set, sigmas_test and bins_test are vectors with difficulty estimates and Mondrian categories (bin labels) for the test set, and cr_norm_mond is a fitted normalized Mondrian conformal regressor, then the latter can be evaluated at the default confidence level with respect to error and mean efficiency (interval size) by:

    results = cr_norm_mond.evaluate(y_hat_test, y_test, sigmas=sigmas_test,
                                    bins=bins_test, metrics=["error", "eff_mean"])
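The interval-based metrics can be computed from the prediction intervals and the true targets alone. This sketch covers "error", "eff_mean" and "eff_med", and treats a target lying exactly on an interval endpoint as covered, which is an assumption. evaluate_intervals is a hypothetical helper, not part of crepes.

```python
import statistics


def evaluate_intervals(intervals, y):
    """Error rate plus mean and median interval length (hypothetical sketch)."""
    misses = sum(1 for (low, high), target in zip(intervals, y)
                 if not low <= target <= high)
    lengths = [high - low for low, high in intervals]
    return {
        "error": misses / len(y),
        "eff_mean": statistics.mean(lengths),
        "eff_med": statistics.median(lengths),
    }


intervals = [(1.0, 3.0), (2.0, 6.0), (0.0, 1.0), (4.0, 5.0)]
y_test = [2.5, 7.0, 0.5, 4.0]
results = evaluate_intervals(intervals, y_test)
```

The median length is less sensitive than the mean to a few very wide intervals, such as maximal intervals produced for small Mondrian categories.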
- class crepes.ConformalPredictiveSystem[source]#
A conformal predictive system transforms point predictions (regression values) into cumulative distribution functions (conformal predictive distributions).
Methods
evaluate(y_hat, y[, sigmas, bins, ...]) – Evaluate conformal predictive system.
fit(residuals[, sigmas, bins, seed]) – Fit conformal predictive system.
predict(y_hat[, sigmas, bins, y, ...]) – Predict using conformal predictive system.
predict_cpds(y_hat[, sigmas, bins, cpds_by_bins]) – Obtain conformal predictive distributions from conformal predictive system.
predict_cpds_online(y_hat, y[, sigmas, ...]) – Obtain conformal predictive distributions from conformal predictive system, computed using online calibration.
predict_int(y_hat[, sigmas, bins, ...]) – Obtain prediction intervals from conformal predictive system.
predict_int_online(y_hat, y[, sigmas, bins, ...]) – Obtain prediction intervals from conformal predictive system, where the intervals are formed using online calibration.
predict_p(y_hat, y[, sigmas, bins, ...]) – Obtain p-values from conformal predictive system.
predict_p_online(y_hat, y[, t, sigmas, ...]) – Obtain (smoothed or non-smoothed) p-values from conformal predictive system, computed using online calibration.
predict_percentiles(y_hat[, sigmas, bins, ...]) – Obtain percentiles with conformal predictive system.
predict_percentiles_online(y_hat, y[, ...]) – Obtain percentiles from conformal predictive system, computed using online calibration.
- fit(residuals, sigmas=None, bins=None, seed=None)[source]#
Fit conformal predictive system.
- Parameters:
residuals (array-like of shape (n_values,)) – actual values - predicted values
sigmas (array-like of shape (n_values,), default=None) – difficulty estimates
bins (array-like of shape (n_values,), default=None) – Mondrian categories
seed (int, default=None) – set random seed
- Returns:
self – Fitted ConformalPredictiveSystem.
- Return type:
object
Examples
Assuming that y_cal and y_hat_cal are vectors with true and predicted targets for some calibration set, then a standard conformal predictive system can be formed from the residuals:

    from crepes import ConformalPredictiveSystem

    residuals_cal = y_cal - y_hat_cal

    cps_std = ConformalPredictiveSystem()
    cps_std.fit(residuals_cal)
Assuming that sigmas_cal is a vector with difficulty estimates, then a normalized conformal predictive system can be fitted in the following way:

    cps_norm = ConformalPredictiveSystem()
    cps_norm.fit(residuals_cal, sigmas=sigmas_cal)
Assuming that bins_cal is a vector with Mondrian categories (bin labels), then a Mondrian conformal predictive system can be fitted in the following way:

    cps_mond = ConformalPredictiveSystem()
    cps_mond.fit(residuals_cal, bins=bins_cal)
A normalized Mondrian conformal predictive system can be fitted in the following way:

    cps_norm_mond = ConformalPredictiveSystem()
    cps_norm_mond.fit(residuals_cal, sigmas=sigmas_cal, bins=bins_cal)
Note
By providing a random seed, e.g., seed=123, calls to the methods predict and evaluate of the ConformalPredictiveSystem object will be deterministic.
- predict_p(y_hat, y, sigmas=None, bins=None, smoothing=True, seed=None)[source]#
Obtain p-values from conformal predictive system.
- Parameters:
y_hat (array-like of shape (n_values,)) – predicted values
y (int, float or array-like of shape (n_values,)) – labels
sigmas (array-like of shape (n_values,), default=None) – difficulty estimates
bins (array-like of shape (n_values,), default=None) – Mondrian categories
smoothing (bool, default=True) – return smoothed p-values
seed (int, default=None) – set random seed
- Returns:
p_values – p-values
- Return type:
ndarray of shape (n_values,)
Examples
Assuming that y_hat_test and y_test are vectors with predicted and true targets, respectively, for a test set and cps_std is a fitted standard conformal predictive system, then the p-values for the true targets can be obtained by:

    p_values = cps_std.predict_p(y_hat_test, y_test)
Note
If a value for seed is given, it will take precedence over any seed value given in the call to fit.
Note
If smoothing is disabled, i.e., smoothing=False, then setting a value for seed has no effect.
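For a conformal predictive system, the p-value of a target is the value of the conformal predictive distribution at that target. Unlike classification p-values, it therefore grows with the target. The sketch below uses the standard smoothed definition for a split system with signed residuals (true minus predicted), without normalization or Mondrian categories. cps_p_values and its tau argument are hypothetical; tau is included only to make the example reproducible.

```python
import random


def cps_p_values(y_hat, y, residuals_cal, seed=None, tau=None):
    """Smoothed CPD values Q(y) for a standard split CPS (hypothetical sketch).

    Q = (#{r_i < y - y_hat} + tau * (#{r_i == y - y_hat} + 1)) / (n + 1),
    with tau uniform on [0, 1) unless fixed explicitly.
    """
    rng = random.Random(seed)
    n = len(residuals_cal)
    out = []
    for yh, target in zip(y_hat, y):
        score = target - yh  # signed residual of the candidate target
        below = sum(1 for r in residuals_cal if r < score)
        equal = sum(1 for r in residuals_cal if r == score)
        t = rng.random() if tau is None else tau
        out.append((below + t * (equal + 1)) / (n + 1))
    return out


residuals_cal = [-2.0, -1.0, 0.0, 1.0, 2.0]
p_values = cps_p_values([10.0, 10.0, 10.0], [7.0, 10.0, 13.0], residuals_cal, tau=0.5)
```

A target well below the prediction receives a small value, and a target well above it receives a value close to 1.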
- predict_p_online(y_hat, y, t=None, sigmas=None, bins=None, smoothing=True, seed=None, warm_start=True)[source]#
Obtain (smoothed or non-smoothed) p-values from conformal predictive system, computed using online calibration.
- Parameters:
y_hat (array-like of shape (n_values,)) – predicted values
y (array-like of shape (n_values,)) – correct labels, used as targets if t=None
t (int, float or array-like of shape (n_samples,), default=None) – targets
sigmas (array-like of shape (n_values,), default=None) – difficulty estimates
bins (array-like of shape (n_samples,), default=None) – Mondrian categories
smoothing (bool, default=True) – return smoothed p-values
seed (int, default=None) – set random seed
warm_start (bool, default=True) – extend original calibration set
- Returns:
p-values – p-values
- Return type:
ndarray of shape (n_samples,)
Examples
Assuming that y_hat_test and y_test are vectors with predicted and true targets, respectively, for a test set and cps_std is a fitted standard conformal predictive system, then the p-values for the true targets, computed using online calibration, can be obtained by:

    p_values = cps_std.predict_p_online(y_hat_test, y_test)
Assuming that bins_test is a vector with Mondrian categories (bin labels) for the test set and cps_mond is a fitted Mondrian conformal predictive system, then the following provides (smoothed) p-values for the test set:

    p_values = cps_mond.predict_p_online(y_hat_test, y_test, bins=bins_test)
Note
If a value for seed is given, it will take precedence over any seed value given in the call to fit.
Note
If smoothing is disabled, i.e., smoothing=False, then setting a value for seed has no effect.
- predict_int(y_hat, sigmas=None, bins=None, confidence=0.95, y_min=-inf, y_max=inf)[source]#
Obtain prediction intervals from conformal predictive system.
- Parameters:
y_hat (array-like of shape (n_values,)) – predicted values
sigmas (array-like of shape (n_values,), default=None) – difficulty estimates
bins (array-like of shape (n_values,), default=None) – Mondrian categories
confidence (float in range (0,1), default=0.95) – confidence level
y_min (float or int, default=-numpy.inf) – The minimum value to include in prediction intervals.
y_max (float or int, default=numpy.inf) – The maximum value to include in prediction intervals.
- Returns:
intervals – prediction intervals
- Return type:
ndarray of shape (n_values, 2)
Examples
Assuming that y_hat_test is a vector with predicted targets for a test set and cps_std is a fitted standard conformal predictive system, then prediction intervals at the default (95%) confidence level can be obtained by:

    intervals = cps_std.predict_int(y_hat_test)
Note
In case the calibration set is too small for the specified confidence level, a warning will be issued and the lower and upper bounds of the output intervals will be y_min and y_max, respectively.
- predict_int_online(y_hat, y, sigmas=None, bins=None, confidence=0.95, y_min=-inf, y_max=inf, warm_start=True)[source]#
Obtain prediction intervals from conformal predictive system, where the intervals are formed using online calibration.
- Parameters:
y_hat (array-like of shape (n_values,)) – predicted values
y (array-like of shape (n_values,)) – correct labels
sigmas (array-like of shape (n_values,), default=None) – difficulty estimates
bins (array-like of shape (n_values,), default=None) – Mondrian categories
confidence (float in range (0,1), default=0.95) – confidence level
y_min (float or int, default=-numpy.inf) – minimum value to include in prediction intervals
y_max (float or int, default=numpy.inf) – maximum value to include in prediction intervals
warm_start (bool, default=True) – extend original calibration set
- Returns:
intervals – prediction intervals
- Return type:
ndarray of shape (n_values, 2)
Examples
Assuming that y_hat_test is a vector with predicted targets and y_test is a vector with the correct targets for a test set, and cps_std is a fitted standard conformal predictive system, then prediction intervals at the 99% confidence level can be obtained using online calibration by:

    intervals = cps_std.predict_int_online(y_hat_test, y_test, confidence=0.99)
Assuming that sigmas_test is a vector with difficulty estimates for the test set and cps_norm is a fitted normalized conformal predictive system, then prediction intervals at the default (95%) confidence level can be obtained using online calibration by:

    intervals = cps_norm.predict_int_online(y_hat_test, y_test, sigmas=sigmas_test)
Assuming that bins_test is a vector with Mondrian categories (bin labels) for the test set and cps_mond is a fitted Mondrian conformal predictive system, then the following provides prediction intervals at the default confidence level, where the intervals are lower-bounded by 0:

    intervals = cps_mond.predict_int_online(y_hat_test, y_test, bins=bins_test, y_min=0)
Note
In case the specified confidence level is too high in relation to the size of the calibration set, the output intervals will be of maximum size.
- predict_percentiles(y_hat, sigmas=None, bins=None, lower_percentiles=None, higher_percentiles=None, y_min=-inf, y_max=inf)[source]#
Obtain percentiles with conformal predictive system.
- Parameters:
y_hat (array-like of shape (n_values,)) – predicted values
sigmas (array-like of shape (n_values,), default=None) – difficulty estimates
bins (array-like of shape (n_values,), default=None) – Mondrian categories
lower_percentiles (float, int, or array-like of shape (l_values,), default=None) – percentiles for which a lower value will be output in case a percentile lies between two values (equivalent to method="lower" in numpy.percentile, a parameter named interpolation in NumPy versions before 1.22)
higher_percentiles (float, int, or array-like of shape (h_values,), default=None) – percentiles for which a higher value will be output in case a percentile lies between two values (equivalent to method="higher" in numpy.percentile, a parameter named interpolation in NumPy versions before 1.22)
y_min (float or int, default=-numpy.inf) – The minimum value to include
y_max (float or int, default=numpy.inf) – The maximum value to include
- Returns:
percentiles – percentiles
- Return type:
ndarray of shape (n_values, l_values + h_values)
Examples
Assuming that y_hat_test is a vector with predicted targets for a test set and cps_std is a fitted standard conformal predictive system, then the 2.5th and 97.5th percentiles can be obtained by:

    percentiles = cps_std.predict_percentiles(y_hat_test, lower_percentiles=2.5, higher_percentiles=97.5)
Note
In case the calibration set is too small for the specified percentiles, a warning will be issued and the affected lower and higher percentiles will be output as y_min and y_max, respectively.
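The note can be understood by looking at how percentiles are read off a conformal predictive distribution. With n calibration residuals, each test object gets n candidate values, and a requested percentile maps to an index that may fall outside 1..n. The sketch below uses one common index convention for a standard split system: lower percentiles round the index down and higher percentiles round it up. cpd_percentiles is hypothetical, and crepes' exact arithmetic may differ in edge cases.

```python
import math


def cpd_percentiles(y_hat, residuals_cal, lower_percentiles=(), higher_percentiles=(),
                    y_min=-math.inf, y_max=math.inf):
    """Percentiles of standard split CPDs (hypothetical sketch).

    Each CPD consists of the n sorted values y_hat + r_(1) <= ... <= y_hat + r_(n).
    Lower percentile p -> k-th value, k = floor(p * (n + 1) / 100), or y_min if k < 1.
    Higher percentile p -> k-th value, k = ceil(p * (n + 1) / 100), or y_max if k > n.
    """
    sorted_res = sorted(residuals_cal)
    n = len(sorted_res)
    results = []
    for yh in y_hat:
        cpd = [yh + r for r in sorted_res]
        row = []
        for p in lower_percentiles:
            k = min(math.floor(p * (n + 1) / 100), n)
            row.append(max(cpd[k - 1], y_min) if k >= 1 else y_min)
        for p in higher_percentiles:
            k = max(math.ceil(p * (n + 1) / 100), 1)
            row.append(min(cpd[k - 1], y_max) if k <= n else y_max)
        results.append(row)
    return results


residuals_cal = [i - 9 for i in range(19)]  # -9, -8, ..., 9, so n + 1 = 20
percentiles = cpd_percentiles([100.0], residuals_cal,
                              lower_percentiles=[2.5, 10], higher_percentiles=[90, 97.5])
print(percentiles)  # [[-inf, 92.0, 108.0, inf]]
```

With 19 residuals, the 2.5th and 97.5th percentiles fall outside the distribution, so they fall back to y_min and y_max.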
- predict_percentiles_online(y_hat, y, sigmas=None, bins=None, lower_percentiles=None, higher_percentiles=None, y_min=-inf, y_max=inf, warm_start=True)[source]#
Obtain percentiles from conformal predictive system, computed using online calibration.
- Parameters:
y_hat (array-like of shape (n_values,)) – predicted values
y (array-like of shape (n_values,)) – correct labels
sigmas (array-like of shape (n_values,), default=None) – difficulty estimates
bins (array-like of shape (n_values,), default=None) – Mondrian categories
lower_percentiles (float, int, or array-like of shape (l_values,), default=None) – percentiles for which a lower value will be output in case a percentile lies between two values (equivalent to method="lower" in numpy.percentile, a parameter named interpolation in NumPy versions before 1.22)
higher_percentiles (float, int, or array-like of shape (h_values,), default=None) – percentiles for which a higher value will be output in case a percentile lies between two values (equivalent to method="higher" in numpy.percentile, a parameter named interpolation in NumPy versions before 1.22)
y_min (float or int, default=-numpy.inf) – The minimum value to include
y_max (float or int, default=numpy.inf) – The maximum value to include
warm_start (bool, default=True) – extend original calibration set
- Returns:
percentiles – percentiles
- Return type:
ndarray of shape (n_values, l_values + h_values)
Examples
Assuming that y_hat_test and y_test are vectors with predicted and correct targets, respectively, for a test set and cps_std is a fitted standard conformal predictive system, then percentiles computed using online calibration can be obtained by:

    percentiles = cps_std.predict_percentiles_online(y_hat_test, y_test, lower_percentiles=2.5, higher_percentiles=97.5)
Note
In case the calibration set is too small for the specified percentiles, the affected lower and higher percentiles will be output as y_min and y_max, respectively.
- predict_cpds(y_hat, sigmas=None, bins=None, cpds_by_bins=False)[source]#
Obtain conformal predictive distributions from conformal predictive system.
- Parameters:
y_hat (array-like of shape (n_values,)) – predicted values
sigmas (array-like of shape (n_values,), default=None) – difficulty estimates
bins (array-like of shape (n_values,), default=None) – Mondrian categories
cpds_by_bins (bool, default=False) – specifies whether the output cpds should be grouped by bin
- Returns:
cpds – conformal predictive distributions. If bins is None, the distributions are represented by a single array, where the number of columns (c_values) is determined by the number of residuals of the fitted conformal predictive system. Otherwise, the distributions are represented by a vector of arrays if cpds_by_bins=False, or by a list of arrays, with one element for each bin, if cpds_by_bins=True.
- Return type:
ndarray of shape (n_values, c_values), ndarray of shape (n_values,), or list of ndarrays
Examples
Assuming that y_hat_test is a vector with predicted targets for a test set and cps_std is a fitted standard conformal predictive system, conformal predictive distributions (cpds) can be obtained by:
cpds = cps_std.predict_cpds(y_hat_test)
Note
The returned array may be very large as its size is the product of the number of calibration and test objects, unless a Mondrian approach is employed; for the latter, this number is reduced by increasing the number of bins.
Note
Setting cpds_by_bins=True has an effect only for Mondrian conformal predictive systems.
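The three output structures described above can be mimicked with plain Python lists. This sketch assumes that a standard CPD is the prediction plus each sorted calibration residual, that a normalized CPD scales the residuals by the test object's difficulty estimate, and that a Mondrian CPD uses only the residuals from the object's own bin. The function sketch_cpds and its cal_bins argument are hypothetical, and grouping by bin returns a dict here rather than the list of arrays crepes documents.

```python
def sketch_cpds(y_hat, residuals, sigmas=None, bins=None,
                cal_bins=None, cpds_by_bins=False):
    """Hypothetical sketch of the CPD output structures (not crepes' code).

    residuals are (normalized) calibration residuals; cal_bins holds the
    Mondrian category of each residual and is only used when bins is given.
    """
    if sigmas is None:
        sigmas = [1.0] * len(y_hat)
    if bins is None:
        # Rectangular: one row per test object, one column per residual.
        ordered = sorted(residuals)
        return [[yh + s * r for r in ordered] for yh, s in zip(y_hat, sigmas)]
    # Mondrian: residuals are split by category, so rows may differ in length.
    by_bin = {}
    for r, b in zip(residuals, cal_bins):
        by_bin.setdefault(b, []).append(r)
    rows = [[yh + s * r for r in sorted(by_bin.get(b, []))]
            for yh, s, b in zip(y_hat, sigmas, bins)]
    if not cpds_by_bins:
        return rows  # a "vector of arrays"
    # Grouping rows by category gives one rectangular block per bin.
    grouped = {}
    for row, b in zip(rows, bins):
        grouped.setdefault(b, []).append(row)
    return grouped
```

For example, with calibration residuals [1, -1, 0, 5] in bins ["a", "a", "b", "b"], test objects in bins "a" and "b" each receive two-value distributions built only from their own bin's residuals.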
- predict_cpds_online(y_hat, y, sigmas=None, bins=None, warm_start=True)[source]#
Obtain conformal predictive distributions from conformal predictive system, computed using online calibration.
- Parameters:
y_hat (array-like of shape (n_values,)) – predicted values
y (array-like of shape (n_values,)) – correct targets
sigmas (array-like of shape (n_values,), default=None) – difficulty estimates
bins (array-like of shape (n_values,), default=None) – Mondrian categories
warm_start (bool, default=True) – extend original calibration set
- Returns:
cpds – conformal predictive distributions
- Return type:
ndarray of shape (n_values,)
Examples
Assuming that y_hat_test and y_test are vectors with predicted and correct targets for a test set and cps_std is a fitted standard conformal predictive system, conformal predictive distributions can be obtained using online calibration by:
cpds = cps_std.predict_cpds_online(y_hat_test, y_test)
Note
The returned vector of vectors may be very large; the largest element may be of the same size as the concatenation of the calibration and test sets.
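A minimal sketch of online calibration for a standard CPS, assuming that each test object's distribution is built from all residuals seen so far and that its own residual is added only after its distribution has been formed. Starting from the calibration residuals when warm_start=True and from an empty set otherwise is an assumed reading of warm_start; the function online_cpds is hypothetical.

```python
import bisect


def online_cpds(y_hat, y, cal_residuals, warm_start=True):
    """Hypothetical sketch of online CPDs for a standard CPS."""
    # Keep the residuals sorted so every CPD comes out ordered.
    seen = sorted(cal_residuals) if warm_start else []
    cpds = []
    for yh, target in zip(y_hat, y):
        # The current object only sees residuals revealed before it.
        cpds.append([yh + r for r in seen])
        # Once the true target is known, its residual joins the set.
        bisect.insort(seen, target - yh)
    return cpds
```

With n calibration and m test objects and warm_start=True, the rows in this sketch grow from n to n + m - 1 values, which is why the largest element can approach the size of the calibration and test sets combined.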
- predict(y_hat, sigmas=None, bins=None, y=None, lower_percentiles=None, higher_percentiles=None, y_min=-inf, y_max=inf, return_cpds=False, cpds_by_bins=False, smoothing=True, seed=None)[source]#
Predict using conformal predictive system.
- Parameters:
y_hat (array-like of shape (n_values,)) – predicted values
sigmas (array-like of shape (n_values,), default=None) – difficulty estimates
bins (array-like of shape (n_values,), default=None) – Mondrian categories
y (float, int or array-like of shape (n_values,), default=None) – values for which p-values should be returned
lower_percentiles (array-like of shape (l_values,), default=None) – percentiles for which a lower value will be output in case a percentile lies between two values (similar to interpolation="lower" in numpy.percentile)
higher_percentiles (array-like of shape (h_values,), default=None) – percentiles for which a higher value will be output in case a percentile lies between two values (similar to interpolation="higher" in numpy.percentile)
y_min (float or int, default=-numpy.inf) – The minimum value to include in prediction intervals.
y_max (float or int, default=numpy.inf) – The maximum value to include in prediction intervals.
return_cpds (Boolean, default=False) – specifies whether conformal predictive distributions (cpds) should be output or not
cpds_by_bins (Boolean, default=False) – specifies whether the output cpds should be grouped by bin or not; only applicable when bins is not None and return_cpds = True
smoothing (bool, default=True) – return smoothed p-values
seed (int, default=None) – set random seed
- Returns:
results (ndarray of shape (n_values, n_cols) or (n_values,)) – the shape is (n_values, n_cols) if n_cols > 1 and (n_values,) otherwise, where n_cols = p_values + l_values + h_values, with p_values = 1 if y is not None and 0 otherwise, l_values the number of lower percentiles, and h_values the number of higher percentiles. Only returned if n_cols > 0.
cpds (ndarray of shape (n_values, c_values), ndarray of shape (n_values,), or list of ndarrays) – conformal predictive distributions. Only returned if return_cpds=True. If bins is None, the distributions are represented by a single array, where the number of columns (c_values) is determined by the number of residuals of the fitted conformal predictive system. Otherwise, the distributions are represented by a vector of arrays, if cpds_by_bins=False, or a list of arrays, with one element for each bin, if cpds_by_bins=True.
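The shape rule for results can be checked with a small helper that encodes only the arithmetic stated above; the function results_shape is hypothetical and not part of crepes.

```python
def results_shape(n_values, y=None, lower_percentiles=None,
                  higher_percentiles=None):
    """Hypothetical helper encoding the documented shape of results."""

    def n_given(p):
        # A scalar percentile adds one column; a list adds one per element.
        if p is None:
            return 0
        return len(p) if isinstance(p, (list, tuple)) else 1

    n_cols = ((1 if y is not None else 0)
              + n_given(lower_percentiles)
              + n_given(higher_percentiles))
    if n_cols == 0:
        return None  # results is not returned in this case
    return (n_values, n_cols) if n_cols > 1 else (n_values,)
```

For instance, requesting p-values together with two higher percentiles for 100 test objects gives three columns, whereas p-values alone give a flat vector.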
Examples
Assuming that y_hat_test and y_test are vectors with predicted and true targets, respectively, for a test set and cps_std is a fitted standard conformal predictive system, the p-values for the true targets can be obtained by:
p_values = cps_std.predict(y_hat_test, y=y_test)
The p-values with respect to some specific value, e.g., 37, can be obtained by:
p_values = cps_std.predict(y_hat_test, y=37)
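For a standard CPS, the p-value for a value y can be read as the smoothed conformal predictive distribution evaluated at y. The sketch below uses the usual split conformal predictive system formula Q(y, τ) = (#{r_i < y - ŷ} + τ(#{r_i = y - ŷ} + 1)) / (n + 1), with calibration residuals r_i = y_i - ŷ_i and a tie-breaking variable τ drawn uniformly from [0, 1). It is an illustration under these assumptions rather than crepes' exact code; the function smoothed_p_value is hypothetical and the non-smoothed variant is not covered.

```python
import random


def smoothed_p_value(y_hat, y, residuals, tau=None, seed=None):
    """Hypothetical sketch: smoothed CPD value Q(y, tau) for a standard CPS."""
    if tau is None:
        # Random tie-breaking; a fixed seed makes the value reproducible.
        tau = random.Random(seed).random()
    alpha = y - y_hat
    below = sum(r < alpha for r in residuals)
    ties = sum(r == alpha for r in residuals)
    return (below + tau * (ties + 1)) / (len(residuals) + 1)
```

With residuals [-1, 0, 1] and a value equal to the prediction, the sketch returns (1 + 2τ)/4, i.e. 0.5 for τ = 0.5; values far above or below the prediction approach 1 and 0.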
Assuming that sigmas_test is a vector with difficulty estimates for the test set and cps_norm is a fitted normalized conformal predictive system, the 90th and 95th percentiles can be obtained by:
percentiles = cps_norm.predict(y_hat_test, sigmas=sigmas_test, higher_percentiles=[90,95])
In the above example, the nearest higher value is returned if no value corresponds exactly to the requested percentile. To retrieve the nearest lower value instead, we would write:
percentiles = cps_norm.predict(y_hat_test, sigmas=sigmas_test, lower_percentiles=[90,95])
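The difference between the nearest lower and nearest higher value can be illustrated for a single test object of a normalized CPS. This sketch assumes the distribution is y_hat + sigma · r over the sorted normalized calibration residuals r, and that a percentile p corresponds to the rank p/100 · (n+1), rounded up for the higher value and down for the lower value; both the rank rule and the function normalized_percentile are assumptions for illustration.

```python
import math


def normalized_percentile(y_hat, sigma, norm_residuals, p, higher=True):
    """Hypothetical sketch: one percentile of a normalized CPD.

    Ranks outside 1..n are mapped to -inf or +inf.
    """
    cpd = sorted(y_hat + sigma * r for r in norm_residuals)
    n = len(cpd)
    rank = p / 100 * (n + 1)
    # Round the rank up for the nearest higher value, down for the lower one.
    k = math.ceil(rank) if higher else math.floor(rank)
    if k < 1:
        return -math.inf
    if k > n:
        return math.inf
    return cpd[k - 1]
```

With 19 normalized residuals 1, ..., 19 and sigma = 2, the 92nd percentile falls at rank 18.4, so the sketch returns 38 as the nearest higher value and 36 as the nearest lower value.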
Assuming that bins_test is a vector with Mondrian categories (bin labels) for the test set and cps_mond is a fitted Mondrian conformal predictive system, the following returns prediction intervals at the 95% confidence level, with the intervals lower-bounded by 0:
intervals = cps_mond.predict(y_hat_test, bins=bins_test, lower_percentiles=2.5, higher_percentiles=97.5, y_min=0)
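A Mondrian CPS calibrates separately within each category. The sketch below computes an interval for one test object from the residuals of its own bin only, using the split conformal ranks floor(ε/2 · (n+1)) and ceil((1 - ε/2) · (n+1)) with ε = 1 - confidence, and then clips the bounds to [y_min, y_max]. The rank rule, the clipping, and the function mondrian_interval are assumptions made for illustration.

```python
import math
from collections import defaultdict


def mondrian_interval(y_hat, test_bin, cal_residuals, cal_bins,
                      confidence=0.95, y_min=-math.inf, y_max=math.inf):
    """Hypothetical sketch: prediction interval from a Mondrian CPS."""
    per_bin = defaultdict(list)
    for r, b in zip(cal_residuals, cal_bins):
        per_bin[b].append(r)
    # Only residuals from the test object's own category are used.
    cpd = sorted(y_hat + r for r in per_bin[test_bin])
    n = len(cpd)
    eps = 1 - confidence
    k_low = math.floor(eps / 2 * (n + 1))
    k_high = math.ceil((1 - eps / 2) * (n + 1))
    low = cpd[k_low - 1] if k_low >= 1 else y_min
    high = cpd[k_high - 1] if k_high <= n else y_max
    # Clip to [y_min, y_max], e.g. y_min=0 for non-negative targets.
    return max(low, y_min), min(high, y_max)
```

A bin with 99 residuals gives a finite interval, while a bin with only three residuals falls back to (y_min, y_max), which illustrates the trade-off between the number of bins and the calibration set size per bin.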
To obtain the conformal predictive distributions, we could write the following:
cpds = cps_norm.predict(y_hat_test, sigmas=sigmas_test, return_cpds=True)
The output of the above will be an array with a row for each test instance and a column for each calibration instance (residual). For a Mondrian conformal predictive system, the output will instead be a vector in which each element is a vector, as the number of calibration instances may vary between categories. To obtain an array for each category instead, we could write:
cpds = cps_mond.predict(y_hat_test, bins=bins_test, return_cpds=True, cpds_by_bins=True)
Note
In case the calibration set is too small for the specified lower and higher percentiles, a warning will be issued and the output will be y_min and y_max, respectively.
Note
Setting return_cpds=True may consume a lot of memory, as a matrix is generated for which the number of elements is the product of the number of calibration and test objects, unless a Mondrian approach is employed; for the latter, this number is reduced by increasing the number of bins.
Note
Setting cpds_by_bins=True has an effect only for Mondrian conformal predictive systems.
Note
If a value for seed is given, it will take precedence over any seed value given in the call to fit.
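The memory remarks above can be made concrete with back-of-the-envelope arithmetic. This sketch assumes one 64-bit float (8 bytes) per distribution value and, for the Mondrian case, that calibration and test objects are spread evenly over the bins; the function cpd_memory_bytes is hypothetical.

```python
def cpd_memory_bytes(n_cal, n_test, n_bins=1, bytes_per_value=8):
    """Rough size of all CPD values, assuming evenly sized bins."""
    # Each test object gets one value per calibration object in its bin:
    # n_bins * (n_test / n_bins) * (n_cal / n_bins) = n_cal * n_test / n_bins.
    return n_cal * n_test * bytes_per_value // n_bins
```

Under these assumptions, 10,000 calibration and 100,000 test objects require about 8 GB without bins, and roughly a tenth of that with ten equally sized bins.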
- evaluate(y_hat, y, sigmas=None, bins=None, confidence=0.95, y_min=-inf, y_max=inf, metrics=None, smoothing=True, seed=None, online=False, warm_start=True)[source]#
Evaluate conformal predictive system.
- Parameters:
y_hat (array-like of shape (n_values,)) – predicted values
y (array-like of shape (n_values,)) – correct targets
sigmas (array-like of shape (n_values,), default=None) – difficulty estimates
bins (array-like of shape (n_values,), default=None) – Mondrian categories
confidence (float in range (0,1), default=0.95) – confidence level
y_min (float or int, default=-numpy.inf) – minimum value to include in prediction intervals
y_max (float or int, default=numpy.inf) – maximum value to include in prediction intervals
metrics (a string or a list of strings, default=list of all) – applicable metrics; ["error", "eff_mean", "eff_med", "CRPS", "ks_test", "time_fit", "time_evaluate"]
smoothing (bool, default=True) – employ smoothed p-values
seed (int, default=None) – set random seed
online (bool, default=False) – employ online calibration
warm_start (bool, default=True) – extend original calibration set; used only if online=True
- Returns:
results – estimated performance using the metrics, where "error" is the fraction of prediction intervals not containing the true target, "eff_mean" is the mean length of the prediction intervals, "eff_med" is the median length of the prediction intervals, "CRPS" is the continuous ranked probability score, "ks_test" is the p-value for the Kolmogorov-Smirnov test of uniformity of the predicted p-values, "time_fit" is the time taken to fit the conformal predictive system, and "time_evaluate" is the time taken for the evaluation
- Return type:
dictionary with a key for each selected metric
Examples
Assuming that y_hat_test and y_test are vectors with predicted and true targets for a test set, sigmas_test and bins_test are vectors with difficulty estimates and Mondrian categories (bin labels) for the test set, and cps_norm_mond is a fitted normalized Mondrian conformal predictive system, the latter can be evaluated at the default confidence level with respect to error, mean and median efficiency (interval size) and continuous ranked probability score (CRPS) by:
results = cps_norm_mond.evaluate(y_hat_test, y_test, sigmas=sigmas_test, bins=bins_test, metrics=["error", "eff_mean", "eff_med", "CRPS"])
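The interval-based metrics "error", "eff_mean" and "eff_med" follow directly from the definitions above once prediction intervals are available. This standard-library sketch assumes closed intervals [low, high] and is not crepes' implementation; the function interval_metrics is hypothetical.

```python
from statistics import mean, median


def interval_metrics(intervals, y):
    """Hypothetical sketch of the "error", "eff_mean" and "eff_med" metrics."""
    # An interval misses if the true target lies outside [low, high].
    misses = [not (low <= t <= high) for (low, high), t in zip(intervals, y)]
    sizes = [high - low for low, high in intervals]
    return {"error": sum(misses) / len(misses),
            "eff_mean": mean(sizes),
            "eff_med": median(sizes)}
```

For the intervals (0, 2), (1, 5), (3, 4) and (0, 10) with targets 1, 6, 3.5 and 5, one interval misses, so the sketch reports an error of 0.25, a mean size of 4.25 and a median size of 3.0.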
Note
The use of the metric CRPS may require a lot of memory, as a matrix is generated for which the number of elements is the product of the number of calibration and test objects, unless a Mondrian approach is employed; for the latter, this number is reduced by increasing the number of bins.
Note
The metric CRPS is only available for batch evaluation, i.e., when online=False.
Note
If a value for seed is given, it will take precedence over any seed value given in the call to fit.
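For intuition about the CRPS metric, the sketch below scores a single predictive distribution treated as an equally weighted empirical distribution over its values, using the identity CRPS(F, y) = E|X - y| - 0.5 · E|X - X'| for X and X' drawn independently from F. Treating a conformal predictive distribution this way is an approximation made here for illustration, and crepes may weight the steps differently; the function empirical_crps is hypothetical.

```python
def empirical_crps(cpd_values, y):
    """CRPS of an equally weighted empirical distribution at target y."""
    n = len(cpd_values)
    # E|X - y|: mean absolute distance between the values and the target.
    term1 = sum(abs(c - y) for c in cpd_values) / n
    # 0.5 * E|X - X'|: half the mean pairwise distance (O(n^2), for clarity).
    term2 = sum(abs(a - b) for a in cpd_values for b in cpd_values) / (2 * n * n)
    return term1 - term2
```

The quadratic pairwise term mirrors why CRPS is memory- and compute-intensive when every test object carries a distribution with one value per calibration object. For the two-point distribution {0, 2} and target 1, the sketch gives 0.5, which matches integrating the squared difference between the step function and the indicator of x ≥ 1.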