The crepes.extras module
- crepes.extras.hinge(X_prob, classes=None, y=None)
Computes non-conformity scores for conformal classifiers.
- Parameters:
X_prob (array-like of shape (n_samples, n_classes)) – predicted class probabilities
classes (array-like of shape (n_classes,), default=None) – class names
y (array-like of shape (n_samples,), default=None) – correct target values
- Returns:
scores – non-conformity scores. The shape is (n_samples, n_classes) if classes and y are None.
- Return type:
ndarray of shape (n_samples,) or (n_samples, n_classes)
Examples
Assuming that X_prob is an array with predicted probabilities and that classes and y are vectors with the class names (in order) and the correct class labels, respectively, the non-conformity scores are generated by:
    from crepes.extras import hinge
    alphas = hinge(X_prob, classes, y)
Here alphas is assigned a vector of the same length as X_prob with a non-conformity score for each object, defined as 1 minus the predicted probability for the correct class label. These scores can be used when fitting a ConformalClassifier or calibrating a WrapClassifier. Non-conformity scores for test objects, for which y is not known, can be obtained from the corresponding predicted probabilities (X_prob_test) by:
    alphas_test = hinge(X_prob_test)
Here alphas_test is assigned an array of the same shape as X_prob_test, with one column of non-conformity scores per class for each test object.
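To make the definition concrete, the following is a minimal NumPy sketch of the hinge score as described above (1 minus the predicted probability). The function name hinge_sketch and the toy data are assumptions made for illustration; this is not the crepes source code.

```python
import numpy as np

def hinge_sketch(X_prob, classes=None, y=None):
    """Illustrative re-implementation of the described score (hypothetical helper)."""
    X_prob = np.asarray(X_prob, dtype=float)
    if classes is None or y is None:
        # No labels given: one score per (object, class) pair.
        return 1.0 - X_prob
    # Map each class name to its column in X_prob.
    column_of = {name: i for i, name in enumerate(classes)}
    columns = np.array([column_of[label] for label in y])
    # Score of the correct class only, one value per object.
    return 1.0 - X_prob[np.arange(len(columns)), columns]

# Toy data (assumed for illustration).
X_prob = np.array([[0.7, 0.2, 0.1],
                   [0.1, 0.6, 0.3]])
classes = ["a", "b", "c"]
y = ["a", "c"]

alphas = hinge_sketch(X_prob, classes, y)   # vector of length 2
alphas_test = hinge_sketch(X_prob)          # array of shape (2, 3)
```

With labels, each object gets a single score; without them, every class is scored as if it were the correct one.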
- crepes.extras.margin(X_prob, classes=None, y=None)
Computes non-conformity scores for conformal classifiers.
- Parameters:
X_prob (array-like of shape (n_samples, n_classes)) – predicted class probabilities
classes (array-like of shape (n_classes,), default=None) – class names
y (array-like of shape (n_samples,), default=None) – correct target values
- Returns:
scores – non-conformity scores. The shape is (n_samples, n_classes) if classes and y are None.
- Return type:
ndarray of shape (n_samples,) or (n_samples, n_classes)
Examples
Assuming that X_prob is an array with predicted probabilities and that classes and y are vectors with the class names (in order) and the correct class labels, respectively, the non-conformity scores are generated by:
    from crepes.extras import margin
    alphas = margin(X_prob, classes, y)
Here alphas is assigned a vector of the same length as X_prob with a non-conformity score for each object, defined as the highest predicted probability for a non-correct class label minus the predicted probability for the correct class label. These scores can be used when fitting a ConformalClassifier or calibrating a WrapClassifier. Non-conformity scores for test objects, for which y is not known, can be obtained from the corresponding predicted probabilities (X_prob_test) by:
    alphas_test = margin(X_prob_test)
Here alphas_test is assigned an array of the same shape as X_prob_test, with one column of non-conformity scores per class for each test object.
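The margin score can be sketched in NumPy along the same lines. The helper margin_sketch and the toy data below are assumptions for illustration only; the sketch follows the definition in the text (highest probability among the other classes minus the probability of the candidate class) and is not the crepes source code.

```python
import numpy as np

def margin_sketch(X_prob, classes=None, y=None):
    """Illustrative re-implementation of the described score (hypothetical helper)."""
    X_prob = np.asarray(X_prob, dtype=float)
    n_samples, n_classes = X_prob.shape
    scores = np.empty((n_samples, n_classes))
    for c in range(n_classes):
        # Treat class c as the correct one and compare it with the best other class.
        others = np.delete(X_prob, c, axis=1)
        scores[:, c] = others.max(axis=1) - X_prob[:, c]
    if classes is None or y is None:
        return scores
    column_of = {name: i for i, name in enumerate(classes)}
    columns = np.array([column_of[label] for label in y])
    return scores[np.arange(n_samples), columns]

# Toy data (assumed for illustration).
X_prob = np.array([[0.7, 0.2, 0.1],
                   [0.1, 0.6, 0.3]])
classes = ["a", "b", "c"]
y = ["a", "c"]

alphas = margin_sketch(X_prob, classes, y)   # vector of length 2
alphas_test = margin_sketch(X_prob)          # array of shape (2, 3)
```

Unlike the hinge score, the margin score is negative whenever the candidate class has the highest predicted probability.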
- crepes.extras.binning(values, bins=10)
Provides bins for a set of values.
- Parameters:
values (array-like of shape (n_samples,)) – set of values
bins (int or array-like of shape (n_bins,), default=10) – number of bins to use for equal-sized binning or threshold values to use for binning
- Returns:
assigned_bins (array-like of shape (n_samples,)) – bins to which values have been assigned
boundaries (array-like of shape (bins+1,)) – threshold values for the bins; the first is always -np.inf and the last is np.inf. Returned only if bins is an int.
Examples
Assuming that sigmas is a vector with difficulty estimates, Mondrian categories (bins) can be formed by finding thresholds for 20 equal-sized bins by:
    from crepes.extras import binning
    bins, bin_thresholds = binning(sigmas, bins=20)
Here bins is assigned a vector of the same length as sigmas with label names (integers from 0 to 19), while bin_thresholds defines the boundaries for the bins. The latter can be used to assign bin labels to another vector, e.g., sigmas_test, by providing the thresholds as input to binning():
    test_bins = binning(sigmas_test, bins=bin_thresholds)
Here the output is just a vector test_bins with label names, of the same length as sigmas_test.
Note
A very small random number is added to each value when forming bins for the purpose of tie-breaking.
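The two calling modes can be illustrated with a small NumPy sketch. It assumes that equal-sized bins correspond to empirical quantiles and it omits the random tie-breaking mentioned in the note; binning_sketch and the synthetic data are hypothetical and are not the crepes implementation.

```python
import numpy as np

def binning_sketch(values, bins=10):
    """Illustrative quantile binning (hypothetical helper, no tie-breaking noise)."""
    values = np.asarray(values, dtype=float)
    if isinstance(bins, (int, np.integer)):
        # Interior thresholds at the empirical quantiles; outer ones are infinite.
        inner = np.quantile(values, np.linspace(0, 1, bins + 1)[1:-1])
        boundaries = np.concatenate(([-np.inf], inner, [np.inf]))
        assigned = np.searchsorted(boundaries, values, side="left") - 1
        return assigned, boundaries
    # Thresholds were given: only the bin labels are returned.
    boundaries = np.asarray(bins, dtype=float)
    return np.searchsorted(boundaries, values, side="left") - 1

rng = np.random.default_rng(0)
sigmas = rng.gamma(2.0, size=1000)        # synthetic difficulty estimates
sigmas_test = rng.gamma(2.0, size=200)

bins, bin_thresholds = binning_sketch(sigmas, bins=20)
test_bins = binning_sketch(sigmas_test, bins=bin_thresholds)
```

Because the outer thresholds are infinite, test values below or above the range seen during fitting still fall into the first or last bin.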
- class crepes.extras.DifficultyEstimator
A difficulty estimator outputs scores for objects to be used by normalized conformal regressors and predictive systems.
Methods
apply([X]) – Apply difficulty estimator.
fit([X, f, y, residuals, learner, k, ...]) – Fit difficulty estimator.
- fit(X=None, f=None, y=None, residuals=None, learner=None, k=25, scaler=False, beta=0.01, oob=False)
Fit difficulty estimator.
- Parameters:
X (array-like of shape (n_samples, n_features), default=None) – set of objects
f (function, default=None) – function used to compute difficulty estimates; given an array-like of shape (n_samples, n_features), it should return a vector of shape (n_samples,) of type int or float
y (array-like of shape (n_samples,), default=None) – target values
residuals (array-like of shape (n_samples,), default=None) – true target values - predicted values
learner (an object with attribute learner.estimators_, default=None) – an ensemble model where each model m in learner.estimators_ has a method m.predict (used only if f=None)
k (int, default=25) – number of neighbors (used only if f=None and learner=None)
scaler (bool, default=False) – use min-max-scaler on the difficulty estimates
beta (int or float, default=0.01) – value to add to the difficulty estimates (after scaling)
oob (bool, default=False) – use out-of-bag estimation
- Returns:
self – Fitted DifficultyEstimator.
- Return type:
object
Examples
Assuming that X_prop_train is a proper training set, a difficulty estimator using the distances to the k nearest neighbors can be formed in the following way (here using the default k=25):
    from crepes.extras import DifficultyEstimator
    de_knn_dist = DifficultyEstimator()
    de_knn_dist.fit(X_prop_train)
Assuming that y_prop_train is a vector with target values for the proper training set, a difficulty estimator using the standard deviation of the targets of the k nearest neighbors is formed by:
    de_knn_std = DifficultyEstimator()
    de_knn_std.fit(X_prop_train, y=y_prop_train)
Assuming that X_prop_res is a vector with residuals for the proper training set, a difficulty estimator using the mean of the absolute residuals of the k nearest neighbors is formed by:
    de_knn_res = DifficultyEstimator()
    de_knn_res.fit(X_prop_train, residuals=X_prop_res)
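The three k-nearest-neighbor variants above can be sketched with brute-force NumPy. The helper name, the synthetic data, and the choice to sum the neighbor distances are assumptions for illustration (the text does not state how the distances are aggregated); this is not the crepes implementation.

```python
import numpy as np

def knn_indexes_and_distances(X_fit, X, k):
    """Brute-force k nearest neighbors of each row of X among the rows of X_fit."""
    diffs = X[:, None, :] - X_fit[None, :, :]
    dist = np.sqrt((diffs ** 2).sum(axis=2))      # shape (n_query, n_fit)
    idx = np.argsort(dist, axis=1)[:, :k]         # k closest per query object
    return idx, np.take_along_axis(dist, idx, axis=1)

rng = np.random.default_rng(0)
X_prop_train = rng.normal(size=(200, 3))          # synthetic proper training set
y_prop_train = X_prop_train[:, 0] + rng.normal(scale=0.1, size=200)
residuals = rng.normal(scale=0.5, size=200)       # stand-in for y - predictions
X_test = rng.normal(size=(5, 3))
k, beta = 25, 0.01

idx, dist = knn_indexes_and_distances(X_prop_train, X_test, k)
sigmas_dist = dist.sum(axis=1) + beta                     # distances to neighbors
sigmas_std = y_prop_train[idx].std(axis=1) + beta         # spread of neighbor targets
sigmas_res = np.abs(residuals[idx]).mean(axis=1) + beta   # mean absolute residual
```

All three estimates share the same neighbor lookup and differ only in what is aggregated over the neighbors.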
Assuming that learner_prop is a trained model for which learner_prop.estimators_ is a collection of base models, each implementing the predict method (this holds, e.g., for RandomForestRegressor), a difficulty estimator using the variance of the predictions of the constituent models is formed by:
    de_var = DifficultyEstimator()
    de_var.fit(learner=learner_prop)
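The variance-based estimate can be sketched without any particular learner, as long as the ensemble exposes estimators_ with a predict method per base model. The classes SlopeModel and ToyEnsemble and the function variance_difficulty are hypothetical stand-ins written for this illustration.

```python
import numpy as np

class SlopeModel:
    """Hypothetical base model: predicts slope times the first feature."""
    def __init__(self, slope):
        self.slope = slope

    def predict(self, X):
        return self.slope * np.asarray(X, dtype=float)[:, 0]

class ToyEnsemble:
    """Hypothetical ensemble exposing its base models as estimators_."""
    def __init__(self, estimators):
        self.estimators_ = estimators

def variance_difficulty(learner, X, beta=0.01):
    # One row of predictions per base model, one column per object.
    predictions = np.array([m.predict(X) for m in learner.estimators_])
    return predictions.var(axis=0) + beta

learner_prop = ToyEnsemble([SlopeModel(0.5), SlopeModel(1.0), SlopeModel(1.5)])
X = np.array([[0.0], [2.0]])
sigmas = variance_difficulty(learner_prop, X)
```

The base models agree on the first object and disagree on the second, so the second object receives the larger difficulty estimate.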
The difficulty estimates may be normalized (using min-max scaling) by setting scaler=True. This comes with a computational cost: for estimators based on the k nearest neighbors, a leave-one-out protocol is employed to find the minimum and maximum distances that are used by the scaler. For the variance-based approach, a set of objects must also be provided (to allow for finding the minimum and maximum values). Hence, if normalization is to be employed for the latter, objects have to be included:
    de_var = DifficultyEstimator()
    de_var.fit(X_proper_train, learner=learner_prop, scaler=True)
Difficulty estimates may also be computed by an externally defined function. Assuming that diff_model is a fitted regression model, for which the predict method gives estimates of the absolute error for the objects in X_proper_train, normalized difficulty estimates can be obtained from the following difficulty estimator:
    de_mod = DifficultyEstimator()
    de_mod.fit(X_proper_train, f=diff_model.predict, scaler=True)
The DifficultyEstimator can also support the construction of conformal regressors and predictive systems that employ out-of-bag calibration. For the k-nearest neighbor approaches, the difficulty of each object in the provided training set is computed using a leave-one-out procedure, while for the variance-based approach the out-of-bag predictions are employed. This is enabled by setting oob=True when calling the fit() method, which also requires that the (full) training set (X_train) is provided and, for the variance-based approach, a corresponding trained model (learner_full):
    de_var_oob = DifficultyEstimator()
    de_var_oob.fit(X_train, learner=learner_full, scaler=True, oob=True)
A small value (beta) is added to the difficulty estimates. The default is beta=0.01. To make the beta value have the same effect across different estimators, you may consider normalizing the difficulty estimates (using min-max scaling) by setting scaler=True. Note that beta is added after the normalization, which means that the range of scores after normalization will be [0+beta, 1+beta]. Below, we use beta=0.001 together with 10 neighbors (k=10):
    de_knn_mod = DifficultyEstimator()
    de_knn_mod.fit(X_prop_train, k=10, beta=0.001, scaler=True)
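The interplay between scaling and beta can be checked with a few numbers. The function scale_then_add_beta below is a hypothetical helper that applies plain min-max scaling and then adds beta, as the text describes; the raw estimates are made up for illustration.

```python
import numpy as np

def scale_then_add_beta(raw, fit_min, fit_max, beta=0.01):
    """Min-max scale with the extremes found at fit time, then add beta."""
    scaled = (np.asarray(raw, dtype=float) - fit_min) / (fit_max - fit_min)
    return scaled + beta

raw_estimates = np.array([2.0, 4.0, 6.0, 10.0])   # made-up raw difficulty estimates
sigmas = scale_then_add_beta(raw_estimates,
                             fit_min=raw_estimates.min(),
                             fit_max=raw_estimates.max(),
                             beta=0.001)
```

For the objects used to find the minimum and maximum, the result lies in [beta, 1+beta]; under this sketch, objects whose raw estimate falls outside that range would fall outside the interval.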
Note
The use of out-of-bag calibration, as enabled by oob=True, does not come with the theoretical validity guarantees of the regular (inductive) conformal regressors and predictive systems, because calibration and test instances are not handled in exactly the same way.
- apply(X=None)
Apply difficulty estimator.
- Parameters:
X (array-like of shape (n_samples, n_features), default=None) – set of objects
- Returns:
sigmas – difficulty estimates
- Return type:
array-like of shape (n_samples,)
Examples
Assuming de to be a fitted DifficultyEstimator, i.e., one for which fit() has earlier been called, difficulty estimates for a set of objects X are obtained by:
    difficulty_estimates = de.apply(X)
If de_oob is a DifficultyEstimator that has been fitted with the option oob=True and a training set, then a call to apply() without any objects will return the estimates for the training set:
    oob_difficulty_estimates = de_oob.apply()
For a difficulty estimator employing any of the k-nearest neighbor approaches, the above will return an estimate for the difficulty of each object in the training set computed using a leave-one-out procedure, while for the variance-based approach the out-of-bag predictions will instead be used.
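The leave-one-out procedure for the k-nearest-neighbor case can be sketched as follows. The function loo_knn_distance_difficulty is hypothetical, and summing the neighbor distances is an assumption made for illustration; the point is only that an object is never counted as its own neighbor.

```python
import numpy as np

def loo_knn_distance_difficulty(X_train, k=25, beta=0.01):
    """Distance-based difficulty of each training object, excluding itself."""
    X_train = np.asarray(X_train, dtype=float)
    diffs = X_train[:, None, :] - X_train[None, :, :]
    dist = np.sqrt((diffs ** 2).sum(axis=2))
    np.fill_diagonal(dist, np.inf)            # leave the object itself out
    nearest = np.sort(dist, axis=1)[:, :k]    # k closest other objects
    return nearest.sum(axis=1) + beta

X_train = np.array([[0.0], [1.0], [3.0]])
oob_difficulty_estimates = loo_knn_distance_difficulty(X_train, k=1, beta=0.0)
```

Without the exclusion step, every training object would have itself as nearest neighbor at distance zero, which would make the training-set estimates systematically smaller than those for unseen objects.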
- class crepes.extras.MondrianCategorizer
A MondrianCategorizer outputs categories for objects to be used by Mondrian conformal classifiers, regressors and predictive systems.
Methods
apply(X) – Apply Mondrian categorizer.
fit([X, f, de, learner, no_bins, oob]) – Fit Mondrian categorizer.
- fit(X=None, f=None, de=None, learner=None, no_bins=10, oob=False)
Fit Mondrian categorizer.
- Parameters:
X (array-like of shape (n_samples, n_features), default=None) – set of objects
f (function, default=None) – function used to compute Mondrian categories; given an array-like of shape (n_samples, n_features), it should return a vector of shape (n_samples,) of type int or float
de (a DifficultyEstimator, default=None) – a fitted difficulty estimator (used only if f is None)
learner (an object with the method learner.predict, default=None) – a fitted regression model (used only if de and f are None)
no_bins (int, default=10) – number of Mondrian categories
oob (bool, default=False) – use out-of-bag estimation (not used if f is not None)
- Returns:
self – Fitted MondrianCategorizer.
- Return type:
object
Examples
Assuming that X_train is an array of shape (n_samples, n_features) and get_values is a function that given X_train returns a vector of values of shape (n_samples,), a Mondrian categorizer can be formed in the following way, where the boundaries for the Mondrian categories are found by partitioning the values in the vector into five equal-sized bins:
    from crepes.extras import MondrianCategorizer
    mc = MondrianCategorizer()
    mc.fit(X_train, f=get_values, no_bins=5)
- apply(X)
Apply Mondrian categorizer.
- Parameters:
X (array-like of shape (n_samples, n_features)) – set of objects
- Returns:
bins – Mondrian categories
- Return type:
array-like of shape (n_samples,)
Examples
Assuming mc to be a fitted MondrianCategorizer, i.e., one for which fit() has earlier been called, Mondrian categories for a set of objects X are obtained by:
    categories = mc.apply(X)
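The fit-then-apply pattern for a function-based categorizer can be sketched in a few lines of NumPy. The class MondrianCategorizerSketch, the get_values function, and the synthetic data are assumptions for illustration; the sketch covers only the case where f is given and assumes that equal-sized bins correspond to empirical quantiles. It is not the crepes implementation.

```python
import numpy as np

class MondrianCategorizerSketch:
    """Illustrative categorizer: quantile thresholds on the values returned by f."""

    def fit(self, X, f, no_bins=10):
        values = np.asarray(f(X), dtype=float)
        inner = np.quantile(values, np.linspace(0, 1, no_bins + 1)[1:-1])
        self.f = f
        self.boundaries = np.concatenate(([-np.inf], inner, [np.inf]))
        return self

    def apply(self, X):
        values = np.asarray(self.f(X), dtype=float)
        # Index of the interval (boundary[i], boundary[i+1]] containing each value.
        return np.searchsorted(self.boundaries, values, side="left") - 1

def get_values(X):
    # Hypothetical value function: the first feature of each object.
    return X[:, 0]

rng = np.random.default_rng(0)
X_train = rng.uniform(size=(100, 2))
X_new = rng.uniform(size=(10, 2))

mc = MondrianCategorizerSketch().fit(X_train, f=get_values, no_bins=5)
train_categories = mc.apply(X_train)
categories = mc.apply(X_new)
```

The thresholds are fixed at fit time, so new objects are assigned to categories using the boundaries found on the training values.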