crepes.extras

crepes.extras.hinge(X_prob, classes=None, y=None)

Computes non-conformity scores for conformal classifiers.

Parameters:

X_prob (array-like of shape (n_samples, n_classes)) – predicted class probabilities
classes (array-like of shape (n_classes,), default=None) – class names
y (array-like of shape (n_samples,), default=None) – correct target values

Returns:

scores – non-conformity scores. The shape is (n_samples, n_classes) if classes and y are None.

Return type:

ndarray of shape (n_samples,) or (n_samples, n_classes)

Examples

Assuming that X_prob is an array with predicted probabilities, and that classes and y are vectors with the class names (in order) and the correct class labels, respectively, the non-conformity scores are generated by:

from crepes.extras import hinge

alphas = hinge(X_prob, classes, y)

Here, alphas is assigned a vector of the same length as X_prob, holding one non-conformity score per object, defined as 1 minus the predicted probability for the correct class label. These scores can be used when fitting a ConformalClassifier or calibrating a WrapClassifier. Non-conformity scores for test objects, for which y is not known, can be obtained from the corresponding predicted probabilities (X_prob_test) by:

alphas_test = hinge(X_prob_test)

Here, alphas_test is assigned an array of the same shape as X_prob_test, where each row corresponds to a test object and each column holds the non-conformity score for one class.
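As an illustration of the definition above, the following plain-Python sketch recomputes hinge scores by hand. It is not the library's implementation: the helper name hinge_sketch is hypothetical, and the per-class scores for test objects assume that each class is treated in turn as if it were the correct one.

```python
def hinge_sketch(X_prob, classes=None, y=None):
    """Illustrative re-computation of hinge scores (hypothetical helper)."""
    if classes is None or y is None:
        # No labels: one score per class, i.e. 1 - p for every column.
        return [[1.0 - p for p in row] for row in X_prob]
    # Map each class name to its column index in X_prob.
    column = {c: i for i, c in enumerate(classes)}
    # With labels: 1 minus the probability predicted for the correct class.
    return [1.0 - row[column[label]] for row, label in zip(X_prob, y)]


X_prob = [[0.7, 0.2, 0.1],
          [0.1, 0.6, 0.3]]
classes = ["a", "b", "c"]
y = ["a", "c"]

alphas = hinge_sketch(X_prob, classes, y)  # one score per object
alphas_test = hinge_sketch(X_prob)         # one score per object and class
```

A confidently correct prediction gives a score close to 0, while a low probability for the correct class gives a score close to 1.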

crepes.extras.margin(X_prob, classes=None, y=None)

Computes non-conformity scores for conformal classifiers.

Parameters:

X_prob (array-like of shape (n_samples, n_classes)) – predicted class probabilities
classes (array-like of shape (n_classes,), default=None) – class names
y (array-like of shape (n_samples,), default=None) – correct target values

Returns:

scores – non-conformity scores. The shape is (n_samples, n_classes) if classes and y are None.

Return type:

ndarray of shape (n_samples,) or (n_samples, n_classes)

Examples

from crepes.extras import margin

alphas = margin(X_prob, classes, y)

Here, alphas is assigned a vector of the same length as X_prob, holding one non-conformity score per object, defined as the highest predicted probability for any incorrect class label minus the predicted probability for the correct class label. These scores can be used when fitting a ConformalClassifier or calibrating a WrapClassifier. Non-conformity scores for test objects, for which y is not known, can be obtained from the corresponding predicted probabilities (X_prob_test) by:

alphas_test = margin(X_prob_test)

Here, alphas_test is assigned an array of the same shape as X_prob_test, where each row corresponds to a test object and each column holds the non-conformity score for one class.
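The margin definition can be illustrated with a small plain-Python sketch. The helper margin_sketch is hypothetical and only mirrors the description above; for test objects it assumes that each class is treated in turn as the correct one, and it requires at least two classes.

```python
def margin_sketch(X_prob, classes=None, y=None):
    """Illustrative re-computation of margin scores (hypothetical helper)."""

    def score(row, correct):
        # Highest probability among the other classes minus that of `correct`.
        best_other = max(p for i, p in enumerate(row) if i != correct)
        return best_other - row[correct]

    if classes is None or y is None:
        # No labels: score every class as if it were the correct one.
        return [[score(row, i) for i in range(len(row))] for row in X_prob]
    # Map each class name to its column index in X_prob.
    column = {c: i for i, c in enumerate(classes)}
    return [score(row, column[label]) for row, label in zip(X_prob, y)]


X_prob = [[0.7, 0.2, 0.1],
          [0.1, 0.6, 0.3]]
classes = ["a", "b", "c"]
y = ["a", "c"]

alphas = margin_sketch(X_prob, classes, y)  # one score per object
alphas_test = margin_sketch(X_prob)         # one score per object and class
```

Unlike hinge scores, margin scores lie in the range [-1, 1]: they are negative when the correct class has the highest predicted probability and positive otherwise.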

crepes.extras.binning(values, bins=10, min_size=None, epsilon=1e-09, seed=None)

Provides bins for a set of values.

Parameters:

values (array-like of shape (n_samples,)) – set of values
bins (int or array-like of shape (n_bins,), default=10) – number of bins to use for equal-sized binning or threshold values to use for binning, used only if min_size=None
min_size (int, default=None) – equal-sized binning with the largest number of bins such that the minimum number of values in each bin is at least min_size
epsilon (float, default=1e-9) – factor by which values sampled uniformly at random from (0,1) are multiplied
seed (int, default=None) – set random seed

Returns:

assigned_bins (array-like of shape (n_samples,)) – bins to which values have been assigned
boundaries (array-like of shape (bins+1,)) – threshold values for the bins; the first is always -np.inf and the last is np.inf. Returned only if bins is an int.

Examples

Assuming that sigmas is a vector with difficulty estimates, then Mondrian categories (bins) can be formed by finding thresholds for 20 equal-sized bins by:

from crepes.extras import binning

bins, bin_thresholds = binning(sigmas, bins=20)

Here, bins is assigned a vector of the same length as sigmas with bin labels (integers from 0 to 19), while bin_thresholds defines the boundaries of the bins. The thresholds can be used to assign bin labels to another vector, e.g., sigmas_test, by providing them as input to binning():

test_bins = binning(sigmas_test, bins=bin_thresholds)

Here the output is just a vector test_bins with label names of the same length as sigmas_test.

Note

Any specified value for min_size will take precedence over any value for bins, including the default.

Note

A vector of uniformly sampled random values is multiplied with epsilon and added to the values before forming bins for the purpose of tie-breaking. To override this, set epsilon to 0.

Note

When forming bins, a warning will be issued if the provided number of bins is larger than the number of values (or the number of unique values if epsilon=0).
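The two modes of binning(), finding thresholds from one vector and reusing them on another, can be sketched in plain Python as follows. This is an illustration under simplifying assumptions, not the library's code: the helper names are hypothetical, the cut points are taken directly from the sorted values, and the random tie-breaking controlled by epsilon is omitted.

```python
import bisect
import math


def equal_sized_thresholds(values, n_bins):
    """Thresholds for n_bins equal-sized bins (hypothetical helper)."""
    ordered = sorted(values)
    n = len(ordered)
    # Interior cut points: the first value of each subsequent chunk.
    cuts = [ordered[(i * n) // n_bins] for i in range(1, n_bins)]
    # The outer boundaries are open-ended, as in the description above.
    return [-math.inf] + cuts + [math.inf]


def assign_bins(values, thresholds):
    """Label each value with the index of the bin it falls into."""
    last = len(thresholds) - 2
    # Bin i covers the half-open interval [thresholds[i], thresholds[i + 1]).
    return [min(bisect.bisect_right(thresholds, v) - 1, last) for v in values]


sigmas = [0.5, 0.1, 0.9, 0.3, 0.7, 0.2, 0.8, 0.4]
bin_thresholds = equal_sized_thresholds(sigmas, 4)
bins = assign_bins(sigmas, bin_thresholds)

# Reuse the thresholds to label values that were not seen when binning.
sigmas_test = [0.05, 0.95, 0.45]
test_bins = assign_bins(sigmas_test, bin_thresholds)
```

Because the first and last thresholds are infinite, every test value receives a label, including values outside the range seen when the thresholds were formed.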

class crepes.extras.DifficultyEstimator

A difficulty estimator outputs scores for objects to be used by normalized conformal regressors and predictive systems.

Methods

`apply`([X])	Apply difficulty estimator.
`fit`([X, f, y, residuals, learner, k, ...])	Fit difficulty estimator.

fit(X=None, f=None, y=None, residuals=None, learner=None, k=25, scaler=False, beta=0.01, oob=False)

Fit difficulty estimator.

Parameters:

X (array-like of shape (n_samples, n_features), default=None) – set of objects
f (function which, given an array-like of shape (n_samples, n_features), returns a vector of shape (n_samples,) of type int or float, default=None) – function used to compute difficulty estimates
y (array-like of shape (n_samples,), default=None) – target values
residuals (array-like of shape (n_samples,), default=None) – true target values - predicted values
learner (an object with attribute learner.estimators_, default=None) – an ensemble model where each model m in learner.estimators_ has a method m.predict (used only if f=None)
k (int, default=25) – number of neighbors (used only if f=None and learner=None)
scaler (bool, default=False) – use min-max-scaler on the difficulty estimates
beta (int or float, default=0.01) – value to add to the difficulty estimates (after scaling)
oob (bool, default=False) – use out-of-bag estimation

Returns:

self – Fitted DifficultyEstimator.

Return type:

object

Examples

Assuming that X_prop_train is a proper training set, then a difficulty estimator using the distances to the k nearest neighbors can be formed in the following way (here using the default k=25):

from crepes.extras import DifficultyEstimator

de_knn_dist = DifficultyEstimator()
de_knn_dist.fit(X_prop_train)

Assuming that y_prop_train is a vector with target values for the proper training set, then a difficulty estimator using standard deviation of the targets of the k nearest neighbors is formed by:

de_knn_std = DifficultyEstimator()
de_knn_std.fit(X_prop_train, y=y_prop_train)

Assuming that X_prop_res is a vector with residuals for the proper training set, then a difficulty estimator using the mean of the absolute residuals of the k nearest neighbors is formed by:

de_knn_res = DifficultyEstimator()
de_knn_res.fit(X_prop_train, residuals=X_prop_res)
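The three k-nearest-neighbor variants above differ only in what is aggregated over the neighbors. The following plain-Python sketch illustrates the idea; it is not the library's implementation. The helper knn_difficulty is hypothetical, Euclidean distance and the mean of the distances are assumptions, and so is the precedence of residuals over y when both are given.

```python
import math
import statistics


def knn_difficulty(X_train, X_query, k, y=None, residuals=None):
    """Illustrative k-NN difficulty estimates (hypothetical helper)."""
    sigmas = []
    for x in X_query:
        # Indices of the k training objects closest to x.
        order = sorted(range(len(X_train)),
                       key=lambda i: math.dist(x, X_train[i]))
        nearest = order[:k]
        if residuals is not None:
            # Mean of the absolute residuals of the neighbors.
            sigma = statistics.fmean(abs(residuals[i]) for i in nearest)
        elif y is not None:
            # Standard deviation of the targets of the neighbors.
            sigma = statistics.pstdev(y[i] for i in nearest)
        else:
            # Mean distance to the neighbors (an assumed aggregate).
            sigma = statistics.fmean(math.dist(x, X_train[i]) for i in nearest)
        sigmas.append(sigma)
    return sigmas


X_train = [[0.0], [1.0], [2.0], [10.0]]
y_train = [1.0, 1.0, 1.0, 5.0]
res_train = [0.1, -0.1, 0.2, -2.0]
X_query = [[0.5], [9.0]]

by_distance = knn_difficulty(X_train, X_query, k=2)
by_target_std = knn_difficulty(X_train, X_query, k=2, y=y_train)
by_residual = knn_difficulty(X_train, X_query, k=2, residuals=res_train)
```

In all three cases the second query object, which lies in a sparse region with disagreeing neighbors, receives the larger difficulty estimate.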

Assuming that learner_prop is a trained model for which learner_prop.estimators_ is a collection of base models, each implementing the predict method (this holds, e.g., for RandomForestRegressor), a difficulty estimator using the variance of the predictions of the constituent models is formed by:

de_var = DifficultyEstimator()
de_var.fit(learner=learner_prop)
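The variance-based estimate can be illustrated without any ensemble library. In the sketch below, ScaledModel is a hypothetical stand-in for a base model with a predict method, and the use of the population variance is an assumption; the library may compute the variance differently.

```python
import statistics


class ScaledModel:
    """Hypothetical stand-in for one base model of an ensemble."""

    def __init__(self, factor):
        self.factor = factor

    def predict(self, X):
        # Predict a multiple of the first feature of each object.
        return [self.factor * x[0] for x in X]


def variance_difficulty(estimators, X):
    """Variance of the base models' predictions, one value per object."""
    # One list of predictions per base model.
    per_model = [m.predict(X) for m in estimators]
    # Transpose so that each tuple holds all predictions for one object.
    return [statistics.pvariance(preds) for preds in zip(*per_model)]


estimators = [ScaledModel(1.0), ScaledModel(2.0), ScaledModel(3.0)]
X = [[0.0], [1.0], [2.0]]
sigmas = variance_difficulty(estimators, X)
```

Objects on which the base models agree get an estimate of zero, and the estimate grows with the disagreement between the models.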

The difficulty estimates may be normalized (using min-max scaling) by setting scaler=True. It should be noted that this comes with a computational cost; for estimators based on the k-nearest neighbor, a leave-one-out protocol is employed to find the minimum and maximum distances that are used by the scaler. This also requires that a set of objects is provided for the variance-based approach (to allow for finding the minimum and maximum values). Hence, if normalization is to be employed for the latter, objects have to be included:

de_var = DifficultyEstimator()
de_var.fit(X_proper_train, learner=learner_prop, scaler=True)

Difficulty estimates may also be computed by an externally defined function. Assuming that diff_model is a fitted regression model, for which the predict method gives estimates of the absolute error for the objects in X_proper_train, then normalized difficulty estimates can be obtained from the following difficulty estimator:

de_mod = DifficultyEstimator()
de_mod.fit(X_proper_train, f=diff_model.predict, scaler=True)

The DifficultyEstimator can also support the construction of conformal regressors and predictive systems that employ out-of-bag calibration. For the k-nearest neighbor approaches, the difficulty of each object in the provided training set will be computed using a leave-one-out procedure, while for the variance-based approach the out-of-bag predictions will be employed. This is enabled by setting oob=True when calling the fit() method, which also requires the (full) training set (X_train), and for the variance-based approach a corresponding trained model (learner_full) to be provided:

de_var_oob = DifficultyEstimator()
de_var_oob.fit(X_train, learner=learner_full, scaler=True, oob=True)

A small value (beta) is added to the difficulty estimates. The default is beta=0.01. In order to make the beta value have the same effect across different estimators, you may consider normalizing the difficulty estimates (using min-max scaling) by setting scaler=True. Note that beta is added after the normalization, which means that the range of scores after normalization will be [0+beta, 1+beta]. Below, we use beta=0.001 together with 10 neighbors (k=10):

de_knn_mod = DifficultyEstimator()
de_knn_mod.fit(X_prop_train, k=10, beta=0.001, scaler=True)
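The effect of applying beta after min-max scaling can be checked with a few lines of plain Python. The helper scale_and_shift is hypothetical, and as a simplification the minimum and maximum are taken from the very values being scaled.

```python
def scale_and_shift(raw, beta=0.01):
    """Min-max scale raw estimates to [0, 1], then add beta (hypothetical helper)."""
    lo, hi = min(raw), max(raw)
    span = hi - lo
    # Guard against a zero span when all raw estimates are equal.
    scaled = [(v - lo) / span if span > 0 else 0.0 for v in raw]
    # beta is added after scaling, so the result lies in [beta, 1 + beta].
    return [s + beta for s in scaled]


raw_estimates = [2.0, 4.0, 10.0]
sigmas = scale_and_shift(raw_estimates, beta=0.001)
```

Since the smallest scaled estimate is exactly 0 before beta is added, beta also acts as a lower bound that keeps every difficulty estimate strictly positive.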

Note

The use of out-of-bag calibration, as enabled by oob=True, does not come with the theoretical validity guarantees of the regular (inductive) conformal regressors and predictive systems, because calibration and test instances are not handled in exactly the same way.

apply(X=None)

Apply difficulty estimator.

Parameters:

X (array-like of shape (n_samples, n_features), default=None) – set of objects

Returns:

sigmas – difficulty estimates

Return type:

array-like of shape (n_samples,)

Examples

Assuming de to be a fitted DifficultyEstimator, i.e., one for which fit() has already been called, difficulty estimates for a set of objects X are obtained by:

difficulty_estimates = de.apply(X)

If de_oob is a DifficultyEstimator that has been fitted with the option oob=True and a training set, then a call to apply() without any objects will return the estimates for the training set:

oob_difficulty_estimates = de_oob.apply()

For a difficulty estimator employing any of the k-nearest neighbor approaches, the above will return an estimate for the difficulty of each object in the training set computed using a leave-one-out procedure, while for the variance-based approach the out-of-bag predictions will instead be used.
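The leave-one-out procedure matters because each training object would otherwise be its own nearest neighbor at distance zero. A plain-Python sketch of the idea follows; the helper loo_knn_distance is hypothetical, and Euclidean distance and the mean of the distances are assumptions rather than the library's documented behavior.

```python
import math
import statistics


def loo_knn_distance(X_train, k):
    """Leave-one-out k-NN distance estimates (hypothetical helper)."""
    sigmas = []
    for i, x in enumerate(X_train):
        # Distances to all other training objects, excluding x itself.
        dists = sorted(math.dist(x, other)
                       for j, other in enumerate(X_train) if j != i)
        # Mean distance to the k nearest of the remaining objects.
        sigmas.append(statistics.fmean(dists[:k]))
    return sigmas


X_train = [[0.0], [1.0], [3.0]]
sigmas_k1 = loo_knn_distance(X_train, k=1)
sigmas_k2 = loo_knn_distance(X_train, k=2)
```

With the object itself excluded, none of the estimates collapses to zero, so the training-set estimates are comparable to those later computed for unseen objects.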

class crepes.extras.MondrianCategorizer

A MondrianCategorizer outputs categories for objects to be used by Mondrian conformal classifiers, regressors and predictive systems.

Methods

`apply`(X)	Apply Mondrian categorizer.
`fit`([X, f, de, learner, no_bins, oob])	Fit Mondrian categorizer.

fit(X=None, f=None, de=None, learner=None, no_bins=10, oob=False)

Fit Mondrian categorizer.

Parameters:

X (array-like of shape (n_samples, n_features), default=None) – set of objects
f (function which, given an array-like of shape (n_samples, n_features), returns a vector of shape (n_samples,) of type int or float, default=None) – function used to compute Mondrian categories
de (a DifficultyEstimator, default=None) – a fitted difficulty estimator (used only if f=None)
learner (an object with the method learner.predict, default=None) – a fitted regression model (used only if de=None and f=None)
no_bins (int, default=10) – no. of Mondrian categories
oob (bool, default=False) – use out-of-bag estimation (not used if f is not None)

Returns:

self – Fitted MondrianCategorizer.

Return type:

object

Examples

Assuming that X_train is an array of shape (n_samples, n_features) and get_values is a function that given X_train returns a vector of values of shape (n_samples,), then a Mondrian categorizer can be formed in the following way, where the boundaries for the Mondrian categories are found by partitioning the values in the vector into five equal-sized bins:

from crepes.extras import MondrianCategorizer

mc = MondrianCategorizer()
mc.fit(X_train, f=get_values, no_bins=5)

apply(X)

Apply Mondrian categorizer.

Parameters:

X (array-like of shape (n_samples, n_features)) – set of objects

Returns:

bins – Mondrian categories

Return type:

array-like of shape (n_samples,)

Examples

Assuming mc to be a fitted MondrianCategorizer, i.e., one for which fit() has already been called, Mondrian categories for a set of objects X are obtained by:

categories = mc.apply(X)

Note

The array used when calling fit() must have the same number of columns (n_features) as the array used as input to apply().
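The division of labor between fit() and apply() can be illustrated with a minimal stand-in class: fit() derives thresholds from the values of the training objects, and apply() reuses them for new objects. CategorizerSketch and get_values are hypothetical names, and the way cut points are chosen is an assumption rather than the library's procedure.

```python
import bisect
import math


class CategorizerSketch:
    """Hypothetical stand-in illustrating the fit/apply pattern."""

    def fit(self, X, f, no_bins=10):
        values = sorted(f(X))
        n = len(values)
        # Interior cut points for no_bins equal-sized bins.
        cuts = [values[(i * n) // no_bins] for i in range(1, no_bins)]
        self.f = f
        self.thresholds = [-math.inf] + cuts + [math.inf]
        return self

    def apply(self, X):
        last = len(self.thresholds) - 2
        # Category i covers [thresholds[i], thresholds[i + 1]).
        return [min(bisect.bisect_right(self.thresholds, v) - 1, last)
                for v in self.f(X)]


def get_values(X):
    # Hypothetical value function: the first feature of each object.
    return [row[0] for row in X]


X_train = [[0.1, 5.0], [0.4, 3.0], [0.2, 8.0], [0.9, 1.0], [0.6, 2.0]]
mc = CategorizerSketch().fit(X_train, f=get_values, no_bins=5)
train_categories = mc.apply(X_train)
categories = mc.apply([[0.05, 0.0], [0.5, 0.0], [1.5, 0.0]])
```

Because get_values indexes into the columns of its input, the sketch also shows why the arrays passed to fit() and apply() must have the same number of features.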