shapiq.imputer.BaselineImputer

class shapiq.imputer.BaselineImputer(model, data, x=None, *, categorical_features=None, normalize=True, random_state=None)

Bases: Imputer

The baseline imputer for the shapiq package.

The baseline imputer is used to impute the missing values of a data point by using predefined values (baseline values). If no baseline values are given, the imputer uses the mean (for numerical features) or the mode (for categorical features) of the background data.
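The mean/mode rule described above can be sketched in plain NumPy. This is a minimal illustration only, not shapiq's implementation: the helper name `sketch_baseline_values` is hypothetical, and it assumes categorical columns are flagged explicitly by index.

```python
from collections import Counter

import numpy as np


def sketch_baseline_values(data, categorical_features=None):
    """Return a (1, n_features) array of per-column baselines (hypothetical helper)."""
    categorical = set(categorical_features or [])
    baselines = np.empty((1, data.shape[1]), dtype=object)
    for j in range(data.shape[1]):
        column = data[:, j]
        if j in categorical:
            # Mode: the most frequent value in the column.
            baselines[0, j] = Counter(column.tolist()).most_common(1)[0][0]
        else:
            # Mean of a numerical column.
            baselines[0, j] = float(np.mean(column.astype(float)))
    return baselines


# Two numerical columns and one categorical (string) column.
data = np.array([[1, 2, "a"], [2, 3, "a"], [2, 4, "b"]], dtype=object)
baselines = sketch_baseline_values(data, categorical_features=[2])
```

Here the first two baselines are the column means (5/3 and 3.0) and the third is the most frequent category, `"a"`.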

Variables:
  • baseline_values – The baseline values to use for imputation.

  • empty_prediction – The model’s prediction on an empty data point (all features missing).

Parameters:
  • model (TModel)

  • data (ndarray)

  • x (np.ndarray | None)

  • categorical_features (list[int] | None)

  • normalize (bool)

  • random_state (int | None)

Examples

>>> import numpy as np
>>> from shapiq.imputer import BaselineImputer
>>> model = lambda x: np.sum(x, axis=1)  # some dummy model
>>> data = np.random.rand(1000, 4)  # some background data
>>> x_to_impute = np.array([[1, 1, 1, 1]])  # some data point to impute
>>> imputer = BaselineImputer(model=model, data=data, x=x_to_impute)
>>> # get the baseline values
>>> imputer.baseline_values
array([[0.5, 0.5, 0.5, 0.5]])  # approximately; computed from data
>>> # set new baseline values
>>> baseline_vector = np.array([0, 0, 0, 0])
>>> imputer.init_background(baseline_vector)
>>> imputer.baseline_values
array([[0, 0, 0, 0]])  # given as input
>>> # get the model prediction with missing values
>>> imputer(np.array([[True, False, True, False]]))
array([2.])  # model prediction with the second and fourth features replaced by their baseline values

Initializes the baseline imputer.

Parameters:
  • model (Any) – The model to explain, given as a callable that takes data points as input and returns the model’s predictions.

  • data (ndarray) – The background data to use for the explainer as either a vector of baseline values or a two-dimensional array with shape (n_samples, n_features). If data is a matrix, the baseline values are calculated from the data.

  • x (ndarray | None) – The explanation point to which the imputer is applied. Defaults to None.

  • categorical_features (list[int] | None) – A list of indices of the categorical features in the background data. If no categorical features are given, all features are assumed to be either numerical or in string format (where np.mean fails). Defaults to None.

  • normalize (bool) – A flag to normalize the game values. If True, then the game values are normalized and centered to be zero for the empty set of features. Defaults to True.

  • random_state (int | None) – The random state to use for sampling. Defaults to None.
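The effect of the normalize flag can be illustrated with a small sketch. It assumes that normalization amounts to subtracting the empty prediction from every game value, which is one reading of "centered to be zero for the empty set of features"; the helper `sketch_normalize` is hypothetical and not part of shapiq.

```python
import numpy as np


def sketch_normalize(raw_values, empty_prediction, normalize=True):
    """Center game values so the empty coalition maps to zero (hypothetical helper)."""
    raw_values = np.asarray(raw_values, dtype=float)
    if normalize:
        # Assumption: centering is a plain subtraction of the empty prediction.
        return raw_values - empty_prediction
    return raw_values


# Raw model outputs for three coalitions; the first entry is the empty coalition.
raw = np.array([1.5, 2.0, 4.0])
normalized = sketch_normalize(raw, empty_prediction=raw[0])
unchanged = sketch_normalize(raw, empty_prediction=raw[0], normalize=False)
```

With normalize=True the value of the empty coalition becomes 0.0 and the other values are shifted by the same amount; with normalize=False the raw outputs are returned as they are.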

calc_empty_prediction()

Runs the model on empty data points (all features missing) to get the empty prediction.

Return type:

float

Returns:

The empty prediction.
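Conceptually, a data point with all features missing is the baseline vector itself, so the empty prediction can be sketched as the model's output on that vector. This is an assumption-based illustration; `sketch_empty_prediction` is a hypothetical helper, not the library's code.

```python
import numpy as np


def sketch_empty_prediction(model, baseline_values):
    """Evaluate the model on the baseline vector alone (hypothetical helper)."""
    # With every feature missing, every feature is replaced by its baseline value.
    baseline_row = np.asarray(baseline_values, dtype=float).reshape(1, -1)
    return float(model(baseline_row)[0])


model = lambda data: np.sum(data, axis=1)  # dummy model, as in the class example
empty = sketch_empty_prediction(model, np.array([0.5, 0.5, 0.5, 0.5]))
```

For the summing dummy model and a baseline of 0.5 per feature, the result is 2.0.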

init_background(data)

Initializes the imputer with the background data.

Parameters:

data (ndarray) – The background data to use for the imputer. Either a vector of baseline values of shape (n_features,) or a matrix of shape (n_samples, n_features). If the data is a matrix, the baseline values are calculated from the data.

Return type:

BaselineImputer

Returns:

The initialized imputer.

Examples

>>> import numpy as np
>>> from shapiq.imputer import BaselineImputer
>>> data = np.array([[1, 2, "a"], [2, 3, "a"], [2, 4, "b"]], dtype=object)
>>> x = np.array([1, 2, 3])
>>> imputer = BaselineImputer(model=lambda x: np.sum(x, axis=1), data=data, x=x)
>>> imputer.baseline_values
array([[1.66, 3, 'a']], dtype=object)  # computed from data
>>> baseline_vector = np.array([0, 0, 0])
>>> imputer.init_background(baseline_vector)
>>> imputer.baseline_values
array([[0, 0, 0]])  # given as input

value_function(coalitions)

Imputes the missing values of a data point and calls the model.

Parameters:

coalitions (ndarray) – A boolean array indicating which features are present (True) and which are missing (False). The shape of the array must be (n_subsets, n_features).

Return type:

ndarray

Returns:

The model’s predictions on the imputed data points. The shape of the array is (n_subsets, n_outputs).
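The imputation step can be sketched with NumPy broadcasting: for each coalition row, present features keep the value from the explanation point and missing features take the baseline value. The helper `sketch_value_function` is hypothetical and ignores normalization; the dummy model here returns a one-dimensional array of shape (n_subsets,).

```python
import numpy as np


def sketch_value_function(model, x, baseline_values, coalitions):
    """Impute missing features with baselines, then call the model (hypothetical helper)."""
    x = np.asarray(x).reshape(1, -1)
    baseline = np.asarray(baseline_values).reshape(1, -1)
    coalitions = np.asarray(coalitions, dtype=bool)
    # True keeps the feature from x, False replaces it with the baseline value.
    imputed = np.where(coalitions, x, baseline)
    return model(imputed)


model = lambda data: np.sum(data, axis=1)  # dummy model, as in the class example
x = np.array([[1, 1, 1, 1]])
baseline = np.array([0, 0, 0, 0])
coalitions = np.array(
    [
        [True, False, True, False],  # second and fourth features missing
        [False, False, False, False],  # empty coalition
        [True, True, True, True],  # all features present
    ]
)
predictions = sketch_value_function(model, x, baseline, coalitions)
```

The three coalitions yield the imputed rows [1, 0, 1, 0], [0, 0, 0, 0], and [1, 1, 1, 1], so the summing model returns 2, 0, and 4.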