shapiq.imputer.MarginalImputer

class shapiq.imputer.MarginalImputer(model, data, *, x=None, sample_size=100, categorical_features=None, joint_marginal_distribution=True, normalize=True, random_state=None)

Bases: Imputer

The marginal imputer for the shapiq package.

The marginal imputer replaces missing features of the explanation point x by values sampled from the background data. When joint_marginal_distribution=True, rows are sampled jointly (i.e., from the empirical joint marginal); when False, each feature column is independently shuffled to break dependencies (feature-wise marginals).

This corresponds to interventional imputation (often called marginal fANOVA in the literature), as opposed to observational imputers that condition on observed features.
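The difference between the two sampling modes can be illustrated without shapiq. The following is a minimal NumPy sketch, not the library's implementation: the helper name `sample_replacements` is hypothetical, and sampling without replacement is an assumption. Joint sampling draws whole rows, so relationships between features survive. Feature-wise sampling draws each column separately, which breaks them.

```python
import numpy as np


def sample_replacements(data, sample_size, joint, rng):
    """Draw replacement rows from background data (illustrative helper, not shapiq API)."""
    n_samples, n_features = data.shape
    if joint:
        # Joint marginal: pick whole rows, so every replacement row occurred in the data.
        rows = rng.choice(n_samples, size=sample_size, replace=False)
        return data[rows]
    # Feature-wise marginals: pick row indices separately for every column.
    out = np.empty((sample_size, n_features), dtype=data.dtype)
    for j in range(n_features):
        rows = rng.choice(n_samples, size=sample_size, replace=False)
        out[:, j] = data[rows, j]
    return out


rng = np.random.default_rng(0)
# Two perfectly dependent features: the second column is always twice the first.
first = rng.random(1000)
background = np.column_stack([first, 2 * first])

joint = sample_replacements(background, 100, joint=True, rng=rng)
independent = sample_replacements(background, 100, joint=False, rng=rng)

# Check whether the "second = 2 * first" relationship survives the sampling.
print(np.allclose(joint[:, 1], 2 * joint[:, 0]))
print(np.allclose(independent[:, 1], 2 * independent[:, 0]))
```

In the joint case every sampled row still satisfies the dependency. In the feature-wise case the two columns come from different rows, so the relationship is lost.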

Examples

>>> import numpy as np
>>> from shapiq.imputer import MarginalImputer
>>> model = lambda x: np.sum(x, axis=1)  # some dummy model
>>> data = np.random.rand(1000, 4)  # some background data
>>> x_to_impute = np.array([[1, 1, 1, 1]])  # some data point to impute
>>> imputer = MarginalImputer(model=model, data=data, x=x_to_impute, sample_size=100, random_state=42)
>>> # get the model prediction with missing values
>>> imputer(np.array([[True, False, True, False]]))
np.array([2.01])  # some model prediction (might be different)
>>> # exchange the background data
>>> new_data = np.random.rand(1000, 4)
>>> imputer.init_background(data=new_data)

See also

shapiq.imputer.ConditionalImputer for the conditional imputer.
shapiq.imputer.BaselineImputer for the baseline imputer.
shapiq.imputer.base.Imputer for the base imputer class.

Initializes the marginal imputer.

Parameters:

model (Any) – The model to explain, given as a callable that expects data points as input and returns the model’s predictions.
data (ndarray) – The background data to use for the explainer as a two-dimensional array with shape (n_samples, n_features).
x (ndarray | None) – The explanation point to apply the imputer to, either as a two-dimensional array with shape (1, n_features) or as a vector with shape (n_features,). If None, the imputer must be fitted before it can be used.
sample_size (int) – The number of samples to draw from the background data. Increasing this value will linearly increase the runtime of the explainer.
categorical_features (list[int] | None) – A list of indices of the categorical features. If None, all features are treated as continuous.
joint_marginal_distribution (bool) – A flag that controls how replacement values are sampled. If True, the replacement values are sampled from the joint marginal distribution. If False, they are sampled independently for each feature.
normalize (bool) – A flag to normalize the game values. If True, then the game values are normalized and centered to be zero for the empty set of features.
random_state (int | None) – The random state to use for sampling. If None, the random state is not fixed.
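The effect of normalize can be sketched in a few lines. This is an illustration only, under the assumption that normalization subtracts the empty-coalition value from every game value; the numbers are made up.

```python
import numpy as np

# Hypothetical raw game values for four coalitions; index 0 is the empty coalition.
raw_values = np.array([0.5, 1.2, 0.9, 2.0])
empty_value = raw_values[0]

# Centering: subtract the empty-coalition value so that the empty set maps to zero.
normalized = raw_values - empty_value

print(normalized[0])  # 0.0
```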

calc_empty_prediction()

Runs the model on empty data points (all features missing) to get the empty prediction.

Return type: float
Returns: The empty prediction of the model provided only missing features.
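As a rough picture of what an empty prediction means for a marginal imputer: when every feature is missing, every value comes from the background data. The sketch below assumes that the empty prediction is the mean model output over the sampled background rows. It uses plain NumPy and the dummy model from the class example, and it is not the library's code.

```python
import numpy as np


def model(x):
    # Dummy model, same as in the class example: sum of all features.
    return np.sum(x, axis=1)


rng = np.random.default_rng(42)
background = rng.random((1000, 4))

# With every feature missing, all feature values come from the background
# samples, so the imputed points are the sampled background rows themselves.
rows = rng.choice(len(background), size=100, replace=False)
replacement_rows = background[rows]
empty_prediction = float(np.mean(model(replacement_rows)))

print(round(empty_prediction, 2))
```

For four features drawn uniformly from [0, 1), the result should land close to 2, the expected sum.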

init_background(data)

Initializes the imputer to a background data set.

The background data is used to sample replacement values for the missing features. To change the background data, use this method.

Parameters: data (ndarray) – The background data to use for the imputer. The shape of the array must be (n_samples, n_features).
Return type: MarginalImputer
Returns: The initialized imputer.

Examples

>>> import numpy as np
>>> from shapiq.imputer import MarginalImputer
>>> model = lambda x: np.sum(x, axis=1)
>>> data = np.random.rand(10, 3)
>>> imputer = MarginalImputer(model=model, data=data, x=data[0])
>>> new_data = np.random.rand(10, 3)
>>> imputer.init_background(data=new_data)

Raises: UserWarning – If the sample size is larger than the number of data points in the background data. In this case, the sample size is reduced to the number of data points in the background data.
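The sample-size behaviour described under Raises can be pictured with a small standalone helper. `clamp_sample_size` is a hypothetical name used only for illustration; it mirrors the documented behaviour (warn, then reduce) and makes no claim about how shapiq implements it.

```python
import warnings

import numpy as np


def clamp_sample_size(sample_size, data):
    """Reduce sample_size to the number of background rows (hypothetical helper)."""
    n_samples = data.shape[0]
    if sample_size > n_samples:
        warnings.warn(
            f"sample_size={sample_size} exceeds the {n_samples} background rows; "
            f"using {n_samples} instead.",
            UserWarning,
            stacklevel=2,
        )
        return n_samples
    return sample_size


small_background = np.zeros((10, 3))
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    effective = clamp_sample_size(100, small_background)

print(effective)  # 10
print(caught[0].category.__name__)  # UserWarning
```

A background set with 10 rows combined with a sample size of 100 is exactly the situation in which the warning is expected.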

value_function(coalitions)

Imputes the missing values of a data point and calls the model.

Parameters:

coalitions (ndarray[tuple[Any, ...], dtype[bool]]) – A boolean array indicating which features are present (True) and which are missing (False). The shape of the array must be (n_subsets, n_features).

Return type:

ndarray[tuple[Any, ...], dtype[floating]]

Returns:

The model’s predictions on the imputed data points. The shape of the array is: (n_subsets, n_outputs).
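The imputation step can be sketched in plain NumPy. The function below is an illustration, not the shapiq source: `marginal_value_function` is a hypothetical name, it assumes a single-output model (so it returns shape (n_subsets,) instead of (n_subsets, n_outputs)), and it assumes the value of a coalition is the mean prediction over the imputed samples.

```python
import numpy as np


def marginal_value_function(model, x, replacements, coalitions):
    """Average model output per coalition (illustrative, not the shapiq source)."""
    n_subsets = coalitions.shape[0]
    values = np.empty(n_subsets)
    for i, coalition in enumerate(coalitions):
        # Start from the background samples, then overwrite the present
        # features (True) with the values of the explanation point.
        imputed = replacements.copy()
        imputed[:, coalition] = x[coalition]
        values[i] = np.mean(model(imputed))
    return values


model = lambda x: np.sum(x, axis=1)  # dummy model from the class example
x = np.array([1.0, 1.0, 1.0, 1.0])
replacements = np.zeros((5, 4))  # constant background keeps the arithmetic easy
coalitions = np.array(
    [
        [False, False, False, False],  # nothing present
        [True, False, True, False],  # two features present
        [True, True, True, True],  # everything present
    ]
)

result = marginal_value_function(model, x, replacements, coalitions)
print(result)  # [0. 2. 4.]
```

With an all-zero background and an all-ones explanation point, each coalition's value equals the number of present features, which makes the mechanics easy to check by hand.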

joint_marginal_distribution: bool

A flag indicating whether to sample from the joint marginal distribution (True) or independently for each feature (False).