shapiq.explainer.tree#

This module contains the tree explainer implementation.

class shapiq.explainer.tree.TreeExplainer(model, max_order=2, min_order=1, interaction_type='k-SII', class_label=None, output_type='raw', **kwargs)[source]#

Bases: Explainer

explain(x)[source]#
Return type:

InteractionValues

class shapiq.explainer.tree.TreeModel(children_left, children_right, features, thresholds, values, node_sample_weight, empty_prediction=None, leaf_mask=None, n_features_in_tree=None, max_feature_id=None, feature_ids=None, root_node_id=None, n_nodes=None, nodes=None, feature_map_original_internal=None, feature_map_internal_original=None, original_output_type='raw')[source]#

Bases: object

A dataclass for storing the information of a tree model.

The dataclass stores the information of a tree model in a way that is easy to access and manipulate. The dataclass is used to convert tree models from different libraries to a common format.

children_left#

The left children of each node in a tree. Leaf nodes are -1.

children_right#

The right children of each node in a tree. Leaf nodes are -1.

features#

The feature indices of the decision nodes in a tree. Leaf nodes are assumed to have the feature index -2, but no check is performed.

thresholds#

The thresholds of the decision nodes in a tree. Leaf nodes are set to NaN.

values#

The values of the leaf nodes in a tree.

node_sample_weight#

The sample weights of the nodes in a tree.

empty_prediction#

The empty prediction of the tree model. If None (the default), the empty prediction is computed from the leaf values and the sample weights.

leaf_mask#

The boolean mask of the leaf nodes in a tree. If None (the default), the leaf mask is computed from the children_left and children_right arrays.

n_features_in_tree#

The number of features in the tree model. If None (the default), it is computed from the unique feature indices in the features array.

max_feature_id#

The maximum feature index in the tree model. If None (the default), it is computed from the features array.

feature_ids#

The feature indices of the decision nodes in the tree model. If None (the default), they are computed from the unique feature indices in the features array.

root_node_id#

The root node id of the tree model. If None (the default), the root node id is set to 0.

n_nodes#

The number of nodes in the tree model. If None (the default), it is computed from the children_left array.

nodes#

The node ids of the tree model. If None (the default), they are computed from the number of nodes in the tree model.

feature_map_original_internal#

A mapping of feature indices from the original feature indices (as in the model) to the internal feature indices (as in the tree model).

feature_map_internal_original#

A mapping of feature indices from the internal feature indices (as in the tree model) to the original feature indices (as in the model).

original_output_type#

The original output type of the tree model. The default value is “raw”.
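The array layout described above can be illustrated with a small hand-built tree. This is a minimal sketch in plain Python, with lists standing in for the NumPy arrays; it does not call shapiq. The rule that `x[feature] <= threshold` goes to the left child, the placeholder value 0.0 for decision nodes, and the exact way the derived fields are computed are assumptions made for illustration, and `predict` is a hypothetical helper.

```python
# A five-node tree stored as parallel lists (index = node id).
#
#            node 0: x[0] <= 0.5
#           /                   \
#   node 1: leaf 1.0      node 2: x[8] <= 2.0
#                          /                \
#                 node 3: leaf 3.0    node 4: leaf 5.0
nan = float("nan")
children_left = [1, -1, 3, -1, -1]       # -1 marks a leaf
children_right = [2, -1, 4, -1, -1]      # -1 marks a leaf
features = [0, -2, 8, -2, -2]            # -2 marks a leaf
thresholds = [0.5, nan, 2.0, nan, nan]   # NaN at leaves
values = [0.0, 1.0, 0.0, 3.0, 5.0]       # only leaf entries are meaningful here
node_sample_weight = [10.0, 4.0, 6.0, 3.0, 3.0]

# Derived fields, following the documented defaults (assumed formulas).
n_nodes = len(children_left)
nodes = list(range(n_nodes))
leaf_mask = [
    left == -1 and right == -1
    for left, right in zip(children_left, children_right)
]
feature_ids = {f for f, is_leaf in zip(features, leaf_mask) if not is_leaf}
n_features_in_tree = len(feature_ids)
max_feature_id = max(feature_ids)
root_node_id = 0


def predict(x):
    """Walk from the root to a leaf and return that leaf's value."""
    node = root_node_id
    while not leaf_mask[node]:
        if x[features[node]] <= thresholds[node]:  # assumed split convention
            node = children_left[node]
        else:
            node = children_right[node]
    return values[node]


x = [0.0] * 9
x[0], x[8] = 1.0, 1.5
prediction = predict(x)  # right at node 0, then left at node 2
```

Note that the tree uses only two features (0 and 8), while max_feature_id is 8; this gap is what reduce_feature_complexity() is meant to close.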

compute_empty_prediction()[source]#

Compute the empty prediction of the tree model.

The method computes the empty prediction of the tree model by taking the weighted average of the leaf node values. The method modifies the tree model in place.

Return type:

None
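The weighted average described above can be sketched as follows. This is an illustration in plain Python rather than the library's code: the helper name `weighted_leaf_average` is hypothetical, it returns the result instead of modifying a tree in place, and normalising by the summed leaf weights is an assumption.

```python
def weighted_leaf_average(values, node_sample_weight, leaf_mask):
    """Average the leaf values, weighting each leaf by its sample weight."""
    weighted_sum = 0.0
    total_weight = 0.0
    for value, weight, is_leaf in zip(values, node_sample_weight, leaf_mask):
        if is_leaf:
            weighted_sum += value * weight
            total_weight += weight
    return weighted_sum / total_weight


values = [0.0, 1.0, 0.0, 3.0, 5.0]
node_sample_weight = [10.0, 4.0, 6.0, 3.0, 3.0]
leaf_mask = [False, True, False, True, True]

# (1.0 * 4 + 3.0 * 3 + 5.0 * 3) / (4 + 3 + 3) = 2.8
baseline = weighted_leaf_average(values, node_sample_weight, leaf_mask)
```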

reduce_feature_complexity()[source]#

Reduces the feature complexity of the tree model.

The method reduces the feature complexity of the tree model by removing unused features and reindexing the feature indices of the decision nodes in the tree. The method modifies the tree model in place. To see the original feature mappings, use the feature_map_original_internal and feature_map_internal_original attributes.

For example, consider a tree model with the following feature indices:

[0, 1, 8]

The method will remove the unused feature indices and reindex the feature indices of the decision nodes in the tree to the following:

[0, 1, 2]

Feature ‘8’ is ‘renamed’ to ‘2’ so that, in the internal representation, one-hot vectors (and matrices) of length 3 suffice to represent the feature indices.

Return type:

None
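The reindexing in this example can be sketched in a few lines of plain Python. The helper `reindex_features` is hypothetical and, unlike the method, returns new objects instead of modifying a tree in place. Assigning internal indices in ascending order of the original indices and leaving negative leaf markers untouched are assumptions. The two dictionaries play the role of feature_map_original_internal and feature_map_internal_original.

```python
def reindex_features(features):
    """Map the used feature indices onto 0..k-1 and keep leaf markers as-is."""
    used = sorted({f for f in features if f >= 0})  # negative values mark leaves
    original_to_internal = {orig: new for new, orig in enumerate(used)}
    internal_to_original = {new: orig for orig, new in original_to_internal.items()}
    reindexed = [original_to_internal.get(f, f) for f in features]
    return reindexed, original_to_internal, internal_to_original


# Decision nodes split on features 0, 8 and 1; leaves carry -2.
features = [0, -2, 8, 1, -2, -2, -2]
reindexed, original_to_internal, internal_to_original = reindex_features(features)
```

Here `reindexed` becomes `[0, -2, 2, 1, -2, -2, -2]`, and `internal_to_original[2]` recovers the original index 8.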

children_left: ndarray[int]#
children_right: ndarray[int]#
empty_prediction: Optional[float] = None#
feature_ids: Optional[set] = None#
feature_map_internal_original: Optional[dict[int, int]] = None#
feature_map_original_internal: Optional[dict[int, int]] = None#
features: ndarray[int]#
leaf_mask: Optional[ndarray[bool]] = None#
max_feature_id: Optional[int] = None#
n_features_in_tree: Optional[int] = None#
n_nodes: Optional[int] = None#
node_sample_weight: ndarray[float]#
nodes: Optional[ndarray[int]] = None#
original_output_type: str = 'raw'#
root_node_id: Optional[int] = None#
thresholds: ndarray[float]#
values: ndarray[float]#
class shapiq.explainer.tree.TreeSHAPIQ(model, max_order=2, min_order=1, interaction_type='k-SII', verbose=False)[source]#

Bases: object

The explainer for tree-based models using the TreeSHAP-IQ algorithm. For a detailed presentation of the algorithm, see the original paper: https://arxiv.org/abs/2401.12069.

TreeSHAP-IQ is an algorithm for computing Shapley Interaction values for tree-based models. It is heavily based on the Linear TreeSHAP algorithm (outlined in https://proceedings.neurips.cc/paper_files/paper/2022/hash/a5a3b1ef79520b7cd122d888673a3ebc-Abstract-Conference.html) but extended to compute Shapley Interaction values up to a given order. TreeSHAP-IQ needs to visit each node only once and makes use of polynomial arithmetic to compute the Shapley Interaction values efficiently.
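To make concrete what is being computed at order 1, the sketch below calculates Shapley values for a tiny tree by brute force. It is not TreeSHAP-IQ: it enumerates every coalition and is therefore exponential in the number of features, which is the cost the algorithm avoids. The path-dependent value function (averaging over both children, weighted by their sample weights, whenever a split feature is not in the coalition), the `<=` split convention, and the helper names `tree_value` and `shapley_values` are assumptions made for illustration.

```python
from itertools import combinations
from math import factorial

nan = float("nan")
children_left = [1, -1, 3, -1, -1]
children_right = [2, -1, 4, -1, -1]
features = [0, -2, 1, -2, -2]
thresholds = [0.5, nan, 2.0, nan, nan]
values = [0.0, 1.0, 0.0, 3.0, 5.0]
node_sample_weight = [10.0, 4.0, 6.0, 3.0, 3.0]


def tree_value(x, coalition, node=0):
    """Expected tree output when only the features in `coalition` are known."""
    left, right = children_left[node], children_right[node]
    if left == -1:  # leaf node
        return values[node]
    feature = features[node]
    if feature in coalition:
        child = left if x[feature] <= thresholds[node] else right
        return tree_value(x, coalition, child)
    # Unknown feature: average both branches by their sample weights.
    w_left, w_right = node_sample_weight[left], node_sample_weight[right]
    mixed = w_left * tree_value(x, coalition, left)
    mixed += w_right * tree_value(x, coalition, right)
    return mixed / (w_left + w_right)


def shapley_values(x, n_features):
    """Exact Shapley values by enumerating all coalitions (exponential cost)."""
    phi = []
    for i in range(n_features):
        others = [j for j in range(n_features) if j != i]
        total = 0.0
        for size in range(len(others) + 1):
            weight = factorial(size) * factorial(n_features - size - 1)
            weight /= factorial(n_features)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (tree_value(x, s | {i}) - tree_value(x, s))
        phi.append(total)
    return phi


x = [1.0, 1.5]
phi = shapley_values(x, n_features=2)
full = tree_value(x, {0, 1})   # prediction with all features known
empty = tree_value(x, set())   # baseline with no features known
```

The values in `phi` sum to `full - empty`, so the order-1 attributions account for the whole gap between the prediction and the empty prediction.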

Parameters:
  • model (Union[dict, TreeModel, Any]) – A single tree-based model to explain. Note that, unlike the TreeExplainer class, TreeSHAP-IQ supports only a single tree model. The tree model can be a dictionary representation of the tree, a TreeModel object, or any other tree model supported by the shapiq.explainer.tree.validation.validate_tree_model function.

  • max_order (int) – The maximum interaction order to be computed. An interaction order of 1 corresponds to the Shapley value. Any value higher than 1 computes the Shapley interaction values up to that order. Defaults to 2.

  • min_order (int) – The minimum interaction order to be computed. Defaults to 1.

  • interaction_type (str) – The type of interaction to be computed. The interaction type can be “k-SII” (default), “SII”, “STI”, “FSI”, or “BZF”. All indices apart from “BZF” will reduce to the “SV” (Shapley value) for order 1.

  • verbose (bool) – Whether to print information about the tree during initialization. Defaults to False.
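As an illustration of how min_order and max_order bound the result, the sketch below lists the feature subsets that would receive a score for a given number of features. It uses only the standard library; the helper `interaction_subsets` is hypothetical and is not part of shapiq.

```python
from itertools import combinations


def interaction_subsets(n_features, min_order=1, max_order=2):
    """List every feature subset whose size lies in [min_order, max_order]."""
    return [
        subset
        for order in range(min_order, max_order + 1)
        for subset in combinations(range(n_features), order)
    ]


# Three features with the default orders: 3 singletons plus 3 pairs.
subsets = interaction_subsets(3)

# With max_order=1 only the single features remain (the Shapley value case).
singletons = interaction_subsets(3, max_order=1)
```

The number of subsets grows quickly with max_order, which is one reason to keep the order small for trees with many features.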

Note

This class is not intended to be used directly. Instead, use the TreeExplainer class, which uses the TreeSHAP-IQ algorithm internally, to explain tree-based models.

explain(x)[source]#
Computes the Shapley Interaction values for a given instance x and interaction order.

This function is the main explanation function of this class.

Parameters:

x (np.ndarray) – Instance to be explained.

Returns:

The computed Shapley Interaction values.

Return type:

InteractionValues

Modules

shapiq.explainer.tree.base

This module contains the base class for tree model conversion.

shapiq.explainer.tree.conversion

shapiq.explainer.tree.explainer

This module contains the TreeExplainer class making use of the TreeSHAPIQ algorithm for computing any-order Shapley Interactions for tree ensembles.

shapiq.explainer.tree.treeshapiq

This module contains the tree explainer implementation.

shapiq.explainer.tree.utils

This module contains utility functions for dealing with trees or tree structures.

shapiq.explainer.tree.validation

This module contains conversion functions for the tree explainer implementation.