.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "auto_examples/tabular/plot_explaining_tabpfn.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        :ref:`Go to the end <sphx_glr_download_auto_examples_tabular_plot_explaining_tabpfn.py>`
        to download the full example code.

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_auto_examples_tabular_plot_explaining_tabpfn.py:


Explaining TabPFN
==================

TabPFN is a foundation model for tabular data that uses **in-context
learning** -- fitting is just storing the training data, and inference
contextualises new inputs against that context.

``shapiq`` provides a dedicated :class:`~shapiq.TabPFNExplainer` that exploits
this property with a **remove-and-recontextualize** strategy: instead of
imputing missing features, it simply drops feature columns from the training
*and* test data and re-fits the model. This is both faithful to the model and
inexpensive, because TabPFN's "retraining" is just an in-context forward pass.

.. GENERATED FROM PYTHON SOURCE LINES 16-30

.. code-block:: Python


    from __future__ import annotations

    import os

    # Prevent OpenMP/MKL thread conflicts with TabPFN's PyTorch backend
    os.environ.setdefault("OMP_NUM_THREADS", "1")
    os.environ.setdefault("MKL_NUM_THREADS", "1")

    import numpy as np
    from sklearn.model_selection import train_test_split

    import shapiq


.. GENERATED FROM PYTHON SOURCE LINES 31-35

Prepare a Small Dataset
-----------------------
We use the California housing dataset with a tiny split so that TabPFN runs
quickly on CPU.

.. GENERATED FROM PYTHON SOURCE LINES 35-48

.. code-block:: Python


    x_data, y_data = shapiq.datasets.load_california_housing()
    feature_names = list(x_data.columns)

    x_train, x_test, y_train, y_test = train_test_split(
        x_data.values,
        y_data.values,
        train_size=30,
        test_size=50,
        random_state=42,
    )
    print(f"Train: {x_train.shape}, Test: {x_test.shape}")


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    Train: (30, 8), Test: (50, 8)


.. GENERATED FROM PYTHON SOURCE LINES 49-54

Fit TabPFN
----------
We use ``TabPFNRegressor`` with ``n_estimators=1`` and
``fit_mode="low_memory"`` to minimise runtime. Fitting is instant -- TabPFN
just stores the training context.

.. GENERATED FROM PYTHON SOURCE LINES 54-67

.. code-block:: Python


    import tabpfn

    model = tabpfn.TabPFNRegressor(
        model_path="tabpfn-v2-regressor.ckpt",
        n_estimators=1,
        fit_mode="low_memory",
    )
    model.fit(x_train, y_train)

    avg_pred = float(np.mean(model.predict(x_test)))
    print(f"Average prediction: {avg_pred:.3f}")


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    Average prediction: 2.086


.. GENERATED FROM PYTHON SOURCE LINES 68-74

Auto-Detection of TabPFNExplainer
-----------------------------------
When you pass a TabPFN model to :class:`~shapiq.Explainer`, ``shapiq``
automatically selects :class:`~shapiq.TabPFNExplainer` and sets up a
:class:`~shapiq.TabPFNImputer` under the hood. No special configuration is
needed -- just pass the model, training data, and training labels.

.. GENERATED FROM PYTHON SOURCE LINES 74-89

.. code-block:: Python


    x_explain = x_test[0]
    pred = model.predict(x_explain.reshape(1, -1))[0]
    print(f"Prediction for instance: {pred:.3f}, Average: {avg_pred:.3f}")

    explainer = shapiq.Explainer(
        model=model,
        data=x_train,
        labels=y_train,
        index="SV",
        max_order=1,
        empty_prediction=avg_pred,
    )
    print(f"Auto-selected explainer: {type(explainer).__name__}")


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    Prediction for instance: 0.859, Average: 2.086
    Auto-selected explainer: TabPFNExplainer


.. GENERATED FROM PYTHON SOURCE LINES 90-107

How Remove-and-Recontextualize Works
--------------------------------------
Traditional model-agnostic explanation *imputes* absent features with
background samples (marginal or conditional imputation). This can create
out-of-distribution inputs that mislead the model.

The :class:`~shapiq.TabPFNImputer` takes a different approach:

1. For each coalition :math:`S \subseteq \{1, \dots, d\}` of features:
2. **Remove** the columns *not* in :math:`S` from both training and test data.
3. **Re-fit** the TabPFN model on the reduced training data (instant, since it
   is just an in-context forward pass).
4. **Predict** on the reduced test point.

This faithfully reflects what the model "knows" when only features in
:math:`S` are available, without any distributional assumptions.

.. GENERATED FROM PYTHON SOURCE LINES 109-111

Compute Shapley Values
-----------------------

.. GENERATED FROM PYTHON SOURCE LINES 111-117

.. code-block:: Python


    sv = explainer.explain(x_explain, budget=50)
    print(sv)

    sv.plot_force(feature_names=feature_names)


.. image-sg:: /auto_examples/tabular/images/sphx_glr_plot_explaining_tabpfn_001.png
   :alt: plot explaining tabpfn
   :srcset: /auto_examples/tabular/images/sphx_glr_plot_explaining_tabpfn_001.png
   :class: sphx-glr-single-img


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    InteractionValues(
        index=SV, max_order=1, min_order=0, estimated=True, estimation_budget=50,
        n_players=8, baseline_value=2.0855908393859863,
        Top 10 interactions:
            (): 2.0855908393859863
            (2,): -0.0074234513730656645
            (7,): -0.017530992918561524
            (3,): -0.031643556722903324
            (4,): -0.07693003662491281
            (6,): -0.0900684762483492
            (1,): -0.22901624103480134
            (0,): -0.2550066261889832
            (5,): -0.5189941431956514
    )


.. GENERATED FROM PYTHON SOURCE LINES 118-122

Second-Order Interactions (FSII)
---------------------------------
We can also compute Faithful Shapley Interaction Index values to see which
pairs of features interact.

.. GENERATED FROM PYTHON SOURCE LINES 122-136

.. code-block:: Python


    explainer_fsii = shapiq.Explainer(
        model=model,
        data=x_train,
        labels=y_train,
        index="FSII",
        max_order=2,
        empty_prediction=avg_pred,
    )
    fsii = explainer_fsii.explain(x_explain, budget=50)
    print(fsii)

    fsii.plot_force(feature_names=feature_names)


.. image-sg:: /auto_examples/tabular/images/sphx_glr_plot_explaining_tabpfn_002.png
   :alt: plot explaining tabpfn
   :srcset: /auto_examples/tabular/images/sphx_glr_plot_explaining_tabpfn_002.png
   :class: sphx-glr-single-img


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    InteractionValues(
        index=FSII, max_order=2, min_order=0, estimated=True, estimation_budget=50,
        n_players=8, baseline_value=2.0855908393859863,
        Top 10 interactions:
            (): 2.0855908393859863
            (0, 5): 0.3265402343124412
            (1, 6): 0.22656658090307272
            (1, 2): -0.16400393967669868
            (0, 7): -0.17672435717262952
            (6,): -0.22271323208873753
            (0, 1): -0.2328441622485711
            (7,): -0.23996245630938012
            (0,): -0.3723259258469466
            (5,): -0.6902544586112724
    )


.. GENERATED FROM PYTHON SOURCE LINES 137-143

References
----------
This example uses TabPFN :footcite:t:`Hollmann.2025` with the
remove-and-recontextualize strategy from :footcite:t:`Rundel.2024`.

.. footbibliography::


.. rst-class:: sphx-glr-timing

   **Total running time of the script:** (0 minutes 17.660 seconds)


.. _sphx_glr_download_auto_examples_tabular_plot_explaining_tabpfn.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: plot_explaining_tabpfn.ipynb <plot_explaining_tabpfn.ipynb>`

    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: plot_explaining_tabpfn.py <plot_explaining_tabpfn.py>`

    .. container:: sphx-glr-download sphx-glr-download-zip

      :download:`Download zipped: plot_explaining_tabpfn.zip <plot_explaining_tabpfn.zip>`