
Browsing by Subject "Explanation evaluation"


  • Murtaza, Adnan (2020)
    Interpretability in machine learning aims to provide explanations of the behavior of complex predictive models, widely referred to as black boxes. Broadly, interpretability means understanding how a model works internally, while explanations are one way to make machine learning models interpretable, e.g., by using transparent and simple models. Numerous explanation methods have been proposed that strive to interpret black-box models. These methods mainly approximate the local behavior of a model and then explain it in a human-understandable way; the primary reason for focusing on local behavior is that explaining the global behavior of a black box is difficult and remains an unsolved challenge. A further challenge concerns the quality and stability of the generated explanations. One way to evaluate the quality of explanations is through their robustness. In this work, we define an explanation evaluation framework that measures the robustness of explanations. The framework consists of two distance-based measures: stability and separability. We adopt the stability measure from existing literature and introduce a new separability measure, which complements stability in quantifying the robustness of explanations. We examine model-agnostic (LIME, SHAP) and model-dependent (DeepExplain) explanation methods for interpreting the predictions of various supervised predictive models, especially classifiers, built on UCI classification benchmark datasets and the MNIST handwritten digits dataset. Our results show that current model-agnostic and model-dependent explanation methods do not perform adequately with respect to our framework: they are not robust to variations in feature values and often produce different explanations for similar inputs and similar explanations for different inputs, leading to unstable explanations. These results demonstrate that the developed explanation evaluation framework is useful for assessing the robustness of explanations and motivate further work.
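
    The abstract describes the two measures only at a high level. As an illustration only, the sketch below shows one plausible distance-based reading: stability compares an input's explanation with the explanations of slightly perturbed inputs, while separability compares it with the explanations of clearly different inputs. The `explain` function is a hypothetical placeholder for an attribution method such as LIME or SHAP, and the perturbation scheme and distance are assumptions; the thesis' exact formulations may differ.

        import numpy as np

        def stability(explain, x, n_samples=20, eps=0.05, seed=0):
            """Worst-case explanation distance over small perturbations of x.
            Lower is better: similar inputs should receive similar explanations."""
            rng = np.random.default_rng(seed)
            e_x = explain(x)  # attribution vector for the original input
            dists = [
                np.linalg.norm(e_x - explain(x + rng.uniform(-eps, eps, size=x.shape)))
                for _ in range(n_samples)
            ]
            return max(dists)

        def separability(explain, x, others):
            """Smallest explanation distance between x and clearly different inputs.
            Higher is better: distinct inputs should not share near-identical explanations."""
            e_x = explain(x)
            return min(np.linalg.norm(e_x - explain(z)) for z in others)

        # Example usage with a hypothetical explainer wrapping, e.g., LIME or SHAP:
        # explain = lambda x: attribution_vector_for(model, x)   # hypothetical helper
        # s = stability(explain, x)
        # sep = separability(explain, x, other_inputs)

    In this reading, an explanation method is robust when stability is low (nearby inputs get similar explanations) and separability is high (different inputs get distinguishable explanations), which matches the failure modes the abstract reports.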