From LIME explanations to justifications (part 2/3)

This is the second part in a series of blog posts about my thesis project at BrainCreators.

In the introductory part, I covered some basic concepts regarding AI explainability and the LIME algorithm, gave a short introduction to the python LIME library, and discussed some modifications to it that I used during the project.

In this part, I will show how we can move from producing an “explanation” to producing a “justification” using LIME, and discuss some possible uses this might have.

Finally (and hopefully …) after completing my thesis project, I will write a third part to show how this concept can be incorporated in practice for a better and more beneficial human-AI dialogue.

Introduction: The “justified true belief” analysis of knowledge, and its application to AI

One of the most influential epistemological frameworks for analyzing what knowledge is describes knowledge as a “Justified True Belief” (see, for example, this entry in the Stanford Encyclopedia of Philosophy). This framework, in a nutshell, holds that to KNOW that something is true requires that (a) that something actually is true, (b) the “knower” believes that it is true, and (c) the “knower” has good reasons to hold this belief (and doesn’t hold it out of “pure luck”).

Without going too deep into the philosophical aspects of the difference between an “explanation” and a “justification”, for the purpose of this post I would stress that an “explanation” focuses on what is, whereas a justification focuses on what ought to be. In this sense, when we looked at the LIME output as an explanation we took the current classifier and its explanation as givens (and only asked ourselves “should we trust this classifier?”, as the LIME article suggests).

Looking at the LIME output as a justification, on the contrary, enables us to reason about it as a way of affecting the classifier.

These causal diagrams illustrate the different places an “explanation” (left) and a “justification” (right) occupy in a classification process:

[Causal diagrams: explanation (left) and justification (right)]

While the focus of LIME is the user’s trust in the model’s classification (indicated by the red circle in the first causal model), and accordingly it uses the user’s agreement with the model’s explanation as a surrogate for this measure of trust, the focus of the latter is on using this level of agreement to actually influence the model and to generate a better one (indicated by the red arrow in the second causal model). Put simply: if an explanation produced by an artificial classifier can make us trust (or distrust) its classification, why not use our level of trust in an explanation in the interaction between the annotator and the classifier in order to get a more trustworthy model?

To illustrate this point, let’s go back to the LIME explanations of a classifier. Two different classifiers correctly classified the following image as containing a tennis ball:

When looking at the most influential segments of the image for producing this classification (using LIME), the classifiers produced the following explanations:

While both classifiers classify the image correctly, it is easy to see that the second classifier suffers from a form of “data leakage”: due to the nature of the images it was trained on, it uses the presence of a dog’s face in the image to classify it (while we can easily imagine a similar image containing a different type of ball).

Going back to the original motivation behind AI explainability, such an explanation would probably discourage a human from “trusting” the second classifier, and encourage placing more trust in the first one.

Looking at it as the classifier’s justification, one would like to give feedback not only on the produced classification (“Yes, this image indeed contains a tennis ball”) but also on the produced justification (“No, this isn’t a good justification for classifying the image as containing a tennis ball”).

Using LIME to produce a justification (theory and practice)

When looking at the output as an explanation, we seem to bind ourselves to a qualitative evaluation of it from the perspective of the human to whom the explanation is given (which, in turn, results in fine-tuning hyper-parameters such as the number of image segments to include in an explanation, as discussed in the previous post).

However, looking at the output as a justification, we can now think of several ways of producing one, given an image (I) which was classified into a class (C) by any classifier.

A sufficient justification – This would correspond to any part of the image (I) which is enough, by itself, to be classified as (C) by the classifier. There could, of course, be more than one sufficient justification in (I) for that end. Producing such a justification is possible by iterating over the top segments produced as an explanation by LIME, and passing them to the original classifier until reaching an area which is enough to be classified as (C) by it. A code sketch of both procedures is given below.

A sufficient & necessary justificationThis would correspond to any part of the image (I) without which the image can not be classified as (C) by the classifier. Producing such a justification is possible by iterating over the top segments produced as an explanation by LIME, blurring them out, and passing the blurred variant to the original classifier until reaching a variant which could no longer be classified as (C) by it.

To give a concrete example, take this image, which was correctly classified as being of class “ping-pong ball”:

A ResNet classifier was able to identify the following part as a “sufficient justification” for such a classification (enough to classify the image as “ping-pong ball”):

And this as a sufficient & necessary justification (impossible to classify the image as “ping-pong ball” without it):

Some final remarks regarding the production of justifications based on LIME

As this method of producing a justification is built on LIME, it is model-agnostic and can be used with any classifier regardless of its architecture. In principle it could also be used on textual and tabular data.

One advantage of using this within a human-AI classifier dialogue is that this approach automatically adjusts the number of image segments included in the justification, without the need to fine-tune it manually (while still allowing us to play with the “confidence” thresholds of the justification as an influencing hyper-parameter).

A second advantage is that this setting also allows us to use a justification not only for qualitative evaluation, but also to test it automatically using a second classifier, be it human or a bot (e.g. assessing whether a justification produced by one classifier is indeed sufficient, or whether it is indeed necessary), and as an evaluation measure in its own right (i.e. comparing two models not only based on their accuracy, but also based on their justifications, given a “ground truth” justification set).
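As a hypothetical illustration of such an automatic check, reusing the helper functions from the sketch above, one could test whether a sufficient justification produced for one model also convinces a second model:

```python
# Hypothetical cross-check, reusing the helpers from the sketch above:
# is a sufficient justification found for model A also sufficient for model B?
import numpy as np

def cross_model_check(image, classifier_fn_a, classifier_fn_b, target_class, explanation_a):
    mask = sufficient_justification(image, classifier_fn_a, target_class, explanation_a)
    if mask is None:
        return False                              # model A produced no sufficient subset
    crop = image * mask[..., np.newaxis]          # keep only model A's justification
    return _predicted_class(classifier_fn_b, crop) == target_class
```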

Lastly, note that producing a justification does not depend, in essence, on image segmentation at all (although the attempt to produce a “small” or “most convincing” justification might). Thus, if we are flexible on these attributes, we can even generate a list of “justifications” for a classifier by passing it random crops of the original image and caching those which are sufficient for it to classify the image the way it does, as sketched below.
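Under these relaxed requirements, a segmentation-free variant might look like the following sketch, where random crops that preserve the predicted class are cached as justifications (again, all names are illustrative placeholders):

```python
# A segmentation-free sketch: sample random crops of the image and cache those
# that are, on their own, sufficient for the classifier to keep its prediction.
import numpy as np

def random_crop_justifications(image, classifier_fn, target_class,
                               n_trials=200, min_frac=0.2, rng=None):
    rng = np.random.default_rng() if rng is None else rng
    h, w = image.shape[:2]
    justifications = []
    for _ in range(n_trials):
        ch = rng.integers(int(min_frac * h), h + 1)        # random crop height
        cw = rng.integers(int(min_frac * w), w + 1)        # random crop width
        top = rng.integers(0, h - ch + 1)
        left = rng.integers(0, w - cw + 1)
        canvas = np.zeros_like(image)                      # black out everything else
        canvas[top:top + ch, left:left + cw] = image[top:top + ch, left:left + cw]
        pred = int(np.argmax(classifier_fn(canvas[np.newaxis])[0]))
        if pred == target_class:                           # the crop alone is sufficient
            justifications.append((top, left, ch, cw))
    return justifications
```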

This concludes the second post, which focused on using LIME to produce justifications.

In the next post, I hope to discuss the usage of these “justifications” in practice during my thesis project, and how they can be used to improve a classifier’s accuracy.
