Submission Type:
Article in Special Issue (note in cover letter)
Abstract:
With the widespread adoption of Deep Learning techniques, explainability and trustworthiness have become increasingly critical, especially in safety-sensitive applications and for improved debugging, given the black-box nature of these models. The Explainable AI (XAI) literature offers various helpful techniques; however, many approaches use a secondary deep
learning-based model to explain the primary model’s decisions or require domain expertise to interpret the explanations. A relatively
new approach involves explaining models using high-level, human-understandable concepts. While these methods have proven effective, an intriguing avenue of exploration is to use a white-box technique as the probing mechanism.
We present a novel, model-agnostic, post-hoc Explainable AI method that provides meaningful interpretations for hidden neuron
activations. Our approach leverages a Wikipedia-derived concept hierarchy comprising approximately 2 million classes as background knowledge, and uses deductive reasoning-based Concept Induction to generate explanations. Our method demonstrates
competitive performance across various evaluation metrics, including statistical evaluation, concept activation analysis,
and benchmarking against contemporary methods. Additionally, a specialized study with Large Language Models (LLMs) highlights
how LLMs can serve as explainers in a manner similar to our method, showing comparable performance with some
trade-offs. Furthermore, we have developed a tool called ConceptLens, which enables users to test custom images and obtain explanations for model decisions. Finally, we introduce a fully reproducible, end-to-end system that simplifies replication of our experiments and results.