Towards a Neurosymbolic Understanding of Hidden Neuron Activations

Tracking #: 800-1791

Flag: Review Assignment Stage

Authors: 

Abhilekha Dalal
Rushrukh Rayan
Adrita Barua
Samatha Ereshi Akkamahadevi
Avishek Das
Cara Widmer
Eugene Y. Vasserman
Md Kamruzzaman Sarker
Pascal Hitzler

Responsible editor: 

Guest Editors NeSy 2024

Submission Type: 

Article in Special Issue (note in cover letter)

Abstract: 

With the widespread adoption of Deep Learning techniques, the need for explainability and trustworthiness is increasingly critical, especially in safety-sensitive applications and for improved debugging, given the black-box nature of these models. The Explainable AI (XAI) literature offers various helpful techniques; however, many approaches use a secondary deep learning-based model to explain the primary model’s decisions or require domain expertise to interpret the explanations. A relatively new approach involves explaining models using high-level, human-understandable concepts. While these methods have proven effective, an intriguing area of exploration lies in using a white-box technique to explain the probing model. We present a novel, model-agnostic, post-hoc Explainable AI method that provides meaningful interpretations for hidden neuron activations. Our approach leverages a Wikipedia-derived concept hierarchy, encompassing approximately 2 million classes as background knowledge, and uses deductive reasoning-based Concept Induction to generate explanations. Our method demonstrates competitive performance across various evaluation metrics, including statistical evaluation, concept activation analysis, and benchmarking against contemporary methods. Additionally, a specialized study with Large Language Models (LLMs) highlights how LLMs can serve as explainers in a manner similar to our method, showing comparable performance with some trade-offs. Furthermore, we have developed a tool called ConceptLens, which enables users to test custom images and obtain explanations for model decisions. Finally, we introduce a fully reproducible, end-to-end system that simplifies replication of our results.

Full PDF Version: 

Tags: 

  • Under Review