Submission Type:
Article in Special Issue (note in cover letter)
Abstract:
With the widespread adoption of deep learning techniques, the need for explainability and trustworthiness has become increasingly
critical, especially in safety-sensitive applications and for improved debugging, given the black-box nature of these
models. The Explainable AI (XAI) literature offers various helpful techniques; however, many approaches either use a secondary deep-learning-based
model to explain the primary model's decisions or require domain expertise to interpret the explanations. A relatively
new line of work explains models using high-level, human-understandable concepts. While these concept-based methods have
proven effective, an intriguing direction is to use a white-box technique as the probing model.
We present a novel, model-agnostic, post-hoc Explainable AI method that provides meaningful interpretations for hidden neuron
activations. Our approach leverages a Wikipedia-derived concept hierarchy of approximately 2 million classes
as background knowledge and uses Concept Induction, based on deductive reasoning, to generate explanations. Our method demonstrates
competitive performance across various evaluation metrics, including statistical evaluation, concept activation analysis,
and benchmarking against contemporary methods. Additionally, a dedicated study with Large Language Models (LLMs) shows
that LLMs can serve as explainers in a manner similar to our method, achieving comparable performance with some
trade-offs. Furthermore, we have developed a tool called ConceptLens that lets users test custom images and obtain explanations
for model decisions. Finally, we introduce a fully reproducible, end-to-end pipeline that makes it straightforward to
replicate our system and results.
Cover Letter:
Submission to the NeSy 2024 special issue.
Please note that one of the authors has a conflict of interest due to his position as Editor-in-Chief (EiC) of the journal.
The paper will need to be handled in a way that preserves the anonymity of the reviewers if they so choose, i.e.:
* Contact reviewers *by email* (outside the system). You can still point them to the paper's page on the journal website, of course. Ask them whether they want to remain anonymous.
* If they do *not* want to remain anonymous, you can invite them as reviewers through the website interface.
* If they want to stay anonymous, they need to send their reviews to you by email, and you will have to upload each review for them (technically under your name, but state at the beginning of the review that it is an anonymous review).
Thanks!
Pascal.