By Anonymous User
Review Details
Reviewer has chosen to be Anonymous
Overall Impression: Good
Content:
Technical Quality of the paper: Good
Originality of the paper: Yes, but limited
Adequacy of the bibliography: No
Presentation:
Adequacy of the abstract: Yes
Introduction: background and motivation: Good
Organization of the paper: Satisfactory
Level of English: Satisfactory
Overall presentation: Excellent
Detailed Comments:
The authors describe an interactive approach for adapting/aligning neuro-symbolic models via knowledge acquisition that builds on the interpretability of the high-level concepts acquired by NeSy architectures. They also discuss two successful examples of this framework.
The paper is well written and easy to follow. The applications are also promising, and I do not see any major issue with them (but see below). I am a strong supporter of concept-based explainability and of interactive explanation-driven adaptation, and I think this paper is going in the right direction.
The only real issue I see with the paper is that it is relatively light on pointers to relevant literature. I also see a couple of potential issues in the core design, detailed below in point 4.
1) A very significant recent paper is:
- Kambhampati S, Sreedharan S, Verma M, Zha Y, Guan L. "Symbols as a lingua franca for bridging human-AI chasm for explainable and advisable AI systems." AAAI 2022.
Many of the points raised therein strongly resonate with the arguments found in this paper.
2) The idea of building explanations on top of concepts either output by or extracted from an ML/DL model is quite well studied. Particularly relevant is the class of "concept-based models", which essentially implement a bare-bones neuro-symbolic pipeline in which the reasoning step is constrained to be interpretable (I include a minimal sketch of this structure after the references below). For a recent overview, see:
- Schwalbe, "Concept Embedding Analysis: A Review", 2022.
More recent models (not covered in this overview, if I remember correctly) and techniques include:
- Espinosa Zarlenga et al. "Concept embedding models: Beyond the accuracy-explainability trade-off." NeurIPS 2022.
- Marconato et al. "GlanceNets: Interpretable, leak-proof concept-based models." NeurIPS 2022.
- Yang et al., "Language in a bottle: Language model guided concept bottlenecks for interpretable image classification". CVPR 2023.
- Oikarinen et al. "Label-Free Concept Bottleneck Models." arXiv 2023.
The last two architectures leverage LLMs to implement the concepts. The bridge between rule-based reasoning and self-explainable neural networks has been considered in:
- Lee et al. "Self-explaining deep models with logic rule reasoning." NeurIPS 2022.
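To make the "bare-bones pipeline" remark above concrete, here is a minimal concept-bottleneck sketch (hypothetical layer sizes and concept counts, written in PyTorch; it is not the architecture of the paper under review, just the generic pattern these works share): a neural encoder predicts concept activations, and only an interpretable layer over those concepts produces the task prediction.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class ConceptBottleneck(nn.Module):
        def __init__(self, n_features: int, n_concepts: int, n_classes: int):
            super().__init__()
            # neural perception: raw input -> concept logits (hypothetical sizes)
            self.encoder = nn.Sequential(
                nn.Linear(n_features, 64), nn.ReLU(), nn.Linear(64, n_concepts)
            )
            # interpretable reasoning: a single linear map whose weights can be
            # read off as concept-to-class rules
            self.reasoner = nn.Linear(n_concepts, n_classes)

        def forward(self, x):
            concepts = torch.sigmoid(self.encoder(x))  # concept activations in [0, 1]
            return self.reasoner(concepts), concepts

    # Joint training sketch: supervise both the concepts and the final label.
    model = ConceptBottleneck(n_features=16, n_concepts=4, n_classes=3)
    x = torch.randn(8, 16)
    c_true = torch.randint(0, 2, (8, 4)).float()
    y_true = torch.randint(0, 3, (8,))
    y_logits, c_pred = model(x)
    loss = F.cross_entropy(y_logits, y_true) + F.binary_cross_entropy(c_pred, c_true)
    loss.backward()

The point is simply that the concepts are the only interface between perception and prediction, which is what makes explanation- and knowledge-level interaction possible in the first place.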
3) The idea of interacting with models through explanations for the purpose of adapting the model itself is also quite well known; see this overview:
- Teso et al. "Leveraging explanations in interactive machine learning: An overview." Frontiers in AI 2023.
Particularly relevant are:
- Stammer et al. "Right for the right concept: Revising neuro-symbolic concepts by interacting with their explanations." CVPR 2021.
- Bontempelli et al. "Concept-level Debugging of Part-Prototype Networks." NeurIPS 2022.
These two works focus on correcting NeSy and concept-based models via explanations/rules, and the overall idea is very much related to what is being proposed here.
4) The big issue I was referring to is the following: in order for concept-based communication to be successful, the machine's concepts need to be aligned with the human's. This is a very non-trivial problem.
There are at least two threads of research that are very relevant here. One is that on concept leakage:
- Mahinpei et al. "Promises and pitfalls of black-box concept learning models." arXiv 2021.
- Margeloiu et al. "Do concept bottleneck models learn as intended?." arXiv 2021.
- Havasi et al. "Addressing leakage in concept bottleneck models." NeurIPS 2022.
- Marconato et al. "GlanceNets: Interpretable, leak-proof concept-based models." NeurIPS 2022.
In short, concept leakage entails that a concept (say, that of "red" or of "digit 4") unintentionally captures information about additional, unobserved concepts ("blue" or "digit 0"). Hence, the meaning that the machine assigns to that concept differs from what the human intends. This is a general problem.
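To make the leakage point concrete, here is a minimal sketch of how it is typically detected (simulated data and hypothetical concept names, in the spirit of the probing experiments in the papers above rather than any specific implementation): if a probe can recover an unobserved concept from the soft score of an observed one, that score carries more meaning than the human expects.

    import numpy as np
    from sklearn.model_selection import cross_val_score
    from sklearn.tree import DecisionTreeClassifier

    rng = np.random.default_rng(0)
    n = 1000
    red = rng.integers(0, 2, n)    # ground truth for the observed concept "red"
    blue = rng.integers(0, 2, n)   # ground truth for the unobserved concept "blue"

    # Simulated soft predictions for "red": mostly driven by "red", but the score
    # also shifts slightly with "blue" -- this shift is the leak.
    red_soft = 0.7 * red + 0.2 * blue + 0.1 * rng.random(n)

    # A shallow probe trying to recover "blue" from the "red" score alone.
    probe = DecisionTreeClassifier(max_depth=3, random_state=0)
    acc = cross_val_score(probe, red_soft.reshape(-1, 1), blue, cv=5).mean()
    print(f"probe accuracy for 'blue' from the 'red' score: {acc:.2f} (chance ~ 0.5)")

In practice the soft scores would come from a trained bottleneck rather than being simulated, but above-chance probe accuracy is exactly the symptom the leakage papers describe.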
In the context of NeSy approaches, there is a different source of semantic misalignment, namely reasoning shortcuts:
- Marconato et al. "Not All Neuro-Symbolic Concepts Are Created Equal: Analysis and Mitigation of Reasoning Shortcuts." arXiv 2023.
Reasoning shortcuts occur when NeSy models learn concepts that, despite perfectly satisfying the logical specification, are very different from the ones we would expect them to learn. This paper shows that this phenomenon is quite common in NeSy applications and, specifically, that it affects LTNs.
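To illustrate the phenomenon, here is the usual XOR-style toy example (my own illustration, not taken from the paper under review or from its experiments): when the knowledge only requires "label = c1 XOR c2", an extractor that flips both concepts satisfies every constraint just as well as the intended one.

    from itertools import product

    # Ground-truth concept pairs and their labels under the knowledge y = c1 XOR c2.
    examples = [(0, 0), (0, 1), (1, 0), (1, 1)]
    labels = [a ^ b for a, b in examples]

    # Candidate "perception" maps for each concept: identity or negation.
    maps = {"id": lambda v: v, "neg": lambda v: 1 - v}

    for name1, name2 in product(maps, maps):
        f1, f2 = maps[name1], maps[name2]
        consistent = all(f1(a) ^ f2(b) == y for (a, b), y in zip(examples, labels))
        print(f"c1={name1:>3}, c2={name2:>3} -> satisfies the knowledge: {consistent}")

    # Both (id, id) and (neg, neg) satisfy the knowledge on every example, yet
    # (neg, neg) gives each concept the opposite of its intended meaning.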
Reasoning shortcuts are especially concerning in interactive adaptation, because the knowledge provided by the user may not have the intended effect on the learned concepts. This could thwart attempts at communicating with users and ultimately compromise alignment.
For points 1-3, I would appreciate it if the authors could briefly position their work against this literature (or explain why they believe it is not relevant). For point 4, I'd be grateful if they could mention these two roadblocks to aligning concept semantics -- leakage and reasoning shortcuts -- and (briefly) discuss how they impact the proposed approach.