By Anonymous User
Review Details
Reviewer has chosen to be Anonymous
Overall Impression: Good
Content:
Technical Quality of the paper: Good
Originality of the paper: Yes, but limited
Adequacy of the bibliography: No
Presentation:
Adequacy of the abstract: Yes
Introduction: background and motivation: Good
Organization of the paper: Satisfactory
Level of English: Satisfactory
Overall presentation: Excellent
Detailed Comments:
The authors describe an interactive approach for adapting/aligning neuro-symbolic models via knowledge acquisition that builds on the interpretability of the high-level concepts acquired by NeSy architectures. They also discuss two successful examples of this framework.
The paper is well written and easy to follow. The applications are also promising, and I do not see any major issue with them (but see below). I am a strong supporter of concept-based explainability and of interactive explanation-driven adaptation, and I think this paper is going in the right direction.
The only real issue I see with the paper is that it is relatively light on pointers to relevant literature. I also see a couple of potential issues in the core design, detailed below in point 4.
1) A very significant recent paper is:
- Kambhampati S, Sreedharan S, Verma M, Zha Y, Guan L. "Symbols as a lingua franca for bridging human-AI chasm for explainable and advisable AI systems." AAAI 2022.
Many of the points raised therein strongly resonate with the arguments found in this paper.
2) The idea of building explanations on top of concepts either output by or extracted from an ML/DL model is quite well studied. Particularly relevant is the class of "concept-based models", which essentially implement a bare-bones neuro-symbolic pipeline in which the reasoning step is constrained to be interpretable (I include a minimal sketch of this structure after the references below). For a recent overview, see:
- Schwalbe, "Concept Embedding Analysis: A Review", 2022.
More recent models (not covered in this overview, if I remember correctly) and techniques include:
- Espinosa Zarlenga et al. "Concept embedding models: Beyond the accuracy-explainability trade-off." NeurIPS 2022.
- Marconato et al. "GlanceNets: Interpretable, leak-proof concept-based models." NeurIPS 2022.
- Yang et al., "Language in a bottle: Language model guided concept bottlenecks for interpretable image classification". CVPR 2023.
- Oikarinen et al. "Label-Free Concept Bottleneck Models." arXiv 2023.
The last two architectures leverage LLMs to implement the concepts. The bridge between rule-based reasoning and self-explainable neural networks has been considered in:
- Lee et al. "Self-explaining deep models with logic rule reasoning." NeurIPS 2022.
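To make the "bare-bones pipeline" remark above concrete, here is a minimal concept-bottleneck sketch (hypothetical layer sizes and concept counts, written in PyTorch; it is not the architecture of the paper under review, just the generic pattern these works share): a neural encoder predicts concept activations, and only an interpretable layer over those concepts produces the task prediction.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class ConceptBottleneck(nn.Module):
        def __init__(self, n_features: int, n_concepts: int, n_classes: int):
            super().__init__()
            # neural perception: raw input -> concept logits (hypothetical sizes)
            self.encoder = nn.Sequential(
                nn.Linear(n_features, 64), nn.ReLU(), nn.Linear(64, n_concepts)
            )
            # interpretable reasoning: a single linear map whose weights can be
            # read off as concept-to-class rules
            self.reasoner = nn.Linear(n_concepts, n_classes)

        def forward(self, x):
            concepts = torch.sigmoid(self.encoder(x))  # concept activations in [0, 1]
            return self.reasoner(concepts), concepts

    # Joint training sketch: supervise both the concepts and the final label.
    model = ConceptBottleneck(n_features=16, n_concepts=4, n_classes=3)
    x = torch.randn(8, 16)
    c_true = torch.randint(0, 2, (8, 4)).float()
    y_true = torch.randint(0, 3, (8,))
    y_logits, c_pred = model(x)
    loss = F.cross_entropy(y_logits, y_true) + F.binary_cross_entropy(c_pred, c_true)
    loss.backward()

The point is simply that the concepts are the only interface between perception and prediction, which is what makes explanation- and knowledge-level interaction possible in the first place.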
3) The idea of interacting with models through explanations for the purpose of adapting the model itself is also quite well known; see this overview:
- Teso et al. "Leveraging explanations in interactive machine learning: An overview." Frontiers in AI 2023.
Particularly relevant are:
- Stammer et al. "Right for the right concept: Revising neuro-symbolic concepts by interacting with their explanations." CVPR 2021.
- Bontempelli et al. "Concept-level Debugging of Part-Prototype Networks." NeurIPS 2022.
These two works focus on correcting NeSy and concept-based models via explanations/rules, and the overall idea is very much related to what is being proposed here.
4) The big issue I was referring to is the following: in order for concept-based communication to be successful, the machine's concepts need to be aligned with the human's. This is a very non-trivial problem.
There are at least two threads of research that are very relevant here. One is that on concept leakage:
- Mahinpei et al. "Promises and pitfalls of black-box concept learning models." arXiv 2021.
- Margeloiu et al. "Do concept bottleneck models learn as intended?." arXiv 2021.
- Havasi et al. "Addressing leakage in concept bottleneck models." NeurIPS 2022.
- Marconato et al. "GlanceNets: Interpretable, leak-proof concept-based models." NeurIPS 2022.
In short, concept leakage entails that a concept (say, that of "red" or of "digit 4") unintentionally captures information about additional, unobserved concepts ("blue" or "digit 0"). Hence, the meaning that the machine assigns to that concept differs from what the human intends. This is a general problem.
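To make the leakage point concrete, here is a minimal sketch of how it is typically detected (simulated data and hypothetical concept names, in the spirit of the probing experiments in the papers above rather than any specific implementation): if a probe can recover an unobserved concept from the soft score of an observed one, that score carries more meaning than the human expects.

    import numpy as np
    from sklearn.model_selection import cross_val_score
    from sklearn.tree import DecisionTreeClassifier

    rng = np.random.default_rng(0)
    n = 1000
    red = rng.integers(0, 2, n)    # ground truth for the observed concept "red"
    blue = rng.integers(0, 2, n)   # ground truth for the unobserved concept "blue"

    # Simulated soft predictions for "red": mostly driven by "red", but the score
    # also shifts slightly with "blue" -- this shift is the leak.
    red_soft = 0.7 * red + 0.2 * blue + 0.1 * rng.random(n)

    # A shallow probe trying to recover "blue" from the "red" score alone.
    probe = DecisionTreeClassifier(max_depth=3, random_state=0)
    acc = cross_val_score(probe, red_soft.reshape(-1, 1), blue, cv=5).mean()
    print(f"probe accuracy for 'blue' from the 'red' score: {acc:.2f} (chance ~ 0.5)")

In practice the soft scores would come from a trained bottleneck rather than being simulated, but above-chance probe accuracy is exactly the symptom the leakage papers describe.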
In the context of NeSy approaches, there is a different source of semantic misalignment, namely reasoning shortcuts:
- Marconato et al. "Not All Neuro-Symbolic Concepts Are Created Equal: Analysis and Mitigation of Reasoning Shortcuts." arXiv 2023.
Reasoning shortcuts occur when NeSy models learn concepts that, despite perfectly satisfying the logical specification, are very different from the ones we would expect them to learn. This paper shows that this phenomenon is quite common in NeSy applications and, specifically, that it affects LTNs.
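To illustrate the phenomenon, here is the usual XOR-style toy example (my own illustration, not taken from the paper under review or from its experiments): when the knowledge only requires "label = c1 XOR c2", an extractor that flips both concepts satisfies every constraint just as well as the intended one.

    from itertools import product

    # Ground-truth concept pairs and their labels under the knowledge y = c1 XOR c2.
    examples = [(0, 0), (0, 1), (1, 0), (1, 1)]
    labels = [a ^ b for a, b in examples]

    # Candidate "perception" maps for each concept: identity or negation.
    maps = {"id": lambda v: v, "neg": lambda v: 1 - v}

    for name1, name2 in product(maps, maps):
        f1, f2 = maps[name1], maps[name2]
        consistent = all(f1(a) ^ f2(b) == y for (a, b), y in zip(examples, labels))
        print(f"c1={name1:>3}, c2={name2:>3} -> satisfies the knowledge: {consistent}")

    # Both (id, id) and (neg, neg) satisfy the knowledge on every example, yet
    # (neg, neg) gives each concept the opposite of its intended meaning.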
Reasoning shortcuts are especially concerning in interactive adaptation, because the knowledge provided by the user may not have the intended effect on the learned concepts. This could thwart attempts at communicating with users and ultimately compromise alignment.
For points 1-3, I would appreciate it if the authors could briefly position their work against this literature (or explain why they believe it is not relevant). For point 4, I'd be grateful if they could mention these two roadblocks to aligning concept semantics -- leakage and reasoning shortcuts -- and (briefly) discuss how they impact the proposed approach.