By Anonymous User
Review Details
Reviewer has chosen to be Anonymous
Overall Impression: Good
Content:
Technical Quality of the paper: Good
Originality of the paper: Yes, but limited
Adequacy of the bibliography: Yes, but see detailed comments
Presentation:
Adequacy of the abstract: Yes
Introduction: background and motivation: Good
Organization of the paper: Satisfactory
Level of English: Satisfactory
Overall presentation: Good
Detailed Comments:
In this paper, the authors provide an overview of the existing literature on neuro-symbolic methods for trustworthiness, where this notion is seen as composed of five dimensions: interpretability, safety, robustness, fairness, and privacy.
A strength of the paper is its clarity, especially with regard to the methodological choices made for selecting the articles to consider in the survey.
Nonetheless, the work in its present form has some crucial weaknesses to be addressed:
\begin{itemize}
\item[--] First and foremost, the article lacks a proper discussion of the surveyed papers. This is usually done by providing a concise yet effective description of each surveyed paper. Alternatively, a survey can highlight common perspectives, strategies, results, etc. among the papers. This is done in part in Sections 4.1 and 4.2. However, we believe that the level of detail in these sections is not sufficiently fine-grained to fully deliver the potential of a survey paper, where the reader should gain specific knowledge about the surveyed articles.
\item[--] There seems to be an inconsistency in the information about the timeframe considered for the survey. Although it is initially declared that the work constitutes a systematic review of the recent literature from 2021 to 2022 (page 2 line 8), the authors then say that they ``focused on papers published in top academic venues from 2021 to 2023, including those available up to May 2023" (page 5 line 8). Moreover, Figure 1(a) is in line with the former statement and contains no information on 2023 papers.
\item[--] Finally, the general impression is that the actual scope of this survey is interpretability alone. As the authors themselves point out, ``interpretability is the most extensively addressed aspect of trustworthiness" (page 3, line 24). The other four aspects -- robustness, fairness, privacy, and safety -- receive only marginal discussion in the work, for two different reasons. As far as robustness is concerned, the authors intentionally left out studies mainly centered on robustness, arguing that the concept is overly intertwined with that of accuracy. Regarding fairness, privacy, and safety, the reader must instead wait until Section 4.4 to learn that there is a scarcity of neuro-symbolic applications for these topics (only one paper is mentioned, in relation to fairness).
\end{itemize}
Questions:
\begin{itemize}
\item[--] What justifies the methodological choice of focusing on proceedings papers only? Neuro-symbolic AI is a hybrid research topic at the intersection of computer science and logic, fields where journal papers have high relevance (e.g., the journal Neurosymbolic AI, to mention one).
\item[--] Why was the following paper excluded from the survey? Wagner, B. \& d'Avila Garcez, A. S. (2021). Neural-symbolic integration for fairness in AI. In: CEUR Workshop Proceedings. AAAI 2021 Spring Symposium on Combining Machine Learning and Knowledge Engineering (AAAI-MAKE 2021), 22-24 Mar 2021, California, USA.
\end{itemize}
Finally, we suggest some minor improvements to the text:
\begin{itemize}
\item[--] The legend of Figure 1(b) contains the typo ``where taken";
\item[--] Figure 3: we suggest replacing the abbreviation ``AMB" for ``ambiguous" with a more self-explanatory expression -- spelling out ``ambiguous", or using ``unclear", should be fine.
\end{itemize}
Final Meta-Level Thoughts:
I think it would be good to have some comparison to existing works that discuss XAI and trustworthiness outside the NeSy AI context, particularly with regard to human modelling, and to well-established surveys and experimental evaluations (e.g., the stakeholder discussion by Adrian Weller, and the trustworthiness and XAI taxonomy by Nathalie Rodriguez and colleagues). The attempt to classify existing works is appreciated, but further work is needed to update the reader on non-NeSy-AI solutions. In particular: how well are NeSy AI solutions addressing these concerns? What remains hard to do (see the work on XAI planning and on explaining in dynamic domains)?