By Alessandro Oltramari
Review Details
Reviewer has chosen not to be Anonymous
Overall Impression: Average
Content:
Technical Quality of the paper: Average
Originality of the paper: Yes, but limited
Adequacy of the bibliography: Yes, but see detailed comments
Presentation:
Adequacy of the abstract: Yes
Introduction: background and motivation: Good
Organization of the paper: Satisfactory
Level of English: Satisfactory
Overall presentation: Good
Detailed Comments:
The authors investigate the emerging area of "Neuro-Symbolic Knowledge Engineering" by conducting a data-driven analysis of the state of the art. In particular, they leverage a large-scale Systematic Mapping Study, in which semantic technologies are initially used to identify 1986 relevant papers. These initial results are then narrowed down to a corpus of 476 papers, using "selection and exclusion criteria".
Question 1: Can the authors list these criteria and explain them? This would be an instrumental piece of information for understanding how the data have been obtained, and whether this part of the study - which seems qualitative in nature - underwent any validation stage (e.g., an inter-annotator agreement measure).
---
The paper indicates that neuro-symbolic approaches are mostly focused on knowledge graphs, and only marginally on ontologies and taxonomies, reflecting a change in the KE area of focus (this hardly comes as a surprising result, as the topic distribution of a top-tier conference like ISWC suggests). The study described in this paper also shows that neural networks are the most frequent method used for knowledge engineering tasks, which clearly correlates with the prominent role that neural approaches play in machine learning these days. In this regard, the "elephant in the room" here is Large Language Models (LLMs): as a fairly recent paper shows [1], there is an increasing trend of using LLMs as knowledge bases [2], or as sources to construct/complete knowledge graphs, which are typical KE tasks, as the authors of this manuscript themselves argue. This leads me to another major point:
Question 2: Can the authors "unpack" the category of "neural networks", and address the impact that LLMs are specifically having in performing/supporting KE tasks? Introducing such key information would help to make the study described in the paper more relevant and useful for the community.
---
Other issues:
- p.3, line 21: it can be misleading to put "text" and "images" alongside "embeddings" as examples of "non-symbolic data", as opposed to symbolic structures like "semantic entities or relations". Textual elements, i.e., "words", are typically considered symbolic, whereas their corresponding vectorial representations are not. A similar argument holds for images. I am assuming that the authors are implicitly referring to sub-symbolic representations of text and images, but this should be clarified.
- p.3, line 40: although it is clear why the 123 papers are more relevant for the authors to analyze, it also seems interesting to understand which application areas "Neuro-Symbolic Knowledge Engineering" is emerging from. Details about the papers on domain-specific tasks (incidentally, the majority of the initially selected 476 papers!) would be quite interesting - especially to practitioners.
- p.4, lines 31-41: turning the bullet points into a table may improve the readability of the paper. In fact, as one proceeds through the following sections and reads about the patterns, it feels natural to go back to the bullet points and consult them.
References:
[1] Allen, B.P., Stork, L. and Groth, P., 2023. Knowledge Engineering using Large Language Models. arXiv preprint arXiv:2310.00637.
[2] Petroni, F., Rocktäschel, T., Lewis, P., Bakhtin, A., Wu, Y., Miller, A.H. and Riedel, S., 2019. Language models as knowledge bases? arXiv preprint arXiv:1909.01066.