By Jacopo de Berardinis
Review Details
Reviewer has chosen not to be Anonymous
Overall Impression: Good
Content:
Technical Quality of the paper: Average
Originality of the paper: Yes, but limited
Adequacy of the bibliography: Yes, but see detailed comments
Presentation:
Adequacy of the abstract: Yes
Introduction: background and motivation: Bad
Organization of the paper: Needs improvement
Level of English: Satisfactory
Overall presentation: Average
Detailed Comments:
This article contributes new design patterns for LLM-based Neuro-Symbolic systems by extending the Boxology approach. After conceptualising the basic patterns for current Transformer architectures and extending them within the context of language models (prompting, instructions, etc.), the authors taxonomise LLM-based NeSy models into three categories: KG-enhanced LLMs, LLM-augmented KGs, and synergized LLMs and KGs; and propose a referential pattern for each of them, making a distinction between the training and the inference stages. The article concludes by presenting seven applications of the proposed patterns to specific NeSy (and non-NeSy) systems. Overall, this work contributes an original approach to document, analyse, and compare NeSy models and systems, and appears to provide a flexible paradigm for conceptualising current (and potentially future) methods, as demonstrated in the use cases section. Nevertheless, while the core contribution of this work is sound, I believe that this manuscript requires substantial work to improve the introduction and the motivations and, most importantly, to explain the authors' work at a reasonable level of detail and demonstrate practical applications beyond showing its expressiveness in representing current LLM-based NeSy models and systems.
Strengths
---------
- The approach is sound and intuitive, with diagrams supporting the understanding of the various patterns. The proposition is well-scoped and contextualised (especially 3.2.1, which describes prompts and instructions), and the authors demonstrate the expressiveness of their paradigm in representing various models at different stages and for different processes.
- The continuity with the ongoing Boxology research efforts adds value to the proposal, and shows that there is an overall vision behind this contribution.
- The article is well aligned with current GenAI + NeSy approaches, and the contributions mentioned are all relevant. However, as described below, a more general introduction of Symbolic + NeSy models and systems to highlight their nature and their strengths (even before the inception of LLMs) would have helped to better contextualise this work.
Weaknesses
----------
- Introduction. The first paragraph of the introduction sounds a bit high-level. Overall, it needs to be significantly expanded and improved. Expressions like "OpenAI’s ChatGPT system has changed world of text generation forever" are also a bit informal. Also, "neuro-symbolic approaches" have been around for a while, and the reader may get the impression that they were introduced as a response to LLMs' weaknesses. Moreover, the intended meaning of trustworthiness should be made clearer, as some definitions of trustworthiness already include explainability as a requirement (see, for example, https://digital-strategy.ec.europa.eu/en/library/ethics-guidelines-trust...). Also, I would have appreciated a brief intro on NeSy models (how they are designed, examples of how inference is performed, their key differences from purely statistical approaches, etc.) and systems before starting to introduce Boxology. Finally, when outlining the main contribution of the article, I could not grasp how the proposed approach actually contributes to addressing the challenge of making models/systems more explainable, etc.
- Related work. This section jumps directly to Transformers, as the key technology most current LLMs use. I would recommend elaborating on this from a more foundational perspective. At their core, current LLMs are autoregressive sequence models, which can be implemented with a self-attention network (SAN), where Transformers are a specific type of SAN. Are these patterns applicable only to Transformer-based models? Also, given the focus on LLMs, it would be interesting to expand the current formulation with a Mixture of Experts (MoE) approach.
- Symbolic AI is not the only way to address the "side effects" of LLMs, and there has been substantial research in NLP on making these models explainable, safer, and more factual. I would recommend mentioning some of these efforts, while making clear distinctions with NeSy approaches, as mentioned before.
- Section 2.2 (Processes). I struggled to understand the difference between generation, transformation, and inference, as I guess they could all be seen as instances of transformations? For example, a generation can be seen as a transformation of an input vector (a seeding/priming token, input noise, etc.), so there is often some form of data as input to a generation process. The same applies to inference.
- Section 3. I was expecting more of a step-by-step explanation of each pattern, going through each component (at least for the first patterns introduced), so that the reader can appreciate the overall formulation/definition and easily understand the logic and intuition behind each pattern. Instead, the description of these patterns appeared very fragmented and dispersive to me, often assuming technical knowledge of previous works.
Also, a lot of relevant models from the literature are briefly presented (for example, in Section 3.4.3), but the little information given makes it difficult to understand how they specifically relate to the patterns. In sum, I would recommend focusing this part on explaining how each pattern generalises and applies to each of the models/systems presented. For example, "BERT-MK has a similar dual encoder, but adds additional information from the neighbouring entities ..."; how is this captured by the pattern you are presenting?
- Figure 3B. Why is the classification head on top of the encoder producing a "symbol" as output of the "infer:deduce" process? From my understanding, if the model is trained for classification, what the model is learning is a probability distribution; so, at inference time, you would still have vector "data" (e.g. the logits) from which a "symbol" can be sampled. I believe more information is needed to clarify this.
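The distinction raised above can be sketched as follows (an illustrative example, not taken from the paper; the labels and logit values are made up): the classification head emits vector "data", and the "symbol" only appears once a label is argmax-ed or sampled from the induced distribution.

```python
# Sketch of the data/symbol distinction: a classification head yields
# logits (vector "data"); the "symbol" is a decision over the
# distribution they induce. Labels and values are illustrative only.
import math
import random

labels = ["entailment", "neutral", "contradiction"]
logits = [2.0, 0.5, -1.0]  # vector output of the classification head

# Softmax turns the logits into a probability distribution.
exps = [math.exp(z) for z in logits]
probs = [e / sum(exps) for e in exps]

# The symbolic output is a decision over that distribution:
argmax_label = labels[max(range(len(probs)), key=probs.__getitem__)]
sampled_label = random.choices(labels, weights=probs, k=1)[0]
```

On this reading, the "infer:deduce" box would output `probs` (data), and a separate selection step would produce the symbol, which is the clarification requested above.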
- Section 4. To fully appreciate the expressiveness of the patterns in the use cases, more information on the NeSy systems is needed. This is the case for RAG. Instead, when more information about the system is given, as for "KnowBERT", the explanation of how the architecture relates to the pattern is a bit succinct ("The Boxology pattern for KnowBERT is depicted in Figure 13"). Ideally, each subsection should have both: a reasonable primer on the system (as done for KnowBERT), and a step-by-step explanation of the corresponding pattern and of how it is expected to generalise to other/similar systems within the same group. Section 4.6 is a good example of this.
- Section 4. ChatGPT. While there is published material and resources on the GPT family of models (as the authors rightly mention in the article), to the best of my knowledge, we do not have a fully transparent and comprehensive understanding of the (overall) ChatGPT system (which is indeed more than a model per se). Therefore, I would have chosen an open LLM-based system for this section, such as Llama 3. Also, the opening of Section 4 mentions that the focus of this part is on LLM-based NeSy systems, and finding ChatGPT as the first use case was a bit counterintuitive to me.
- The transition from section 4 to the conclusions felt a bit sharp, and I would recommend adding a discussion section to cover some questions that may arise at that stage. For example, what can be said about the limitations of these patterns in terms of what they are capable of capturing (or not)? What is an intended use case in which documenting a NeSy system brings value for transparency, etc., as originally stated in the abstract and in the introduction (the motivations behind the paper)? What level of granularity does this approach offer, and in which cases may it not be suitable? Most importantly, after reading the article, I was still wondering what the application of these patterns actually enables; clarifying or restating this is important to link back to the motivations outlined in the introduction.
- I got the impression that the current conclusions are a bit rushed: they do not mention the original motivations behind this work, the connection with Boxology and the extension to represent LLM-based NeSy systems, or how the description of such models and systems enables new avenues and opportunities to address the challenges presented by the authors in the introduction. Overall, the current conclusions diminish the potential of this work, and I strongly encourage the authors to expand them accordingly, shedding more light on their contribution.
Questions
---------
- How would this effort relate to existing ML documentation efforts, such as Machine Learning Model Cards? (see, for example, https://modelcards.withgoogle.com/about)
- The current positioning of the article is on LLM-based systems, but I got the impression that the proposed approach could easily scale and be adapted (with little effort) to other generative systems. Given its potential, and the expressiveness of the approach, I would suggest elaborating on this in a discussion (or conclusions) section.
- Page 3, Line 46. What is the systematic study of ~500 papers for? I understand the authors are trying to make an argument on the flexibility and the maturity of the Boxology approach; however, it would be interesting to have more details here. Also, after Page 3, Line 47, the narrative diverges a bit, and it becomes hard to see which points the authors are trying to make in relation to the work presented in this article; without more context, the reader may get the impression that all this work is not relevant. Other remarks, instead, sound a bit speculative at this stage (such as "This framework and implementation could be used in the implementation of the design patterns").
- Page 3, Line 48. What is EASY-AI?
- Page. Can Boxology be seen as a formal language to describe ML models or is it mainly a visual paradigm?
- Section 3.1. Could these patterns be used to describe Multimodal LLMs?
- Page 5, Line 7. I would recommend referring to classification models/heads rather than "classification systems", which may suggest a more complex process defined on top of a predictive model.
- Sections 3.4.1. and 3.4.2. What is the intended meaning of "KG model"? I had the impression that it is sometimes used to refer to a KG (3.4.1, Figure 4) but also to a model for KG embeddings (3.4.2.). I would recommend introducing this beforehand in order to avoid confusion!
- Page 9, Line 51. "The selected papers are chosen ... but also to act as a fluent language interface or a formal language interface". This sentence is not clear to me, and I would appreciate additional context to explain this part.
Minor comments
--------------
- Page 3, line 3. Missing citation?
- Page 3, Line 4. Which authors?
- Page 3, Line 30. Broken citation.
- Page 3. Machine Learning is used earlier, but the acronym is only introduced on page 4. Also, ML is never used in the paper. Similarly, the "Hets" acronym is introduced in line 44, page 4, but never used.
- Page 5, Line 19 "classical machine learning systems". Do you mean support vector machines, decision trees / random forests?
- Page 5, Line 25. Vision transformers: please provide some references and explain how this would translate to such architectures.
- While readable, figures' resolution could be improved, and their captions, especially for Figures 1-3, could be expanded to provide more information on how to read them (even if this may sound a bit redundant with the sections).
- The use of sub-labelling for images is not consistent. Sometimes, a sub-label is capitalised (Figure 3A), other times it is not (Figure 1a).
- Page 6 Line 44. RAG is mentioned, but not introduced to the reader.
- Page 8 Line 28. "LLM-based NeSy Design Patterns in Application". Do the authors refer to inference?
- Page 8 Line 48, can *be* used