By Floris van der Hengst
Review Details
Reviewer has chosen not to be Anonymous
Overall Impression: Weak
Content:
Technical Quality of the paper: Weak
Originality of the paper: No
Adequacy of the bibliography: Yes, but see detailed comments
Presentation:
Adequacy of the abstract: Yes
Introduction: background and motivation: Limited
Organization of the paper: Satisfactory
Level of English: Unsatisfactory
Overall presentation: Weak
Detailed Comments:
# Summary of paper in a few sentences.
This paper argues for the use of requirements throughout the machine learning model life cycle based on an analysis of two important and safety-critical application domains with requirements. A cyclical development model for machine learning systems is proposed in which requirements can be updated and used in each of its five stages. Within this model, neuro-symbolic approaches are identified as a key solution within the model creation and model training stages.
# Reasons to accept
The inclusion of (safety) requirements is an important challenge for machine learning practitioners and researchers. It makes sense to consider neuro-symbolic approaches for this challenge.
# Reasons to reject
The paper looks like a position paper and these may not be in scope for this journal.
The main contribution of the paper seems to be the pyramid model in Figure 4, but this model is very general, its connection to neuro-symbolic AI is poorly argued and it lacks novelty (see other remarks).
# Criteria
## Significance
This paper argues that requirements need to be incorporated into the entire machine learning development pipeline. Because some requirements can be expressed with knowledge bases / symbolic AI, the significance to NAIJ is reasonable.
## Background
This work cites an appropriate number of works in which some neuro-symbolic approach is used to include requirements in the model creation and model training stages (using terminology from the paper). However, the main contribution of this paper is the pyramid model for the machine learning (system) development process. This calls for an embedding of this novel model into existing models of this process. Many reference models are cyclical rather than linear, as suggested in this paper in e.g. Figure 1, and many of them explicitly mention the role of requirements throughout the process:
- Martínez-Plumed, Fernando, et al. "CRISP-DM twenty years later: From data mining processes to data science trajectories." IEEE Transactions on Knowledge and Data Engineering 33.8 (2019): 3048-3061.
- Ashmore, Rob, Radu Calinescu, and Colin Paterson. "Assuring the machine learning lifecycle: Desiderata, methods, and challenges." ACM Computing Surveys (CSUR) 54.5 (2021): 1-39.
- Haakman, Mark, et al. "AI lifecycle models need to be revised: An exploratory study in Fintech." Empirical Software Engineering 26 (2021): 1-29.
The paper would also benefit from a discussion of *how* requirements can effectively be modeled (the leftmost arrow in the pyramid model in Figure 4) and an analysis of how this impacts the right-hand side of the model. This may also strengthen the link with neuro-symbolic AI.
## Novelty
This paper brings few novel insights or viewpoints. The paper mentions the cyclical nature of the pyramid model and the role of requirements in it as innovations, but both are established (see earlier remarks). What is left is (i) the analysis of two domains and (ii) the link between requirements in ML and neuro-symbolic AI: (i) the two analyses are fully based on existing work and (ii) the link between requirements in ML and neuro-symbolic AI is an established one, as evidenced by the examples cited in the paper.
## Technical quality
The paper is neither a survey nor a research paper to me, making technical quality hard to assess. However, there is room for improvement in the argumentation in this work. To name some issues:
- in the introduction it is claimed that requirements in any application domain can be obtained from an existing body of knowledge in that domain, and that there will be a continued push for adoption of systems even when these may have unintended consequences (ln15-18). Claims like these need to be supported.
- there are many mentions of 'facts' and 'obvious' things which are not supported by evidence
- the presented pyramid model is contrasted with a "traditional 'performance-driven' [..] pipeline". Where is this traditional pipeline suggested (see note on background)? Furthermore, the idea that predictive performance is the only metric to optimize for "traditionally" is immediately contradicted in the work itself, i.e. by mentioning fairness, robustness, explainability ... So the current model is already in use; what novelty remains?
- p4 ln41: "threshold is often picked equal to 0.5": this threshold is usually set based on specifics of the problem (misclassification costs, class imbalance, etc.).
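
To illustrate the remark on the threshold, here is a minimal sketch of how misclassification costs can determine it. It assumes a binary classifier with calibrated probabilities and zero cost for correct decisions; the function names and the cost values are hypothetical and not taken from the paper.

```python
def cost_optimal_threshold(cost_fp: float, cost_fn: float) -> float:
    """Threshold on a calibrated P(y=1 | x) that minimises expected cost.

    Assumes correct decisions cost nothing. Predicting positive is then
    cheaper whenever (1 - p) * cost_fp <= p * cost_fn, which rearranges to
    p >= cost_fp / (cost_fp + cost_fn).
    """
    if cost_fp <= 0 or cost_fn <= 0:
        raise ValueError("costs must be positive")
    return cost_fp / (cost_fp + cost_fn)


def expected_cost(p: float, predict_positive: bool,
                  cost_fp: float, cost_fn: float) -> float:
    """Expected cost of a decision when the true P(y=1 | x) equals p."""
    if predict_positive:
        return (1.0 - p) * cost_fp  # pays only if the instance is negative
    return p * cost_fn              # pays only if the instance is positive


# Hypothetical costs: a missed positive is nine times as costly as a false alarm.
symmetric = cost_optimal_threshold(cost_fp=1.0, cost_fn=1.0)
asymmetric = cost_optimal_threshold(cost_fp=1.0, cost_fn=9.0)
```

With equal costs the sketch recovers 0.5, but with asymmetric costs the threshold moves well away from it, so 0.5 is only appropriate under specific assumptions about the problem.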
## Presentation
The organisation of the text is good, the figures are clear and support the story.
The text could benefit from some additional language editing. Some sentences are too long and vague.
Sentences that require some attention:
- p2 ln 3: "the reasons" what reasons/reasons for what?
- p2 ln 11: "positive performance" high performance
- p2 ln 13: "spelled requirements" requirements
- p2 ln 14: "Though" Although
- p2 ln 33: "pros of [..] requirement" benefits of [...] requirements
- p6 ln 34: lower control
- p7: "requirement over" requirement on
## Length
Some repetitions could be eliminated to shorten the paper.
## Data availability
N/A