By Anonymous User
Review Details
Reviewer has chosen to be Anonymous
Overall Impression: Good
Content:
Technical Quality of the paper: Good
Originality of the paper: Yes
Adequacy of the bibliography: Yes
Presentation:
Adequacy of the abstract: Yes
Introduction: background and motivation: Good
Organization of the paper: Needs improvement
Level of English: Satisfactory
Overall presentation: Good
Detailed Comments:
I am recommending major revision, though not because of any fundamental flaw in the proposed approach. I think the authors' approach is interesting and coherent, probably in all three proposed variants. I am also largely satisfied with the quality of presentation of the theoretical background and the authors' method. My main complaints concern (1) the unfortunate organization of the experimental part of the paper and (2) the quite chaotic and partially redundant presentation of the experimental problems and partial results. See my detailed remarks below.
The experimental section is organized in a way that makes this material difficult to digest. Why put the burden on the reader by presenting all datasets at once, given also that they come from two unrelated domains, i.e. computer vision and predictive maintenance (trucks)? This part of the paper should be reorganized into two separate sections, one for the CV problem and one for the predictive maintenance problem. Each of those sections should start with a problem statement, followed by a description of the data, a description of how the authors' method has been adapted to the problem, the presentation of results, and a discussion.
(Frankly, the truck benchmark seems problematic in many respects, so I would even consider dropping it entirely, while improving the presentation of the CV problem and results.)
Also, it would be desirable to report the performance of some baseline methods on the considered benchmarks.
I tend to believe that implementing my suggestions should bring this manuscript to publishable quality, probably without the need to run any additional experiments.
Detailed remarks
--------------------
The argumentation for modularity on p. 8 could be shortened -- these advantages are widely accepted and not controversial.
> on image scene classification
Consider 'scene classification'.
> The approach presented in the original paper is inserted within a framework integrating deep learning
Perhaps 'embedded' rather than 'inserted'?
The sentences in bullets on p. 3 should end with periods.
> (∀x∈X)(F(x) ≠ ∅)
Unnecessary parentheses; consider ∀x∈X, F(x) ≠ ∅
p. 8
> The two components are discussed in Sections ?? and .
Missing references
p. 11
> a deep learning model following one of the integration strategies depicted in Fig. 1.
Correct to Fig. 2. Also, it would be convenient to name those strategies (or at least enumerate them, e.g. A, B, C) and use those names/symbols throughout the paper.
> The first integration strategy (see Section ?? and [MBT25] for more details)
Missing references
Important: the section "Neuro-LENS: Neuro-symbolic integration" is strongly repetitive with respect to the previous section "Neural component". The two sections should be merged; at the moment, the later one feels largely redundant with respect to the earlier one. Also, this part of the paper should be shortened, or perhaps some of the thoughts discussed there should be moved to a later Discussion section.
p. 12
> Ground truth is not available, neither for the object detection task or the abandoned bag scene classification task.
What are those tasks? Please define them more precisely before referring to them.
p. 13
> All data is anonymized.
This remark does not make much sense here -- this is not sensitive or personal data.
In the description of the truck failure task, the nature of the task is not explained in the text -- only after seeing Table 1 did I realize that this is a classification problem. The task statement should be made clearer in the text.
Also, a few paragraphs later, the authors mention 5 decision classes, which is inconsistent with Table 1.
Figure 4 is quite cryptic. I think the authors should discard all considerations concerning the anonymization of data -- in my understanding, those are irrelevant, i.e. the dataset is what it is and should be taken as given. I don't think discussing those aspects is relevant to the authors' approach.
p. 14 and further
Section Image scene classification: Neural-to-symbolic chaining
The text in this section should be rewritten to logically follow the pipeline shown in Fig. 6. Currently, the authors first present the pipeline in points 1, 2, 3, but then return to the neural component to detail the neural nets. This is not an orderly presentation in my opinion.
Also, this description should be shortened and made more to the point, avoiding repetition.
Fig. 9 does not seem to convey any useful information -- if all the values in the table are 'True', what's the point of presenting them in a table?
> Subsequently, the multi-valued mapping described in ??
Missing ref.
The Conclusion section should be shortened to convey the essence of the results, without verbose repetition of what has already been stated earlier.