By Johannes Langer
Review Details
Reviewer has chosen not to be Anonymous
Overall Impression: Weak
Content:
Technical Quality of the paper: Good
Originality of the paper: Yes, but limited
Adequacy of the bibliography: No
Presentation:
Adequacy of the abstract: No
Introduction: background and motivation: Bad
Organization of the paper: Needs improvement
Level of English: Unsatisfactory
Overall presentation: Weak
Detailed Comments:
The submission proposes a hybrid architecture in which the classification head of a CNN is replaced with a TAO-learned decision tree. The authors extend previous work by adding a hyperparameter to the TAO algorithm that controls node sparsity. In experiments, they show that this approach maintains classification performance compared to the original MLP classification head and demonstrate how it can be used to make class feature relevance interpretable.
In terms of methodology, this paper is solid, with room for improvement. Reviews of the shorter conference paper already pointed out that the approach should be tested on a more complicated dataset than FashionMNIST, since classification on this dataset is fairly easy, and the reported errors are still surprisingly high (over 10% on the test set). Further improvements could include conducting a performance analysis using k-fold cross-validation for statistical support, and properly evaluating the faithfulness of the generated explanations rather than only claiming that they must be faithful by construction (even though I agree with the authors that this is the case).
While this is an incremental improvement over existing work, the submission is theoretically sound and relevant to the NeSy community.
Reproducibility is not easily possible as no code for the experiments or models has been made available. The descriptions are detailed enough that I believe this work could be feasibly replicated, but only with significant implementation effort.
However, the submission as it stands has several major issues:
The abstract summarises the content of the submission well. However, the descriptions lack precision, which makes the findings sound vaguer than they actually are and makes the abstract difficult to understand without having read the whole paper beforehand.
I am concerned about the language in the introduction, especially the choice of words. The descriptions could also be clearer for a scientific audience, and some parts read more like science communication than scientific writing. In other sections, some decision tree terms are introduced briefly without much explanation or a link to later parts of the paper. For example, the authors mention counterfactual explanations here and never again, merely referencing their own previous work.
This brings me to the biggest issue: the language. The writing is not up to publication standard. I will not list every instance, but just from the introduction, phrases like "say, …", "surgical precision," or the term "overkill" in the footnote stand out. The authors should also check for missing articles. To be accepted, the paper needs a major language overhaul so that it presents its content more clearly and accurately.
In addition, there are several formatting, structure, and convention issues:
- The reference list includes conference proceedings as standalone entries and then cites these entries within other bibliography entries.
- Footnotes use various special characters instead of numbers.
- In-text citations are used in a way that does not make sense (attempting to include too many sources at once), which leads to awkward collections of references in parentheses (see the last paragraph of page 3 for an example).
- Abbreviations are often not introduced (like SGD) or not explained well (like CNN).
- Titles don’t follow title case for capitalisation.
- Figures in the experimental section are not ordered properly: Figure 5 is mentioned before Figure 3. A consistent order would make the paper easier to read.
Some important references are missing. The paragraph "The Tree Alternating Optimisation (TAO) algorithm: review" lacks a reference to the original TAO paper. While it is good that the original paper is mentioned in the introduction and that its author is involved, the paper should also be cited here. The current version implies that the described algorithm is a brand-new contribution, which is not accurate. The interpretability approaches mentioned in the first paragraph of the experiments section are also not referenced (references appear only much later in the section; the first claim remains unreferenced).
The related work section starts with a historical overview of heuristic search-based approaches for incorporating trees into neural networks. While informative, this overview does not give a precise picture of the state of the art in that area, and it is largely unnecessary for the rest of the submission.
On the other hand, related work on explainability (evaluation, faithfulness, and especially methods like LRP) is completely missing.
Lastly, I would like to see an improved description of the section "Where is a specific neuron, and a specific decision node, looking at". Neurons are specifically not arranged in a grid, nor do they receive input from only a fixed subset of the outputs of the previous layer, because convolution layers operate with sliding kernels. Instead, the authors trace localised activations back through the layers to the original image. This distinction is especially important since no code is available to further clarify how the receptive fields (RFs) were generated.
Disclaimer: I used local AI tools to reformulate parts of this review. I never supplied the submission, or any part of it, as context, and I reviewed the changes thoroughly.