By Steven Schockaert
Review Details
Reviewer has chosen not to be Anonymous
Overall Impression: Good
Content:
Technical Quality of the paper: Good
Originality of the paper: Yes
Adequacy of the bibliography: Yes
Presentation:
Adequacy of the abstract: Yes
Introduction: background and motivation: Good
Organization of the paper: Satisfactory
Level of English: Satisfactory
Overall presentation: Average
Detailed Comments:
This paper presents an overview of the field of "deep deductive reasoning" (DDR), understood here to mean the use of deep learning techniques for (approximating) formal deductive reasoning. The introduction nicely motivates the importance of this field. One small point: it is suggested (page 2, line 8) that DDR would lead to better tolerance of errors and noise. Is it obvious that this would actually be the case? If we had a model that could do deductive reasoning with near-perfect accuracy, wouldn't we expect that model to also be intolerant to errors and noise? The work on adversarial examples in image classification shows that small amounts of noise can have a large impact on the behaviour of neural networks.
I also liked the way in which the related work is summarised in Section 2. One small issue here is that the discussion is mostly about Semantic Web languages. This could give the impression that deep deductive reasoning for propositional logic is somehow uninteresting or solved, neither of which I believe is true. It might also be useful to add a paragraph on the use of embeddings for automated theorem proving, where a kind of hybrid approach is sometimes taken: using neural network encodings for premise selection, while relying on standard deduction for the actual inference steps (a sketch of what I have in mind follows below).
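To make this suggestion more concrete, a minimal sketch of such a hybrid pipeline could look as follows. All names here are hypothetical, and the embed function is a trivial hashed bag-of-tokens placeholder standing in for a trained neural encoder; the point is only to illustrate the division of labour, not any particular system:

    import numpy as np

    def embed(formula, dim=64):
        # Placeholder encoder: hashed bag of tokens. In a real system this
        # would be a trained neural encoder (e.g. a tree- or graph-structured
        # network over the formula's syntax).
        v = np.zeros(dim)
        for tok in formula.split():
            v[hash(tok) % dim] += 1.0
        n = np.linalg.norm(v)
        return v / n if n > 0 else v

    def select_premises(conjecture, premises, k=2):
        # Rank premises by cosine similarity to the conjecture and keep the
        # top k; only these would be handed to a conventional prover.
        c = embed(conjecture)
        return sorted(premises, key=lambda p: -float(embed(p) @ c))[:k]

    premises = ["human(socrates)", "mortal(X) :- human(X)", "planet(mars)"]
    print(select_premises("mortal(socrates)", premises))

The neural component only ranks candidate premises; soundness still rests entirely on the symbolic prover that consumes the selected premises.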
Section 3 presents an interesting, albeit perhaps overly dense, overview of work on understanding theoretical limitations.
However, I found Section 4 hard to follow. I like the idea of presenting negative results, which is indeed appropriate for a paper of this kind, and it is inevitably impossible to provide the full details of the previously published papers, but in its current form, I think the main message of Section 4 is lost. For instance:
Section 4.1:
* Why is the use of embeddings avoided a priori? Might it not be possible to use embeddings to encode the "reasoning structure" in some way?
* It was not clear to me what the role of the support sets was. Are these used to create training examples?
* page 7, line 24: what is the meaning of this 4-tuple?
In Section 4.2, I could not follow the explanation of pointer networks, or why they would be particularly suitable for reasoning about RDFS.
To improve the paper, one possibility might be to focus on only one of these two cases, but to present it in more detail. It might also be useful to provide an example that illustrates the different steps (Table 2 and Figure 1 do this to some extent, but these examples are neither complete nor self-contained).
The conclusions talk about scaling up DDR to the order of millions of triples. At this scale, a large part of the problem is about efficient methods for selecting relevant premises, so we would presumably need a kind of premise retrieval engine that works in tandem with the actual reasoning module. So the work on formula embeddings for automated theorem proving is perhaps again relevant here.