** By ***Anonymous User*

Review Details

Reviewer has chosen to be *Anonymous*

**Overall Impression:** Average

**Content:**

**Technical Quality of the paper:** Good

**Originality of the paper:** Yes, but limited

**Adequacy of the bibliography:** Yes

**Presentation:**

**Adequacy of the abstract:** No

**Introduction: background and motivation:** Good

**Organization of the paper: **Needs improvement

**Level of English:** Satisfactory

**Overall presentation:** Average

**Detailed Comments: **

Paper summary:

This is an interesting paper and should be published. Some justification for the spot scores are in order: the abstract would be good if unfounded claims are removed or toned down; organisation and overall presentation would be good if the cluttered Maths were moved to an Appendix and a few more details or examples were included.

The paper's main contribution is a novel algorithm, based on Newton's method, to compute stable models of propositional ASP logic programs represented as matrices. In particular, it aims to avoid computing supported models by including loop constraints in the representation. In fact, there is a slight relaxation in the loop constraints included so that the method maintains scalability and it is only near the end of the paper in a paragraph on P13 that it is admitted that in some cases some of the returned models are supported but not stable.

The paper is reasonably clear, although it could be made clearer in quite a few places (see below). In particular, some additional exemplifying examples would be welcome.

Comments:

1.20 (P1 L20) Does the pre-computation method of Section 4.3 involve a symbolic solver?

1.23 The last sentence of the abstract does not seem to have much justification in the text. It should be made clear this would be for future work to decide.

2.21 Maybe make it clear in the introduction what the syntax accepted by the method is. E.g. some of the examples involve an encoding of a choice rule, so it appears it is normal logic programs without classical negation?

2.45 Although the authors use the negation symbol, as the programs are logic programs does this indicate negation as failure? Doesn't the semantics of stable models indicate this?

3.17 Perhaps not all readers will be familiar with loop formulas, so for the sake of completeness of the paper they should be introduced here.

3.32 The paper says later that a loop L={a} requires a self-referencing rule. Perhaps point that out here.

4.18 Examples such as this to exemplify the representation are very helpful. E.g. one would be useful around P5 L16 ff (could be in an Appendix) and p11 L 23.

5.7 Why do the authors use De Morgan's law to rewrite conjunctions? It doesn't seem to save much and could be confusing given rules are always written with conjunctions.

6.33 Could the proof of Proposition 2 be moved to an Appendix and be given in full?

7.34 and 8.10 and 9.30 and L7 in Algorithm 1 Are these the same J_{SU}? I think the 0.5 cancels out in the computation somewhere? In any case I would suggest moving the whole sequence of computations of derivatives on P8 and 9, and the derivation of J_{ac} (which is not given) to the appendix and including more explanation.

7.49 Are these standard results? If so give a citation, else prove at least one of them in an appendix.

Equation (12). This looks very much like gradient descent. Given only vectors are involved, I guess the Jacobian is not a matrix but a vector. Can this be clarified.

Is the claim on 10.21 that the update decreases the vector u obvious? I suppose in practical terms it should, since the gradient descent rule is applied. Given the loss function I would expect it to decrease J_{SU+c} (not the vector u). Is there any theoretical guarantee?

10.36 I m not sure if Section 3.6 adds much to understanding the paper. Maybe it could be put in a discussion section after the evaluation?

Equation (13) This equivalent form off loop formulas from [24] is less familiar than the original. Assuming loop formulas are moved to the preliminaries section, an example would be helpful to clarify the definition and revised definition.

13.6 This comment seems to be very pertinent to the claim in the paper that the method finds stable models and rules out supported models. In addition, even if loop formulas are included in full, Algorithm 1 may not reach a global minimum, so the computation may not converge. Perhaps this should be pointed out in the Introduction.

13.9-13.11 Can the authors elaborate on this claim? In particular, the authors claim they demonstrate some partial loop formulas help guide the gradient descent process to a root of J_{SU+c+LF}, which is the cost function with the full loop formulas, i.e. the root corresponds to a stable model. But looking at the results of experiment 5.3, which is, as I understand it, the only one where loop formulas are needed, and looking at figure 5, it would appear that not using the loop formulas (no_LF) they were able to find a stable model faster than using some of them (LF_max and LF_min). How does this fit with the claim in 13.9-13.11?

13.32 How is LM(P+) computed? Is it done symbolically?

14.14 Can this proof be given in full in the Appendix?

15.47 "seems rather high considering there are six solutions" English is odd - implies that fewer different solutions should have been found. Is this the intention? I guess it is related to the notion of "another solution constraint" mentioned on P17. Without this constraint it might be surprising that almost all solutions were found.

16.46 Why are (4) and (6) not encoded as (1)? I can understand perhaps for (6), but not for (4). This is a case where more details in an Appendix would be helpful as it is a good example for understanding the method. The details should include the content of the 197 rules, etc. Re (1), since the limit on the size of the set of H_{i,j} is j_{k}, which depends on I, why are there 36 H_{i,j} atoms? I would have understood if (1) had been written with limit up to N.

In Figure 4 does the precomp time include the time for pre-computation?

18.22 Looking at an instance with 4 nodes, {a0 .. a3}, P_{4n}, n=3, appears to have a minimal loop between a0 and a2 and a0 and a1. Is what is written correct? Given that the results concerning the LF heuristics are counter-intuitive (see 19.37) this is a case where a bit more detail would help.

20.23 Is this point important to mention when Algorithm 1 is introduced?

20.42 -20.49 It should be made clearer that by using an ASP solver to find stable models the problems addressed in this paper are not encountered. If the relevance is the neural front end of the two systems discussed, please point out that that part is purely concerned with image perception. Perhaps in future work the authors' could discuss whether such a front-end is possible with their system?

In Section 7 it would be nice to see some pointers for future work, which would enhance the significance of this work.

Minor items:

3.20 If the notion "full" doesn't appear elsewhere in the paper it could be omitted.

5.46 Would this example be clearer set out as the earlier example is?

12.48 (P12 L48) Only one "heuristics" needed

13.47 G^{-} is not defined anywhere I think

16.47 There is some confusion about naming programs here (Should P2 and C2 be P3 and C3?)

18.27 Should be LF_{min}