Dear Editor,
We are pleased to submit the revised manuscript titled "Towards end-to-end ASP computation" for consideration in the Journal of Neurosymbolic Artificial Intelligence. This revision addresses the comments and suggestions provided by the reviewers, and we have made significant improvements to enhance clarity and strengthen the manuscript.
The manuscript presents an end-to-end approach for answer set programming (ASP) in logic programming, leveraging vector space formulations and stable model computation through cost minimization with loop formulas. The approach, grounded in a ReLU neural network inspired by Boolean formulas in disjunctive normal form (DNF), offers a novel contribution to the intersection of logic programming and neural computation.
We confirm that this manuscript has not been published elsewhere, is not under consideration by another journal, and has been approved by all coauthors for submission.
Thank you for considering our revised manuscript. We look forward to your feedback.
Sincerely,
Akihiro Takemura
------------------------------------------------------------------------------
We appreciate the time and effort that you and the reviewers have dedicated to providing your valuable feedback on our manuscript.
We have revised the manuscript, incorporating most of the reviewers' suggestions.
Here is a point-by-point response to the reviewers’ comments.
------------------------------------------------------------------------------
[By Anonymous User]
Comments:
1.20 (P1 L20) Does the pre-computation method of Section 4.3 involve a symbolic solver?
Response: No, it does not. It is based on the iterative application of matrix-vector multiplications to filter out atoms that are false in all stable models.
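For illustration, a minimal sketch of this kind of precomputation is given below. The matrix layout, variable names, and the use of NumPy are our own expository assumptions rather than the exact implementation in the paper; the sketch only relies on the standard fact that every stable model is contained in the least model of the program with negative body literals dropped.

    import numpy as np

    def filter_always_false_atoms(C_pos, head, n_atoms, max_iter=1000):
        """Illustrative sketch: over-approximate the atoms that can be true in
        some stable model by computing the least model of the program with
        negative body literals dropped, using only matrix-vector products.
        Atoms outside this set are false in all stable models and can be
        removed before search.

        C_pos : (n_rules, n_atoms) 0-1 matrix, C_pos[r, i] = 1 iff atom i
                appears positively in the body of rule r.
        head  : length-n_rules integer array, head[r] = index of the head atom.
        """
        body_len = C_pos.sum(axis=1)        # positive body length per rule
        v = np.zeros(n_atoms)               # start from the empty interpretation
        for _ in range(max_iter):
            fired = (C_pos @ v) >= body_len  # rules whose positive bodies hold
            new_v = v.copy()
            new_v[head[fired]] = 1.0         # add heads of fired rules
            if np.array_equal(new_v, v):     # fixpoint reached
                break
            v = new_v
        return v                             # v[i] == 0  =>  atom i can be removed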
1.23 The last sentence of the abstract does not seem to have much justification in the text. It should be made clear this would be for future work to decide.
Response: We have deleted this line from the abstract (P1 L22), and the corresponding line in the introduction has been edited accordingly (P2 L1).
2.21 Maybe make it clear in the introduction what the syntax accepted by the method is. E.g. some of the examples involve an encoding of a choice rule, so it appears it is normal logic programs without classical negation?
Response: We have reworded this sentence to make it clear that the programs we refer to in this paper are propositional normal logic programs (P2 L21).
2.45 Although the authors use the negation symbol, as the programs are logic programs does this indicate negation as failure? Doesn't the semantics of stable models indicate this?
Response: The literals preceded by the ‘\neg’ symbol are indeed negation-as-failure (NAF) literals. We have added a sentence in the Preliminaries to clarify this (P2 L41).
3.17 Perhaps not all readers will be familiar with loop formulas, so for the sake of completeness of the paper they should be introduced here.
Response: We have added the definition of the (conjunctive) loop formula to the Preliminaries (P3 L31).
3.32 The paper says later that a loop L={a} requires a self-referencing rule. Perhaps point that out here.
Response: We have added a sentence to clarify this (P3 L28).
4.18 Examples such as this to exemplify the representation are very helpful. E.g. one would be useful around P5 L16 ff (could be in an Appendix) and p11 L 23.
Response: We have added examples for evaluating the formula (P5 L31, Section 3.2), evaluating a reduct (P6 L39), and representing the loop formula (P10 L21, Section 4.1).
5.7 Why do the authors use De Morgan's law to rewrite conjunctions? It doesn't seem to save much and could be confusing given rules are always written with conjunctions.
Response: With the re-written conjunctions (as disjunctions), we only require a single false literal in the body to falsify the rule (P5 L23), which simplifies the computation, specifically the thresholding. Previously, for example in [Takemura and Inoue 2022], row-by-row thresholding based on the number of true literals in each body was required, which is not needed in this case.
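As a small illustration (our own example, not drawn from the manuscript): for a rule with body a \wedge \neg b,

    a \wedge \neg b is false  \iff  \neg a \vee b is true,

so instead of checking whether the number of satisfied body literals reaches the body length (a threshold that differs from row to row), it suffices to check whether at least one disjunct of the rewritten body is satisfied, a uniform threshold of 1 for every rule.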
6.33 Could the proof of Proposition 2 be moved to an Appendix and be given in full?
Response: The proof of Proposition 2 is now moved to Appendix A.1.
7.34 and 8.10 and 9.30 and L7 in Algorithm 1 Are these the same J_{SU}?
Response: Yes, they are the same, and the missing 0.5 has been added back into L_{SU} (P7 L51).
I think the 0.5 cancels out in the computation somewhere?
Response: Indeed, 0.5 (or 1/2) cancels out in the derivation of the Jacobian.
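As a generic illustration (this is the standard squared-error form, not necessarily the exact cost function of the manuscript): if

    L(\mathbf{u}) = \frac{1}{2} \| \mathbf{f}(\mathbf{u}) \|^{2},

then

    \nabla L(\mathbf{u}) = \mathbf{J}_{\mathbf{f}}(\mathbf{u})^{\top} \mathbf{f}(\mathbf{u}),

where \mathbf{J}_{\mathbf{f}} is the Jacobian of \mathbf{f}; the 1/2 is absorbed by the factor of 2 produced when differentiating the square, so no such factor remains in the gradient.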
In any case I would suggest moving the whole sequence of computations of derivatives on P8 and 9, and the derivation of J_{ac} (which is not given) to the appendix and including more explanation.
Response: We have moved the derivations to Appendix B, which now includes the derivation of Jacobians for the supported model computation, constraints and loop formula.
7.49 Are these standard results? If so give a citation, else prove at least one of them in an appendix.
Response: We have added a note on these notations in Appendix B.
Equation (12). This looks very much like gradient descent. Given only vectors are involved, I guess the Jacobian is not a matrix but a vector. Can this be clarified.
Response: Yes, it is actually a gradient vector, and we have added a sentence to clarify this (P8 L10).
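To spell this out (with \alpha standing for a generic step size, not necessarily the exact constant used in the paper): since the cost is a scalar-valued function of the vector \mathbf{u}, its Jacobian is a 1 \times n matrix, i.e. the transpose of the gradient, and the update is, in essence, of the gradient-descent form

    \mathbf{u}_{new} = \mathbf{u} - \alpha \nabla L(\mathbf{u}).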
Is the claim on 10.21 that the update decreases the vector u obvious? I suppose in practical terms it should, since the gradient descent rule is applied. Given the loss function I would expect it to decrease J_{SU+c} (not the vector u). Is there any theoretical guarantee?
Response: The update rule decreases L_{SU+c} (now P9 L31), not u, since it takes a step based on the gradient information.
10.36 I'm not sure if Section 3.6 adds much to understanding the paper. Maybe it could be put in a discussion section after the evaluation?
Response: Thank you for the suggestion. We have moved this part to the related work section (P20 L41).
Equation (13) This equivalent form of loop formulas from [24] is less familiar than the original. Assuming loop formulas are moved to the preliminaries section, an example would be helpful to clarify the definition and revised definition.
Response: We have moved the contents of the footnote to the main text (P10 L20), and added the definition of the conjunctive loop formula in the Preliminaries. We have also added an example here (P10 L21) to further clarify the matrix encoding.
13.6 This comment seems to be very pertinent to the claim in the paper that the method finds stable models and rules out supported models. In addition, even if loop formulas are included in full, Algorithm 1 may not reach a global minimum, so the computation may not converge. Perhaps this should be pointed out in the Introduction.
Response: In addition to the comment “… although LF_max and LF_min can exclude (some of) non-stable models, they do not necessarily exclude all of non-stable models.” (P12 L210), we have added a sentence in the Introduction (P2 L28) to clarify that the computation may not always converge to a global minimum.
13.9-13.11 Can the authors elaborate on this claim? In particular, the authors claim they demonstrate some partial loop formulas help guide the gradient descent process to a root of J_{SU+c+LF}, which is the cost function with the full loop formulas, i.e. the root corresponds to a stable model. But looking at the results of experiment 5.3, which is, as I understand it, the only one where loop formulas are needed, and looking at figure 5, it would appear that not using the loop formulas (no_LF) they were able to find a stable model faster than using some of them (LF_max and LF_min). How does this fit with the claim in 13.9-13.11?
Response: As we noted in the paragraph starting at P18 L43, using the LF heuristics is not always a good policy. In particular, while LF_{max} does guide the search process to the correct stable model, it incurs extra computation and makes the process slower. On the other hand, as noted, LF_{min} works adversely and guides the process away from the model.
13.32 How is LM(P+) computed? Is it done symbolically?
Response: No, it is not computed symbolically. LM(P^{+}) is computed by the repeated application of matrix-vector multiplications.
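Schematically (the matrix names here are our own illustrative choices rather than the manuscript's exact notation), LM(P^{+}) is the limit of the bottom-up iteration

    \mathbf{v}_{0} = \mathbf{0}, \qquad
    \mathbf{v}_{k+1} = \min\left(\mathbf{1},\; \mathbf{v}_{k} + \mathbf{H}^{\top}\left[\mathbf{C}^{pos}\mathbf{v}_{k} \ge \mathbf{b}\right]\right),

where \mathbf{C}^{pos} is the positive-body matrix, \mathbf{b} the vector of positive body lengths, \mathbf{H} the 0-1 head-incidence matrix over rules, and [\cdot] an elementwise indicator; this is the same iteration as in the Python sketch given after our response to comment 1.20.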
14.14 Can this proof be given in full in the Appendix?
Response: We have added a few additional lines to this proof in the main text.
15.47 "seems rather high considering there are six solutions" English is odd - implies that fewer different solutions should have been found. Is this the intention? I guess it is related to the notion of "another solution constraint" mentioned on P17. Without this constraint it might be surprising that almost all solutions were found.
Response: We have re-worded this sentence to make the intention clearer (P15 L1).
16.46 Why are (4) and (6) not encoded as (1)? I can understand perhaps for (6), but not for (4). This is a case where more details in an Appendix would be helpful as it is a good example for understanding the method. The details should include the content of the 197 rules, etc. Re (1), since the limit on the size of the set of H_{i,j} is j_{k}, which depends on I, why are there 36 H_{i,j} atoms? I would have understood if (1) had been written with limit up to N.
Response: This encoding is based on the SAT encoding by Zhou 2020 [1]; the main idea in their paper is that these six conditions guarantee that the resulting graph is Hamiltonian. Condition (1) is an XOR on the outgoing edges, (4) is an XOR on the incoming edges, and (6) is an XOR on the visitation count (each node is visited exactly once). Since the full program matrix is too verbose to reproduce, we have included a sketch in Appendix C.
[1] N. Zhou. In Pursuit of an Efficient SAT Encoding for the Hamiltonian Cycle Problem. CP 2020.
In Figure 4 does the precomp time include the time for pre-computation?
Response: Yes, it includes the time for precomputation.
18.22 Looking at an instance with 4 nodes, {a0 .. a3}, P_{4n}, n=3, appears to have a minimal loop between a0 and a2 and a0 and a1. Is what is written correct? Given that the results concerning the LF heuristics are counter-intuitive (see 19.37) this is a case where a bit more detail would help.
Response: We only consider even n here (P17 L31). We have added an example of P_{4\_n} with n=4 (P18 L1).
20.23 Is this point important to mention when Algorithm 1 is introduced?
Response: This refers to a specific case where the precomputation successfully removed all of the false atoms.
20.42 -20.49 It should be made clearer that by using an ASP solver to find stable models the problems addressed in this paper are not encountered. If the relevance is the neural front end of the two systems discussed, please point out that that part is purely concerned with image perception. Perhaps in future work the authors' could discuss whether such a front-end is possible with their system?
Response: We have added a paragraph in the related work section (P20 L25), which clarifies the former part. As for the latter part, we have added it to the future work (P21 L23).
In Section 7 it would be nice to see some pointers for future work, which would enhance the significance of this work.
Response: We have added a few sentences at the end as the pointers for future work (P21 L23).
Minor items:
3.20 If the notion "full" doesn't appear elsewhere in the paper it could be omitted.
Response: This “full” notion is now deleted.
5.46 Would this example be clearer set out as the earlier example is?
Response: We have expanded this example (P5 L32).
13.47 G^{-} is not defined anywhere I think
Response: We have added the definition in the preliminaries. G^{-} is the set of negative literals in the body (P2 L42).
Response: In addition, the following minor items pointed out by the reviewer have been corrected accordingly.
12.48 (P12 L48) Only one "heuristics" needed
16.47 There is some confusion about naming programs here (Should P2 and C2 be P3 and C3?)
18.27 Should be LF_{min}
------------------------------------------------------------------------------
[By Arseny Skryagin]
Suggestions on Improvement:
I would like to use the opportunity to offer a few suggestions on improving the overall reading experience.
1. To recall the depicted notation P=(D, Q) (l. 16, p. 5), I used the mnemonic device of PQ method (from numerics) and reduced quadratic equation (from solving quadratic equation). It would be otherwise difficult to remember that the matrix Q stands for conjunctions of the program P. I suggest to use instead P=(C, D) (conjunctions, disjunctions). It feels more natural to read and to remember the matricized formulation of a program P. To prevent any potential confusion, I would rather write \widehat{\cdot} to annotate the constraints in any matrix.
Response: Thank you for the suggestion. P=(D,Q) is now written as P=(C,D), where C is the matrix representing the conjunctions. As suggested, we use \widehat{C} to represent the constraint matrix. Because C was originally used to denote the set of constraints in the program, we now use K instead to refer to that set. Furthermore, all matrix and vector notations are now in *bold* to avoid confusion, for example, distinguishing the program P (italic) from the program matrix P (bold upper case).
2. Likewise, it would be much easier to remember the binary vector s_{I} as a vectorization of I instead of u_{I} (l. 13, p. 5). The benefit comes in double for the Algorithm 1 (top of p. 10) since the goal of the method is to find either a supported or stable model of the program P.
Response: We use s_{I} instead of u_{I} throughout this revised manuscript.
3. A similar remark goes for the matrix M^{(1)} (l. 24, p. 6). I would prefer to use T denoting the truth values of rule bodies in P^{I} evaluated by I. To denote the transpose of a matrix D, I would use D^{t} to prevent any confusion.
Response: We now use C^{pos} and C^{neg} to refer to the left (positive literal) and right (negative literal) parts of the matrix, respectively. We believe this is easier to understand than the overloaded T, t, or (1) notation, and is closer to the original meaning in the respective cases. For example, in the original text, M^{(2)} was used in the computation of the reduct (removing any rule containing a negative literal that is false in I), and M^{(1)} referred to the evaluation of the positive parts of the rule bodies.
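As a small illustrative sketch of how these two matrices are used (the function and variable names are hypothetical; only the standard definition of the reduct is assumed):

    import numpy as np

    def reduct_mask(C_neg, s_I):
        """C_neg[r, i] = 1 iff atom i occurs negatively in the body of rule r,
        and s_I is the 0-1 vector of the interpretation I.  A rule survives the
        reduct P^I iff none of its negated atoms is true in I, i.e. the
        corresponding entry of C_neg @ s_I is zero."""
        return (C_neg @ s_I) == 0            # boolean mask over rules

    def positive_bodies_satisfied(C_pos, s_I):
        """C_pos[r, i] = 1 iff atom i occurs positively in the body of rule r;
        a positive body holds under I iff all of its atoms are true in I."""
        return (C_pos @ s_I) >= C_pos.sum(axis=1)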
4. Using L to annotate cost functions ((5) l. 34-35, p. 7, (14) l. 33 p. 11, etc.) is renowned across many scientific communities. For me personally, it was somewhat cumbersome to understand the derivations on p. 8 and the update u_new in (12) (l. 45-48 p. 9) since J is usually used to annotate a Jacobian matrix (operator).
Response: We now use L instead of J for loss functions. The notation for the Jacobian (Ja) is kept as is.
5. To prevent the overloading, I would rather write S (as in spiral) instead of L for loops (l. 13 p. 11). Further, using u, v and w instead of k, s and t for indices.
Response: As suggested, we use u, v, w instead of k, s, t, where appropriate. In cases where k was used in the context of constraints, we kept k for consistency since capital K now refers to the set of constraints.
Questions:
Although, I read the work multiple times, I still have some questions, which I could not answer myself. Thus, I would like to ask the authors for help to further my understanding.
1. What expresses the term J_{a_{SU_{*c}}} (l. 42-43, p. 9)?
Response: This was a typo: it should be “+” instead of “*”. The term denotes the Jacobian (gradient vector) of the cost function L_{SU+c}.
2. On pp. 9-10, it had been explained that for the random initialization of u(i) ~ N(0, 1) + 0.5 was used, where N(0, 1) is the standard normal distribution. My question is, if and how one can prove that using the suggested perturbation of u ← 0.5(u+delta+0.5) (l.26-2u, p.10) is enough to guarantee for the Algorithm 1 to find all supported/stable models regardless of the drawn initialization of u?
Response: The perturbation on its own cannot guarantee finding all models since, as the reviewer points out, it is a random initialization method. We use the “another solution constraint”, which should guide the algorithm towards a different solution and which we expect to be helpful when enumerating solutions. However, when the algorithm fails to find another model even with the another solution constraint, this does not necessarily prove that no other model exists, since the solver is incomplete.
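For reference, the restart perturbation quoted by the reviewer can be realized in a couple of lines (a minimal sketch; the array shape and the random seeding are our own assumptions):

    import numpy as np

    rng = np.random.default_rng()

    def perturb(u):
        """Restart perturbation u <- 0.5 * (u + delta + 0.5) with delta ~ N(0, 1),
        applied elementwise when the search is restarted from the current point."""
        delta = rng.standard_normal(u.shape)
        return 0.5 * (u + delta + 0.5)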
3. In the section 5, there are quite a few examples to be found. I was unable to explain myself the value choice for l2 and l3 constants. I am convinced that any interested reader will pose the same question. Could you possibly elaborate on the choices presented for the different tasks?
Response: For the examples presented in the paper, the values of l2 and l3 were set consistently to 0.1, and the value of l4 in the loop formula example was set to 1. As for the choice of the values themselves, we generally choose 0 \leq l2, l3 \leq 1 since we expect these terms to act like regularization terms: satisfying them (driving the regularization terms to 0) does not by itself guarantee that the solution is a model, so the first (supported model) term is the more important one.
4. If a program P has numerous (100 thousands and more) stable models, I do not see any other solution other than trying/excluding using LF-heuristic and adding any previously found stable model as another constraints to the program and starting many trials of Algorithm 1 in parallel. Is there a way to guarantee that all support/stable models are indeed found and stop the exhaustive search?
Response: It is not possible to guarantee that all models are enumerated using Algorithm 1 alone, because it is incomplete. Moreover, deciding when to stop the search with the another-solution-constraint heuristic would require knowing all possible models beforehand, which may not be feasible for a large program.
------------------------------------------------------------------------------
This is a RESUBMIT of #662-1642