Towards end-to-end ASP computation

Tracking #: 662-1642

Flag : Review Received

Authors:

Taisuke Sato

Akihiro Takemura

Katsumi Inoue

Responsible editor:

Raghava Mutharaju

Submission Type:

Regular Paper

Abstract:

We propose an end-to-end approach for answer set programming (ASP) and linear algebraically compute stable models satisfying given constraints. The idea is to implement Lin-Zhao’s theorem [1] together with constraints directly in vector spaces as numerical minimization of a cost function constructed from a matricized normal logic program, loop formulas in LinZhao’s theorem and constraints, thereby no use of symbolic ASP or SAT solvers involved in our approach. We also propose precomputation that shrinks the program size and heuristics for loop formulas to reduce computational difficulty. We empirically test our approach with programming examples including the 3-coloring and Hamiltonian cycle problems. As our approach is purely numerical and only contains vector/matrix operations, acceleration by parallel technologies such as many-cores and GPUs is expected.

Full PDF Version:

nai-paper-662.pdf

Cover Letter:

July 8, 2023 Editor-in-Chiefs Dear Sirs, My coauthors and I would like you to consider the manuscript entitled "Towards end-to-end ASP computation" for publication in the Journal of Neurosymbolic Artificial Intelligence. We propose an end-to-end approach for answer set programming (ASP) in logic programming by formulating ASP in vector spaces. Unlike previous approaches, we compute stable models of programs by cost minimization that considers special logical formulas called loop formulas. Our approach is based on a logically inspired ReLU neural network derived from Boolean formulas in DNF form. We believe that our proposal is relevant to the scope of your journal and will be of interest to your readership. The content of this manuscript has not been published elsewhere in part or entirety and is not under consideration by another journal. All authors have have approved the manuscript for submission to your journal. Thank you for your consideration. Sincerely yours, Taisuke Sato, National Institute of Informatics (NII), Japan E-mail: taisukest@gmail.com

Approve Decision:

Approved

Revised Version:

Towards end-to-end ASP computation

Tags:

Reviewed

Decision:
Minor Revision

Solicited Reviews:

Review #1 submitted on 22/Nov/2023

By Anonymous User
Review Details

Reviewer has chosen to be Anonymous

Overall Impression: Average

Content:
Technical Quality of the paper: Good
Originality of the paper: Yes, but limited
Adequacy of the bibliography: Yes

Presentation:
Adequacy of the abstract: No
Introduction: background and motivation: Good
Organization of the paper: Needs improvement
Level of English: Satisfactory
Overall presentation: Average

Detailed Comments:

Paper summary:
This is an interesting paper and should be published. Some justification for the spot scores are in order: the abstract would be good if unfounded claims are removed or toned down; organisation and overall presentation would be good if the cluttered Maths were moved to an Appendix and a few more details or examples were included.

The paper's main contribution is a novel algorithm, based on Newton's method, to compute stable models of propositional ASP logic programs represented as matrices. In particular, it aims to avoid computing supported models by including loop constraints in the representation. In fact, there is a slight relaxation in the loop constraints included so that the method maintains scalability and it is only near the end of the paper in a paragraph on P13 that it is admitted that in some cases some of the returned models are supported but not stable.

The paper is reasonably clear, although it could be made clearer in quite a few places (see below). In particular, some additional exemplifying examples would be welcome.

Comments:
1.20 (P1 L20) Does the pre-computation method of Section 4.3 involve a symbolic solver?
1.23 The last sentence of the abstract does not seem to have much justification in the text. It should be made clear this would be for future work to decide.
2.21 Maybe make it clear in the introduction what the syntax accepted by the method is. E.g. some of the examples involve an encoding of a choice rule, so it appears it is normal logic programs without classical negation?
2.45 Although the authors use the negation symbol, as the programs are logic programs does this indicate negation as failure? Doesn't the semantics of stable models indicate this?
3.17 Perhaps not all readers will be familiar with loop formulas, so for the sake of completeness of the paper they should be introduced here.
3.32 The paper says later that a loop L={a} requires a self-referencing rule. Perhaps point that out here.
4.18 Examples such as this to exemplify the representation are very helpful. E.g. one would be useful around P5 L16 ff (could be in an Appendix) and p11 L 23.
5.7 Why do the authors use De Morgan's law to rewrite conjunctions? It doesn't seem to save much and could be confusing given rules are always written with conjunctions.
6.33 Could the proof of Proposition 2 be moved to an Appendix and be given in full?
7.34 and 8.10 and 9.30 and L7 in Algorithm 1 Are these the same J_{SU}? I think the 0.5 cancels out in the computation somewhere? In any case I would suggest moving the whole sequence of computations of derivatives on P8 and 9, and the derivation of J_{ac} (which is not given) to the appendix and including more explanation.
7.49 Are these standard results? If so give a citation, else prove at least one of them in an appendix.

Equation (12). This looks very much like gradient descent. Given only vectors are involved, I guess the Jacobian is not a matrix but a vector. Can this be clarified.
Is the claim on 10.21 that the update decreases the vector u obvious? I suppose in practical terms it should, since the gradient descent rule is applied. Given the loss function I would expect it to decrease J_{SU+c} (not the vector u). Is there any theoretical guarantee?
10.36 I m not sure if Section 3.6 adds much to understanding the paper. Maybe it could be put in a discussion section after the evaluation?

Equation (13) This equivalent form off loop formulas from [24] is less familiar than the original. Assuming loop formulas are moved to the preliminaries section, an example would be helpful to clarify the definition and revised definition.

13.6 This comment seems to be very pertinent to the claim in the paper that the method finds stable models and rules out supported models. In addition, even if loop formulas are included in full, Algorithm 1 may not reach a global minimum, so the computation may not converge. Perhaps this should be pointed out in the Introduction.
13.9-13.11 Can the authors elaborate on this claim? In particular, the authors claim they demonstrate some partial loop formulas help guide the gradient descent process to a root of J_{SU+c+LF}, which is the cost function with the full loop formulas, i.e. the root corresponds to a stable model. But looking at the results of experiment 5.3, which is, as I understand it, the only one where loop formulas are needed, and looking at figure 5, it would appear that not using the loop formulas (no_LF) they were able to find a stable model faster than using some of them (LF_max and LF_min). How does this fit with the claim in 13.9-13.11?
13.32 How is LM(P+) computed? Is it done symbolically?
14.14 Can this proof be given in full in the Appendix?
15.47 "seems rather high considering there are six solutions" English is odd - implies that fewer different solutions should have been found. Is this the intention? I guess it is related to the notion of "another solution constraint" mentioned on P17. Without this constraint it might be surprising that almost all solutions were found.
16.46 Why are (4) and (6) not encoded as (1)? I can understand perhaps for (6), but not for (4). This is a case where more details in an Appendix would be helpful as it is a good example for understanding the method. The details should include the content of the 197 rules, etc. Re (1), since the limit on the size of the set of H_{i,j} is j_{k}, which depends on I, why are there 36 H_{i,j} atoms? I would have understood if (1) had been written with limit up to N.
In Figure 4 does the precomp time include the time for pre-computation?
18.22 Looking at an instance with 4 nodes, {a0 .. a3}, P_{4n}, n=3, appears to have a minimal loop between a0 and a2 and a0 and a1. Is what is written correct? Given that the results concerning the LF heuristics are counter-intuitive (see 19.37) this is a case where a bit more detail would help.
20.23 Is this point important to mention when Algorithm 1 is introduced?
20.42 -20.49 It should be made clearer that by using an ASP solver to find stable models the problems addressed in this paper are not encountered. If the relevance is the neural front end of the two systems discussed, please point out that that part is purely concerned with image perception. Perhaps in future work the authors' could discuss whether such a front-end is possible with their system?
In Section 7 it would be nice to see some pointers for future work, which would enhance the significance of this work.

Minor items:
3.20 If the notion "full" doesn't appear elsewhere in the paper it could be omitted.
5.46 Would this example be clearer set out as the earlier example is?
12.48 (P12 L48) Only one "heuristics" needed
13.47 G^{-} is not defined anywhere I think
16.47 There is some confusion about naming programs here (Should P2 and C2 be P3 and C3?)
18.27 Should be LF_{min}

Review #2 submitted on 12/Nov/2023

By Arseny Skryagin
Review Details

Reviewer has chosen not to be Anonymous

Overall Impression: Excellent

Content:
Technical Quality of the paper: Excellent
Originality of the paper: Yes
Adequacy of the bibliography: Yes

Presentation:
Adequacy of the abstract: Yes
Introduction: background and motivation: Good
Organization of the paper: Satisfactory
Level of English: Satisfactory
Overall presentation: Excellent

Detailed Comments:

The overall summary:
In their work, Mr. Sato and his colleagues are proposing the novel non-symbolic ASP-solver. The main idea is using the matricized programs to formulate cost function. By calculating its roots using Newtons' method, one obtains, depending on the formulation of the underlying cost function, either a supported or a stable models. The main benefit of the approach is that once the translation is complete, finding the supported/stable model is reduced to matrix operations, which are easily scalable.

The evaluation:
The manuscript is carefully crafted, both regarding the language and the mathematical derivations, including proofs. The proposed algorithm is adjustable to both support and stable model search in the non-symbolic iterable way. Furthermore, authors are offering pre-computation to speed up the search. In my impression, the matricized program expressed as P=(D, Q) offers the insight, that both matrices D and Q have not-necessary a full rank. Consequentially, I strongly encourage authors, following the Sec. 3.6 to explore sparse formulation busting the computational speed. In doing so, it should become possible to cut the complexity of the vector space formulation from O(m*n) to hopefully O(m+n). Of course, seeing formulations of all syntactic elements listed in ASP-Core-2 by Calimeri et al. would convince many ASP-users to try out the proposed method. Overall, the work offers multiple contributions to the ASP-solving and thus to the neuro-symbolic AI in general.

I vote for the acceptance of this work and formally believe that it is a great contribution to the field of NeSy AI.

Suggestions on Improvement:
I would like to use the opportunity to offer a few suggestions on improving the overall reading experience.
1. To recall the depicted notation P=(D, Q) (l. 16, p. 5), I used the mnemonic device of PQ method (from numerics) and reduced quadratic equation (from solving quadratic equation). It would be otherwise difficult to remember that the matrix Q stands for conjunctions of the program P. I suggest to use instead P=(C, D) (conjunctions, disjunctions). It feels more natural to read and to remember the matricized formulation of a program P. To prevent any potential confusion, I would rather write \widehat{\cdot} to annotate the constraints in any matrix.
2. Likewise, it would be much easier to remember the binary vector s_{I} as a vectorization of I instead of u_{I} (l. 13, p. 5). The benefit comes in double for the Algorithm 1 (top of p. 10) since the goal of the method is to find either a supported or stable model of the program P.
3. A similar remark goes for the matrix M^{(1)} (l. 24, p. 6). I would prefer to use T denoting the truth values of rule bodies in P^{I} evaluated by I. To denote the transpose of a matrix D, I would use D^{t} to prevent any confusion.
4. Using L to annotate cost functions ((5) l. 34-35, p. 7, (14) l. 33 p. 11, etc.) is renown across many scientific communities. For me personally, it was somewhat cumbersome to understand the derivations on p. 8 and the update u_new in (12) (l. 45-48 p. 9) since J is usually used to annotate a Jacobian matrix (operator).
5. To prevent the overloading, I would rather write S (as in spiral) instead of L for loops (l. 13 p. 11). Further, using u, v and w instead of k, s and t for indices.

Questions:
Although, I read the work multiple times, I still have some questions, which I could not answer myself. Thus, I would like to ask the authors for help to further my understanding.
1. What expresses the term J_{a_{SU_{*c}}} (l. 42-43, p. 9)?
2. On pp. 9-10, it had been explained that for the random initialization of u(i) ~ N(0, 1) + 0.5 was used, where N(0, 1) is the standard normal distribution. My question is, if and how one can prove that using the suggested perturbation of u ← 0.5(u+delta+0.5) (l.26-2u, p.10) is enough to guarantee for the Algorithm 1 to find all supported/stable models regardless of the drawn initialization of u?
3. In the section 5, there are quite a few examples to be found. I was unable to explain myself the value choice for l2 and l3 constants. I am convinced that any interested reader will pose the same question. Could you possibly elaborate on the choices presented for the different tasks?
4. If a program P has numerous (100 thousands and more) stable models, I do not see any other solution other than trying/excluding using LF-heuristic and adding any previously found stable model as another constraints to the program and starting many trials of Algorithm 1 in parallel. Is there a way to guarantee that all support/stable models are indeed found and stop the exhaustive search?

Tracking #: 662-1642

Flag : Review Received

Authors:

Responsible editor:

Submission Type:

Abstract:

Full PDF Version:

Cover Letter:

Approve Decision:

Tags:

Recent blog posts

Journal Info

Submit

For Reviewers

Links

Search form

Tracking #: 662-1642

Flag : Review Received

Authors:

Responsible editor:

Submission Type:

Abstract:

Full PDF Version:

Cover Letter:

Approve Decision:

Tags:

Journal Info

Submit

For Reviewers

Links