Sound and Complete Neurosymbolic Reasoning with LLM-Grounded Interpretations

Tracking #: 932-1955

Flag: New Paper Submitted

Authors: 

Bradley Allen
Prateek Chhikara
Thomas Ferguson
Filip Ilievski
Paul Groth

Submission Type: 

Article in Special Issue (note in cover letter)

Full PDF Version: 

Cover Letter: 

Dear Editors and Reviewers,

We thank you for your thoughtful and constructive feedback on our manuscript "Sound and Complete Neurosymbolic Reasoning with LLM-Grounded Interpretations" (Tracking #901-1914), submitted for the Special Issue on NeSy 2025 Extended Papers. We have carefully addressed all concerns raised in the reviews and summarize the key revisions below. To facilitate review, we have indicated where substantive changes were made by highlighting the relevant passages in red.

REVIEWER #1 CONCERNS

1. Coverage Trade-off Insufficiently Addressed. We added a dedicated subsection, "The accuracy-coverage trade-off" (pp. 11-13), that frames coverage reduction as a feature of selective classification rather than a limitation. We also added Table 14 (p. 42), showing truth value distributions across 10 topic categories and revealing which domains produce more gaps versus gluts.

2. Insufficient Comparison with Related Work. We expanded Section 2 to clarify why direct benchmark comparison is methodologically inappropriate: our approach addresses a different problem than systems like Logic-LM or LINC, namely, how to formally integrate unreliable knowledge sources while preserving soundness and completeness guarantees, rather than benchmark accuracy on clean reasoning tasks. We added Table 7 (p. 21) comparing ACrQ features with Description Logics.

3. Limited Verbalization Strategy. We added an extensive "Verbalization" subsection in Section 7 (pp. 19-20) that formally characterizes when formulas are evaluable, distinguishes entity recognizability from predicate interpretability, provides concrete examples, discusses three verbalization strategies, and offers practical guidelines for knowledge base design.

4. Scalability and Computational Complexity. We added a "Computational complexity" subsection (pp. 17-18) with a formal complexity analysis, Table 6 showing a timing breakdown, and discussion of optimization strategies from DL reasoners that could improve throughput by 1-2 orders of magnitude.

5. API Costs and Latency. The complexity subsection now discusses parallelization strategies and local LLM deployment options, and estimates that knowledge bases with 10^4-10^5 ground statements are tractable with standard optimizations.

6. Questions for Authors. All four questions are now addressed: verbalization automation (Section 7), belief revision (pp. 19-20), worst-case complexity (pp. 17-18), and error propagation (p. 21).

REVIEWER #2 CONCERNS

1. Problem Statement Clarity. The problem is stated in the abstract ("How can we harness LLMs' broad-coverage parametric knowledge in formal reasoning despite their inconsistency?"), elaborated in Section 2's taxonomy of approaches (a)-(d), and concretely demonstrated through the medication safety use case. The core insight ("instead of trying to get an LLM to reason using logic, we get a logic to reason using an LLM") distinguishes our approach from prior work.

2. Disconnection Between Experiments and Reasoner. We substantially expanded Section 6 with a medication safety reasoning use case (pp. 15-17) using a knowledge base of 940 statements (228 asserted, 712 inferred). The system detected 92 gluts corresponding to medically significant errors, demonstrating concrete integration between bilateral evaluation and tableau reasoning.

3. Discussion of Neural Reasoning Approaches (Chain of Thought). Section 2's first paragraph explicitly categorizes chain-of-thought methods (Wei et al., 2022; Kojima et al., 2022) and Tree of Thoughts (Yao et al., 2023) as type (a) prompt-based approaches, contrasting them with our interpretation-based approach (d).

4. Tractability and Complexity. Addressed in the new "Computational complexity" subsection (pp. 17-18).

5. Translation Failures and Misleading Interpretations. The new "Verbalization" subsection (pp. 19-20) extensively discusses conditions for evaluability, examples of failures, and guidelines for knowledge base design.

6. Deeper Discussion of Results. Table 14, with a topic-by-topic analysis, is now in Appendix D.4 (p. 42), along with discussion of model-specific epistemic signatures and domain-specific patterns.

We believe these revisions comprehensively address the reviewers' concerns while strengthening the paper's theoretical and empirical contributions. We thank the reviewers again for feedback that has led to what we believe is a substantially stronger submission.

Sincerely,
Bradley P. Allen (on behalf of the authors)

Previous Version: