By Connor Pryor
Review Details
Reviewer has chosen not to be Anonymous
Overall Impression: Average
Content:
Technical Quality of the paper: Average
Originality of the paper: Yes, but limited
Adequacy of the bibliography: Yes, but see detailed comments
Presentation:
Adequacy of the abstract: No
Introduction: background and motivation: Limited
Organization of the paper: Satisfactory
Level of English: Satisfactory
Overall presentation: Good
Detailed Comments:
Summary:
The authors present Neural Markov Prolog (NMP), a novel language combining Markov logic with Prolog syntax/semantics to define neural network architectures. This work extends the ideas in [1], which demonstrated how infinite tree-structured PGMs can correspond directly to neural networks. Specifically, the authors leverage the equivalence between the derivatives of the marginal log-likelihood and the derivatives of the cross-entropy loss of a neural network with sigmoid outputs. While [1] primarily focused on this theoretical equivalence, this paper introduces a concrete language (NMP) for defining such neural network architectures and provides examples of its application.
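As I understand it, the underlying identity is the standard one for a sigmoid output p = \sigma(z) with a binary target y (my paraphrase, not necessarily the paper's notation):

    \frac{\partial}{\partial z} \log p(y \mid z) \;=\; y - \sigma(z) \;=\; -\frac{\partial}{\partial z}\,\mathrm{CE}\big(y, \sigma(z)\big),
    \qquad \mathrm{CE}(y, p) = -\big[\, y \log p + (1 - y)\log(1 - p) \,\big],

so gradients of the marginal log-likelihood coincide with the negated gradients of the cross-entropy loss.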
Comments and Questions:
1. Abstract Clarity:
============================================
1a. The abstract needs a clearer summary of the contributions. A significant portion of the paper (around one-third) is dedicated to formulating popular neural architectures within the NMP framework, which is not sufficiently emphasized in the abstract.
2. Related Work:
============================================
I believe the smaller bibliography is appropriate given the focus on constructing neural networks from programmatic symbolic structures, which is a less-explored area of NeSy AI. That said, the introduction briefly mentions other NeSy approaches without sufficiently comparing this work to related systems.
2a. How does NMP compare to some of the more popular NeSy approaches like Neural Machines, Delta ILP, LTNs, DeepProbLog, Semantic Loss, TensorLog, NeuPSL, etc.? Are there overlaps in semantics?
2b. Creating a separate related work section could provide more context to the reader and make it easier to relate the work to the broader NeSy field.
3. Clarifying Contributions:
============================================
The paper would benefit from a clearer outline of its contributions in the introduction.
3a. As in the abstract, is defining popular neural architectures a primary contribution, or is it secondary to introducing NMP's syntax and semantics? If it is not a contribution, why describe so many architectures?
3b. Additionally, clarify whether NMP introduces any novel inference or learning algorithms or if it focuses purely on creating a neural network with symbolic syntax.
3c. Is there a codebase for creating these neural models, i.e., for translating a Prolog program into a neural model or a neural network into logic? The introduction states: "Potential benefits include automatic translation of logic or text into neural networks, and of neural networks into logic or text." If no such codebase exists, this seems like a good direction for future work.
4. Example in Section 4:
============================================
4a. The example in Section 4 requires more detail or clarification:
- Is the grounded structure shown in Figure 1 the corresponding neural network?
- I may have missed this, but how do infinite weights propagate information in this setup? A brief explanation would help, and if this was described in [1], a bit of background would be nice.
- It’s unclear how Figure 1 supports inference for friend(anna, bob).
5. The Grounded Neural Model:
============================================
5a. I may have missed this, but I am somewhat confused about what Section 5 contributes to this paper. Is it showing how to translate a neural network into a Markov network structure? Or is this the grounded neural structure? Is this a contribution of this paper, or background from [1]?
6. Figure 3 Typo?:
============================================
6a. Should some of the grey nodes read input(2, ...) instead of all nodes reading input(1, ...)? Please double-check for accuracy.
7. Scalability of the Framework:
============================================
Section 7 does not discuss the scalability of expressing large neural architectures. While the fully connected network example (two layers, two units each) demonstrates the framework's basic functionality, it leaves questions about how the framework handles larger architectures.
For instance, if a 100-layer fully connected network were required, would each layer need to be explicitly defined, or does the framework provide a shorthand mechanism for specifying such structures? Addressing this point could reinforce the abstract's claim that NMP is a “flexible framework to elegantly express neural architectures.” If such scalability mechanisms exist, they should be explicitly acknowledged (e.g., “This is a simple example; more complex architectures can be defined using [specific method or feature]”). Alternatively, if scalability is a current limitation, the discussion should clearly state this.
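To illustrate what I mean by a shorthand, here is a hypothetical sketch in plain Prolog; this is not the paper's actual NMP syntax, and the predicates edge/2, unit_index/1, and input_index/1 are my own invention. One recursive rule stands in for the connectivity of every layer:

    % Hypothetical sketch only (plain Prolog, not necessarily valid NMP syntax).
    % A single recursive rule replaces 100 explicit per-layer definitions
    % for a fully connected stack.
    edge(unit(1, J), input(K))   :- unit_index(J), input_index(K).
    edge(unit(L, J), unit(P, K)) :- L > 1, P is L - 1, unit_index(J), unit_index(K).

    unit_index(1).  unit_index(2).      % two units per layer, as in the paper's example
    input_index(1). input_index(2).     % two input features

A query such as edge(unit(100, 1), Src) would then enumerate the incoming connections of a unit in layer 100 without any per-layer rules. If NMP supports something along these lines, a sentence saying so in Section 7 would strengthen the scalability claim.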
8. Overall:
============================================
8a. How does this language handle cyclic dependencies? For instance, if there are two rules, smokes(X) :- cancer(X) and cancer(X) :- smokes(X), how would the corresponding neural structure be generated? This is just an illustrative example and is not meant to be a meaningful program.
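To make the question concrete, here is the cycle written out in plain Prolog (I am not assuming this is valid NMP syntax):

    % Illustrative cycle only; not a meaningful program.
    smokes(X) :- cancer(X).
    cancer(X) :- smokes(X).

Grounding this for even a single constant would seem to yield an unbounded dependency chain, smokes(anna) <- cancer(anna) <- smokes(anna) <- ..., and it is not obvious from the paper how the corresponding network structure would be constructed or truncated in such a case.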
[1] On Neural Networks as Infinite Tree-Structured Probabilistic Graphical Models.