ReasonX: Declarative Reasoning on Explanations

Tracking #: 948-1972

Flag : Review Received

Authors:

Laura State

Salvatore Ruggieri

Franco Turini

Responsible editor:

Guest Editors X-NeSy

Submission Type:

Article in Special Issue (note in cover letter)

Full PDF Version:

nai-paper-948.pdf

Cover Letter:

Dear editors,

We are pleased to submit our manuscript “ReasonX: Declarative Reasoning on Explanations” for consideration in the “Special Issue on Explainable Neurosymbolic AI (X-NeSy)” in Neurosymbolic Artificial Intelligence.

Explanations play a crucial role in understanding complex machine learning (ML) models. However, current eXplanation in AI (XAI) methods suffer from several limitations, including insufficient abstraction, limited user interactivity, and inadequate integration of background knowledge. To address these issues, we propose ReasonX, an explanation framework based on expressions in an algebra of operators over theories of linear constraints. ReasonX provides declarative, interactive explanations for decision trees, either directly as the models under analysis or as surrogate models for arbitrary black‑box predictors. Users can express background knowledge using linear constraints and Mixed-Integer Linear Programming optimization over features of factual and contrastive instances, and interact with the answer constraints at various levels of abstraction, from fully specified instances to under-specified ones. We present the architecture of ReasonX, which consists of a Python layer, closer to the user, and a Constraint Logic Programming (CLP) layer, which implements a meta-interpreter of the query algebra. The capabilities of ReasonX are demonstrated through qualitative examples, and compared to other XAI tools through quantitative experiments.

By using CLP as a core component of our approach, we are “leveraging symbolic methods as a core component of the modeling and interpretation process” (CfP of the special issue), for which we believe the contribution fits well within the scope of the special issue. We look forward to hearing from you.

Sincerely,
Laura State, Salvatore Ruggieri and Franco Turini

Approve Decision:

Approved

Tags:

Reviewed

Decision:
Major Revision

Solicited Reviews:

Review #1 submitted on 30/Mar/2026

By Jiri Nemecek
Review Details

Reviewer has chosen not to be Anonymous

Overall Impression: Weak

Content:
Technical Quality of the paper: Average
Originality of the paper: Yes, but limited
Adequacy of the bibliography: Yes

Presentation:
Adequacy of the abstract: Yes
Introduction: background and motivation: Limited
Organization of the paper: Satisfactory
Level of English: Satisfactory
Overall presentation: Weak

Detailed Comments:

Firstly, I will note that I like the idea of REASONX, which tackles often overlooked XAI desiderata like interactivity and working with underspecified samples. That being said, REASONX itself seems to have been published in the authors' earlier works, and thus is not novel. The remaining contributions of the paper are (1) the algebra of operators over sets of linear constraints and (2) empirical evaluations. I found the algebra technically sound, but poorly motivated and generally lacking a goal. It is unclear to me if it is useful in any way, though admittedly, it is outside my expertise. The qualitative evaluations showcase the usage of REASONX well, though they are very similar to those in previously published work. The quantitative evaluation is novel, but limited to 2 compared methods, one for each use (factual vs. counterfactual). The chosen methods are too few and old (albeit established), and the results are mostly inconclusive and/or poorly presented.

Going point-by-point:
Major issues:
- claim of model agnosticism - REASONX is not model agnostic. Training a surrogate model and explaining it (without even evaluating feasibility in the original model) is a trivial solution to the problem of having a model-specific explanation tool. This can be easily done with any other model-specific tool. I could not find even a discussion of how the tree surrogates were learned (code in the appendix suggests scikit-learn, i.e., CART), let alone suggestions or studies of different approaches and their consequences. Not evaluating the true fidelity of the generated CEs is unacceptable when the claim is that the approach is model agnostic. The reported feasibility (assumed to be computed on test set data) is not informative, since REASONX generates CEs out of distribution and close to the decision boundary (as exemplified by the qualitative examples). The actual counterfactual feasibility might thus differ a lot.
- weak comparison - Only two established but old methods, DiCE and ANCHORS, are compared, even though newer ones are discussed in Related Work. Even among the older methods, Russell, C. (2019) might be more appropriate, as it handles diversity using MILP and might thus lead to similar CEs.
- lack of clarity - The algebra of operators over sets of linear constraints has unclear motivation and benefits, and the empirical evaluation is presented in a confusing way. For example, columns in Table 8 share values of metrics that are (by the authors' admission) not equivalent, making the comparison difficult. Many common questions are left unanswered. On what data was fidelity computed? How many samples were used to train the models? How many factuals were used to generate CEs? How were the surrogate models trained?
- omitted scalability discussion - trees of what size can REASONX handle? The highest depth in the discussion is 5. This is a problem since at that depth, the tree is understandable by itself and leads to at most 32 rules. How many rules can the system handle?
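
The fidelity check requested in the first major issue could look roughly as follows. This is a minimal sketch, not REASONX code: `black_box`, `surrogate`, and the toy counterfactuals are hypothetical stand-ins chosen only to show how a counterfactual that is valid for the surrogate can fail on the original model near the decision boundary.

```python
from typing import Callable, Sequence

Instance = Sequence[float]
Model = Callable[[Instance], int]


def counterfactual_validity(black_box: Model, surrogate: Model,
                            counterfactuals: Sequence[Instance]) -> float:
    """Fraction of counterfactuals whose surrogate label is confirmed by the black box."""
    if not counterfactuals:
        raise ValueError("need at least one counterfactual")
    agree = sum(1 for x in counterfactuals if black_box(x) == surrogate(x))
    return agree / len(counterfactuals)


# Hypothetical black box: a rule that depends on two features.
def black_box(x: Instance) -> int:
    return int(x[0] + x[1] > 1.0)


# Hypothetical surrogate: a depth-1 tree that splits on the first feature only.
def surrogate(x: Instance) -> int:
    return int(x[0] > 0.5)


# Counterfactuals placed just past the surrogate's split, as a
# boundary-seeking method would tend to produce.
ces = [(0.51, 0.0), (0.51, 0.6), (0.9, 0.2)]
validity = counterfactual_validity(black_box, surrogate, ces)
print(f"validity on the black box: {validity:.2f}")
```

In this toy setting the first counterfactual flips the surrogate's label but not the black box's, which is the discrepancy that fidelity measured on test data would not reveal.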

Minor issues and questions:
- The algebra of operators is sound and an interesting perspective, but is it useful in any way? Wouldn't simply considering polyhedra work just as well?
- I found the use of the term "linear constraints" for actual linear constraints and their conjunction (a polyhedron) quite unintuitive.
- Given the method's goal of interactivity, why was a user study not performed?
- Can REASONX consider rule sets, or do the rules need to be "non-overlapping," leading to the choice of trees?
- Would your approach fall under "formal XAI" (Marques-Silva, 2022) when viewed as a model-specific method for decision trees?
- I understand the idea about "highest predictive uncertainty being at the decision splits", but in discrete values (common in tabular data), such a small relaxation can have a major effect. E.g., relaxing a rule x > 0 for a binary x would lead to no split at all. In your case, the ordinal values seem to be encoded in a way that allows for this. Is that not an issue?
- This approach has one further limitation not mentioned (in an otherwise thorough discussion of limitations), namely model privacy. REASONX enables querying the decision boundary directly (by looking at the returned results), including listing the counterfactual leaves. This would be a major hurdle to adoption in client-facing applications, where it could lead to leaking the decision model entirely.
- In the example at the end, in the first reply by REASONX, how can the models assign the same label, but the rules have no intersection? The instance itself is proof of the non-emptiness of the intersection.
- Finally, the usefulness of the counterfactuals found by this method might be hindered by their being in a low-density region (e.g., as shown in Figure 5). Could REASONX include constraints on the (counter)factual plausibility?
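
The concern about relaxing splits on discrete features can be illustrated with a small sketch. It assumes, purely for illustration, that a relaxation shifts the threshold of a rule `x > t` by a small `eps`; how REASONX actually relaxes constraints is not specified here, and `satisfying` is a hypothetical helper.

```python
def satisfying(domain, threshold, eps=0.0):
    """Values of a discrete domain that satisfy the rule x > threshold - eps."""
    return [v for v in domain if v > threshold - eps]


binary = [0, 1]

strict = satisfying(binary, threshold=0)             # rule x > 0
relaxed = satisfying(binary, threshold=0, eps=0.01)  # rule x > -0.01

print("strict rule keeps:", strict)
print("relaxed rule keeps:", relaxed)
```

Under this assumption, even a tiny `eps` makes the relaxed rule accept every value of a binary feature, so the split no longer separates anything.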

Stylistic/typos:
- Font suddenly changes for half of the abstract
- Confusing dot notation, making the j-th feature of the i-th instance I_i.x_j, instead of the standard x_ij
- Page 2 - "a linear constraints", then "Thus, linear constraints ... appear a natural..." missing "as"
- Page 24 - missing "s" in "contraint"

Review #2 submitted on 06/Apr/2026

By Martin Krutský
Review Details

Reviewer has chosen not to be Anonymous

Overall Impression: Good

Content:
Technical Quality of the paper: Good
Originality of the paper: Yes
Adequacy of the bibliography: Yes

Presentation:
Adequacy of the abstract: Yes
Introduction: background and motivation: Good
Organization of the paper: Satisfactory
Level of English: Satisfactory
Overall presentation: Good

Detailed Comments:

1. Despite the authors' disclaimer that the fidelity of surrogate DTs to the original black-box model must be assumed (page 7), the implications of imperfect fidelity require deeper investigation:
- Some (e.g., contrastive) explanations can be invalidated by the discrepancy. The risk is exacerbated by REASONX's capability to reason over under-specified instances - such instances cover broader geometric regions, which are more likely to intersect with areas of low surrogate fidelity.
- The framework could partially address the fidelity issue by, e.g., identifying and filtering out DT's decision paths known to have lower fidelity to the black box (and not using them in the explanation).
Could the authors briefly elaborate on the points mentioned above?
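
The filtering idea in the second bullet of point 1 could be sketched as below. This is an assumption-laden illustration rather than part of REASONX: the record format `(leaf_id, black_box_label, surrogate_label)`, the helper names, and the 0.9 threshold are all hypothetical.

```python
from collections import defaultdict


def leaf_fidelity(records):
    """Per-leaf agreement rate between black-box and surrogate labels.

    records: iterable of (leaf_id, black_box_label, surrogate_label).
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for leaf, bb_label, sg_label in records:
        totals[leaf] += 1
        hits[leaf] += int(bb_label == sg_label)
    return {leaf: hits[leaf] / totals[leaf] for leaf in totals}


def trusted_leaves(records, min_fidelity=0.9):
    """Leaves whose decision paths would be kept for explanations."""
    return {leaf for leaf, f in leaf_fidelity(records).items() if f >= min_fidelity}


# Toy validation records: leaf "A" always agrees, leaf "B" agrees half the time.
records = [
    ("A", 1, 1), ("A", 1, 1), ("A", 0, 0), ("A", 1, 1),
    ("B", 1, 0), ("B", 0, 0), ("B", 1, 0), ("B", 1, 1),
]
fidelity = leaf_fidelity(records)
kept = trusted_leaves(records)
print(fidelity, kept)
```

A filter like this trades coverage for reliability: explanations routed through leaf "B" would be withheld rather than reported with an unverified label.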

2. While REASONX is technically an agnostic method under the authors' assumptions (page 4, 18), the restriction to (surrogate) DTs is highly restrictive and not fully supported by neurosymbolic literature (some works, e.g., extract logical rules and programs directly from black boxes). Are there other inherently interpretable models that could be compatible with the framework?

3. Since the authors report a generally longer runtime of REASONX compared to other XAI methods (page 34), it would be useful to present a basic asymptotic complexity analysis. How does REASONX fare against other methods, disregarding the implementation efficiency?

4. Some of the implementation details seem to hinder efficient interactive querying (meaning, without full recomputation; useful in interactive settings, especially in the case of longer runtime). One of them is probably the usage of the single-answer MILP method bb_inf (page 23). How complex would the changes be, allowing efficient iterative usage of REASONX?

5. The following claim should be softened:
REASONX is working with "any black-box predictor" (abstract) - DTs are inappropriate surrogates for some models, especially for models working with non-tabular data.

Review #3 submitted on 06/Jun/2026

By Anonymous User
Review Details

Reviewer has chosen to be Anonymous

Overall Impression: Average

Content:
Technical Quality of the paper: Average
Originality of the paper: Yes, but limited
Adequacy of the bibliography: Yes

Presentation:
Adequacy of the abstract: Yes
Introduction: background and motivation: Limited
Organization of the paper: Poor
Level of English: Satisfactory
Overall presentation: Average

Detailed Comments:

This submission presents a tool for generating (factual or contrastive) explanations from decision trees, which may or may not derive from a black-box model. The tool ultimately translates the problem of finding such explanations into a CLP problem, to be solved via Prolog's CLP(R) library.

I think that this is a potentially interesting tool that deserves to be presented in this journal; however, I also believe that the quality of the presentation needs to be improved considerably before acceptance can be recommended. The paper spends, in my opinion, too many pages describing implementational details of comparatively little interest (the discussion of the meta-interpreter at pages 20-23 being perhaps an especially egregious example); instead, the "evaluation" part of pages 31-onwards, which should be the most important one since the paper is about presenting a new tool, contains only a limited discussion of the results and leaves many important details in the appendix, to the point of being borderline unparseable as it is.

Some more specific comments (some of which very minor) follow:

* Page 2: "...encodes a linear constraints" - a linear constraint

* Page 4: "---whether the explanation method exploits knowledge about the internals of the black-box or not". If we can see "the internals of the black box", isn't that *not* a black-box by definition?

* Page 6: "Bayes rule". What is meant by this? Bayes' rule isn't about choosing the label with the highest estimated probability, as the phrase would seem to suggest; and it doesn't seem to be relevant to the framework described here, in which the leaves simply assign probabilities to labels.

* Pages 11 and following: a criticism of the authors' notion of "factual explanation" that should be worth addressing, I think, is that depending on the number of variables and on the structure of the decision tree, the explanation generated by the algorithm might involve a number of variables that are not *actually* relevant to the decision made, which might make it difficult for a human user to parse. One can answer this in a few different ways (for example, by arguing that if so then the problem lies in the fact that the decision tree is badly constructed, or suggesting that this factual explanation might be simplified later on if necessary but at any rate it still needs to be constructed first); but in any case, I believe that this should be addressed.

* Also pages 11 and following: something should also be said about how to address the fact that a set of constraints can often be made non-redundant in different ways. Presumably, we want to give the user the constraints in the most readable form possible: but, to make a fairly trivial example, how are we to choose between {x1 >= 3, x2 >= 2, x1 + x2 <= 5} and {x1 = 3, x2 = 2}? Intuitively in this case it is clear that the second set of constraints is preferable, but in general one needs a well-defined notion of what this means and a description of what the system will return.

* Page 17: "the factual rules are produced only for c's that appear as c and c' \in Th(S) for some c'". It is not clear to me what is meant here. What is S? Does it refer to the previous point?

* Page 19: "Each declared instance is encoded by a list of CLP(R) variables, one for each feature". Shouldn't it be encoded by a list of variable *values*?

* Pages 21-23: as I was saying, I think that the discussion of these implementational details could have been safely moved to the supplementary material.

* Page 32: I think that more discussion of Table 7 is necessary. Here we do not even see an explanation of what its various columns mean, never mind some insight about what these results tell us about the performance of the proposed tool. Similarly for the other tables in this section (I see that there is some material along these lines in the supplementary material, but it really should be part of the main paper).

* Page 35: "REASONX advances other methods in qualitative terms." Such as?

* Page 48: Much of the "Contrasting to sufficient reasons" section should be moved into the main text, I think; and the same applies also to the "Quantitative Evaluation" section later on.
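
The equivalence of the two constraint sets in the example for pages 11 and following, {x1 >= 3, x2 >= 2, x1 + x2 <= 5} and {x1 = 3, x2 = 2}, can be checked by a simple bound-propagation step. The sketch below is only an illustration of that one example; `tighten` is a hypothetical helper and says nothing about which form REASONX or CLP(R) would return.

```python
def tighten(lower, upper_sum):
    """Given x_i >= lower[i] and sum(x) <= upper_sum, derive implied upper bounds.

    Each x_i is at most upper_sum minus the lower bounds of all other variables.
    """
    total_lower = sum(lower)
    return [upper_sum - (total_lower - lo) for lo in lower]


lower = [3, 2]             # x1 >= 3, x2 >= 2
upper = tighten(lower, 5)  # implied by x1 + x2 <= 5
is_point = lower == upper  # bounds coincide, so the region is a single point

print("lower bounds:", lower)
print("implied upper bounds:", upper)
print("region is a single point:", is_point)
```

Since the implied upper bounds coincide with the lower bounds, both sets describe the same single point, which is why a rule for choosing the presented form is needed.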
