By Anonymous User
Review Details
Reviewer has chosen to be Anonymous
Overall Impression: Good
Content:
Technical Quality of the paper: Average
Originality of the paper: Yes
Adequacy of the bibliography: Yes, but see detailed comments
Presentation:
Adequacy of the abstract: Yes
Introduction: background and motivation: Good
Organization of the paper: Needs improvement
Level of English: Satisfactory
Overall presentation: Average
Detailed Comments:
The paper presents an extensive survey of works on visual reasoning with scene graphs and common-sense knowledge, classifying them with respect to the architecture used, the tasks, the knowledge graphs, loose vs. tight coupling, and the evaluation metrics. The paper is relevant, since this is certainly an important gap in the survey literature to fill.
Moreover, I find the tight-coupling and loose-coupling classification useful.
- Language: The manuscript's language is certainly satisfactory (no narrative mistakes or typos, in general), but stylistically quite dry. Many sections devote a single monotone sentence per citation, stating what the work does and moving on to the next (e.g., Section 3.1). This could easily be fixed.
- Technical style: I believe the paper's style could be improved by introducing each problem technically, or at least half-formally, under definitions or in boxes. I consider this a must, as it would add substance: what is the task, what is the "Input", what is the "Output"? Moreover, major network architectures such as RNNs and GNNs all lack a citation to the paper that introduced them. (Ideally, I would also suggest a figure or an input-output schema for each of them, but this is optional.) Likewise, try to define "knowledge graph" instead of giving only a verbal example. When you say "For instance, a KG can provide information that 'a bird is likely to be found in a tree'", the word "likely" is not natural for a knowledge graph; it suggests a statistical prior instead, for which you have a separate section ("in general" would serve better). Also, if possible, start the introduction with an example; it would really help to hook and keep the reader.
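To make concrete what I mean by a half-formal definition box, here is a minimal sketch (my own notation, purely illustrative; the authors should adapt it):

    \begin{definition}[Knowledge graph]
    A knowledge graph is a triple $\mathcal{G} = (\mathcal{E}, \mathcal{R}, \mathcal{T})$, where
    $\mathcal{E}$ is a set of entities, $\mathcal{R}$ a set of relations, and
    $\mathcal{T} \subseteq \mathcal{E} \times \mathcal{R} \times \mathcal{E}$ a set of facts
    $(h, r, t)$, e.g., $(\mathit{bird}, \mathit{locatedIn}, \mathit{tree})$.
    \end{definition}

    \begin{definition}[Visual question answering]
    \textbf{Input:} an image $I$ and a natural-language question $q$.
    \textbf{Output:} an answer $\hat{a} = \arg\max_{a \in \mathcal{A}} \, p(a \mid I, q)$ over an answer set $\mathcal{A}$.
    \end{definition}

Even one such box per task would give the reader a precise anchor before the per-citation discussion starts.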
- Missing major literature: The survey completely disregards two important directions in the literature:
1) Causality-based approaches. I think causality needs its own part under Section 2.2, or next to the statistical priors, as a tool for common-sense reasoning or knowledge (e.g., Liu et al. 2022, "Cross-Modal Causal Relational Reasoning for Event-Level Visual Question Answering"; Liu et al., "Show, Deconfound and Tell: Image Captioning with Causal Inference"; Zhou and Yang 2021, "Relation Network and Causal Reasoning for Image Captioning"). There are others for other tasks (I do not expect the survey to be fully exhaustive, of course). These approaches are also inherently NeSy, and they could in addition make the challenges section more interesting (see the backdoor-adjustment sketch after this list).
2) Hyperbolic embedding approaches (which take either a KG or a taxonomy into account), relevant to common-sense reasoning, e.g., Xiong et al. 2022, "Hyperbolic Embedding Inference for Structured Multi-Label Prediction", and, relevant to their Section 2.3.2 on hierarchical semantic segmentation, Ghadimi Atigh et al. 2022, "Hyperbolic Image Segmentation". There is actually a great survey by Mettes et al. 2022, "Hyperbolic Deep Learning in Computer Vision: A Survey" (see the distance formula sketched after this list).
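To clarify the suggestion in 1): the cited causal works typically model dataset or common-sense bias as a confounder $c$ and replace the observational $p(a \mid I, q)$ with an interventional estimate via backdoor adjustment (a standard formulation, sketched here as an illustration, not quoted from those papers):

    p\bigl(a \mid \mathrm{do}(I, q)\bigr) \;=\; \sum_{c} p(a \mid I, q, c)\, p(c).

A half-formal statement of this kind is exactly what would fit a causality part next to the statistical priors in Section 2.2.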
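Similarly for 2), the appeal of hyperbolic embeddings for hierarchies can be conveyed in one formula: in the Poincaré ball, the distance

    d(x, y) \;=\; \operatorname{arcosh}\!\left(1 + 2\,\frac{\lVert x - y \rVert^{2}}{\bigl(1 - \lVert x \rVert^{2}\bigr)\bigl(1 - \lVert y \rVert^{2}\bigr)}\right)

makes the available volume grow exponentially with the radius, so tree-like taxonomies embed with low distortion. One sentence of this kind would motivate the connection to the hierarchical semantic segmentation of Section 2.3.2.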
- ML architecture classification reads as exhaustive: I am not sure I would present the deep-learning architecture classification as exhaustive (also in the figures), because new architectures may appear, and this would make the survey more obsolete than it should be as time passes. A subsection "Other" could acknowledge this fact. I leave it to the authors' judgement.
Minor issues:
- It is not clear whether the performance evaluation, for instance in Table 4, was carried out by the authors themselves or transferred from the original papers. This should be clarified.
- Many occurrences of "top K" -> "top-$K$".
- MNM -> MMN
- Section 1.2, lines 46 to 51, reads redundant: the phrase "deep learning, common sense knowledge and NeSy integration for scene representation and visual reasoning" appears twice.
- I would expect Figure 2 to read from left to right. (But I guess the authors want us to compare the bottom left to the bottom right.) Still something to reconsider (optional).