By Lia Morra
Review Details
Reviewer has chosen not to be Anonymous
Overall Impression: Average
Content:
Technical Quality of the paper: Good
Originality of the paper: Yes, but limited
Adequacy of the bibliography: Yes
Presentation:
Adequacy of the abstract: Yes
Introduction: background and motivation: Good
Organization of the paper: Satisfactory
Level of English: Satisfactory
Overall presentation: Good
Detailed Comments:
As other reviewers have also stated, despite not presenting new NeSy techniques or evaluations of existing ones, the paper has merit in providing a benchmark on which LLMs and VLMs struggle, as well as in showing which principles pose the greatest challenges. The benchmark is publicly available and synthetic, and thus easy to use and extend, and it investigates principles complementary to existing ones, as shown in the revised section “Comparison with Existing Datasets”.
I appreciated the additional clarifications in the revised version on task composition, the training/validation split, the training of transformer-based models, and the computational requirements.
Differences between the extended version and the conference version are also made clear in the rebuttal. I still think the technical contribution beyond the conference version is somewhat limited to new experiments. In any case, I suggest explicitly stating the differences in the introduction, especially since the reported results differ between the conference paper and the extended version: the differences are due to evolutions in the benchmark, but could be perceived as inconsistencies by readers.
However, there are still a few issues in the revised submission.
It seems to me that it is not sufficient for the model to recognize the Gestalt principle; it must also recognize the underlying rule, potentially conflating Gestalt principles with other forms of spatial reasoning. For example, in Figure 17, continuity is supposedly tested through intersecting splines, which are visible in both positive and negative examples; positive examples contain objects of only one shape, whereas negative examples contain objects of multiple shapes. Visually, however, both positive and negative examples appear as continuous splines: to solve the task, the model must recognize whether the objects share the same shape, the overall arrangement being irrelevant to the task in this example.
The fact that the VLMs are given the Gestalt principles as part of the prompt should be highlighted further, as it potentially favors the VLM over the ViT, which is trained without prior knowledge. It is also interesting to note that providing a verbal description of the Gestalt principle is insufficient to guide the VLM toward the correct solution. I wonder what would happen if the prompt simply included positive and negative examples.