By David Tena Cucala
Review Details
Reviewer has chosen not to be Anonymous
Overall Impression: Good
Content:
Technical Quality of the paper: Good
Originality of the paper: Yes, but limited
Adequacy of the bibliography: Yes
Presentation:
Adequacy of the abstract: Yes
Introduction: background and motivation: Good
Organization of the paper: Satisfactory
Level of English: Satisfactory
Overall presentation: Good
Detailed Comments:
I omit the summary of the paper, as this has not changed with respect to my prior review.
In general, this new version of the paper conveys its contribution in a clearer and more understandable way; the purpose of the paper is also more clearly motivated. I have a few further comments (especially about the shape of the rules), but in general, I believe the paper can now be recommended for acceptance – but please clarify the exact form of the rules.
Detailed comments:
--I still find it somewhat strange that a textual version of the (originally visual) RPM task is used. I understand that other approaches leverage vision support and achieve consistently lower performance. However, I miss an argument explaining why using text-only data is still an interesting problem to consider, given that (as the paper already says) verbal abstract reasoning benchmarks already exist. This could simply be a matter of adding one or two sentences in the Introduction.
--I am still not fully clear on what nxn constellations are. So, if I take what the paper says literally, there are 8 candidate answer panels; in the 2x2 constellation, each panel has 4 objects, is that correct? How are those objects related to the 3x3 context matrix?
--Section 4.1: it is still unclear what the Binding and Unbinding operators do. Could you provide some intuition (for example, as you already do with the Bundling operator)?
--Page 8, line 34: should X and O be switched around? If I understood correctly, O should be the first two rows, and X should be the incomplete row.
--I am still confused about the shape of the rules. What is equation (4) and why are there 12 distinct c_i’s? Is this the most general form of the rule that the model learns? Why does it only mention two operators, even though three were defined?
--I am confused by line 14 on page 8 and Equation (5). It says “c_i either represents a context panel v_a^(i,j) or the identity”. However, the definition of c_k makes no reference to the row i and column j, as far as I can see, so how is c_i matched to a specific position in the context matrix?