By Anonymous User
Review Details
Reviewer has chosen to be Anonymous
Overall Impression: Good
Content:
Technical Quality of the paper: Excellent
Originality of the paper: Yes
Adequacy of the bibliography: Yes
Presentation:
Adequacy of the abstract: Yes
Introduction: background and motivation: Good
Organization of the paper: Satisfactory
Level of English: Satisfactory
Overall presentation: Good
Detailed Comments:
Overall I enjoyed reading this paper - some very interesting ideas, mostly written and structured in a way that was easy to follow. However, I think some terms need to be introduced a bit sooner to understand some of the finer points, and for a journal paper the conclusion feels a bit sudden after the results section. I think the results and the ideas leading up to them present an opportunity for interesting analysis and discussion: mainly the opportunities for future work, but also how one might interpret the factors (I elaborate later). With or without a discussion section, this is still a good paper worth publication in my opinion.
Significance: The work is significant as a means of overcoming the variable binding problem in an efficient way. The relevance to neurosymbolic AI is clear: the vectors correspond to symbols that may be manipulated and bound in vector space, and a means of improving the efficiency of a neural network using this representation is proposed.
Background: The method certainly fits in nicely to the background methods listed. The relevance of cited work is clear and comparisons are made later in the paper. That said, I'm not familiar enough with VSAs to comment on whether more could have been included.
Novelty: The new methods are a natural progression from and improvement over those cited in related work.
Technical quality:
Experiment descriptions are very detailed, to the extent that I believe anybody interested should be able to replicate them. Tests are performed using common architectures and datasets.
I did, however, wonder if a more suitable dataset might be available. RAVEN has attribute labels (shape, position, etc.) but no class labels for the combinations of attributes; ImageNet and CIFAR have class labels but no attribute labels. The results with these datasets are interesting, but what about a dataset where both attributes and classes are defined? Are there no such datasets?
Presentation:
- Figures are fine in themselves, though in places they may need to be placed nearer the relevant text.
- Structure is mostly okay, except that Section 3 would be better placed before Section 2.
- Otherwise a very pleasant read and well written. The "story" is very clear.
Length: Length is appropriate for the work conducted, though I do feel it could use a page for a discussion and/or future work section.
Data availability: Uses public datasets for all experiments, so availability is good.
Possible discussion section:
=====================
What are the outstanding issues to be addressed in future work?
How do we interpret a "factor" for e.g. ImageNet and CIFAR? Does it have any symbolic meaning? The interpretation of factors and products is, I think, a particularly interesting discussion point, especially as there are various efforts to identify the concepts learned by neural networks and map them to symbols (see the references at the end of this review). In particular, if RAVEN binds values assigned to A, B, C and D, where A is shape, B is position, etc., what might A and B correspond to for ImageNet and CIFAR? I appreciate the authors may not have the answer, and perhaps this slips outside the scope of the paper, but it would make for fascinating future work, and some initial thoughts or discussion on how one might go about finding it would be welcome.
Introduction of terms
================
As somebody new to VSAs, I would have found it easier to read Section 3 before Section 2. I read up to Section 3, then, having read it, decided to start again from the beginning of the paper before continuing with the rest.
Operational Capacity
----------------------------
This term is defined twice:
- "The ratio between the maximum factorizable problem size and the required vector dimensionality" (p 3)
- "The largest problem size for which BCF achieves an accuracy higher than 99% (p 8)."
I appreciate they may mean different things in practice for different methods but what would be a general definition of the term?
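For what it's worth, here is how I tried to reconcile the two; the notation (M_max, D, acc) is my own, not the paper's:

    C_op = M_max / D,    where    M_max = max{ M : acc(M) > 0.99 }

i.e. the p. 8 statement seems to supply the "maximum factorizable problem size" that the p. 3 ratio then divides by the dimensionality D. If that reading is right, stating the general definition once, up front, would help.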
Bundling
-----------
Maybe it's just me, but I couldn't quite get an intuitive grasp of what bundling was, even though at some level I understood it enough to follow the rest of the paper!
One definition given is "The bundling of two or more vectors is defined as their elementwise addition, followed by a selection function that retains the sparsity by setting the largest element of each block to 1 and the remaining elements to 0". This makes sense in terms of how one calculates it, but to me it doesn't clarify what it is for, or in what sense it is a "bundle".
Am I correct in thinking it's a sort of block-wise summary or average of the vectors in the "bundle", i.e. the closest possible representation of all of them?
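To check my own understanding, here is a minimal sketch of the quoted definition as I read it, assuming binary SBCs with one active slot per block; the function and variable names are mine, not the paper's:

    import numpy as np

    def bundle(vectors, block_size):
        # Elementwise addition, then a selection function that keeps only the
        # largest element of each block (set to 1, the rest to 0), as in the
        # definition quoted above.  `vectors` has shape (n, D), with D a
        # multiple of block_size; ties are broken by argmax's first-index rule.
        summed = np.asarray(vectors, dtype=float).sum(axis=0)
        blocks = summed.reshape(-1, block_size)
        out = np.zeros_like(blocks)
        out[np.arange(blocks.shape[0]), blocks.argmax(axis=1)] = 1
        return out.reshape(-1)

    # Toy check with D = 8 and block_size = 4: two binary SBCs with one
    # active slot per block.
    a = np.array([0, 1, 0, 0,   0, 0, 1, 0])
    b = np.array([0, 1, 0, 0,   1, 0, 0, 0])
    print(bundle(np.stack([a, b]), block_size=4))
    # -> [0. 1. 0. 0. 1. 0. 0. 0.]  (block 1 agrees; block 2 is a tie)

If that is roughly right, a sentence giving the intuition (e.g. "the bundle is the sparse vector most similar to all of its members", or whatever the correct reading is) would have helped me a lot.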
By extension I also struggled to understand what "bundling capacity" was - is it related to operational capacity?
Sampling Width
---------------------
Sampling width is important to Section 4.4, but a definition isn't given until Section 4.7: "The sampling width (A) determines how many codevectors will be randomly sampled and bundled in case the thresholded similarity is an all-zero vector"
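In case it helps pin down the ambiguity, this is how I read that sentence (again the names are mine, the surrounding resonator/BCF iteration is omitted, and bundle() is the sketch from my bundling comment above):

    def estimate_or_fallback(thresholded_sim, codebook, sampling_width, block_size, rng):
        # If the thresholded similarity vector is all zeros, randomly sample
        # `sampling_width` (A) codevectors from the codebook and bundle them;
        # otherwise the paper's usual similarity-based estimate applies (omitted).
        if not np.any(thresholded_sim):
            idx = rng.choice(len(codebook), size=sampling_width, replace=False)
            return bundle(codebook[idx], block_size)
        return None  # placeholder for the normal update

Moving the definition (or a pointer to it) up to Section 4.4 would make that section easier to follow.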
Other minor points:
===============
Fig 1:
I wasn't completely sure how to interpret the layout of the figure: does the Binary SBC correspond to the top-right from 1-4 and the GSBC to the bottom-right? In other words, are you binding a binary SBC to a GSBC, or are you trying to show that the method could be applied to either? Could it work with two GSBCs or two binary SBCs?
Also, I thought Fig. 1 and Table 1 were a single figure at first; I would suggest rearranging the layout.
p. 6, Eq. 4: Should the ~ above the x on the right of the equation be a ^?
Section 4.6: Some might argue that F = 2, 3 is a small range to test. How high could it realistically go?
References:
=========
[Chen et al., 2019] Chen, C., Li, O., Tao, D., Barnett, A., Rudin, C. and Su, J.K., 2019. This looks like that: deep learning for interpretable image recognition. Advances in Neural Information Processing Systems, 32.
[Townsend et al., 2020] Townsend, J., Kasioumis, T. and Inakoshi, H., 2020. ERIC: Extracting Relations Inferred from Convolutions. In: Ishikawa, H., Liu, C.L., Pajdla, T. and Shi, J. (eds) Computer Vision – ACCV 2020. Lecture Notes in Computer Science, 12624. Springer, Cham.
[Zhang et al., 2019] Zhang, Q., Yang, Y., Ma, H. and Wu, Y.N., 2019. Interpreting CNNs via decision trees. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp. 6261-6270).
[Zhang et al., 2018] Zhang, Q., Cao, R., Shi, F., Wu, Y.N. and Zhu, S., 2018. Interpreting CNN knowledge via an explanatory graph. In Thirty-Second AAAI Conference on Artificial Intelligence.