By Anonymous User
Review Details
Reviewer has chosen to be Anonymous
Overall Impression: Good
Content:
Technical Quality of the paper: Good
Originality of the paper: Yes, but limited
Adequacy of the bibliography: Yes, but see detailed comments
Presentation:
Adequacy of the abstract: Yes
Introduction: background and motivation: Good
Organization of the paper: Satisfactory
Level of English: Satisfactory
Overall presentation: Good
Detailed Comments:
Thanks for clarifying the “LLM-only” baseline (my original comment 2) and the RAG with CoT setup (my original comment 4). My remaining concerns are my original comments 1 and 3, which I restate here along with new comments:
“1. Introduction and related work: I think it may be worth first discussing the problems with cross-lingual information retrieval vs. the problems with retrieval from long documents separately, since they introduce different challenges and the existing methods for addressing these challenges are also different.”
Neither section has been changed much. Long-context retrieval and CLIR each have their own line of work, separate from the other, that should be discussed in more detail and in a more structured way, e.g. a paragraph for each in the related-work section before discussing any issues that arise from combining the two.
“3. CROSS: I have a bit of a hard time understanding how CROSS operates differently from existing cross-lingual RAG methods. I believe existing methods also embed chunks from the documents using multi-lingual embeddings and then retrieve the top k most similar chunks to the query. My understanding is that the contribution of the paper is mostly focused on the reasoning component NSAR, but it would be useful to explicitly mention which parts of the CROSS pipeline are novel. E.g., is the main difference the choice to embed sentences rather than longer chunks? If CROSS is considered one of the contributions of this paper, the retrieval accuracy needs to be evaluated against typical cross-lingual RAG systems (before the generation step), not just against using the LLMs without access to the source documents.”
The cover letter addresses this point: “We have clarified that CROSS is a RAG backbone specifically optimized for sentence-level granularity and massive scale (512k tokens), serving as the necessary prerequisite for the neurosymbolic module.”
a) If the main idea is that the chunks are at the sentence level, the novelty is very limited, and other RAG chunking-related work should be discussed (see the many sentence-level RAG approaches surveyed here: https://arxiv.org/pdf/2312.10997).
b) What do you mean by “massive scale (512k tokens)”? It is not clear what this number refers to (e.g., document length, corpus size, or context window).
Overall, I think the issues are minor and mostly have to do with framing: CROSS is framed as an algorithmic contribution (in which case many RAG baselines are ignored), and the two problems are framed as inherently tied together (ignoring the rich lines of work on each problem independently).