By Anonymous User
Review Details
Reviewer has chosen to be Anonymous
Overall Impression: Average
Content:
Technical Quality of the paper: Average
Originality of the paper: Yes
Adequacy of the bibliography: Yes, but see detailed comments
Presentation:
Adequacy of the abstract: Yes
Introduction: background and motivation: Good
Organization of the paper: Needs improvement
Level of English: Satisfactory
Overall presentation: Average
Detailed Comments:
The paper proposes an embedding method for knowledge graphs that combines graph neural networks with box embeddings. This is demonstrated on prediction tasks and on the evaluation of KG revisions.
The topic is in the scope of the NAI journal and presents an interesting extension for modeling ontological information geometrically. It seems to be a sufficient extension of the conference article.
The overall aim of the paper is not fully clear: whereas the title proposes a general framework for an embedding method, the paper is highly focused on one specific use case. It would therefore be necessary to make the distinction between the general approach and its application to the use case clearer throughout the paper. E.g., Section 3 could focus on the general approach, whereas Section 4 would describe the specific KG. This would also improve the understandability of the approach, as it is partly unclear which design decisions are use-case-specific and which are general decisions for the approach.
The approach of combining a box embedding and a GNN seems promising; however, it needs a more detailed explanation and evaluation. Many preliminaries are not introduced (e.g., Description Logics, GNNs, GraphSAGE), making parts of the paper hard to follow.
Three different use cases of the approach are given, exemplifying its usefulness. However, the approach could have been evaluated in more detail, especially by comparing it to other approaches and using different datasets.
As the idea and the results are promising, I think that the paper could be acceptable after revisions, as stated in detail below.
Detailed comments:
# Abstract
- "interpretability techniques to identify co-occurring edges...": This sentence is unclear: what are interpretability techniques, and what is an edge? Is a relation meant?
- The abstract should briefly state the results of the experimental section
# Introduction and Related Work
- As this is a journal paper, there is enough space to make an explicit related work section and to discuss the related work in a lot more detail.
- The distinction between KGs as sets of (subject, predicate, object) triples and KGs enriched with ontological or hierarchical information should be made clear at the beginning, so that it becomes clear which approaches can model concept-level information and which only instance-level information.
- A more detailed introduction to the principle of KGE and especially on the idea of representing concepts as boxes (or some other geometric object) is needed, as this is not a straightforward viewpoint.
- Kulmanov et al. are not only able to model hierarchies; EL++ is more expressive. It is necessary to discuss here in detail why such an approach, which also allows modeling concept conjunction, role inclusions, etc., is not used, and only a hierarchy is modeled.
- "Instead of representing relations as translations of classes" -> The approaches not using GNNs are also not restricted to translations; there are many other ideas, especially when concept information is not considered. A more detailed discussion is necessary here.
- "Box embeddings have been combined with GNNs..." -> This is unclear: what is the difference? For which purpose have the box embeddings been considered, if not for ensuring correctness? What does "semantically correct" actually mean? Being in line with the hierarchy?
- The general introduction of the topic is too short; a general overview of the proposed approach, its goals, etc. should be given in more detail before the use case is discussed.
- How expressive are the ontologies mentioned? Not all of them are solely hierarchies, how much information is lost if the non-hierarchical part of the ontology is ignored?
- "KG embeddings have, for example, been used by Gualdi et al. (2024) to predict genes associated with diseases from a protein interaction KG." -> What is the difference to the proposed approach?
- In general, the related work should be discussed in more detail: there are many approaches able to model hierarchies, and also approaches using box embeddings in a similar domain (for hierarchies see, e.g., Zhanqiu Zhang, Jianyu Cai, Yongdong Zhang, Jie Wang: "Learning Hierarchy-Aware Knowledge Graph Embeddings for Link Prediction", and others; for the similar domain see, e.g., Adel Memariani, Martin Glauer, Simon Flügel, Fabian Neuhaus, Janna Hastings, Till Mossakowski: "Box embeddings for extending ontologies: a data-driven and interpretable approach").
- Also, box-embedding approaches such as TransBox, ELBE, etc. need to be discussed in more detail: what are their limitations, and why should the proposed approach overcome these limitations?
- The introduction needs an overview of the paper ("In Section 2, we present...") and a detailed discussion on the exact tasks performed and the outcomes of the experiments.
# Preliminaries
- What is a "mindeltaBoxTensor constructor"? Why is it used?
- An introduction to description logic is needed, as DL-terminology is used throughout the paper.
- Box embeddings are introduced in Section 3; however, when defining the semantic loss, these embeddings are assumed to be known. There should be a short introduction to box embeddings in general in the preliminaries. First, hierarchies should be defined. Hierarchies do not necessarily involve disjointness axioms; the ones considered here seem to involve them. Some hierarchies could involve axioms of the type $A\sqsubseteq \exists R.C$, which seems not to be the case here. After defining the problem statement and how an embedding should look, the loss functions can be defined.
- Why is the $L_{distance}^{-}$ loss defined like this? Two boxes are disjoint if they are disjoint in at least one dimension. Does enforcing non-intersection in all dimensions improve the experimental outcomes?
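To illustrate the point above: a minimal sketch (with hypothetical box representations as lower/upper corner sequences, not the paper's actual data structures) showing that separation in a single dimension already makes two axis-aligned boxes disjoint:

```python
def boxes_disjoint(lo1, hi1, lo2, hi2):
    # Intervals [lo1_i, hi1_i] and [lo2_i, hi2_i] fail to intersect in
    # dimension i iff min(hi1_i, hi2_i) < max(lo1_i, lo2_i).
    # The boxes are disjoint iff this happens in at least one dimension.
    return any(min(h1, h2) < max(l1, l2)
               for l1, h1, l2, h2 in zip(lo1, hi1, lo2, hi2))

# Overlap in dimension 0, but separated in dimension 1 -> disjoint overall:
print(boxes_disjoint([0, 0], [2, 1], [1, 5], [3, 6]))  # True
# Overlap in every dimension -> intersecting:
print(boxes_disjoint([0, 0], [2, 2], [1, 1], [3, 3]))  # False
```

This suggests a per-dimension minimum (rather than a sum or maximum over all dimensions) would already suffice to certify disjointness.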
- What is the influence of using Gumbel random variables? How is this done in detail?
# Material and Methods
- Introductory sentence: In this section, we will first ..., then ...
- Before explaining box embeddings and the like, there should be a detailed problem statement: What should be learned, what is given, what is the input to the GNN, etc.
- What is the outcome of the GNN training and why should the output of each layer be treated as a box? What are the expected advantages of using boxes? Isn't it possible to model the hierarchy in some other way? This seems to be the main idea of the paper and therefore needs a lot more detail. Also, a small artificial example would be helpful.
- "Heterogeneous KGs, with different domains made up of classes we do not want to embed together, can have separate embeddings for each domain, which are trained using separate class hierarchies." -> this should be formalized.
- How is the negative sampling done? Based on instances or concepts? Is it actually necessary? Wouldn't it be possible to set for each $A\sqsubseteq B$ in the hierarchy the constraint that $B\not\sqsubseteq A$? Isn't there any disjointness information given in the ontology?
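The alternative suggested above can be sketched as follows (purely illustrative; the function name and the encoding of $A\sqsubseteq B$ as an (A, B) pair are my own, not the paper's):

```python
def reversed_negatives(subsumptions):
    """Given subsumption axioms A <= B as (A, B) pairs, use the reversed
    pair (B, A) as a negative sample, unless B <= A is also asserted
    (i.e. the two classes are stated to be equivalent)."""
    positives = set(subsumptions)
    return [(b, a) for (a, b) in subsumptions if (b, a) not in positives]

axioms = [("Dog", "Mammal"), ("Mammal", "Animal")]
print(reversed_negatives(axioms))  # [('Mammal', 'Dog'), ('Animal', 'Mammal')]
```

Such hierarchy-derived negatives would be deterministic and guaranteed to be false (for a strict hierarchy), in contrast to random sampling, which may produce false negatives.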
- Handling nominals should not be discussed in 3.2 but earlier, when talking about box embeddings in general.
- Handling nominals as boxes could lead to problems: how is it interpreted if the boxes representing {a} and {b} intersect but are not equal? What happens if the box is not fully contained in a concept but only partially? In the experimental section, this happens several times.
- How could $\{a\}\sqsubseteq \exists r.\{b\}$ be represented? In my understanding, only the hierarchy is represented in the box embedding. Or is this only used for the GNN part?
## 3.3
- It would be helpful to have a strict separation between the general method and its application, perhaps in two different sections: one section describing the embedding method, the GNN, etc. in full detail, and one section describing the application to this specific use case. Currently, 3.3, 3.4, and 3.5 mix general discussion of how an approach for prediction or graph revision could look with specific discussion of one dataset. Especially 3.4 and 3.5 are not only usable with this specific data, and there should therefore be a strict separation between method and use case.
- The use case in 3.3 needs a formal definition of the task to solve and the given data, especially which ontology exactly is used and how it is structured (e.g., how deep is the hierarchy?)
- Is the domain separation done manually? Could this be automated to use the approach not only for this specific dataset? I understand that the domain separation allows for varying the size of the embedding based on the number of classes in the domain, however, why should it decrease the overall dimensionality needed? If the classes are disjoint, then they could be represented as non-overlapping boxes next to each other in the same dimension. Thus, only the maximum dimensionality of the domains would be needed.
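The packing argument above can be made concrete with a small sketch (hypothetical representation: each domain's bounding box as (lower, upper) corner lists; not the paper's implementation). Disjoint domains can share one space by shifting them apart along a single dimension:

```python
def pack_domains(domain_boxes, gap=1.0):
    """Shift each domain's bounding box along dimension 0 so that mutually
    disjoint domains occupy non-overlapping slots of one shared space."""
    packed, offset = {}, 0.0
    for name, (lo, hi) in domain_boxes.items():
        shift = offset - lo[0]
        packed[name] = ([lo[0] + shift] + list(lo[1:]),
                        [hi[0] + shift] + list(hi[1:]))
        offset += (hi[0] - lo[0]) + gap  # gap enforces disjointness in dim 0
    return packed

# Two hypothetical domains, each originally embedded at the origin:
boxes = {"genes": ([0, 0], [2, 3]), "diseases": ([0, 0], [4, 1])}
print(pack_domains(boxes))
```

Under this construction, only the maximum dimensionality over all domains is needed, which is the point of the question above.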
- How many infrequent edges have been removed? Less than 1000 occurrences seems to be a quite high threshold. Have you observed a difference in result quality when setting a lower threshold?
- What is meant by "removing overlapping edges"? What are "nodes" in this context, instances or classes?
- What does $\sqsubseteq^*$ mean? Is this not only the set of direct parents but the set of all ancestors? This needs more detail: does it mean that $c\sqsubseteq\overline{p}$ is used as a negative sample?
- GraphSAGE should be at least briefly introduced.
- Figure 3 seems to be a quite informative figure but needs to be explained in more detail.
- Why is the dimension of the most common target domains higher? Should this not depend on the complexity of the hierarchy? Or is such a high dimension necessary for the GNN and not dependent on the hierarchy to be modeled?
## 3.4
- Why do we need a box embedding without a prediction task? What is the goal of this embedding?
- Why is such a simple ontology used?
- Why are the disjointness axioms randomly selected? Why aren't all available disjointness axioms used?
- This ontology does not incorporate relations. There are many approaches capable of learning an embedding of more complex ontologies (e.g., with conjunction or disjunction) geometrically, some of them with boxes. What is the advantage of this approach compared to the others? I understand the advantage when modeling relations; however, in this task, there seem to be no relations.
## 3.5
- This section needs a lot more detail: What is the exact aim, what is related work?
- If an edge represents a role assertion between two classes, is it then a directed edge representing $A\sqsubseteq \exists R.C$? But how can box embeddings then be trained? In my understanding, it is not possible to train axioms of the type $A\sqsubseteq \exists R.C$, only of the type $A\sqsubseteq B$.
# Results
- A comparison with standard box-embedding approaches, e.g., ELBE or BoxEL, is missing; a comparison to standard KGE methods (like a TransE-based method) not relying on hierarchical information would also have been interesting.
- Some of the standard datasets for knowledge base embeddings could have been used to reduce the dependence of the evaluation on one dataset.
- How well is the hierarchy actually modeled? In Figure 7, it seems that the axioms are not fulfilled, even though the ontology is overly simple and could easily be represented in two dimensions. Not all instance representations are fully included in the concept representations, and the disjointness between Woman and Country is not satisfied. Is this a general problem or only in this toy example?
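Containment of an instance box in a concept box can be verified directly from the learned corners; a minimal check of this kind (hypothetical helper, assuming boxes as lower/upper corner sequences) could be reported quantitatively alongside Figure 7:

```python
def box_contained(inner_lo, inner_hi, outer_lo, outer_hi):
    # The inner box lies inside the outer box iff, in every dimension,
    # the inner interval is a subinterval of the outer interval.
    return all(ol <= il and ih <= oh
               for il, ih, ol, oh in zip(inner_lo, inner_hi,
                                         outer_lo, outer_hi))

# A box [1,2] x [1,2] inside [0,3] x [0,3]:
print(box_contained([1, 1], [2, 2], [0, 0], [3, 3]))  # True
# A box protruding in dimension 0 is not contained:
print(box_contained([1, 1], [4, 2], [0, 0], [3, 3]))  # False
```

Reporting the fraction of axioms satisfied by such a check would make the claim about hierarchy fidelity verifiable rather than visual.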
## 4.1
- Why is it compared to LightGBM and not to some other approaches?
- Have you tested a GNN with box embeddings but without ontology information? This would show whether the box structure in general is helpful, or whether it is really the hierarchy information.
- Why does the parity plot in 4(b) and (d) "show promise in generalizing to a new task"?
## 4.2
- The goal of this section could be clarified.
- For me, as someone without any knowledge of biology, the section is hard to follow. Is it possible to explain the experiment and its goal more generally? I think that for a computer science journal it would be sufficient to state that the hypotheses can be justified; the detailed biological discussion could be moved to the appendix.
- Is this hypothesis only stated when using box embeddings or is it also an outcome of the other approaches? Are there other hypotheses that can be stated and that sound plausible? As this is only one example of a positive outcome, it does not have a high significance to show the overall validity of the approach.
## 4.3
- In Figure 7, is this box embedding dependent on the choice of the seed or is it a reproducible outcome?
## 4.4
- This evaluation needs a lot more detail (see the comments to 3.5)
Minor comments:
- footnote 1: paper title in quotation marks
- p.2: phenomena
- p.3: SGD needs a source
- p.4: "to find the distance from the subclass being completely contained within the superclass" -> strange wording
- (4) for 1 space missing