Original manuscript tracking number: 796-1787.
We would like to thank all reviewers and editors for their valuable
feedback and insightful comments, which have strengthened our work. We
appreciate the opportunity to revise and resubmit our manuscript
'DELE: Deductive EL++ Embeddings for Knowledge Base Completion', an extended version of the NeSy 2024 conference paper 'Enhancing Geometric Ontology Embeddings for EL++ with Negative Sampling and Deductive Closure Filtering'. Please see
below a point-by-point response to all reviewers' comments. The file 'dele_highlighted.pdf' contains the revised version of the manuscript with changes highlighted in blue. The file 'dele.pdf' contains the same revised version without highlighted text.
======================= Review # 1 ================================
Reviewer 1:
[1] The organization of the Methods and Results section is not
clear. The Results section describes the novel methodological
contributions and experimental results. While this type of
organization may be common in other areas (e.g. biology) I don't think
it works well here. I suggest the authors reorganize these sections to
clearly present their methodological contributions, experimental
design and evaluation results.
Response: We reorganized the text, in particular the Methods and
Results sections, as suggested. First, we removed the 'Approximate
Semantic Entailment' section from 'Related Work' (Section 3). We then
placed the theoretical material related to the additional negative
losses into a separate Section 4, 'Negative Sampling and Objective
Functions', of the revised version, and the theoretical material
related to the deductive closure and its utilization into a separate
Section 5, 'Negative Sampling and Entailments'. Information regarding
the training and testing procedures, as well as the numerical results,
was placed into a separate Section 6, 'Experiments'.
Reviewer 1:
[2] The authors can be more positive about their contributions. It is
presented here in very neutral terms. For instance, I strongly suggest
that the authors rewrite section 5.1 in a way that more clearly
demonstrates the novelty and potential impact.
Response: As suggested, we added more details concerning why it is
important to sample negative examples during training and, in
particular, negative examples of various types. They can be found in
Section 4 of the revised version of the paper. We highlight that
negative sampling prevents overgeneralization of the learned
embeddings and prevents trivial satisfiability in case of model
collapse, by incorporating additional constraints into the model.
Since in previous geometric models (ELEmbeddings, ELBE and Box2EL)
only negatives of the GCI2 axiom type are sampled during training,
this may introduce biases towards this axiom type (which is usually
used for evaluation) and lead to false entailments.
Reviewer 1:
[3] In tables 1,2 and 3, bold and underlined are confusing. Do you
mean best results for each embedding method by filtered dataset? If
so, you are missing several bolds and underlines.
Response: For easier readability, we highlighted the missing best
metrics in Tables 3-6 of the revised version. We also removed the
columns with filtered and non-filtered Hits@1 metrics since they are
not very informative, and we added columns with the difference between
the non-filtered and filtered macro- and micro-mean ranks (NF-F) in
Tables 3, 5 and 6, highlighting the average amount of entailed
knowledge predicted by the various geometric models and their
modifications.
Reviewer 1:
[4] Can you explain why Table 2 presents only non-filtered metrics?
Response: The protein--protein interaction prediction task is not
well suited for evaluation using the deductive closure of the train or
test set: for each protein $\{P\}$ (a singleton class), its subclasses
include only $\bot$ and its superclasses include only $\top$. As a
result, the only inferred axioms are of the form $\bot \sqsubseteq
\exists interacts\_with.\{P\}$, $\{P_1\} \sqsubseteq \exists
interacts\_with.\{P_2\}$ or $\{P\} \sqsubseteq \exists
interacts\_with.\top$, and filtered metrics can only be computed with
respect to the train part of the ontology. For this reason, we do not
report filtered metrics for the protein--protein interaction
prediction task (Table 4). We now specify this in Section 6.4
(paragraph 9).
Reviewer 1:
[5] Is Equation 6 this is the same as the "positive loss" for
disjointness?
Response: Yes, for all three geometric ontology embedding models we
consider, we use the positive loss for disjointness as the negative
loss for the $C \not\sqsubseteq D$ axiom type (Equation 1 in the
revised version). In the ELEmbeddings case, although non-containment
of the ball corresponding to $C$ within the ball corresponding to $D$
is not equivalent to their disjointness, the loss aims to minimize the
overlap between the classes for better optimization. The same logic
applies to the negative loss in Equation 8 (Equation 2 in the revised
version), where we minimize the overlap between the translated ball
corresponding to the class $C$ and the ball representing $D$. We
highlight this in Section 4.1 (paragraphs 2-3), Section 4.2 (paragraph
2) and Section 4.3 (paragraph 2).
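To make this intuition concrete, below is a minimal numpy sketch of a
disjointness-style negative loss that penalizes the overlap of two
$n$-balls. The notation (c_C, r_C for the center and radius of the
ball of $C$, v_R for the relation vector, margin for the margin
parameter) is our own illustrative shorthand, not the exact loss
formulation used in the manuscript:

    import numpy as np

    def neg_loss_gci0(c_C, r_C, c_D, r_D, margin=0.1):
        # Penalize overlap of the n-balls for C and D: the loss becomes
        # zero once the balls are separated by at least `margin`.
        dist = np.linalg.norm(c_C - c_D)
        return max(0.0, r_C + r_D - dist + margin)

    def neg_loss_gci2(c_C, r_C, v_R, c_D, r_D, margin=0.1):
        # Same idea for GCI2 negatives: translate the ball of C by the
        # relation vector v_R and penalize its overlap with the ball of D.
        dist = np.linalg.norm((c_C + v_R) - c_D)
        return max(0.0, r_C + r_D - dist + margin)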
Reviewer 1:
[6] Fig. 3. Is missing a caption, and 3b is hard to interpret due to
the scale of the y axis.
Response: As suggested, we added a caption to Figure 3 and also
captions for subfigures 3(a) and 3(b). We also replaced Figure 3(b)
with a bar plot to make it easier to interpret.
Reviewer 1:
[7] P3 L18-22: These two sentences are not clear. Grammar could be
improved, as well as a more detailed explanation of the achievement of
ref30.
Response: We decided to remove the section 'Approximate Semantic
Entailment' entirely because we do not use the concepts introduced in
this section.
Reviewer 1:
[8] Please clarify that the extended version includes additional
experiments with different ontologies and tasks.
Response: We added more information regarding how the current version
of the paper extends the conference paper, mentioning the
consideration of new geometric ontology embedding models, benchmark
sets, and tasks. The addition can be found in the last paragraph of
Section 1. Essentially, we study an additional type of knowledge base
completion task, subsumption prediction, on two different benchmark
sets, the Food Ontology and the GALEN ontology, which were not
presented in the conference paper. We include a more comprehensive
treatment of computing the deductive closure and using the deductive
closure with EL embedding methods (e.g., for model evaluation). Apart
from the ELEmbeddings model, we consider two more state-of-the-art
geometric ontology embedding models for EL, Box2EL and ELBE (we
specifically construct all negative losses with respect to all normal
forms for these new models, taking into consideration their geometric
properties). Furthermore, we extend our methods for the ELEmbeddings
model (we consider negative losses for all normal forms).
Reviewer 1:
Minor:
P3 L 23 missing space in: [39]or
Response: We fixed the typo.
P2 L34: The GCIs and RIs would be easier to read if presented in a
table format.
Response: Following the suggestion, we organized GCI and RI types into
a table (Table 1 of the revised version).
======================= Review # 2 ================================
Reviewer 2:
1. The addition of work to [35] is not clear. I am also not sure if
the additions are sufficient for a journal. These additions should be
explained in more detail.
Response: We added more information regarding how the current version
of the paper extends the conference paper, mentioning considerations
of new geometric ontology embedding models, benchmark sets and
tasks. The additions can be found in the last paragraph of Section
1. Essentially, we study an additional type of knowledge base
completion task, subsumption prediction, on two different benchmark
sets, Food Ontology and GALEN ontology, which were not presented in
the conference paper. We include a more comprehensive treatment of
computing the deductive closure and using the deductive closure with
EL embedding methods (e.g., for model evaluation). Apart from
the ELEmbeddings model, we consider two more state-of-the-art
geometric ontology embedding models for EL, Box2EL and ELBE (we
specifically construct all negative losses with respect to all normal
forms for these new models, taking into consideration their geometric
properties). Furthermore, we extend our methods for the ELEmbeddings
model (we consider negative losses for all normal forms).
Reviewer 2:
2. Although there is a mention of 3-4 ontologies for evaluation, in
the results tables, evaluation on only one ontology is shown. They are
not even present in the Appendix. Why is that the case?
Response: In Section 6.1 of the revised version, we include the
description of the two datasets we use in our experiments: the Gene
Ontology combined with data from the STRING database (Section 6.1.1)
and the Food Ontology (Section 6.1.2). We use the Gene Ontology and
the STRING database for evaluating the tasks of protein--protein
interaction prediction (the results are shown in Table 4 of the
revised version) and protein function prediction (the results are
shown in Table 3 of the revised version), and the Food Ontology is
used for subsumption prediction (the results are shown in Table 5 of
the revised version). In the revised version of this paper, we added
experimental results on the GALEN ontology, one of the standard
benchmark sets for the knowledge base completion task. We added the
corresponding information about the GALEN ontology to Section 6.1.2 of
the revised version; statistics regarding the number of axioms,
classes and relations can be found in Appendix Section C, and the
experimental results are summarized in Table 6 of the revised version
of the paper. We comment on the newly obtained results in Section 6.4
of the revised version of the paper.
Reviewer 2:
3. The evaluation is performed only on one task (ontology completion)
and subsumption prediction on only one ontology. How are the results
of these two tasks on other ontologies?
Response: In addition to protein--protein interaction prediction and
protein function prediction on the Gene Ontology and STRING database,
and subsumption prediction on the Food Ontology, we added experimental
results on the GALEN ontology, one of the standard benchmark sets for
the knowledge base completion task. We added the corresponding
information about the GALEN ontology to Section 6.1.2 of the revised
version; statistics regarding the number of axioms, classes and
relations can be found in Appendix Section C, and the experimental
results are summarized in Table 6 of the revised version of the
paper. We comment on the newly obtained results in Section 6.4 of the
revised version of the paper.
Reviewer 2:
1. The following paper is also relevant to this work. “EmELvar: A
NeuroSymbolic Reasoner for the EL++ Description Logic. Biswesh
Mohapatra, Sumit Bhatia, M. Raghava Mutharaju, and
G. Srinivasaraghava. https://ceur-ws.org/Vol-3123/paper6.pdf.
Response: We included the suggested paper in our related work section
(Section 3.2). This work is relevant to ours since it extends the
ELEmbeddings model by introducing role embeddings that enable handling
many-to-many relations. This work also provides a perspective on
extending the method to more expressive Description Logics.
Reviewer 2:
2. What is meant by filtered metrics?
Response: In recent works focusing on knowledge graph completion or
knowledge base completion tasks, filtered metrics are computed by
removing axioms present in the train set from the list of all ranked
axioms. This filtering is applied to eliminate statements learned by a
model during the training phase, which are therefore likely to have a
lower (better) rank, and thus to evaluate the predictive performance
of a model in a fairer setting. We include these clarifications in the
last paragraph of Section 6.2 of the revised version.
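For concreteness, here is a minimal Python sketch of the
filtered-ranking computation. The data structures and names are ours,
purely for illustration, and do not reflect the manuscript's
implementation:

    def filtered_rank(scores, target, train_axioms, candidates):
        """Rank `target` among `candidates` by score, skipping candidates
        (other than the target) that already appear in the train set."""
        target_score = scores[target]
        rank = 1
        for cand in candidates:
            if cand == target or cand in train_axioms:
                continue  # filtering step: ignore axioms seen during training
            if scores[cand] < target_score:  # assuming lower score = better
                rank += 1
        return rank

The non-filtered rank is obtained by the same procedure without the
`cand in train_axioms` check.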
Reviewer 2:
3. The negative losses in Section 5 is the main contribution and
should be discussed earlier and not part of the Experiments section.
Response: We reorganized the text, in particular the Methods and
Results sections, as suggested. First, we removed the 'Approximate
Semantic Entailment' section from 'Related Work' (Section 3). We then
placed the theoretical material related to the additional negative
losses into a separate Section 4, 'Negative Sampling and Objective
Functions', of the revised version, and the theoretical material
related to the deductive closure and its utilization into a separate
Section 5, 'Negative Sampling and Entailments'. Information regarding
the training and testing procedures, as well as the numerical results,
was placed into a separate Section 6, 'Experiments'.
Reviewer 2:
4. The purpose of Algorithm 2 is not clear. It needs to be better
motivated and explained, perhaps with the help of an example.
Response: The purpose of Algorithm 2 is to enrich the approximate
deductive closure with axioms involving arbitrary relations and
concepts, or with axioms of new GCI types, which may be missed by
applying the rules of Algorithm 1, since Algorithm 1 computes entailed
axioms based only on the ontology axioms, the concept hierarchy and
the role inclusions. We added this explanation to Section 5 of the
revised version, and also provided two small examples indicating the
importance of the rules listed in Algorithm 2.
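As one illustrative example of such an entailment (our example here,
not necessarily one of the exact rules of Algorithm 2): in EL, any
subsumption $C \sqsubseteq D$ entails $\exists R.C \sqsubseteq \exists
R.D$ for every role $R$, so axioms of this shape involve an arbitrary
relation and cannot be read off the concept hierarchy alone.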
Reviewer 2:
5. In lines 47, and 48 of page 11, it is mentioned that filtered
negatives provide a faithful representation of GCI3. Why is this the
case?
Response: According to the geometric interpretation, for a GCI3 axiom
$\exists R.C \sqsubseteq D$ to be faithfully represented, the $n$-ball
interpreting the concept $C$, translated by the negated vector
representing the relation $R$, should lie inside the $n$-ball
interpreting the concept $D$. We see this in Figure 2(b) but not in
Figure 2(a) for the axiom $\exists has\_function.\{GO_2\} \sqsubseteq
A$. We added this explanation to paragraph 6 of Section 6.4 and to the
caption of Figure 2.
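In symbols (our shorthand here; the manuscript uses its own notation):
writing $c_C, c_D$ for the ball centers, $r_C, r_D$ for the radii and
$v_R$ for the vector embedding the relation $R$, the axiom $\exists
R.C \sqsubseteq D$ is faithfully represented whenever $\lVert (c_C -
v_R) - c_D \rVert + r_C \leq r_D$, i.e., the translated ball for $C$
lies entirely inside the ball for $D$.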
Reviewer 2:
6. The difference between Figures 2(a) and 2(b) is not clear.
Response: The difference between Figures 2(a) and 2(b) lies, e.g., in
the faithful representation of the axiom $\exists
has\_function.\{GO_2\} \sqsubseteq A$; we now provide an explanation
for this in paragraph 6 of Section 6.4 of the revised version. We also
mention this in the caption of Figure 2, highlighting additionally
that the axioms $\{Q_i\} \sqsubseteq \exists has\_function.\{GO_2\}, i
= 1, \dots, 5$ are better preserved when negatives are filtered based
on the precomputed deductive closure (Figure 2(b)) than when random
negatives are sampled (Figure 2(a)).
Reviewer 2:
7. What are non-filtered metrics and filtered metrics?
Response: As noted in our response to comment [2] above, filtered
metrics are computed by removing axioms present in the train set from
the list of all ranked axioms, whereas non-filtered metrics are
computed over the full list of ranked axioms. The filtering eliminates
statements learned by a model during the training phase, which are
therefore likely to have a lower rank, and evaluates the predictive
performance of a model in a fairer setting. We specify this in the
last paragraph of Section 6.2 of the revised version.
Reviewer 2:
8. The improvements seem to be marginal, if at all. The use of
negative losses does not seem to make too much of a difference.
Response: We evaluate all geometric models under consideration on
novel knowledge prediction, following the strategies from related
works on geometric ontology embedding methods. In some cases, the
additional negative losses may decrease the ability of the models to
predict new axioms and encourage the models to predict entailed
knowledge first (as, e.g., in the protein function prediction case),
thus leading to the construction of a more accurate model of a
theory. Since there is a tradeoff between the prediction of novel and
entailed knowledge, the additional negative losses may show worse
performance on novel knowledge prediction. We added this note as
paragraph 5 in Section 6.4 of the revised version.
======================= Review # 3 ================================
Reviewer 3:
Clarify that such deductive closure is finite for normalized axioms.
Response: To clarify the statement that the deductive closure computed
with respect to the normal forms of axioms is finite, we added upper
bounds on the total number of entailed axioms of each GCI type, which
depend on the total number of concepts and the total number of
relations present in an ontology; these estimates can be found in the
first paragraph of Section 2.3 of the revised version. We specifically
highlighted that the deductive closure computed with respect to all
normal forms is finite for normalized EL theories.
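For illustration (our shorthand here, not necessarily the exact
formulation used in the manuscript): with $N_C$ concept names
(including $\top$ and $\bot$) and $N_R$ role names, the number of
possible normalized axioms of each form is bounded by $|\{C
\sqsubseteq D\}| \le N_C^2$, $|\{C \sqcap D \sqsubseteq E\}| \le
N_C^3$, $|\{C \sqsubseteq \exists R.D\}| \le N_C^2 N_R$ and $|\{\exists
R.C \sqsubseteq D\}| \le N_C^2 N_R$, so the deductive closure
restricted to normal forms is necessarily finite.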
Reviewer 3:
In the description logic literature, the EL++ ontology language is
often defined without normalization (see the paper Pushing the EL
envelope or Pushing the EL envelope further) and the deductive closure
can be infinite. Although it is common to normalize ontologies, this
may not necessarily be the case in practice. In fact many EL
ontologies used in practice, such as SNOMED CT or Galen, are not
normalized. E.g., there are axioms of the form A\sqsubseteq \exists
r. (B \sqcap \exists r. C), even if they are fewer in quantity. The
papers on EL embeddings introduce new symbols and modify these
original ontologies so as to end up with normalized ontologies for
benchmarking. But since this is not how the language is used in
practice, it would be better to differentiate this explicitly. One can
introduce a name such as EL^n++ whenever one wants to refer to a
normalized ontology language. If the authors would like to keep EL++
instead of giving a new name to the normalized language, then at least
a comment about this fact should be added.
Response: As suggested, we added more information regarding how the
normalized and the original versions of an EL++ ontology are related
to each other and, following your advice, highlighted the difference
between EL++ and the normalized version of EL++; these remarks can be
found in the first paragraph of Section 2.1. We also mention there
that whenever we refer to EL++ later in the manuscript, we always mean
the normalized EL++.
Reviewer 3:
Here are some recent papers on EL embeddings that are not referred to
but are related.
Hui Yang, Jiaoyan Chen, Uli Sattler: TransBox: EL++-closed Ontology Embedding. CoRR abs/2410.14571 (2024)
https://arxiv.org/abs/2410.14571
Victor Lacerda, Ana Ozaki, Ricardo Guimarães: Strong Faithfulness for ELH Ontology Embeddings. TGDK 2(3): 2:1-2:29 (2024)
https://drops.dagstuhl.de/storage/08tgdk/tgdk-vol002/tgdk-vol002-issue00...
Victor Lacerda, Ana Ozaki, Ricardo Guimarães: FaithEL: Strongly TBox Faithful Knowledge Base Embeddings for Eℒ. RuleML+RR 2024: 191-199
https://link.springer.com/chapter/10.1007/978-3-031-72407-7_14
Response: As suggested, we added these works to the bibliography and
discuss them in Section 3.2 of the Related Work.
Reviewer 3:
Abstract - The authors claim that EL embedding methods do not use the
deductive closure of an ontology to identify statements that are
inferred but not asserted. Recent works are using the canonical model
for EL ontologies (see last two references). The canonical model has
not only what is explicit but also the deductive closure. Since these
works are using the deductive closure, the abstract needs to be
rephrased and we suggest the authors consider these works as well (no
need to change the experiments, just mention that this strategy has
been explored).
Response: Following your suggestion, we discuss the last two
references provided in terms of canonical model construction. We argue
that although these geometric methods aim to construct a proper
canonical model of an ontology, they lack the ability to predict new
knowledge, which is necessary for real-world applications: knowledge
bases may be incomplete, and any ontology embedding method should be
able to find a tradeoff between the prediction of entailed and novel
knowledge. These remarks can be found in Section 3.2 of the Related
Work. We also modified the abstract accordingly: we mention there that
we focus on geometric optimization-based approaches for constructing
an ontology model, as opposed to canonical model-based methods.
Reviewer 3:
The last two lines on the semantics need to be fixed. in the line
that you have \exists r.A, there is the part {a \in \Delta^I … }. Note
that a is an element of \bold{I}, not of \Delta^I, because you did not
define \Delta^I as a set that contains \bold{I}.
Response: As suggested, we rewrote the definition of the semantics of
the existential restriction, replacing $a$ with $x$ and $b$ with $y$
to avoid confusion with the names of individuals.
Reviewer 3:
In the last line, it should be {a}^I = {a^I} instead of {a}^I = {a}.
Response: As suggested, we fixed the interpretation of nominals
accordingly.
Reviewer 3:
It would be better to describe the normalization step in detail. You
can see an example here:
https://link.springer.com/chapter/10.1007/978-3-642-33158-9_4 in
Figure 3.
Response: As suggested, we added the normalization rules; they can be
found in Table 2, Section 2.1 of the revised version.
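As an illustrative example (ours, following the standard normalization
procedure; the exact rules are those in Table 2 of the revised
version): the axiom $A \sqsubseteq \exists r.(B \sqcap \exists r.C)$
mentioned by the reviewer is normalized, using a fresh concept name
$X$, into $A \sqsubseteq \exists r.X$, $X \sqsubseteq B$ and $X
\sqsubseteq \exists r.C$, each of which is in normal form.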
Reviewer 3:
Section 2.2: “axioms to a knowledge base that are not explicitly
represented” -> (suggestion, something like (?)) that hold but are not
present (and not in the deductive closure).
Response: We modified the definition of the knowledge base completion
task accordingly; it can be found in the first paragraph of Section
2.2. We did not include the deductive closure in this definition
since, at the end of the first paragraph of Section 2.2, we mention
two tasks which could be defined as knowledge base completion: adding
only ``novel'' axioms to a knowledge base that are not in the
deductive closure of the knowledge base, and adding axioms that are in
the deductive closure as well as some ``novel'' axioms that are not
deductively inferred.
Reviewer 3:
[23] explores -> Mention the name of the author(s). Same for other
similar cases
Response: We added authors' names while discussing related work in
Section 3.3.
Reviewer 3:
Section 3.4 Notion of entailment: “when a model” -> whenever a model
(because when a model is giving the idea that only one model is needed
but in reality this is quantified over all models). Perhaps rewrite
this whole part and give a more formal definition using
interpretations? This would be better since you already defined
interpretations before in the paper.
Response: We decided to remove the whole section 'Approximate Semantic
Entailment' because the concepts introduced there are not used
elsewhere in our manuscript. We introduced the notion of entailment in
Section 2.1 of the Preliminaries.
Reviewer 3:
.. to EL fragment… -> to *the* EL fragment
Response: as suggested, we fixed the grammar here.
Reviewer 3:
… ; to avoid… -> … . To avoid…
Response: fixed
Reviewer 3:
Page 6 line 3: … for Food Ontology… -> for the Food Ontology
Response: as suggested, we fixed the grammar here.
Reviewer 3:
Page 6 line 24 “or semantically valid concept” this sounds strange
because semantics is related to meaning and usually this kind of
replacement happens at a syntactic level, not taking into account the
meaning of the concept that is used to replace other than it is not
entailed. Or maybe I misunderstood, what do you mean here exactly?
Response: We agree and have reformulated this statement. We intended
to refer to concepts that satisfy some additional conditions, such as
being in the domain or range of some relation; however, we now just
refer to sampling from either all concept names or a subset of them.
The updated text is in the first paragraph of Section 4.
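To illustrate the sampling procedure we now describe, here is a small
Python sketch (illustrative only; the names and data structures are
ours and do not correspond to the manuscript's implementation) of
corrupting a GCI0 axiom by replacing the superclass with a randomly
chosen concept name:

    import random

    def corrupt_gci0(axiom, concept_names, known_axioms, max_tries=100):
        """Sample a negative for a GCI0 axiom (C, D) by replacing D with a
        random concept name, resampling if the corruption is already known."""
        C, D = axiom
        for _ in range(max_tries):
            D_neg = random.choice(concept_names)
            if (C, D_neg) not in known_axioms:
                return (C, D_neg)
        return None  # no unseen corruption found within max_tries

Restricting `concept_names` to a subset (e.g., concepts appearing in a
given position) corresponds to the second option mentioned above.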
Reviewer 3:
Section 5.1.1. Please say what is what, to all the symbols that
suddenly appear in equations 6, 7, 8 *before they appear* and also
give intuition. You write a few things on page 7 but that is too late.
Response: We reorganized Sections 4.1, 4.2 and 4.3 of the revised
version of the paper so that all notation is introduced at the
beginning of each section, and the explanation of why each negative
loss has its particular form is given before the formula appears in
the text.
Reviewer 3:
Section 5.2.1 You can define deductive closure earlier, in the
preliminaries, since this is a notion from the literature, not a
notion that you introduce (and also make that comment about it being
finite for normalized axioms, so that you can compute it).
Response: We moved the definition of the deductive closure and the
estimates regarding the deductive closure of each GCI type to a new
Section 2.3 in the Preliminaries.
Reviewer 3:
Algorithm 1, these rules are inference rules for EL that you mention
that you are taking from ELK. Then, no need to write as an algorithm
and it is strange that you put in an algorithm environment without
making explicit what is the input and what is the output and whether
these rules are complete (even though they are sound). Since these
rules exist you could add them in the appendix and just that in the
main text what you do with the axioms inferred by applying the rules.
Response: The inference rules we implement are not identical to the
inference rules in ELK. The goal of our algorithm is to compute the
entire deductive closure w.r.t. all normal forms, whereas ELK computes
entailments only w.r.t. subsumption axioms (i.e., the concept
hierarchy). We clarified this in the descriptions of Algorithm 1 and
Algorithm 2. In Appendix F we provide a proof of soundness of the
procedure we introduce.
Sincerely,
The authors