DELE: Deductive EL++ Embeddings for Knowledge Base Completion

Tracking #: 854-1860

Flag: New Paper Submitted

Authors: 

Olga Mashkova
Fernando Zhapa-Camacho
Robert Hoehndorf

Submission Type: 

Article in Special Issue (note in cover letter)

Cover Letter: 

Original manuscript tracking number: 796-1787.

We would like to thank all reviewers and editors for their valuable feedback and insightful comments, which have strengthened our work. We appreciate the opportunity to revise and resubmit our manuscript 'DELE: Deductive EL++ Embeddings for Knowledge Base Completion', an extended version of the NeSy 2024 conference paper 'Enhancing Geometric Ontology Embeddings for EL++ with Negative Sampling and Deductive Closure Filtering'. Please see below a point-by-point response to all reviewers' comments. The file 'dele_highlighted.pdf' contains the revised version of the manuscript with changes highlighted in blue; the file 'dele.pdf' contains the same revised version without highlighting.

======================= Review # 1 ================================

Reviewer 1: [1] The organization of the Methods and Results section is not clear. The Results section describes the novel methodological contributions and experimental results. While this type of organization may be common in other areas (e.g. biology) I don't think it works well here. I suggest the authors reorganize these sections to clearly present their methodological contributions, experimental design and evaluation results.

Response: We reorganized the text, in particular the Methods and Results sections, as suggested. First, we removed the 'Approximate Semantic Entailment' section from 'Related Work' (Section 3). We then placed the theoretical material on additional negative losses in a separate Section 4, 'Negative Sampling and Objective Functions', and the theoretical material on the deductive closure and its utilization in a separate Section 5, 'Negative Sampling and Entailments'. Information on the training and testing procedures, as well as the numerical results, was placed in a separate Section 6, 'Experiments'.

Reviewer 1: [2] The authors can be more positive about their contributions. It is presented here in very neutral terms. For instance, I strongly suggest that the authors rewrite section 5.1 in a way that more clearly demonstrates the novelty and potential impact.

Response: As suggested, we added more details on why it is important to sample negative examples during training and, in particular, negative examples of various types; these can be found in Section 4 of the revised version. We highlight that negative sampling prevents overgeneralization of learned embeddings, and prevents trivial satisfiability in case a model collapses, by incorporating additional constraints into the model. Since in previous geometric models (ELEmbeddings, ELBE and Box2EL) only negatives of the GCI2 axiom type are sampled during training, this may introduce a bias towards this axiom type (which is usually the one used for evaluation) and lead to false entailments (an illustrative sketch of such filtered sampling is given below).

Reviewer 1: [3] In Tables 1, 2 and 3, bold and underlined are confusing. Do you mean best results for each embedding method by filtered dataset? If so, you are missing several bolds and underlines.

Response: For easier readability, we highlighted the missing best metrics in Tables 3-6 of the revised version. We also removed the columns with filtered and non-filtered Hits@1 metrics, since they are not very informative, and added columns with the difference between non-filtered and filtered macro- and micro-mean ranks (NF-F) in Tables 3, 5 and 6, highlighting the average amount of entailed knowledge predicted by the various geometric models and their modifications.
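To make our response to comment [2] concrete, the following minimal sketch (ours, for the reviewer's convenience; it does not appear in the manuscript, and names such as sample_negative and deductive_closure are hypothetical) shows negative sampling for an arbitrary normalized axiom type with deductive-closure filtering:

import random

# Each normalized axiom is encoded as a tuple (gci_type, C, R, D), with R = None for
# axiom types without a role. 'deductive_closure' is the precomputed set of entailed
# axioms; corruptions that are themselves entailed are rejected, since they would be
# false negatives.
def sample_negative(axiom, concept_names, deductive_closure, max_tries=100):
    gci_type, c, r, d = axiom
    for _ in range(max_tries):
        corrupted = (gci_type, c, r, random.choice(concept_names))  # corrupt the superclass position
        if corrupted != axiom and corrupted not in deductive_closure:
            return corrupted
    return None  # no admissible negative found within the budget

The same corruption can be applied to the subclass position, and sampling over all GCI types (rather than GCI2 only) is what removes the bias discussed above.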
Reviewer 1: [4] Can you explain why Table 2 presents only non-filtered metrics?

Response: The protein--protein interaction prediction task is not tailored for evaluation using deductive closures of the train or test set: for each protein $\{P\}$ (a singleton class), its subclasses include only $\bot$ and its superclasses include only $\top$. As a result, the only inferred axioms will be of type $\bot \sqsubseteq \exists interacts\_with.\{P\}$, $\{P_1\} \sqsubseteq \exists interacts\_with.\{P_2\}$ or $\{P\} \sqsubseteq \exists interacts\_with.\top$, and filtered metrics can be computed only with respect to the train part of the ontology. For this reason we do not report filtered metrics for the protein--protein interaction prediction task (Table 4). We now specify this in Section 6.4 (paragraph 9).

Reviewer 1: [5] Is Equation 6 the same as the "positive loss" for disjointness?

Response: Yes, for all three geometric ontology embedding models we consider, we use the positive loss for disjointness as the negative loss for the $C \not\sqsubseteq D$ axiom type (Equation 1 in the revised version). In the case of ELEmbeddings, although non-containment of the ball corresponding to $C$ within the ball corresponding to $D$ is not equivalent to their disjointness, the loss aims to minimize the overlap between the classes for better optimization. The same logic applies to the negative loss in Equation 8 (Equation 2 in the revised version), where we minimize the overlap between the translated ball corresponding to class $C$ and the ball representing $D$. We highlight this in Section 4.1 (paragraphs 2-3), Section 4.2 (paragraph 2) and Section 4.3 (paragraph 2); the general shape of this disjointness-style loss is sketched below.

Reviewer 1: [6] Fig. 3 is missing a caption, and 3b is hard to interpret due to the scale of the y axis.

Response: As suggested, we added a caption to Figure 3, as well as captions for subfigures 3(a) and 3(b). We also replaced Figure 3(b) with a bar plot to make it easier to interpret.

Reviewer 1: [7] P3 L18-22: These two sentences are not clear. Grammar could be improved, as well as a more detailed explanation of the achievement of ref30.

Response: We decided to remove the section 'Approximate Semantic Entailment' entirely because we do not use the concepts introduced in this section.

Reviewer 1: [8] Please clarify that the extended version includes additional experiments with different ontologies and tasks.

Response: We added more information on how the current version of the paper extends the conference paper, mentioning the newly considered geometric ontology embedding models, benchmark sets, and tasks; the addition can be found in the last paragraph of Section 1. Essentially, we study an additional type of knowledge base completion task, subsumption prediction, on two different benchmark sets, the Food Ontology and the GALEN ontology, which were not present in the conference paper. We include a more comprehensive treatment of computing the deductive closure and using the deductive closure with EL embedding methods (e.g., for model evaluation). Apart from the ELEmbeddings model, we consider two more state-of-the-art geometric ontology embedding models for EL, Box2EL and ELBE (we specifically construct all negative losses with respect to all normal forms for these new models, taking their geometric properties into consideration). Furthermore, we extend our methods for the ELEmbeddings model (we consider negative losses for all normal forms).

Reviewer 1: Minor: P3 L23 missing space in: [39]or

Response: We fixed the typo.
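To make our response to comment [5] concrete: up to the exact notation and the regularization terms of the manuscript (this restatement is ours), for $n$-balls with centers $f_\eta(C)$, $f_\eta(D)$ and radii $r_\eta(C)$, $r_\eta(D)$, a disjointness-style margin loss has the form

$$\mathrm{loss}_{C \not\sqsubseteq D} = \max\bigl(0,\; r_\eta(C) + r_\eta(D) - \lVert f_\eta(C) - f_\eta(D)\rVert + \gamma\bigr),$$

which is zero exactly when the two balls are separated by at least the margin $\gamma$; minimizing it therefore pushes the classes' overlap towards zero.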
P2 L34: The GCIs and RIs would be easier to read if presented in a table format.

Response: Following the suggestion, we organized the GCI and RI types into a table (Table 1 of the revised version).

======================= Review # 2 ================================

Reviewer 2: 1. The addition of work to [35] is not clear. I am also not sure if the additions are sufficient for a journal. These additions should be explained in more detail.

Response: We added more information on how the current version of the paper extends the conference paper, mentioning the newly considered geometric ontology embedding models, benchmark sets and tasks; the additions can be found in the last paragraph of Section 1. Essentially, we study an additional type of knowledge base completion task, subsumption prediction, on two different benchmark sets, the Food Ontology and the GALEN ontology, which were not present in the conference paper. We include a more comprehensive treatment of computing the deductive closure and using the deductive closure with EL embedding methods (e.g., for model evaluation). Apart from the ELEmbeddings model, we consider two more state-of-the-art geometric ontology embedding models for EL, Box2EL and ELBE (we specifically construct all negative losses with respect to all normal forms for these new models, taking their geometric properties into consideration). Furthermore, we extend our methods for the ELEmbeddings model (we consider negative losses for all normal forms).

Reviewer 2: 2. Although there is a mention of 3-4 ontologies for evaluation, in the results tables, evaluation on only one ontology is shown. They are not even present in the Appendix. Why is that the case?

Response: In Section 6.1 of the revised version, we include descriptions of the two datasets we use in our experiments: the Gene Ontology combined with data from the STRING database (Section 6.1.1) and the Food Ontology (Section 6.1.2). We use the Gene Ontology and the STRING database for evaluating protein--protein interaction prediction (results in Table 4 of the revised version) and protein function prediction (results in Table 3 of the revised version), and the Food Ontology is used for subsumption prediction (results in Table 5 of the revised version). In the revised version, we also added experimental results on the GALEN ontology, one of the standard benchmark sets for the knowledge base completion task. We added the corresponding information about the GALEN ontology to Section 6.1.2; statistics on the numbers of axioms, classes and relations can be found in Appendix Section C, and the experimental results are summarized in Table 6 of the revised version. We comment on the newly obtained results in Section 6.4 of the revised version.

Reviewer 2: 3. The evaluation is performed only on one task (ontology completion) and subsumption prediction on only one ontology. How are the results of these two tasks on other ontologies?

Response: In addition to protein--protein interaction prediction and protein function prediction on the Gene Ontology and the STRING database, and subsumption prediction on the Food Ontology, we added experimental results on the GALEN ontology, one of the standard benchmark sets for the knowledge base completion task.
We added the corresponding information about the GALEN ontology to Section 6.1.2 of the revised version; statistics on the numbers of axioms, classes and relations can be found in Appendix Section C, and the experimental results are summarized in Table 6 of the revised version. We comment on the newly obtained results in Section 6.4 of the revised version.

Reviewer 2: 1. The following paper is also relevant to this work: "EmELvar: A NeuroSymbolic Reasoner for the EL++ Description Logic". Biswesh Mohapatra, Sumit Bhatia, M. Raghava Mutharaju, and G. Srinivasaraghava. https://ceur-ws.org/Vol-3123/paper6.pdf

Response: We included the suggested paper in our related work section (Section 3.2). This work is relevant to ours since it extends the ELEmbeddings model by introducing role embeddings which enable handling many-to-many relations. It also provides a perspective on extending the method to more expressive Description Logics.

Reviewer 2: 2. What is meant by filtered metrics?

Response: In recent works focusing on knowledge graph completion or knowledge base completion tasks, filtered metrics are computed by removing axioms present in the train set from the list of all ranked axioms. This filtering is applied to eliminate statements learned by a model during the training phase, which are therefore likely to have lower ranks, and to evaluate the predictive performance of a model in a fairer setting. We include these clarifications in the last paragraph of Section 6.2 of the revised version.

Reviewer 2: 3. The negative losses in Section 5 are the main contribution and should be discussed earlier and not as part of the Experiments section.

Response: We reorganized the text, in particular the Methods and Results sections, as suggested. First, we removed the 'Approximate Semantic Entailment' section from 'Related Work' (Section 3). We then placed the theoretical material on additional negative losses in a separate Section 4, 'Negative Sampling and Objective Functions', and the theoretical material on the deductive closure and its utilization in a separate Section 5, 'Negative Sampling and Entailments'. Information on the training and testing procedures, as well as the numerical results, was placed in a separate Section 6, 'Experiments'.

Reviewer 2: 4. The purpose of Algorithm 2 is not clear. It needs to be better motivated and explained, perhaps with the help of an example.

Response: The purpose of Algorithm 2 is to enrich the approximate deductive closure with axioms involving arbitrary relations and concepts, or with axioms of new GCI types, which may be missed by applying the rules from Algorithm 1, since Algorithm 1 computes entailed axioms based on the ontology axioms, the concept hierarchy and the role inclusions. We added this explanation to Section 5 of the revised version, and also provided two small examples indicating the importance of the rules listed in Algorithm 2.

Reviewer 2: 5. In lines 47 and 48 of page 11, it is mentioned that filtered negatives provide a faithful representation of GCI3. Why is this the case?

Response: According to the geometric interpretation, for a GCI3 axiom $\exists R.C \sqsubseteq D$ to be faithfully represented, the $n$-ball interpreting the concept $C$, translated by $-r$, the vector of relation $R$, should lie inside the $n$-ball interpreting the concept $D$. We see this in Figure 2(b) but not in Figure 2(a) for the axiom $\exists has\_function.\{GO_2\} \sqsubseteq A$. We added this explanation to paragraph 6 of Section 6.4 and to the caption of Figure 2 as well.
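In formulas (our restatement of the condition above, using generic notation $f_\eta(\cdot)$ for centers and $r_\eta(\cdot)$ for radii): the axiom $\exists R.C \sqsubseteq D$ is faithfully represented whenever

$$\lVert (f_\eta(C) - f_\eta(R)) - f_\eta(D)\rVert + r_\eta(C) \le r_\eta(D),$$

i.e., the ball of $C$ translated by $-f_\eta(R)$ is contained in the ball of $D$.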
Reviewer 2: 6. The difference between Figures 2(a) and 2(b) is not clear.

Response: The difference between Figures 2(a) and 2(b) lies, e.g., in the faithful representation of the axiom $\exists has\_function.\{GO_2\} \sqsubseteq A$; we now provide an explanation for this in paragraph 6 of Section 6.4 of the revised version. We also mention this in the caption of Figure 2, additionally highlighting that the axioms $\{Q_i\} \sqsubseteq \exists has\_function.\{GO_2\}, i = 1, \dots, 5$ are better preserved when negatives are filtered based on the precomputed deductive closure (Figure 2(b)) than when random negatives are sampled (Figure 2(a)).

Reviewer 2: 7. What are non-filtered metrics and filtered metrics?

Response: In recent works focusing on knowledge graph completion or knowledge base completion tasks, filtered metrics are computed by removing axioms present in the train set from the list of all ranked axioms, whereas non-filtered metrics are computed over the full list. This filtering is applied to eliminate statements learned by a model during the training phase, which are therefore likely to have lower ranks, and to evaluate the predictive performance of a model in a fairer setting. We specify this in the last paragraph of Section 6.2 of the revised version.

Reviewer 2: 8. The improvements seem to be marginal, if at all. The use of negative losses does not seem to make too much of a difference.

Response: We evaluate all geometric models under consideration on novel knowledge prediction, following the strategies of related works on geometric ontology embedding methods. In some cases the additional negative losses may decrease the ability of models to predict new axioms and encourage models to predict entailed knowledge first (as, e.g., in the protein function prediction case), thus leading to the construction of a more accurate model of a theory. Since there is a tradeoff between the prediction of novel and entailed knowledge, additional negative losses may show worse performance on novel knowledge prediction. We added this note as paragraph 5 of Section 6.4 of the revised version.

======================= Review # 3 ================================

Reviewer 3: Clarify that such deductive closure is finite for normalized axioms.

Response: To clarify the statement that the deductive closure computed with respect to the normal forms of axioms is finite, we added upper bounds on the total number of entailed axioms of each GCI type, which depend on the total numbers of concepts and relations present in an ontology; these estimates can be found in the first paragraph of Section 2.3 of the revised version. We specifically highlighted that the deductive closure computed with respect to all normal forms is finite for normalized EL theories.

Reviewer 3: In the description logic literature, the EL++ ontology language is often defined without normalization (see the paper Pushing the EL envelope or Pushing the EL envelope further), and the deductive closure can be infinite. Although it is common to normalize ontologies, this may not necessarily be the case in practice. In fact, many EL ontologies used in practice, such as SNOMED CT or GALEN, are not normalized. E.g., there are axioms of the form $A \sqsubseteq \exists r.(B \sqcap \exists r.C)$, even if they are fewer in quantity. The papers on EL embeddings introduce new symbols and modify these original ontologies so as to end up with normalized ontologies for benchmarking. But since this is not how the language is used in practice, it would be better to differentiate this explicitly.
One can introduce a name such as EL^n++ whenever one wants to refer to a normalized ontology language. If the authors would like to keep EL++ instead of giving a new name to the normalized language, then at least a comment about this fact should be added.

Response: As suggested, we added more information on how the normalized and the original version of an EL++ ontology are related to each other, and highlighted the difference between EL++ and the normalized version of EL++; these remarks can be found in the first paragraph of Section 2.1. We also mention there that whenever we refer to EL++ later in the manuscript, we always mean the normalized EL++.

Reviewer 3: Here are some recent papers on EL embeddings that are not referred to but are related.

Hui Yang, Jiaoyan Chen, Uli Sattler: TransBox: EL++-closed Ontology Embedding. CoRR abs/2410.14571 (2024). https://arxiv.org/abs/2410.14571
Victor Lacerda, Ana Ozaki, Ricardo Guimarães: Strong Faithfulness for ELH Ontology Embeddings. TGDK 2(3): 2:1-2:29 (2024). https://drops.dagstuhl.de/storage/08tgdk/tgdk-vol002/tgdk-vol002-issue00...
Victor Lacerda, Ana Ozaki, Ricardo Guimarães: FaithEL: Strongly TBox Faithful Knowledge Base Embeddings for Eℒ. RuleML+RR 2024: 191-199. https://link.springer.com/chapter/10.1007/978-3-031-72407-7_14

Response: As suggested, we added these works to the bibliography and discuss them in Section 3.2 of Related Work.

Reviewer 3: Abstract - The authors claim that EL embedding methods do not use the deductive closure of an ontology to identify statements that are inferred but not asserted. Recent works are using the canonical model for EL ontologies (see the last two references). The canonical model has not only what is explicit but also the deductive closure. Since these works are using the deductive closure, the abstract needs to be rephrased, and we suggest the authors consider these works as well (no need to change the experiments, just mention that this strategy has been explored).

Response: Following your suggestion, we discuss the last two references provided in terms of canonical model construction. We argue that although these geometric methods aim to construct a proper canonical model of an ontology, they lack the ability to predict new knowledge, which is necessary for real-world applications: knowledge bases may be incomplete, and any ontology embedding method should be able to find a tradeoff between the prediction of entailed and novel knowledge. These remarks can be found in Section 3.2 of Related Work. We also modified the abstract accordingly: we mention there that we focus on geometric optimization-based approaches for constructing an ontology model, as opposed to canonical model-based methods.

Reviewer 3: The last two lines on the semantics need to be fixed. In the line that has $\exists r.A$, there is the part $\{a \in \Delta^I \dots\}$. Note that $a$ is an element of $\mathbf{I}$, not of $\Delta^I$, because you did not define $\Delta^I$ as a set that contains $\mathbf{I}$.

Response: As suggested, we rewrote the definition of the semantics of existential restriction, replacing $a$ with $x$ and $b$ with $y$ to avoid confusion with the names of individuals.

Reviewer 3: In the last line, it should be $\{a\}^I = \{a^I\}$ instead of $\{a\}^I = \{a\}$.

Response: As suggested, we fixed the interpretation of nominals accordingly.

Reviewer 3: It would be better to describe the normalization step in detail. You can see an example here: https://link.springer.com/chapter/10.1007/978-3-642-33158-9_4 in Figure 3.
Response: As suggested, we added the normalization rules; they can be found in Table 2, Section 2.1 of the revised version.

Reviewer 3: Section 2.2: "axioms to a knowledge base that are not explicitly represented" -> (suggestion, something like (?)) that hold but are not present (and not in the deductive closure).

Response: We modified the definition of the knowledge base completion task accordingly; it can be found in the first paragraph of Section 2.2. We did not include the deductive closure in this definition since, at the end of the first paragraph of Section 2.2, we mention two tasks which could be defined as knowledge base completion: adding only ``novel'' axioms to a knowledge base that are not in the deductive closure of the knowledge base, and adding axioms that are in the deductive closure as well as some ``novel'' axioms that are not deductively inferred.

Reviewer 3: [23] explores -> Mention the name of the author(s). Same for other similar cases.

Response: We added the authors' names when discussing related work in Section 3.3.

Reviewer 3: Section 3.4, notion of entailment: "when a model" -> whenever a model (because "when a model" gives the idea that only one model is needed, but in reality this is quantified over all models). Perhaps rewrite this whole part and give a more formal definition using interpretations? This would be better since you already defined interpretations before in the paper.

Response: We decided to remove the whole section 'Approximate Semantic Entailment' because the concepts introduced there are not used elsewhere in our manuscript. We introduce the notion of entailment in Section 2.1 of the Preliminaries.

Reviewer 3: ... to EL fragment ... -> to *the* EL fragment

Response: As suggested, we fixed the grammar here.

Reviewer 3: ... ; to avoid ... -> ... . To avoid ...

Response: Fixed.

Reviewer 3: Page 6 line 3: ... for Food Ontology ... -> for the Food Ontology

Response: As suggested, we fixed the grammar here.

Reviewer 3: Page 6 line 24: "or semantically valid concept" - this sounds strange because semantics is related to meaning, and usually this kind of replacement happens at a syntactic level, not taking into account the meaning of the concept that is used to replace, other than that it is not entailed. Or maybe I misunderstood; what do you mean here exactly?

Response: We agree and have reformulated this statement. We intended to refer to concepts that satisfy some additional conditions, such as being in the domain or range of some relation; however, we now simply refer to sampling from either all concept names or a subset of them. The updated text is in the first paragraph of Section 4.

Reviewer 3: Section 5.1.1: Please say what is what for all the symbols that suddenly appear in Equations 6, 7, 8 *before they appear*, and also give intuition. You write a few things on page 7, but that is too late.

Response: We reorganized Sections 4.1, 4.2 and 4.3 of the revised version so that all notation is introduced at the beginning of each section, and the explanation of why each negative loss has its particular form is given before the formula appears in the text.

Reviewer 3: Section 5.2.1: You can define the deductive closure earlier, in the preliminaries, since this is a notion from the literature, not a notion that you introduce (and also make the comment about it being finite for normalized axioms, so that you can compute it).

Response: We moved the definition of the deductive closure and the estimates regarding the deductive closure of each GCI type to a new Section 2.3 in the Preliminaries.
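For the reviewer's convenience, the estimates have the following shape (an illustrative count of ours, not a quotation of the manuscript's Section 2.3): for a normalized ontology with $N_C$ concept names (including $\top$ and $\bot$) and $N_R$ role names, the number of entailed axioms per normal form is bounded by

$$|\{C \sqsubseteq D\}| \le N_C^2, \quad |\{C \sqcap D \sqsubseteq E\}| \le N_C^3, \quad |\{C \sqsubseteq \exists R.D\}| \le N_C^2 N_R, \quad |\{\exists R.C \sqsubseteq D\}| \le N_C^2 N_R,$$

so the deductive closure over normal forms contains at most $N_C^2(1 + N_C + 2N_R)$ axioms and is therefore finite.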
Reviewer 3: Algorithm 1: these rules are inference rules for EL that you mention you are taking from ELK. Then there is no need to write them as an algorithm, and it is strange that you put them in an algorithm environment without making explicit what is the input, what is the output, and whether these rules are complete (even though they are sound). Since these rules exist, you could add them in the appendix and just state in the main text what you do with the axioms inferred by applying the rules.

Response: The inference rules we implement are not identical to the inference rules in ELK. The goal of our algorithm is to compute the entire deductive closure w.r.t. all normal forms, whereas ELK computes it w.r.t. subsumption axioms (i.e., the concept hierarchy). We clarified this in the descriptions of Algorithm 1 and Algorithm 2. In Appendix F we provide a proof of the soundness of the procedure we introduce.

Sincerely,
The authors
