By Anonymous User
Review Details
Reviewer has chosen to be Anonymous
Overall Impression: Average
Content:
Technical Quality of the paper: Good
Originality of the paper: Yes, but limited
Adequacy of the bibliography: Yes, but see detailed comments
Presentation:
Adequacy of the abstract: Yes
Introduction: background and motivation: Good
Organization of the paper: Needs improvement
Level of English: Satisfactory
Overall presentation: Average
Detailed Comments:
This paper provides a survey of the knowledge graph embedding field, with some focus on the use of LLMs and a perspective on semantically enriched embeddings. The authors offer some final recommendations and critically reflect on recent results.
While I believe that some of the content presented also appears in other survey papers, I think this paper provides an interesting overview. It could be a valuable piece to have in the journal.
I still want to flag a few problems that I think can be fixed with some additional writing and restructuring.
+ The survey is, in general, very concise.
If this is intended as a survey paper, I believe the knowledge graph embedding summary could benefit from additional details on what the models actually do and a more general introduction to embeddings (e.g., how a scoring function is used to generate a prediction); see the sketch at the end of this point.
There are some papers that have not been cited (e.g., https://arxiv.org/pdf/1903.05485.pdf on multi-modal KG embeddings). I understand that this might not be the main focus of the paper (some of this information can be found in other surveys), but it depends on how much the paper aims to be a comprehensive survey.
If it is not possible to extend the survey due to page constraints, I would still try to reorganize it and provide more details in the vision and discussion parts (Section 4.3 is one of the few sections with a section-specific discussion). For example, I find the recommendations section useful, but I would extend it and present the content in a more structured format (e.g., a figure or a table), which could help convey the main ideas quickly.
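To make the request about scoring functions concrete, a short explanation along the following lines would already help readers. This is only an illustrative sketch, with a toy TransE-style scoring function and random vectors standing in for trained embeddings; it is not code from the paper.

    import numpy as np

    # Toy TransE-style setup: entities and relations share one vector space,
    # and a triple (h, r, t) is scored by how well h + r approximates t.
    rng = np.random.default_rng(0)
    num_entities, dim = 5, 4
    entity_emb = rng.normal(size=(num_entities, dim))
    relation_emb = rng.normal(size=(1, dim))

    def score(h_idx, r_idx, t_idx):
        # Higher (less negative) score means a more plausible triple.
        return -np.linalg.norm(entity_emb[h_idx] + relation_emb[r_idx] - entity_emb[t_idx])

    # Link prediction for a query (h, r, ?): score every candidate tail and rank.
    h, r = 0, 0
    scores = np.array([score(h, r, t) for t in range(num_entities)])
    print(np.argsort(-scores))  # candidate tails, best first

The point is simply to show readers how a score over candidate triples turns into a ranked prediction.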
+ I find the paper a bit terse: some things are briefly mentioned but not really explained. There are many instances of this problem. For example:
"GenKGC [61] converts the KG completion task to a sequence-to-sequence (Seq2Seq) generation task. The incontext learning paradigm of GPT-3 learns correct output answers by concatenating the selected samples relevant to the input." - what are the selected samples here? In addition to this, the next sentence starts with "GenKGC similarly" but I am not sure what "similarly" refers to.
Section 5 is very short, and some paragraphs would require better structure and organization.
"ELEm has been evaluated on the Protein-Protein Interaction (PPI) dataset for the LP task. However, the successor algorithms introduced more appropriate datasets such as Gene Ontology (GO)" - why are these more appropriate and why does this make a difference?
Then, in the same section, the paper describes the results of Ruffinelli et al., which are important results in KG embedding evaluation, but I am not sure this is the best way or section to introduce them. Also, since there have been attempts at providing more uniform benchmarking utilities, such as https://github.com/pykeen/pykeen, it might be worth mentioning these (the authors briefly introduce the paper in ref [80], but since this is an evaluation-setup section, more details could help); see the sketch below.
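For example, something as compact as the following would show readers how such utilities standardize the training and evaluation setup. This is a sketch assuming PyKEEN's pipeline API; the dataset, model, and epoch count are arbitrary illustrative choices.

    from pykeen.pipeline import pipeline

    # Train and evaluate a KG embedding model under PyKEEN's standardized
    # evaluation protocol; dataset, model, and epochs are illustrative choices.
    result = pipeline(
        dataset="Nations",
        model="TransE",
        training_kwargs=dict(num_epochs=5),
    )
    # Report a standard link-prediction metric.
    print(result.get_metric("hits@10"))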
+ It is sometimes unclear to me whether the focus is on knowledge graph embeddings or LLMs.
LLMs appear mostly in one section, but from the introduction I would have expected a larger analysis and more details in the discussion section. From the title, the focus should be semantically enriched embeddings, but to me the focus seems to be on knowledge graph embeddings more generally.
Some additional references:
Additional ref and dataset for inductive link prediction: https://arxiv.org/abs/2203.01520
On multi-modality: https://arxiv.org/pdf/1903.05485.pdf
Minor comments:
RoBerta should be RoBERTa
Table 3: I think this is a useful table, as it summarizes many of the different evaluations. It might be good to add the references for the datasets.