By Till Mossakowski
Review Details
Reviewer has chosen not to be Anonymous
Overall Impression: Average
Content:
Technical Quality of the paper: Average
Originality of the paper: Yes
Adequacy of the bibliography: Yes
Presentation:
Adequacy of the abstract: Yes
Introduction: background and motivation: Good
Organization of the paper: Satisfactory
Level of English: Satisfactory
Overall presentation: Good
Detailed Comments:
The paper provides an approach for interpretable semantic embeddings of knowledge graphs, such that vector dimensions in latent space correspond to cognitively meaningful features, as in Gärdenfors' conceptual spaces. Features here are provided through type information in the knowledge graph.
Based on an existing knowledge graph embedding and a manual selection of entity features, a support vector machine is trained to produce an interpretable embedding.
This architecture, called InterpretE, significantly outperforms baselines such as TransE, ConvE, and RESCAL (each of which serves as the underlying knowledge graph embedding for the SVM) on clustering along features, measured by timtop10 scores.
The paper reads well. Compared to the NeSy 2024 conference version, it has been considerably extended, e.g. with a detailed description of the algorithm, a section on experiments with LLMs, and more.
A GitHub repository with code is provided.
Many of the figures (Figs. 2-13, 15-20) are too small to be legible in a printout.
The more technical part of the paper contains many errors (see detailed comments below), which make it hard to fully grasp the formal details of the approach.
A major drawback of the paper concerns the avoidance of polysemanticity through a purely feature-based approach. Note that medium-dimensional context-based embeddings in the style of Word2Vec are polysemantic in the sense of [14]. In the present paper, polysemanticity is discussed only as a disadvantage, due to its lack of interpretability. However, polysemanticity has many advantages beyond achieving a compact representation: improved semantic understanding of words in their context, better performance in NLP tasks, better generalisation, and better capturing of linguistic relationships. These advantages may be lost when moving to a monosemantic representation, which is what InterpretE appears to be. It may therefore be that InterpretE has severe drawbacks compared to other embeddings that simply do not become apparent in the paper, because it focuses on very specific evaluations. Indeed, the better performance of InterpretE in terms of feature similarity is not surprising, given that the embeddings are based on features. Other downstream tasks like link prediction are not evaluated in the paper.
Another major drawback of the paper is that one-to-many relations seem to be excluded (cf. p.11 l.37). This is a limitation that the authors themselves criticise in other approaches, such as TransE, in the related work section. Yet this limitation of InterpretE is not even discussed.
A certain limitation of the InterpretE approach is the need for manual feature selection and engineering (section 4). In section 7.1, the authors try to overcome this limitation by using LLMs. However, the setting in section 7.1 seems to be completely different: in section 4, the input is a knowledge graph plus an embedding of the KG, while in section 7.1, the input seems to be a set of documents and a user prompt (although the first line of section 7 still speaks of "KG datasets"). It is not clear to me where these documents come from, and how the KG is used as input for the LLM. Moreover, the prompt needs to be manually engineered, and Fig. 19 presents two sample prompts, with output that is not completely satisfactory. In the face of this, the authors quickly give up on the LLM approach. However, there would be a chance to refine the prompts; work in the literature usually uses much more refined prompts than is done here. Moreover, few-shot learning or fine-tuning could possibly be used. So section 7.1 seems not to take the task very seriously; it reads more like an alibi for not automating the feature selection process. I think that the feature selection process should and can be automated to a larger degree using LLMs, even if some final manual massaging of the LLM's results might be needed.
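For instance, a few-shot prompt along the following lines might already help (an illustrative sketch only - the class and relation names are made up, not taken from the paper): "You are given triples (h,r,t) from a knowledge graph and a target class C. Return relations r together with value sets V_r that partition the entities of C into cognitively meaningful groups. Example: for C = Person and triples such as (x, hasGender, female), a good answer is the feature 'gender' with values {female, male}."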
Due to these problems of the paper, a major revision is needed.
Detailed comments:
p.2 l.32 "there is no clear correspondence between entity aspects and the dimensions of the resulting vectors."
this may be a bug, but also a feature (see discussion above)
p.3 l.46
please use triple notation (h,r,t)
the displayed equation makes G include *all* possible triples, which does not make sense; G should contain only the triples that actually hold.
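Presumably something like G \subseteq E \times R \times E is intended (a sketch based on the usual definition), i.e. G is a *subset* of the set of all possible triples (h,r,t) with h,t \in E and r \in R, rather than the full Cartesian product.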
p.5, section 3.1
[14] is highly relevant related work, because also there, a monosemantic representation is achieved (through a sparse autoencoder). Why not follow the same approach here? This deserves more discussion.
p.6, l.12
You should mention OWL2Vec already here and not only later.
p.6 l.20
OWL2VecVec* -> OWL2Vec*
p.8 l.43
why not introduce a hierarchy of relations, making playsFor a subrelation of isAffiliatedTo? This would be the ontological way to resolve this problem.
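In RDFS this is a one-liner: playsFor rdfs:subPropertyOf isAffiliatedTo (or, in description-logic notation, playsFor \sqsubseteq isAffiliatedTo).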
p.8 l.46f.
"These values, coupled with the associated relation, serve as the entity aspects for the experiments." why?
p.11 l.18
V_r has not been introduced here (it is defined only in line 36)
p.11 l.26
"for a given value v\in V_r" - but v is not used
p.11 l.27
"... is computed based on its occurrences in triples" what does this mean exactly?
p.11 l.30
R_C has not been defined
what is head?
p.11 l.37
"categorized into features based on their unique values"
So it seems that one-to-many relations are excluded? (Something you criticise in other approaches in section 2.2.)
p.11 l.46, definition of f_e
The notation is too specific. Why only v_1,v_2\in V_{r_1}, and not v_1,v_2,v_3\in V_{r_1}? etc.
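A more flexible formulation might look like f_e = ((v_{1,1},\dots,v_{1,k_1}),\dots,(v_{m,1},\dots,v_{m,k_m})) with v_{i,j} \in V_{r_i}, allowing an arbitrary number k_i of values per relation r_i (a sketch of what I would expect, not the paper's notation).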
p.12 l.6, definition of D
D depends on the class C, so it should be D_C.
p.12 l.32
arg min_{w,b} returns a pair (w,b), but you bind the output to w_{r,v}. This does not match.
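Consistent notation would bind the pair, e.g. (w_{r,v}, b_{r,v}) = arg min_{w,b} ..., so that the bias term b_{r,v} is introduced as well.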
p.13 l.8, line 7 of the algorithm, class(h)
what is h?
p.13 l.18, line 17 of the algorithm
what is a union of vectors?
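If concatenation is meant, standard notation would be v_1 \oplus v_2 or (v_1 ; v_2) - I am guessing at the intended operation here.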
p.13 l.29
entities -> entity
p.13 l.32
is this a concatenation of vectors? If so, the same notation should also be used in line 17 of the algorithm (cf. the comment above).
p.14 l.13
closing bracket missing
p.15 l.42
"an additional dimension"
The kernel trick usually maps the data to a much higher-dimensional space, and not just one additional dimension.
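For example, the widely used RBF kernel k(x,x') = exp(-\gamma ||x - x'||^2) corresponds to an implicit feature map into an infinite-dimensional space.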
p.16 l.43
"However, LLMs have finite knowledge and are prone to hallucinations."
Finite knowledge is trivially true and not specific to LLMs.
Maybe you mean that LLMs do not cover general knowledge that can be instantiated in infinitely many ways? But is this really true?