By Till Mossakowski
Review Details
Reviewer has chosen not to be Anonymous
Overall Impression: Average
Content:
Technical Quality of the paper: Average
Originality of the paper: Yes
Adequacy of the bibliography: Yes
Presentation:
Adequacy of the abstract: Yes
Introduction: background and motivation: Good
Organization of the paper: Satisfactory
Level of English: Satisfactory
Overall presentation: Good
Detailed Comments:
The paper provides an approach for interpretable semantic embeddings of knowledge graphs, such that vector dimensions in latent space correspond to cognitively meaningful features, as in Gärdenfors' conceptual spaces. Features here are provided through type information in the knowledge graph.
Based on an existing knowledge graph embedding and a manual selection of entity features, a support vector machine is trained to produce an interpretable embedding.
This architecture, called InterpretE, significantly outperforms baselines such as TransE, ConvE, and RESCAL (each of which serves as the underlying knowledge graph embedding for the SVM) on clustering along features, measured by timtop10 scores.
The paper reads well. Compared to the NeSy 2024 conference version, it has been considerably extended, e.g. with a detailed description of the algorithm, a section on experiments with LLMs, and more.
A GitHub repository with code is provided.
Many of the figures (Figs. 2-13, 15-20) are too small to be legible in a printout.
The more technical part of the paper contains many errors (see detailed comments below), which make it hard to fully grasp the formal details of the approach.
A major drawback of the paper concerns the avoidance of polysemanticity through a purely feature-based approach. Note that medium-dimensional context-based embeddings in the style of Word2Vec are polysemantic in the sense of [14]. In the present paper, polysemanticity is discussed only as a disadvantage, due to its lack of interpretability. However, polysemanticity has many advantages beyond achieving a compact representation: improved semantic understanding of words in their context, better performance in NLP tasks, better generalisation, and better capturing of linguistic relationships. These advantages may be lost when moving to a monosemantic representation, which is what InterpretE appears to be. It may therefore be that InterpretE has severe drawbacks compared to other embeddings that simply do not become apparent in the paper, because it focuses on very specific evaluations. Indeed, the better performance of InterpretE in terms of feature similarity is not surprising, given that the embeddings are based on features. Other downstream tasks like link prediction are not evaluated in the paper.
Another major drawback of the paper is that one-to-many relations seem to be excluded (cf. p.11 l.37). This is a limitation that the authors themselves criticise in other approaches, such as TransE, in the related work section. Yet this limitation of InterpretE is not even discussed.
A certain limitation of the InterpretE approach is the need for manual feature selection and engineering (section 4). In section 7.1, the authors try to overcome this limitation by using LLMs. However, the setting in section 7.1 seems to be completely different: in section 4, the input is a knowledge graph plus an embedding of the KG, while in section 7.1, the input seems to be a set of documents and a user prompt (although the first line of section 7 still speaks of "KG datasets"). It is not clear to me where these documents come from, and how the KG is used as input for the LLM. Moreover, the prompt needs to be manually engineered, and Fig. 19 presents two sample prompts, with output that is not completely satisfactory. In the face of this, the authors quickly give up on the LLM approach. However, there would be a chance to refine the prompts; work in the literature usually uses much more refined prompts than is done here. Moreover, few-shot learning or fine-tuning could possibly be used. So section 7.1 seems not to take the task very seriously; it reads more like an alibi for not automating the feature selection process. I think that the feature selection process should and can be automated to a larger degree using LLMs, even if some final manual massaging of the LLM's results might be needed.
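For instance, a few-shot prompt along the following lines might already help (an illustrative sketch only - the class and relation names are made up, not taken from the paper): "You are given triples (h,r,t) from a knowledge graph and a target class C. Return relations r together with value sets V_r that partition the entities of C into cognitively meaningful groups. Example: for C = Person and triples such as (x, hasGender, female), a good answer is the feature 'gender' with values {female, male}."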
Due to these problems of the paper, a major revision is needed.
Detailed comments:
p.2 l.32 "there is no clear correspondence between entity aspects and the dimensions of the resulting vectors."
this may be a bug, but also a feature (see discussion above)
p.3 l.46
please use triple notation (h,r,t)
the displayed equation makes G include *all* possible triples, which does not make sense; G should contain only the triples that actually hold.
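Presumably something like G \subseteq E \times R \times E is intended (a sketch based on the usual definition), i.e. G is a *subset* of the set of all possible triples (h,r,t) with h,t \in E and r \in R, rather than the full Cartesian product.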
p.5, section 3.1
[14] is highly relevant related work, because also there, a monosemantic representation is achieved (through a sparse autoencoder). Why not follow the same approach here? This deserves more discussion.
p.6, l.12
You should mention OWL2Vec already here and not only later.
p.6 l.20
OWL2VecVec* -> OWL2Vec*
p.8 l.43
why not introduce a hierarchy of relations, making playsFor a subrelation of isAffiliatedTo? This would be the ontological way to resolve this problem.
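In RDFS this is a one-liner: playsFor rdfs:subPropertyOf isAffiliatedTo (or, in description-logic notation, playsFor \sqsubseteq isAffiliatedTo).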
p.8 l.46f.
"These values, coupled with the associated relation, serve as the entity aspects for the experiments." why?
p.11 l.18
V_r has not been introduced here (it is defined only in line 36)
p.11 l.26
"for a given value v\in V_r" - but v is not used
p.11 l.27
"... is computed based on its occurrences in triples" what does this mean exactly?
p.11 l.30
R_C has not been defined
what is head?
p.11 l.37
"categorized into features based on their unique values"
So it seems that one-to-many relations are excluded? (Something you criticise in other approaches in section 2.2.)
p.11 l.46, definition of f_e
The notation is too specific. Why only v_1,v_2\in V_{r_1}, and not v_1,v_2,v_3\in V_{r_1}? etc.
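A more flexible formulation might look like f_e = ((v_{1,1},\dots,v_{1,k_1}),\dots,(v_{m,1},\dots,v_{m,k_m})) with v_{i,j} \in V_{r_i}, allowing an arbitrary number k_i of values per relation r_i (a sketch of what I would expect, not the paper's notation).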
p.12 l.6, definition of D
D depends on the class C, so it should be D_C.
p.12 l.32
arg min_{w,b} returns a pair (w,b), but you bind the output to w_{r,v}. This does not match.
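Consistent notation would bind the pair, e.g. (w_{r,v}, b_{r,v}) = arg min_{w,b} ..., so that the bias term b_{r,v} is introduced as well.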
p.13 l.8, line 7 of the algorithm, class(h)
what is h?
p.13 l.18, line 17 of the algorithm
what is a union of vectors?
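If concatenation is meant, standard notation would be v_1 \oplus v_2 or (v_1 ; v_2) - I am guessing at the intended operation here.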
p.13 l.29
entities -> entity
p.13 l.32
is this a concatenation of vectors? If so, the same notation should also be used in line 17 of the algorithm (cf. the comment above).
p.14 l.13
closing bracket missing
p.15 l.42
"an additional dimension"
The kernel trick usually maps the data to a much higher-dimensional space, and not just one additional dimension.
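For example, the widely used RBF kernel k(x,x') = exp(-\gamma ||x - x'||^2) corresponds to an implicit feature map into an infinite-dimensional space.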
p.16 l.43
"However, LLMs have finite knowledge and are prone to hallucinations."
Finite knowledge is trivially true and not specific to LLMs.
Maybe you mean that LLMs do not cover general knowledge that can be instantiated in infinitely many ways? But is this really true?