By Lia Morra
Review Details
Reviewer has chosen not to be Anonymous
Overall Impression: Average
Content:
Technical Quality of the paper: Good
Originality of the paper: Yes, but limited
Adequacy of the bibliography: Yes, but see detailed comments
Presentation:
Adequacy of the abstract: Yes
Introduction: background and motivation: Good
Organization of the paper: Satisfactory
Level of English: Satisfactory
Overall presentation: Average
Detailed Comments:
Summary and contribution
The manuscript proposes an XAI technique for GNN embeddings that aims to produce a global explanation of the graph embeddings. It is based on Functional Semantic Activation Mapping (FSAM), recently proposed in a NeSy 2024 paper, of which the present manuscript is an extension. FSAM constructs a semantic graph that relates hidden nodes to output classes based on correlations between activations. This graph can be analyzed using, e.g., community detection, to characterize the network’s behavior. Experiments conducted on several benchmarks show that FSAM can analyze GNN behavior on a layer-by-layer basis. FSAM is used to explore GNNs of increasing depth, showing that increased correlation between activations is associated with a decrease in performance due to over-smoothing.
Strengths
- The manuscript focuses on an important yet underexplored topic (GNN explainability at the model rather than instance level)
- Extensive experimental validation is provided to show that FSAM quality correlates with GNN accuracy, and that increasing the number of layers reduces neuron specialization
- Community analysis provides a way to group classes that are considered similar by the network, thus providing insights into its inner workings, especially when correlated with misclassification patterns
Weaknesses
- The technical novelty is limited with respect to the conference paper: the primary contribution consists of additional experiments
- While the paper claims that “FSAM quality” correlates with GNN accuracy, the concept of quality is only vaguely defined. It is not clear whether FSAM is interpretable per se or whether it is validated against a reference standard
- The manuscript could be improved in terms of clarity and flow
Detailed remarks
1. One of the claims of the paper is that FSAM quality decreases due to over-smoothing (page 2, L20-26). A brief introduction to the concept of over-smoothing, and a summary of related work on the topic, would strengthen and clarify the contribution of this paper. Over-smoothing has long been studied in the machine learning literature [1], and it can be quantified directly from the learned representations (see the generic sketch after reference [1] below). What are the unique advantages of FSAM, if any, in detecting over-smoothing? Or, conversely, is the fact that the FSAM output indicates over-smoothing evidence of its validity, given that over-smoothing is a well-studied phenomenon?
2. In Table 1, the content of each column should be clarified, ideally in the caption. While columns such as type or black-box are self-explanatory, task, target, flow, and design are less so. Abbreviations used in the table should be defined in the caption or in the text.
3. I like the idea of having Section 4 with the detailed contribution, but a lot of content is repeated from the introduction. The introduction could be shortened to avoid repetition
4. It is not clear to me what cross-domain validation means in the context of FSAM validation (page 6, line 9). I think validation on multiple domains would be clearer, as cross-domain is often used to imply that the system is trained/configured on one domain and tested/used on another
5. The description of FSAM is much more concise than in the NeSy paper. I understand that this choice leaves room to expand on the experimental validation and limits repetition. However, in the interest of a more self-contained manuscript, I believe it would be useful to at least briefly define all elements of the methodology. In particular, the following concepts are mentioned but not defined or explained: the notion of ego-graph (page 5, line 10); how each activated neuron is mapped to the final predicted class (page 7, lines 34-38); and how communities are extracted (Section 5.4)
6. In Fig. 1 there are two nodes that are separated from the rest of the graph. Is this an artifact of the visualization, or are they nodes with distinct characteristics?
7. It is not entirely clear to me why the graphs in Figs. 1-4 are “semantic graphs”, since most of the nodes are layers and are thus labelled with strings that do not, by themselves, carry any semantic meaning. If I understand the paper correctly, the semantics are given by the connections with the predicted labels, but these are hard to interpret visually. It is also not evident which neurons correspond to each layer, and thus how a layer-by-layer comparison can be obtained. In Figs. 2-4, all labels appear to refer to the first convolutional layer (Conv1*), so it is not self-evident how the structure changes across layers or with networks of different depths.
8. In Table 2, how is the layer-wise accuracy calculated? Or, if the table compares the accuracy of separate networks of different depths, then the caption should be revised (by “layer-wise accuracy”, I would understand a single four-layer network whose accuracy is computed at the end of the first, second, third, and fourth layers).
9. Table 3 includes the absolute mistake count; also including the percentage figures would improve clarity.
10. Sections 5.4 and 5.5 are quite long and would benefit from a revision to better summarize the substantial number of experiments described in the paper. Section 5.5 also makes several references to key figures (page 15, lines 1-24) that do not correspond to the actual content of the paper. Some references are unclear: for instance, there is a reference to Section 8, which does not exist in the paper, and to visualizations of community structures that are not present in the manuscript. Page 15, line 25 refers to Table 2, but the sentence seems more consistent with Table 3. I think that revising these two final sections to improve readability, clarify conclusions, and better connect them with the experimental results would strengthen the paper.
[1] Li, Qimai, Zhichao Han, and Xiao-Ming Wu. "Deeper insights into graph convolutional networks for semi-supervised learning." Proceedings of the AAAI conference on artificial intelligence. Vol. 32. No. 1. 2018.
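As a side note on point 1: over-smoothing can already be quantified directly from the learned node representations, independently of any explanation method, for example as the mean pairwise cosine similarity of node embeddings after each layer. The sketch below is a generic illustration of this common baseline, not the manuscript’s FSAM; the variable names and the list of per-layer embeddings are hypothetical.

```python
# Generic over-smoothing proxy (not the manuscript's method): mean pairwise
# cosine similarity of node embeddings after a given GNN layer. Values close
# to 1 indicate that node representations have become nearly indistinguishable.
import torch
import torch.nn.functional as F

def layer_smoothness(h: torch.Tensor) -> float:
    """h: [num_nodes, dim] node embeddings produced by one layer."""
    h = F.normalize(h, dim=1)                # unit-norm rows
    sim = h @ h.t()                          # [N, N] cosine similarity matrix
    n = h.size(0)
    off_diag = sim.sum() - sim.diag().sum()  # exclude self-similarities
    return (off_diag / (n * (n - 1))).item()

# Hypothetical usage: per_layer_embeddings is a list of [N, d] tensors collected
# from a trained GNN; values rising towards 1 across layers suggest over-smoothing.
# for k, h_k in enumerate(per_layer_embeddings):
#     print(f"layer {k}: mean cosine similarity = {layer_smoothness(h_k):.3f}")
```

Relating such a direct measure to the FSAM-based observations would help clarify what FSAM adds beyond established diagnostics.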
Typos:
Page 6, line 8 (and 16/24/31): in the subsection 5.2 We are extending -> in Subsection 5.2, we are extending
Page 6, line 9: what is 5.1? A subsection, I imagine
I would avoid mid-sentence capitalization (e.g., “We” on page 6, line 8)
Page 7, line 44: However, As shown -> However, as shown
Page 11, lines 31-48: the same paragraph is repeated