By Anonymous User
Review Details
Reviewer has chosen to be Anonymous
Overall Impression: Good
Content:
Technical Quality of the paper: Good
Originality of the paper: Yes
Adequacy of the bibliography: Yes
Presentation:
Adequacy of the abstract: Yes
Introduction: background and motivation: Good
Organization of the paper: Needs improvement
Level of English: Satisfactory
Overall presentation: Good
Detailed Comments:
The paper explores the enhancement of machine learning (ML) predictions in data-scarce environments through semantic-based data augmentation leveraging knowledge graphs (KGs). It enriches tabular datasets with various KG-derived embeddings and evaluates their impact on the predictive performance of ML models (e.g., KNN, SVM, XGBoost, Neural Networks) across different embedding techniques and augmentation strategies. The methodology is applied to binary classification tasks for heart disease and chronic kidney disease using public datasets. The findings demonstrate notable improvements, particularly when distance-based KG features are incorporated, with XGBoost and Neural Networks showing the most significant gains.
The paper fits well within the scope of the Neuro AI journal. It is well-written and clear. Therefore, I recommend accepting the paper with minor revisions. Below are some suggested improvements.
Presentation:
- Include a summary of the main changes in this extended version compared to the NeSy 2024 paper.
- Add a summary table in the related works section to visualize the relationship between the current work and NeSy-related studies.
- Position tables summarizing the results in the relevant sections of the main text.
- Move all algorithms to an appendix for better readability.
- Use Figure 14 in the main text as the single reference for all approaches instead of having separate figures.
Experiments and Results:
- Section 6.2: While the paper reports an averaged performance across three embedding dimensions to ensure robustness, it is common practice to also average the results of multiple runs for each experiment and report the standard deviation.
- Table 2: Provide details on how the hyperparameters for each embedding method were selected.
- Section 7: Clarify how the impact of KGs was computed. Since the ontologies are used to build the KGs and subsequently implement the described approaches, specify whether the reported value represents an average across all these approaches.
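To illustrate the Section 6.2 comment on repeated runs, here is a minimal Python sketch of the reporting I have in mind. The function `run_experiment`, the number of seeds, and the score values are hypothetical placeholders, not taken from the paper; in practice each call would train and evaluate one model under a different random seed.

```python
import random
import statistics


def run_experiment(seed: int) -> float:
    """Hypothetical stand-in for one train/evaluate cycle.

    Returns a placeholder score (e.g., F1); it is NOT a result from the paper.
    """
    rng = random.Random(seed)
    return 0.80 + rng.uniform(-0.03, 0.03)


def summarize_runs(scores):
    """Return (mean, sample standard deviation) over repeated runs."""
    return statistics.mean(scores), statistics.stdev(scores)


# Repeat the same experiment under several seeds (10 is an arbitrary choice).
scores = [run_experiment(seed) for seed in range(10)]
mean, std = summarize_runs(scores)
print(f"score = {mean:.3f} ± {std:.3f} over {len(scores)} runs")
```

Reporting each configuration as mean ± standard deviation in this way would make it easier to judge whether the differences between augmentation strategies exceed run-to-run variation.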
Minor comments:
p10, line 44: approachess -> approaches
p10, line 51: 'c_j for each target class is computed' -> How is the centroid computed?
p11, line 19: and no classes -> and noDisease classes.
p11, Alg. 3: $\vec{v_i}$ is not defined
p12, line 43: In this approach, referred to as EmbedClusterAugTab -> In this approach, referred to as ClusterAugTab
p12, line 45: Algorithm 6 -> Algorithm 5
p17, line 33: we only used on the third approach -> we only used the third approach
p17, line 39: Detailed descriptions -> I would rather say 'An overview of these models is provided in Section 2.'
p19, line 34: in the other hand -> on the other hand