By Anonymous User
Review Details
Reviewer has chosen to be Anonymous
Overall Impression: Weak
Content:
Technical Quality of the paper: Average
Originality of the paper: No
Adequacy of the bibliography: Yes
Presentation:
Adequacy of the abstract: Yes
Introduction: background and motivation: Good
Organization of the paper: Satisfactory
Level of English: Satisfactory
Overall presentation: Average
Detailed Comments:
The paper presents a pipeline for predicting poor student performance in a course. The authors argue for this variant of the grade-prediction problem, formalize an approach based primarily on embeddings with KNN inference, and report results which, by the authors' own admission, are not particularly strong.
I think the biggest strength of this paper is the positioning of the problem, which I found interesting. It is a different take on previous work (which the authors review thoroughly), and the imbalanced nature of predicting a poor grade provides an interesting challenge.
However, I see two major weaknesses that preclude acceptance to this journal. First, the approach does not appear to be neurosymbolic; it strikes me as a pure data mining approach, perhaps more appropriate for a venue like KDD's applied data science track. Second, the evaluation needs more work: the results are quite limited, both in significance and in thoroughness.
Let me start with the experiments.
1. First, the overall precision, recall, and F1 results are very low, so on the surface the results are not significant. However, I question whether these metrics are even the most important ones. This is a highly imbalanced problem that has not been shown to be solvable with other methods, so perhaps these numbers are not so bad after all.
2. The paper lacks baseline comparisons. The three that come to mind are (1) a standard dense neural network, (2) a simple modification of an approach from the grade-prediction literature, and (3) a random baseline. How well or poorly the proposed algorithm works would be much clearer in the context of other methods; a rough sketch of what I mean follows this list.
3. The hyperparameter exploration seems insufficient. The authors tried two values of k, but other hyperparameters or easy modifications could be explored as well: for example, thresholding distances, trying different distance functions, or trying different embeddings. It would be interesting to see whether the authors could develop variants with different properties (e.g., one tuned to improve precision and one to improve recall).
4. I also question whether precision and recall are the best metrics. While standard metrics such as these should certainly be reported, the authors should also consider developing an application-specific metric. Is there a way the algorithm can provide results that better serve the students? And how do the baselines perform on such a metric?
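To make the baseline point (comment 2) concrete, something along the following lines would already help. This is only a sketch using scikit-learn; the feature matrix and labels below are synthetic placeholders standing in for the authors' embeddings and "poor grade" labels, and the specific models are merely examples of a random baseline and a dense-network baseline.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.dummy import DummyClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import precision_recall_fscore_support

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 32))            # placeholder student embeddings
y = (rng.random(1000) < 0.1).astype(int)   # ~10% positives, mimicking the class imbalance

X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

baselines = {
    "random (stratified)": DummyClassifier(strategy="stratified", random_state=0),
    "dense NN": MLPClassifier(hidden_layer_sizes=(64, 32), max_iter=500, random_state=0),
}
for name, clf in baselines.items():
    clf.fit(X_tr, y_tr)
    p, r, f1, _ = precision_recall_fscore_support(
        y_te, clf.predict(X_te), average="binary", zero_division=0
    )
    print(f"{name}: precision={p:.2f} recall={r:.2f} F1={f1:.2f}")
```

Reporting the proposed method's numbers alongside a table like this would let readers judge whether the low absolute scores are a limitation of the method or of the problem itself.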
My second major problem with the paper is that it cannot be considered neurosymbolic in its current form. I think there are some opportunities with this problem that are worth exploring:
1. The authors describe in Section 5.3 how the performance prediction works, which amounts to a series of case statements. This could be represented as a logic program (e.g., ASP, Prolog, PyReason, etc.), with the KNN results dumped into it as logical facts. The result would also give the user an explanation of why the system thinks they are likely to perform poorly in a class; a toy sketch of what I mean follows this list.
2. The distance functions include direct grade information, which could also be represented symbolically (e.g., in propositional logic), and such information could perhaps be used to characterize the distances with some kind of symbolic annotation. Again, this could improve explainability.
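To illustrate the first suggestion, here is a toy version of "KNN output as logical facts plus a rule", written in plain Python rather than ASP/Prolog/PyReason (any of which would express the same thing declaratively). Every predicate name, threshold, and data value here is hypothetical and not taken from the paper; the point is only that the case statements become an explicit rule whose firing yields an explanation.

```python
def knn_to_facts(student, neighbors):
    """Turn retrieved KNN neighbors into ground facts (toy predicates)."""
    facts = set()
    for n_id, grade, dist in neighbors:
        facts.add(("neighbor", student, n_id))
        if grade <= 2.0:                    # hypothetical "poor grade" cutoff
            facts.add(("poor_grade", n_id))
        if dist < 0.3:                      # hypothetical "close in embedding space" threshold
            facts.add(("close", student, n_id))
    return facts

def predict_at_risk(student, facts):
    """Rule: at_risk(S) :- close(S, N), poor_grade(N), for at least two distinct N."""
    support = [n for (_, s, n) in {f for f in facts if f[0] == "close"}
               if s == student and ("poor_grade", n) in facts]
    if len(support) >= 2:
        return True, f"at_risk({student}): close, poorly performing neighbors {support}"
    return False, f"no at_risk({student}): insufficient supporting neighbors"

# neighbors as (id, grade on a 0-4 scale, embedding distance) -- made-up values
neighbors = [("s17", 1.7, 0.21), ("s03", 3.5, 0.18), ("s42", 1.9, 0.27)]
verdict, why = predict_at_risk("s99", knn_to_facts("s99", neighbors))
print(verdict, "-", why)
```

Encoding the rules in a proper logic-programming system would additionally let the authors combine them with other symbolic knowledge (prerequisites, course difficulty, etc.), which is where the neurosymbolic contribution could actually emerge.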
In short, while I see some promise in this problem, the paper is not ready for acceptance to the journal. The authors would, at a minimum, need to address both of the points above; if they address only the first, the paper should be resubmitted to a data mining venue rather than a neurosymbolic one.