By Savitha Sam Abraham
Review Details
Reviewer has chosen not to be Anonymous
Overall Impression: Average
Content:
Technical Quality of the paper: Weak
Originality of the paper: Yes, but limited
Adequacy of the bibliography: Yes
Presentation:
Adequacy of the abstract: Yes
Introduction: background and motivation: Good
Organization of the paper: Needs improvement
Level of English: Satisfactory
Overall presentation: Average
Detailed Comments:
Summary:
The paper addresses a very relevant research problem: mining a concise set of rules from dynamic sensor data while leveraging static semantic knowledge about the domain. The proposed solution employs an autoencoder for rule mining, controls the number of rules mined through additional parameters such as the semantic threshold and the number of antecedents, and incorporates the static semantic knowledge by enriching the input to the autoencoder.
Questions:
What is not clear to me is the semantic enrichment step in the pipeline. I understand that eventually the input to the autoencoder is just a vector that does not carry information about the semantic classes or relations. Is the autoencoder aware of what the values in this vector actually represent semantically? If so, this has not been explained well in the paper.
I would like to clarify the following: I understand that the output is structured to capture the semantics of the values being predicted, by using a softmax that predicts probabilities per class value. Could you explain this further? For instance, in the example provided, feature f1 can take values a and b, and feature f2 takes three possible values. In this case the output would have two probability distributions, one for f1 across values a and b, and the other for f2 across its three values. Is this right? If so, do you discretize the input feature values?
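To make my reading concrete, here is a minimal sketch (my own, in PyTorch, with hypothetical feature names and sizes, not taken from the paper) of the per-feature softmax output structure I have in mind; please correct me if this does not match the actual design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical example: f1 has two values (a, b), f2 has three values.
# Assumed one-hot encoding of the input -> width 2 + 3 = 5.
feature_cardinalities = [2, 3]
input_dim = sum(feature_cardinalities)   # 5
hidden_dim = 3                           # undercomplete bottleneck (hidden < input)

class RuleAE(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Linear(input_dim, hidden_dim)
        self.decoder = nn.Linear(hidden_dim, input_dim)

    def forward(self, x):
        z = torch.relu(self.encoder(x))
        logits = self.decoder(z)
        # One softmax per feature: the output is a list of probability
        # distributions, one over each feature's possible values.
        outputs, start = [], 0
        for card in feature_cardinalities:
            outputs.append(F.softmax(logits[:, start:start + card], dim=1))
            start += card
        return outputs

# Query: f1 = a as the antecedent (one-hot [1, 0] for f1, zeros for f2).
x = torch.tensor([[1.0, 0.0, 0.0, 0.0, 0.0]])
probs_f1, probs_f2 = RuleAE()(x)   # two distributions: over {a, b} and over f2's values
```

If this matches the intended design, it would also bear on my discretization question, since such a one-hot/softmax structure presupposes categorical feature values.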
Why does the complexity depend only on the number of features and the number of antecedents? Shouldn't it also depend on the number of possible values for each feature?
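As a sketch of what I mean, if categorical feature values are one-hot encoded (my assumption, not stated explicitly in the paper), then the input dimension is

```latex
\[
  d_{\mathrm{in}} = \sum_{i=1}^{F} |V_i|,
\]
```

where F is the number of features and |V_i| is the number of possible values of feature f_i. I would therefore expect the cost of extracting rules to grow with the total number of feature values (or at least the largest |V_i|), not only with F and the number of antecedents.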
More explanation required: it would be nice to explain the choice of autoencoder, i.e., why undercomplete and not overcomplete. Did the ARM-AE paper also use an undercomplete AE? Also, the paper mentions that ARM-AE treats the input as the consequent and the output as the antecedent. Why does this matter? Is the proposed method differentiating between causality and correlation? The AE predicts the feature that is highly likely to co-occur with another feature. How do you conclude that the rule is f1 --> f2 and not f2 --> f1? Do you provide both of these instances as input to the AE and test them?
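On the direction question specifically, my understanding is that only a directional statistic such as confidence can separate the two orientations, since co-occurrence alone is symmetric. A tiny self-contained sketch with made-up counts (all numbers hypothetical, not from the paper):

```python
# Toy co-occurrence counts (hypothetical) for items f1=a and f2=x.
n_transactions = 100
count_f1_a = 40      # transactions containing f1 = a
count_f2_x = 80      # transactions containing f2 = x
count_both = 35      # transactions containing both

support = count_both / n_transactions    # symmetric: same for both rule directions
conf_f1_to_f2 = count_both / count_f1_a  # confidence of f1=a -> f2=x  (0.875)
conf_f2_to_f1 = count_both / count_f2_x  # confidence of f2=x -> f1=a  (0.4375)

# Support cannot distinguish f1 -> f2 from f2 -> f1; only the asymmetric
# confidence values can justify reporting one direction over the other.
print(support, conf_f1_to_f2, conf_f2_to_f1)
```

So I am curious whether the proposed method queries the AE in both directions and then ranks or filters by something like this, or whether the input/output roles fix the direction by construction.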
Clarification: am I right in saying that the proposed approach results in a concise set of rules only by controlling the input parameters, i.e., the threshold and the number of antecedents? If so, how do you ensure that high-quality or significant rules are given priority over less significant rules?
In Table 5, it can be seen that the FP-G method with semantics has better support and coverage for all datasets except LeakDB. It also has better confidence than Aerial with semantics for all but one dataset (LBNL). How do you explain this?
Writing: I feel the form given in Table 2 is difficult to understand; perhaps a better representation could be used.