By Savitha Sam Abraham
Review Details
Reviewer has chosen not to be Anonymous
Overall Impression: Average
Content:
Technical Quality of the paper: Weak
Originality of the paper: Yes, but limited
Adequacy of the bibliography: Yes
Presentation:
Adequacy of the abstract: Yes
Introduction: background and motivation: Good
Organization of the paper: Needs improvement
Level of English: Satisfactory
Overall presentation: Average
Detailed Comments:
Summary:
The paper addresses a very relevant research problem: mining a concise set of rules from dynamic sensor data while leveraging static semantic knowledge about the domain. The proposed solution employs an autoencoder for rule mining, controls the number of rules mined through additional parameters such as the semantic threshold and the number of antecedents, and incorporates the static semantic knowledge by enriching the input to the autoencoder.
Questions:
What is not clear to me is the semantic enrichment step in the pipeline. I understand that eventually the input to the autoencoder is just a vector that does not carry information about the semantic classes or relations. Is the autoencoder aware of what the values in this vector actually represent semantically? If so, this has not been explained well in the paper.
I would like to clarify the following: I understand that the output is structured to capture the semantics of the values being predicted, by using a softmax that predicts probabilities per class value. Could you explain this further? For instance, in the example provided, feature f1 can take values a and b, and feature f2 takes three possible values. In this case the output would have two probability distributions, one for f1 across values a and b, and the other for f2 across its three values. Is this right? If so, do you discretize the input feature values?
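To make my reading concrete, here is a minimal sketch (my own, in PyTorch, with hypothetical feature names and sizes, not taken from the paper) of the per-feature softmax output structure I have in mind; please correct me if this does not match the actual design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical example: f1 has two values (a, b), f2 has three values.
# Assumed one-hot encoding of the input -> width 2 + 3 = 5.
feature_cardinalities = [2, 3]
input_dim = sum(feature_cardinalities)   # 5
hidden_dim = 3                           # undercomplete bottleneck (hidden < input)

class RuleAE(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Linear(input_dim, hidden_dim)
        self.decoder = nn.Linear(hidden_dim, input_dim)

    def forward(self, x):
        z = torch.relu(self.encoder(x))
        logits = self.decoder(z)
        # One softmax per feature: the output is a list of probability
        # distributions, one over each feature's possible values.
        outputs, start = [], 0
        for card in feature_cardinalities:
            outputs.append(F.softmax(logits[:, start:start + card], dim=1))
            start += card
        return outputs

# Query: f1 = a as the antecedent (one-hot [1, 0] for f1, zeros for f2).
x = torch.tensor([[1.0, 0.0, 0.0, 0.0, 0.0]])
probs_f1, probs_f2 = RuleAE()(x)   # two distributions: over {a, b} and over f2's values
```

If this matches the intended design, it would also bear on my discretization question, since such a one-hot/softmax structure presupposes categorical feature values.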
Why does the complexity depend only on the number of features and the number of antecedents? Shouldn't it also depend on the number of possible values for each feature?
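As a sketch of what I mean, if categorical feature values are one-hot encoded (my assumption, not stated explicitly in the paper), then the input dimension is

```latex
\[
  d_{\mathrm{in}} = \sum_{i=1}^{F} |V_i|,
\]
```

where F is the number of features and |V_i| is the number of possible values of feature f_i. I would therefore expect the cost of extracting rules to grow with the total number of feature values (or at least the largest |V_i|), not only with F and the number of antecedents.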
More explanation required: it would be nice to explain the choice of autoencoder, i.e., why undercomplete and not overcomplete. Did the ARM-AE paper also use an undercomplete AE? Also, the paper mentions that ARM-AE treats the input as the consequent and the output as the antecedent. Why does this matter? Is the proposed method differentiating between causality and correlation? The AE predicts the feature that is highly likely to co-occur with another feature. How do you conclude that the rule is f1 --> f2 and not f2 --> f1? Do you provide both of these instances as input to the AE and test them?
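On the direction question specifically, my understanding is that only a directional statistic such as confidence can separate the two orientations, since co-occurrence alone is symmetric. A tiny self-contained sketch with made-up counts (all numbers hypothetical, not from the paper):

```python
# Toy co-occurrence counts (hypothetical) for items f1=a and f2=x.
n_transactions = 100
count_f1_a = 40      # transactions containing f1 = a
count_f2_x = 80      # transactions containing f2 = x
count_both = 35      # transactions containing both

support = count_both / n_transactions    # symmetric: same for both rule directions
conf_f1_to_f2 = count_both / count_f1_a  # confidence of f1=a -> f2=x  (0.875)
conf_f2_to_f1 = count_both / count_f2_x  # confidence of f2=x -> f1=a  (0.4375)

# Support cannot distinguish f1 -> f2 from f2 -> f1; only the asymmetric
# confidence values can justify reporting one direction over the other.
print(support, conf_f1_to_f2, conf_f2_to_f1)
```

So I am curious whether the proposed method queries the AE in both directions and then ranks or filters by something like this, or whether the input/output roles fix the direction by construction.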
Clarification: am I right in saying that the proposed approach results in a concise set of rules only by controlling the input parameters, i.e., the threshold and the number of antecedents? If so, how do you ensure that high-quality or significant rules are given priority over less significant rules?
In Table 5, it can be seen that the FP-G method with semantics has better support and coverage for all datasets except LeakDB. It also has better confidence than Aerial with semantics for all but one dataset (LBNL). How do you explain this?
Writing: I feel the form given in Table 2 is difficult to understand; perhaps a better representation could be used.