By Luca Andolfi
Review Details
Reviewer has chosen not to be Anonymous
Overall Impression: Average
Content:
Technical Quality of the paper: Average
Originality of the paper: Yes, but limited
Adequacy of the bibliography: Yes
Presentation:
Adequacy of the abstract: Yes
Introduction: background and motivation: Good
Organization of the paper: Satisfactory
Level of English: Satisfactory
Overall presentation: Good
Detailed Comments:
The article proposes a semi-automated pipeline for extracting driving rules from handbooks.
The goal is to identify and encode in a formal specification the common sense rules (not solely regarding safety) that human beings effectively apply when driving as a result of their abstract reasoning ability. The resulting formalization can then be used by autonomous driving systems to enhance their reliability and overall robustness to unexplained anomalies.
I have organized my review into sections a) to e): Merits, Issues, Questions, Suggestions and Conclusion.
a) MERITS.
I believe the article has the following merits:
1. It considers common sense rules in autonomous driving as a fundamental element
to integrate into the development of autonomous vehicles (AVs): leveraging these rules
has several benefits, from adapting AVs to operate in different regions with distinct
driving patterns, to enhancing the ability to detect system anomalies and increasing
the overall trustworthiness of AVs;
2. The proposed extraction framework expresses driving rules formally
and allows for inference, portability, and adaptability.
b) ISSUES.
On the other hand, these are in my opinion the most evident issues:
1. The accuracy of the overall approach is not satisfactory. Indeed, there are
at least two sources of error to consider: the parser and the rule construction.
As far as the parser is concerned, its limitations are evident from the examples
shown in Sections 7 and 8.1. Regarding the rule construction, errors in this
phase are even more delicate to spot and correct, as shown for the example
"if you are in an intersection when you see an emergency vehicle, continue through
the intersection". Here the authors had to look back at the original sentence to
recognize the error (a missing conjunction), because the rule is semantically correct
per se.
2. The parsing method is handbook-specific: in particular, Section 8.2 mentions
that the California driving manual is particularly well-suited for the proposed
extraction method, whereas several issues arise with other handbooks.
3. A lot of manual work is required: Table 3 mentions that 539 of the 708 rules
were manually refined. Even though, as stated at the end of Section 7.1, the
authors neither read the manuals nor created the rules **completely** manually,
the number of rules requiring refinement is still high. Moreover, if I understand
correctly, the classification of the rules is also manual (this is mentioned at
the end of Section 8.3).
4. Section 6.3.1 states that triples have the shape (subject, verb, object). However,
in the following examples we see triples such as (continue, through, intersection)
and (dangerous condition, at, rail), where "through" and "at" are prepositions and
not verbs. This is a conceptual inconsistency that needs to be addressed,
otherwise the semantics of the rules is undermined.
I understand that the goal of the article is to give a proof of concept regarding the extraction of the rules, but issues 1-4 still seem to significantly limit its applicability. I would like to hear from the authors about them.
c) QUESTIONS.
At the end of Section 8.4, I see that there are 416 rules in the so-called "Easy" category.
How many of these rules needed refinement before being used?
d) SUGGESTIONS.
Here are some suggestions with respect to my comments in b).
1. Parsing: I believe the approach may benefit from the (moderate) introduction of
machine learning solutions; moderate means that their use should stay within the
bounds of explainability.
Rule extraction: I would like to suggest a controlled use of LLMs.
For example, I think it would be possible to:
- compute the rule R1 as in the article,
- instruct the LLM via a system prompt to use only words from the
parsed sentence in its answer and have it compute a rule R2,
- compare R1 and R2 against the extracted sentence as the ground truth (for
instance by counting how many entities of the ground truth each of them matches).
This could help identify errors such as the one in "if you are in an
intersection when you see an emergency vehicle..." where "emergency" was not
mentioned in R1; a minimal sketch of this comparison is given after this list.
2. See 1.
3. For the classification of rules, one could integrate some heuristics
(for example, rules sharing the same words are likely related) to reduce the
supervision effort; a sketch of such a heuristic is also given after this list.
4. I was expecting to see something like this (a possible normalization is sketched after this list):
(continue, through, intersection) -> (self, continue through, intersection)
(dangerous condition, at, rail) -> (dangerous condition, detected at, rail).
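
To make the comparison proposed in suggestion 1 concrete, here is a minimal sketch; the entity extraction, the stop-word list, and the term sets for R1 and R2 are my own illustrative assumptions, not the authors' pipeline or any specific LLM output.

```python
# Minimal sketch of the R1/R2 comparison suggested in point 1.
# extract_entities, the stop-word list, and the R1/R2 term sets are
# illustrative assumptions, not the authors' actual rule representation.

def extract_entities(text):
    """Rough entity extraction: content words of the sentence."""
    stopwords = {"if", "you", "are", "in", "an", "a", "the", "when", "see"}
    words = (w.strip(".,") for w in text.lower().split())
    return {w for w in words if w and w not in stopwords}

def coverage(rule_terms, ground_truth):
    """Fraction of ground-truth entities mentioned by the rule."""
    return len(rule_terms & ground_truth) / len(ground_truth) if ground_truth else 0.0

sentence = ("if you are in an intersection when you see an emergency vehicle, "
            "continue through the intersection")
truth = extract_entities(sentence)

r1_terms = {"intersection", "vehicle", "continue"}               # rule from the pipeline
r2_terms = {"emergency", "vehicle", "intersection", "continue"}  # LLM-constrained rule

# If the word-constrained LLM rule covers more ground-truth entities than R1,
# flag the pipeline rule for manual inspection.
if coverage(r2_terms, truth) > coverage(r1_terms, truth):
    print("R2 covers more of the sentence than R1: flag the rule for review")
```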
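Regarding suggestion 3, this is a minimal sketch of a word-overlap heuristic for pre-grouping rules before manual classification; the example rules, the tokenization, and the 0.2 threshold are assumptions of mine, not values from the paper.

```python
# Sketch of a word-overlap (Jaccard) heuristic for proposing groups of
# related rules; example rules and the threshold are illustrative only.

def jaccard(a, b):
    return len(a & b) / len(a | b) if a | b else 0.0

rules = [
    "stop before the railroad crossing when a train is approaching",
    "a dangerous condition is detected at the railroad crossing",
    "continue through the intersection when an emergency vehicle approaches",
]
tokens = [set(r.lower().split()) for r in rules]

# Only pairs above the threshold are proposed to the human annotator,
# reducing the number of pairs that must be inspected manually.
for i in range(len(rules)):
    for j in range(i + 1, len(rules)):
        if jaccard(tokens[i], tokens[j]) > 0.2:
            print(f"rules {i} and {j} are likely related")
```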
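Regarding suggestion 4, one possible way to normalize preposition-headed triples into the expected (subject, verb phrase, object) form is sketched below; the preposition list, the default subject/verb, and the looks_like_verb placeholder are hypothetical and would in practice rely on the parser's part-of-speech tags.

```python
# Sketch of normalizing triples whose "verb" slot holds a preposition.
# The defaults and the POS check are placeholders, not the authors' method.

PREPOSITIONS = {"through", "at", "in", "on", "to", "before"}

def looks_like_verb(word):
    # Placeholder: in practice, query the parser's part-of-speech tags.
    return word in {"continue", "stop", "yield", "slow", "proceed"}

def normalize(triple, default_subject="self", default_verb="detected"):
    subj, verb, obj = triple
    if verb in PREPOSITIONS:
        if looks_like_verb(subj):
            # The subject slot actually holds the verb, e.g. "continue".
            return (default_subject, f"{subj} {verb}", obj)
        # No verb at all: insert a default one, e.g. "detected".
        return (subj, f"{default_verb} {verb}", obj)
    return triple

print(normalize(("continue", "through", "intersection")))
# ('self', 'continue through', 'intersection')
print(normalize(("dangerous condition", "at", "rail")))
# ('dangerous condition', 'detected at', 'rail')
```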
e) CONCLUSION.
I think the article is suitable for acceptance, because the merits highlighted in a) remain valuable
and the approach (with its limitations) succeeds in showing this.
However, it would strengthen the paper to include some improvements with respect to the concerns in b),
at least through additional proofs of concept.