There are many ways to make mistakes in grammar rules. In XML-formatted rule files, the following mistakes recur:
Hasty generalization creates false positives, which reduces the precision of rules. It’s advisable to use the rule editor.
Sometimes, instead of correct forms, suggestions contain only messages that explain the kind of error.
Add the correction attribute to the example to test whether the suggestion offered is correct during JUnit tests.
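As a minimal sketch of a suggestion that the tests can check, the rule below offers a concrete replacement in the message and states the expected result in the correction attribute of the incorrect example. The rule ID, name, and sentences are hypothetical and do not come from any existing rule file.

```xml
<rule id="EXAMPLE_THEIR_IS" name="Hypothetical: 'their is' instead of 'there is'">
  <pattern>
    <token>their</token>
    <token>is</token>
  </pattern>
  <!-- The message contains a real suggestion, not only an explanation. -->
  <message>Did you mean <suggestion>there is</suggestion>?</message>
  <!-- correction holds the suggestion expected for the marked text;
       the rule tests compare it with what the rule actually offers. -->
  <example correction="there is">I think <marker>their is</marker> a problem.</example>
  <example>I think there is a problem.</example>
</rule>
```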
One of the common problems is not using parentheses () to group disjunctive groups in regular expressions with spaces. For
A would match any POS tag that contains A. But when used in an exception, you want to exclude exactly A. This is a bad way:
<exception postag_regexp="yes" postag="!A"/>
The correct form:
This way it would match the POS tag as a whole string, which is what
you actually want. Regular expressions have limited ways of expressing
negation (via sets like [^A]), but putting the POS tag inside an
exception enables you to negate it. In normal tokens, you can use
negate_pos="yes" as a negation operator, like here:
<token negate_pos="yes" postag="A"/>
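The two ways of negating a POS tag can be put side by side in one pattern. This is only a sketch: it assumes that an exception whose postag is given without postag_regexp is compared literally against the whole tag, and A stands in for a real POS tag as in the text above.

```xml
<pattern>
  <!-- Any token, unless one of its POS tags is exactly A
       (no postag_regexp, so the tag is compared as a whole string). -->
  <token><exception postag="A"/></token>
  <!-- A normal token negated directly with negate_pos. -->
  <token negate_pos="yes" postag="A"/>
</pattern>
```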
Exceptions in the rules can remain untested if they are not accompanied
by an example. Without one, you don’t really know whether the exception works.
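A correct example that would trigger the rule if the exception were missing is one way to make sure the exception is exercised. The rule below is a hypothetical sketch; its ID, name, word choices, and sentences are assumptions made for illustration.

```xml
<rule id="EXAMPLE_EXCEPTION_TEST" name="Hypothetical rule with a tested exception">
  <pattern>
    <token>a</token>
    <token regexp="yes">[aeiou].*<exception>university</exception></token>
  </pattern>
  <message>Use <suggestion>an \2</suggestion> before a vowel sound.</message>
  <example correction="an apple">She ate <marker>a apple</marker>.</example>
  <!-- This correct example is accepted only because of the exception,
       so the tests fail if the exception stops working. -->
  <example>He went to a university.</example>
</rule>
```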
Sometimes the corrections are not quite what the author intended: the
ordering of tokens encoded as <match no="1".../> can be wrong. Add an
example with the correction attribute to see what your suggestion produces, especially if you’re creating a complex suggestion that uses existing tokens, changes their grammatical form etc.
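A suggestion that reorders matched tokens can be checked in the same way. In this hypothetical sketch the suggestion swaps the two matched tokens, and the correction attribute records the expected order; the rule ID, name, and sentences are illustrative assumptions.

```xml
<rule id="EXAMPLE_MATCH_ORDER" name="Hypothetical: swapped word order">
  <pattern>
    <token>enough</token>
    <token regexp="yes">big|small|tall</token>
  </pattern>
  <!-- Token 2 is emitted first, then token 1. -->
  <message>Did you mean <suggestion><match no="2"/> <match no="1"/></suggestion>?</message>
  <!-- If the match numbers were swapped by mistake, this example would fail. -->
  <example correction="big enough">The box is not <marker>enough big</marker>.</example>
  <example>The box is not big enough.</example>
</rule>
```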
Skipping enables matching non-contiguous sequences of tokens. However,
some sequences (such as noun phrases or verb phrases) might be broken
by punctuation characters, intervening connectives, other verb forms
etc. In Constraint Grammar, there is a notion of Barrier that specifies
such breaking elements. In LanguageTool, we use exceptions for skipping
(with scope="next"). Add as many exceptions as necessary.
Avoid using skip="1" on a token without an accompanying exception with scope="next" (default value of the scope attribute). This exception will be matched over the skipped tokens.
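The barrier idea can be sketched as a skipping token whose exception is checked against the skipped tokens. The rule below is hypothetical: the ID, name, skip distance of 3, and the choice of barrier words are assumptions for illustration, and scope="next" is written out explicitly rather than relying on a default.

```xml
<rule id="EXAMPLE_SKIP_BARRIER" name="Hypothetical: 'between ... or'">
  <pattern>
    <!-- Skip up to three tokens after 'between', but stop as soon as a
         skipped token is 'and', a comma, or a semicolon. -->
    <token skip="3">between<exception scope="next" regexp="yes">and|[,;]</exception></token>
    <marker>
      <token>or</token>
    </marker>
  </pattern>
  <message>With 'between', use <suggestion>and</suggestion>.</message>
  <example correction="and">Choose between tea <marker>or</marker> coffee.</example>
  <!-- The skipped 'and' acts as a barrier, so the later 'or' is not flagged. -->
  <example>Choose between tea and milk or juice.</example>
</rule>
```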