The ASF (Advanced Synthesizer Filter) applies the POS tag of one token to the lemma of another token. This allows for “nested matches” (putting matches inside matches within suggestions).
Say you want to write a rule that finds and corrects both these errors:
– He has many houses and boat. → boats
– He has many children and grandchild. → grandchildren
<rule id="MANY_HOUSES_AND_BOAT" name="He has many houses and boat(s)">
  <pattern>
    <token>many</token>
    <token postag="NNP?S" postag_regexp="yes"/>
    <token>and</token>
    <marker>
      <token postag="NNP?" postag_regexp="yes"/>
    </marker>
  </pattern>
  <filter class="org.languagetool.rules.en.AdvancedSynthesizerFilter" args="lemmaFrom:4 lemmaSelect:NN.* postagFrom:2 postagSelect:NN.*"/>
  <message>The plural noun '\2' and the singular noun '\4' do not seem to match.</message>
  <example correction="boats">He has many houses and <marker>boat</marker>.</example>
  <example correction="grandchildren">He has many children and <marker>grandchild</marker>.</example>
</rule>
As the example above shows, you invoke the ASF by referencing org.languagetool.rules.en.AdvancedSynthesizerFilter in the class attribute of the <filter> element. To use it for German, French, Catalan or Spanish, replace en with de, fr, ca or es.
The arguments (args) of the ASF are fairly straightforward: lemmaFrom is the number of the token that supplies the lemma, and postagFrom is the number of the token that supplies the POS tag (here, the first token is numbered 1). The elements lemmaSelect and postagSelect ensure that the token is interpreted correctly. If a token were houses, lemmaSelect:N.* makes sure this word is interpreted as a noun, not as the verb “to house something”. The same goes for postagSelect.
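To make the numbering concrete, here is the pattern and filter from the rule above again, with the position of each token noted in a comment. Nothing is added to the rule itself; the comments are only an illustration of how the arguments line up with the tokens.

```xml
<pattern>
  <token>many</token>                           <!-- token 1 -->
  <token postag="NNP?S" postag_regexp="yes"/>   <!-- token 2: postagFrom:2 supplies the plural tag -->
  <token>and</token>                            <!-- token 3 -->
  <marker>
    <token postag="NNP?" postag_regexp="yes"/>  <!-- token 4: lemmaFrom:4 supplies the lemma, e.g. boat -->
  </marker>
</pattern>
<!-- lemmaSelect and postagSelect restrict both readings to noun tags (NN.*) -->
<filter class="org.languagetool.rules.en.AdvancedSynthesizerFilter"
        args="lemmaFrom:4 lemmaSelect:NN.* postagFrom:2 postagSelect:NN.*"/>
```

For “He has many houses and boat”, the lemma of token 4 (boat) is combined with the plural noun tag of token 2 (houses), which gives the correction boats stated in the rule's example.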
Keep in mind the following things:
– You don't need to add a <suggestion>; as in the example above, the filter generates the correction itself.
– Put the <filter> line before the <message> (unlike what you would do with a suggestion). Otherwise, it won’t work.
– Avoid optional tokens such as <token min="0">: if the optional token isn’t there, the lemmaFrom number will no longer be correct.
If you combine the ASF with the unification function, you can cover even more general patterns and still propose a correction.
For example, suppose you want a rule that finds and corrects both these errors:
– I sang and dance every day. → danced
– He reads and wrote every day. → writes
<rule id="SANG_AND_DANCE" name="He sang and dance(d) every day">
  <pattern>
    <token postag="SENT_START"/>
    <token postag="PRP|NNP" postag_regexp="yes"/>
    <unify negate="yes">
      <feature id="person"/>
      <feature id="tense"/>
      <token postag="VB[PZD]" postag_regexp="yes"/>
      <unify-ignore>
        <token>and</token>
      </unify-ignore>
      <marker>
        <token postag="VB[PZD]" postag_regexp="yes"/>
      </marker>
    </unify>
  </pattern>
  <filter class="org.languagetool.rules.en.AdvancedSynthesizerFilter" args="lemmaFrom:5 lemmaSelect:V.* postagFrom:3 postagSelect:V.*"/>
  <message>The verb '\5' doesn't seem to match '\3'.</message>
  <example correction="danced">I sang and <marker>dance</marker> every day.</example>
  <example correction="writes">He reads and <marker>wrote</marker> every day.</example>
</rule>
Note: you can place <marker> quite freely inside and outside the <unify> block.
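The SANG_AND_DANCE rule refers to two unification features, person and tense, which have to be declared elsewhere in the rule file. The following is only a sketch of what such declarations might look like, assuming the tags VBP, VBZ and VBD matched by the pattern. The type names (non_third, third, present, past) are hypothetical, and the actual definitions in the English rule file may well differ.

```xml
<!-- Hypothetical declarations: the type names are illustrative only. -->
<unification feature="person">
  <equivalence type="non_third">
    <token postag="VBP"/>
  </equivalence>
  <equivalence type="third">
    <token postag="VBZ"/>
  </equivalence>
</unification>
<unification feature="tense">
  <equivalence type="present">
    <token postag="VB[PZ]" postag_regexp="yes"/>
  </equivalence>
  <equivalence type="past">
    <token postag="VBD"/>
  </equivalence>
</unification>
```

Each <equivalence> groups the tokens that count as the same value of a feature. With negate="yes", the rule is meant to fire when the two verbs do not agree on the listed features.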