Wednesday, January 29, 2014

Parts of Speech 2

Problems with contextual rules


'building' can be a noun or verb.

It wasn't in the lexicon initially, so I added it as ["NN","VBG"], meaning it's most often a noun but can also be a verb with an "ing" ending.

The first part of the POS Tagger converts anything ending in "ing" to VBG.

Later rules convert many of the VBGs back to NNs.

Here are all of the rules that convert VBG to NN, each followed by examples where the rule should apply:

["VBG","NN","NEXTTAG","VBZ"] = Change VBG to NN if the next tag is VBZ (present tense verb, third person singular).
The building is demolished.

["VBG","NN","PREVTAG","JJ"] = ...previous tag is adjective.
The large building near Main Street.

["VBG","NN","SURROUNDTAG","DT","IN"]
The building on Main Street.

["VBG","NN","NEXTWD","room"]
I'll wait for you in the meeting room.
I'll wait for you in the building room.

["VBG","NN","SURROUNDTAG","DT",","]
The building, the red chair and blue rug.

This rule only applies to the word "setting".
["VBG","NN","WDPREVTAG","DT","setting"]
The setting sun glowed red.
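
To make the rule format concrete, here is a minimal sketch of how rules shaped like the ones above could be applied to a pre-tagged sentence. It is not the jsPOS implementation. The function names (ruleMatches, applyRules) are my own, the argument order for WDPREVTAG (tag first, then word) is assumed from the rule shown above, and applying each rule across the whole sentence in list order is also an assumption.

```javascript
// Sketch only: rules are assumed to be [fromTag, toTag, condition, ...args].
const vbgToNnRules = [
  ["VBG", "NN", "NEXTTAG", "VBZ"],
  ["VBG", "NN", "PREVTAG", "JJ"],
  ["VBG", "NN", "SURROUNDTAG", "DT", "IN"],
  ["VBG", "NN", "NEXTWD", "room"],
  ["VBG", "NN", "SURROUNDTAG", "DT", ","],
  ["VBG", "NN", "WDPREVTAG", "DT", "setting"],
];

// Returns true if the rule's context condition holds at position i.
function ruleMatches(rule, words, tags, i) {
  const [, , condition, a, b] = rule;
  const prevTag = i > 0 ? tags[i - 1] : null;
  const nextTag = i < tags.length - 1 ? tags[i + 1] : null;
  const nextWord = i < words.length - 1 ? words[i + 1] : null;
  switch (condition) {
    case "PREVTAG": return prevTag === a;
    case "NEXTTAG": return nextTag === a;
    case "SURROUNDTAG": return prevTag === a && nextTag === b;
    case "NEXTWD": return nextWord === a;
    // Assumed argument order: previous tag, then the current word.
    case "WDPREVTAG": return prevTag === a && words[i] === b;
    default: return false; // unknown conditions never fire in this sketch
  }
}

// Applies each rule in order to every position; returns a new tag array.
function applyRules(rules, words, tags) {
  const out = tags.slice();
  for (const rule of rules) {
    for (let i = 0; i < out.length; i++) {
      if (out[i] === rule[0] && ruleMatches(rule, words, out, i)) {
        out[i] = rule[1];
      }
    }
  }
  return out;
}

const words = ["The", "building", "is", "demolished", "."];
const tags = ["DT", "VBG", "VBZ", "VBN", "."];
console.log(applyRules(vbgToNnRules, words, tags).join(" "));
// prints "DT NN VBZ VBN ."
```

Because a later rule sees the tags left behind by earlier ones, the order of the rule list matters in this sketch.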

I made a test file with the examples above and then ran it through the tagger.

Words tagged as NN


./. The/DT building/NN ,/, the/DT
The/DT building/NN is/VBZ demolished/VBN
./. The/DT building/NN on/IN Main/NNP
in/IN the/DT building/NN room/NN ./.
the/DT building/NN room/NN ./. The/DT
The/DT large/JJ building/NN near/IN Main/NNP
the/DT red/JJ chair/NN and/CC blue/JJ
and/CC blue/JJ rug/NN ./. The/DT
./. The/DT setting/NN sun/NN glowed/VBD
The/DT setting/NN sun/NN glowed/VBD red/JJ
entered/VBD the/DT apartment/NN building/VBG ./.
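
For anyone who wants a listing like the one above, here is a rough sketch of how it could be produced. It assumes the tagger output is available as an array of "word/TAG" strings and prints a window of two tokens on either side of every token with the requested tag. The function name contextWindows and the radius parameter are made up for this example, and this is not necessarily how the listing above was generated.

```javascript
// Sketch only: build keyword-in-context lines for every token carrying `tag`.
function contextWindows(tagged, tag, radius = 2) {
  const lines = [];
  tagged.forEach((token, i) => {
    // The tag is whatever follows the last slash, so "./." yields ".".
    const tokenTag = token.slice(token.lastIndexOf("/") + 1);
    if (tokenTag !== tag) return;
    const start = Math.max(0, i - radius);
    lines.push(tagged.slice(start, i + radius + 1).join(" "));
  });
  return lines;
}

const tagged = "./. The/DT building/NN on/IN Main/NNP Street/NNP ./.".split(" ");
console.log(contextWindows(tagged, "NN"));
// prints [ './. The/DT building/NN on/IN Main/NNP' ]
```

The comparison is an exact match on the tag, so NNP tokens are not listed when asking for NN.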

Deadlock


In broader testing I found cases where "building" was still being wrongly tagged as a VBG:

"I entered the apartment building."
the/DT apartment/NN building/VBG ./. "/"

The error was that "apartment" was tagged as a NN instead of a JJ.

The rule ["NN","JJ","SURROUNDTAG","DT","NN"] should have converted "apartment" to a JJ, but it couldn't fire because the tag for "building" was still VBG, not NN. Likewise, "building" would have been changed to NN if the previous word's tag had been JJ, but "apartment" was still NN.

I don't see an obvious way around deadlocks like this.
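
The deadlock can be reproduced in isolation. The sketch below hard-codes just the two rules involved and keeps re-applying them until nothing changes. The function names (applyOnce, applyUntilStable) are hypothetical, and repeating the rules in passes is my own addition to show that extra passes would not help.

```javascript
// Sketch only: the two rules that wait on each other.
function applyOnce(tags) {
  const out = tags.slice();
  for (let i = 0; i < out.length; i++) {
    // ["NN","JJ","SURROUNDTAG","DT","NN"]: needs the next tag to already be NN
    if (out[i] === "NN" && out[i - 1] === "DT" && out[i + 1] === "NN") out[i] = "JJ";
  }
  for (let i = 0; i < out.length; i++) {
    // ["VBG","NN","PREVTAG","JJ"]: needs the previous tag to already be JJ
    if (out[i] === "VBG" && out[i - 1] === "JJ") out[i] = "NN";
  }
  return out;
}

// Re-applies both rules until a pass changes nothing (or maxPasses is hit).
function applyUntilStable(tags, maxPasses = 10) {
  let current = tags;
  let passes = 0;
  while (passes < maxPasses) {
    const next = applyOnce(current);
    passes++;
    if (next.join(" ") === current.join(" ")) break;
    current = next;
  }
  return { tags: current, passes };
}

// the/DT apartment/NN building/VBG ./.
const result = applyUntilStable(["DT", "NN", "VBG", "."]);
console.log(result.tags.join(" "), "after", result.passes, "pass(es)");
// prints "DT NN VBG . after 1 pass(es)"
```

Neither condition is ever met, so the very first pass is already a fixed point.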

Problem solved!


Removing a pre-tagging step from the jsPOS code fixed the problem. This is the step I removed:

// rule 8: convert a common noun to a present participle verb (i.e., a gerund)
if (tag.startsWith("NN") && word.endsWith("ing"))
    tag = "VBG";

Now both "building" and other nouns ending in 'ing' are correctly tagged as NN. Even "apartment" (before "building") is tagged as "JJ".
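
Here is a small sketch of why the starting tag matters. It uses a made-up mini-lexicon (only "building" mirrors the entry I actually added), assumes the first tag in a lexicon entry is the initial guess and that unknown words default to NN, and puts rule 8 behind a hypothetical useIngRule flag. None of these names come from jsPOS.

```javascript
// Sketch only: hypothetical mini-lexicon; the first tag is the initial guess.
const lexicon = new Map([
  ["i", ["PRP"]],
  ["entered", ["VBD"]],
  ["the", ["DT"]],
  ["apartment", ["NN"]],
  ["building", ["NN", "VBG"]],
  [".", ["."]],
]);

// Assigns initial tags, optionally applying the "ing" pre-tagging step.
function initialTags(words, useIngRule) {
  return words.map((word) => {
    const entry = lexicon.get(word.toLowerCase());
    let tag = entry ? entry[0] : "NN"; // assumption: unknown words default to NN
    if (useIngRule && tag.startsWith("NN") && word.endsWith("ing")) {
      tag = "VBG"; // rule 8 from the snippet above
    }
    return tag;
  });
}

const sentence = ["I", "entered", "the", "apartment", "building", "."];
console.log(initialTags(sentence, true).join(" "));
// prints "PRP VBD DT NN VBG ."
console.log(initialTags(sentence, false).join(" "));
// prints "PRP VBD DT NN NN ."
```

With the step disabled, "building" starts out as NN, which is exactly what the ["NN","JJ","SURROUNDTAG","DT","NN"] rule needs in order to retag "apartment".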

I'll need to check if any of the other pre-tagging rules cause problems.
