Wednesday, January 29, 2014

Parts of Speech 2

Problems with contextual rules


'building' can be a noun or verb.

It was initially not in the lexicon, so I added it as ["NN","VBG"], meaning it's most often a noun but can be a verb with an "ing" ending.

The first part of the POS Tagger converts anything ending in "ing" to VBG.

Later rules convert many of the VBGs back to NNs.

Here are all of the rules that convert VBG to NN, each followed by examples where the rule should apply:

["VBG","NN","NEXTTAG","VBZ"] = Change VBG to NN if the next tag is VBZ (verb, 3rd person singular present).
The building is demolished.

["VBG","NN","PREVTAG","JJ"] = ...previous tag is adjective.
The large building near Main Street.

["VBG","NN","SURROUNDTAG","DT","IN"]
The building on Main Street.

["VBG","NN","NEXTWD","room"]
I'll wait for you in the meeting room.
I'll wait for you in the building room.

["VBG","NN","SURROUNDTAG","DT",","]
The building, the red chair and blue rug.

This rule only applies to the word "setting".
["VBG","NN","WDPREVTAG","DT","setting"]
The setting sun glowed red.
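
The rule formats above can be read as small pattern checks against the neighbouring tokens. Here is a minimal sketch of how such rules might be applied. The function names (`ruleMatches`, `applyRules`) and the token shape `{ word, tag }` are my own assumptions for illustration and are not taken from the jsPOS source; the reading of WDPREVTAG (previous tag plus current word) is inferred from the "setting" example.

```javascript
// Sketch of a Brill-style contextual rule matcher (hypothetical helper names).
// Rule layout: [fromTag, toTag, ruleType, arg1, arg2]
function ruleMatches(rule, tokens, i) {
  const [from, , type, a, b] = rule;
  const cur = tokens[i];
  if (cur.tag !== from) return false;
  const prev = tokens[i - 1];
  const next = tokens[i + 1];
  switch (type) {
    case "PREVTAG":     return !!prev && prev.tag === a;
    case "NEXTTAG":     return !!next && next.tag === a;
    case "NEXTWD":      return !!next && next.word === a;
    case "SURROUNDTAG": return !!prev && !!next && prev.tag === a && next.tag === b;
    // Assumed meaning: previous tag is `a` and the current word is `b`.
    case "WDPREVTAG":   return !!prev && prev.tag === a && cur.word === b;
    default:            return false;
  }
}

// Apply each rule in order across the whole token list, changing tags in place.
function applyRules(rules, tokens) {
  for (const rule of rules) {
    for (let i = 0; i < tokens.length; i++) {
      if (ruleMatches(rule, tokens, i)) tokens[i].tag = rule[1];
    }
  }
  return tokens;
}

const rules = [
  ["VBG", "NN", "NEXTTAG", "VBZ"],
  ["VBG", "NN", "PREVTAG", "JJ"],
  ["VBG", "NN", "SURROUNDTAG", "DT", "IN"],
  ["VBG", "NN", "NEXTWD", "room"],
  ["VBG", "NN", "SURROUNDTAG", "DT", ","],
  ["VBG", "NN", "WDPREVTAG", "DT", "setting"],
];

const sentence = [
  { word: "The", tag: "DT" },
  { word: "building", tag: "VBG" },
  { word: "is", tag: "VBZ" },
  { word: "demolished", tag: "VBN" },
];
applyRules(rules, sentence);
console.log(sentence.map(t => t.word + "/" + t.tag).join(" "));
// The/DT building/NN is/VBZ demolished/VBN
```

Because the tags are changed in place, a later rule sees the output of an earlier one, so the order of the rules matters.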

I made a test file with the examples above and then ran it through the tagger.

Words tagged as NN


./. The/DT building/NN ,/, the/DT
The/DT building/NN is/VBZ demolished/VBN
./. The/DT building/NN on/IN Main/NNP
in/IN the/DT building/NN room/NN ./.
the/DT building/NN room/NN ./. The/DT
The/DT large/JJ building/NN near/IN Main/NNP
the/DT red/JJ chair/NN and/CC blue/JJ
and/CC blue/JJ rug/NN ./. The/DT
./. The/DT setting/NN sun/NN glowed/VBD
The/DT setting/NN sun/NN glowed/VBD red/JJ
entered/VBD the/DT apartment/NN building/VBG ./.

Deadlock


In broader testing I found cases where "building" was still being wrongly tagged as a VBG:

"I entered the apartment building."
the/DT apartment/NN building/VBG ./. "/"

The error was that "apartment" was tagged as a NN instead of a JJ.

The rule ["NN","JJ","SURROUNDTAG","DT","NN"] should have converted "apartment" to JJ, but it could not fire because the tag for "building" was still VBG rather than NN. Likewise, "building" would have been changed to NN if the previous word's tag had been JJ, but "apartment" was still NN.

I don't see an obvious way around deadlocks like this.
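
To make the deadlock concrete, here is a small sketch that applies only the two rules involved to "the apartment building". The helper names (`matches`, `countChanges`) are hypothetical and only the two rule types needed are implemented; this is an illustration, not the jsPOS code.

```javascript
// Minimal illustration of the deadlock (hypothetical helpers).
function matches(rule, tokens, i) {
  const [from, , type, a, b] = rule;
  const prev = tokens[i - 1];
  const next = tokens[i + 1];
  if (tokens[i].tag !== from) return false;
  if (type === "PREVTAG") return !!prev && prev.tag === a;
  if (type === "SURROUNDTAG") return !!prev && !!next && prev.tag === a && next.tag === b;
  return false;
}

// Apply the rules in order and report how many tags were changed.
function countChanges(rules, tokens) {
  let changes = 0;
  for (const rule of rules) {
    tokens.forEach((t, i) => {
      if (matches(rule, tokens, i)) {
        t.tag = rule[1];
        changes++;
      }
    });
  }
  return changes;
}

const deadlockRules = [
  ["NN", "JJ", "SURROUNDTAG", "DT", "NN"], // needs "building" to be NN already
  ["VBG", "NN", "PREVTAG", "JJ"],          // needs "apartment" to be JJ already
];

// With building/VBG, each rule waits on the other and nothing changes.
const stuck = [
  { word: "the", tag: "DT" },
  { word: "apartment", tag: "NN" },
  { word: "building", tag: "VBG" },
  { word: ".", tag: "." },
];
const stuckChanges = countChanges(deadlockRules, stuck);
console.log(stuckChanges); // 0

// If "building" starts out as NN, the first rule can fire.
const unstuck = [
  { word: "the", tag: "DT" },
  { word: "apartment", tag: "NN" },
  { word: "building", tag: "NN" },
  { word: ".", tag: "." },
];
const unstuckChanges = countChanges(deadlockRules, unstuck);
console.log(unstuck.map(t => t.word + "/" + t.tag).join(" "));
// the/DT apartment/JJ building/NN ./.
```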

Problem solved!


Removing a pre-tagging step from the jsPOS code fixed the problem.

// rule 8: convert a common noun to a present participle verb (i.e., a gerund)
if (tag.startsWith("NN") && word.endsWith("ing"))
    tag = "VBG";

Now both "building" and other nouns ending in 'ing' are correctly tagged as NN. Even "apartment" (before "building") is tagged as "JJ".

I'll need to check if any of the other pre-tagging rules cause problems.

Saturday, January 25, 2014

Parts of Speech (POS)

I've added jsPOS, Percy Wegmann's (http://www.percywegmann.com/) implementation of the Brill POS tagger, to REve.

I changed the parser to split the text into words and punctuation. I also split contractions like can't, we're, I'm, she'll, he'd into can n't, we 're, I 'm, she 'll, he 'd, and then added lexical entries for these:

n't JJ (adjective)
're VB
'm VB
'd MD (modal)

's is mapped to [POS,VBZ] (possessive ending, is-verb).
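
As a rough sketch of the splitting step, something like the following would produce the tokens described above. The regular expressions and the names `tokenize`, `splitContraction`, and `contractionLexicon` are assumptions for illustration, not the actual REve parser. The special case that turns "ca" into "can" is there only to reproduce the "can n't" example.

```javascript
// Lexical entries for the split-off pieces, as listed in the post.
const contractionLexicon = {
  "n't": ["JJ"],
  "'re": ["VB"],
  "'m": ["VB"],
  "'d": ["MD"],
  "'s": ["POS", "VBZ"],
};

const SUFFIXES = /^(.+)('re|'m|'ll|'d|'s)$/i;

// Split one word into [stem, suffix] if it ends in a known contraction.
function splitContraction(piece) {
  if (/n't$/i.test(piece) && piece.length > 3) {
    const stem = piece.slice(0, -3);
    // Assumption: "can't" becomes "can n't", as in the example above.
    return [stem.toLowerCase() === "ca" ? stem + "n" : stem, piece.slice(-3)];
  }
  const m = piece.match(SUFFIXES);
  return m ? [m[1], m[2]] : [piece];
}

// Split text into words and single punctuation marks, then split contractions.
function tokenize(text) {
  const pieces = text.match(/[A-Za-z0-9]+(?:['-][A-Za-z0-9]+)*|[^\sA-Za-z0-9]/g) || [];
  const tokens = [];
  for (const piece of pieces) tokens.push(...splitContraction(piece));
  return tokens;
}

console.log(tokenize("I can't go, we're late.").join(" "));
// I can n't go , we 're late .
```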

Then I added a few steps to the post tagging, notably to change POS to VBZ.

I also added tests in the tagger to check that contextual rules only change a word's tag when the new tag is in the word's list of possible tags.
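
A check like this could be sketched as follows. The `Map`-based lexicon and the name `canRetag` are hypothetical. What to do with words that are not in the lexicon is not stated above, so allowing the change in that case is my assumption.

```javascript
// Lexicon maps a word to its possible tags, most likely first.
const lexicon = new Map([
  ["building", ["NN", "VBG"]],
  ["the", ["DT"]],
  ["large", ["JJ"]],
]);

// Return true if a contextual rule is allowed to change `word` to `newTag`.
function canRetag(lexicon, word, newTag) {
  const possible = lexicon.get(word) || lexicon.get(word.toLowerCase());
  // Assumption: unknown words have no tag list, so any change is allowed.
  if (!possible) return true;
  return possible.includes(newTag);
}

console.log(canRetag(lexicon, "building", "NN")); // true
console.log(canRetag(lexicon, "the", "NN"));      // false
```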

The original code only applied 8 transformational rules. I added a set of ~200 contextual rules which I borrowed from my Python code, which were in turn borrowed from Brill.

Includes a few tags that never match?

Tagging accuracy is probably below 85%.

The lexicon is from the WSJ corpus and includes many business and finance terms that are unlikely to be matched in dream descriptions. I removed several hundred of these.

Allgemeine
TuHulHulZote,
"non-interest-bearing"
"property-and-casualty"
"junk-bond-financed"
yff
"F.S.L.I.C"
"Bonds-b"
"DIAL-A-PIANO-LESSON"
"J.J.G.M."
"Asia\\",
"Junk-bond"
"Junk-bonds"
"junk-bond"

The lexicon did not include "I" and the method for tagging CD (cardinal numbers) was replaced.

The result uses the color text icons described in my previous post.


If you hover the mouse over a tag the description is shown in a tool tip.

Here's a list of the tags and their parts of speech:
  • CC: Coordinating conjunction
  • CD: Cardinal number
  • DT: Determiner
  • EX: Existential there
  • FW: Foreign Word
  • IN: Preposition or subordinating conjunction
  • JJ: Adjective
  • JJR: Adjective, comparative
  • JJS: Adjective, superlative
  • LS: List item marker
  • MD: Modal
  • NN: Noun, singular or mass
  • NNP: Proper noun, singular
  • NNPS: Proper noun, plural
  • NNS: Noun, plural
  • POS: Possessive ending
  • PDT: Predeterminer
  • PRP: Personal pronoun
  • PRP$: Possessive Personal pronoun
  • RB: Adverb
  • RBR: Adverb, comparative
  • RBS: Adverb, superlative
  • RP: Particle
  • SYM: Symbol
  • TO: to
  • UH: Interjection
  • VB: verb, base form
  • VBD: verb, past tense
  • VBG: verb, gerund or present participle
  • VBN: verb, past participle
  • VBP: Verb, non-3rd person singular present
  • VBZ: Verb, 3rd person singular present
  • WDT: Wh-determiner
  • WP: Wh-pronoun
  • WP$: Possessive-Wh
  • WRB: Wh-adverb
  • !: Exclamation
  • ,: Comma
  • .: Sent-final punct
  • :: Mid-sent punct
  • $: Dollar sign
  • #: Pound sign
  • \: quote
  • (: Left paren
  • ): Right paren
The original pos-js is Copyright 2010, Percy Wegmann and is available at: https://github.com/fortnightlabs/pos-js
Licensed under the LGPLv3 license
http://www.opensource.org/licenses/lgpl-3.0.html

Friday, January 10, 2014

Style quirks 2

I've come up with about 70 categories of style quirks. This is too many to display in a long list. Instead, I created a div that displays a little text on a colored background. I put these in a compact table similar to the Elements search. The count or percent is displayed under each icon. Non-matching categories have greyed-out icons.









Some things to remember


The expressions that identify misused words like "lose", "loose", "its", "it's" don't catch all instances. They will also incorrectly report misuse for some words. If a dreamer consistently uses one of these words incorrectly, the number of matches makes this obvious.

For these misused words results are reported as a % of all correct and incorrect matches.

For "to!" that is 100 * "to!" count/ "to" count.

For things like past and present tense results are a percent of all words.

For rarely matching categories I just report the count. I usually just want to know if the feature is present or not.
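
The reporting arithmetic above is simple enough to sketch in a few lines. The function names are hypothetical, and returning 0 when the denominator is zero is my assumption.

```javascript
// Percent of incorrect uses among all (correct + incorrect) matches.
function misusePercent(incorrect, correct) {
  const total = incorrect + correct;
  return total === 0 ? 0 : (100 * incorrect) / total;
}

// Percent of one count relative to a base count,
// e.g. the "to!" count against the "to" count, or tense matches against all words.
function ratioPercent(count, baseCount) {
  return baseCount === 0 ? 0 : (100 * count) / baseCount;
}

console.log(misusePercent(1, 3));  // 25
console.log(ratioPercent(5, 200)); // 2.5
```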

Some grammar rules are questionable: split infinitives, ending a sentence with a preposition. For these "fussy" categories the icons are purple. Some rules are misapplied by writers who are afraid of being wrong. A few of the search categories could be called hyper-corrections.

The spelling expression is just a list of about 100 commonly misspelled words.

Non-English is a short list of European language function words. Most sets will have a few matches. The same goes for Latin and Yiddish.

Friday, January 3, 2014

Phraseology

I want to find things that are characteristic of a dreamer's written expression, like the stylistic quirks of the previous post. One approach is to look for phrases that a dreamer uses over and over.

My first approach was to split the whole of the dream text into words. I used an expression that includes hyphens and apostrophes as word characters. Next I found all sequences of 4 to 6 words that occurred more than twice. This gives results that straddle sentence borders, so a better method would be to split the text into sentences before looking up the sequences. To be completely accurate you'd have to search a second time with a one-word offset, as regular expression matching continues where it left off.
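
Here is a rough sketch of the sentence-aware variant suggested above. It is not the code I ran: it uses a sliding window over the word list instead of regular expression matching, which sidesteps the one-word offset issue because every starting position is counted. The function name, the lowercasing, and the simple sentence split on ., ! and ? are all assumptions.

```javascript
// Count word sequences of minLen..maxLen words that occur at least minCount times.
// Sequences never cross a sentence border.
function repeatedSequences(text, minLen = 4, maxLen = 6, minCount = 3) {
  const counts = new Map();
  for (const sentence of text.split(/[.!?]+/)) {
    // Hyphens and apostrophes count as word characters.
    const words = sentence.toLowerCase().match(/[a-z0-9'-]+/g) || [];
    for (let n = minLen; n <= maxLen; n++) {
      for (let i = 0; i + n <= words.length; i++) {
        const key = words.slice(i, i + n).join(" ");
        counts.set(key, (counts.get(key) || 0) + 1);
      }
    }
  }
  return [...counts]
    .filter(([, count]) => count >= minCount)
    .map(([phrase, count]) => ({ phrase, count }));
}

const sample = "I am in a house. Later I am in a house. Again I am in a house.";
console.log(repeatedSequences(sample).map(r => r.phrase));
```

Note that shorter sequences contained in a longer repeated one are reported too, so "i am in a" shows up alongside "i am in a house".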

My second approach (the one I ended up using) was to just look for sequences of 10 to 30 characters (not including a period) that start and end on word borders: \b[^\.]{10,30}\b. I think this gives more interesting results and also preserves punctuation. I only save phrases that match 2 or more times.

Here are the results for the 2nd "Barb Sanders" set from DreamBank. Note how much self-reporting of feelings the dreamer gives.
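
A minimal sketch of this second approach, using the expression above. The function name and the `trim()` call are my additions for illustration: a match can begin at the border just after a word, so it may start with a space.

```javascript
// Find 10 to 30 character runs (no periods) that start and end on word borders,
// and keep those that occur at least minCount times.
function repeatedPhrases(text, minCount = 2) {
  const counts = new Map();
  for (const m of text.matchAll(/\b[^.]{10,30}\b/g)) {
    const phrase = m[0].trim(); // assumption: ignore leading/trailing spaces
    counts.set(phrase, (counts.get(phrase) || 0) + 1);
  }
  return [...counts]
    .filter(([, count]) => count >= minCount)
    .map(([phrase, count]) => ({ phrase, count }));
}

console.log(repeatedPhrases("I am in a house. I am in a house."));
// [ { phrase: 'I am in a house', count: 2 } ]
```

Since matching is greedy and resumes where the previous match ended, a sentence longer than 30 characters is cut into pieces, and where the cuts fall depends on what came before.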
I am in a house I am in a room relationship I am frustrated I am surprised wheelchair each other I am concerned I look out the window and see We are attracted to each other comfortable I am afraid I feel embarrassedattracted to each other uncomfortable She says yesI am annoyed I am relieved I am disappointed She says no restaurant I am pleased They ignore me They leavepresentation We like each other I feel fear He is attracted to me and She agrees I am attracted to himinterested I am shocked immediately performance being helpful I need to pee I feel sad He is attracted to me I am embarrassed together again I feel a bit guilty Howard comes in conference the houserefrigerator I can't find it I am worried I am happy I am outdoors It is crowded disappointed another roominformation interesting I open the door I don't know why He says yes I am in an apartment the street I feel uncomfortable I like that about this I tell her to stop side by side the kitchen down the hall I am walking down a long convertible She is dying He smiles at mesexually attracted to each I am in a kitchenGinny is there I go outside disappears attracted to me I am very surprised very tired unconscious It is friendly I hesitate I am very tired I am upset I feel frustrated It is night I am angry I say I know I am in a classroom We love each other I am impressedembarrassed She says sure experience embankment I am walking down a sidewalk I am exhausted I walk outside She is annoyed with me He is angry I am amazed I am at a conference The phone rings He is very happy living room conversation intersectioncommunity college I don't like that I need help It is cute Charla is there It is very beautiful I am in a wheelchair Over and over It feels wonderful I feel terrible Glenn Ford seated on a small fireplace into the living room I need to go to the bathroom I go into the house coming at me I am running I am angry at her forI look more closely and see I am concerned for them vigorously I am walking through 
the roomsAnother woman and her friend I stand up and walk over to I feel some fear I walk into the bedroom and I walk into another room and find them for the day having fun he is dead He is shocked all around me 's bedroom I am hesitant I am very happy They laugh for the trip I say to Naomi of the car It is black and white I don't like himapartment building surprised Paulina is there I am in bed with Howard the elevator I say to him them again coming toward me He comes in and we talk at the same time I am moving into a house He does so I am in danger of being 't do anything It makes me nervousthe accessible bathroom perpendicular to each other I am having a hard time with Charla disability She laughs He nods yes reflection He loves me I am outside going down a It is going well I manage it She is upset with me are standing hilly path I walk down the street connection to the man on a tablewith the rest into the house I walk downstairs He follows I keep walking special treasure I feel fine about it auditorium They are good It is great fun earthquakedisruptive make love They agree Now I am in another room with They are very happy with eachphysically in a building Ernie comes in begin to make love performance area cigarettes sympathetic I am surprised because the human form I feel tired He comes with me Sometimes he's trying to kill of the houseI am in a bedroom I am tired Bonnie comes indecorations the performance my dream softwarefrustrated I want to leave I don't belong here I am angry at her into the conversation We haven't seen each other for I am walking down a church hallway looking for the room where several of us are going to have a church service, even though we are all different I can't find the room, then I But just then, I am aware of an odd snoring sound I make when I am breathing, thatcomes and goes I decide I will go to the I return down the hall, now being followed by Glenn Ford talk together I explain I am having difficulties with this odd noise in my 
breathing He says he has the same Now I am in the bathroom and I sit on the toilet only to see I call after him I am embarrassed and say Ididn't know you were in the He says I'm sorry But it's OKNow I get up and I want to finish dressing I ask him to leave for a He steps out into the hall Then I realize he has his shirt stuffed into the top of a dress I had been wearing, all balled up in his arms I open the door and ask him to come back in with the dressThen we stand close together trying to separate his shirt out of my dress It is warm and friendly and I like him very much I realize I need to go to the bathroom again I am glad we have decided to go to the same hospital together to figure out what is wrong with usI try to blow my nose to clear the nose but that only makes it hard to breath We leave, intending to be in the same hospital ward We are scared about the odd physical problem and happy to be together get together earrings on getting ready Now I am walking down the the man is I am uncomfortable doing this I am with a group of people in figure it out California I drive away We are walking down the hall I realize I am in danger andland in the water He does too frequently for Charla I say to her He and I are attracted to each He says noI am sexually excited swimming pool I feel badly

The "Pegasus" set, also from DreamBank, gives a much shorter set of phrases. Note that there is much less reporting about feelings and more about working and driving. The phrases show that Pegasus had frequent flying dreams. There are also a couple of matches for horse racing. The description of Pegasus at DreamBank notes that he was a factory worker who liked to bet on horse races. So the phraseology approach revealed a few basic characteristics of the dreamer.

The word "intimated" is a euphemism for sex. Interesting because Pegasus doesn't score very high for sex, but my expression for sex does not match "intimated".

I was in Rivertown It was night I was working in the shop the street It was dark I was flying I was in the country neighborhood intimated her I was driving my car I was driving my car and the There was snow on the ground I was looking at the entries I met Margie S I was in the screw dept I can't recall now I was working in the shop and I had a hard-on I was in the house I was in the basement and a I was looking at the results I was looking at the race away from me boyhood days I was hunting looking at me I was surprised I was working on the coke yard I was embarrassed I was going to school I was working at White's again I was working at White's my boyhood days I was on St I was working at White's and I was in Rivertown, my boyhood of my boyhood days I can't recall the name he disappeared I put on the brake pedal and I was looking for something they looked I was doing some kind of workThere was a stream of water was sleeping certain time the same time in the stove I put my arms around her and I I was driving a bus There was a lot of snow on the I thought he was going to It was high I was drawing coke I was working in the dairy and I looked at it I was watching boys play ball I saw Whitey Volker and I was driving my car at night I picked them I looked out the window and I about 3 ft surrendered I was the winnerI was out hunting didn't like but I flew up into the air and anything with it We were living in our old homeout of it I saw a man lying on the I looked into ittowards me I was by Grandma's I was in the city There were a lot of people I was home the ground something similar couldn't see me I went into the house and Iauditorium the fence to harm her In my sleep I was thinking of Ann and I were walking thru Then we went bowling I got hold of it and pulled it front of us with my nails She was beautiful barefooted I reached down and pulled her I was walking down the street I went to see Dr I was in the woodwork building I saw the double in the paper I looked it over 
I was in a colored between them I was watching them on the floor I came to a caveI was out in the country I was in church I shook hands with him and I was in PA I was intimating my wifecolored people running on the floor I went to the men's room and once in a while I was in the shop I can't recall the name now I told Andy P the picture He was a big man into the air I went into a building and We were supposed to be I looked at them and thought I looked into the sky and I each other I took a good look at it I was at the track and I heard I put them in my shirt pocket The wind started to blow the outside Sewing Machine Co another man and there were a lot of There was an atomic explosion I was sitting in a car with a Then he turned out to be ainches in diameter I got on top of her and in the flats it was dark I was trying to find out if I was at the race track It was clear water It was steep was running the country didn't shoot I looked at the time and itinto the lake , but couldn't pulled it out was doing it I looked in surprise at the pointing at me It was night and dark I was counting them We had a dog I put on a light in the off the ground at White's surrounded by a gang ofaway from him couldn't see I can't recall what it was I looked at them and said, " Bob and I were hunting I was watching a horse race I looked at her Ann and I went into a restaurant I was working at Dill's It was about 3 ft I was in bed with Ann We were at a party It was raining I went outside and looked at on the head I looked at the sky and it was One was coming at me side track I made a sign of the cross and but I can't recall them now We were walking in the country I was in the basement in Rivertown embankment I was downtown Broadway Ave Claire Ave I had a flashlight I was driving I was in Rivertown and the I put my arms around her, the street