Most students did very well on this assignment. The only consistent shortcoming was unnecessary loops in the tag_errors function, which increased execution time by about 10%.
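To illustrate what "unnecessary loops" means here (the exact patterns varied by submission), the hypothetical sketch below makes three passes over the data, two to flatten the tag sequences and one to compare them, where the zip-based version in the answer key below does the same work in a single pass. tag_errors_slow is an invented name used for illustration only.

# Hypothetical illustration only: extra passes that flatten the tags first
# and then compare them, instead of a single zipped traversal.
def tag_errors_slow(test, gold):
    test_tags = []
    for sent in test:                 # pass 1: flatten the test tags
        for (word, tag) in sent:
            test_tags.append(tag)
    gold_tags = []
    for sent in gold:                 # pass 2: flatten the gold tags
        for (word, tag) in sent:
            gold_tags.append(tag)
    errors = []
    for i in range(len(test_tags)):   # pass 3: compare the flattened lists
        if test_tags[i] != gold_tags[i]:
            errors.append((test_tags[i], gold_tags[i]))
    return errors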
| Statistic | Score |
| --- | --- |
| mean | 56.71 |
| standard deviation | 8.58 |
- Use a unigram tagger to evaluate tagging accuracy on the romance and adventure genres of the Brown corpus. Use a default tagger of NN as the backoff tagger. Train the tagger on the first 90% of each genre and test on the remaining 10%. (10 points)
import nltk
from nltk.corpus import brown

# Default tagger (NN) to use as the backoff
t0 = nltk.DefaultTagger('NN')

# Adventure genre: train on the first 90% of sentences, test on the rest
adv_tagged_sents = brown.tagged_sents(categories='adventure')
adv_size = int(len(adv_tagged_sents) * 0.9)
adv_train_sents = adv_tagged_sents[:adv_size]
adv_test_sents = adv_tagged_sents[adv_size:]
adv_tagger = nltk.UnigramTagger(adv_train_sents, backoff=t0)
adv_tagger.evaluate(adv_test_sents)
# Romance genre: same 90/10 split
rom_tagged_sents = brown.tagged_sents(categories='romance')
rom_size = int(len(rom_tagged_sents) * 0.9)
rom_train_sents = rom_tagged_sents[:rom_size]
rom_test_sents = rom_tagged_sents[rom_size:]
rom_tagger = nltk.UnigramTagger(rom_train_sents, backoff=t0)
rom_tagger.evaluate(rom_test_sents)
- Now let's investigate the most common types of errors that our tagger makes. Write a function called tag_errors which will return all errors that our tagger made. It should accept two arguments, test and gold, which should be lists of tagged sentences. The test sentences should be ones that have been automatically tagged, and the gold sentences should be ones that have been manually corrected. The function should output a list of (incorrect, correct) tuples, e.g. [('VB', 'NN'), ('VBN', 'VBD'), ('NN', 'VB'), ('NN', 'VBD'), ('TO', 'IN')]. (15 points)
def tag_errors(test, gold):
    '''Return a list of (wrong, correct) tag tuples, given automatically
    tagged data and the gold standard for that data.'''
    errors = []
    # Walk the two corpora in parallel, sentence by sentence and token by token
    for testsent, goldsent in zip(test, gold):
        for testpair, goldpair in zip(testsent, goldsent):
            if testpair[1] != goldpair[1]:
                errors.append((testpair[1], goldpair[1]))
    return errors
- Use the unigram taggers you trained to tag the test data from the adventure and romance genres of the Brown corpus. HINT: Look at the batch_tag method of the UnigramTagger. (10 points)
# Re-tag the untagged versions of the test sentences with the trained taggers
adv_sents = brown.sents(categories='adventure')
adv_unknown = adv_sents[adv_size:]
adv_test = adv_tagger.batch_tag(adv_unknown)

rom_sents = brown.sents(categories='romance')
rom_unknown = rom_sents[rom_size:]
rom_test = rom_tagger.batch_tag(rom_unknown)
- Use your tag_errors function to find all the tagging errors for the romance and adventure genres of the Brown corpus. (10 points)
adv_errors = tag_errors(adv_test, adv_test_sents)
rom_errors = tag_errors(rom_test, rom_test_sents)
- Now create frequency distributions of the tagging errors for the romance and adventure genres. (5 points)
adv_error_fd = nltk.FreqDist(adv_errors)
rom_error_fd = nltk.FreqDist(rom_errors)
- What differences do you notice between the frequency distributions of the two genres? (No code required for this question) (5 points)
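One quick way to compare the two distributions is to look at the most frequent (wrong, correct) pairs in each genre. A minimal sketch, assuming the same NLTK release as the code above, where FreqDist.items() returns (sample, count) pairs sorted by decreasing count (in newer NLTK releases, most_common(10) serves the same purpose):

# Ten most frequent (wrong, correct) tag confusions for each genre
adv_error_fd.items()[:10]
rom_error_fd.items()[:10]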
- How might we improve our tagging performance? (No code required for this question) (5 points)
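One standard improvement, sketched here as one possible answer rather than the only one, is to add context by chaining taggers: a bigram tagger that backs off to the unigram tagger, which in turn backs off to the NN default. For example, for the adventure genre:

# Sketch: bigram tagger backed off to the unigram tagger, which is backed
# off to the NN default, trained and evaluated on the same adventure split
t0 = nltk.DefaultTagger('NN')
t1 = nltk.UnigramTagger(adv_train_sents, backoff=t0)
t2 = nltk.BigramTagger(adv_train_sents, backoff=t1)
t2.evaluate(adv_test_sents)

Training on more data and handling unknown words with a regular-expression tagger instead of a bare NN default are other common directions.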
In this homework you will practice part-of-speech tagging and evaluating part-of-speech taggers. The homework covers material up to Nov. 12 and is due Nov. 19.