npm install compromise-penn-tags
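const nlp = require('compromise')
// register the plugin so that .pennTags() is available
nlp.extend(require('compromise-penn-tags'))

nlp("pour through a book").pennTags()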
/*
[{
  text: 'pour through a book',
  terms: [
    { text: 'pour', penn: 'VBP', tags: [Array] },
    { text: 'through', penn: 'IN', tags: [Array] },
    { text: 'a', penn: 'WDT', tags: [Array] },
    { text: 'book', penn: 'NN', tags: [Array] }
  ]
}]
*/
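Many tagging tools and test sets use a flat word/TAG format rather than JSON. As a minimal sketch, the result shape shown above could be flattened like this. The `toWordTag` helper is hypothetical and not part of compromise or this plugin, and the sample data is copied by hand from the output above rather than produced by a live call.

```javascript
// Hypothetical helper (not part of the plugin): turns the array returned by
// .pennTags() into one "word/TAG" string per sentence.
function toWordTag(sentences) {
  return sentences.map(s =>
    s.terms.map(t => `${t.text}/${t.penn}`).join(' ')
  )
}

// Sample data copied from the output above; the tags arrays are left empty here.
const sample = [{
  text: 'pour through a book',
  terms: [
    { text: 'pour', penn: 'VBP', tags: [] },
    { text: 'through', penn: 'IN', tags: [] },
    { text: 'a', penn: 'WDT', tags: [] },
    { text: 'book', penn: 'NN', tags: [] }
  ]
}]

console.log(toWordTag(sample))
// [ 'pour/VBP through/IN a/WDT book/NN' ]
```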
This plugin is meant to supply a mapping between the standard Penn Tagset and the custom tagset in compromise.
This lets users evaluate the compromise POS-tagger by comparing it to other libraries or testing data.
Please note that tokenization choices vary considerably between pos-tagger libraries, making this comparison more difficult.
Compromise makes some unique decisions tokenizing punctuation and contractions.
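Because of these tokenization differences, a position-by-position comparison only makes sense for sentences where both sides produced the same tokens. The sketch below shows one way to score tags against reference data while skipping sentences that do not line up. The `compareSentence` and `accuracy` helpers are hypothetical, and both the predicted and reference data are made up for illustration. They are not output from compromise or taken from a real corpus.

```javascript
// Hypothetical helper: compare one sentence's terms to reference [word, tag] pairs.
function compareSentence(terms, gold) {
  // Only compare tags when both sides split the text into the same tokens.
  const sameTokens = terms.length === gold.length &&
    terms.every((t, i) => t.text === gold[i][0])
  if (!sameTokens) {
    return { aligned: false, correct: 0, total: 0 }
  }
  const correct = terms.filter((t, i) => t.penn === gold[i][1]).length
  return { aligned: true, correct, total: terms.length }
}

// Hypothetical helper: tag accuracy over all sentences that align.
function accuracy(sentences, goldSentences) {
  let correct = 0
  let total = 0
  let skipped = 0
  sentences.forEach((s, i) => {
    const result = compareSentence(s.terms, goldSentences[i])
    if (!result.aligned) {
      skipped += 1 // tokenization differs, so this sentence is not scored
      return
    }
    correct += result.correct
    total += result.total
  })
  return { accuracy: total === 0 ? 0 : correct / total, total, skipped }
}

// Made-up data in the shape returned by .pennTags()
const predicted = [
  { terms: [
    { text: 'pour', penn: 'VBP' }, { text: 'through', penn: 'IN' },
    { text: 'a', penn: 'WDT' }, { text: 'book', penn: 'NN' }
  ] },
  { terms: [{ text: "can't", penn: 'MD' }, { text: 'stop', penn: 'VB' }] }
]
// Made-up reference data; the second sentence splits the contraction Penn-style
const gold = [
  [['pour', 'VBP'], ['through', 'IN'], ['a', 'DT'], ['book', 'NN']],
  [['ca', 'MD'], ["n't", 'RB'], ['stop', 'VB']]
]

console.log(accuracy(predicted, gold))
// { accuracy: 0.75, total: 4, skipped: 1 }
```

Reporting the skipped count alongside the accuracy keeps it visible how much of the test data was left out because of tokenization mismatches.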
Unlike most pos-taggers, compromise gives each term many tags, including descendant (or assumed) tags.
Compromise is also less confident than most libraries about declaring whether a Noun is Singular or Plural: where the expected penn-tag is NNPS, compromise may return NNP instead.
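When scoring against test data, one option is to ignore the singular/plural distinction so that these cases are not counted as errors. The sketch below is one way to do that. The helper names are hypothetical, and treating NNS/NN the same way as NNPS/NNP is an assumption that goes beyond the NNPS case described above.

```javascript
// Assumed mapping from plural Penn tags to their singular counterparts.
// NNPS -> NNP is the case described above; NNS -> NN is an assumed extension.
const NUMBER_NEUTRAL = new Map([
  ['NNPS', 'NNP'],
  ['NNS', 'NN']
])

// Hypothetical helper: drop the number distinction from a Penn tag.
function collapseNumber(tag) {
  return NUMBER_NEUTRAL.has(tag) ? NUMBER_NEUTRAL.get(tag) : tag
}

// Hypothetical helper: true if two tags match once number is ignored.
function sameIgnoringNumber(a, b) {
  return collapseNumber(a) === collapseNumber(b)
}

console.log(sameIgnoringNumber('NNPS', 'NNP')) // true
console.log(sameIgnoringNumber('NN', 'VB'))    // false
```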
The .pennTags() method accepts the same options as the .json() method.
nlp('in the town where I was born').pennTags({offset:true})
/*
[{
  text: 'in the town where I was born',
  terms: [
    { text: 'in', penn: 'IN', tags: [Array] },
    { text: 'the', penn: 'WDT', tags: [Array] },
    { text: 'town', penn: 'NN', tags: [Array] },
    { text: 'where', penn: 'CC', tags: [Array] },
    { text: 'I', penn: 'PRP', tags: [Array] },
    { text: 'was', penn: 'VB', tags: [Array] },
    { text: 'born', penn: 'VB', tags: [Array] }
  ],
  offset: { index: 0, start: 0, length: 28 }
}]
*/
work-in-progress
MIT