compromise-output

0.0.3 • Public • Published
a plugin for compromise

npm install compromise-output

Demo

const nlp = require('compromise')
nlp.extend(require('compromise-output'))

let doc = nlp('The Children are right to laugh at you, Ralph')

// generate an md5 hash for the document
doc.hash()
// 'KD83KH3L2B39_UI3N1X'

// create an HTML rendering of the document
doc.html({ '#Person+': 'red', '#Money+': 'blue' })
/*
<pre>
  <span>The Children are right to laugh at you, </span><span class="red">Ralph</span>
</pre>
*/

.hash()

This hash function incorporates each term's part-of-speech tags and whitespace, so tagging or normalizing the document will change the hash.

MD5 is not considered a secure hash, so heads-up if you're doing some top-secret work.

It can, though, be used to compare two documents without looping through their tags:

let docA = nlp('hello there')
let docB = nlp('hello there')
console.log(docA.hash() === docB.hash())
// true

docB.match('hello').tag('Greeting')
console.log(docA.hash() === docB.hash())
// false

If you need the hash to be insensitive to punctuation or case, normalize or transform your document before making the hash.

let doc = nlp(`He isn't... working  `)
doc.normalize({
  case: true,
  punctuation: true,
  contractions: true,
})

nlp('he is not working').hash() === doc.hash()
// true
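If MD5 is a concern, one option is to compute a stronger digest yourself. The sketch below is not part of this plugin: `sha256OfDocument` is a hypothetical helper, and it assumes the input has the shape that compromise's `doc.json()` returns (an array of sentences, each with a `terms` array carrying `text`, `pre`, `post` and `tags`). Check that shape against your compromise version before relying on it. Like `.hash()`, it folds tags and whitespace into the digest.

```javascript
const crypto = require('crypto')

// Hypothetical helper, not part of compromise-output.
// Assumption: `sentences` looks like doc.json() output,
// i.e. [{ terms: [{ text, pre, post, tags }] }].
function sha256OfDocument(sentences) {
  const hash = crypto.createHash('sha256')
  for (const sentence of sentences) {
    for (const term of sentence.terms || []) {
      // sort a copy of the tags so their order does not affect the digest
      const tags = [...(term.tags || [])].sort()
      // JSON.stringify gives an unambiguous encoding of each term
      hash.update(JSON.stringify([term.pre || '', term.text || '', term.post || '', tags]))
    }
    hash.update('\n') // mark the sentence boundary
  }
  return hash.digest('hex')
}

// hand-written stand-ins for doc.json() output
const plain = [
  {
    terms: [
      { text: 'hello', pre: '', post: ' ', tags: [] },
      { text: 'there', pre: '', post: '', tags: [] },
    ],
  },
]
const tagged = [
  {
    terms: [
      { text: 'hello', pre: '', post: ' ', tags: ['Greeting'] },
      { text: 'there', pre: '', post: '', tags: [] },
    ],
  },
]

console.log(sha256OfDocument(plain) === sha256OfDocument(plain))
// true
console.log(sha256OfDocument(plain) === sha256OfDocument(tagged))
// false
```

With a real document, the call would presumably be `sha256OfDocument(doc.json())`.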

.html({segments}, {options})

This turns the document into easy-to-display HTML output.

Special HTML characters within the document get escaped, in a simple way. Be extra careful about XSS and other forms of sneaky HTML when rendering untrusted input. This library is not considered a battle-tested guard against these security vulnerabilities.

let doc = nlp('i <3 you')
doc.html()
// <div>i &lt;3 you</div>
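For reference, this kind of simple escaping can be sketched in a few lines. The `escapeHtml` function below is a hypothetical illustration, not the plugin's own routine, and it is not a complete XSS defence. It may be handy when you build markup by hand from plain text; applying it on top of `.html()` output would likely double-escape.

```javascript
// Hypothetical helper, not the plugin's internal escaping code.
// Replaces the five characters that are significant in HTML text and attributes.
function escapeHtml(str) {
  const entities = {
    '&': '&amp;',
    '<': '&lt;',
    '>': '&gt;',
    '"': '&quot;',
    "'": '&#39;',
  }
  return String(str).replace(/[&<>"']/g, (ch) => entities[ch])
}

console.log(escapeHtml('i <3 you'))
// i &lt;3 you
```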

You can pass in a mapping of tag patterns to HTML class names, so that document metadata can be styled with CSS.

let doc = nlp('made by Spencer Kelly')
doc.html({
  '#Person+': 'red',
})
// <pre><span>made by </span><span class="red">Spencer Kelly</span></pre>

The library uses the .segment() method, which is documented here.

By default, whitespace and punctuation are placed outside the HTML tag. This is sometimes awkward, and not ideal.
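To make the segment-to-markup step concrete, here is a rough sketch of how a list of segments could be turned into spans. It is an illustration only: `renderSegments` is a hypothetical function, and the `{ text, segment }` shape is an assumption about what .segment() returns, so treat it as a mental model and not as the plugin's code.

```javascript
// Hypothetical sketch, not the plugin's implementation.
// Assumption: each segment is { text: string, segment?: string },
// where `segment` holds the class name of the matching pattern, if any.
function renderSegments(segments) {
  // minimal escaping for text and attribute values
  const escapeText = (s) =>
    String(s)
      .replace(/&/g, '&amp;')
      .replace(/</g, '&lt;')
      .replace(/>/g, '&gt;')
      .replace(/"/g, '&quot;')

  const spans = segments.map((seg) =>
    seg.segment
      ? `<span class="${escapeText(seg.segment)}">${escapeText(seg.text)}</span>`
      : `<span>${escapeText(seg.text)}</span>`
  )
  return `<pre>${spans.join('')}</pre>`
}

console.log(
  renderSegments([
    { text: 'made by ' },
    { text: 'Spencer Kelly', segment: 'red' },
  ])
)
// <pre><span>made by </span><span class="red">Spencer Kelly</span></pre>
```

Note how the trailing space after "made by" stays in the unstyled span, outside the tagged one.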

The method returns HTML strings by default, but the library uses Jason Miller's htm library, so you can return React components, or anything else:

doc.html(
  {},
  {
    bind: React.createElement,
  }
)
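htm works with any hyperscript-style function of the form `h(type, props, ...children)`; `React.createElement` is one such function. Assuming the `bind` option is handed to htm unchanged (an assumption, not verified against the plugin's source), a custom function could be supplied too. The `h` below is a hypothetical example that builds plain objects instead of strings or React elements.

```javascript
// Hypothetical hyperscript function: same call signature as React.createElement.
// Returns a plain object tree instead of a string.
function h(type, props, ...children) {
  return { type, props: props || {}, children: children.flat() }
}

// what a renderer would call for: <pre><span>made by </span><span class="red">Spencer Kelly</span></pre>
const node = h(
  'pre',
  null,
  h('span', null, 'made by '),
  h('span', { class: 'red' }, 'Spencer Kelly')
)

console.log(node.children[1].props.class)
// red

// Assumed usage with the plugin (untested):
// doc.html({ '#Person+': 'red' }, { bind: h })
```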

MIT

Collaborators

  • spencermountain