kuromojist

2.0.0 • Public • Published

kuromojist

Calculate some costs ( like Mecab, %pw, %pc, %pC) from kuromoji.js (kuromojin)

Attention

This package use Unexposed API

  • Tokenizer.getLattice
  • lattice.nodes_end_at
  • Tokenizer.viterbi_searcher.search

Example

const { analyzeCosts } = require('kuromojist')
 
analyzeCosts('すもももももももものうち').then( result => {
  console.log(result)
})

This result

[
  { word_id: -1, surface_form: '', cost: 0, edge_cost: 0, shortest_cost: 0 },
  { word_id: 415760, surface_form: 'すもも', cost: 7546, edge_cost: -283, shortest_cost: 7263 },
  { word_id: 93220, surface_form: '', cost: 4669, edge_cost: -4158, shortest_cost: 7774 },
  { word_id: 1614710, surface_form: 'もも', cost: 7219, edge_cost: 17, shortest_cost: 15010 },
  { word_id: 93220, surface_form: '', cost: 4669, edge_cost: -4158, shortest_cost: 15521 },
  { word_id: 1614710, surface_form: 'もも', cost: 7219, edge_cost: 17, shortest_cost: 22757 },
  { word_id: 93100, surface_form: '', cost: 4816, edge_cost: -4442, shortest_cost: 23131 },
  { word_id: 62510, surface_form: 'うち', cost: 5796, edge_cost: -5198, shortest_cost: 23729 },
  { word_id: -1, surface_form: '', cost: 0, edge_cost: -2484, shortest_cost: 21245 }
]

Result Object

  • word_id
    • Same as kuromoji.tokenize
  • surface_form
    • Same as kuromoji.surface_form
  • cost
    • Word Cost
    • Mecab format: %pw
  • edge_cost
    • Lattice Edge Cost
    • Mecab format:%pC
  • shortest_cost
    • Minimum connection costs (accumulated)
    • Mecab format: %pc

Dependencies (1)

Dev Dependencies (5)

Package Sidebar

Install

npm i kuromojist

Weekly Downloads

10

Version

2.0.0

License

MIT

Last publish

Collaborators

  • inuscript
  • terrierscript