    budou

    0.1.2 • Public • Published

    budou-node

    Node.js port of https://github.com/google/budou:

    English uses spacing and hyphenation as cues for beautiful, legible line breaks. Certain CJK languages have neither, which makes line breaking notoriously more difficult: breaks fall at arbitrary points, often in the middle of a word. This is a long-standing issue in web typography, and it degrades readability.

    Budou automatically converts CJK sentences into organized HTML with lexical chunks wrapped in non-breaking markup, so that line breaks can be controlled semantically. Budou uses the Google Cloud Natural Language API (NL API) to analyze the input sentence, and uses part-of-speech (POS) tagging and syntactic information to concatenate words into meaningful chunks. Each chunk is wrapped in a <span> tag; setting the display property of these tags to inline-block in CSS keeps semantic units from being split at the end of a line.
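
    To make the wrapping step concrete, here is a minimal sketch of how a list of chunks could be turned into the nested <span> markup described above. The helpers wrapChunks and escapeHtml are hypothetical names used for illustration; they are not part of budou's API, and the library's actual implementation may differ.

```javascript
// Hypothetical helper (not part of budou): escapes text for safe use in HTML.
function escapeHtml(text) {
  return text
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

// Hypothetical helper (not part of budou): wraps every chunk in its own
// <span> carrying the given attributes, then wraps the whole run in an
// outer <span>.
function wrapChunks(chunks, attributes) {
  const attrs = Object.entries(attributes)
    .map(([name, value]) => ` ${name}="${escapeHtml(String(value))}"`)
    .join('');
  const inner = chunks
    .map((chunk) => `<span${attrs}>${escapeHtml(chunk.word)}</span>`)
    .join('');
  return `<span>${inner}</span>`;
}

const html = wrapChunks(
  [{ word: '今日も' }, { word: '元気です' }],
  { class: 'wordwrap' }
);
console.log(html);
```

    Each inner <span> is a single semantic unit, so a stylesheet only needs to target the shared class to keep every unit on one line.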

    Install

    Install budou-node using npm:

    npm install budou

    Or via yarn:

    yarn add budou

    How to use

    Get a parser by authenticating with a credential file for the NL API. The file can be downloaded from Google Cloud Platform by navigating through "API Manager" > "Credentials" > "Create credentials" > "Service account key" > "JSON".

    The path to the file can be set in the GOOGLE_APPLICATION_CREDENTIALS environment variable or passed as an option to the authenticate method.

    const Budou = require('budou')

    // Log in to the Cloud Natural Language API with credentials
    const parser = Budou.authenticate({ keyFilename: '/path/to/credentials.json' })

    // await is only valid inside an async function in a CommonJS script
    async function main() {
      // Set options and parse the text
      const options = { attributes: { class: 'wordwrap' }, language: 'ja' }
      const result = await parser.parse('今日も元気です', options)

      console.log(result.html)
      // => "<span><span class="wordwrap">今日も</span><span class="wordwrap">元気です</span></span>"

      console.log(result.chunks[0].word) // => "今日も"
      console.log(result.chunks[1].word) // => "元気です"
    }

    main().catch(console.error)
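
    Since the credentials path can come from either an explicit option or the GOOGLE_APPLICATION_CREDENTIALS environment variable, it can help to fail early when neither is present. The sketch below shows one way to do that. buildAuthOptions is a hypothetical helper, not part of budou, and it assumes that the library falls back to the environment variable when no keyFilename is given.

```javascript
// Hypothetical helper (not part of budou): chooses the options object to
// hand to Budou.authenticate(), or throws if no credentials are configured.
function buildAuthOptions(keyFilename, env = process.env) {
  if (keyFilename) {
    // An explicit path takes precedence.
    return { keyFilename };
  }
  if (env.GOOGLE_APPLICATION_CREDENTIALS) {
    // Assumption: with no keyFilename, the library reads this variable.
    return {};
  }
  throw new Error(
    'No credentials: pass a key file path or set GOOGLE_APPLICATION_CREDENTIALS'
  );
}

console.log(buildAuthOptions('/path/to/credentials.json'));
```

    The returned object would then be passed to Budou.authenticate in place of the literal options shown earlier.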

    To make the semantic units in the output HTML wrap correctly at the end of a line, target each <span> tag with display: inline-block in CSS.

    .wordwrap {
      display: inline-block;
    }

    See Original Docs for:

    Options

    The parser.parse(text, options) method accepts the options below in addition to the input text.

    Option      Type     Default          Description
    attributes  Object   { class: 'ww' }  A key-value mapping for attributes of output <span> tags.
    useCache    Boolean  true             Whether to use caching. Helps reduce calls to NL API for repeated text.
    language    String   null             Language of the text. If null is provided, NL API tries to detect from the input text.
    useEntity   Boolean  false            Whether to use Entity mode.
    maxLength   Number   null             Maximum chunk character length. If a chunk is longer than this it will not be wrapped in a <span> tag.
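
    As an illustration of how the defaults and the maxLength option interact, the sketch below merges caller options over the documented defaults and leaves over-long chunks unwrapped. resolveOptions and renderChunks are hypothetical helpers, not budou's API. The shallow merge and the choice to count characters by code point are assumptions, and only the class attribute is rendered to keep the example short.

```javascript
// Defaults copied from the options table.
const DEFAULT_OPTIONS = {
  attributes: { class: 'ww' },
  useCache: true,
  language: null,
  useEntity: false,
  maxLength: null,
};

// Hypothetical helper: shallow-merges caller options over the defaults.
function resolveOptions(options = {}) {
  return { ...DEFAULT_OPTIONS, ...options };
}

// Hypothetical helper: wraps each chunk in a <span>, except chunks longer
// than maxLength, which are emitted as plain text.
function renderChunks(chunks, options) {
  const { attributes, maxLength } = resolveOptions(options);
  return chunks
    .map(({ word }) => {
      // Array.from counts code points rather than UTF-16 code units.
      const tooLong = maxLength !== null && Array.from(word).length > maxLength;
      return tooLong ? word : `<span class="${attributes.class}">${word}</span>`;
    })
    .join('');
}

console.log(
  renderChunks([{ word: '今日も' }, { word: '元気です' }], { maxLength: 3 })
);
// => <span class="ww">今日も</span>元気です
```

    With maxLength left at its default of null, every chunk is wrapped.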

    Pricing

    Budou is backed by the Google Cloud Natural Language API, so using it may incur costs for that API.

    For Japanese and other languages, the default parser uses Syntax Analysis and incurs cost according to monthly usage. If you enable Entity mode by setting the useEntity option to true, the parser uses both Syntax Analysis and Entity Analysis, which incurs additional cost.

    The Google Cloud Natural Language API has a free quota for testing the feature at no cost. For detailed pricing information, refer to the Google Cloud Natural Language API Pricing Guide (https://cloud.google.com/natural-language/pricing).

    Disclaimer

    This Node.js library is derived from the original Budou Python library, https://github.com/google/budou, which is licensed under Apache-2.0. It is in no way associated with or endorsed by the original project.

    License

    MIT

    Collaborators

    • jamsinclair