open-korean-text-node

A Node.js binding for open-korean-text via the node-java interface.

Dependency

Currently wraps open-korean-text 2.2.0.

Requirement

Because this package uses Java code compiled with Java 8, make sure that both the Java 8 JDK and JRE are installed.
For details on installing the Java interface, see the node-java installation notes.

Installation

npm install --save open-korean-text-node

Usage

import OpenKoreanText from 'open-korean-text-node';
// or
const OpenKoreanText = require('open-korean-text-node').default;
  • See the API section for more information.

Examples

API

OpenKoreanText

Tokenizing

OpenKoreanText.tokenize(text: string): Promise<IntermediaryTokens>;
OpenKoreanText.tokenizeSync(text: string): IntermediaryTokens;
  • text - a target string to tokenize

Detokenizing

OpenKoreanText.detokenize(tokens: IntermediaryTokensObject): Promise<string>;
OpenKoreanText.detokenize(words: string[]): Promise<string>;
OpenKoreanText.detokenize(...words: string[]): Promise<string>;
OpenKoreanText.detokenizeSync(tokens: IntermediaryTokensObject): string;
OpenKoreanText.detokenizeSync(words: string[]): string;
OpenKoreanText.detokenizeSync(...words: string[]): string;
  • tokens - an intermediary token object from tokenize
  • words - an array of words to detokenize

Phrase Extracting

OpenKoreanText.extractPhrases(tokens: IntermediaryTokens, options?: ExcludePhrasesOptions): Promise<KoreanToken>;
OpenKoreanText.extractPhrasesSync(tokens: IntermediaryTokens, options?: ExcludePhrasesOptions): KoreanToken;
  • tokens - an intermediary token object from tokenize or stem
  • options - an options object for phrase extraction, where:
    • filterSpam - a flag to filter spam tokens. Defaults to true
    • includeHashtag - a flag to include hashtag tokens. Defaults to false
Normalizing

OpenKoreanText.normalize(text: string): Promise<string>;
OpenKoreanText.normalizeSync(text: string): string;
  • text - a target string to normalize

Sentence Splitting

OpenKoreanText.splitSentences(text: string): Promise<Sentence[]>;
OpenKoreanText.splitSentencesSync(text: string): Sentence[];
  • text - a target string to split into sentences
  • returns array of Sentence which includes:
    • text: string - the sentence's text
    • start: number - the sentence's start position from original string
    • end: number - the sentence's end position from original string
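
The `start` and `end` fields can be checked against the original string. The sketch below is a minimal illustration: the `Sentence` interface follows the field list above, `matchesOriginal` is a hypothetical helper, and the sample data is hand-written rather than actual library output. Whether `end` is exclusive is an assumption; the helper is one way to confirm it against real results.

```typescript
// Shape of a Sentence as described in the list above.
interface Sentence {
  text: string;
  start: number;
  end: number;
}

// Returns true when slicing the original string by [start, end) reproduces sentence.text.
// Treating `end` as exclusive is an assumption that this check lets you verify.
function matchesOriginal(original: string, sentence: Sentence): boolean {
  return original.slice(sentence.start, sentence.end) === sentence.text;
}

// Hand-written sample data, not actual library output.
const originalText = "안녕하세요. 반갑습니다.";
const sampleSentences: Sentence[] = [
  { text: "안녕하세요.", start: 0, end: 6 },
  { text: "반갑습니다.", start: 7, end: 13 },
];

const allMatch = sampleSentences.every((s) => matchesOriginal(originalText, s));
```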

Custom Dictionary

OpenKoreanText.addNounsToDictionary(...words: string[]): Promise<void>;
OpenKoreanText.addNounsToDictionarySync(...words: string[]): void;
  • words - words to add to the dictionary

toJSON

OpenKoreanText.tokensToJsonArray(tokens: IntermediaryTokensObject, keepSpace?: boolean): Promise<KoreanToken[]>;
OpenKoreanText.tokensToJsonArraySync(tokens: IntermediaryTokensObject, keepSpace?: boolean): KoreanToken[];
  • tokens - an intermediary token object from tokenize or stem
  • keepSpace - a flag that controls whether 'Space' tokens are kept in the output. Defaults to false
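
The synchronous calls documented above can be chained into one step: normalize, tokenize, then convert to plain JSON tokens. The sketch below is written against a small structural interface so that it does not depend on Java being available; `SyncProcessor`, `TokenJson`, and `analyzeSync` are hypothetical names, and it is an assumption that the imported `OpenKoreanText` object satisfies this interface.

```typescript
// Minimal token shape used here; see the KoreanToken object section for the full field list.
interface TokenJson {
  text: string;
  pos: string;
}

// Structural subset of the API documented above. `Tokens` stands in for
// the intermediary tokens object returned by tokenizeSync.
interface SyncProcessor<Tokens> {
  normalizeSync(text: string): string;
  tokenizeSync(text: string): Tokens;
  tokensToJsonArraySync(tokens: Tokens, keepSpace?: boolean): TokenJson[];
}

// Hypothetical helper: normalize, tokenize, then convert to plain JSON tokens.
function analyzeSync<Tokens>(
  processor: SyncProcessor<Tokens>,
  text: string,
  keepSpace = false
): TokenJson[] {
  const normalized = processor.normalizeSync(text);
  const tokens = processor.tokenizeSync(normalized);
  return processor.tokensToJsonArraySync(tokens, keepSpace);
}
```

With the package installed, the imported `OpenKoreanText` object would be passed as `processor`. Taking the processor as a parameter also makes the helper easy to exercise with a stub.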

IntermediaryToken object

An intermediate token object required for internal processing.
It provides convenience wrapper functions for processing text without using the processor object.

tokens.extractPhrases(options?: ExcludePhrasesOptions): Promise<KoreanToken>;
tokens.extractPhrasesSync(options?: ExcludePhrasesOptions): KoreanToken;
tokens.detokenize(): Promise<string>;
tokens.detokenizeSync(): string;
tokens.toJSON(): KoreanToken[];
  • NOTE: the tokens.toJSON() method is equivalent to OpenKoreanText.tokensToJsonArraySync(tokens, false)
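
Because the method is named `toJSON`, `JSON.stringify` calls it automatically on any object that defines it; this is standard JavaScript behavior. The sketch below shows the effect with a stand-in class. `FakeTokens` is hypothetical and only mimics the `toJSON()` method; whether the real intermediary object serializes the same way is an assumption worth verifying.

```typescript
// Stand-in for an intermediary tokens object; it only mimics toJSON().
class FakeTokens {
  constructor(private readonly items: { text: string; pos: string }[]) {}

  // JSON.stringify looks for a method with this exact name
  // and serializes its return value instead of the object itself.
  toJSON(): { text: string; pos: string }[] {
    return this.items;
  }
}

// Hand-written sample, not actual tokenizer output.
const fakeTokens = new FakeTokens([{ text: "한국어", pos: "Noun" }]);

const viaStringify = JSON.stringify(fakeTokens);
const viaToJson = JSON.stringify(fakeTokens.toJSON());
// Both strings are identical because stringify delegates to toJSON().
```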

KoreanToken object

A JSON output object which contains:

  • text: string - token's text
  • stem: string - token's stem
  • pos: string - type of token. Possible entries are:
    • Word level POS: Noun, Verb, Adjective, Adverb, Determiner, Exclamation, Josa, Eomi, PreEomi, Conjunction, NounPrefix, VerbPrefix, Suffix, Unknown
    • Chunk level POS: Korean, Foreign, Number, KoreanParticle, Alpha, Punctuation, Hashtag, ScreenName, Email, URL, CashTag
    • Functional POS: Space, Others
  • offset: number - position from original string
  • length: number - length of text
  • isUnknown: boolean
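
The field list above maps directly onto a TypeScript interface. The sketch below is illustrative only: the interface is transcribed from the list rather than copied from the package's declarations, `textsWithPos` and `sourceSpan` are hypothetical helpers, and the sample tokens are hand-written rather than actual tokenizer output.

```typescript
// Transcribed from the field list above.
interface KoreanToken {
  text: string;
  stem: string;
  pos: string;
  offset: number;
  length: number;
  isUnknown: boolean;
}

// Hypothetical helper: texts of all tokens with the given POS tag.
function textsWithPos(tokens: KoreanToken[], pos: string): string[] {
  return tokens.filter((t) => t.pos === pos).map((t) => t.text);
}

// Hypothetical helper: the part of the source string a token covers,
// assuming offset and length are counted in string (UTF-16) units.
function sourceSpan(source: string, token: KoreanToken): string {
  return source.slice(token.offset, token.offset + token.length);
}

// Hand-written sample data, not actual tokenizer output.
const sourceText = "나는 밥을 먹었다";
const sampleTokens: KoreanToken[] = [
  { text: "나", stem: "나", pos: "Noun", offset: 0, length: 1, isUnknown: false },
  { text: "는", stem: "는", pos: "Josa", offset: 1, length: 1, isUnknown: false },
  { text: "밥", stem: "밥", pos: "Noun", offset: 3, length: 1, isUnknown: false },
  { text: "을", stem: "을", pos: "Josa", offset: 4, length: 1, isUnknown: false },
  { text: "먹었다", stem: "먹다", pos: "Verb", offset: 6, length: 3, isUnknown: false },
];

const nouns = textsWithPos(sampleTokens, "Noun");
const verbSpan = sourceSpan(sourceText, sampleTokens[4]);
```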

License

Apache-2.0

Collaborators

  • rokoroku