    annotatedtext-rehype

    1.0.5 • Public • Published

    A lightweight JavaScript library, based on annotatedtext and rehype-parse, for converting HTML documents into the annotated text format that LanguageTool consumes as AnnotatedText.

    Install

    This package is ESM only. Node 12+ is needed to use it, and it must be imported instead of required.

    npm:

    npm install annotatedtext-rehype

    Use

    build(text, options = defaults)

    Returns Annotated Text as described by LanguageTool's API:

    {
      "annotation": [
        { "text": "A " },
        { "markup": "<b>" },
        { "text": "test" },
        { "markup": "</b>" }
      ]
    }

    Run the object through JSON.stringify() to get a string suitable for passing to LanguageTool's data parameter.

    "use strict";
    
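    import * as builder from "annotatedtext-rehype";

    // text: the HTML document as a string (sample value for illustration)
    const text = "<p>A <b>test</b></p>";

    const annotatedtext = builder.build(text);
    const ltdata = JSON.stringify(annotatedtext);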

    • text: the HTML document, as a string, in its original form.
    • options: (optional) See defaults.
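
    The stringified annotation still has to be placed in a request. As a minimal sketch, the snippet below builds a form-encoded body for LanguageTool's /v2/check endpoint with the annotation in data instead of plain text. The language code and the hard-coded annotation are assumptions for illustration, and nothing is actually sent.

```javascript
// Sample annotation in the shape returned by build() (hard-coded here).
const annotatedtext = {
  annotation: [
    { text: "A " },
    { markup: "<b>" },
    { text: "test" },
    { markup: "</b>" },
  ],
};

// LanguageTool's /v2/check accepts form-encoded parameters; the
// annotation goes in `data` as a JSON string rather than in `text`.
const body = new URLSearchParams({
  language: "en-US", // assumed language code
  data: JSON.stringify(annotatedtext),
});

// e.g. fetch("https://api.languagetool.org/v2/check", { method: "POST", body })
console.log(body.toString());
```

    URLSearchParams handles the URL encoding of the JSON string, so the annotation survives the round trip unchanged.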

    defaults

    annotatedtext-rehype uses the following defaults throughout:

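    import * as annotatedtext from "annotatedtext";

    const defaults = {
      // Delegate child traversal to annotatedtext's default.
      children(node) {
        return annotatedtext.defaults.children(node);
      },
      // Delegate text-node extraction to annotatedtext's default.
      annotatetextnode(node) {
        return annotatedtext.defaults.annotatetextnode(node);
      },
      // Map a markup string to the newlines LanguageTool should see:
      // two per closing </p> or </hN>, one per <br> and per literal newline.
      interpretmarkup(text = "") {
        const countP = (text.match(/<\/p>/g) || []).length;
        const countH = (text.match(/<\/h\d+>/g) || []).length;
        const countBr = (text.match(/<br[\s\/]*>/g) || []).length;
        const countNl = (text.match(/\n/g) || []).length;
        return "\n".repeat(2 * countP + 2 * countH + countBr + countNl);
      },
      // Options passed through to rehype-parse.
      rehypeoptions: {
        emitParseErrors: false,
      },
    };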

    Any of these functions can be overridden by making a copy of defaults and assigning a new function to the corresponding property.
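
    For illustration, the pattern might look like this: spread the defaults into a new object and replace one property. baseDefaults below is a hypothetical local stand-in containing only interpretmarkup, so the snippet runs without the package; with the real library you would copy its defaults instead, however they are exposed. The replacement, which also treats closing </li> tags as line breaks, is likewise only an example.

```javascript
// Hypothetical stand-in for the library's defaults (interpretmarkup only).
const baseDefaults = {
  interpretmarkup(text = "") {
    const countP = (text.match(/<\/p>/g) || []).length;
    const countH = (text.match(/<\/h\d+>/g) || []).length;
    const countBr = (text.match(/<br[\s\/]*>/g) || []).length;
    const countNl = (text.match(/\n/g) || []).length;
    return "\n".repeat(2 * countP + 2 * countH + countBr + countNl);
  },
};

// Copy the defaults, then override a single function.
const options = {
  ...baseDefaults,
  interpretmarkup(text = "") {
    // Keep the default behaviour and add one newline per closing </li>.
    const countLi = (text.match(/<\/li>/g) || []).length;
    return baseDefaults.interpretmarkup(text) + "\n".repeat(countLi);
  },
};

// The copy differs; the original defaults are untouched.
console.log(JSON.stringify(options.interpretmarkup("</li>")));      // "\n"
console.log(JSON.stringify(baseDefaults.interpretmarkup("</li>"))); // ""
```

    The resulting options object is what would be passed as the options argument of build.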

    children(node)

    Expected to return an array of child nodes.
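
    As a hypothetical example, a children override could stop descent into <code> and <pre> elements. The sketch assumes rehype's hast node shape (type, tagName, children). As an assumption about annotatedtext's behaviour, text under nodes that are never visited would presumably end up as markup rather than checked text. skipTags and collectText are illustrative names, not part of the package.

```javascript
// Elements whose contents should not be checked (illustrative choice).
const skipTags = new Set(["code", "pre"]);

// Return the child nodes to traverse, or none for skipped elements.
function children(node) {
  if (node.type === "element" && skipTags.has(node.tagName)) {
    return [];
  }
  return node.children || [];
}

// Minimal hast-like tree for <p>Run <code>npm test</code> now</p>.
const tree = {
  type: "element",
  tagName: "p",
  children: [
    { type: "text", value: "Run " },
    {
      type: "element",
      tagName: "code",
      children: [{ type: "text", value: "npm test" }],
    },
    { type: "text", value: " now" },
  ],
};

// Collect text reachable through children(): the <code> text is skipped.
function collectText(node) {
  if (node.type === "text") return [node.value];
  return children(node).flatMap(collectText);
}

console.log(collectText(tree)); // [ 'Run ', ' now' ]
```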

    annotatetextnode(node)

    Expected to return a structure for a text AST node with at least the following:

    • text is the natural language text from the node, devoid of all markup.
    • offset contains the offsets used to extract markup text from the original document.
      • start is the offset where the text starts
      • end is the offset where the text ends

    {
      "text": "A snippet of the natural language text from the document.",
      "offset": {
        "start": 1,
        "end": 57
      }
    }

    If the node is not a text node, it must return null.
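
    A minimal sketch of a function meeting this contract for rehype's hast text nodes, which carry value and a unist position with start.offset and end.offset. It is not necessarily identical to annotatedtext's default, and the sample node is hypothetical.

```javascript
// Return { text, offset } for a text node, or null for anything else.
function annotatetextnode(node) {
  if (node.type !== "text") {
    return null;
  }
  return {
    text: node.value,
    offset: {
      start: node.position.start.offset,
      end: node.position.end.offset,
    },
  };
}

// Hypothetical hast text node for "test" inside "A <b>test</b>".
const original = "A <b>test</b>";
const node = {
  type: "text",
  value: "test",
  position: { start: { offset: 5 }, end: { offset: 9 } },
};

const result = annotatetextnode(node);
// The offsets index back into the original document.
console.log(original.slice(result.offset.start, result.offset.end)); // test
console.log(annotatetextnode({ type: "element", tagName: "b" }));      // null
```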

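    interpretmarkup(text)

    Receives a run of markup from the original document and returns the whitespace LanguageTool should interpret it as. With the defaults, each closing </p> or heading tag counts as two newlines and each <br> or literal newline as one, so LanguageTool sees paragraph and line breaks where the markup implies them.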
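
    To show where this fits, the sketch below pairs the default interpretmarkup logic with a simplified, hypothetical compose helper. compose slices the markup between text-node offsets out of the original document and attaches interpretAs (LanguageTool's field for how a piece of markup should be read) whenever the markup maps to whitespace. It approximates, rather than reproduces, what annotatedtext does internally.

```javascript
// Default interpretmarkup logic from the defaults above.
function interpretmarkup(text = "") {
  const countP = (text.match(/<\/p>/g) || []).length;
  const countH = (text.match(/<\/h\d+>/g) || []).length;
  const countBr = (text.match(/<br[\s\/]*>/g) || []).length;
  const countNl = (text.match(/\n/g) || []).length;
  return "\n".repeat(2 * countP + 2 * countH + countBr + countNl);
}

// Hypothetical helper: build an annotation from text-node offsets.
function compose(original, textNodes) {
  const annotation = [];
  let cursor = 0;
  const pushMarkup = (markup) => {
    if (!markup) return;
    const interpretAs = interpretmarkup(markup);
    annotation.push(interpretAs ? { markup, interpretAs } : { markup });
  };
  for (const { text, offset } of textNodes) {
    // Everything between the previous text node and this one is markup.
    pushMarkup(original.slice(cursor, offset.start));
    annotation.push({ text });
    cursor = offset.end;
  }
  pushMarkup(original.slice(cursor)); // trailing markup, if any
  return { annotation };
}

const original = "<h1>Title</h1><p>Body<br/>text</p>";
const nodes = [
  { text: "Title", offset: { start: 4, end: 9 } },
  { text: "Body", offset: { start: 17, end: 21 } },
  { text: "text", offset: { start: 26, end: 30 } },
];
console.log(JSON.stringify(compose(original, nodes), null, 2));
```

    Here "</h1><p>" and the closing "</p>" are read as two newlines and "<br/>" as one, while the opening "<h1>" carries no interpretAs.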

    License

    MIT


    Collaborators

    • davidlday