rp-paragraph-splitter

RP Paragraph Splitter

This is a component of the website that hosts my IRC RP logs. It takes a huge blob of text and attempts to cut it into paragraphs that make sense. It does not handle the rejoining of cut IRC messages, as that would couple it with my logs website's code.

It has no dependencies, because the sentence tokenizers available on npm did not handle all the odd formatting you'd find in the clash of writing styles that we call RP. The biggest problems are ellipses and periods that do not always end a sentence when they're inside quotation marks.

All of the input is expected to come from one perspective dialogue-wise, though the splitter will also split when it meets a character name as the first word of a sentence. This requires hooking it into a character getter (see settings under Reference).

The tag parameter is for anything you want to associate all posts with for the next step of your code. For example, I use the character name on my logs website to create separators when the character changes.

I put it here in the hopes that someone else might find it interesting or useful, and it's released under the permissive ISC license.

Goal

The goal is to improve the reading experience for people reading up on RP logs on my site. It doesn't have to be perfect, just good enough.

Rules

The paragraph can be split if any of the rules below is true. The numbers can be tuned with the settings object. When a split happens, the current sentence becomes the "topic" sentence of the next paragraph.

  1. Length is past 45 words, the current sentence is 7 words long, the first dialogue has been done, and it's not in the middle of dialogue.
  2. The last sentence is one complete quotation, and the current is 7 words long.
  3. The first word is a character name.
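The rules above can be sketched as a single predicate. This is only an illustration, not the package's actual code: the `shouldSplit` function, the `state` fields, and the reading of "7 words long" as "at least 7 words" are all assumptions made for the example.

```javascript
// Hypothetical sketch of the three split rules; names are illustrative only.
// The defaults mirror the numbers quoted in the rules.
const defaults = { paragraphLength: 45, topicLength: 7 };

// The callback counts as a match when it returns true or a non-null object.
function isSuccess(result) {
  return result === true || (typeof result === 'object' && result !== null);
}

// state:    { wordCount, firstDialogueDone, inDialogue, lastWasFullQuote }
// sentence: { length, firstWord }
function shouldSplit(state, sentence, characterCallback, settings = defaults) {
  // Assumption: "7 words long" is treated as "at least 7 words".
  const isTopic = sentence.length >= settings.topicLength;

  // Rule 1: long enough paragraph, topic-sized sentence, outside dialogue.
  const rule1 = state.wordCount > settings.paragraphLength
    && isTopic && state.firstDialogueDone && !state.inDialogue;

  // Rule 2: the previous sentence was one complete quotation.
  const rule2 = state.lastWasFullQuote && isTopic;

  // Rule 3: the first word (lowercased) is a known character name.
  const rule3 = isSuccess(characterCallback(sentence.firstWord.toLowerCase()));

  return rule1 || rule2 || rule3;
}
```

Keeping each rule in its own named boolean makes it easy to see which one triggered a split when tuning the numbers.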

Example

const rpps = require('rp-paragraph-splitter');

// The raw text to split, and a tag to attach to every resulting paragraph.
const text = `Paste a huge block of text here.`;
const tag = 'Character Name';
const paragraphs = rpps.Paragraph.split(text, tag);

// Print each paragraph followed by a blank line.
for (const paragraph of paragraphs) {
  console.log(`${paragraph.toString()}\n`);
}

Reference

All the objects below are children of the main module object.

Sentence

The sentence tokenizer. You don't have to touch this to use it, but here it is anyway.

Properties

  • string text: The entire sentence text.
  • char first: The first character.
  • char last: The last character.
  • string firstWord: The first word.
  • string lastWord: The last word.
  • int quoteCount: The number of quotation marks.
  • int length: The number of words.
  • bool dialogue: The sentence opened or closed dialogue; i.e. had an odd number of quotation marks.
  • bool fullDialogue: The sentence started and ended on a '"'.

Functions

  • Sentence(text): Creates a sentence with the text.
  • string .toString(): Returns the text property.
  • Sentence[] Sentence.split(text): Splits the text into multiple Sentences.
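To make the Sentence properties more concrete, here is a rough sketch of how they could be derived from a piece of text. It is not the package's tokenizer: `describeSentence` is a hypothetical helper, and details such as splitting words on whitespace and keeping punctuation attached to words are assumptions.

```javascript
// Hypothetical helper that derives the documented properties from one sentence.
function describeSentence(text) {
  const trimmed = text.trim();
  // Assumption: words are whitespace-separated and keep their punctuation.
  const words = trimmed.split(/\s+/).filter(Boolean);
  // Count straight double quotation marks only.
  const quoteCount = (trimmed.match(/"/g) || []).length;

  return {
    text: trimmed,
    first: trimmed.charAt(0),
    last: trimmed.charAt(trimmed.length - 1),
    firstWord: words[0] || '',
    lastWord: words[words.length - 1] || '',
    quoteCount,
    length: words.length,
    // An odd number of quotation marks means dialogue was opened or closed.
    dialogue: quoteCount % 2 === 1,
    // The sentence both starts and ends on a quotation mark.
    fullDialogue: trimmed.length > 1 && trimmed.startsWith('"') && trimmed.endsWith('"'),
  };
}

console.log(describeSentence('She said, "Wait').dialogue);
console.log(describeSentence('"Run!"').fullDialogue);
```

The two flags are what the split rules lean on: `dialogue` tracks whether the text is currently inside a quotation, and `fullDialogue` marks a sentence that is one complete quotation.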

Paragraph

Properties

  • Sentence[] sentences: The sentences in this paragraph.
  • string tag: Arbitrary tag.

Functions

  • Paragraph(sentences, tag): Creates a paragraph from the sentences.
  • string .toString(): Displays the paragraph content as text.
  • Paragraph[] Paragraph.split(sentences, tag): Groups the Sentences together into paragraphs.
  • Paragraph[] Paragraph.split(text, tag): Splits the text into Sentences and turns them into paragraphs.

settings

  • int paragraphLength: The minimum length for rule 1.
  • int topicLength: The topic sentence length for rules 1 and 2.
  • function characterCallback(string name): Called to look up a character. The rule treats true or any non-null object as a match. The name argument is the first word in lowercase.
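As a minimal sketch of a character getter, the callback below looks names up in an in-memory Map. The `characters` store and its contents are made up for the example, and the commented-out wiring assumes the settings object can be assigned to directly on the module, which is not confirmed here.

```javascript
// Hypothetical character store; a real site might query a database instead.
const characters = new Map([
  ['alice', { name: 'Alice' }],
  ['bob', { name: 'Bob' }],
]);

// Receives the first word of a sentence in lowercase.
// Returns the character object on a match, or null when there is none.
function characterCallback(name) {
  return characters.get(name) || null;
}

// Assumed wiring, shown as a comment because the exact access path is a guess:
// const rpps = require('rp-paragraph-splitter');
// rpps.settings.characterCallback = characterCallback;

console.log(characterCallback('alice'));
console.log(characterCallback('the'));
```

Returning the character object rather than plain true leaves room to reuse the lookup result elsewhere, while still counting as a match for the rule.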

Install

npm i rp-paragraph-splitter

Version

1.0.0

License

ISC

Collaborators

  • gisle