ocean-books

Text to Markdown Converter

A script that converts text files to Ocean Markdown (Ocean MD).

Requirements

Node.js must be installed on your machine.

Usage

There are several ways to use this script. Commands are shown in Linux format but should work with their Windows equivalents unless otherwise noted.

  • run node src/oceanconvert.js
  • chmod the src/oceanconvert.js file to be executable, then ./src/oceanconvert.js
  • run node src/oceanconvert.js -a, then run oceanconvert (only on systems with /usr/local/bin in the $PATH)

To convert a file, run oceanconvert [options] [file]. To convert all files in a folder, you can run oceanconvert [options] *, or for only text files, oceanconvert [options] *.txt. If you are doing this, you'll probably want the -o or -p "path" options.

Some other possible commands:

  • find . -type f ! -name '.*' -exec oceanconvert -oer {} \;
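The file selection done by the shell patterns above can also be sketched in Node. This is only an illustration: `selectConvertible` is a hypothetical helper, not part of oceanconvert, and it works on file names alone (it does not check that an entry is a regular file, as `find -type f` does).

```javascript
// Hypothetical helper (not part of oceanconvert): pick the file names
// that would be handed to the converter.
function selectConvertible(fileNames, { textOnly = false } = {}) {
  return fileNames.filter((name) => {
    const base = name.split(/[\\/]/).pop();            // last path segment
    if (base.startsWith('.')) return false;            // skip hidden files, like ! -name '.*'
    if (textOnly && !/\.txt$/i.test(base)) return false; // mimic the *.txt pattern
    return true;
  });
}
```

The resulting list could then be passed as the [file] arguments of oceanconvert.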

Ocean Markdown Metadata

Ocean Markdown files use basic YAML Front Matter (YFM) to hold metadata about each file. For the YFM to be valid in Ocean Markdown, the following conditions must hold:

  • YFM is preceded and followed by lines containing only ---
  • YFM occurs only at the very beginning of the file, i.e. --- is the very first line
  • YFM does not contain blank lines
  • Single values consist of a field name and value separated by a colon and space, e.g.: field: value
  • Multiple values consist of a field name, a colon, and then an indented list of values
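The structural conditions in this list (delimiters, position, no blank lines) can be checked with a few lines of code. The sketch below is an assumption about how such a check might look, not the logic used by oceanconvert.js; `extractFrontMatter` is a hypothetical name.

```javascript
// Hypothetical check of the structural YFM rules.
// Returns the lines between the delimiters, or null if the rules are broken.
function extractFrontMatter(text) {
  const lines = text.split(/\r?\n/);
  if (lines[0] !== '---') return null;             // --- must be the very first line
  const end = lines.indexOf('---', 1);             // closing delimiter line
  if (end === -1) return null;
  const body = lines.slice(1, end);
  if (body.some((line) => line.trim() === '')) return null; // no blank lines in YFM
  return body;
}
```

A file whose first line is blank, or whose metadata contains an empty line, yields null.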

Here is an example:

---
singleItem: This is a string.
multipleItem:
  - first
  - second
number: 0
boolean: true
---
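To illustrate how single and multiple values are laid out, here is a minimal reader for the lines between the two --- delimiters. It is a sketch under simplifying assumptions, not a YAML parser: `parseFrontMatterLines` is a hypothetical name, all values are kept as strings (no number or boolean conversion), and quoting is not handled.

```javascript
// Hypothetical reader for YFM lines (the lines between the --- delimiters).
function parseFrontMatterLines(lines) {
  const data = {};
  let listField = null;                       // field that may still receive list items
  for (const line of lines) {
    const item = line.match(/^\s+-\s+(.*)$/); // indented "- value"
    if (item && listField !== null) {
      if (!Array.isArray(data[listField])) data[listField] = [];
      data[listField].push(item[1]);
      continue;
    }
    const field = line.match(/^([A-Za-z_]\w*):(?:\s+(.*))?$/); // "field: value" or "field:"
    if (!field) throw new Error(`Unrecognized line: ${line}`);
    const [, name, value] = field;
    if (value === undefined || value === '') {
      data[name] = '';                        // empty for now; becomes a list if items follow
      listField = name;
    } else {
      data[name] = value;
      listField = null;
    }
  }
  return data;
}

const example = parseFrontMatterLines([
  'singleItem: This is a string.',
  'multipleItem:',
  '  - first',
  '  - second',
  'number: 0',
  'boolean: true',
]);
console.log(example.multipleItem.length);
```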

The following fields are recognized in all Ocean Markdown files:

| Field         | Type      | Req | Description |
| ------------- | --------- | --- | ----------- |
| id            | string    | *   | a unique string that identifies the document (auto-generated) |
| access        | enum      | yes | either "research" or "encumbered" (for documents that cannot be scrolled) |
| author        | str/array | yes | the author of the document |
| language      | string    | yes | the two-character language code of the document |
| priority      | int, 5-10 | yes | how important it is (1 = most, 10 = least) |
| title         | string    | yes | the title of the document, in the language of the document |
| titleShort    | string    | *   | a short title that won't break mobile design (required for long titles - TODO: define) |
| ocnmd_version | number    | yes | the version number of the Ocean Markdown spec used in the file, currently 1 |
| sourceUrl     | string    | *   | a link to the content, for display in search results (required for scraped content) |
| wordsCount    | int       | *   | word count of the document (auto-generated) |
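The fields marked "yes" above lend themselves to a simple automated check. The following sketch is hypothetical (`checkRequiredFields` is not part of the package). Because the table gives both a "5-10" type and a 1-to-10 scale for priority, the sketch assumes only that priority is an integer from 1 to 10.

```javascript
// Hypothetical validation of the fields marked "yes" in the table above.
const REQUIRED_FIELDS = ['access', 'author', 'language', 'priority', 'title', 'ocnmd_version'];
const ACCESS_VALUES = ['research', 'encumbered'];

function isEmpty(value) {
  return value === undefined || value === null || value === '';
}

function checkRequiredFields(meta) {
  const problems = [];
  for (const field of REQUIRED_FIELDS) {
    if (isEmpty(meta[field])) problems.push(`missing required field: ${field}`);
  }
  if (!isEmpty(meta.access) && !ACCESS_VALUES.includes(meta.access)) {
    problems.push('access must be "research" or "encumbered"');
  }
  if (!isEmpty(meta.language) && !/^[a-z]{2}$/i.test(meta.language)) {
    problems.push('language must be a two-character code');
  }
  if (!isEmpty(meta.priority)) {
    const priority = Number(meta.priority);
    // Assumption: integer scale from 1 (most important) to 10 (least).
    if (!Number.isInteger(priority) || priority < 1 || priority > 10) {
      problems.push('priority must be an integer from 1 to 10');
    }
  }
  return problems; // an empty array means no problems were found
}
```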

Extended info:

| Field              | Type      | Req | Description |
| ------------------ | --------- | --- | ----------- |
| category           | enum      |     | the religion to which the content relates (@TODO: get category names) |
| coverUrl           | string    |     | url linking to the representative image |
| documentType       | enum      |     | a document type (@TODO: define document types) |
| editor             | str/array |     | who edited the document |
| needsEditing       | boolean   |     | if the text quality is bad, e.g. from OCR, mark this as true |
| publicationName    | string    |     | the name of the publication in which this document appeared |
| publicationEdition | string    |     | the edition of a book |
| year               | int       |     | the year that the document was written |

Primary texts and authors:

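| Field      | Type   | Req | Description |
| ---------- | ------ | --- | ----------- |
| authorAbrv | string |     | abbreviated author name, only for central figures |
| titleAbrv  | string |     | title abbreviation, e.g. GWB for Gleanings from the Writings of Baha'u'llah |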

Collection information:

| Field              | Type   | Req | Description |
| ------------------ | ------ | --- | ----------- |
| collectionTitle    | string | *   | the title for the collection (required for items in a collection) |
| collectionId       | string | *   | a unique id for the collection, comprising the collectionTitle lowercased and dashed |
| collectionCoverUrl | string |     | url linking to the image for the collection |
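The rule for collectionId ("the collectionTitle lowercased and dashed") could be implemented along these lines. The function name `toCollectionId` is hypothetical, and the treatment of punctuation (every run of characters that are not letters or digits becomes a single dash) is an assumption, since the rule above does not spell it out.

```javascript
// Hypothetical derivation of collectionId from collectionTitle.
function toCollectionId(collectionTitle) {
  return collectionTitle
    .trim()
    .toLowerCase()
    .replace(/[^\p{L}\p{N}]+/gu, '-') // runs of non-letter, non-digit characters become one dash
    .replace(/^-+|-+$/g, '');          // no leading or trailing dash
}

console.log(toCollectionId('Gleanings from the Writings')); // gleanings-from-the-writings
```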

Language info:

| Field          | Type      | Req | Description |
| -------------- | --------- | --- | ----------- |
| titleEn        | string    | *   | the title of the work, in English (required for books in other languages) |
| originalLang   | string    |     | the original language from which a translation was made |
| searchLang     | array     |     | an array of language codes to search for the document |
| translationRef | string    |     | a string that is consistent across translations of a single document |
| translator     | str/array |     | who translated the document |

Audio:

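| Field    | Type      | Req | Description |
| -------- | --------- | --- | ----------- |
| audio    | boolean   |     | whether the item has audio |
| audioUrl | str/array |     | url(s) linking to the audio file(s) |
| narrator | str/array |     | the narrator for the audio file |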

Conversion info:

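| Field           | Type   | Req | Description |
| --------------- | ------ | --- | ----------- |
| _conversionOpts | object | *   | the settings used when converting the document (see oceanconvert.js) |
| _convertedFrom  | string | *   | the file path or url from which the document was converted (see oceanconvert.js) |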

Below is a basic YFM template for new files that are being created by hand. It must go at the very beginning of the file.

---
author: 
title: 
language: en
sourceUrl: 
publicationName: 
year: 
translator: 
---
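A template like this can also be generated from a plain object. The sketch below is an illustration only: `toFrontMatter` is a hypothetical helper, it writes arrays as indented lists and everything else as `field: value`, and it performs no quoting or escaping of values.

```javascript
// Hypothetical serializer: turn a metadata object into a YFM block.
function toFrontMatter(meta) {
  const lines = ['---'];
  for (const [field, value] of Object.entries(meta)) {
    if (Array.isArray(value)) {
      lines.push(`${field}:`);                       // multiple values: indented list
      for (const item of value) lines.push(`  - ${item}`);
    } else {
      lines.push(`${field}: ${value}`);              // single value: "field: value"
    }
  }
  lines.push('---');
  return lines.join('\n') + '\n';
}

console.log(toFrontMatter({ author: 'Example Author', language: 'en', translator: ['First', 'Second'] }));
```

The returned text has to be placed at the very beginning of the file, ahead of the document body.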

The full list of fields is as follows:

---
author: 
title: 
titleShort: 
access: 
language: en
priority: 9
ocnmd_version: 1
sourceUrl: 
category: 
coverUrl: 
documentType: 
editor: 
publicationName: 
publicationEdition: 
year: 
authorAbrv: 
titleAbrv: 
collectionTitle: 
collectionId: 
collectionCoverUrl: 
titleEn: 
originalLang: 
searchLang: 
translationRef: 
translator: 
audio: 
audioUrl: 
narrator: 
---

Ocean Markdown Cheat Sheet

Basic elements (Markdown source, followed by how it displays)
Headers
# Header 1

Header 1

## Header 2

Header 2

### Header 3

Header 3

#### Header 4

Header 4

##### Header 5
Header 5
###### Header 6
Header 6
Emphasis
_italic text_ italic text
**bold text** bold text
~~strikethrough~~ strikethrough
Links
[Link text](https://example.com) Link text
Blockquotes
> Blockquote text
Blockquote text
>> Nested blockquote text
nested blockquote
Text with block attribute.{.blockquote}
Text with block attribute.
Horizontal Rules
*** or --- or ___
-------- or ========
Lists
* Bulleted item
- Bulleted item
+ Bulleted item
  • Bulleted item
  • Bulleted item
  • Bulleted item
1. Numbered item
2. Numbered item
  1. Numbered item
  2. Numbered item
1. Numbered item
··* nested item
··* nested item
2. Numbered item
  1. Numbered item
    • nested item
    • nested item
  2. Numbered item
Fixed Width Text
```Fixed width``` Fixed width
```
Fixed width
```
Fixed width
Tables
| Col 1   | Col 2        | Col 3   |
| ------- | :----------: | ------: |
| left    |   centered   |   right |
(a table)
Footnotes
Footnote references[^1] in sentence.[^2]
[^1]: Footnote text
Page numbers
[pg 1] [pg 1]
Block attributes
Block attributes follow the markdown-it-attrs rules. {.class #id id="" or attr=""}
This paragraph will
have a dropcap.{.dropcap}

This paragraph will have a dropcap.

This paragraph will
be centered.{.center}
This paragraph will be centered.
This paragraph will
be right aligned.{.right}

This paragraph will be right aligned.

This is

   a verse

     of poetry{.verse}
This is

   a verse

     of poetry.
This is
a list
with linebreaks.{.list}

This is
a list
with linebreaks.

This is
a numbered block with dropcap. {#1.5}

1.5 This is a numbered paragraph.

This is
a numbered block as an attribute with dropcap. {id="1.5" .dropcap}

1.5 This is a numbered paragraph.

Other classes:

  • .ed: editor
  • .sig: signature line, e.g. on letters
  • .sit: exhortation or sitilcent
  • .noid: no paragraph number
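To make the trailing block attribute syntax concrete, here is a simplified sketch that separates a paragraph from its `{...}` attributes. It is not the markdown-it-attrs implementation: `splitBlockAttributes` is a hypothetical name, and only `.class`, `#id` and `id="..."` tokens without spaces inside the quotes are recognized.

```javascript
// Hypothetical, simplified reader for a trailing block attribute such as {.dropcap}.
function splitBlockAttributes(paragraph) {
  const match = paragraph.match(/\s*\{([^{}]*)\}\s*$/); // {...} at the very end
  if (!match) return { text: paragraph, classes: [], id: null };
  const classes = [];
  let id = null;
  for (const token of match[1].trim().split(/\s+/)) {
    if (token.startsWith('.')) {
      classes.push(token.slice(1));          // .class
    } else if (token.startsWith('#')) {
      id = token.slice(1);                   // #id
    } else {
      const attr = token.match(/^id="([^"]*)"$/);
      if (attr) id = attr[1];                // id="..."
    }
  }
  return { text: paragraph.slice(0, match.index), classes, id };
}

console.log(splitBlockAttributes('This paragraph will\nhave a dropcap.{.dropcap}'));
```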

Install

npm i ocean-books

Version

1.0.0

License

ISC

Unpacked Size

70 kB

Total Files

14

Collaborators

  • chadananda