mw-ocg-texter

0.3.3 • Public • Published

Converts MediaWiki collection bundles (as generated by mw-ocg-bundler) to stripped plaintext.

This is a proof of concept, but it could be used to archive or embed the textual content of Wikipedia in a minimal amount of space.

Installation

Node versions 0.8 and 0.10 are tested to work.

Install the node package dependencies.

npm install

Install other system dependencies.

apt-get install unzip
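
Before running the converter, it can help to confirm that the tools named above are actually on the PATH. The following is a minimal sketch, not part of mw-ocg-texter; the helper name have_cmd is hypothetical, and the list of tools (node, npm, unzip) is taken from the installation steps above.

```shell
# Hypothetical helper: succeed if the named command is found on PATH.
have_cmd() {
  command -v "$1" >/dev/null 2>&1
}

# Report any prerequisite from the installation steps that is missing.
for tool in node npm unzip; do
  have_cmd "$tool" || echo "missing: $tool" >&2
done
```

Nothing is printed when all three tools are present.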

Generating bundles

You may wish to install the mw-ocg-bundler npm package to create bundles from Wikipedia articles. The text below assumes that you have done so; ignore the mw-ocg-bundler references if you have bundles from some other source.

Running

To generate a plaintext file named out.txt from the en.wikipedia.org article "United States":

$SOMEPATH/bin/mw-ocg-bundler -v -o us.zip -h en.wikipedia.org "United States"
bin/mw-ocg-texter -o out.txt us.zip

In the above command, $SOMEPATH is the place you installed mw-ocg-bundler; if you've used the directory structure recommended by mw-ocg-service, this will be ../mw-ocg-bundler.

The default format does 80-column word wrap. If you would like to use "semantic" newlines (that is, newlines end paragraphs and there are no newlines within paragraphs), use the --no-wrap option:

bin/mw-ocg-texter --no-wrap -o out.txt us.zip
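
To illustrate the difference between the two layouts, the sketch below takes one long line (standing in for a paragraph of --no-wrap output) and re-wraps it to 80 columns with the standard fold utility. This is only an approximation for illustration: it is not how mw-ocg-texter wraps text internally, the helper name wrap80 and the sample sentence are made up, and fold -s leaves a trailing space at each break.

```shell
# Hypothetical helper: wrap text from stdin to at most 80 columns,
# breaking at spaces where possible.
wrap80() {
  fold -s -w 80
}

# Sample text standing in for one paragraph of --no-wrap output:
# a single long line with no internal newlines.
paragraph='With semantic newlines, each paragraph is stored as one long line, which suits tools that process text paragraph by paragraph, such as diff or grep.'

printf '%s\n' "$paragraph" | wrap80
```

Keeping the --no-wrap form as the archived copy and wrapping only for display means the line width can be changed later without re-running the conversion.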

For other options, see:

bin/mw-ocg-texter --help

Standalone mode

To convert a single article without the bundle creation step, use:

bin/mw-ocg-texter -h en.wikipedia.org -t "United States"

The -h option specifies the hostname of the wiki, and the -t option gives the title to convert. The content will be fetched from the Wikimedia REST API and converted, with output written to standard output (unless the -o option is given).
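
Because standalone mode writes to standard output by default, it can be wrapped in a small shell function and combined with other tools. The sketch below is one possible wrapper, using only the -h, -t, and -o options described above; the function name fetch_article and the TEXTER variable are hypothetical, and the default path assumes the command is run from the mw-ocg-texter directory.

```shell
# Path to the converter; assumed to be run from the mw-ocg-texter checkout.
TEXTER=${TEXTER:-bin/mw-ocg-texter}

# Hypothetical helper: convert one article in standalone mode.
#   $1 = wiki hostname, $2 = article title, $3 = optional output file.
# Without $3, the text goes to standard output.
fetch_article() {
  if [ -n "${3:-}" ]; then
    "$TEXTER" -h "$1" -t "$2" -o "$3"
  else
    "$TEXTER" -h "$1" -t "$2"
  fi
}

# Example usage (requires network access):
#   fetch_article en.wikipedia.org "United States" | wc -w
#   fetch_article en.wikipedia.org "United States" out.txt
```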

Other ideas

This backend should implement the Unicode Nearly Plain-Text Encoding of Mathematics to render math content.

Related Projects

License

GPLv2

(c) 2013-2014 by C. Scott Ananian
