Converts mediawiki collection bundles (as generated by mw-ocg-bundler) to plaintext
Converts mediawiki collection bundles (as generated by mw-ocg-bundler) to stripped plaintext.
This is a proof of concept, but it could be used to archive or embed the textual content of Wikipedia in a minimal amount of space.
Node versions 0.8 and 0.10 are tested to work.
Install the node package dependencies.
npm install
Install other system dependencies.
apt-get install unzip
You may wish to install the mw-ocg-bundler npm package to create bundles
from Wikipedia articles. The text below assumes that you have done
so; ignore the mw-ocg-bundler references if you have bundles from
some other source.
To generate a plaintext file named out.txt from the English
(enwiki) Wikipedia article "United States":
mw-ocg-bundler -o us.zip --prefix enwiki "United States"
bin/mw-ocg-texter -o out.txt us.zip
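If you need to convert several articles, the bundle and convert steps above can be wrapped in a small shell function. The following is only a sketch: the function name convert_article, the RUN variable, and the underscore-based file naming are illustrative assumptions, not part of mw-ocg-texter. It uses only the options shown above. By default it prints the commands instead of running them; set RUN to an empty string to execute them.

```shell
# RUN defaults to "echo", so commands are only printed (dry run).
# Set RUN= (empty) in the environment to actually execute them.
RUN=${RUN-echo}

# convert_article is a hypothetical helper: bundle one enwiki article,
# then convert the bundle to plaintext.
convert_article() {
  title=$1
  # Derive file names from the title, e.g. "United States" -> United_States
  slug=$(printf '%s' "$title" | tr ' ' '_')
  $RUN mw-ocg-bundler -o "$slug.zip" --prefix enwiki "$title"
  $RUN bin/mw-ocg-texter -o "$slug.txt" "$slug.zip"
}

for article in "United States" "Canada"; do
  convert_article "$article"
done
```

Review the printed commands first, then rerun with RUN= once the output looks right.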
The default format does 80-column word wrap. If you would like to
use "semantic" newlines (that is, newlines end paragraphs and there
are no newlines within paragraphs), use the --no-wrap option:
bin/mw-ocg-texter --no-wrap -o out.txt us.zip
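To see the difference between the two layouts without building a bundle, the sketch below runs the standard fold utility on a made-up one-line paragraph. This only approximates the idea; it is not mw-ocg-texter's own wrapping code, which may break lines differently. The sample text and variable names are illustrative.

```shell
# Hypothetical sample: one paragraph on a single line ("semantic" newline style).
paragraph="The United States of America is a federal republic consisting of fifty states and a federal district, and this sample sentence is long enough to need wrapping."

# Semantic style: the whole paragraph occupies exactly one line.
semantic_lines=$(printf '%s\n' "$paragraph" | wc -l | tr -d ' ')

# Wrapped style: break at spaces so that no line exceeds 80 columns.
wrapped=$(printf '%s\n' "$paragraph" | fold -s -w 80)
wrapped_lines=$(printf '%s\n' "$wrapped" | wc -l | tr -d ' ')
longest=$(printf '%s\n' "$wrapped" | awk '{ if (length($0) > max) max = length($0) } END { print max }')

echo "semantic: $semantic_lines line(s); wrapped: $wrapped_lines line(s), longest $longest columns"
```

One reason to prefer the unwrapped form is that it can be rewrapped later to any width with a tool such as fold, whereas the 80-column form has already lost the distinction between line breaks and paragraph breaks.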
For other options, see:
This backend should implement the Unicode Nearly Plain-Text Encoding of Mathematics to render math content.
(c) 2013-2014 by C. Scott Ananian