wikipedia-edit-stream

0.1.0 • Public • Published

Listen for page edit notifications from Wikipedia IRC and push them into a MongoDB collection to use as a test dataset.

Every change to a Wikimedia Foundation installation of MediaWiki publishes a message to a corresponding IRC channel, named after the language and project, e.g. en.wikipedia, fr.wikipedia, en.wikibooks, etc.
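
As a rough illustration of that naming scheme, the channel for a given language and project could be derived as below. This is only a sketch: `channelFor` is a hypothetical helper, not part of this package's documented API, and the leading `#` is assumed from the usual IRC channel convention.

```javascript
// Hypothetical helper (not from this package): build the IRC channel name
// from a language code and a project id.
// Assumption: channels use the conventional "#" prefix.
function channelFor(language, project) {
  return `#${language}.${project}`;
}

console.log(channelFor("en", "wikipedia")); // prints "#en.wikipedia"
console.log(channelFor("en", "wikibooks")); // prints "#en.wikibooks"
```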

To test various pieces of the MongoDB ecosystem, we need datasets of all shapes and sizes. The freely available, high-volume change data from Wikimedia is extremely useful for this: we can deploy new releases and configurations of MongoDB and put them under real-world pressure instead of relying on synthetic micro-benchmarks or machine-generated datasets.

Configuration

The following customizations are available by setting environment variables.

MONGODB_URL MongoDB deployment to persist changes to, e.g. mongodb://username:password@hostname:port/db.

MONGODB_COLLECTION Collection to populate [Default: edits].

LANGUAGE Two-letter language code of the Wikimedia project [Default: en].

PROJECT The Wikimedia project id to listen to [Default: wikipedia].
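
A minimal sketch of how these variables and their defaults might be resolved in Node.js follows. `resolveConfig` and the returned property names are hypothetical and not taken from the package source. No default is documented for MONGODB_URL, so the sketch leaves it undefined when unset.

```javascript
// Hypothetical sketch: read the documented environment variables and
// apply the documented defaults. Not the package's actual implementation.
function resolveConfig(env = process.env) {
  return {
    // No default is documented for MONGODB_URL, so none is assumed here.
    mongodbUrl: env.MONGODB_URL,
    collection: env.MONGODB_COLLECTION || "edits",
    language: env.LANGUAGE || "en",
    project: env.PROJECT || "wikipedia",
  };
}

// Passing an explicit object keeps the example independent of the shell.
const config = resolveConfig({ MONGODB_URL: "mongodb://localhost:27018/wikipedia" });
console.log(config.collection); // prints "edits"
```

Taking the environment as a parameter rather than reading `process.env` directly makes the defaults easy to check without touching the real environment.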

Deploy Your Own

CLI

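# Install the CLI plus helpers for running a local MongoDB and setting env vars.
npm i -g wikipedia-edit-stream mongodb-runner cross-env
# Start a local MongoDB deployment named "wikipedia" on port 27018.
mongodb-runner start --name=wikipedia --port=27018
# Stream edits into the "wikipedia" database on that deployment.
cross-env MONGODB_URL=mongodb://localhost:27018/wikipedia wikipedia-edit-stream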

License

Apache 2.0

Install

npm i wikipedia-edit-stream

Version

0.1.0

Collaborators

  • imlucas