wikipedia-dump-stream-parser

A fast, pure-JavaScript streaming parser for Wikipedia XML dumps.

import {createWikiPageStream} from "wikipedia-dump-stream-parser";

// Parse a dump piped to stdin; each 'data' event delivers one page.
createWikiPageStream(process.stdin).on('data', (page) => {
    // page.id, page.title, page.content
}).on('end', () => {
    process.exit(0);
});

// Variant: skip the first 100 pages.
// Use this in place of the call above rather than alongside it.
createWikiPageStream(process.stdin, 100).on('data', (page) => {
    // page.id, page.title, page.content
}).on('end', () => {
    process.exit(0);
});
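
The usage above only shows where the page objects arrive. As a rough sketch of what a 'data' handler might do with them, the following tallies pages and tracks the page with the longest content. The helper name summarizePages and the stand-in emitter are hypothetical and not part of this package; the sketch assumes only what the usage above shows, namely that the stream fires 'data' with objects carrying id, title and content, followed by 'end'.

```javascript
const { EventEmitter } = require('events');

// Hypothetical helper (not part of wikipedia-dump-stream-parser): attaches
// 'data' and 'end' listeners to any emitter that delivers page objects
// shaped like {id, title, content} and reports a summary when it ends.
function summarizePages(pageStream, onDone) {
    const summary = { pages: 0, totalChars: 0, longestTitle: null, longestLength: -1 };
    pageStream.on('data', (page) => {
        const length = (page.content || '').length;
        summary.pages += 1;
        summary.totalChars += length;
        if (length > summary.longestLength) {
            summary.longestLength = length;
            summary.longestTitle = page.title;
        }
    });
    pageStream.on('end', () => onDone(summary));
    return pageStream;
}

// Stand-in emitter so the sketch runs without a dump file. In real use the
// first argument would be createWikiPageStream(process.stdin).
const fakePages = new EventEmitter();
let result = null;
summarizePages(fakePages, (summary) => { result = summary; });

fakePages.emit('data', { id: 1, title: 'Alpha', content: 'abc' });
fakePages.emit('data', { id: 2, title: 'Beta', content: 'hello world' });
fakePages.emit('end');

console.log(result);
```

Keeping the handler free of any reference to the parser itself means it can be exercised with a few hand-made page objects before being pointed at a full dump.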

Install

npm i wikipedia-dump-stream-parser

Version

1.0.4

License

ISC

Collaborators

  • mhzed