
    wiki-infobox

    0.4.0 • Public • Published

    WikiInfobox by @michalbe

    Simple Wikipedia infobox scraper.

    What?

    WikiInfobox is a simple Wikipedia infobox scraper. What is an infobox? According to Wikipedia itself, an infobox template is a panel, usually in the top right of an article next to the lead section (in the desktop view) or at the very top of an article (in the mobile view), that summarizes key features of the page's subject. Infoboxes may also include an image and/or a map.

    Why?

    • Question: Why do we need the WikiInfobox library? Wikipedia has its own API!
    • Answer: Of course, but the API answers only with the full content of the requested page, including all the wiki-specific markup and without any formatting. It's really painful to get useful information out of that. Wiki-infobox parses the Wikipedia API's response and serves a magnificent JSON object!

    How to use:

    npm install wiki-infobox
    

    then:

    var infobox = require('wiki-infobox');
     
    var page = 'Warsaw Metro';
    var language = 'en';
     
    infobox(page, language, function(err, data){
      if (err) {
        // Something went wrong (for example, the page is not a proper
        // Wikipedia page with data).
        console.error(err);
        return;
      }
     
      console.log(data);
      // {
      //   box_length: '275px',
      //   name:
      //     {
      //       type: 'text',
      //       value: 'Warsaw Metro<br>\'\'Metro Warszawskie\'\''
      //     },
      //   owner:
      //    {
      //      type: 'text',
      //      value: 'City of Warsaw'
      //    },
      //   locale:
      //    [ { type: 'link',
      //        text: 'Warsaw',
      //        url: 'http://en.wikipedia.org/wiki/Warsaw' },
      //      { type: 'link',
      //        text: 'Poland',
      //        url: 'http://en.wikipedia.org/wiki/Poland' } ],
      //   transit_type:
      //    { type: 'link',
      //      text: 'Rapid transit',
      //      url: 'http://en.wikipedia.org/wiki/Rapid transit' },
      //   lines: '1<ref name',
      //   stations: '21<ref name',
      //   ridership: '568,000 <small>(2012; ave. weekday)</small><ref name',
      //   annual_ridership: '139.17 million <small>(2012)</small><ref name',
      //   website: '{{url|www.metro.waw.pl|Metro Warszawskie}}',
      //   began_operation: '1995',
      //   operator: 'Metro Warszawskie',
      //   marks: '',
      //   vehicles: '',
      //   system_length: '{{convert|22.7|km|mi|1|abbr',
      //   track_gauge:
      //    { type: 'link',
      //      text: 'standard gauge',
      //      url: 'http://en.wikipedia.org/wiki/standard gauge' },
      //   map:
      //    { type: 'image',
      //      text: 'frameless',
      //      url: 'http://en.wikipedia.org/wiki/File:Metro w Warszawie 1 linia.svg' },
      //   map_name: '',
      //   map_state: '}}'
      // }
    });
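
    The result above mixes several field shapes: plain strings, objects with a type of text, link, or image, and arrays of such objects. A minimal sketch of how a caller might flatten any of these to plain text follows. The fieldToText helper is hypothetical and not part of wiki-infobox; it only assumes the shapes visible in the sample output.

```javascript
// Hypothetical helper (not part of wiki-infobox): flatten one parsed
// infobox field to plain text, based on the shapes in the sample output.
function fieldToText(field) {
  if (Array.isArray(field)) {
    // Fields with several values come back as an array of objects.
    return field.map(fieldToText).join(', ');
  }
  if (typeof field === 'string') {
    // Some fields are still returned as raw strings.
    return field;
  }
  if (field && typeof field === 'object') {
    // 'text' entries carry `value`; 'link' and 'image' entries carry `text`.
    return (field.type === 'text' ? field.value : field.text) || '';
  }
  return '';
}

// Sample data copied from the output shown above.
var sample = {
  owner: { type: 'text', value: 'City of Warsaw' },
  locale: [
    { type: 'link', text: 'Warsaw', url: 'http://en.wikipedia.org/wiki/Warsaw' },
    { type: 'link', text: 'Poland', url: 'http://en.wikipedia.org/wiki/Poland' }
  ],
  began_operation: '1995'
};

console.log(fieldToText(sample.owner));           // City of Warsaw
console.log(fieldToText(sample.locale));          // Warsaw, Poland
console.log(fieldToText(sample.began_operation)); // 1995
```

    The same helper can be called on every key of the data object inside the callback to build a simple key/value table.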

    What's new?

    • 23 Nov 2014 v.0.4.0

      • Return an error when the page is not a proper Wikipedia page with data
      • When the requested wiki page is a redirect, return the results from the final page
      • Support proper character encoding in the page title (so pages with special characters now work without any problems)
    • 18 Nov 2014 v.0.3.1

      • Support for multiple types of data in the same field, like text & links & images, etc.
      • Simple text is now returned as an object with two fields: type, equal to text, and value, equal to the value of the text node

    To Do

    Support for:

    • external links (like {{url|www.metro.waw.pl|Metro Warszawskie}})
    • templates (like {{flag|Poland}})
    • comments
    • tidying up HTML code inside fields
    • expressions (like {{ 3434 + 19817934 + 213123 }})
    • a lot of different things
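
    Until external links are supported, fields such as website come back as raw template strings. A minimal sketch of how a caller could post-process the {{url|...}} form is shown here. The parseUrlTemplate function is hypothetical and not part of wiki-infobox, and prefixing http:// when no scheme is present is an assumption made for this example.

```javascript
// Hypothetical post-processing helper (not part of wiki-infobox):
// convert a raw '{{url|address|label}}' string into a link-like object.
function parseUrlTemplate(raw) {
  var match = /^\{\{\s*url\s*\|([^|}]+)(?:\|([^}]*))?\}\}$/i.exec(String(raw).trim());
  if (!match) {
    // Not a {{url|...}} template; let the caller keep the raw value.
    return null;
  }
  var address = match[1].trim();
  // Assumption: treat scheme-less addresses as plain http.
  var url = /^[a-z][a-z0-9+.-]*:\/\//i.test(address) ? address : 'http://' + address;
  var label = match[2] ? match[2].trim() : '';
  return { type: 'link', text: label || address, url: url };
}

console.log(parseUrlTemplate('{{url|www.metro.waw.pl|Metro Warszawskie}}'));
// { type: 'link', text: 'Metro Warszawskie', url: 'http://www.metro.waw.pl' }
```

    Other templates, such as {{flag|Poland}}, do not match and yield null, so the original string can be kept unchanged.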

    Contributing

    Interested in contributing to wiki-infobox? Read this first


    License

    none
