Nihilist Postmodern Mistake
    Share your code. npm Orgs help your team discover, share, and reuse code. Create a free org »

    awesome-querypublic

    Awesome Query

    Simple crawler that filters URL's based on their content. Uses concurrent PhantomJS instances to grab content. Built for awesome-lists.

    Quick Setup

    sudo npm i -g awesome-query

    Then restart your terminal[1].

    To get a list of conferences in 2015:

    awesome-query 2015 ~/awesome-lists/conferences.txt -v -o stdout

    Run awesome-query with no arguments for help.

    Usage

    You can also use awesome-query inside node.

    Pass the URL's as an array.

    var conferences = [
      'http://beyondtellerrand.com',
      'http://greatbritishnodeconf.co.uk',
      'http://coldfrontconf.com',
      'http://cssconf.eu'
    ];

    Pass the options as an object.

    var options = {
      'verbose': false,
      'rejectFailures': false,
      'urlsOnly': false,
      'timeout': 20000,
      'attempts': 3,
      'parallelization': 4
    };

    The selector should be either a RegExp object

    awesomeQuery(/london/, conferences, options).then(function (data) {
      console.log(data);
    });

    Or a function, it should return either a promise (any object with a then function), or anything else (that evaluates to true or false).

    var Levenshtein = require('levenshtein');
     
    var f = function (page) {
      return page.text.match(/\w+/g).some(function (word) {
        return new Levenshtein(word, 'reactive').distance < 4;
      });
    };
     
    awesomeQuery(f, conferences, options).then(function (data) {
      console.log(data);
    });

    Sample page object:

    {
      "html": "<head><meta name='author' content='Prabdeep'></head> <body><h1>Title</h1></body>",
      "text": "Title",
      "meta": [{"name": "author", "content": "Prabdeep"}],
      "title": ""
    }

    Credit

    Building this was made significantly easier by phridge and promise-queue

    To Do / Contributing

    Feel free to submit a pull request. These things need doing:

    • Anonomously collect usage stats
    • Gracefully handle a PhantomJS crash
    • Documentation
    • Tests

    Notes

    [1] - NPM doesn't support passing arguments to commands very well. To work-around this awesome-query automatically adds an alias to bashrc.

    Reloading bashrc requires restarting your terminal. This may not work on windows-based machines (tested on Ubuntu). You can substitute the awesome-query command with node PATH/TO/AWESOME/QUERY/src/bin/run.js.

    The command will break if you move the relevant files and persist if you remove them.

    License

    Copyright (c) 2015, Ashton War me@ashtonwar.com

    ISC

    install

    npm i awesome-query

    Downloadsweekly downloads

    21

    version

    0.1.6

    license

    ISC

    repository

    githubgithub

    last publish

    collaborators

    • avatar