
    twitter-harvest

    A simple continuous harvester for twitter

    This application captures tweets posted anywhere in the world. It currently works only with version 1.1 of the Twitter streaming API.

    • You have to define or modify cfg/cfg.json and create at least one capture agent in the cfg/agents/ directory (with enable set to true).
    • You can activate mail alerts through an SMTP account such as gmail (see Private configuration and the mail_alert flag in Main configuration).
    • If fs_out is true (default), the captured tweets are written to the file system under data_dir.
    • If todo_out is true (it should be false by default), a kind of queue is created in the 'data/TODO' directory, holding the filenames to be consumed by an external process. This makes it possible to write the tweets to any database.
      • Note that the number of files per directory is limited (the limit depends on the OS), so the external process must consume the filenames regularly to avoid problems.
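
    The external consumer mentioned above could look roughly like the sketch below. It assumes that every entry in 'data/TODO' is a regular file whose name identifies one captured tweet file. The helper name drainTodo and the handler callback are hypothetical and not part of twitter-harvest.

```javascript
const fs = require('fs');
const path = require('path');

// Hypothetical helper: process every entry currently in the TODO directory.
// `handler(name)` receives the entry name. The entry is removed only when the
// handler returns without throwing, so failed items are retried on the next run.
function drainTodo(todoDir, handler) {
  let processed = 0;
  for (const name of fs.readdirSync(todoDir).sort()) {
    const entry = path.join(todoDir, name);
    if (!fs.statSync(entry).isFile()) continue; // skip sub-directories
    try {
      handler(name);
      fs.unlinkSync(entry); // consume the entry so the directory stays small
      processed += 1;
    } catch (err) {
      console.error('could not process ' + name + ': ' + err.message);
    }
  }
  return processed;
}
```

    Calling drainTodo('data/TODO', handler) at a regular interval (for example from setInterval or a cron job) keeps the number of files in the directory bounded.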





    Installation

    $ npm install --save twitter-harvest


    Usage

    $ node twitter-harvest.js

    Usage with forever

    $ npm install -g forever
    $ forever start twitter-harvest.js

    With forever, the task keeps running in the background, so you can leave your session.

    Main configuration

    {
      "agents_dir"    : "cfg/agents/",
      "data_dir"      : "./data/",
      "private_cfg"   : "./cfg/cfg-private.json",
      "mail_alert"    : false,
      "fs_out"        : true,
      "std_out"       : true,
      "todo_out"      : true
    }

    • agents_dir: path of the directory that holds the agent files
    • data_dir: path where the tweets are written on the file system
    • private_cfg: file where private data is stored (such as mail credentials)
    • mail_alert: if true, enable mail alerts in case of failure
    • fs_out: if true, write the Twitter data to the file system
    • std_out: if true, write the Twitter data to the console
    • todo_out: if true, write the JSON filenames to the 'data/TODO' directory (to be consumed by another process that loads them into a database such as MySQL)
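
    As an illustration of the keys listed above, here is a minimal sanity check for a main configuration object. It is only a sketch: the function name checkMainConfig and the rules it applies are assumptions, and it is not the JSON schema validation that the package itself performs.

```javascript
// Keys taken from the main configuration example; the checks are an assumption.
const PATH_KEYS = ['agents_dir', 'data_dir', 'private_cfg'];
const FLAG_KEYS = ['mail_alert', 'fs_out', 'std_out', 'todo_out'];

// Hypothetical helper: returns a list of problems (empty when the config looks fine).
function checkMainConfig(cfg) {
  const problems = [];
  for (const key of PATH_KEYS) {
    if (typeof cfg[key] !== 'string' || cfg[key].length === 0) {
      problems.push(key + ' must be a non-empty string');
    }
  }
  for (const key of FLAG_KEYS) {
    if (typeof cfg[key] !== 'boolean') {
      problems.push(key + ' must be true or false');
    }
  }
  return problems;
}
```

    A caller could read cfg/cfg.json with JSON.parse(fs.readFileSync(...)) and refuse to start when the returned list is not empty.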

    Agents configuration

    Put all the agent definition files in the agents directory (one file per agent).

    $ cat cfg/agents/*.json
    {
      "type_doc"            : "twitter",
      "enable"              : true,
      "type_filter"         : "track",
      "type_api"            : "stream",
      "name"                : "keywords-geneva",
      "filter"              : {
        "track"             : "genève,geneva,genebra,genevra,genf"
      },
      "stream"              : "filter",
      "consumer_key"        : "...",
      "consumer_secret"     : "...",
      "access_token_key"    : "...",
      "access_token_secret" : "..."
    }

    This agent captures all the tweets that mention Geneva, spelled in several languages.

    {
      "type_doc"            : "twitter",
      "enable"              : true,
      "type_filter"         : "locations",
      "type_api"            : "stream",
      "name"                : "location-geneva",
      "filter"              : {
        "locations"         : "5.77,45.85,7.15,46.80"
      },
      "stream"              : "filter",
      "consumer_key"        : "...",
      "consumer_secret"     : "...",
      "access_token_key"    : "...",
      "access_token_secret" : "..."
    }

    This agent captures all the tweets posted around the Geneva area (Switzerland).
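
    The locations value follows the Twitter streaming API convention: longitude,latitude pairs with the south-west corner first, then the north-east corner. The sketch below parses a single bounding box and tests whether a point falls inside it. The function names are hypothetical, and several boxes in one string are not handled.

```javascript
// Hypothetical helper: parse "west,south,east,north" into a bounding box.
function parseLocations(value) {
  const n = value.split(',').map(s => parseFloat(s));
  if (n.length !== 4 || n.some(x => Number.isNaN(x))) {
    throw new Error('expected "west,south,east,north"');
  }
  const [west, south, east, north] = n;
  return { west, south, east, north };
}

// True when the point (lon, lat) lies inside the box, borders included.
function contains(box, lon, lat) {
  return lon >= box.west && lon <= box.east &&
         lat >= box.south && lat <= box.north;
}

const geneva = parseLocations('5.77,45.85,7.15,46.80');
console.log(contains(geneva, 6.14, 46.20)); // city of Geneva (approximate coordinates)
```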

    • type_doc : 'twitter'
    • enable : if true, this agent is launched
    • type_filter : track | locations | follow
    • stream : filter | firehose (if you have access to it)
    • consumer_key, consumer_secret, access_token_key, access_token_secret : personal keys provided by Twitter for using its APIs
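
    To illustrate the role of the enable flag, here is a sketch of how the agent files could be read and filtered. It assumes one JSON object per *.json file in the agents directory. The function name loadEnabledAgents is hypothetical and this is not necessarily how twitter-harvest loads its agents.

```javascript
const fs = require('fs');
const path = require('path');

// Hypothetical helper: read every *.json file in agentsDir and keep the
// agents whose "enable" field is exactly true.
function loadEnabledAgents(agentsDir) {
  return fs.readdirSync(agentsDir)
    .filter(name => name.endsWith('.json'))
    .sort()
    .map(name => JSON.parse(fs.readFileSync(path.join(agentsDir, name), 'utf8')))
    .filter(agent => agent.enable === true);
}
```

    With the two examples above, loadEnabledAgents('cfg/agents/') would return both agents; setting enable to false in one file leaves only the other.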

    For more details, see the Twitter API documentation.

    Private configuration

    {
      "mail_service"    : "gmail",
      "mail_auth_user"  : "username",
      "mail_auth_path"  : "password",
      "mail_from"       : "alert_twitter_harvest",
      "mail_to"         : ""
    }

    • mail_service : name of the mail service
    • mail_auth_user : username for the mail service
    • mail_auth_path : password for the mail service
    • mail_from : sender of the mail
    • mail_to : recipient of the alerts

    A mail is also sent when the system starts; you should receive it in your mailbox if everything is configured correctly.
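
    The sketch below shows how the private configuration could be mapped onto the option objects that a recent nodemailer version accepts. The function name buildMailSettings, the subject and the text are assumptions made for illustration; the package may build its mails differently.

```javascript
// Hypothetical helper: turn the private configuration into nodemailer options.
function buildMailSettings(privateCfg) {
  return {
    transport: {
      service: privateCfg.mail_service,
      auth: { user: privateCfg.mail_auth_user, pass: privateCfg.mail_auth_path }
    },
    message: {
      from: privateCfg.mail_from,
      to: privateCfg.mail_to,
      subject: 'twitter-harvest started',       // assumed subject
      text: 'The harvester is up and running.'  // assumed body
    }
  };
}

// With nodemailer installed, the settings could be used like this:
//   const nodemailer = require('nodemailer');
//   const s = buildMailSettings(privateCfg);
//   nodemailer.createTransport(s.transport).sendMail(s.message, callback);
```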

    Note: the supported mail services are those provided by the nodemailer Node module, but only gmail was tested. For gmail, you may have to lower the security level of your mail account (so do not use a personal account) and to authorize the application explicitly.


    $ gulp


    Note that three error messages are currently displayed when twitter-harvest is launched. They are harmless. Here they are:

    { [Error: Cannot find module './build/Release/DTraceProviderBindings'] code: 'MODULE_NOT_FOUND' }
    { [Error: Cannot find module './build/default/DTraceProviderBindings'] code: 'MODULE_NOT_FOUND' }
    { [Error: Cannot find module './build/Debug/DTraceProviderBindings'] code: 'MODULE_NOT_FOUND' }

    To do

    • add more tests
    • add an option to include extra information (from agents) in the output
    • add other API interfaces (not only the streaming API)


    License

    MIT © Arnaud Gaudinat

    Change log

    • 0.3.4:
      • change the node twitter lib to Twit (for better error handling)
    • 0.3.3:
      • add the TODO option and directory to allow writing to a DB
      • add 2 digits to filenames and the JSON extension
    • 0.3.2:
      • add JSONschema validation

