twitter2pg

Richard Wen
rrwen.dev@gmail.com

Module for extracting Twitter data to PostgreSQL databases.

Install

  1. Install PostgreSQL
  2. Install Node.js
  3. Install twitter2pg via npm
npm install --save twitter2pg

For the latest developer version, see Developer Install.

Usage

The usage examples show how to get Twitter data into a PostgreSQL table named twitter_data with a tweets jsonb column:

row    tweets
1      {...}
2      {...}
3      {...}
...    ...

Create an appropriate PostgreSQL table with psql before running the usage examples:

  • -h: host address
  • -p: port number
  • -d: database name
  • -U: user name with table creation permissions
  • -c: PostgreSQL query
psql -h localhost -p 5432 -d postgres -U postgres -c "CREATE TABLE twitter_data(tweets jsonb);"

REST API

  1. Search for tweets with keyword twitter using a GET request
  2. Filter tweets with jsonata to only return the array inside statuses
  3. Insert the filtered tweets into a PostgreSQL table named twitter_data
  4. Each row of the tweets column in the twitter_data table contains one tweet
var twitter2pg = require('twitter2pg');
 
var options = {
    pg: {},
    twitter: {},
    jsonata: 'statuses' // filter tweets for statuses array only
};
 
// (options_twitter) Twitter API options
options.twitter = {
    method: 'get', // get, post, delete, or stream
    path: 'search/tweets', // api path
    params: {q: 'twitter'} // query tweets
};
 
// (options_twitter_connection) Twitter API connection keys
options.twitter.connection =  {
    consumer_key: '***', // default: process.env.TWITTER_CONSUMER_KEY
    consumer_secret: '***', // default: process.env.TWITTER_CONSUMER_SECRET
    access_token_key: '***', // default: process.env.TWITTER_ACCESS_TOKEN_KEY
    access_token_secret: '***' // default: process.env.TWITTER_ACCESS_TOKEN_SECRET
};
 
// (options_pg) PostgreSQL options
// In query, $1 is the filtered JSON array of tweets
options.pg = {
    table: 'twitter_data',
    column: 'tweets',
    query: 'INSERT INTO $options.pg.table($options.pg.column) SELECT * FROM json_array_elements($1);'
};
 
// (options_pg_connection) PostgreSQL connection details
options.pg.connection = {
    host: 'localhost', // default: process.env.PGHOST
    port: 5432, // default: process.env.PGPORT
    database: 'postgres', // default: process.env.PGDATABASE
    user: 'postgres', // default: process.env.PGUSER
    password: '***' // default: process.env.PGPASSWORD
};
 
// (twitter2pg_rest) Query tweets using REST API into PostgreSQL table
twitter2pg(options).catch(err => {
    console.error(err.message);
});
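
To make steps 2 to 4 concrete, the sketch below mimics them in plain JavaScript without calling Twitter or PostgreSQL. The `sampleResponse` object is made-up data shaped like a search result, and plain property access stands in for the jsonata expression `statuses`. It illustrates the data flow only and is not the module's own code.

```javascript
// Hypothetical, trimmed-down search response (not real Twitter data)
var sampleResponse = {
    statuses: [
        {id_str: '1', text: 'first tweet'},
        {id_str: '2', text: 'second tweet'}
    ],
    search_metadata: {count: 2}
};

// Stand-in for the jsonata filter 'statuses': keep only the array of tweets
var filtered = sampleResponse.statuses;

// Stand-in for json_array_elements($1): one row per array element
var rows = filtered.map(function(tweet) {
    return {tweets: tweet};
});

console.log(rows.length); // number of rows that would be inserted
```

Each element of `rows` corresponds to one row of the tweets column, which is why the filter has to return the array itself rather than the whole response object.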

Stream API

  1. Stream tweets to track keyword twitter
  2. When a tweet is available, insert the tweet into a PostgreSQL table named twitter_data
  3. Each tweet is inserted as one row in the tweets column of the twitter_data table
var twitter2pg = require('twitter2pg');
 
var options = {};
 
// (options_twitter) Twitter API options
options.twitter = {
    method: 'stream', // get, post, delete, or stream
    path: 'statuses/filter',// api path
    params: {track: 'twitter'} // track tweets
};
 
// (options_twitter_connection) Twitter API connection keys
options.twitter.connection =  {
    consumer_key: '***', // default: process.env.TWITTER_CONSUMER_KEY
    consumer_secret: '***', // default: process.env.TWITTER_CONSUMER_SECRET
    access_token_key: '***', // default: process.env.TWITTER_ACCESS_TOKEN_KEY
    access_token_secret: '***' // default: process.env.TWITTER_ACCESS_TOKEN_SECRET
};
 
// (options_pg) PostgreSQL options
// In query, $1 is the JSON tweet
options.pg = {
    table: 'twitter_data',
    column: 'tweets',
    query: 'INSERT INTO $options.pg.table($options.pg.column) VALUES($1);'
};
 
// (options_pg_connection) PostgreSQL connection details
options.pg.connection = {
    host: 'localhost', // default: process.env.PGHOST
    port: 5432, // default: process.env.PGPORT
    database: 'postgres', // default: process.env.PGDATABASE
    user: 'postgres', // default: process.env.PGUSER
    password: '***' // default: process.env.PGPASSWORD
};
 
// (twitter2pg_stream) Stream tweets into PostgreSQL table
var stream = twitter2pg(options);
stream.on('error', function(error) {
    console.error(error.message);
});
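
The event handling above can be tried out without Twitter credentials by using a mock. The sketch below assumes the returned stream behaves like a Node.js `EventEmitter` that emits `data` once per tweet and `error` on failure, as streams from the `twitter` package do. `mockStream` and the emitted objects are hypothetical stand-ins.

```javascript
var EventEmitter = require('events');

// Mock standing in for the stream returned by twitter2pg(options)
var mockStream = new EventEmitter();
var received = [];
var errors = [];

// Assumed 'data' event: fires once per incoming tweet
mockStream.on('data', function(tweet) {
    received.push(tweet);
});

// Same pattern as the usage example: record the error instead of crashing
mockStream.on('error', function(error) {
    errors.push(error.message);
});

// Simulate two tweets with a failure in between
mockStream.emit('data', {text: 'hello'});
mockStream.emit('error', new Error('connection lost'));
mockStream.emit('data', {text: 'again'});

console.log(received.length, errors);
```

Registering the `error` handler matters: an `EventEmitter` that emits `error` with no listener attached throws, which would end a long-running stream process.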

See Documentation for more details.
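
The query strings in both examples contain the placeholders `$options.pg.table` and `$options.pg.column` next to the `$1` parameter. The sketch below shows one way such placeholders could be resolved, assuming plain text substitution. `fillQuery` is a hypothetical helper written for illustration and is not part of the twitter2pg API.

```javascript
// Hypothetical helper: fills the table and column placeholders in a query string.
// split/join is used so that '$' and '.' are treated as literal characters.
function fillQuery(pgOptions) {
    return pgOptions.query
        .split('$options.pg.table').join(pgOptions.table)
        .split('$options.pg.column').join(pgOptions.column);
}

// Query from the REST example
var restQuery = fillQuery({
    table: 'twitter_data',
    column: 'tweets',
    query: 'INSERT INTO $options.pg.table($options.pg.column) SELECT * FROM json_array_elements($1);'
});

// Query from the stream example
var streamQuery = fillQuery({
    table: 'twitter_data',
    column: 'tweets',
    query: 'INSERT INTO $options.pg.table($options.pg.column) VALUES($1);'
});

console.log(restQuery);
console.log(streamQuery);
```

In this sketch `$1` is left untouched so that the tweets can still be passed as a query parameter. Table and column names substituted as text are not escaped, so they should come from trusted configuration only.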

Contributions

Report Contributions

Reports for issues and suggestions can be made using the issue submission interface.

When possible, ensure that your submission is:

  • Descriptive: has an informative title, explanations, and screenshots
  • Specific: has details of environment (such as operating system and hardware) and software used
  • Reproducible: has steps, code, and examples to reproduce the issue

Code Contributions

Code contributions are submitted via pull requests:

  1. Ensure that you pass the Tests
  2. Create a new pull request
  3. Provide an explanation of the changes

A template of the code contribution explanation is provided below:

## Purpose

The purpose states the goals of the change, such as bug fixes, new features, or other improvements.

## Description

The description is a short summary of the changes made such as improved speeds or features, and implementation details.

## Changes

The changes are a list of general edits made to the files and their respective components.
* `file_path1`:
    * `function_module_etc`: changed loop to map
    * `function_module_etc`: changed variable value
* `file_path2`:
    * `function_module_etc`: changed loop to map
    * `function_module_etc`: changed variable value

## Notes

The notes provide any additional text that does not fit into the above sections.

For more information, see Developer Install and Implementation.

Developer Notes

Developer Install

Install the latest developer version with npm from GitHub:

npm install git+https://github.com/rrwen/twitter2pg

Install from git cloned source:

  1. Ensure git is installed
  2. Clone into current path
  3. Install via npm
git clone https://github.com/rrwen/twitter2pg
cd twitter2pg
npm install

Tests

  1. Clone into current path git clone https://github.com/rrwen/twitter2pg
  2. Enter into folder cd twitter2pg
  3. Ensure devDependencies are installed and available
  4. Run tests with a .env file (see tests/README.md)
  5. Results are saved to tests/log with each file corresponding to a version tested
npm install
npm test
npm run test_rest
npm run test_stream

Documentation

Use documentationjs to generate HTML documentation in the docs folder:

npm run docs

See JSDoc style for formatting syntax.

Upload to GitHub

  1. Ensure git is installed
  2. Inside the twitter2pg folder, add all files and commit changes
  3. Push to GitHub
git add .
git commit -a -m "Generic update"
git push

Upload to npm

  1. Update the version in package.json
  2. Run tests and check for OK status (see tests/README.md)
  3. Generate documentation
  4. Login to npm
  5. Publish to npm
npm test
npm run test_rest
npm run test_stream
npm run docs
npm login
npm publish

Implementation

The module twitter2pg uses the following npm packages for its implementation:

npm       Purpose
twitter   Connect to the Twitter REST and Streaming Application Programming Interfaces (APIs)
jsonata   Query language to filter Twitter JSON data before inserting into PostgreSQL
pg        Connect to PostgreSQL and insert Twitter data into tables

twitter   <-- Extract Twitter data from API
    |
jsonata   <-- Filter Twitter JSON data
    |
   pg     <-- Insert filtered Twitter data into PostgreSQL table
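
The three stages in the diagram can be sketched with stub functions. Everything below is a stand-in written for illustration: `extract`, `filter`, and `insert` are hypothetical names, the response is made-up data, and an in-memory array takes the place of the PostgreSQL table. The stubs are synchronous to keep the sketch short, whereas real API and database calls are asynchronous.

```javascript
// Stub for the twitter stage: returns a hypothetical API response
function extract() {
    return {statuses: [{text: 'a'}, {text: 'b'}], search_metadata: {}};
}

// Stub for the jsonata stage: property access in place of a jsonata expression
function filter(response, key) {
    return response[key];
}

// Stub for the pg stage: appends one row per tweet to an in-memory table
function insert(table, tweets) {
    tweets.forEach(function(tweet) {
        table.push({tweets: tweet});
    });
    return table;
}

// Run the stages in the order shown in the diagram
var table = [];
insert(table, filter(extract(), 'statuses'));

console.log(table.length); // rows held by the in-memory table
```

Keeping the stages separate mirrors the division of work among the three packages: changing the filter expression does not affect how data is extracted or inserted.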