gnip-reader
A simple node package to read Gnip records from the Gnip Search API.
Features
- Get multiple pages of Gnip query results with a single call.
- Get an estimated count for a Gnip query.
- Output is raw, untouched, Gnip JSON.
- Abort a multi-page query at any point.
- Manual paging is also supported.
If you want to put geolocated Gnip records on a map, feed the output to esri-gnip.
Requirements
Usage
The module provides a single class. Initialize an instance of the class using your Gnip credentials and some Search API stream info, then execute queries and estimates.
Installing
$ npm install gnip-reader
Creating a Reader
Initialize a GnipReader
object with username
, password
, account name
, and stream name
.
For example, if your account name is FooInc
and your stream name is test
, your query URL will be https://search.gnip.com/accounts/FooInc/search/test.json
. Create your GnipReader
like this:
var GnipReader = require('gnip-reader');var myReader = new GnipReader('foo@fooinc.com', 'apassword', 'FooInc', 'test');
Querying Gnip
Call .fullSearch(query, recordLimit, pageCallback(gnipRecords, pageNumber), finalCallback(err, allRecords))
to retrieve records page-by-page.
query
can either be a simple string as specified here or an object providing parameters as specified here.recordLimit
is the maximum total number of records you want to retrieve matching the query. Passnull
to keep paging until the query results are exhausted (or untilpageCallback()
returnsfalse
- see below). Use this to avoid burning through your Gnip allowance (tweets and requests). Note: this is different to the Gnip parameter ofmaxResults
which affects how Gnip pages query results.pageCallback()
should returntrue
to get more records, orfalse
to stop paging.gnipRecords
is an array of raw Gnip records for this page.pageNumber
starts at 1.
finalCallback()
when retrieval has completed.allRecords
will contain all the records retrieved up to that point. Unless an error caused the retrieval to complete,err
will benull
.
For example, the following will return all mentions of 'esri' in the past 30 days:
myReader.fullSearch('esri', null, function(data, pageNum) {console.log(data.length + ' records in page ' + pageNum);return true;}, function(err, allData) {if (!err) {console.log('Got ' + allData.length + ' records!');} else {console.error(err);}});
.fullSearch()
will optimize page sizes according to published Gnip ranges (currently 10...500 records per page) and any value passed into recordLimit
. It will attempt to minimize the number of requests made, and if possible minimize tweets requested.
.fullSearch()
will also detect duplicate tweets (by id) across pages. If you use .search()
and .next()
manually (see below), duplicates are not detected automatically.
Getting an Estimate
It's often best to get an estimate of the amount of data a query might return (since Gnip charge for the tweets AND the requests needed to get those tweets). An estimate returns an approximate number of tweets, and will tell you how they're broken down by day, hour, or minute (by default, hour
is used).
Call .estimate(query, callback(err, gnipEstimates))
to retrieve time-partitioned counts for the query.
query
can either be a simple string as specified here or an object providing parameters as specified here. Use an object to also specifyfromDate
ortoDate
, or that you want results partitioned bybucket
size ofday
orminute
.callback()
should be a function that accepts 2 parameters:err
: An error object (or null). See below.gnipEstimates
: An array of Gnip Estimates as described here (or null iferr
is not null).
myReader.estimate('esri', function(err, gnipEstimates) {if (err) {console.error(err);} else {console.log('Got ' + gnipEstimates.length + ' Gnip Estimates.');}});
Querying Gnip with Manual Paging
It is recommended that you use .fullSearch()
, but if for some reason .fullSearch()
doesn't cut it for you, you can use .search()
and .next()
to page through results manually. Here's how:
Initial query
Call .search(query, callback(err, gnipRecords, moreRecordsAvailable))
to retrieve records for a given query.
query
can either be a simple string as specified here or an object providing parameters as specified here.callback()
should be a function that accepts 3 parameters:err
: An error object (or null). See below.gnipRecords
: An array of Gnip records as described here (or null iferr
is not null).moreRecordsAvailable
:true
if there are more records to retrieve for this query.false
otherwise.
For example:
myReader.search('esri', function(err, gnipRecords, moreRecords) {if (err) {console.error(err);} else {console.log('Got ' + gnipRecords.length + ' Gnip Records. There are ' +(moreRecords?'':'no ') + 'more records.');}});
Get subsequent pages
If moreRecordsAvailable
is true
, you can get pages of subsequent records with .next()
, which takes the same parameters as .search()
.
Note: As specified by Gnip, the query
should not be modified between calls to .search()
and .next()
or between calls to .next()
and .next()
.
myReader.search('esri', function(err, gnipRecords, moreRecords) {if (!err) {console.log('Got ' + gnipRecords.length + ' Gnip Records. There are ' +(moreRecords?'':'no ') + 'more records.');if (moreRecords) {var count = 2,totalRecords = gnipRecords;var getMoreRecords = function() {myReader.next('esri', function(err, gnipRecords, moreRecords) {if (!err) {console.log('Page ' + count++ + ': ' + gnipRecords.length + ' more records');totalRecords = totalRecords.concat(gnipRecords);if (moreRecords) {getMoreRecords();} else {console.log('Got a total of ' + totalRecords.length + ' records!');}} else {console.error('Error geting page ' + --count + '. Aborting: ' + err);}});};// Kick off loading additional pages…getMoreRecords();}} else {console.error(err);}});
Or, using Async.js…
var async = require('async');myReader.search('esri', function(err, gnipRecords, moreRecords) {if (err) {console.error(err);} else {console.log('Got ' + gnipRecords.length + ' Gnip Records. There are ' +(moreRecords?'':'no ') + 'more records.');if (moreRecords) {var count = 2,totalRecords = gnipRecords;async.whilst(function() { return moreRecords; },function(callback) {myReader.next('esri', function(err, gnipRecords, getAnotherPage) {if (!err) {console.log('Page ' + count++ + ': ' + gnipRecords.length +' more records');totalRecords = totalRecords.concat(gnipRecords);moreRecords = getAnotherPage;}callback(err);});},function(err) {if (err) {console.error('Error geting page ' + --count + '. Aborting: ' + err);} else {console.log('Got a total of ' + totalRecords.length + ' records!');}});}}});
Query format
A query
parameter to .fullSearch()
, .estimate()
, .search()
and .next()
can be a single-line string as described here, or an object with parameters as described in the Gnip documenation here (or here for estimates).
Note that the Gnip fromDate
and toDate
parameters can be one of the following:
- A JavaScript
Date
object. - A moment-js
moment
. - A string in the Gnip date format 'YYYYMMDDHHmm'.
- A string that can be parsed by moment-js to a valid date.
In the case of estimates, the Gnip bucket
parameter can be one of the following:
day
hour
(the default)minute
Here is a sample complex query that searches for 'esri' today, and returns pages of 10 records at a time:
var moment = require('moment');var startTime = moment().startOf('day');var endTime = new Date();var complexQuery = {query: 'esri',maxResults: 10,fromDate: startTime,toDate: endTime};
Note: When calling .fullSearch()
, the value of maxResults
may be overriden to reduce the number of requests made against the Gnip Search API. When calling .search()
, the provided value of maxResults
is honored.
Error format
Errors are an object of the following structure:
{statusCode: <int>,url: <searchStreamUrl>,parameters: <stringifiedJSON>,error: { // As returned from Gnipmessage: <ErrorString>,sent: <UTCTimeString>}}
For example:
{statusCode: 422,url: 'https://search.gnip.com/accounts/FooInc/search/test.json',parameters: '{"query":"esri","maxResults":10,"fromDate":"201408190000","toDate":"201408200000","publisher":"twitter"}',error: {message: 'Could not accept your search request: Invalid date for query parameter \'toDate\'. Can\'t ask for activities from the future.\n',sent: '2014-08-19T22:22:49+00:00'}}
Known Limitations
- If
fromDate
andtoDate
are strings and fail to parse, an exception is thrown. - In edge-cases, while paging manually with
.search()
and.next()
, Gnip can return duplicate records across pages.gnip-reader
does not detect these duplicates. Duplicates are detected (by id) and removed when using.fullSearch()
.
Resources
- node.js documentation
- Gnip Documentation (incomplete - not all the attributes that are returned are described in this documentation).
Dependencies
gnip-reader makes use of the following amazing packages:
Issues
Find a bug or want to request a new feature? Please let us know by submitting an Issue.
Contributing
Anyone and everyone is welcome to contribute.
Licensing
Copyright 2014 Esri
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
A copy of the license is available in the repository's license.txt file.