Hive client using the Apache Thrift RPC system
Hive client with the following main features:
- fetch rows with an optional batch size
- implements the Node Readable Stream API (including pipe)
- support for multiple Hive versions
- multiple query support through the multi_execute and multi_query functions
- advanced comments parsing
npm install thrift-hive
    var hive = require('thrift-hive');
    // Client connection
    var client = hive.createClient({
      version: '0.7.1-cdh3u2',
      server: '127.0.0.1',
      port: 10000,
      timeout: 1000
    });
    // Execute call
    client.execute('use default', function(err){
      // Query call
      client.query('show tables')
      .on('row', function(database){
        console.log(database);
      })
      .on('error', function(err){
        console.log(err.message);
        client.end();
      })
      .on('end', function(){
        client.end();
      });
    });
We've added a function, hive.createClient, to simplify coding. However, you
are free to use the raw Thrift API. The client takes an options object as its
argument and exposes an execute and a query method. The available options are:
- version: defaults to '0.7.1-cdh3u2'
- server: defaults to '127.0.0.1'
- port: defaults to 10000
- timeout: defaults to 1000 milliseconds
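To illustrate how the defaults listed above combine with caller-supplied values, here is a minimal sketch. The helper name resolveOptions is hypothetical and is not part of thrift-hive; the option names are taken from the createClient call in the quick example, and the default values are the ones given in this document.

```javascript
// Hypothetical helper, not part of thrift-hive: merges user options over
// the default values documented above.
var DEFAULTS = {
  version: '0.7.1-cdh3u2',
  server: '127.0.0.1',
  port: 10000,
  timeout: 1000 // milliseconds
};

function resolveOptions(options) {
  var resolved = {};
  options = options || {};
  Object.keys(DEFAULTS).forEach(function (key) {
    // Keep the caller's value when one is provided, else use the default
    resolved[key] = options[key] !== undefined ? options[key] : DEFAULTS[key];
  });
  return resolved;
}

// Only the port is overridden; the other settings keep their defaults
console.log(resolveOptions({ port: 10001 }));
```

In practice this means a call such as hive.createClient({port: 10001}) would only need to name the settings that differ from the defaults.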
The client exposes the following:
- A reference to the Thrift client returned by thrift.createClient.
- A reference to the Thrift connection returned by thrift.createConnection.
- end: closes the Thrift connection.
- execute: executes a query and, when done, calls the provided callback with an optional error.
- query: executes a query and returns its results as an array of arrays (rows and columns). The size argument is optional and indicates the number of rows to return on each fetch.
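The optional size argument can be pictured as a batch size for repeated fetches. The sketch below is an assumption about how such a loop might look, not the library's actual implementation: fetchInBatches and makeStubFetcher are hypothetical helpers, and fetchN stands for any function that calls back with up to size rows (here a stub backed by an in-memory array, so no Hive server is needed).

```javascript
// Hypothetical sketch: pull rows in batches of `size` until none are left.
// `fetchN(size, callback)` must call back with (err, rows).
function fetchInBatches(fetchN, size, onRow, done) {
  var index = 0;
  function next() {
    fetchN(size, function (err, rows) {
      if (err) return done(err);
      if (rows.length === 0) return done(null, index); // no more rows
      rows.forEach(function (row) {
        onRow(row, index++);
      });
      next();
    });
  }
  next();
}

// Stub standing in for a Hive result set (assumption, for illustration only)
function makeStubFetcher(allRows) {
  var offset = 0;
  return function (size, callback) {
    var batch = allRows.slice(offset, offset + size);
    offset += batch.length;
    callback(null, batch);
  };
}

var fetcher = makeStubFetcher([['a'], ['b'], ['c']]);
fetchInBatches(fetcher, 2, function (row, i) {
  console.log(i, row);
}, function (err, count) {
  console.log('fetched', count, 'rows');
});
```

A smaller batch size keeps fewer rows in memory at once at the cost of more round trips, which is the trade-off a batch size of this kind usually controls.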
    hive = require 'thrift-hive'
    # Client connection
    client = hive.createClient
      version: '0.7.1-cdh3u2'
      server: '127.0.0.1'
      port: 10000
      timeout: 1000
    # Execute
    client.execute 'USE default', (err) ->
      console.log err.message if err
      client.end()
The client.query function implements the EventEmitter API.
The following events are emitted:
- row: Emitted for each row returned by Hive. Receives two arguments, the row as an array and the row index.
- row-first: Emitted after the first row returned by Hive. Receives two arguments, the row as an array and the row index (always 0).
- row-last: Emitted after the last row returned by Hive. Receives two arguments, the row as an array and the row index.
- error: Emitted when the connection fails or when Hive returns an error.
- end: Emitted when there are no more rows to retrieve. Not emitted if an error occurred earlier.
- both: Convenience event combining the error and end events. Emitted when an error occurred or when there are no more rows to retrieve. Receives the same arguments as the error or end event, depending on the outcome of the operation.
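The events above can be consumed with ordinary EventEmitter listeners. The following sketch gathers every row into an array and reports the outcome through a callback. collectRows is a hypothetical helper, and the emitter used here is a plain Node EventEmitter standing in for the object returned by client.query, so the example runs without a Hive server.

```javascript
var EventEmitter = require('events').EventEmitter;

// Hypothetical helper: buffer 'row' events, then call back on 'error' or 'end'
function collectRows(query, callback) {
  var rows = [];
  query.on('row', function (row) {
    rows.push(row);
  });
  query.on('error', function (err) {
    callback(err);
  });
  query.on('end', function () {
    callback(null, rows);
  });
  return query;
}

// Stand-in emitter (assumption): emits the event names documented above
var fake = new EventEmitter();
collectRows(fake, function (err, rows) {
  if (err) return console.log(err.message);
  console.log(rows);
});
fake.emit('row', ['table_a'], 0);
fake.emit('row', ['table_b'], 1);
fake.emit('end');
```

Buffering every row defeats the purpose of streaming for large result sets, so a helper like this is only reasonable for small queries such as listing tables.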
The following code written in CoffeeScript is an example of piping data returned by the query into a writable stream.
    fs = require 'fs'
    hive = require 'thrift-hive'
    # Client connection
    client = hive.createClient
      version: '0.7.1-cdh3u2'
      server: '127.0.0.1'
      port: 10000
      timeout: 1000
    # Execute query
    client.query('show tables')
    .on 'row', (database) ->
      this.emit 'data', 'Found ' + database + '\n'
    .on 'error', (err) ->
      client.end()
    .on 'end', () ->
      client.end()
    .pipe( fs.createWriteStream "/pipe.out" )
Here's the same example as the one in the "Quick example" section but using the native thrift API.
    var assert     = require('assert');
    var thrift     = require('thrift');
    var transport  = require('thrift/lib/thrift/transport');
    var ThriftHive = require('../lib/0.7.1-cdh3u2/ThriftHive');
    // Client connection
    var options = {transport: transport.TBufferedTransport, timeout: 1000};
    var connection = thrift.createConnection('127.0.0.1', 10000, options);
    var client = thrift.createClient(ThriftHive, connection);
    // Execute query
    client.execute('use default', function(err){
      client.execute('show tables', function(err){
        assert.ifError(err);
        client.fetchAll(function(err, databases){
          if(err){
            console.log(err.message);
          }else{
            console.log(databases);
          }
          connection.end();
        });
      });
    });
For convenience, we've added two functions, multi_execute and multi_query, which
run multiple requests sequentially within a single client connection. They
are identical except for how the last query is handled:
- multi_execute ends with an execute call, so its API is the same as that of the execute function.
- multi_query ends with a query call, so its API is the same as that of the query function.
They accept the same arguments as their counterparts, but the query may be an array or a string of queries. If it is a string, it will be split into multiple queries. Note that the parser is fairly light: it removes ';' and comments, but it seems to do the job.
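As a rough illustration of what splitting a string of queries involves, the sketch below strips comments and splits on ';'. It is not the parser shipped with thrift-hive: splitQueries is a hypothetical function, the comment syntaxes it handles ('--' line comments and '/* */' block comments) are assumptions, and it deliberately ignores edge cases such as a ';' inside a quoted string.

```javascript
// Hypothetical sketch of a light query splitter (not the library's parser)
function splitQueries(text) {
  return text
    .replace(/\/\*[\s\S]*?\*\//g, '') // drop /* block */ comments
    .replace(/--.*$/gm, '')           // drop -- line comments
    .split(';')                       // one chunk per statement
    .map(function (q) { return q.trim(); })
    .filter(function (q) { return q.length > 0; });
}

var script = [
  '-- select the database',
  'USE default;',
  '/* list what is there */',
  'SHOW TABLES;'
].join('\n');

console.log(splitQueries(script)); // prints [ 'USE default', 'SHOW TABLES' ]
```

Passing an array of queries instead of a single string sidesteps this kind of parsing entirely, which may be preferable when statements contain ';' or comment markers inside string literals.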
Run the samples:
    node samples/execute.js
    node samples/query.js
    node samples/style_native.js
    node samples/style_sugar.js
Run the tests with expresso.
Hive must be started with Thrift support. By default, the tests will connect to
the Hive Thrift server on the host localhost and the port 10000. Edit the file
"./test/config.json" if you wish to change the connection settings used across
the tests. A database test_database will be created if it does not yet exist,
and all the tests will run on it.
    npm install -g expresso
    expresso -s