# redshift-cli

A command-line tool for querying and managing an Amazon Redshift cluster.

Undergoing a major revamp. Stay tuned.

Useful for ad-hoc manual work on Redshift and as a building block for cron jobs and similar automation.

### Install

    npm install -g redshift-cli

### Usage

    redshift-cli --config=[some.config.json]

    // or (see quick query below)

    redshift-cli --config=[some.config.json] --query="select * from lala_land"

    // or (see auto run below)

    redshift-cli --config=[some.config.json] --autorun=/home/me/start.js

### Auto run

    /*
        start.js:

        r.query('select * from foo');
        console.log('bye');
    */

    /*
        config.json:
        {
            "autorun": "start.js"
        }
    */

    redshift-cli --config=....

    // or

    redshift-cli --config=[some.config.json] --autorun=/home/me/start.js

will start the CLI and execute start.js in the context of the CLI.
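As an example, a hypothetical start.js for an unattended export job might combine autorun with the file output options described under querying below (a sketch; we assume only that `r` is available in the script context, as in the start.js above):

    // hypothetical nightly-export start.js
    // write the full result to a file instead of printing it
    r.query('select * from foo', { filename: 'foo_export.json' });

    // write only the name column, as comma separated values
    r.query('select * from foo', { filename: 'names.csv', projection: 'name' });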


### Quick query

    redshift-cli --query="select * from lala_land"

will execute the query and then exit; the process exit code indicates whether the query succeeded.
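This makes quick query handy for automation. A minimal sketch (hypothetical config file and query) of driving it from node and reacting to the exit code:

    // hypothetical wrapper script: run a quick query and propagate its exit code
    var spawn = require('child_process').spawn;

    var child = spawn('redshift-cli', [
        '--config=some.config.json',                 // hypothetical config file
        '--query=select count(*) from lala_land'
    ], { stdio: 'inherit' });

    child.on('exit', function (code) {
        // non-zero means the query failed
        process.exit(code);
    });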


### Config

Minimal config, enough for querying:

    {
        "connectionString": "[connection string including user and password (obtainable via redshift aws console) - WITHOUT THE DATABASE NAME]",
        "database": "[name of the database]"
    }

Full config, also enabling unload/load operations against S3:

    {
        "connectionString": "[connection string including user and password (obtainable via redshift aws console) - WITHOUT THE DATABASE NAME]",
        "database": "[name of the database]",
        "accessKeyId": "[access key with permissions to create temporary security tokens and to access the s3 bucket where unloads are stored]",
        "secretAccessKey": "[secret access key of the above]",
        "region": "[amazon region to be used (e.g. us-east-1)]",
        "defaultS3BucketName": "[bucket where unloads and loads are stored]",
        "sslEnabled": true
    }

More optional config keys are covered in the optional unload/load config keys section below.

redshift-cli handles its configuration using the awesome [rc](https://www.npmjs.com/package/rc) module.


### Querying

Let's say we have a table foo in database meow:

     id | name | bar
    ----+------+-----
     1  | moo  | zzz
     2  | oom  | eek
     3  | dog  | rrr
    etc...


    redshift> r.query('select * from foo')

will print:

    {
        "id": 1,
        "name": "moo",
        "bar": "zzz"
    }

    // and so on ...

    redshift> r.query('select * from foo', { projection: 'bar' });

will print:

    zzz, eek, rrr

    redshift> r.query('select * from foo', { filename: 'meow.json' });

will write meow.json:

    { "id": 1, "name": "moo", "bar": "zzz" }\n
    { "id": 2, "name": "oom", "bar": "eek" }\n
    { "id": 3, "name": "dog", "bar": "rrr" }\n

Or with projection:

    redshift> r.query('select * from foo', { filename: 'meow.json', projection: 'id' });

will write meow.json:

    1,2,3

    redshift> r.describe();

    // or

    redshift> r.describe('dbname');

    // or

    redshift> r.describe(function(err, results) { /* do something with it */ });

will describe the connected database, a specific database, or pass the results to a callback.
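The callback form presumably follows node's usual `(err, results)` convention. A minimal sketch of consuming it, e.g. from an autorun script (the shape of `results` is not documented here, and we assume `require` is available in the script context):

    // assumption: node-style callback; the shape of results is undocumented,
    // so we just dump whatever comes back for inspection
    var util = require('util');

    r.describe(function (err, results) {
        if (err) return console.error(err);
        console.log(util.inspect(results, false, null));
    });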

 
    redshift> r.showCreate('schema.tablename');

will print a create-table statement for schema.tablename.
 

    redshift> r.queryFromFile('zzz.sql')

will use the content of zzz.sql as the query and print the result.

Another example:

    redshift> r.queryFromFile('zzz.sql', { filename: 'result.json', projection: 'name' })

will use the content of zzz.sql as the query and save the result (the name column only) to result.json.


    redshift> r.activeQueries();

will print a list of active queries with their pids.


    redshift> r.cancelQuery(1232);

will cancel the query with pid 1232.
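A typical sequence, then (1232 here is just the example pid from above; in practice it comes from the activeQueries listing):

    redshift> r.activeQueries();   // find the pid of the runaway query in the printed list
    redshift> r.cancelQuery(1232); // then cancel it by pid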


### Unload and load

    /*
        given config.json contains:
        {
            "database": "meow",
            "defaultS3BucketName": "myunloads"
        }
    */

    redshift> r.unloadData('foo')

will unload to s3://myunloads/meow/foo/[date string in iso format gmt 0 timezone]/data/0001_part... etc etc... using temporary security credentials generated from STS via getSessionToken().

It will also create s3://myunloads/meow/foo/[date string in iso format gmt 0 timezone]/data.sql containing the sql used to select the data for unloading.
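For example, using the timestamp from the searchUnloads output below as an illustrative value, a single unload might produce:

    s3://myunloads/meow/foo/2013-06-11T06:21:01.135Z/data/0000_part_00
    s3://myunloads/meow/foo/2013-06-11T06:21:01.135Z/data/0001_part_00
    ...
    s3://myunloads/meow/foo/2013-06-11T06:21:01.135Z/data.sql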

    redshift> r.unloadData('foo', 'select * from foo order by id', true, ',')

will order by id when unloading, enable gzip and use ',' as the delimiter.
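From the two calls above, the signature appears to be unloadData(table[, sql[, gzip[, delimiter]]]) - an inference on our part, not something this README states explicitly:

    // inferred signature: unloadData(table[, sql[, gzip[, delimiter]]])
    r.unloadData('foo');                                    // whole table, default options
    r.unloadData('foo', 'select * from foo where id > 1');  // custom select (hypothetical query)
    r.unloadData('foo', 'select * from foo', true, '|');    // gzip on, '|' as delimiter (hypothetical)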


    redshift> r.searchUnloads(inspectCallback)

will output:

    {
        meow: {
            foo: {
                '2013-06-11T06:21:01.135Z': [
                    '0000_part_00',
                    '0001_part_00',
                    '0002_part_00',
                    '0003_part_00',
                    '0004_part_00',
                    '0005_part_00',
                    '0006_part_00',
                    '0007_part_00'
                ],
                '2013-06-11T06:20:53.936Z': [
                    '0000_part_00',
                    '0001_part_00',
                    '0002_part_00',
                    '0003_part_00',
                    '0004_part_00',
                    '0005_part_00',
                    '0006_part_00',
                    '0007_part_00'
                ]
            }
        }
    }
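inspectCallback is not defined anywhere in this README; presumably it is a node-style callback that pretty-prints whatever it receives, something along these lines:

    // hypothetical stand-in for the inspectCallback used above
    var util = require('util');

    function inspectCallback(err, results) {
        if (err) return console.error(err);
        console.log(util.inspect(results, false, null));
    }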

    redshift> r.loadData('foo', 'meow/foo/2013-06-11T06:21:01.135Z')

will load the data back into foo. Presumably the gzip and delimiter settings must match those used for the unload (see the optional config keys below).


### Optional unload/load config keys

Use fixed credentials for unload/load operations (instead of temporary ones):

    {
        "overrideS3SecurityCredentials": {
            "AccessKeyId": "[]",
            "SecretAccessKey": "[]"
        }
    }

Use gzip and/or replace the default delimiter:

    {
        "delimiter": ",",
        "gzip": true
    }

##### TODO

- Add data verification for load operations - probably using data saved during unload (select count(*) and such)
- Add auto table generation - need something like pg_dump
- Auto `select starttime, filename, err_reason from stl_load_errors order by starttime desc limit 100` with `filename like ''` when load fails