govpack

0.0.20 • Public • Published


govpack is a tool to help download and explore CKAN datasets

all YO data is belong to us

made for GovHack oz 2014

added a DEMO at http://govpack.github.io/govpack/#68 (or #1, #2, ... #100 etc). It's a copy of a.htm added at /gh-pages/index.html, and it uses the big list of sites from http://instances.ckan.org/. The DEMO shows the hundred or so CKAN endpoints, BUT some of those are on the older v1 and v2 APIs ('api/rest/dataset', 'api/2/rest/dataset') and not the latest v3 'api/3/action/' that govpack fetches (http://docs.ckan.org/en/ckan-1.8/apiv3.html), so some don't show up.
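
The three URL shapes mentioned above can be told apart with a simple path check. This is only a sketch: ckanApiVersion is a hypothetical helper, not part of govpack, and the example.org URL in the usage lines is made up.

```javascript
// Hypothetical helper (not part of govpack): guess the CKAN API version
// from the path of an endpoint URL, using the three shapes named above.
function ckanApiVersion(url) {
  if (/\/api\/3\/action\//.test(url)) return 3;
  if (/\/api\/2\/rest\/dataset/.test(url)) return 2;
  if (/\/api\/rest\/dataset/.test(url)) return 1;
  return null; // unknown shape
}

console.log(ckanApiVersion('http://demo.ckan.org/api/3/action/'));
console.log(ckanApiVersion('http://example.org/api/2/rest/dataset'));
```

Endpoints that come back as 1 or 2 here would be the ones that fail to show up in the DEMO.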

http://hackerspace.govhack.org/content/npm-install-g-govpack-or-github-govpackgovpack

also available on https://www.npmjs.org/package/govpack

npm install -f -g govpack
help wanted, status and fixes required detailed below...

[image: screenshot of a.htm]

govpack is a command line tool (and node module) that seeks out the metadata for ALL available data sets on a given CKAN endpoint, namely.... (X=0|1|2)

0 http://demo.ckan.org/api/3/action/current_package_list_with_resources

1 https://data.qld.gov.au/api/3/action/current_package_list_with_resources

2 https://data.gov.au/api/3/action/current_package_list_with_resources

CLI Usage:

 govpack {fetch:X} --> makes X.js module.exports=BigPackageList
 govpack {filter:X} --> makes X.txt filtered JSONP IIII(filtered_csv_metadata)
 govpack {download:X} --> downloads ./CSV/1.csv, ./CSV/2.csv,,, ./CSV/111.csv
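
For illustration, here is one way a command argument of this shape could be turned into the same kind of object the module takes. It is a sketch only: parseCommand is a hypothetical function, and govpack's real argument handling may differ. Quotes are treated as optional because many shells strip them.

```javascript
// Hypothetical parser (govpack's real parsing may differ): turn a command
// string such as "{filter:0 ,format:'KML'}" into a plain object.
function parseCommand(text) {
  const pattern = /^\s*\{\s*(fetch|filter|download)\s*:\s*(\d+)\s*(?:,\s*format\s*:\s*['"]?(\w+)['"]?\s*)?\}\s*$/;
  const m = pattern.exec(text);
  if (!m) return null; // not a recognised command
  const cmd = {};
  cmd[m[1]] = Number(m[2]); // e.g. { fetch: 0 }
  if (m[3]) cmd.format = m[3]; // optional format, e.g. 'KML'
  return cmd;
}

console.log(parseCommand('{fetch:1}'));
console.log(parseCommand("{filter:0 ,format:'KML'}"));
```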

The commands need to be run in that order because each depends on the previous result. Results are saved in the same folder as index.js, i.e. in your global "./node_modules/govpack/index.js" folder. The downloaded "node_modules/govpack/format/1...n.format" files match up with the metadata in X.txt.
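
The ordering requirement can be sketched from node code as a small callback chain. This assumes the module accepts {filter:X} and {download:X} the same way it accepts {fetch:X}, which this README only confirms for fetch. runInOrder is a hypothetical helper, and the module is passed in as gp so the chain can be tried with a stand-in function.

```javascript
// Sketch: run fetch, filter and download for endpoint X, one after another.
// `gp` is whatever require('govpack') returns (assumed to take an options
// object and a callback, as in the fetch example in this README).
function runInOrder(gp, x, done) {
  const steps = [{ fetch: x }, { filter: x }, { download: x }];
  let i = 0;
  function next() {
    if (i === steps.length) return done();
    gp(steps[i++], next); // start the next step only when this one calls back
  }
  next();
}

// Usage with the real module (untested assumption, see above):
// runInOrder(require('govpack'), 0, function () { console.log('Done!!'); });
```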

Output paths will be improved

note: result paths will get changed to "node_modules/govpack/X/format/1...n.format" and have an option to put the results in a directory of your choice, which will be tidier and better for more ckans etc. With the X moved up to directory level, X.js and X.txt will have a common name like a.txt and b.txt for each.

From your node code:

var GP = require('govpack');
GP({fetch: 0}, function () { console.log('Done!!'); });
.....which returns....->
Please be patient while we fetch from API#0
Downloading:
http://demo.ckan.org/api/3/action/current_package_list_with_resources
SavingAs:
C:/A/N/node_modules/govpack/0.js

{filter:X ,format:'XYZ'}

As an option, you can set the format in the filter step to filter for some other file type:

govpack {filter:0 ,format:'KML'}
txt|xlsx|jpg|json|html|png|pdf|xls|cvs|gif|xml|
rdf|hdf5|kml|pptx|docx|doc|odp|dat|jar|zip|shp|etc

would all be okay format:'XYZ' values to try (case insensitive), but CSV is by far the most popular and is the default.
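
A case-insensitive format filter might look roughly like this. It is a sketch, not govpack's actual filter code: filterByFormat is a hypothetical name, and packages is assumed to be the result array of a CKAN v3 current_package_list_with_resources response, where each package carries a resources list with format and url fields.

```javascript
// Sketch: keep only resources whose format matches, ignoring case.
// Falls back to CSV when no format is given.
function filterByFormat(packages, format) {
  const wanted = String(format || 'CSV').toLowerCase();
  const hits = [];
  packages.forEach(function (pkg) {
    (pkg.resources || []).forEach(function (res) {
      if (String(res.format || '').toLowerCase() === wanted) {
        hits.push({ title: pkg.title, url: res.url, format: res.format });
      }
    });
  });
  return hits;
}

// Made-up sample data, shaped like a CKAN package list.
const sample = [
  { title: 'Parks', resources: [{ format: 'csv', url: 'http://example.org/parks.csv' }] },
  { title: 'Trails', resources: [{ format: 'KML', url: 'http://example.org/trails.kml' }] }
];
console.log(filterByFormat(sample, 'kml'));
```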

a.htm

a.htm (shown in the image above) is the page that uses the JSONP 0.txt, and displays the filtered JSONP metadata generated by govpack from the CKAN records, namely...

  • links to the actual CSV files, (right click and choose Save File As)
  • CSV file size [where available]
  • table heading/description
  • field names (hashed and colourized so all of the same fields light up in the same color)
  • field types
  • column and row counts
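
The field-name colouring in the list above can be approximated with a small hash. This is a hypothetical scheme for illustration; a.htm may well use a different hash and palette.

```javascript
// Hypothetical hashing scheme (a.htm may use a different one): map a field
// name to a hue so identical names always get the same colour.
function colorFor(fieldName) {
  let hash = 0;
  for (let i = 0; i < fieldName.length; i++) {
    hash = (hash * 31 + fieldName.charCodeAt(i)) % 360; // keep it a valid hue
  }
  return 'hsl(' + hash + ', 70%, 80%)';
}

console.log(colorFor('postcode'), colorFor('suburb'), colorFor('postcode'));
```

Because the colour depends only on the name, a field such as postcode gets the same colour in every dataset where it appears.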

a.htm should be useful to look at as a sample of the final output. I wanted to do search and autocomplete on the field names, and this is now possible :-) CKAN also has many GET verbs (including one that does SQL queries), so with our refined JSONP metadata one could generate other ajax calls from a web page to open up the data even further.
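
As a rough sketch of such a follow-up call, the URL for CKAN's datastore_search_sql action can be built like this. It only works on sites that have the DataStore extension enabled; sqlQueryUrl is a hypothetical helper and RESOURCE_ID is a placeholder, not a real resource id.

```javascript
// Sketch: build a GET URL for CKAN's datastore_search_sql action.
// `actionBase` is an 'api/3/action/' endpoint like the ones govpack uses.
function sqlQueryUrl(actionBase, sql) {
  return actionBase + 'datastore_search_sql?sql=' + encodeURIComponent(sql);
}

// RESOURCE_ID is a placeholder; a real query needs a resource id taken
// from the metadata.
console.log(sqlQueryUrl('http://demo.ckan.org/api/3/action/',
  'SELECT * FROM "RESOURCE_ID" LIMIT 5'));
```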

With the power of X (a simple integer as the primary key) more CKAN's can be added

 govpack {filter:X,format:'XLS'}

presently in the source code they are referenced at:

CK[0]={url:'http://demo.ckan.org/api/3/action/'}  // the demo data set as used by the CKAN docs
CK[1]={url:'https://data.qld.gov.au/api/3/action/'} //the state catalog of datasets
CK[2]={url:'https://data.gov.au/api/3/action/'}    //the national catalog of datasets 
CK[99]={url:'https://some_CKAN_action_endpoint/'} // ie ADD some more
// this CK[] array will probably end up in a separate config file

objectified so we can describe them further and add more
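
A minimal sketch of how such an endpoint table could be described further and used to build the request URL. The name field and the listUrl helper are illustrative additions, not taken from govpack's source.

```javascript
// Sketch of the endpoint table as plain objects. The `name` field is an
// illustrative extra, showing how each entry could be described further.
const CK = [];
CK[0] = { name: 'demo', url: 'http://demo.ckan.org/api/3/action/' };
CK[1] = { name: 'qld', url: 'https://data.qld.gov.au/api/3/action/' };
CK[2] = { name: 'au', url: 'https://data.gov.au/api/3/action/' };

// Hypothetical helper: build the package-list URL for endpoint X.
function listUrl(x) {
  if (!CK[x]) throw new Error('Unknown CKAN endpoint #' + x);
  return CK[x].url + 'current_package_list_with_resources';
}

console.log(listUrl(1));
```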

NOW #2 (data.gov.au) is big and FAILS as a single request

 the code has some in progress (INCOMPLETE) calls 
 to fetch it as several paginated sub-requests (todo)
 namely GetBiggerList(x,cb){/*conglomerate paginated package lists*/}
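
One possible shape for the paginated fetch, as a sketch only and not the in-progress code from index.js. It relies on the limit and offset parameters of CKAN's current_package_list_with_resources action. getJson is a hypothetical helper supplied by the caller, and the signature here differs from GetBiggerList(x,cb) so that it can be tried without a network.

```javascript
// Sketch: fetch a big package list in pages and join the results.
// `getJson(url, cb)` is a hypothetical caller-supplied helper that must
// call cb(err, body) with the parsed JSON response.
function getBiggerList(baseUrl, pageSize, getJson, cb) {
  const all = [];
  function page(offset) {
    const url = baseUrl + 'current_package_list_with_resources' +
      '?limit=' + pageSize + '&offset=' + offset;
    getJson(url, function (err, body) {
      if (err) return cb(err);
      const batch = (body && body.result) || [];
      Array.prototype.push.apply(all, batch);
      if (batch.length < pageSize) return cb(null, all); // short page: done
      page(offset + pageSize);
    });
  }
  page(0);
}
```

A short (or empty) page is treated as the end of the list, so a catalog whose size is an exact multiple of pageSize costs one extra request.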

at one stage npm was not making the correct govpack.cmd or shell script

but as someone kindly pointed out the following 2 fixes worked!!

 1) "bin": {"govpack": "index.js"}, /*add to your package.json*/
 2)  and Add 
         #!/usr/bin/env node  
         to the top of your index.js file
funnily enough the shebang is useful on windows!!


"C:/A/N/node.exe" "C:/A/B/2/9/Ax/20/index.js" {fetch:1}
(works for me) but govpack {fetch:1} is better since your paths will vary

index.js has code that should make govpack work as both a Command Line tool AND a module

if(require.main === module){/*Use from the CommandLine*/}
else{module.exports=init/*work as a module*/}

Finally (get me the data)

after having run govpack {fetch:0} and govpack {filter:0} you may also call

govpack {download:0} 

to download the filtered CSV file set to disc
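
The numbered file layout from the CLI usage (./CSV/1.csv, ./CSV/2.csv, ...) can be sketched as a mapping from resource URLs to local paths. downloadPlan is a hypothetical helper, and the folder naming for formats other than CSV is an assumption.

```javascript
// Sketch: pair each filtered resource URL with a numbered local path.
// Folder and extension for non-CSV formats are assumed, not confirmed.
function downloadPlan(urls, format) {
  const fmt = String(format || 'CSV');
  return urls.map(function (url, i) {
    return {
      url: url,
      path: './' + fmt.toUpperCase() + '/' + (i + 1) + '.' + fmt.toLowerCase()
    };
  });
}

// Made-up URLs for illustration.
console.log(downloadPlan(['http://example.org/a.csv', 'http://example.org/b.csv']));
```

Numbering the files in list order is what lets file n be matched back to record n of the metadata in X.txt.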

more endpoints/fixes and additions are welcome

CSV Tables     Are       Cool
but what's?    inside    $1600
col 2 is       ????????  $12
zebra stripes  are neat  $1

now we know

email to govpack@gmail.com

License

BSD
