A tool concentrating on converting CSV data to JSON with a customised parser. A fully featured Node.js CSV to JSON converter.
Thanks to all the contributors.
Version 1.1.0 added new features and optimised lib performance. It also introduced simpler APIs. This readme has therefore been re-written to adopt the preferred new APIs. The lib still supports the old APIs. To review the old readme, please click here.
All changes are backward compatible.
Here is a free online csv to json service utilising the latest csvtojson module.
npm i --save csvtojson
```js
/**
csv file:
a,b,c
1,2,3
4,5,6
*/
const csvFilePath='<path to csv file>'
const csv=require('csvtojson')
csv()
.fromFile(csvFilePath)
.on('json',(jsonObj)=>{
	// combine csv header row and csv line to a json object
})
.on('done',(error)=>{
	console.log('end')
})
```
```js
// const csvReadStream -- Readable stream for csv source
const csv=require('csvtojson')
csv()
.fromStream(csvReadStream)
.on('json',(jsonObj)=>{
	// combine csv header row and csv line to a json object
})
```
$ npm i -g csvtojson
$ csvtojson [options] <csv file path>
Convert csv file and save result to json file:
$ csvtojson source.csv > converted.json
Use multiple cpu-cores:
$ csvtojson --workerNum=4 source.csv > converted.json
Pipe in csv data:
$ cat ./source.csv | csvtojson > converted.json
```js
const csv=require('csvtojson')
const converter=csv(params) //params see below Parameters section
```
`converter` is an instance of Converter, which is a subclass of the node.js `Transform` class.
`require('csvtojson')` returns a constructor function which takes 2 arguments:

1. Parser parameters
2. Stream options

```js
const csv=require('csvtojson')
const converter=csv(parserParameters, streamOptions)
```
Both arguments are optional.
For Stream Options, please read Stream Options in the Node.JS documentation.
parserParameters is a JSON object like:
Following parameters are supported:
All parameters can be used in Command Line tool.
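As a sketch (only parameters that appear elsewhere in this readme are shown; many more are supported), a parserParameters object might look like:

```javascript
// A sketch of a parserParameters object. Only parameters mentioned
// elsewhere in this readme are shown here; many more are supported.
const parserParameters = {
  noheader: false,  // set true if the csv source has no header row
  flatKeys: false,  // set true to disable nested JSON keys
  checkType: false, // now false by default (see note at the end of this readme)
  workerNum: 1      // number of processes used for parsing
};
console.log(JSON.stringify(parserParameters));
```

Pass this object as the first argument of the constructor, e.g. `require('csvtojson')(parserParameters)`.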
The Converter class defines a series of events.
`json` event is emitted for each parsed CSV line. It passes the JSON object and the row number of the CSV line to its callback function.

`csv` event is emitted for each CSV line. It passes an array containing the cell contents of one csv row. `csvRow` is always an array of strings without types.
`csv` event is the fastest parse event, while the `json` and `data` events are about 2 times slower. Thus, if the `csv` event is enough, for best performance just use it without the `json` and `data` events.

`data` event is emitted for each parsed CSV line. It passes a buffer of stringified JSON unless `objectMode` is set to true in the stream option.
`error` event is emitted if any error happens during parsing. Note that if an `error` is emitted, the process will stop, as node.js will automatically `unpipe()` the upper-stream and chained down-streams. This will cause the `end_parsed` event to never be emitted, because the `end` event is only emitted when all data has been consumed.
`record_parsed` event is emitted for each parsed CSV line. It is a combination of the `json` and `csv` events. For better performance, try to use the `json` and `csv` events instead.
`end` event is emitted when all CSV lines have been parsed.
`end_parsed` event is emitted when all CSV lines have been parsed. The only difference between the `end_parsed` and `end` events is that `end_parsed` passes a JSON array containing all the JSON objects. For better performance, try to use the `end` event instead.
`done` event is emitted either after `end` or after `error`. This indicates the processor has stopped.
If any error occurs during parsing, it will be passed to the callback.
The function passed to `preRawData` will be called directly with the string from the upper stream.
The function is called each time a file line is found in the csv stream. `lineIdx` is the line number of that line in the file. The function should return a string back to the processor.
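As a standalone sketch of that hook shape (the hook name `lineHook` here is illustrative, not the library's API name):

```javascript
// Hypothetical line hook: receives the raw file line and its line index,
// and must return a string back to the processor.
function lineHook(fileLine, lineIdx) {
  // e.g. normalise only the header row (line 0), pass other lines through
  return lineIdx === 0 ? fileLine.toLowerCase() : fileLine;
}

console.log(lineHook('NAME,AGE', 0)); // "name,age"
console.log(lineHook('Tom,12', 1));   // "Tom,12"
```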
Transform happens after the CSV has been parsed and before the result is emitted or pushed to downstream. This means that if `jsonObj` is changed, the corresponding field in `csvRow` will not change, and vice versa. The events will emit the changed value and downstream will receive the changed value.
Transform will cause some performance penalties because it voids the optimisation mechanism. Try to use the Node.js `Transform` class as a downstream for transformation instead.
One of the powerful features of `csvtojson` is the ability to convert a csv line to a nested JSON object by correctly defining its csv header row. This is a default, out-of-the-box feature.
Here is an example. Original CSV:
```
fieldA.title, fieldA.children.0.name, fieldA.children.0.id, fieldA.children.1.name, fieldA.children.1.employee.0.name, fieldA.children.1.employee.1.name, fieldA.address.0, fieldA.address.1, description
Food Factory, Oscar, 0023, Tikka, Tim, Joe, 3 Lame Road, Grantstown, A fresh new food factory
Kindom Garden, Ceil, 54, Pillow, Amst, Tom, 24 Shaker Street, HelloTown, Awesome castle
```
The data above contains nested JSON including nested array of JSON objects and plain texts.
Using csvtojson to convert, the result would be like:
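A sketch of the expected shape (assuming default parameters, with cell values kept as strings, so this shows the structure rather than byte-exact output):

```json
[
  {
    "fieldA": {
      "title": "Food Factory",
      "children": [
        { "name": "Oscar", "id": "0023" },
        { "name": "Tikka", "employee": [ { "name": "Tim" }, { "name": "Joe" } ] }
      ],
      "address": [ "3 Lame Road", "Grantstown" ]
    },
    "description": "A fresh new food factory"
  },
  {
    "fieldA": {
      "title": "Kindom Garden",
      "children": [
        { "name": "Ceil", "id": "54" },
        { "name": "Pillow", "employee": [ { "name": "Amst" }, { "name": "Tom" } ] }
      ],
      "address": [ "24 Shaker Street", "HelloTown" ]
    },
    "description": "Awesome castle"
  }
]
```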
To not produce nested JSON, simply set `flatKeys:true` in the parameters.
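The nested-header idea can be sketched like this (an illustration of the concept, not csvtojson's internal implementation):

```javascript
// Expand a dotted csv header path such as 'fieldA.children.0.name' into
// nested objects and arrays. Numeric path segments create array entries.
function setByPath(obj, path, value) {
  const keys = path.split('.');
  let cur = obj;
  for (let i = 0; i < keys.length - 1; i++) {
    const key = keys[i];
    const nextIsIndex = /^\d+$/.test(keys[i + 1]);
    if (cur[key] === undefined) {
      cur[key] = nextIsIndex ? [] : {};
    }
    cur = cur[key];
  }
  cur[keys[keys.length - 1]] = value;
  return obj;
}

const row = {};
setByPath(row, 'fieldA.title', 'Food Factory');
setByPath(row, 'fieldA.children.0.name', 'Oscar');
setByPath(row, 'fieldA.address.0', '3 Lame Road');
console.log(JSON.stringify(row));
// => {"fieldA":{"title":"Food Factory","children":[{"name":"Oscar"}],"address":["3 Lame Road"]}}
```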
`csvtojson` uses the csv header row as the generator of JSON keys. However, it does not require the csv source to contain a header row. There are 4 ways to define header rows:
`noheader:true`. This will automatically add `field1`, `field2` ... `fieldN` headers to the csv cells.
```js
// replace header row (first row) from original source with 'header1, header2'
csv({headers:['header1','header2']})

// original source has no header row. add 'field1' 'field2' ... 'fieldN' as csv header
csv({noheader:true})

// original source has no header row. use 'header1' 'header2' as its header row
csv({noheader:true, headers:['header1','header2']})
```
`csvtojson` has built-in workers to allow CSV parsing to happen in another process, leaving the Main Process non-blocked. This is very useful when dealing with large csv data on a web server, so that parsing CSV will not block the entire server due to node.js being single threaded.
It is also useful when dealing with tons of CSV data on command line. Multi-CPU core support will dramatically reduce the time needed.
To enable multi-cpu core, simply do:
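The original snippet was lost here; as a sketch, multi-core parsing is requested through the `workerNum` parameter described in the Parameters section:

```javascript
// Sketch: request 4 processes in total for parsing via parserParameters,
// then pass it to the constructor, e.g. require('csvtojson')(parserParameters).
const parserParameters = { workerNum: 4 };
console.log(parserParameters.workerNum);
```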
or in command line:
$ csvtojson --workerNum=4
This will create 3 extra workers. The Main process will only be used for delegating data, emitting results, and pushing to downstream. Just keep in mind that those operations on the Main process are not free and will still take a certain amount of CPU time.
See here for how `csvtojson` leverages CPU usage when using multiple cores.
There are some limitations when using the multi-core feature:
`csvtojson` follows the github convention for contributions. Here are some steps:

Run `npm test` locally before pushing code back.
`checkType` is now `false` by default, as it causes problems on some csv docs.