csv-events

0.0.23 • Public • Published


csv-events is a Node.js library for reading CSV files. It features two classes:

  • CSVEventEmitter: a low level event emitter;
  • CSVReader: an application level object stream transformer.

Installation

npm install csv-events

CSVReader

This is an asynchronous CSV parser implemented as a stream.Transform: it consumes a binary readable stream of UTF-8 encoded text and produces a readable object stream.

Each CSV line read produces one output object. The mapping is defined when creating the CSVReader instance.

const {CSVReader} = require ('csv-events')

const csv = new CSVReader ({
//  delimiter: ',',
//  skip: 0,           // header lines
//  rowNumField: '#',  // how to name the line # property
//  rowNumBase: 1,     // what # has the 1st not skipped line
//  empty: null,
    columns: [
      'id',            // 1st column: read as `id`, unquote
      null,            // 2nd column: to ignore
      {
        name: 'name',  // 3rd column: read as `name`
//      raw: true      // if you prefer to leave it quoted
      }, 
    ]
})

myReadUtf8CsvTextStream.pipe (csv)

for await (const {id, name} of csv) {
// do something with `id` and `name` 
}

Constructor Options

| Name | Default value | Description |
|-|-|-|
| columns | | Array of column definitions (see below) |
| delimiter | ',' | Column delimiter |
| skip | 0 | Number of header lines to ignore |
| rowNumField | null | The name of the line # property (null for no numbering) |
| rowNumBase | 1 - skip | The 1st output record line # |
| empty | null | The value corresponding to zero length cell content |
| maxLength | 1e6 | The maximum cell length allowed (to prevent a memory overflow) |

More on columns

Specifying columns is mandatory to create a CSVReader. It must be an array in which every element is:

  • either null (for columns to bypass)
  • or a {name, raw} object
    • that can be shortened to a string name.

names are used as keys when constructing output objects.

The corresponding values are strings, except for zero length cells, for which the value of the empty option (null by default) is used instead.

Normally, those string values come unquoted, but the raw option turns this processing off. This may make sense in two cases:

  • the values read are immediately printed into another CSV stream, so quotes are reused;
  • for data guaranteed to be printed as is, reading raw cell content is slightly faster.

For CSV rows with fewer cells than columns.length, properties may be missing. A CSV consisting of a single \n will be read as a single {} object.

CSVEventEmitter

This is a synchronous CSV parser implemented as an event emitter.

const {CSVEventEmitter} = require ('csv-events')

const ee = new CSVEventEmitter ({
   mask: 0b10 // only the 2nd column (NAME) will be signaled
// delimiter: ',',
// empty: null,
// maxLength: 1e6,
})

const names = []

ee.on ('c', () => {
  if (ee.row !== 0n && ee.column === 1) names.push (ee.value)
})

ee.write ('ID,NAME\r\n')
ee.write ('1,admin\n')
ee.end ('2,user\n') // `names` will be ['admin', 'user']

Incoming data in the form of strings is supplied via the synchronous write and end methods (this API is loosely based on that of StringDecoder), producing a sequence of c ("cell") and r ("row") events.

No event carries any payload, though the parsed content details such as

  • row, column numbers;
  • unquoted cell content

are available via the CSVEventEmitter instance properties. This approach lets the application read selected portions of incoming text avoiding some overhead related to data not in use.

Constructor Options

| Name | Default value | Description |
|-|-|-|
| mask | | Bit mask of required fields |
| delimiter | ',' | Column delimiter |
| empty | null | The value corresponding to zero length cell content |
| maxLength | 1e6 | The maximum buf.length allowed (inherently, the maximum length of write and end arguments) |

Methods

| Name | Description |
|-|-|
| write (s) | Append s to the internal buffer buf and emit all events for its parseable part; leave the last unterminated cell source in buf |
| end (s) | Execute write (s), emit the last events for the rest of buf and, finally, emit 'end' |

Events

| Name | Payload | Description |
|-|-|-|
| c | column | Emitted for each cell whose number satisfies mask when its content is available (via value and raw properties, see below) |
| r | | Emitted for each row completed |
| end | | Emitted by end (s) |

Properties

| Name | Type | Description |
|-|-|-|
| unit | Number or BigInt | 1 or 1n, matching the type of mask |
| row | BigInt | Number of the current row: 0n for the CSV header, if present |
| column | Number | Number of the current column, 0 based |
| index | Number or BigInt | Single bit mask corresponding to column (2**column) |
| buf | String | The internal buffer containing the unparsed portion of the text gathered from write arguments |
| from | Number | Starting position of the current cell in buf |
| to | Number | Ending position of the current cell in buf |
| raw | String | Verbatim copy of buf between from and to, except row delimiters (computed property) |
| value | String | Unquoted raw, replaced with empty for a zero length string (computed property) |

Limitations

Line Breaks

CSVEventEmitter and CSVReader recognize both of the following as line breaks, without any explicit option setting:

  • CRLF ('\r\n', RFC 4180, Windows style);
  • LF ('\n', UNIX style).

There is no way to apply CSVEventEmitter / CSVReader directly to texts with CR-only line breaks (as generated by pre-X Mac OS, Commodore, Amiga etc.), nor are there any plans to implement such compatibility features.

CSV Validity

csv-events makes no attempt to recover data from a broken CSV source. As a result, a single unbalanced double quote causes the rest of the file to be lost.

The best csv-events can do in such a case is to avoid wasting too much memory, by keeping its internal buffer no bigger than maxLength characters.

License

MIT
