tableschema-js

A library for working with Table Schema.

Features

Table class for working with data and schema
Schema class for working with schemas
Field class for working with schema fields
validate function for validating schema descriptors
infer function that creates a schema based on a data sample

Getting started
- Installation
Documentation
API Reference
Contributing
Changelog

Getting started

To use the library with webpack, replicate the node section of this project's webpack.config.js: https://github.com/frictionlessdata/tableschema-js/blob/master/webpack.config.js

Installation

The package uses semantic versioning, which means that major versions could include breaking changes. It's highly recommended to specify a tableschema version range in your package.json file, e.g. tableschema: ^1.0, which npm install --save adds by default.

NPM

$ npm install tableschema

CDN

<script src="//unpkg.com/tableschema/dist/tableschema.min.js"></script>

Documentation

Introduction

Let's start with a simple example for Node.js (the await calls must run inside an async function):

const {Table} = require('tableschema')

const table = await Table.load('data.csv')
await table.infer() // infer a schema
await table.read({keyed: true}) // read the data
await table.schema.save() // save the schema
await table.save() // save the data

And for browser:

https://jsfiddle.net/rollninja/ayngwd38/2/

After the script registration the library will be available as a global variable tableschema:

<!DOCTYPE html>
<html lang="en">
  <head>
    <meta charset="utf-8">
    <title>tableschema-js</title>
  </head>
  <body>
    <script src="//unpkg.com/tableschema/dist/tableschema.min.js"></script>
    <script>
      const main = async () => {
        const table = await tableschema.Table.load('https://raw.githubusercontent.com/frictionlessdata/datapackage-js/master/data/data.csv')
        const rows = await table.read()
        document.body.innerHTML += `<div>${table.headers}</div>`
        for (const row of rows) {
          document.body.innerHTML += `<div>${row}</div>`
        }
      }
      main()
    </script>
  </body>
</html>

Working with Table

A table is a core concept in a tabular data world. It represents data with metadata (Table Schema). Let's see how we could use it in practice.

Suppose we have a local CSV file. The source could also be inline data or a remote link; the Table class supports all of them (except local files when used in the browser, of course). For now, say it's data.csv:

city,location
london,"51.50,-0.11"
paris,"48.85,2.30"
rome,N/A

Let's create and read a table. We use the static Table.load method and the table.read method with the keyed option to get an array of keyed rows:

const table = await Table.load('data.csv')
table.headers // ['city', 'location']
await table.read({keyed: true})
// [
//   {city: 'london', location: '51.50,-0.11'},
//   {city: 'paris', location: '48.85,2.30'},
//   {city: 'rome', location: 'N/A'},
// ]

As we can see, our locations are just strings, but they should be geopoints. Also, Rome's location is not available, yet it's just the string N/A instead of a JavaScript null. First we have to infer a Table Schema:

await table.infer()
table.schema.descriptor
// { fields:
//   [ { name: 'city', type: 'string', format: 'default' },
//     { name: 'location', type: 'geopoint', format: 'default' } ],
//  missingValues: [ '' ] }
await table.read({keyed: true})
// Fails with a data validation error

Let's fix the unavailable location. There is a missingValues property in the Table Schema specification. As a first try, we set missingValues to N/A in table.schema.descriptor. The schema descriptor can be changed in place, but all changes must be committed with table.schema.commit():

table.schema.descriptor['missingValues'] = 'N/A'
table.schema.commit()
table.schema.valid // false
table.schema.errors
// Error: Descriptor validation error:
//   Invalid type: string (expected array)
//    at "/missingValues" in descriptor and
//    at "/properties/missingValues/type" in profile

As good citizens, we've decided to check the schema descriptor's validity, and it's not valid! We should use an array for the missingValues property. Also, don't forget to keep an empty string as a missing value:

table.schema.descriptor['missingValues'] = ['', 'N/A']
table.schema.commit()
table.schema.valid // true

All good. It looks like we're ready to read our data again:

await table.read({keyed: true})
// [
//   {city: 'london', location: [51.50,-0.11]},
//   {city: 'paris', location: [48.85,2.30]},
//   {city: 'rome', location: null},
// ]

Now we see that:

locations are arrays with numeric latitude and longitude
Rome's location is a native JavaScript null
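To make these two observations concrete, here is a minimal sketch of what casting a geopoint cell with missingValues involves. castGeopoint is a hypothetical helper written for this illustration; it is not part of tableschema, and the library's own casting is more thorough.

```javascript
// Hypothetical helper for illustration only; not the tableschema implementation.
function castGeopoint(raw, missingValues = ['']) {
  // Values listed in missingValues become null before any parsing happens.
  if (missingValues.includes(raw)) return null
  const parts = raw.split(',').map((part) => part.trim())
  const invalid = parts.length !== 2 ||
    parts.some((part) => part === '' || Number.isNaN(Number(part)))
  if (invalid) throw new Error(`Cannot cast "${raw}" to a geopoint`)
  return parts.map(Number)
}

const cells = ['51.50,-0.11', '48.85,2.30', 'N/A']
const locations = cells.map((raw) => castGeopoint(raw, ['', 'N/A']))
// locations is [[51.5, -0.11], [48.85, 2.3], null]
```

With the default missing values (only the empty string), the same N/A cell would throw, which matches the validation failure seen before the schema was fixed.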

And because there are no errors on data reading we could be sure that our data is valid against our schema. Let's save it:

await table.schema.save('schema.json')
await table.save('data.csv')

Our data.csv looks the same because it has been stringified back to csv format. But now we have schema.json:

{
    "fields": [
        {
            "name": "city",
            "type": "string",
            "format": "default"
        },
        {
            "name": "location",
            "type": "geopoint",
            "format": "default"
        }
    ],
    "missingValues": [
        "",
        "N/A"
    ]
}

If we decide to improve it even more, we could update the schema file and then open the table again, this time providing a schema path and iterating through the data using Node Streams:

const table = await Table.load('data.csv', {schema: 'schema.json'})
const stream = await table.iter({stream: true})
stream.on('data', (row) => {
  // handle row ['london', [51.50,-0.11]] etc
  // keyed/extended/cast supported in a stream mode too
})

This was only a basic introduction to the Table class. To learn more, take a look at the Table class API reference.

Working with Schema

A model of a schema with helpful methods for working with the schema and supported data. Schema instances can be initialized with a schema source as a URL to a JSON file or a JSON object. The schema is initially validated (see validate below). By default validation errors are stored in schema.errors, but in strict mode they are raised immediately.

Let's create a blank schema. It's not valid because descriptor.fields property is required by the Table Schema specification:

const schema = await Schema.load({})
schema.valid // false
schema.errors
// Error: Descriptor validation error:
//         Missing required property: fields
//         at "" in descriptor and
//         at "/required/0" in profile

To avoid writing a schema descriptor by hand, we will use the schema.infer method to infer the descriptor from the given data:

schema.infer([
  ['id', 'age', 'name'],
  ['1','39','Paul'],
  ['2','23','Jimmy'],
  ['3','36','Jane'],
  ['4','28','Judy'],
])
schema.valid // true
schema.descriptor
//{ fields:
//   [ { name: 'id', type: 'integer', format: 'default' },
//     { name: 'age', type: 'integer', format: 'default' },
//     { name: 'name', type: 'string', format: 'default' } ],
//  missingValues: [ '' ] }
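The idea behind inference can be sketched in a few lines: look at every column of the sample and pick the narrowest type that fits all of its values. The guessType and inferDescriptor functions below are hypothetical and only distinguish integer, number and string; the library's inference covers many more types and formats.

```javascript
// Hypothetical sketch for illustration; not the library's inference algorithm.
function guessType(values) {
  const present = values.filter((value) => value !== '')
  if (present.length === 0) return 'string'
  if (present.every((value) => /^-?\d+$/.test(value))) return 'integer'
  if (present.every((value) => /^-?\d+(\.\d+)?$/.test(value))) return 'number'
  return 'string'
}

function inferDescriptor(rows) {
  // The first row is treated as headers, the rest as the data sample.
  const [headers, ...data] = rows
  const fields = headers.map((name, index) => ({
    name,
    type: guessType(data.map((row) => row[index])),
    format: 'default',
  }))
  return {fields, missingValues: ['']}
}

const descriptor = inferDescriptor([
  ['id', 'age', 'name'],
  ['1', '39', 'Paul'],
  ['2', '23', 'Jimmy'],
])
// descriptor.fields has types integer, integer, string
```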

Now we have an inferred schema and it's valid. We can cast a data row against our schema. We provide string input, and the output is cast accordingly:

schema.castRow(['5', '66', 'Sam'])
// [ 5, 66, 'Sam' ]

But if we pass a missing value to the age field, the cast will fail, because for now the only possible missing value is an empty string. Let's update our schema:

schema.castRow(['6', 'N/A', 'Walt'])
// Cast error
schema.descriptor.missingValues = ['', 'N/A']
schema.commit()
schema.castRow(['6', 'N/A', 'Walt'])
// [ 6, null, 'Walt' ]

We can save the schema to a local file and continue the work at any time by loading it from that file:

await schema.save('schema.json')
const loadedSchema = await Schema.load('schema.json')

This was only a basic introduction to the Schema class. To learn more, take a look at the Schema class API reference.

Working with Field

The Field class represents a field in the schema.

Data values can be cast to native JavaScript types. Casting a value will check the value is of the expected type, is in the correct format, and complies with any constraints imposed by a schema.

{
    'name': 'birthday',
    'type': 'date',
    'format': 'default',
    'constraints': {
        'required': true,
        'minimum': '2015-05-30'
    }
}

The following code will not raise an exception, despite the fact that our date is less than the minimum constraint of the field, because we do not check the constraints of the field descriptor:

var dateType = field.castValue('2014-05-29')

The following example will raise an exception, because we set the 'skip constraints' flag to false and our date is less than the minimum allowed by the field's constraints. An exception is also raised when trying to cast values that are not in a date format, or empty values:

try {
    var dateType = field.castValue('2014-05-29', false)
} catch(e) {
    // uh oh, something went wrong
}

Values that can't be cast will raise an Error exception. Casting a value that doesn't meet the constraints will raise an Error exception.
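The difference between a cast error and a constraint error can be illustrated with a small stand-alone sketch. castDate and its checkConstraints argument are hypothetical names chosen for this example; they do not mirror the signature of field.castValue.

```javascript
// Hypothetical helper for illustration only; not the library's Field class.
function castDate(value, descriptor, checkConstraints) {
  // Step 1: the cast itself. Anything that is not YYYY-MM-DD is rejected.
  const date = /^\d{4}-\d{2}-\d{2}$/.test(value)
    ? new Date(`${value}T00:00:00Z`)
    : new Date(NaN)
  if (Number.isNaN(date.getTime())) {
    throw new Error(`Cannot cast "${value}" to a date`)
  }
  // Step 2: constraints are only enforced when asked for.
  const {minimum} = descriptor.constraints || {}
  if (checkConstraints && minimum && date < new Date(`${minimum}T00:00:00Z`)) {
    throw new Error(`"${value}" is less than the minimum "${minimum}"`)
  }
  return date
}

const birthday = {
  name: 'birthday',
  type: 'date',
  constraints: {required: true, minimum: '2015-05-30'},
}
const unchecked = castDate('2014-05-29', birthday, false) // a Date, constraints skipped
let message = null
try {
  castDate('2014-05-29', birthday, true)
} catch (error) {
  message = error.message // the minimum constraint is violated
}
```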

Available types, formats and resultant value of the cast:

Type	Formats	Casting result
any	default	Any
array	default	Array
boolean	default	Boolean
date	default, any, <PATTERN>	Date
datetime	default, any, <PATTERN>	Date
duration	default	moment.Duration
geojson	default, topojson	Object
geopoint	default, array, object	[Number, Number]
integer	default	Number
number	default	Number
object	default	Object
string	default, uri, email, binary	String
time	default, any, <PATTERN>	Date
year	default	Number
yearmonth	default	[Number, Number]

Working with validate/infer

validate() checks whether a schema is a valid Table Schema according to the specification. It does not validate data against a schema.

Given a schema descriptor, validate returns a Promise that resolves to a validation object:

const {validate} = require('tableschema')

const {valid, errors} = await validate('schema.json')
for (const error of errors) {
  // inspect Error objects
}
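As a rough picture of the {valid, errors} result shape, the sketch below checks just two rules that appear in this document: fields is required, and missingValues must be an array. checkDescriptor is a hypothetical function; the real validate checks the descriptor against the full Table Schema profile.

```javascript
// Hypothetical sketch of the result shape; not the library's validate function.
function checkDescriptor(descriptor) {
  const errors = []
  if (descriptor.fields === undefined) {
    errors.push(new Error('Missing required property: fields'))
  } else if (!Array.isArray(descriptor.fields)) {
    errors.push(new Error('Invalid type for fields (expected array)'))
  }
  if ('missingValues' in descriptor && !Array.isArray(descriptor.missingValues)) {
    errors.push(new Error('Invalid type for missingValues (expected array)'))
  }
  return {valid: errors.length === 0, errors}
}

const blank = checkDescriptor({})
const wrongType = checkDescriptor({fields: [], missingValues: 'N/A'})
const good = checkDescriptor({
  fields: [{name: 'city', type: 'string'}],
  missingValues: ['', 'N/A'],
})
// blank.valid and wrongType.valid are false, good.valid is true
```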

Given a data source and headers, infer will return a Table Schema as a JSON object based on the data values.

Given the data file, example.csv:

id,age,name
1,39,Paul
2,23,Jimmy
3,36,Jane
4,28,Judy

Call infer with the path to the data file:

const {infer} = require('tableschema')

const descriptor = await infer('example.csv')

The descriptor variable is now a JSON object:

{
  fields: [
    {
      name: 'id',
      title: '',
      description: '',
      type: 'integer',
      format: 'default'
    },
    {
      name: 'age',
      title: '',
      description: '',
      type: 'integer',
      format: 'default'
    },
    {
      name: 'name',
      title: '',
      description: '',
      type: 'string',
      format: 'default'
    }
  ]
}

API Reference

Table

Table representation

Table
- instance
  - .headers ⇒ Array.<string>
  - .schema ⇒ Schema
  - .iter(keyed, extended, cast, forceCast, relations, stream) ⇒ AsyncIterator | Stream
  - .read(limit) ⇒ Array.<Array> | Array.<Object>
  - .infer(limit) ⇒ Object
  - .save(target) ⇒ Boolean
- static
  - .load(source, schema, strict, headers, parserOptions) ⇒ Table

table.headers ⇒ `Array.<string>`

Headers

Returns: Array.<string> - data source headers

table.schema ⇒ `Schema`

Schema

Returns: Schema - table schema instance

table.iter(keyed, extended, cast, forceCast, relations, stream) ⇒ `AsyncIterator` | `Stream`

Iterate through the table data

It emits rows cast according to the table schema (usable in an async for loop). With the stream flag, a Node stream is returned instead of an async iterator. Data casting can be disabled.

Returns: AsyncIterator | Stream - async iterator/stream of rows:

[value1, value2] - base
{header1: value1, header2: value2} - keyed
[rowNumber, [header1, header2], [value1, value2]] - extended

Throws:

TableSchemaError raises any error that occurred in this process

Param	Type	Description
keyed	`boolean`	iter keyed rows
extended	`boolean`	iter extended rows
cast	`boolean`	disable data casting if false
forceCast	`boolean`	instead of raising on the first row with a cast error, return an error object in place of the failed row. This allows iterating over the whole data file even if it's not compliant with the schema. Example of output stream: `[['val1', 'val2'], TableSchemaError, ['val3', 'val4'], ...]`
relations	`Object`	object of foreign key references in the form `{resource1: [{field1: value1, field2: value2}, ...], ...}`. If provided, foreign key fields will be checked and resolved to their references
stream	`boolean`	return Node Readable Stream of table rows
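The forceCast behaviour described in the table can be pictured with a small generator. This is only a sketch of the idea: iterRows and the number-only castRow below are hypothetical stand-ins, not the library's methods, and plain Error objects take the place of TableSchemaError.

```javascript
// Hypothetical stand-in for a row cast: every cell must be numeric.
function castRow(row) {
  return row.map((value) => {
    if (value.trim() === '' || Number.isNaN(Number(value))) {
      throw new Error(`Cannot cast "${value}" to a number`)
    }
    return Number(value)
  })
}

// Yields cast rows; with forceCast a failed row is replaced by its error.
function* iterRows(rows, {forceCast = false} = {}) {
  for (const row of rows) {
    try {
      yield castRow(row)
    } catch (error) {
      if (!forceCast) throw error
      yield error
    }
  }
}

const source = [['1', '2'], ['x', '4'], ['5', '6']]
const output = [...iterRows(source, {forceCast: true})]
// output is [[1, 2], Error, [5, 6]]
```

Without forceCast the same source stops at the second row, because the first cast error is thrown to the caller.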

table.read(limit) ⇒ `Array.<Array>` | `Array.<Object>`

Read the table data into memory

The API is the same as that of table.iter, except for:

Returns: Array.<Array> | Array.<Object> - list of rows:

[value1, value2] - base
{header1: value1, header2: value2} - keyed
[rowNumber, [header1, header2], [value1, value2]] - extended

Param	Type	Description
limit	`integer`	limit of rows to read

table.infer(limit) ⇒ `Object`

Infer a schema for the table.

It will infer and set Table Schema to table.schema based on table data.

Returns: Object - Table Schema descriptor

Param	Type	Description
limit	`number`	limit rows sample size

table.save(target) ⇒ `Boolean`

Save data source to file locally in CSV format with , (comma) delimiter

Returns: Boolean - true on success

Throws:

TableSchemaError an error if there is a saving problem

Param	Type	Description
target	`string`	path where to save a table data

Table.load(source, schema, strict, headers, parserOptions) ⇒ `Table`

Factory method to instantiate Table class.

This method is async and it should be used with await keyword or as a Promise. If references argument is provided foreign keys will be checked on any reading operation.

Returns: Table - data table class instance

Throws:

TableSchemaError raises any error that occurred in the table creation process

Param	Type	Description
source	`string` \| `Array.<Array>` \| `Stream` \| `function`	data source (one of): - local CSV file (path) - remote CSV file (url) - array of arrays representing the rows - readable stream with CSV file contents - function returning readable stream with CSV file contents
schema	`string` \| `Object`	data schema in all forms supported by `Schema` class
strict	`boolean`	strictness option to pass to `Schema` constructor
headers	`number` \| `Array.<string>`	data source headers (one of): - row number containing headers (`source` should contain headers rows) - array of headers (`source` should NOT contain headers rows)
parserOptions	`Object`	options to be used by CSV parser. All options listed at https://csv.js.org/parse/options/. By default `ltrim` is true according to the CSV Dialect spec.

Schema

Schema representation

Schema
- instance
  - .valid ⇒ Boolean
  - .errors ⇒ Array.<Error>
  - .descriptor ⇒ Object
  - .primaryKey ⇒ Array.<string>
  - .foreignKeys ⇒ Array.<Object>
  - .fields ⇒ Array.<Field>
  - .fieldNames ⇒ Array.<string>
  - .getField(fieldName) ⇒ Field | null
  - .addField(descriptor) ⇒ Field
  - .removeField(name) ⇒ Field | null
  - .castRow(row, failFast) ⇒ Array.<Array>
  - .infer(rows, headers) ⇒ Object
  - .commit(strict) ⇒ Boolean
  - .save(target) ⇒ boolean
- static
  - .load(descriptor, strict) ⇒ Schema

schema.valid ⇒ `Boolean`

Validation status

It is always true in strict mode.

Returns: Boolean - returns validation status

schema.errors ⇒ `Array.<Error>`

Validation errors

It is always empty in strict mode.

Returns: Array.<Error> - returns validation errors

schema.descriptor ⇒ `Object`

Descriptor

Returns: Object - schema descriptor

schema.primaryKey ⇒ `Array.<string>`

Primary Key

Returns: Array.<string> - schema primary key

schema.foreignKeys ⇒ `Array.<Object>`

Foreign Keys

Returns: Array.<Object> - schema foreign keys

schema.fields ⇒ `Array.<Field>`

Fields

Returns: Array.<Field> - schema fields

schema.fieldNames ⇒ `Array.<string>`

Field names

Returns: Array.<string> - schema field names

schema.getField(fieldName) ⇒ `Field` | `null`

Return a field

Returns: Field | null - field instance if exists

Param	Type
fieldName	`string`

schema.addField(descriptor) ⇒ `Field`

Add a field

Returns: Field - added field instance

Param	Type
descriptor	`Object`

schema.removeField(name) ⇒ `Field` | `null`

Remove a field

Returns: Field | null - removed field instance if exists

Param	Type
name	`string`

schema.castRow(row, failFast) ⇒ `Array.<Array>`

Cast row based on field types and formats.

Returns: Array.<Array> - cast data row

Param	Type	Description
row	`Array.<Array>`	data row as an array of values
failFast	`boolean`

schema.infer(rows, headers) ⇒ `Object`

Infer and set schema.descriptor based on data sample.

Returns: Object - Table Schema descriptor

Param	Type	Description
rows	`Array.<Array>`	array of arrays representing rows
headers	`integer` \| `Array.<string>`	data sample headers (one of): - row number containing headers (`rows` should contain headers rows) - array of headers (`rows` should NOT contain headers rows) - defaults to 1
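The two forms of the headers parameter can be illustrated with a short helper. splitHeaders is a hypothetical function written for this example, and treating every row up to the header row as non-data is an assumption of the sketch, not documented library behaviour.

```javascript
// Hypothetical helper for illustration only; not part of tableschema.
function splitHeaders(rows, headers = 1) {
  // An array of headers means the sample contains data rows only.
  if (Array.isArray(headers)) return {headers, data: rows}
  // Otherwise headers is a 1-based row number inside the sample.
  return {headers: rows[headers - 1], data: rows.slice(headers)}
}

const withHeaderRow = splitHeaders([['id', 'name'], ['1', 'Paul']])
const withExplicitHeaders = splitHeaders([['1', 'Paul']], ['id', 'name'])
// Both results have headers ['id', 'name'] and data [['1', 'Paul']]
```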

schema.commit(strict) ⇒ `Boolean`

Update schema instance if there are in-place changes in the descriptor.

Returns: Boolean - returns true on success and false if not modified

Throws:

TableSchemaError raises any error that occurred in the process

Param	Type	Description
strict	`boolean`	alter `strict` mode for further work

Example

const descriptor = {fields: [{name: 'field', type: 'string'}]}
const schema = await Schema.load(descriptor)

schema.getField('field').type // string
schema.descriptor.fields[0].type = 'number'
schema.getField('field').type // still string, the change is not committed yet
schema.commit()
schema.getField('field').type // number
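One way to picture the commit semantics is a model that keeps a snapshot of the descriptor and only refreshes it on commit. CommittedDescriptor is a hypothetical class written for this illustration and says nothing about how the library implements Schema internally.

```javascript
// Hypothetical model of "edit in place, then commit"; not the library's Schema.
class CommittedDescriptor {
  constructor(descriptor) {
    this.descriptor = descriptor
    this.commit()
  }

  // Returns true if the descriptor changed since the last commit, else false.
  commit() {
    const serialized = JSON.stringify(this.descriptor)
    if (serialized === this.snapshot) return false
    this.snapshot = serialized
    this.committed = JSON.parse(serialized)
    return true
  }

  getFieldType(name) {
    const field = this.committed.fields.find((item) => item.name === name)
    return field ? field.type : null
  }
}

const model = new CommittedDescriptor({fields: [{name: 'field', type: 'string'}]})
const seen = [model.getFieldType('field')]
model.descriptor.fields[0].type = 'number'
seen.push(model.getFieldType('field')) // still the committed value
const changed = model.commit()
seen.push(model.getFieldType('field'))
const unchanged = model.commit()
// seen is ['string', 'string', 'number']; changed is true, unchanged is false
```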

schema.save(target) ⇒ `boolean`

Save schema descriptor to target destination.

Returns: boolean - returns true on success

Throws:

TableSchemaError raises any error that occurred in the process

Param	Type	Description
target	`string`	path where to save a descriptor

Schema.load(descriptor, strict) ⇒ `Schema`

Factory method to instantiate Schema class.

This method is async and it should be used with await keyword or as a Promise.

Returns: Schema - returns schema class instance

Throws:

TableSchemaError raises any error that occurred in the process

Param	Type	Description
descriptor	`string` \| `Object`	schema descriptor: - local path - remote url - object
strict	`boolean`	flag to alter validation behaviour: - if false, errors will not be raised and all errors will be collected in `schema.errors` - if true, any validation error will be raised immediately

Field

Field representation

Field
- new Field(descriptor, missingValues)
- .name ⇒ string
- .type ⇒ string
- .format ⇒ string
- .required ⇒ boolean
- .constraints ⇒ Object
- .descriptor ⇒ Object
- .castValue(value, constraints) ⇒ any
- .testValue(value, constraints) ⇒ boolean

new Field(descriptor, missingValues)

Constructor to instantiate Field class.

Returns: Field - returns field class instance

Throws:

TableSchemaError raises any error that occurred in the process

Param	Type	Description
descriptor	`Object`	schema field descriptor
missingValues	`Array.<string>`	an array with string representing missing values

field.name ⇒ `string`

Field name

field.type ⇒ `string`

Field type

field.format ⇒ `string`

Field format

field.required ⇒ `boolean`

Return true if field is required

field.constraints ⇒ `Object`

Field constraints

field.descriptor ⇒ `Object`

Field descriptor

field.castValue(value, constraints) ⇒ `any`

Cast value

Returns: any - cast value

Param	Type	Description
value	`any`	value to cast
constraints	`Object` \| `false`

field.testValue(value, constraints) ⇒ `boolean`

Check if value can be cast

Param	Type	Description
value	`any`	value to test
constraints	`Object` \| `false`

validate(descriptor) ⇒ `Object`

This function is async so it has to be used with await keyword or as a Promise.

Returns: Object - returns {valid, errors} object

Param	Type	Description
descriptor	`string` \| `Object`	schema descriptor (one of): - local path - remote url - object

infer(source, headers, options) ⇒ `Object`

This function is async so it has to be used with await keyword or as a Promise.

Returns: Object - returns schema descriptor

Throws:

TableSchemaError raises any error that occurred in the process

Param	Type	Description
source	`string` \| `Array.<Array>` \| `Stream` \| `function`	data source (one of): - local CSV file (path) - remote CSV file (url) - array of arrays representing the rows - readable stream with CSV file contents - function returning readable stream with CSV file contents
headers	`Array.<string>`	array of headers
options	`Object`	any `Table.load` options

DataPackageError

Base class for all DataPackage/TableSchema errors.

If there is more than one error, you can get additional information from the error object:

try {
  // some lib action
} catch (error) {
  console.log(error) // you have N cast errors (see error.errors)
  if (error.multiple) {
    for (const nestedError of error.errors) {
        console.log(nestedError) // cast error M is ...
    }
  }
}
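The nested error shape can be tried out without the library using a small stand-in class. MultiError and summarize are hypothetical names for this sketch; only the message, errors and multiple members come from the documented interface.

```javascript
// Hypothetical stand-in that mirrors the documented shape; not the library class.
class MultiError extends Error {
  constructor(message, errors = []) {
    super(message)
    this.errors = errors
  }

  // True when the error wraps at least one nested error.
  get multiple() {
    return this.errors.length > 0
  }
}

// Collects the top-level message followed by indented nested messages.
function summarize(error) {
  const lines = [error.message]
  if (error.multiple) {
    for (const nested of error.errors) lines.push(`  ${nested.message}`)
  }
  return lines
}

const failure = new MultiError('There are 2 cast errors', [
  new Error('row 2: bad integer'),
  new Error('row 3: bad date'),
])
const lines = summarize(failure)
// lines is ['There are 2 cast errors', '  row 2: bad integer', '  row 3: bad date']
```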

DataPackageError
- new DataPackageError(message, errors)
- .multiple ⇒ boolean
- .errors ⇒ Array.<Error>

new DataPackageError(message, errors)

Create an error

Param	Type	Description
message	`string`
errors	`Array.<Error>`	nested errors

dataPackageError.multiple ⇒ `boolean`

Whether it's nested

dataPackageError.errors ⇒ `Array.<Error>`

List of errors

TableSchemaError

Base class for all TableSchema errors.

Contributing

The project follows the Open Knowledge International coding standards. There are common commands to work with the project:

$ npm install
$ npm run test
$ npm run build

Changelog

Only breaking and the most important changes are described here. The full changelog and documentation for all released versions can be found in the nicely formatted commit history.

v1.12

Added support for special numeric values: NaN, INF, -INF

v1.11

Improved date/time validation using a conversion table and moment.js (#170)

v1.10

Rebased on csv-parse@4

v1.9

Fix bug:

URI format must have the scheme protocol to be valid (#135)

v1.8

Improved behaviour:

Automatically detect the CSV delimiter if one isn't explicitly set

v1.7

New API added:

added forceCast flag to the table.iter/read methods

v1.6

Improved behaviour:

improved validation of string and geojson types
added heuristics to the infer function

v1.5

New API added:

added format option to the Table constructor
added encoding option to the Table constructor

v1.4

Improved behaviour:

The infer functions now support format inference

v1.3

New API added:

error.rowNumber if available
error.columnNumber if available

v1.2

New API added:

Table.load and infer now accept Node Stream as a source argument

v1.1

New API added:

Table.load and infer now accept parserOptions

v1.0

This version includes various big changes, including a move to asynchronous inference.

v0.2

First stable version of the library.
