A data structure for representing tabular data.
Intended for use as:
- the data structure generated by loading data into the browser,
- the input and output type for data transformations (e.g. filtering and aggregation), and
- the input type for data visualizations.
This data structure accommodates both relational data tables and aggregated multidimensional data.
The purpose of this data structure is to serve as a common data table representation for use in the Chiasm project. By using this data structure, components for data access, data transformation, and interactive visualization can be interoperable.
What Problem Does This Solve?
Most of the D3-based data visualization examples are organized such that the data-specific logic is intertwined with data visualization logic. This is a barrier that makes it more difficult to adapt existing visualization examples to new data, or to create reusable visualization components.
As another example, consider the Heatmap Example. This example has the following lines:
    // The size of the buckets in the CSV data file.
    // This could be inferred from the data if it weren't sparse.
    var xStep = 864e5,
        yStep = 100;
In addition, it is useful to explicitly represent the type of each column so that columns can be checked for compatibility with the various "shelves" of visualization components, such as shapeColumn. These shelves correspond to mappings from data columns (also called "variables", "fields", or "attributes") to visual marks and channels. This enables user interfaces that are aware of column type restrictions for certain visualizations, such as dropdown menus restricted by column type, or drag & drop interfaces that know where a given column can and cannot be dropped.
Visual Marks and Channels Diagram from Munzner: Visualization Analysis and Design
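As a rough sketch of how explicit column types could drive such a user interface, the following filters the columns of a dataset down to those that a given shelf accepts. The shelfTypes table, the xColumn shelf, and the compatibleColumns function are hypothetical names introduced for illustration and are not part of any Chiasm API. Only the metadata.columns layout and the three type strings come from this document.

```javascript
// Hypothetical table of which column types each shelf accepts.
// These restrictions are illustrative assumptions, not rules defined by Chiasm.
var shelfTypes = {
  shapeColumn: ["string"],
  xColumn: ["number", "date", "string"]
};

// Returns the names of the columns that may be placed on the given shelf.
function compatibleColumns(dataset, shelf) {
  var allowed = shelfTypes[shelf] || [];
  return dataset.metadata.columns
    .filter(function (column) {
      return allowed.indexOf(column.type) !== -1;
    })
    .map(function (column) {
      return column.name;
    });
}

var example = {
  data: [],
  metadata: {
    columns: [
      { name: "name", type: "string" },
      { name: "age", type: "number" },
      { name: "birthday", type: "date" }
    ]
  }
};

// Only the string column qualifies for the (assumed) shapeColumn shelf.
console.log(compatibleColumns(example, "shapeColumn")); // [ 'name' ]
```

A dropdown menu or drop target could use a list like this to offer only the columns that fit.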
    var ChiasmDataset = ...;

    var dataset = {
      data: [
        { name: "Joe", age: 29, birthday: new Date(1986, 11, 17) },
        { name: "Jane", age: 31, birthday: new Date(1985, 1, 15) }
      ],
      metadata: {
        columns: [
          { name: "name", type: "string" },
          { name: "age", type: "number" },
          { name: "birthday", type: "date" }
        ]
      }
    };

    var promise = ChiasmDataset...;
    promise...;
Data Structure Reference
The isCube property is set to true if this dataset contains aggregated multidimensional data where each row represents a cell of a data cube. This value may be omitted or set to false if this dataset contains a relational data table where each row represents an individual entity or event.
When set to true, each row contains values for dimensions and measures. In this case, each column is interpreted as either a dimension or a measure, depending on the value of the isDimension property.
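To make the distinction concrete, here is a small sketch of an aggregated dataset and a helper that separates its dimensions from its measures. It assumes that isCube sits inside metadata next to columns, which this document does not state explicitly. The column names, the data values, and the splitColumns function are made up for illustration.

```javascript
// Assumption: isCube is stored in metadata alongside columns.
// The rows and column names below are invented sample values.
var cube = {
  data: [
    { ageBin: 20, count: 12 },
    { ageBin: 30, count: 7 }
  ],
  metadata: {
    isCube: true,
    columns: [
      { name: "ageBin", label: "Age", type: "number", isDimension: true, interval: 10 },
      { name: "count", label: "Count", type: "number" }
    ]
  }
};

// Splits the column names of a cube into dimensions and measures.
// For a relational table (isCube omitted or false) both lists stay empty.
function splitColumns(dataset) {
  var result = { dimensions: [], measures: [] };
  if (!dataset.metadata.isCube) {
    return result;
  }
  dataset.metadata.columns.forEach(function (column) {
    var target = column.isDimension ? result.dimensions : result.measures;
    target.push(column.name);
  });
  return result;
}

console.log(splitColumns(cube)); // { dimensions: [ 'ageBin' ], measures: [ 'count' ] }
```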
The columns property is an array of column descriptor objects. Each of these objects must have the properties name, label, and type. The order of these objects may be used in visualizations (e.g. to define the order of axes in parallel coordinates, or the order of columns in an Excel-like table representation).
The name of the column. This corresponds to the keys in each row object of dataset.data.
The label of the column. This is a human-readable string that may be used in user interface elements that represent the column such as column selection widgets or axis labels in visualizations.
The type of the column. This is a string, and must be either "number", "string", or "date".
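The requirements on column descriptors can be expressed as a simple check. The following is only a sketch of such a check, written from the rules stated above; it is not the validation logic of the library itself, and the checkColumns function and its error messages are hypothetical.

```javascript
// The three column types named in this document.
var VALID_TYPES = ["number", "string", "date"];

// Returns an array of error messages. An empty array means that every
// descriptor has string-valued name, label, and type properties and
// that each type is one of the supported values.
function checkColumns(columns) {
  var errors = [];
  columns.forEach(function (column, i) {
    ["name", "label", "type"].forEach(function (property) {
      if (typeof column[property] !== "string") {
        errors.push("Column " + i + " is missing the \"" + property + "\" property.");
      }
    });
    if (typeof column.type === "string" && VALID_TYPES.indexOf(column.type) === -1) {
      errors.push("Column " + i + " has unsupported type \"" + column.type + "\".");
    }
  });
  return errors;
}

// A well-formed descriptor produces no errors.
console.log(checkColumns([{ name: "age", label: "Age", type: "number" }])); // []
```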
If the isDimension property is set to true, then this column represents a data cube dimension. This property is only relevant if isCube is set to true.
If the column represents a dimension and is of type "number" or "date", then it is assumed to represent the result of binned aggregation. In this case, the interval property must be defined.
The interval between bins. This property is only relevant if:
- the dataset is aggregated (isCube == true),
- the column is a dimension (isDimension is true), and
- the column type is either "number" or "date".
If the column type is "number", then this property is expected to be a number. This is the width of the numeric bins used, for example, in a histogram or heat map. Only fixed-width numeric binning is supported (variable width binning is not supported at this time).
If the column type is "date", then this property is expected to be a string corresponding to one of the interval types defined in d3-time. This includes, for example, "minute", "hour", "day", "week", "month", and "year".
Variable width intervals are planned to be supported in the future.
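For numeric columns, fixed-width binning can be sketched as follows. This is one common way to compute bins, shown under the assumption that each bin is identified by its lower edge; the document does not say which edge the aggregated values represent. The binStart and histogram functions are hypothetical helpers, and date columns are not covered because they need calendar-aware intervals.

```javascript
// Returns the lower edge of the fixed-width bin that contains value.
// Assumption: bins are identified by their lower edge.
function binStart(value, interval) {
  return Math.floor(value / interval) * interval;
}

// Counts the rows that fall into each bin of the given numeric column.
// Produces one output row per non-empty bin.
function histogram(rows, columnName, interval) {
  var counts = {};
  rows.forEach(function (row) {
    var bin = binStart(row[columnName], interval);
    counts[bin] = (counts[bin] || 0) + 1;
  });
  return Object.keys(counts).map(function (bin) {
    var out = { count: counts[bin] };
    out[columnName] = Number(bin);
    return out;
  });
}

// Invented sample rows: one age in the twenties, two in the thirties.
var rows = [{ age: 29 }, { age: 31 }, { age: 35 }];
console.log(histogram(rows, "age", 10));
// [ { count: 1, age: 20 }, { count: 2, age: 30 } ]
```

In a dataset produced this way, the age column would be a dimension with interval set to 10, and count would be a measure.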
The domain of this column. This corresponds to the notion of domain in D3 scales.
If the column type is "string", then the domain is an array of unique string values that occur in the column. The ordering of these strings will determine, for example, the position of bars in a bar chart, or the order in which values are mapped to colors using color scales.
If the column type is "number", then the domain is an array containing two numbers, the minimum (domain[0]) and the maximum (domain[1]). This is how numeric domains are typically represented in D3 numeric scales.
If the column type is "date", then the domain is an array containing two Date objects, the minimum (domain[0]) and the maximum (domain[1]). This is how temporal domains are typically represented in D3 time scales.
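The three cases above could be computed from the rows of a dataset roughly as follows. This is a sketch, not the library's implementation: the computeDomain function is hypothetical, and ordering string values by first appearance is an assumption, since the document only says that the ordering matters.

```javascript
// Computes a domain for one column from the rows of a dataset.
function computeDomain(rows, column) {
  var values = rows.map(function (row) {
    return row[column.name];
  });

  if (column.type === "string") {
    // Unique values, kept in order of first appearance (an assumption).
    return values.filter(function (value, i) {
      return values.indexOf(value) === i;
    });
  }

  // "number" and "date": [minimum, maximum].
  // Subtracting Date objects compares their timestamps.
  var sorted = values.slice().sort(function (a, b) {
    return a - b;
  });
  return [sorted[0], sorted[sorted.length - 1]];
}

var people = [
  { name: "Joe", age: 29, birthday: new Date(1986, 11, 17) },
  { name: "Jane", age: 31, birthday: new Date(1985, 1, 15) }
];

console.log(computeDomain(people, { name: "age", type: "number" })); // [ 29, 31 ]
console.log(computeDomain(people, { name: "name", type: "string" })); // [ 'Joe', 'Jane' ]
```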
The following libraries touch upon some related aspects, such as data transformation and data-to-visualization mapping:
- Datalib: A powerhouse for parsing and transforming data.
- Voyager: This project implements automatic mapping from data to visualization, based on explicitly represented knowledge of compatibilities between column types and visual encodings.
- Plywood: This can parse "split-apply-combine" data transformation expressions and evaluate them in memory or in a database.
The overall goal of this project is to serve as the core data structure exchanged between Chiasm components for representing tabular data. The following issues comprise the roadmap: