Parse, tag and query xlsx-data

This package is used to parse structured tabular data from CSV or XLSX. Stored data can be browsed and transformed into a desired two-dimensional result table.

Its primary purpose is to deliver data for pptx-automizer. This project is Work in progress. Nevertheless, you might already use automizer-data to handle table files coming from statistical analytics software. The example-xlsx in __test__/data-folder is based on GESStabs.

Storage and querying is done with Prisma ORM tools.

Installation

As a package

If you are working on an existing project, you can add automizer-data to it using npm or yarn. Run

$ yarn add automizer-data

$ npm install automizer-data

in the root folder of your project. This will download and install the most recent version into your existing project.

As a cloned repository

If you want to see how it works and you like to run own tests, you should clone this repository and install the dependencies:

$ git clone git@github.com:singerla/automizer-data.git my-project
$ cd my-project
$ yarn install
$ yarn prisma generate

Prisma studio

You can open prisma studio and take a look at the data:

$ yarn prisma studio

A lot of good stuff can be found at prisma.io.

Import Data

According to parser's configuration, parsed data will sliced, tagged and separated into two-dimensional tables.

The Database contains:

Categories: Generic nouns to describe the basic structure of your project
Tags: Values of a certain category
Sheets: Two-dimensional tables and their additional info

Each Sheet will contain:

a collection of rows
a collection of columns
the two-dimensional table body
a collection of tags
a collection of metadata that came along with the sheet

Example usage

import { PrismaClient } from '@prisma/client'
import { Parser, Store } from '../src/index';
import { ParserOptions, Tagger, RawResultInfo } from "../src/types";

const store = new Store(
  new PrismaClient()
)

const config = <ParserOptions> {
  // This string separates tables if found in Column A
  separator: 'Table Separator',
  // A row that fits to any of the strings below will be
  // separated into "meta"-field if found in Col A
  metaMap: {
    base: ['BASE'],
    topBox: ['Top-2-Box (1-2)'],
    bottomBox: ['Bottom-2-Box (4-5)'],
    mean: ['Mean Value']
  },
  // Rows that equal to one of the labels below will be skipped.
  skipRows: [
    'company',
    '* Annotation',
    '(Sum of answers)'
  ],
  // A callback function to be applied to every body row
  renderRow: (row: string[]): (number|null)[] => {
    return row.map(cell => {
      if(cell === ' ') return null
      else return Number(cell)
    })
  },
  // The info array of each sub-table will be passed to this
  // callback. Tagging can be fine tuned here.
  renderTags: (info: RawResultInfo[], tags: Tagger): void => {
    info.forEach((info, level) => {
      let cat
      // info.key contains the info string's original section
      // could be 'body' or 'info'
      if(info.key === 'body') {
        cat = 'vartitle'
      } else if(info.value.indexOf('- ') === 0) {
        cat = 'measure'
      } else if(level === 0) {
        // We strip the table separator and pass CountryName
        info.value = info.value.replace('Table Separator – ', '')
        cat = 'country'
      } else if(level === 1) {
        cat = 'variable'
      } else if(level === 2) {
        cat = 'questionText'
      }
      if(cat) {
        tags.push(cat, info.value)
      }
    })
  },
}

const parse = new Gesstabs(config)
const file = `${__dirname}/data/test-data.xlsx`
const datasheets = await parse.fromXlsx(filename)
const summary = await store.run(datasheets)

Intermediate JSON

Xlsx-Parser will tranform tabular data into an intermediate JSON object. The closer your input data comes to this format, the easier it will be to implement a new parser type.

{
  "tags": [
    {
      "category": "country",
      "value": "Norway"
    },
    {
      "category": "variable",
      "value": "Q12"
    },
    {
      "category": "category",
      "value": "Bar soap"
    },
    {
      "category": "subgroup",
      "value": "Age"
    },
    {
      "category": "measure",
      "value": "nominal"
    }
  ],
  "columns": ["Total", "19-29", "30-39", "40-69"],
  "rows": ["answer 1", "answer 2", "answer 3"],
  "data": [
    [29,18,36,12],
    [39,19,24,11],
    [19,28,46,10]
  ],
  "meta": {
    "significance": [
      [null,null,"h",null],
      [null,"h","l",null],
      [null,null,"h","l"]
    ]
  }
}

Query Data

As all the Sheets are tagged, our queries will use tags to find the desired datasets.

import { getData, Store } from '../src';
import { all } from '../src/filter';
import { value } from '../src/cell';

// A selector is an array of tags.
const selector = [
    {
      category: 'country',
      value: 'Norway'
    },
    {
      category: "variable",
      value: "Q12"
    }
]

// The grid will define rows, cols and a callback
// to run inside a target cell.
const grid = {
    rows: all('row'),
    columns: all('column'),
    cell: value
}

const result = await getData(selector, grid)

// automizer-data will convert the result directly into
// a pptx-automizer-object. 
const chartData = result.toSeriesCategories()

automizer-data

Parse, tag and query xlsx-data

Installation

As a package

As a cloned repository

Prisma studio

Import Data

Example usage

Intermediate JSON

Query Data

Readme

Keywords

Package Sidebar

Install

Repository

Homepage

Weekly Downloads

Version

License

Unpacked Size

Total Files

Last publish

Collaborators

automizer-data

Parse, tag and query xlsx-data

Installation

As a package

As a cloned repository

Prisma studio

Import Data

Example usage

Intermediate JSON

Query Data

Readme

Keywords

Package Sidebar

Install

Repository

Homepage

DownloadsWeekly Downloads

Version

License

Unpacked Size

Total Files

Last publish

Collaborators

Weekly Downloads