Newly Potted Mandrakes

    automizer-data
    TypeScript icon, indicating that this package has built-in type declarations

    0.1.2 • Public • Published

    Parse, tag and query xlsx-data

    This package is used to parse structured tabular data from CSV or XLSX. Stored data can be browsed and transformed into a desired two-dimensional result table.

    Its primary purpose is to deliver data for pptx-automizer. This project is Work in progress. Nevertheless, you might already use automizer-data to handle table files coming from statistical analytics software. The example-xlsx in __test__/data-folder is based on GESStabs.

    Storage and querying is done with Prisma ORM tools.

    Installation

    As a package

    If you are working on an existing project, you can add automizer-data to it using npm or yarn. Run

    $ yarn add automizer-data
    

    or

    $ npm install automizer-data
    

    in the root folder of your project. This will download and install the most recent version into your existing project.

    As a cloned repository

    If you want to see how it works and you like to run own tests, you should clone this repository and install the dependencies:

    $ git clone git@github.com:singerla/automizer-data.git my-project
    $ cd my-project
    $ yarn install
    $ yarn prisma generate
    

    Prisma studio

    You can open prisma studio and take a look at the data:

    $ yarn prisma studio
    

    A lot of good stuff can be found at prisma.io.

    Import Data

    According to parser's configuration, parsed data will sliced, tagged and separated into two-dimensional tables.

    The Database contains:

    • Categories: Generic nouns to describe the basic structure of your project
    • Tags: Values of a certain category
    • Sheets: Two-dimensional tables and their additional info

    Each Sheet will contain:

    • a collection of rows
    • a collection of columns
    • the two-dimensional table body
    • a collection of tags
    • a collection of metadata that came along with the sheet

    Example usage

    import { PrismaClient } from '@prisma/client'
    import { Parser, Store } from '../src/index';
    import { ParserOptions, Tagger, RawResultInfo } from "../src/types";
    
    const store = new Store(
      new PrismaClient()
    )
    
    const config = <ParserOptions> {
      // This string separates tables if found in Column A
      separator: 'Table Separator',
      // A row that fits to any of the strings below will be
      // separated into "meta"-field if found in Col A
      metaMap: {
        base: ['BASE'],
        topBox: ['Top-2-Box (1-2)'],
        bottomBox: ['Bottom-2-Box (4-5)'],
        mean: ['Mean Value']
      },
      // Rows that equal to one of the labels below will be skipped.
      skipRows: [
        'company',
        '* Annotation',
        '(Sum of answers)'
      ],
      // A callback function to be applied to every body row
      renderRow: (row: string[]): (number|null)[] => {
        return row.map(cell => {
          if(cell === ' ') return null
          else return Number(cell)
        })
      },
      // The info array of each sub-table will be passed to this
      // callback. Tagging can be fine tuned here.
      renderTags: (info: RawResultInfo[], tags: Tagger): void => {
        info.forEach((info, level) => {
          let cat
          // info.key contains the info string's original section
          // could be 'body' or 'info'
          if(info.key === 'body') {
            cat = 'vartitle'
          } else if(info.value.indexOf('- ') === 0) {
            cat = 'measure'
          } else if(level === 0) {
            // We strip the table separator and pass CountryName
            info.value = info.value.replace('Table Separator – ', '')
            cat = 'country'
          } else if(level === 1) {
            cat = 'variable'
          } else if(level === 2) {
            cat = 'questionText'
          }
          if(cat) {
            tags.push(cat, info.value)
          }
        })
      },
    }
    
    const parse = new Gesstabs(config)
    const file = `${__dirname}/data/test-data.xlsx`
    const datasheets = await parse.fromXlsx(filename)
    const summary = await store.run(datasheets)

    Intermediate JSON

    Xlsx-Parser will tranform tabular data into an intermediate JSON object. The closer your input data comes to this format, the easier it will be to implement a new parser type.

    {
      "tags": [
        {
          "category": "country",
          "value": "Norway"
        },
        {
          "category": "variable",
          "value": "Q12"
        },
        {
          "category": "category",
          "value": "Bar soap"
        },
        {
          "category": "subgroup",
          "value": "Age"
        },
        {
          "category": "measure",
          "value": "nominal"
        }
      ],
      "columns": ["Total", "19-29", "30-39", "40-69"],
      "rows": ["answer 1", "answer 2", "answer 3"],
      "data": [
        [29,18,36,12],
        [39,19,24,11],
        [19,28,46,10]
      ],
      "meta": {
        "significance": [
          [null,null,"h",null],
          [null,"h","l",null],
          [null,null,"h","l"]
        ]
      }
    }

    Query Data

    As all the Sheets are tagged, our queries will use tags to find the desired datasets.

    import { getData, Store } from '../src';
    import { all } from '../src/filter';
    import { value } from '../src/cell';
    
    // A selector is an array of tags.
    const selector = [
        {
          category: 'country',
          value: 'Norway'
        },
        {
          category: "variable",
          value: "Q12"
        }
    ]
    
    // The grid will define rows, cols and a callback
    // to run inside a target cell.
    const grid = {
        rows: all('row'),
        columns: all('column'),
        cell: value
    }
    
    const result = await getData(selector, grid)
    
    // automizer-data will convert the result directly into
    // a pptx-automizer-object. 
    const chartData = result.toSeriesCategories()

    Keywords

    none

    Install

    npm i automizer-data

    DownloadsWeekly Downloads

    3

    Version

    0.1.2

    License

    MIT

    Unpacked Size

    295 kB

    Total Files

    112

    Last publish

    Collaborators

    • singerla