@pomgui/pdf-tables-parser

0.1.0 • Public • Published

pdf-tables-parser

Library to extract text tables from PDF files.

Background (why)

Sometimes your server has to retrieve information from PDF files, e.g. financial reports, where the information is laid out in tables (rows and columns).

However, there is no easy way to extract this information from Node.js applications. All the alternatives I tried needed extra processing to get the tables I wanted, so I finally decided to create one of my own.

Demo

You can test the library online here

Installation

$ npm install @pomgui/pdf-tables-parser

Usage

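const { PdfDocument } = require('@pomgui/pdf-tables-parser');
const fs = require('fs');

// Load the PDF; once the promise resolves, the document object holds the parsed tables.
const pdf = new PdfDocument();
pdf.load('report.pdf')
    // Serialize the whole document (pages and tables) to a JSON file.
    .then(() => fs.writeFileSync('report.json', JSON.stringify(pdf, null, 2), 'utf8'))
    .catch(err => console.error(err));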

Result Example

{
  "numPages": 1,
  "pages": [
    {
      "pageNumber": 1,
      "tables": [
        {
          "tableNumber": 1,
          "numrows": 65,
          "numcols": 3,
          "data": [
            ["name", "age", "amount"],
            ["John", "49", "150,000.00"],
            ["Mary", "25", "10,000.00"],
            ["..."]
          ]
        }
      ]
    }
  ]
}
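
Each table's "data" is an array of rows of strings, so some post-processing is usually needed before the values can be used. The sketch below shows one possible way to turn a table into an array of objects keyed by column name. It assumes the first row holds the column names, as in the example above; tableToRecords and sampleTable are hypothetical names used for illustration and are not part of the library.

```javascript
// Hypothetical helper, not part of @pomgui/pdf-tables-parser.
// Assumption: the first row of table.data contains the column names.
function tableToRecords(table) {
    const [header, ...rows] = table.data;
    // Build one object per data row; cells missing from a short row become null.
    return rows.map(row =>
        Object.fromEntries(header.map((name, i) => [name, row[i] ?? null]))
    );
}

// Sample table shaped like the result example above.
const sampleTable = {
    tableNumber: 1,
    numrows: 3,
    numcols: 3,
    data: [
        ['name', 'age', 'amount'],
        ['John', '49', '150,000.00'],
        ['Mary', '25', '10,000.00']
    ]
};

const records = tableToRecords(sampleTable);
console.log(records[0].amount); // prints 150,000.00
```

All values stay as strings, exactly as extracted; converting "150,000.00" to a number is left to the caller, since number formats vary between documents.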

License

BSD-3-Clause

Collaborators

  • wpomier