pdf-template-parse

    0.0.5 • Public • Published

    PRs welcome. Join the chat at https://gitter.im/pdf-template-parse/community

    A cross-browser, front-end JavaScript PDF parser with a template engine that converts PDF documents into organized data objects.

    Install

    Install with npm:

    npm install pdf-template-parse

    Install with yarn:

    yarn add pdf-template-parse

    Introduction

    This module exposes two functions:

    1 - pdfParse (character & location extraction)

    import { pdfParse } from 'pdf-template-parse';

    pdfParse takes a PDF file and returns a promise. The promise resolves with the data for every character found in the document (character code, text, x, y, width), so you can process the raw data yourself.
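
    As an illustration of processing the raw data yourself, the sketch below groups character records into lines of text by their y position. It is not part of pdf-template-parse: the helper name groupIntoLines, the yTolerance parameter, and the exact record shape ({ text, x, y, width }) are assumptions based on the fields listed above, so check the real output before relying on it.

```javascript
// Hypothetical helper, not part of pdf-template-parse.
// Assumes each record looks like { text, x, y, width }; the field names
// actually returned by pdfParse may differ.
function groupIntoLines(characters, yTolerance = 2) {
  const lines = [];
  // Order by y first, then by x, so characters on the same line are adjacent.
  const sorted = [...characters].sort((a, b) => a.y - b.y || a.x - b.x);
  for (const char of sorted) {
    const current = lines[lines.length - 1];
    if (current && Math.abs(current.y - char.y) <= yTolerance) {
      current.chars.push(char);
    } else {
      lines.push({ y: char.y, chars: [char] });
    }
  }
  // Within each line, read the characters in x order and join their text.
  return lines.map((line) =>
    line.chars
      .sort((a, b) => a.x - b.x)
      .map((c) => c.text)
      .join('')
  );
}

const sample = [
  { text: 'i', x: 20, y: 100, width: 4 },
  { text: 'H', x: 10, y: 100, width: 8 },
  { text: '!', x: 10, y: 120, width: 4 },
];
console.log(groupIntoLines(sample)); // [ 'Hi', '!' ]
```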

    2 - pdfTemplateParse (character extraction & templating)

    import pdfTemplateParse from 'pdf-template-parse';

    pdfTemplateParse takes a PDF file and a template file and returns a promise. The promise resolves with all the values and tables declared in the template file. (See Example 2 below for a sample template file.)

    Example Usage

    Example 1: helloWorldDemo.pdf

    sample pdf download: helloWorldDemo.pdf

    import { pdfParse } from 'pdf-template-parse';
    import pdf from './samplePdf/helloWorldDemo.pdf';
     
    // pdfParse returns a promise, so wait for it to resolve before logging.
    pdfParse(pdf).then((characterData) => {
      console.log({ characterData });
    });

    Output: (console screenshot)

    ** Note: the promise will not resolve if the browser tab is not visible.

    Example 2: helloWorldDemo.pdf w/ template file

    Template file: helloWorldDemo.json

    {
      "captureList": [
        {
          "name": "1",
          "type": "value",
          "rules": {
            "all": {
              "bounds": {
                "top": 220,
                "left": 70,
                "bottom": 230,
                "right": 140
              }
            }
          }
        },
        {
          "name": "2",
          "type": "value",
          "rules": {
            "all": {
              "bounds": {
                "top": 220,
                "left": 150,
                "bottom": 230,
                "right": 200
              }
            }
          }
        },
        {
          "name": "1+2",
          "type": "value",
          "rules": {
            "all": {
              "bounds": {
                "top": 220,
                "left": 70,
                "bottom": 230,
                "right": 200
              }
            }
          }
        }
      ]
    }
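
    To make the bounds rules in the template above more concrete, here is a minimal sketch of how a bounding box could select characters. It is an assumption about the idea, not the library's actual implementation: captureValue is a hypothetical helper, and the sample coordinates are made up to fall inside the three boxes.

```javascript
// Hypothetical sketch of a "bounds" rule, not the library's implementation.
// Assumes each character record has x and y fields in the same coordinate
// space as the template's top/left/bottom/right values.
function captureValue(characters, bounds) {
  return characters
    .filter(
      (c) =>
        c.x >= bounds.left &&
        c.x <= bounds.right &&
        c.y >= bounds.top &&
        c.y <= bounds.bottom
    )
    .sort((a, b) => a.x - b.x) // read left to right
    .map((c) => c.text)
    .join('');
}

// Made-up character positions for illustration only.
const characters = [
  { text: 'H', x: 75, y: 225 },
  { text: 'i', x: 85, y: 225 },
  { text: 'W', x: 155, y: 225 },
  { text: 'x', x: 300, y: 400 }, // outside every box
];

const first = captureValue(characters, { top: 220, left: 70, bottom: 230, right: 140 });
const second = captureValue(characters, { top: 220, left: 150, bottom: 230, right: 200 });
const both = captureValue(characters, { top: 220, left: 70, bottom: 230, right: 200 });
console.log({ first, second, both }); // { first: 'Hi', second: 'W', both: 'HiW' }
```

    The third box spans the first two, which mirrors the "1+2" entry in the sample template.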

    Code:

    import pdfTemplateParse from 'pdf-template-parse';
    import pdf from './samplePdf/helloWorldDemo.pdf';
    import template from './sampleFile/helloWorldDemo.json';
     
    // pdfTemplateParse returns a promise, so wait for it to resolve before logging.
    pdfTemplateParse(pdf, template).then((data) => {
      console.log({ data });
    });

    Output: (console screenshot)

    ** Note: the promise will not resolve if the browser tab is not visible.
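
    One way to work around the visibility note is to wait until the tab is visible before starting the parse. The sketch below uses the standard Page Visibility API (document.visibilityState and the visibilitychange event). The helper name whenVisible is hypothetical and not part of pdf-template-parse, and whether this fully avoids the stalled promise is an assumption.

```javascript
// Hypothetical helper, not part of pdf-template-parse.
// Resolves once the given document reports itself as visible.
function whenVisible(doc) {
  return new Promise((resolve) => {
    if (doc.visibilityState === 'visible') {
      resolve();
      return;
    }
    const onChange = () => {
      if (doc.visibilityState === 'visible') {
        // Clean up so the handler only ever resolves once.
        doc.removeEventListener('visibilitychange', onChange);
        resolve();
      }
    };
    doc.addEventListener('visibilitychange', onChange);
  });
}

// Assumed usage in the browser:
// whenVisible(document)
//   .then(() => pdfParse(pdf))
//   .then((characterData) => console.log({ characterData }));
```

    Passing the document in as a parameter keeps the helper easy to exercise with a stand-in object outside the browser.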

    Todo

    • Add tests
    • Replace char_offset option with character map detection
    • Add value validation
    • Add template validation
    • Add node support (either remove canvas dependency or add node canvas package)

    Authors

    License 📄

    This project is licensed under the MIT License - see the LICENSE file for details

    Collaborators

    • tomrule007