@vtfk/pdf-splitter

1.1.1 • Public • Published

pdf-splitter

NodeJS package for splitting pdfs, based on given ranges or keywords. Uses PDFtk and node-pdftk for splitting, and PDF.js for pdf-text-reading

Requirements

Make sure you have PDFtk installed. Save the path to the executable as an environment variable "PDFTK_EXT".

For example in .env

PDFTK_EXT="<installationPath>/PDFtk/bin/pdftk"

Installing

$ npm install @vtfk/pdf-splitter

Usage

With array of page-ranges

Specify which pages you want to split into new documents

Description Value
Page one and three as separate documents ['1', '3']
Page one to four (inclusive) as doc and page three, six, and eight to ten (inclusive) as doc ['1-4', '3 6 8-10']
const splitPdf = require('@vtfk/pdf-splitter')

const pdfToSplit = {
    pdfPath: 'a pdf.pdf',
    ranges: ['1-4', '3 6 8-10', '4 2'],
    outputDir: 'path/to/outputDirectory', // Optional, defaults to directory of the input pdf
    outputName: 'nameForResultingPdfs' // Optional, defaults to the <NameOfPdf>-<index>.pdf
}

const result = await splitPdf(pdfToSplit)
console.log(result)

With array of keywords/sentences

Specify on which keywords/sentences you want to split the document on (EVERY word/sentence must be present for it to split on that page - see option "orKeywords" for the SOME instead of EVERY)

NOTE: At least one keyword or sentence must be unique for the document

const splitPdf = require('@vtfk/pdf-splitter')

const pdfToSplit = {
    pdfPath: 'a pdf.pdf',
    keywords: ['a unique sentence for the page you want to split on', 'word', 'another'],
    outputDir: 'path/to/outputDirectory', // Optional, defaults to directory of the input pdf
    outputName: 'nameForResultingPdfs' // Optional, defaults to the <NameOfPdf>-<index>.pdf
}

const result = await splitPdf(pdfToSplit)
console.log(result)

Options

options.onlyPagesWithKeywords

Only return the pages where the keywords are present as separate documents

const splitPdf = require('@vtfk/pdf-splitter')

const pdfToSplit = {
    pdfPath: 'a pdf.pdf',
    keywords: ['a unique sentence for the page you want to split on', 'word', 'another'],
    outputDir: 'path/to/outputDirectory', // Optional, defaults to directory of the input pdf
    outputName: 'nameForResultingPdfs', // Optional, defaults to the <NameOfPdf>-<index>.pdf
    onlyPagesWithKeywords: true
}

const result = await splitPdf(pdfToSplit)
console.log(result)

options.orKeywords Only require ONE of the keywords to be present on the page, for it to split on that page

const splitPdf = require('@vtfk/pdf-splitter')

const pdfToSplit = {
    pdfPath: 'a pdf.pdf',
    keywords: ['a unique sentence for the page you want to split on', 'word', 'another'], // will split if one of these are present on the page
    outputDir: 'path/to/outputDirectory', // Optional, defaults to directory of the input pdf
    outputName: 'nameForResultingPdfs', // Optional, defaults to the <NameOfPdf>-<index>.pdf
    orKeywords: true // Optional, defaults to false
}

const result = await splitPdf(pdfToSplit)
console.log(result)

Dependents (0)

Package Sidebar

Install

npm i @vtfk/pdf-splitter

Weekly Downloads

30

Version

1.1.1

License

MIT

Unpacked Size

234 kB

Total Files

16

Last publish

Collaborators

  • robinelli
  • karl-einarb
  • maccyber
  • zrrrzzt
  • sherex
  • matsand
  • runely
  • jorgtho