text-from-pdf
1.1.2 • Public • Published

PDF-TO-TEXT

A wrapper that extracts text from PDF files. It works with both searchable and non-searchable (image-based) PDFs.

Installation

npm install text-from-pdf

Mac Users

brew install poppler

Linux Users

sudo apt-get update && sudo apt-get install poppler-utils

Windows Users

No installation required
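
The per-platform steps above can be summarized in a small lookup. This is only a sketch: popplerInstallHint is a hypothetical helper, not part of this package, and the Linux command assumes a Debian- or Ubuntu-style system with apt-get, as in the instructions above.

```javascript
// Hypothetical helper (not part of text-from-pdf): maps a Node.js
// platform string to the install step listed in this README.
function popplerInstallHint(platform = process.platform) {
  switch (platform) {
    case 'darwin':
      return 'brew install poppler';
    case 'linux':
      // Assumes apt-get is available (Debian/Ubuntu); other distributions differ.
      return 'sudo apt-get update && sudo apt-get install poppler-utils';
    case 'win32':
      return null; // the README states that no installation is required
    default:
      return undefined; // platform not covered by this README
  }
}
```

Calling popplerInstallHint() with no argument uses the current platform. A null result means nothing needs to be installed, and undefined means this README gives no guidance for that platform.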

Usage

  1. Standard input PDF with horizontally aligned text:
     const text = await pdfToText('<PATH_TO_PDF_FILE/fileName.pdf>');
     console.log(text);
  2. Input PDFs with vertically aligned text:
     const options = {
       rotationDegree: -90,
     };
     const text = await pdfToText('<PATH_TO_PDF_FILE/fileName.pdf>', options);
     console.log(text);
  3. Text from the first and second pages:
     const options = {
       firstPageToConvert: 1,
       lastPageToConvert: 2,
     };
     const text = await pdfToText('<PATH_TO_PDF_FILE/fileName.pdf>', options);
     console.log(text);
  4. Text from the third to the fifth page:
     const options = {
       firstPageToConvert: 3,
       lastPageToConvert: 5,
     };
     const text = await pdfToText('<PATH_TO_PDF_FILE/fileName.pdf>', options);
     console.log(text);
  5. Enable progress bar logging:
     const options = {
       firstPageToConvert: 1,
       lastPageToConvert: 1,
       enableProgressBarLogging: true,
     };
     const text = await pdfToText('<PATH_TO_PDF_FILE/fileName.pdf>', options);
     console.log(text);
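
The page-range examples above all build the same kind of options object, so a small helper can validate the range before any conversion starts. The following is a minimal sketch under stated assumptions: pageRangeOptions and extractPages are hypothetical names that are not part of this package, and the extract parameter stands in for pdfToText, passed in as an argument so the sketch does not assume how the package is imported. Only the option names are taken from the examples above.

```javascript
// Hypothetical helper: builds an options object for a 1-based, inclusive
// page range. Option names come from the usage examples in this README.
function pageRangeOptions(first, last, extra = {}) {
  if (!Number.isInteger(first) || !Number.isInteger(last)) {
    throw new TypeError('page numbers must be integers');
  }
  if (first < 1 || last < first) {
    throw new RangeError(`invalid page range ${first}-${last}`);
  }
  // Spread `extra` first so the validated page range always wins.
  return { ...extra, firstPageToConvert: first, lastPageToConvert: last };
}

// Hypothetical wrapper: `extract` is the function that does the actual
// conversion (for example pdfToText), supplied by the caller.
async function extractPages(extract, pdfPath, first, last, extra = {}) {
  // Validate before calling `extract` so range errors surface unchanged.
  const options = pageRangeOptions(first, last, extra);
  try {
    return await extract(pdfPath, options);
  } catch (err) {
    // Add the file and page range to whatever error the conversion raised.
    throw new Error(
      `Could not extract pages ${first}-${last} from ${pdfPath}: ${err.message}`
    );
  }
}
```

With pdfToText in scope, extractPages(pdfToText, '<PATH_TO_PDF_FILE/fileName.pdf>', 3, 5, { rotationDegree: -90 }) would request pages three to five of a vertically aligned document. Passing the conversion function as an argument also makes the wrapper easy to exercise with a stub.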

Feature requests

Fork the repository, add your changes, and create a pull request.

Weekly Downloads

1,591

Version

1.1.2

License

Apache-2.0

Unpacked Size

26.6 kB

Total Files

7

Collaborators

  • fasatrix