    pdf-gold-digger

    PDF information extraction library built on pdf.js and Node.js, with several output formats.

    Install

    npm install -g pdf-gold-digger

    Usage

    pdfdig -i some_file.pdf

    Available commands

    pdfdig -h
    ex. pdfdig -i input-file -o output_directory -f json
      
      --input    or  -i   pdf file location (required)
      --output   or  -o   output directory (optional - default "out")
      --debug    or  -d   show debug information (optional - default "false")
      --format   or  -f   format (optional - default "text") - ("text,json,xml,html") 
      --font     or  -t   extract fonts as ttf files (optional)
      --password or  -p   password
      --help     or  -h   display this help message
      --version  or  -v   display version information
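
When pdfdig is driven from a Node.js script, the options above can be assembled programmatically. The following is a minimal sketch, not part of pdf-gold-digger: `buildPdfdigArgs` and `FORMATS` are hypothetical names, and only the flags listed in the help text are used. The `-t` and `-d` options are left out because the help text does not say whether they take a value.

```javascript
// Hypothetical helper (not shipped with pdf-gold-digger): builds the
// argument list for the pdfdig CLI from a plain options object.

// Output formats listed in the help text above.
const FORMATS = ['text', 'json', 'xml', 'html'];

function buildPdfdigArgs(opts) {
  if (!opts || !opts.input) {
    throw new Error('input is required (-i)');
  }
  // Fall back to the defaults documented in the help text.
  const format = opts.format || 'text';
  const output = opts.output || 'out';
  if (!FORMATS.includes(format)) {
    throw new Error(
      `unknown format "${format}", expected one of: ${FORMATS.join(', ')}`
    );
  }
  const args = ['-i', opts.input, '-o', output, '-f', format];
  // Assumption: -p is followed by the password as a separate argument.
  if (opts.password) {
    args.push('-p', opts.password);
  }
  return args;
}

// Print the command line that would be run for a JSON export.
const args = buildPdfdigArgs({ input: 'some_file.pdf', format: 'json' });
console.log(['pdfdig', ...args].join(' '));

// To actually run it (assumes pdfdig is installed globally and on PATH):
// require('child_process').spawnSync('pdfdig', args, { stdio: 'inherit' });
```

Passing the arguments as an array to `spawnSync` avoids shell quoting problems with file names that contain spaces.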

    Advanced usage

    git clone https://github.com/vane/pdf-gold-digger
    sh demo.sh

    The results are written to the out directory.

    Documentation

    pdf-gold-digger

    Features:

    • extract text
      • separate each page
      • separate each line
      • separate font information
    • extract images
    • output formats
      • text -f text (default)
      • json -f json
      • xml -f xml
      • html -f html
    • specify output directory

    TODO:

    • load pdf from remote location
      • from url
    • output to markdown format
    • pack output to zip
    • extract tables
    • extract forms
    • extract drawings
    • extract text from glyphs
      • ability to provide input file for glyph path to letter
      • detect when unicode is not provided or mangled
      • get bounding box from text and draw it on canvas
      • use tesseract.js as optional fallback

    Version

    0.1.1

    License

    MIT
