tesseract.js-node

A focused node-only version of tesseract.js.

Why?

tesseract.js is developed for both Node and the browser, and includes functionality that is (in my opinion) bloat, such as automatically downloading traineddata files in the background.

At the time of writing, it also has no tests for the Node environment (only for the browser). An example issue where this matters: https://github.com/naptha/tesseract.js/issues/339.

I just wanted a way to use Tesseract 4.0 in a node project without all this extra functionality and background downloads from third-party servers.

Usage

Download traineddata files from somewhere, e.g. from the official tesseract-ocr repository:

mkdir tessdata
cd tessdata
curl -O -L https://github.com/tesseract-ocr/tessdata_fast/raw/master/eng.traineddata
curl -O -L https://github.com/tesseract-ocr/tessdata_fast/raw/master/fin.traineddata
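
Before loading the worker, it can help to confirm that the downloads above actually landed where you expect. The following is a minimal sketch, not part of this library: `missingTraineddata` is a hypothetical helper, and it assumes each language is stored as `<lang>.traineddata` directly inside the tessdata directory, as the curl commands above produce.

```javascript
const fs = require('fs');
const path = require('path');

// Hypothetical helper (not part of tesseract.js-node): returns the language
// codes whose "<lang>.traineddata" file is absent from tessdataDir.
// Assumes the flat layout created by the curl commands above.
function missingTraineddata(tessdataDir, languages) {
  return languages.filter(
    (lang) => !fs.existsSync(path.join(tessdataDir, `${lang}.traineddata`))
  );
}
```

Calling `missingTraineddata('/path/to/tessdata', ['eng', 'fin'])` returns an empty array when both files are present, so a non-empty result is a cheap signal to stop before creating the worker.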

Then use the library in a node project:

const getWorker = require('tesseract.js-node');

// await is only valid inside an async function in CommonJS modules
(async () => {
  const worker = await getWorker({
    tessdata: '/path/to/tessdata',    // where .traineddata-files are located
    languages: ['eng', 'fin']         // languages to load
  });
  const text = await worker.recognize('/path/to/image', 'eng');
  console.log(text);
})();

You can supply the input image in various ways:

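// (inside an async function, with `worker` created as shown above)
const fs = require('fs');

// path to image
const fromPath = await worker.recognize('/path/to/image', 'eng');
// Buffer
const fromBuffer = await worker.recognize(fs.readFileSync('/path/to/image'), 'eng');
// Buffer (from node-canvas; `canvas` is an existing node-canvas instance)
const fromCanvas = await worker.recognize(canvas.toBuffer('image/png'), 'eng');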

See tesseract.test.js for other examples.
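
If you need to process several images, one simple approach is to loop over them with a single worker. The sketch below is illustrative only: `recognizeAll` is a hypothetical helper, not something this package exports, and it relies solely on the `worker.recognize(input, language)` call shown above. It runs the calls one at a time because this README does not say whether a worker handles concurrent calls.

```javascript
// Hypothetical helper (not part of tesseract.js-node): recognizes each input
// in order and resolves to an array of the returned texts.
// `inputs` may hold anything worker.recognize accepts (paths or Buffers).
async function recognizeAll(worker, inputs, language) {
  const texts = [];
  for (const input of inputs) {
    // Sequential on purpose: concurrent use of one worker is an unknown here.
    texts.push(await worker.recognize(input, language));
  }
  return texts;
}

// Usage, inside an async function with `worker` created as shown earlier:
//   const texts = await recognizeAll(worker, ['/path/to/a.png', '/path/to/b.png'], 'eng');
```

Passing the worker in as an argument keeps the helper independent of how the worker was created, which also makes it easy to exercise with a stub in tests.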

Development

npm test

Credits

Thanks to tesseract.js-core contributors for the groundwork!

License

Apache License 2.0

Install

npm i tesseract.js-node

Version

0.1.0

Collaborators

  • codeclown