This package has been deprecated

Author message:

Package no longer supported. Contact Support at https://www.npmjs.com/support for more info.

obsidian-text-extract

1.0.4 • Public • Published

Obsidian Text Extract Library

Work In Progress - Use with care, seriously.

What is this?

A library, designed for Obsidian plugins, to extract text from PDFs and images. It works by sharing a common cache and pool of workers between all library users.

It is currently used in Omnisearch.

How does it work?

Since extracting text from PDFs and images takes a lot of resources, the main idea of this library is to provide a globally available pool of workers, shared among all Obsidian plugins that wish to use it. As such, it is important not to change the namespace or the IndexedDB database name. Doing so would put unnecessary strain on Obsidian that could crash it, and more generally would waste the device's resources. Be responsible.
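To illustrate why the shared namespace matters, here is a minimal sketch of a pool registered once on the global object and reused by every caller. This is not the library's actual code: `SharedPool`, `getSharedPool`, the `NAMESPACE` key, and the limit of 2 concurrent tasks are all hypothetical names and values chosen for the example.

```typescript
// Hypothetical sketch, not the library's actual code. All names are illustrative.
type Task<T> = () => Promise<T>

class SharedPool {
  // Results keyed by file path, so a file is processed once for all users
  readonly cache = new Map<string, string>()
  private readonly maxWorkers: number
  private active = 0
  private readonly waiting: Array<() => void> = []

  constructor(maxWorkers: number) {
    this.maxWorkers = maxWorkers
  }

  // Run a task once a slot is free; at most maxWorkers tasks run at a time
  async run<T>(task: Task<T>): Promise<T> {
    if (this.active >= this.maxWorkers) {
      // Wait until a finishing task hands its slot over
      await new Promise<void>(resolve => this.waiting.push(resolve))
    } else {
      this.active++
    }
    try {
      return await task()
    } finally {
      const next = this.waiting.shift()
      if (next) next() // hand the slot to the next waiter
      else this.active-- // nobody waiting: free the slot
    }
  }
}

// Assumed key for the example; a different key would create a second pool
const NAMESPACE = '__example_shared_extract_pool__'

function getSharedPool(): SharedPool {
  const registry = globalThis as unknown as Record<string, SharedPool | undefined>
  let pool = registry[NAMESPACE]
  if (pool === undefined) {
    pool = new SharedPool(2)
    registry[NAMESPACE] = pool
  }
  return pool
}
```

In this sketch, two plugins that call `getSharedPool()` with the same key get the same pool and cache. A plugin that changed the key would create its own pool and repeat the work, which is the waste described above.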

Installation & Usage

First, install it with a fixed version:

"dependencies": {
    "obsidian-text-extract": "1.0.3"
}

(Yes I messed up with npm, and submitted the first version as 1.0.0. Sorry.)

To use it:

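import type { TFile } from 'obsidian'
import { getPdfText, getImageText } from 'obsidian-text-extract'

// Extract text from a PDF or PNG file; other file types yield an empty string.
async function getTextFromFile(
  file: TFile
): Promise<string> {
  // Default to an empty string so unsupported file types return a defined value
  let content = ''
  if (file.path.endsWith('.pdf')) {
    content = await getPdfText(file)
  } else if (file.path.endsWith('.png')) {
    content = await getImageText(file)
  }
  return content
}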

Limitations

Text extraction does not work on mobile; calling the functions will just immediately return an empty string.
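Because an empty string is what callers get on mobile, a plugin may want to treat it as "no text available". The helper below is a small sketch of that idea. `extractOrFallback` and the `Extractor` type are hypothetical names, not part of this library, and the extractor is passed in as a parameter so the example stays self-contained.

```typescript
// Hypothetical helper, not part of obsidian-text-extract.
// `extract` stands in for a function such as getPdfText or getImageText.
type Extractor<F> = (file: F) => Promise<string>

async function extractOrFallback<F>(
  file: F,
  extract: Extractor<F>,
  fallback: string
): Promise<string> {
  const text = await extract(file)
  // An empty result is treated as "nothing extracted" (e.g. on mobile)
  return text.length > 0 ? text : fallback
}
```

Note that a file which genuinely contains no text also produces an empty string, so this sketch cannot tell the two cases apart.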

Build

You'll need Rust, wasm-pack, and pnpm.

$ pnpm i
$ pnpm run build

Rust is quite slow to compile, so the first build will take some time.

Install

npm i obsidian-text-extract

Weekly Downloads

0

Version

1.0.4

License

GPL-3.0

Unpacked Size

1.27 MB

Total Files

5

Collaborators

  • scambier