Obsidian Text Extract Library

Work In Progress - Use with care, seriously.

What is this?

A library, designed for Obsidian plugins, to extract text from PDFs and images. It works by sharing a common cache and pool of workers between all library users.

It is currently used in Omnisearch.

How does it work?

Since extracting text from PDFs and images takes a lot of resources, the main idea of this library is to provide a globally available pool of workers, shared among all Obsidian plugins that wish to use it. As such, it is important not to change the namespace or the IndexedDB database name. Doing so would put unnecessary strain on Obsidian that could crash it, and more generally would waste the device's resources. Be responsible.
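
To illustrate why the namespace matters, here is a minimal sketch of how a resource can be shared between independent bundles through a key on the global object. This is not the library's actual implementation: the `Pool` type, the `getSharedPool` function, and the namespace string are all hypothetical names chosen for the example.

```typescript
// Hypothetical sketch, NOT the library's real internals.
// A stand-in for an expensive shared resource such as a worker pool.
type Pool = { id: number; jobs: string[] }

// Hypothetical namespace key; every consumer must use the same one.
const NAMESPACE = 'example-shared-extract-pool'

function getSharedPool(namespace: string = NAMESPACE): Pool {
  // Treat the global object as a registry keyed by namespace.
  const registry = globalThis as unknown as Record<string, Pool | undefined>
  let pool = registry[namespace]
  if (!pool) {
    // First caller creates the pool; later callers reuse it.
    pool = { id: Math.random(), jobs: [] }
    registry[namespace] = pool
  }
  return pool
}
```

In this sketch, two callers that use the same key receive the same pool object, while a caller that renames the key ends up creating a second, separate pool. That duplication is the kind of waste the warning above is about.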

Installation & Usage

First, install it with a fixed version:

"dependencies": {
    "obsidian-text-extract": "1.0.3"
}

(Yes I messed up with npm, and submitted the first version as 1.0.0. Sorry.)

To use it:

import { TFile } from 'obsidian'
import { getPdfText, getImageText } from 'obsidian-text-extract'

async function getTextFromFile(
  file: TFile
): Promise<string> {
  // Default to an empty string for unsupported file types
  let content = ''
  if (file.path.endsWith('.pdf')) {
    content = await getPdfText(file)
  } else if (file.path.endsWith('.png')) {
    content = await getImageText(file)
  }
  return content
}
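
The example above only matches lowercase `.pdf` and `.png` paths. If you want to route more file types, one option is a small helper that classifies a path by its extension, case-insensitively. This is only a sketch: `classifyPath`, `FileKind`, and the `IMAGE_EXTENSIONS` list are hypothetical, and the list is an assumption rather than the set of formats the library is documented to support.

```typescript
type FileKind = 'pdf' | 'image' | 'unsupported'

// Assumption: an illustrative list, not the library's documented formats.
const IMAGE_EXTENSIONS = ['png', 'jpg', 'jpeg']

function classifyPath(path: string): FileKind {
  const dot = path.lastIndexOf('.')
  // No dot at all means there is no extension to inspect.
  if (dot === -1) return 'unsupported'
  // Compare in lowercase so that "Scan.PDF" is treated like "scan.pdf".
  const ext = path.slice(dot + 1).toLowerCase()
  if (ext === 'pdf') return 'pdf'
  if (IMAGE_EXTENSIONS.includes(ext)) return 'image'
  return 'unsupported'
}
```

A caller could then pass `file.path` to `classifyPath` and choose between `getPdfText` and `getImageText` based on the result.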

Limitations

Text extraction does not work on mobile: calling the functions immediately returns an empty string.
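
Because an empty string is also what you get on mobile, callers may want to treat it as "no text available" rather than as real content. Below is a minimal sketch of such a wrapper. `extractOrNull` and `Extractor` are hypothetical names, and the extractor is passed in as a parameter so the example stays self-contained; in a plugin you would pass `getPdfText` or `getImageText`.

```typescript
// Hypothetical helper type: any async function that turns a file into text.
type Extractor<F> = (file: F) => Promise<string>

async function extractOrNull<F>(
  extract: Extractor<F>,
  file: F
): Promise<string | null> {
  const text = await extract(file)
  // An empty (or whitespace-only) result is reported as null,
  // so callers can skip indexing instead of storing an empty entry.
  return text.trim().length > 0 ? text : null
}
```

Note that this sketch cannot tell a mobile device apart from a file that genuinely contains no text; both cases yield `null`.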

Build

You'll need Rust, wasm-pack, and pnpm.

$ pnpm i
$ pnpm run build

Rust is quite slow to compile, so the first build will take some time.
