doc-textify
1.0.2 • Public • Published

Doc-Textify

Doc-Textify is a TypeScript library and command-line tool that extracts and cleans text from various document formats.

🚀 Features

  • Multi-format support:

    • Microsoft Word (.docx)
    • PowerPoint (.pptx)
    • Excel (.xlsx)
    • OpenOffice/LibreOffice (.odt, .odp, .ods)
    • PDF (.pdf)
    • Plain text (.txt)
    • HTML (.html, .htm)
  • Content cleaning: removes extra whitespace, handles custom line delimiters.

  • Configurable options: set newline delimiter, minimum characters to extract, and toggle error logging.
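When scanning a folder it can help to skip files the library is not documented to handle. The sketch below filters paths by the extensions listed above. `isSupportedDocument` and `extensionOf` are hypothetical helpers written for this example and are not part of doc-textify; the library may detect formats differently internally.

```typescript
// Extensions listed under "Multi-format support" above.
const SUPPORTED_EXTENSIONS = new Set([
  'docx', 'pptx', 'xlsx', 'odt', 'odp', 'ods', 'pdf', 'txt', 'html', 'htm',
])

// Hypothetical helper (not part of doc-textify): returns the lowercase
// extension of a path, or '' when the file name has none.
function extensionOf(filePath: string): string {
  const fileName = filePath.split(/[\\/]/).pop() ?? ''
  const dot = fileName.lastIndexOf('.')
  // dot > 0 also excludes dotfiles such as ".gitignore"
  return dot > 0 ? fileName.slice(dot + 1).toLowerCase() : ''
}

// Hypothetical helper: true when the extension is in the documented list.
function isSupportedDocument(filePath: string): boolean {
  return SUPPORTED_EXTENSIONS.has(extensionOf(filePath))
}

const candidates = ['report.PDF', 'slides.pptx', 'photo.png', 'notes']
const supported = candidates.filter(isSupportedDocument)
console.log(supported) // keeps report.PDF and slides.pptx
```

The check is case-insensitive and looks only at the file name, so it says nothing about whether the file content is actually valid.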

📦 Library Usage

Install the package and import it in your project:

npm install doc-textify --save

import { docTextify } from 'doc-textify'

// async/await version
try {
    const text = await docTextify('path/to/file.pdf')
} catch (err) {
    console.error(err)
}

// or promise-chain version
docTextify('path/to/file.pdf')
    .then(text => console.log(text))
    .catch(err => console.error(err))

Default options:

try {
    const text = await docTextify('path/to/file.pdf', {
        newlineDelimiter: '\n', // delimiter inserted between lines of output
        minCharsToExtract: 0, // minimum number of characters required to output the content; 0 disables the check
        outputErrorToConsole: true // log errors to the console
    })
} catch (err) {
    console.error(err)
}
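If only one or two options change between calls, the documented defaults can be kept in one place and merged with per-call overrides. This is a minimal sketch: the `TextifyOptions` interface and `resolveOptions` helper are assumptions made for illustration, built from the three fields shown above. The package ships its own type declarations, which may name or shape things differently.

```typescript
// Assumed shape of the options object, based on the three documented fields.
interface TextifyOptions {
  newlineDelimiter: string
  minCharsToExtract: number
  outputErrorToConsole: boolean
}

// Defaults as documented above.
const DEFAULT_OPTIONS: TextifyOptions = {
  newlineDelimiter: '\n',
  minCharsToExtract: 0,
  outputErrorToConsole: true,
}

// Hypothetical helper: fills in the documented defaults for omitted fields.
function resolveOptions(overrides: Partial<TextifyOptions> = {}): TextifyOptions {
  return { ...DEFAULT_OPTIONS, ...overrides }
}

const quietCrlf = resolveOptions({
  newlineDelimiter: '\r\n',
  outputErrorToConsole: false,
})
console.log(quietCrlf.minCharsToExtract) // 0, taken from the defaults
```

The resolved object can then be passed as the second argument to `docTextify`, in the same position as the options literal in the example above.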

🚀 CLI Usage (Optional)

If you prefer a ready-made command, the doc-textify CLI wraps the same functionality.

Installation

Global install to use the doc-textify command anywhere:

npm install -g doc-textify

Or install locally:

npm install doc-textify --save

Command

doc-textify <path/to/document> [options]

Options

| Option | Description | Default |
| --- | --- | --- |
| -n, --newlineDelimiter | Line delimiter to insert | "\n" |
| -m, --minCharsToExtract | Minimum number of characters to extract | 0 (disabled) |
| -h, --help | Display help message | |

Example

doc-textify document.docx -n "\r\n" -m 20 > output.txt
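When the CLI is driven from a script rather than typed by hand, building the argument list programmatically avoids shell-quoting mistakes. The sketch below assembles arguments from the `-n` and `-m` flags in the table above. `CliOptions` and `buildCliArgs` are hypothetical names for this example. It passes the delimiter as the literal characters `\r\n`, as the shell example does, and assumes the CLI interprets them.

```typescript
// Hypothetical option bag for this sketch; field names mirror the CLI flags.
interface CliOptions {
  newlineDelimiter?: string
  minCharsToExtract?: number
}

// Builds the argument list for: doc-textify <path/to/document> [options]
function buildCliArgs(documentPath: string, options: CliOptions = {}): string[] {
  const args = [documentPath]
  if (options.newlineDelimiter !== undefined) {
    args.push('-n', options.newlineDelimiter)
  }
  if (options.minCharsToExtract !== undefined) {
    args.push('-m', String(options.minCharsToExtract))
  }
  return args
}

// Same invocation as the example above, without the output redirect.
// '\\r\\n' is the four literal characters \r\n, as a shell would pass them.
const cliArgs = buildCliArgs('document.docx', {
  newlineDelimiter: '\\r\\n',
  minCharsToExtract: 20,
})
console.log(['doc-textify', ...cliArgs].join(' '))
```

The resulting array can be handed to Node's `child_process.execFile('doc-textify', cliArgs, callback)`, which passes arguments directly without going through a shell.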

📥 Installation from Source

git clone https://github.com/johaven/doc-textify.git
cd doc-textify
npm install
npm run build    # outputs compiled files into /dist
npm run test     # test parsing

🤝 Contributing

  1. Fork the repository
  2. Create a branch: git checkout -b feature/my-feature
  3. Commit your changes: git commit -m "Add my feature"
  4. Push to your branch: git push origin feature/my-feature
  5. Open a Pull Request

📄 License

This project is licensed under the MIT License.
