n8n node for processing PDF and Excel files, with advanced features including OCR and form handling.
Basic Features:
- Extract text from PDF files
- Get metadata information
- Process files from path or binary data
Advanced Features (new):
- OCR text extraction using Tesseract.js
- PDF form field processing
- Support for multiple languages (English, Vietnamese)
- Memory-efficient processing
Excel Features:
- Read Worksheet: Read data from a specific worksheet
- Get Worksheets: List all worksheets in a file
- Multiple Formats: Support for .xls and .xlsx formats
- Data Validation: Basic data validation for cell values
Requirements:
- Node.js v18+
- n8n v1.0+
- TypeScript v5.0+
New dependencies:
- Tesseract.js for OCR
- pdf-lib for form processing
Installation (n8n Community Nodes):
- Go to Settings > Community Nodes
- Select Install a node from npm registry
- Enter n8n-nodes-pdf-excel
- Click Install
Installation (npm):
# Install with dependencies
npm install n8n-nodes-pdf-excel tesseract.js pdf-lib
# Or link for development
npm link n8n-nodes-pdf-excel
Usage (basic PDF):
- Add "PDF & Excel Processor" node
- Select "PDF" as file type
- Choose operation:
  - Extract Text
  - Get Metadata
- Provide file path or binary data
- Execute node
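The choice between a file path and binary data in the steps above can be pictured as a small resolver. This is only a sketch: `InputItem`, `resolveInput`, and the default binary property name `data` are hypothetical names chosen for illustration, not the node's actual parameters. It assumes the usual n8n item layout, where binary attachments sit under `binary`, keyed by property name, with base64 content in `data`.

```typescript
// Hypothetical shapes for illustration; the real node's parameters may differ.
interface BinaryEntry {
  data: string; // base64-encoded file content (assumed)
  mimeType: string;
}

interface InputItem {
  json: Record<string, unknown>;
  binary?: Record<string, BinaryEntry>;
}

type ResolvedInput =
  | { kind: "path"; path: string }
  | { kind: "binary"; base64: string };

// Prefer an explicit file path; otherwise fall back to the named binary property.
function resolveInput(
  item: InputItem,
  filePath?: string,
  binaryPropertyName = "data",
): ResolvedInput {
  if (filePath && filePath.trim() !== "") {
    return { kind: "path", path: filePath.trim() };
  }
  const entry = item.binary?.[binaryPropertyName];
  if (!entry) {
    throw new Error(
      `No file path given and no binary property "${binaryPropertyName}" on the item`,
    );
  }
  return { kind: "binary", base64: entry.data };
}
```

Giving the path priority over binary data is an arbitrary choice in this sketch; the node may resolve the two sources differently.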
Usage (advanced PDF):
- Add "PDF & Excel Processor" node
- Select "PDF Advanced" as file type
- Choose operation:
  - Extract Text with OCR
  - Process Form Fields
- Optional: Configure OCR settings
- Execute node
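The optional OCR settings above most likely include the recognition language. A minimal sketch, assuming the node takes human-readable language names and hands Tesseract a language string: Tesseract identifies languages by codes such as `eng` (English) and `vie` (Vietnamese) and combines several with `+`. The `OcrLanguage` type, the `toTesseractLangs` helper, and the English fallback are hypothetical, not part of the node's API.

```typescript
// Hypothetical helper; only the codes "eng" and "vie" and the "+" separator
// follow Tesseract's conventions.
type OcrLanguage = "English" | "Vietnamese";

const TESSERACT_CODES: Record<OcrLanguage, string> = {
  English: "eng",
  Vietnamese: "vie",
};

// Build a Tesseract language string such as "eng+vie", dropping duplicates.
// Falling back to English for an empty selection is an assumption of this sketch.
function toTesseractLangs(languages: OcrLanguage[]): string {
  const codes = languages.map((name) => TESSERACT_CODES[name]);
  const unique = Array.from(new Set(codes));
  return unique.length > 0 ? unique.join("+") : "eng";
}

console.log(toTesseractLangs(["English", "Vietnamese"])); // "eng+vie"
```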
Usage (Excel):
- Add "PDF & Excel Processor" node
- Select "Excel" as file type
- Choose operation:
  - Read Worksheet
  - Get Worksheets
- For worksheet reading:
  - Specify sheet name (optional)
- Execute node
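Because the sheet name is optional, the node presumably needs a fallback. One plausible rule, sketched here with a hypothetical `pickSheet` helper and an in-memory `Workbook` type (neither taken from the node's source), is to use the first worksheet when no name is given and to fail with the list of available names when the requested sheet is missing.

```typescript
// In-memory stand-in for a parsed workbook; hypothetical, for illustration only.
type Row = Record<string, string | number | boolean | null>;
type Workbook = { sheetNames: string[]; sheets: Record<string, Row[]> };

// Return the rows of the requested sheet, or of the first sheet when no name is given.
function pickSheet(workbook: Workbook, sheetName?: string): Row[] {
  const name = sheetName ?? workbook.sheetNames[0];
  if (name === undefined) {
    throw new Error("Workbook contains no worksheets");
  }
  const rows = workbook.sheets[name];
  if (rows === undefined) {
    throw new Error(
      `Worksheet "${name}" not found. Available: ${workbook.sheetNames.join(", ")}`,
    );
  }
  return rows;
}

const workbook: Workbook = {
  sheetNames: ["Orders", "Customers"],
  sheets: {
    Orders: [{ id: 1, total: 9.5 }],
    Customers: [{ id: 7, name: "Lan" }],
  },
};

console.log(pickSheet(workbook)); // rows of "Orders", the first sheet
console.log(pickSheet(workbook, "Customers")); // rows of "Customers"
```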
Development:
git clone https://github.com/your-repo/n8n-nodes-pdf-excel.git
cd n8n-nodes-pdf-excel
npm install
npm run build
npm test
npm run lint
Roadmap:
- [x] Basic PDF text extraction
- [x] Basic Excel data reading
- [x] Advanced PDF features (OCR, forms)
- [ ] Advanced Excel features (formulas, styling)
- [x] Performance optimizations
License: MIT
Contributions are welcome! Please read our contributing guidelines for details.