LLM XLSX Parser

A Node.js module that converts Excel (XLSX) files into LLM-friendly formats. This package transforms spreadsheet data into multiple representations (visual images, CSV, and structured records) that Large Language Models can easily understand and process.

Why This Package Exists

Three properties of how LLMs handle tabular data motivate this package:

Poor Spatial Awareness: These models have difficulty reading information side-to-side and are much better at processing data from top to bottom.
Pattern Recognition: LLMs excel at recognizing patterns. The repeating record structure this package creates reinforces the model's inherent pattern-matching abilities.
Distance Problem: In traditional tables, there's often significant distance between column values (especially beyond row 100) and their headers. By formatting data as records, every value is immediately paired with its column name.

Inspiration: This package was inspired by discussions in the OpenAI community about how best to format Excel files for API ingestion, where developers shared techniques for making tabular data more LLM-friendly.

Features

📊 Multi-format Conversion: Transforms XLSX files into images, CSV, and structured records
🧠 LLM-Optimized: Formats data specifically for optimal LLM comprehension
🖼️ Visual Processing: Generates images to help LLMs understand spatial relationships
⚙️ Configurable: Customizable image generation and processing options
📦 NPM Module: Easy to integrate into existing projects

Installation

npm install llm-xlsx-parser

Setup

Get a Google Gemini API key from Google AI Studio
Set up your environment:

# Create .env file
echo "GEMINI_API_KEY=your_api_key_here" > .env

Alternatively, pass the key directly with the geminiApiKey option.
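The lookup order implied by the options table (an explicit geminiApiKey option first, then the GEMINI_API_KEY environment variable) can be sketched as follows. This is a minimal illustration, not the package's source: resolveGeminiKey is a hypothetical helper, and it assumes the .env file has already been loaded into process.env (the package lists dotenv as a dependency for that purpose).

```javascript
// Hypothetical helper (not part of llm-xlsx-parser): pick the Gemini API key
// from options, falling back to GEMINI_API_KEY in the environment.
function resolveGeminiKey(options = {}, env = process.env) {
  // Image output mode never calls Gemini, so no key is required.
  if (options.outputImage) return null;

  const key = options.geminiApiKey ?? env.GEMINI_API_KEY;
  if (!key) {
    throw new Error(
      "Gemini API key is missing: set GEMINI_API_KEY or pass options.geminiApiKey"
    );
  }
  return key;
}

// Usage
console.log(resolveGeminiKey({ geminiApiKey: "from-options" }, {})); // from-options
console.log(resolveGeminiKey({}, { GEMINI_API_KEY: "from-env" })); // from-env
```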

Note: While any Gemini model can be used, this author has only had success with gemini-2.5-pro.

How It Works

The package converts traditional spreadsheet data like this:

name	age	favorite color
Steve	56	red
Ava	1	pink
Donna	50	purple

Into this LLM-friendly format:

name: Steve
age: 56
favorite color: red

name: Ava
age: 1
favorite color: pink

name: Donna
age: 50
favorite color: purple

This transformation makes it much easier for LLMs to:

Understand the relationship between values and their column names
Process data in a top-to-bottom reading pattern
Recognize the repeating record structure
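The row-to-record transformation itself takes only a few lines of JavaScript. This is a minimal sketch assuming the sheet has already been read into an array of rows with the header row first (for example, with the xlsx library's XLSX.utils.sheet_to_json(sheet, { header: 1 })); toRecords is a hypothetical name, not part of this package's API. In the package itself the final formatting is produced by Gemini, so real output may differ.

```javascript
// Hypothetical illustration of the record layout shown above.
// `rows` is an array of arrays; the first row holds the column headers.
function toRecords(rows) {
  const [headers, ...dataRows] = rows;
  return (
    dataRows
      .map((row) =>
        headers
          // Pair every value with its column name; missing cells become "".
          .map((header, i) => `${header}: ${row[i] ?? ""}`)
          .join("\n")
      )
      // A blank line separates one record from the next.
      .join("\n\n")
  );
}

const table = [
  ["name", "age", "favorite color"],
  ["Steve", 56, "red"],
  ["Ava", 1, "pink"],
  ["Donna", 50, "purple"],
];
console.log(toRecords(table));
```

Running it on the example table prints exactly the three records shown above.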

Usage

Basic Usage

import parseXlsx from "llm-xlsx-parser";

const result = await parseXlsx(
  "path/to/your/file.xlsx",
  "output/formatted-data.txt"
);

console.log(result); // LLM-formatted data

Image Output Mode

import parseXlsx from "llm-xlsx-parser";

// Generate only an image (no LLM processing)
const imagePath = await parseXlsx(
  "data/spreadsheet.xlsx",
  "output/spreadsheet-image.png",
  {
    outputImage: true, // Enable image output mode
    maxRows: 100,
    maxCols: 50,
    fontSize: 12,
    cellPadding: 4,
  }
);

console.log(`Image saved to: ${imagePath}`);

Advanced Usage with Options

import parseXlsx from "llm-xlsx-parser";

const result = await parseXlsx(
  "data/spreadsheet.xlsx",
  "output/formatted-data.txt",
  {
    maxRows: 100, // Maximum rows to process for image
    maxCols: 50, // Maximum columns to process for image
    viewportWidth: 1920, // Browser viewport width for image
    viewportHeight: 1080, // Browser viewport height for image
    fontSize: 10, // Font size for image generation
    cellPadding: 4, // Cell padding for image generation
    fullPage: true, // Capture full page screenshot
    outputImage: false, // Set to true for image output mode
    geminiApiKey: "your-key", // API key (if not in environment)
    systemPrompt: "Custom formatting prompt...", // Custom system prompt
  }
);

API Reference

`parseXlsx(xlsxPath, outputPath, options)`

Converts an XLSX file into LLM-friendly formats.

Parameters

xlsxPath (string): Path to the XLSX file to convert
outputPath (string): Path where the formatted data will be saved
options (object, optional): Configuration options

Options

Option	Type	Default	Description
`maxRows`	number	`50`	Maximum rows to process for image generation
`maxCols`	number	`40`	Maximum columns to process for image generation
`viewportWidth`	number	`1920`	Browser viewport width for image generation
`viewportHeight`	number	`1080`	Browser viewport height for image generation
`fontSize`	number	`8`	Font size for image generation
`cellPadding`	number	`2`	Cell padding for image generation
`fullPage`	boolean	`true`	Whether to capture full page screenshot
`outputImage`	boolean	`false`	Output image as primary result (skips LLM)
`geminiApiKey`	string	`process.env.GEMINI_API_KEY`	Gemini API key (not needed for image mode)
`systemPrompt`	string	Built-in prompt	Custom system prompt for formatting
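The defaults above presumably combine with caller-supplied options as a shallow merge. The sketch below assumes that behavior (the package's internals may differ); DEFAULT_OPTIONS and withDefaults are hypothetical names, and the values are copied from the table.

```javascript
// Defaults copied from the options table (geminiApiKey and systemPrompt are
// omitted here; see the table for their fallbacks).
const DEFAULT_OPTIONS = {
  maxRows: 50,
  maxCols: 40,
  viewportWidth: 1920,
  viewportHeight: 1080,
  fontSize: 8,
  cellPadding: 2,
  fullPage: true,
  outputImage: false,
};

// Hypothetical helper: caller-supplied values override the defaults.
// Keys explicitly set to undefined are dropped so they don't erase a default.
function withDefaults(options = {}) {
  const provided = Object.fromEntries(
    Object.entries(options).filter(([, value]) => value !== undefined)
  );
  return { ...DEFAULT_OPTIONS, ...provided };
}

console.log(withDefaults({ maxRows: 100, fontSize: 12 }));
```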

Returns

Promise<string>: Resolves to the LLM-formatted text (LLM mode) or the saved image's file path (image mode)

Throws

Error: If the Gemini API key is missing or invalid (LLM mode only)
Error: If XLSX file cannot be read
Error: If image generation fails

Example

Run the included example:

# Clone this repository
git clone https://github.com/your-username/llm-xlsx-parser.git
cd llm-xlsx-parser

# Install dependencies
npm install

# Set up your API key
echo "GEMINI_API_KEY=your_api_key_here" > .env

# Run the example
npm run example

Output Modes

The package supports two primary output modes:

1. LLM Analysis Mode (Default)

Processes the spreadsheet through Google Gemini and saves the formatted text to outputPath. To give the model full context, this mode sends it the data in three formats:

📋 Structured Records: Key-value pairs for each row (primary format)
📊 CSV Data: Traditional comma-separated values
🖼️ Visual Image: Screenshot of the spreadsheet for spatial context (used internally)

This multi-format approach ensures LLMs can understand both the data content and its spatial relationships.
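One way to bundle the three formats into a single Gemini request is sketched below. It assumes the request shape used by the @google/genai SDK's models.generateContent() (text parts plus an inlineData image part, with the system prompt in config.systemInstruction); buildGeminiRequest and the label on each part are hypothetical, not this package's code.

```javascript
// Hypothetical helper: combine records, CSV, and a PNG screenshot into one
// request object shaped like @google/genai generateContent() parameters.
function buildGeminiRequest({ records, csv, pngBase64, systemPrompt, model }) {
  const parts = [
    { text: `Structured records:\n\n${records}` },
    { text: `CSV data:\n\n${csv}` },
    // The screenshot travels as base64-encoded inline image data.
    { inlineData: { mimeType: "image/png", data: pngBase64 } },
  ];
  return {
    model,
    contents: [{ role: "user", parts }],
    config: { systemInstruction: systemPrompt },
  };
}

const request = buildGeminiRequest({
  records: "name: Steve\nage: 56",
  csv: "name,age\nSteve,56",
  pngBase64: "iVBORw0KGgo=", // placeholder, not a complete image
  systemPrompt: "Reformat this spreadsheet as records.",
  model: "gemini-2.5-pro",
});
console.log(request.contents[0].parts.length); // 3
```

With the SDK installed, such an object could be passed to new GoogleGenAI({ apiKey }).models.generateContent(request), and the reply read from response.text.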

2. Image Output Mode

Generates and saves a visual image of the spreadsheet as the primary output. This mode:

🖼️ Creates a PNG image of the spreadsheet data
⚡ Skips LLM processing for faster execution
💰 No API costs - doesn't require Gemini API key
🎨 Highly customizable image generation options

Each call runs in exactly one mode, selected by the outputImage option. To get both an image and formatted text for the same file, call parseXlsx once in each mode.

Processing Steps

LLM Analysis Mode

📖 File Reading: Reads the XLSX file and extracts data
🔄 Format Conversion: Converts data to CSV and structured records
🖼️ Image Generation: Creates a visual representation using Playwright
📤 LLM Processing: Sends all formats to Gemini for formatting
📝 Output: Returns LLM-optimized data format
🧹 Cleanup: Removes temporary files
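The format-conversion step can be illustrated with a small CSV serializer. The xlsx dependency already provides XLSX.utils.sheet_to_csv, so this hand-rolled toCsv is only a hypothetical sketch of what that step has to get right: quoting fields that contain commas, double quotes, or line breaks, following the quoting rules of RFC 4180.

```javascript
// Hypothetical CSV serializer: a field containing a comma, double quote,
// or line break is wrapped in double quotes, and embedded quotes are doubled.
function toCsv(rows) {
  const escapeField = (value) => {
    const text = value == null ? "" : String(value);
    return /[",\r\n]/.test(text) ? `"${text.replace(/"/g, '""')}"` : text;
  };
  return rows.map((row) => row.map(escapeField).join(",")).join("\n");
}

console.log(
  toCsv([
    ["name", "note"],
    ["Steve", 'says "hi", loudly'],
  ])
);
// name,note
// Steve,"says ""hi"", loudly"
```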

Image Output Mode

📖 File Reading: Reads the XLSX file and extracts data
🖼️ Image Generation: Creates a visual representation using Playwright
💾 Save Image: Saves the image to the specified output path
📝 Output: Returns the image file path
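The image-generation step shared by both modes can be pictured as rendering the sheet to HTML and screenshotting it in a headless browser. The sketch below is an assumption about how the maxRows, maxCols, fontSize, and cellPadding options could apply, not the package's actual renderer; renderSheetHtml is a hypothetical name.

```javascript
// Hypothetical sketch: render the first maxRows x maxCols cells as an HTML
// table that a headless browser (e.g. Playwright) could then screenshot.
function renderSheetHtml(
  rows,
  { maxRows = 50, maxCols = 40, fontSize = 8, cellPadding = 2 } = {}
) {
  // Escape cell text so values like "<b>" display literally.
  const escapeHtml = (value) =>
    String(value ?? "")
      .replace(/&/g, "&amp;")
      .replace(/</g, "&lt;")
      .replace(/>/g, "&gt;");

  const body = rows
    .slice(0, maxRows) // row limit
    .map((row) => {
      const cells = row
        .slice(0, maxCols) // column limit
        .map((value) => `<td>${escapeHtml(value)}</td>`)
        .join("");
      return `<tr>${cells}</tr>`;
    })
    .join("\n");

  return `<!DOCTYPE html>
<html><head><style>
  table { border-collapse: collapse; font: ${fontSize}px sans-serif; }
  td { border: 1px solid #999; padding: ${cellPadding}px; }
</style></head>
<body><table>
${body}
</table></body></html>`;
}

console.log(renderSheetHtml([["name", "age"], ["Steve", 56]], { fontSize: 12 }));
```

The resulting string could then be captured with Playwright, for example by calling page.setContent(html) on a page opened with browser.newPage({ viewport: { width: 1920, height: 1080 } }) and then page.screenshot({ path, fullPage: true }), which is where the viewportWidth, viewportHeight, and fullPage options would plausibly come in.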

Why Use Multiple Formats?

Records: Optimal for LLM processing and understanding
CSV: Familiar format for data validation and backup
Image: Helps LLMs understand complex layouts and spatial relationships

This combination addresses the limitations of traditional tabular data presentation to AI models.

Requirements

Node.js 18+ (ES modules support)
Google Gemini API key (LLM mode only)
Internet connection (for Gemini requests in LLM mode)

Dependencies

@google/genai - Google Gemini AI integration
xlsx - Excel file parsing
canvas - Image rendering
playwright - Browser automation for screenshots
dotenv - Environment variable management

Error Handling

parseXlsx rejects with an Error when a step fails, so wrap calls in try/catch. Checking the message text lets you tell the failure types apart:

import parseXlsx from "llm-xlsx-parser";

try {
  const result = await parseXlsx("file.xlsx", "output.txt");
  console.log("Success:", result);
} catch (error) {
  // Branch on the message text to distinguish failure types
  if (error.message.includes("Gemini API key")) {
    console.error("API key issue:", error.message);
  } else if (error.message.includes("XLSX")) {
    console.error("File reading issue:", error.message);
  } else {
    console.error("General error:", error.message);
  }
}

Contributing

Fork the repository
Create a feature branch
Make your changes
Add tests if applicable
Submit a pull request

License

ISC

Support

For issues and questions:

Create an issue on GitHub
Check the documentation
Review the example code

Note: In LLM mode, this module uses Google Gemini for processing and requires an API key; image output mode works without one. The package is designed to make spreadsheet data more accessible to LLMs by addressing their spatial processing limitations.
