llm-html-compressor
1.0.0 • Public • Published

A specialized HTML compressor that optimizes HTML content for use as context with Large Language Models (LLMs). It removes unnecessary whitespace, comments, and other "noise" from HTML documents, reducing their size for LLM processing while preserving semantically important content.

Installation

npm install llm-html-compressor
# or
yarn add llm-html-compressor

Why use llm-html-compressor?

When using HTML content as context for LLMs, unnecessary elements like whitespace, comments, and certain attributes can:

  1. Consume token quota without adding value
  2. Add noise that makes it harder for the LLM to focus on important content
  3. Increase the chance of context truncation

This library provides optimizations targeted at LLM context usage. Traditional HTML minifiers, by contrast, focus on reducing network transfer size and must keep the page rendering and behaving identically in a browser.
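
To get a rough sense of the savings, you can compare the size of a document before and after compression. The sketch below is not part of the library: `estimateTokens` and `estimateSavings` are hypothetical helpers, and the four-characters-per-token ratio is only a common rule of thumb that varies by tokenizer and content.

```typescript
// Hypothetical helper, not part of llm-html-compressor.
// Assumes roughly 4 characters per token; real tokenizers will differ.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Reports estimated token counts before and after, plus the fraction saved.
function estimateSavings(original: string, compressed: string) {
  const before = estimateTokens(original);
  const after = estimateTokens(compressed);
  const saved = before === 0 ? 0 : (before - after) / before;
  return { before, after, saved };
}

const original = "<div>\n    <!-- note -->\n    <p>Hello</p>\n</div>";
const compressed = "<div><p>Hello</p></div>";
console.log(estimateSavings(original, compressed));
```

For real budgeting, replace `estimateTokens` with the tokenizer that matches your target model.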

Usage

Basic Usage

import { compress } from 'llm-html-compressor';

const html = `
<!DOCTYPE html>
<html>
  <!-- This is a comment -->
  <head>
    <title>Example</title>
    <style>
      body { font-family: Arial, sans-serif; }
    </style>
  </head>
  <body>
    <div class="container" id="main">
      <h1>Hello, World!</h1>
      <p style="color: blue; font-size: 16px;">This is an example.</p>
    </div>
    <script>
      console.log('Hello');
    </script>
  </body>
</html>
`;

const compressed = compress(html);
console.log(compressed);
// Output: <!DOCTYPE html><html><head><title>Example</title><style>body { font-family: Arial, sans-serif; }</style></head><body><div class="container" id="main"><h1>Hello, World!</h1><p style="color:blue;font-size:16px;">This is an example.</p></div><script>console.log('Hello');</script></body></html>

Advanced Usage with Custom Options

import { createCompressor } from 'llm-html-compressor';

const compressor = createCompressor({
  removeComments: true,
  collapseWhitespace: true,
  removeEmptyAttributes: true,
  removeStyleTags: true,
  removeScriptTags: true,
  preserveLineBreaks: false,
  removeDataAttributes: true,
  removeHiddenElements: true,
  minifyInlineCSS: true,
  removeClassAttributes: true,
  removeIdAttributes: false
});

const html = `... your HTML here ...`;
const compressed = compressor.compress(html);

API

Functions

compress(html: string): string

Compresses HTML using default options.

createCompressor(options?: Partial<CompressionOptions>): HtmlCompressor

Creates a compressor instance with custom options.
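
Because `createCompressor` accepts a `Partial<CompressionOptions>`, any option you omit presumably falls back to its default. The sketch below illustrates that pattern using the option names and defaults listed under CompressionOptions. `DEFAULT_OPTIONS` and `resolveOptions` are hypothetical names, and the merge behavior is an assumption, not the library's confirmed implementation.

```typescript
// Option names and defaults copied from the CompressionOptions table.
interface CompressionOptions {
  removeComments: boolean;
  collapseWhitespace: boolean;
  removeEmptyAttributes: boolean;
  removeStyleTags: boolean;
  removeScriptTags: boolean;
  preserveLineBreaks: boolean;
  removeDataAttributes: boolean;
  removeHiddenElements: boolean;
  minifyInlineCSS: boolean;
  removeClassAttributes: boolean;
  removeIdAttributes: boolean;
}

// Hypothetical constant; the library may store its defaults differently.
const DEFAULT_OPTIONS: CompressionOptions = {
  removeComments: true,
  collapseWhitespace: true,
  removeEmptyAttributes: true,
  removeStyleTags: false,
  removeScriptTags: false,
  preserveLineBreaks: false,
  removeDataAttributes: false,
  removeHiddenElements: false,
  minifyInlineCSS: false,
  removeClassAttributes: false,
  removeIdAttributes: false,
};

// Assumed merge behavior: any option not supplied keeps its default.
function resolveOptions(
  overrides: Partial<CompressionOptions> = {}
): CompressionOptions {
  return { ...DEFAULT_OPTIONS, ...overrides };
}

const resolved = resolveOptions({ removeScriptTags: true });
console.log(resolved.removeScriptTags, resolved.removeComments);
```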

Classes

HtmlCompressor

The main compressor class that can be instantiated directly.

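import { HtmlCompressor } from 'llm-html-compressor';

// Assumption: the constructor accepts the same options as createCompressor.
const options = { removeScriptTags: true };
const html = `... your HTML here ...`;

const compressor = new HtmlCompressor(options);
const result = compressor.compress(html);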

CompressionOptions

| Option                | Type    | Default | Description                                                  |
|-----------------------|---------|---------|--------------------------------------------------------------|
| removeComments        | boolean | true    | Removes HTML comments                                        |
| collapseWhitespace    | boolean | true    | Collapses multiple whitespace characters into a single space |
| removeEmptyAttributes | boolean | true    | Removes attributes with empty values                         |
| removeStyleTags       | boolean | false   | Removes <style> tags and their content                       |
| removeScriptTags      | boolean | false   | Removes <script> tags and their content                      |
| preserveLineBreaks    | boolean | false   | Preserves line breaks when collapsing whitespace             |
| removeDataAttributes  | boolean | false   | Removes data-* attributes                                    |
| removeHiddenElements  | boolean | false   | Removes elements with display:none or hidden attribute       |
| minifyInlineCSS       | boolean | false   | Minifies inline CSS in style attributes                      |
| removeClassAttributes | boolean | false   | Removes class attributes                                     |
| removeIdAttributes    | boolean | false   | Removes id attributes                                        |
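
To make the first two options concrete, here is a simplified illustration of what removing comments and collapsing whitespace can look like. This is not the library's implementation: `stripComments` and `collapseWhitespace` are hypothetical functions, and the regular expressions ignore edge cases such as `<pre>` blocks or comment-like text inside scripts.

```typescript
// Simplified illustration only; not the library's implementation.
// Regexes do not cover every HTML edge case (e.g. <pre> or <script> content).
function stripComments(html: string): string {
  // Non-greedy match so each comment is removed separately.
  return html.replace(/<!--[\s\S]*?-->/g, "");
}

function collapseWhitespace(html: string): string {
  return html
    .replace(/\s+/g, " ") // runs of whitespace become a single space
    .replace(/>\s+</g, "><") // drop whitespace between adjacent tags
    .trim();
}

const input = "<div>\n  <!-- note -->\n  <p>Hello,   World!</p>\n</div>";
const output = collapseWhitespace(stripComments(input));
console.log(output);
```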

Use Cases

  • Preprocessing HTML for RAG (Retrieval-Augmented Generation) systems
  • Optimizing web page content for chatbots and assistants
  • Reducing token usage when working with HTML documentation
  • Making HTML content more digestible for code analysis with LLMs
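
As a sketch of the RAG preprocessing use case, the function below compresses a list of pages and keeps them, in order, until a size budget is reached. `buildContext` and `fakeCompress` are hypothetical names, and the character budget is a stand-in for a token budget. The compressor is passed in as a parameter, so the package's `compress` export can be used in place of the demo function.

```typescript
// Any function with the same shape as the package's compress(html) export.
type Compress = (html: string) => string;

// Compresses each page, then keeps pages in order until adding the next
// one would exceed the character budget.
function buildContext(
  pages: string[],
  compress: Compress,
  maxChars: number
): string[] {
  const kept: string[] = [];
  let used = 0;
  for (const page of pages) {
    const small = compress(page);
    if (used + small.length > maxChars) break;
    kept.push(small);
    used += small.length;
  }
  return kept;
}

// Stand-in compressor for demonstration; swap in the real compress().
const fakeCompress: Compress = (html) => html.replace(/\s+/g, " ").trim();

const context = buildContext(
  ["<p>a</p>  ", "<p>  b  </p>", "<p>c</p>"],
  fakeCompress,
  20
);
console.log(context);
```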

License

MIT

Unpacked Size

18.9 kB

Total Files

7

Collaborators

  • tkattkat