A lightweight utility for processing streaming responses from Large Language Models (LLMs), with special handling for `<think>` blocks and content parsing.
- Process streaming LLM responses with callback-based event handling
- Intelligently parse and separate `<think>` blocks from final content
- Automatic JSON detection in content responses
- Support for chunk prefixes and end delimiters common in SSE streams
- Zero dependencies
- Works directly in the browser without bundling
- TypeScript declarations included
```html
<script src="https://cdn.jsdelivr.net/gh/mingzilla/llm-stream-processor-js@latest/llm-stream-processor.js"></script>
```
If you're using TypeScript with direct script inclusion, you can reference the type definitions in one of these ways:
- Download the definition file and place it in your project, then reference it in your `tsconfig.json`:

  ```json
  {
    "compilerOptions": {
      "typeRoots": ["./typings", "./node_modules/@types"]
    }
  }
  ```

  And create a folder structure:

  ```
  your-project/
  ├── typings/
  │   └── llm-stream-processor-js/
  │       └── index.d.ts   // Copy contents from llm-stream-processor.d.ts
  ```
- Reference the declaration file directly using a triple-slash directive:

  ```typescript
  /// <reference path="./typings/llm-stream-processor.d.ts" />
  ```
- Use the CDN for the declaration file:

  ```typescript
  // In your TypeScript file
  declare module 'llm-stream-processor-js';
  ```

  Then add a reference in your HTML:

  ```html
  <script src="https://cdn.jsdelivr.net/gh/mingzilla/llm-stream-processor-js@latest/llm-stream-processor.js"></script>
  ```
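With any of these approaches in place, the global should type-check. A minimal sketch, assuming the declaration file describes the same `createInstance` options documented later in this README and that it lives at the path shown:

```typescript
/// <reference path="./typings/llm-stream-processor.d.ts" />

// Assumes the reference path above matches wherever you placed the declaration file.
const processor = LlmStreamProcessor.createInstance({
    chunkPrefix: "data: ",
    endDelimiter: "[DONE]"
});
```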
```bash
npm install @mingzilla/llm-stream-processor-js
```
It works well with `api-client-js`, which provides the `ApiClient` and `ApiClientInput` used in the example below.
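If you're consuming the npm package through a bundler, an import along the following lines should work before running the example. The named export is an assumption (the browser build exposes a global `LlmStreamProcessor`), so verify it against the package's declaration file:

```javascript
// Assumed ESM import — check llm-stream-processor.d.ts for the actual export shape.
import { LlmStreamProcessor } from '@mingzilla/llm-stream-processor-js';
```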
```javascript
// Create a stream processor instance
const processor = LlmStreamProcessor.createInstance({
    chunkPrefix: "data: ",  // Optional: Strip this prefix from each chunk (common in SSE)
    endDelimiter: "[DONE]"  // Optional: String that signals the end of the stream
});

// Process streaming response from an LLM API
let contentWithoutThinkBlock;
ApiClient.stream(
    ApiClientInput.postJson('https://api.example.com/llm/generate', {
        prompt: "Explain quantum computing. <think>I should start with the basics.</think>"
    }, {
        'Accept': 'text/event-stream'
    }),
    () => console.log('Stream started'), // onStart
    (chunk) => {
        // Process each chunk through the LLM processor
        processor.processChunk(
            chunk,
            () => console.log('Processing started'),
            () => console.log('Think block started'),
            (thinkChunk) => console.log('Think chunk:', thinkChunk),
            (fullThinkText) => console.log('Think complete:', fullThinkText),
            () => console.log('Content started'),
            (contentChunk) => {
                console.log('Content chunk:', contentChunk);
                // Update UI with new content
                document.getElementById('response').innerText += contentChunk;
            },
            (fullContent, parsedJson) => {
                console.log('Content complete:', fullContent);
                contentWithoutThinkBlock = fullContent;
            },
            (fullThink, fullContent, parsedJson) => console.log('All complete'),
            (error) => console.error('Error:', error)
        );
    },
    (fullResponse) => {
        // When the stream is complete, finalize processing. This triggers 'Content complete' to be executed.
        processor.finalize();
        // If you want to exclude the <think> block from the fullResponse, do the below:
        fullResponse.body = contentWithoutThinkBlock;
        // ...
    },
    (error) => {
        processor.finalize(); // if you want the error case to also trigger completion
        console.error('Stream error:', error);
    }
);
```
Many LLM APIs use Server-Sent Events (SSE) for streaming. The processor can handle SSE format:
```javascript
const processor = LlmStreamProcessor.createInstance({
    chunkPrefix: "data: ",  // Remove "data: " prefix from SSE events
    endDelimiter: "[DONE]"  // Common end signal in SSE streams
});

// Now process chunks as they come in...
```
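If you're not using a streaming client like `api-client-js`, a plain `fetch` loop can feed the processor. Below is a minimal sketch with a placeholder URL and request body; only `fetch` and `TextDecoder` are standard browser APIs here, and the `processor.processChunk(...)` call takes the nine callbacks shown in the Quick Start:

```javascript
// A generic sketch of reading an SSE response with fetch and handing each
// decoded chunk to a callback. The URL and request body are placeholders.
async function streamFromApi(url, requestBody, onChunk) {
    const response = await fetch(url, {
        method: 'POST',
        headers: { 'Content-Type': 'application/json', 'Accept': 'text/event-stream' },
        body: JSON.stringify(requestBody)
    });
    const reader = response.body.getReader();
    const decoder = new TextDecoder();
    while (true) {
        const { done, value } = await reader.read();
        if (done) break;
        // Each decoded chunk may contain one or more "data: ..." lines;
        // the chunkPrefix/endDelimiter options are designed for that format.
        onChunk(decoder.decode(value, { stream: true }));
    }
}

// Usage sketch:
// streamFromApi('https://api.example.com/llm/generate', { prompt: 'Hello' },
//     (chunk) => processor.processChunk(chunk, /* ...nine callbacks as above... */));
```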
The processor automatically attempts to parse JSON in the content:
```javascript
processor.processChunk(
    chunk,
    // ...other callbacks...
    (fullContent, parsedJson) => {
        if (parsedJson) {
            // The response contained valid JSON
            console.log('Parsed JSON:', parsedJson);

            // For example, extracting choices from an OpenAI-like response
            if (parsedJson.choices && parsedJson.choices[0]) {
                const generatedText = parsedJson.choices[0].message.content;
                document.getElementById('response').innerText = generatedText;
            }
        }
    },
    // ...other callbacks...
);
```
The main class for processing LLM streaming responses.
- `createInstance(options)`: Create a new processor instance with optional configuration
- `processChunk(rawChunk, callbacks...)`: Process a raw server response that may contain multiple JSON-formatted messages
- `read(chunk, callbacks...)`: Process a chunk of plain text content, not raw JSON responses (see the sketch after this list)
- `finalize()`: Finalize processing and trigger completion callbacks
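The difference between `processChunk` and `read` is the input format. A minimal sketch, assuming both methods accept the same nine positional callbacks used in the Quick Start (verify against the declaration file), and that the wrapped-JSON shape shown is one the processor understands:

```javascript
const processor = LlmStreamProcessor.createInstance({ chunkPrefix: "data: " });

const noop = () => {};
const callbacks = [
    noop,                                        // onStart
    noop,                                        // onThinkStart
    noop,                                        // onThinkChunk
    noop,                                        // onThinkFinish
    noop,                                        // onContentStart
    (contentChunk) => console.log(contentChunk), // onContentChunk
    noop,                                        // onContentFinish
    noop,                                        // onFinish
    (error) => console.error(error)              // onFailure
];

// processChunk: raw server output, possibly prefixed and JSON-wrapped
processor.processChunk('data: {"choices":[{"delta":{"content":"Hi"}}]}', ...callbacks);

// read: plain text already extracted from the transport format
processor.read(' there.', ...callbacks);

processor.finalize(); // flush buffers and fire the completion callbacks
```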
When creating a processor with `createInstance()`, you can provide:
- `chunkPrefix`: String prefix to strip from each chunk (e.g., `"data: "` for SSE)
- `endDelimiter`: String that signals the end of the stream (e.g., `"[DONE]"`)
- `onStart`: Called when processing begins
- `onThinkStart`: Called when a think block starts
- `onThinkChunk`: Called with each chunk inside a think block
- `onThinkFinish`: Called when a think block completes
- `onContentStart`: Called when content outside think blocks starts
- `onContentChunk`: Called with each chunk outside think blocks
- `onContentFinish`: Called when content is finished, with optional parsed JSON
- `onFinish`: Called when all processing is complete
- `onFailure`: Called if an error occurs
- The processor identifies `<think>` and `</think>` tags in the stream
- Content inside these tags is separated and provided in the think-related callbacks
- Content outside these tags is treated as the actual response
- When the stream completes, the processor attempts to parse any JSON in the content
- All accumulated content is provided to the completion callbacks (illustrated in the sketch below)
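To make this flow concrete, the sketch below feeds text that splits a `<think>` block across two chunks. That the processor buffers across such splits is an assumption based on its streaming design; the callback order follows the list above:

```javascript
// Hypothetical walkthrough: a think block split across two plain-text chunks.
const processor = LlmStreamProcessor.createInstance({});

const log = (label) => (...args) => console.log(label, ...args);
const callbacks = [
    log('start'),          // onStart
    log('think-start'),    // onThinkStart
    log('think-chunk'),    // onThinkChunk   -> pieces of "Reason about qubits."
    log('think-finish'),   // onThinkFinish  -> "Reason about qubits."
    log('content-start'),  // onContentStart
    log('content-chunk'),  // onContentChunk -> "Qubits hold superpositions."
    log('content-finish'), // onContentFinish (full content, plus parsed JSON if any)
    log('finish'),         // onFinish
    log('failure')         // onFailure
];

processor.read('<think>Reason about', ...callbacks);
processor.read(' qubits.</think>Qubits hold superpositions.', ...callbacks);
processor.finalize(); // fires the completion callbacks with the accumulated text
```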
MIT
Ming Huang (mingzilla)