A browser-friendly library for running large language models (LLMs) directly in the browser using Wllama. This library provides a simple interface to load .gguf
or .bin
models (e.g., from Hugging Face) and generate text completions, including streaming token support.
- Plug-and-Play: Easy to integrate into your web projects.
-
Local or Remote Models: Load a URL from Hugging Face or pass local
File
objects. -
Token-by-Token Streaming: Handle partial results in real-time via
onNewToken
callback. - Templates: Leverages Jinja to format chat-based prompts.
- Lightweight: Bundles a minimal set of dependencies.
npm install browser-llm-engine
Or with Yarn:
yarn add browser-llm-engine
import { createLlmEngine, CHAT_ROLE, PRESET_MODELS } from 'browser-llm-engine';
(async () => {
// 1) Create an engine instance
const llm = createLlmEngine({
// Optional: provide custom WASM paths or config
wasmPaths: {}
});
// 2) Load a preset model from the library
const modelUrl = PRESET_MODELS["SmolLM2 (360M)"].url;
await llm.loadModel(modelUrl, {
progressCallback: (progress) => console.log(`Loading: ${progress}%`),
});
// 3) Generate a completion
const result = await llm.createCompletion("Hello from the browser!");
console.log("Full model response:", result);
// 4) Clean up
await llm.exit();
})();
That’s it! You have a working LLM in the browser.
To get partial tokens as they are generated, supply an onNewToken
callback:
const llm = createLlmEngine();
await llm.loadModel(PRESET_MODELS["SmolLM2 (360M)"].url);
let outputSoFar = "";
await llm.createCompletion("What's the weather today?", {
nPredict: 128,
sampling: { temp: 0.7, penalty_repeat: 1.1 },
onNewToken: (token) => {
outputSoFar += token;
console.log("Streamed token:", token);
}
});
console.log("Final streamed output:", outputSoFar);
If you want to load the model from your local machine:
<input type="file" id="modelFile" multiple />
<script type="module">
import { createLlmEngine } from 'browser-llm-engine';
const fileInput = document.getElementById("modelFile");
const llm = createLlmEngine();
fileInput.addEventListener("change", async () => {
try {
// fileInput.files is a FileList
await llm.loadModel(fileInput.files);
console.log("Model loaded locally!");
} catch (error) {
console.error("Failed to load local model:", error);
}
});
</script>
The library includes a models.json
with references to a few hosted models. You can get them via:
import { PRESET_MODELS } from 'browser-llm-engine';
console.log("Available models:", PRESET_MODELS);
Feel free to add or remove entries if you fork this library.
Creates a new engine instance.
-
Parameters:
-
config
(Object) – Optional configuration, e.g.{ wasmPaths: { ... } }
.
-
Loads the model from either a remote URL or local File
objects.
-
Parameters:
-
source
(String | File[] | FileList) – The source of the model. -
options
(Object) – Additional load options:-
progressCallback
(function):(progress) => {}
for tracking loading progress -
useCache
(Boolean): Cache the model for faster reloads -
allowOffline
(Boolean): If false, tries to fetch from network
-
-
Takes an array of messages (each with role
and content
) and formats them into a single prompt with Jinja.
Creates the text completion for a given prompt
.
-
Parameters:
-
prompt
(String) – The text to generate from. -
options
(Object) – Fine-tuning generation:-
nPredict
(Number) – Maximum tokens to predict (default 512) -
sampling
(Object) – e.g.{ temp: 0.7, penalty_repeat: 1.1 }
-
onNewToken
(function) – A callback for streaming tokens
-
-
Cleans up resources used by Wllama.
-
Example:
await llm.exit();
If you want to develop locally:
- Clone the repo:
git clone https://github.com/you/browser-llm-engine.git cd browser-llm-engine
- Install dependencies:
npm install
- Build the library:
This will create
npm run build
dist/
with both ESM and CJS bundles. -
(Optional) Start a dev server (if you add a script in
package.json
):npm run dev
- Open
index.html
(or any dev test page) in your browser to play around with the library.
This project is released under the MIT License. Feel free to fork, adapt, and contribute!
Happy coding and enjoy using your LLM in the browser!