Run AI models locally on your machine
Pre-built bindings are provided, with a fallback to building from source with cmake.
✨ New! Try the beta of version 3.0.0 ✨ (included: function calling, automatic chat wrapper detection, embedding support, and more)
- Run a text generation model locally on your machine
- Metal and CUDA support
- Pre-built binaries are provided, with a fallback to building from source without node-gyp or Python
- Chat with a model using a chat wrapper
- Use the CLI to chat with a model without writing any code
- Up-to-date with the latest version of llama.cpp. Download and compile the latest release with a single CLI command.
- Force a model to generate output in a parseable format, like JSON, or even force it to follow a specific JSON schema
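When output is constrained to JSON, the response string can be handed straight to `JSON.parse`. The following is a minimal sketch of that consuming side only. `parseModelJson`, the required keys, and the sample response string are hypothetical and not part of node-llama-cpp.

```javascript
// Hypothetical helper (not part of node-llama-cpp): parse a JSON response
// and check that the expected top-level keys are present.
function parseModelJson(responseText, requiredKeys) {
    const parsed = JSON.parse(responseText); // throws SyntaxError on invalid JSON
    if (parsed === null || typeof parsed !== "object" || Array.isArray(parsed)) {
        throw new TypeError("Expected a JSON object");
    }
    for (const key of requiredKeys) {
        if (!(key in parsed)) {
            throw new Error("Missing key: " + key);
        }
    }
    return parsed;
}

// Made-up response text standing in for real model output
const sampleResponse = '{"answer": "Paris", "confidence": 0.9}';
const result = parseModelJson(sampleResponse, ["answer", "confidence"]);
console.log(result.answer);
```

A key check like this is only a light sanity test. It does not replace constraining the generation itself to a schema.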
npm install --save node-llama-cpp
This package comes with pre-built binaries for macOS, Linux and Windows.
If binaries are not available for your platform, it'll fall back to downloading the latest version of llama.cpp and building it from source with cmake.
To disable this behavior, set the environment variable NODE_LLAMA_CPP_SKIP_DOWNLOAD to true.
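For scripted installs, the variable can also be set from Node before launching npm. This is a minimal sketch, assuming the fallback runs as part of `npm install` and that the install is started as a child process. `buildInstallEnv` is a hypothetical helper name, and the spawn call is left as a comment so the snippet has no side effects.

```javascript
// Hypothetical helper: copy an environment object and add the flag that
// disables the download-and-build fallback described above.
function buildInstallEnv(baseEnv) {
    return {...baseEnv, NODE_LLAMA_CPP_SKIP_DOWNLOAD: "true"};
}

const env = buildInstallEnv(process.env);
console.log(env.NODE_LLAMA_CPP_SKIP_DOWNLOAD);

// The resulting object can be passed as the `env` option of
// child_process.spawn when launching `npm install --save node-llama-cpp`.
```

Copying the environment instead of mutating `process.env` keeps the flag scoped to the install step.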
import {fileURLToPath} from "url";
import path from "path";
import {LlamaModel, LlamaContext, LlamaChatSession} from "node-llama-cpp";

// Resolve the directory of this file (ES modules have no built-in __dirname)
const __dirname = path.dirname(fileURLToPath(import.meta.url));

// Load the model file from the "models" directory next to this script
const model = new LlamaModel({
    modelPath: path.join(__dirname, "models", "codellama-13b.Q3_K_M.gguf")
});
const context = new LlamaContext({model});
const session = new LlamaChatSession({context});

// First prompt
const q1 = "Hi there, how are you?";
console.log("User: " + q1);

const a1 = await session.prompt(q1);
console.log("AI: " + a1);

// Follow-up prompt in the same session
const q2 = "Summarize what you said";
console.log("User: " + q2);

const a2 = await session.prompt(q2);
console.log("AI: " + a2);
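The repeated prompt-and-print pattern above can be folded into a loop. The following is a minimal sketch, assuming only that the session exposes an async `prompt(text)` method as in the example. `runConversation` and `stubSession` are hypothetical names, and the stub stands in for a real session so the snippet runs without a model.

```javascript
// Hypothetical helper: send prompts one at a time through any object that
// exposes an async prompt(text) method, as the session above does.
async function runConversation(session, questions) {
    const transcript = [];
    for (const question of questions) {
        // Await each answer before sending the next prompt
        const answer = await session.prompt(question);
        transcript.push({question, answer});
    }
    return transcript;
}

// Stand-in session so the sketch runs without a model; it only echoes the prompt
const stubSession = {
    async prompt(text) {
        return "(stub reply to: " + text + ")";
    }
};

runConversation(stubSession, ["Hi there, how are you?", "Summarize what you said"])
    .then((transcript) => {
        for (const {question, answer} of transcript) {
            console.log("User: " + question);
            console.log("AI: " + answer);
        }
    });
```

To use it with a real model, pass the `session` created above in place of `stubSession`.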
For more examples, see the getting started guide.
To contribute to node-llama-cpp, read the contribution guide.
- llama.cpp: ggerganov/llama.cpp