Official Node.js client library for Moondream, a fast multi-function VLM. This client can target either the Moondream Cloud or a local Moondream Server. Both are free, though the cloud has limits on its free tier.
Moondream goes beyond the typical VLM "query" ability to include more visual functions. These include:
- caption: Generate descriptive captions for images
- query: Ask questions about image content
- detect: Find bounding boxes around objects in images
- point: Identify the center location of specified objects in images
You can try this out anytime on Moondream's playground.
Install the package using npm:
npm install moondream
Choose how you want to run it:
- Moondream Cloud (5,000 free requests/day): get a free API key from the Moondream cloud console.
- Moondream Server: Run it locally by installing and running the Moondream server.
Once you've done at least one of these, try running this code:
import { vl } from "moondream";
import fs from "fs";
// For Moondream Cloud
const model = new vl({
  apiKey: "<your-api-key>",
});

// ...or, for a local Moondream Server, use an endpoint instead:
// const model = new vl({
//   endpoint: "http://localhost:2020/v1",
// });
// Read an image file
const image = fs.readFileSync("path/to/image.jpg");
// Basic usage examples
async function main() {
  // Generate a caption for the image
  const caption = await model.caption({
    image: image,
    length: "normal",
    stream: false,
  });
  console.log("Caption:", caption);

  // Ask a question about the image
  const answer = await model.query({
    image: image,
    question: "What's in this image?",
    stream: false,
  });
  console.log("Answer:", answer);

  // Stream the response
  const stream = await model.caption({
    image: image,
    length: "normal",
    stream: true,
  });
  for await (const chunk of stream.caption) {
    process.stdout.write(chunk);
  }
}

main();
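When streaming, the `caption` (or `answer`) field is an async iterable of text chunks. If you want the full text rather than incremental output, a small helper (not part of the moondream package) can collect the chunks into one string:

```javascript
// Hypothetical helper: drain an async iterable of string chunks
// into a single string.
async function collectStream(chunks) {
  let text = "";
  for await (const chunk of chunks) {
    text += chunk;
  }
  return text;
}

// Stand-in async generator for illustration; with the client you would
// pass the `caption` or `answer` field of a streaming result instead.
async function* fakeChunks() {
  yield "A dog ";
  yield "on a beach.";
}

collectStream(fakeChunks()).then((text) => console.log(text));
// prints "A dog on a beach."
```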
// Cloud inference
const model = new vl({
  apiKey: "your-api-key",
});

// Local inference
const model = new vl({
  endpoint: "http://localhost:2020/v1",
});
Generate a caption for an image.
const result = await model.caption({
  image: image,
  length: "normal",
  stream: false,
});

// Generate a caption with streaming (default: false)
const stream = await model.caption({
  image: image,
  length: "normal",
  stream: true,
});
Ask a question about an image.
const result = await model.query({
  image: image,
  question: "What's in this image?",
  stream: false,
});

// Ask a question with streaming (default: false)
const stream = await model.query({
  image: image,
  question: "What's in this image?",
  stream: true,
});
Detect specific objects in an image.
const result = await model.detect({
  image: image,
  object: "car",
});
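To work with the result, iterate over `result.objects`. As a sketch, the helper below (not part of the package) pretty-prints each box; it assumes every object carries the `x_min`/`y_min`/`x_max`/`y_max` fields of the Region type described below, so verify the shape against your server's actual response:

```javascript
// Hypothetical helper: format detected bounding boxes for logging.
// Assumes each object has numeric x_min, y_min, x_max, y_max fields.
function summarizeDetections(objects) {
  return objects.map(
    (o, i) =>
      `#${i}: box (${o.x_min.toFixed(2)}, ${o.y_min.toFixed(2)}) -> ` +
      `(${o.x_max.toFixed(2)}, ${o.y_max.toFixed(2)})`
  );
}

// Example with sample data:
console.log(
  summarizeDetections([{ x_min: 0.1, y_min: 0.2, x_max: 0.5, y_max: 0.8 }])
);
// [ '#0: box (0.10, 0.20) -> (0.50, 0.80)' ]
```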
Get coordinates of specific objects in an image.
const result = await model.point({
  image: image,
  object: "person",
});
Encodes an image provided as a Buffer or a Base64EncodedImage into a Base64-encoded JPEG. If the image is already in Base64 format, the method returns it unchanged.
const encodedImage = await model.encodeImage(imageBuffer);
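If you prefer to build the Base64 form yourself, a minimal sketch follows the `{ imageUrl: string }` shape of the Base64EncodedImage input type described below. The `toBase64Image` helper is hypothetical (not part of the package), and you should confirm whether your server expects a bare Base64 string or a data-URL prefix in `imageUrl`:

```javascript
// Hypothetical helper: wrap a Buffer's Base64 encoding in the
// { imageUrl: string } shape of Base64EncodedImage.
function toBase64Image(buffer) {
  return { imageUrl: buffer.toString("base64") };
}

// e.g. toBase64Image(fs.readFileSync("path/to/image.jpg"))
console.log(toBase64Image(Buffer.from("hi")).imageUrl); // "aGk="
```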
- Buffer: Raw binary image data
- Base64EncodedImage: An object in the format { imageUrl: string }, where imageUrl contains a Base64-encoded image
- CaptionOutput: { caption: string | AsyncGenerator }
- QueryOutput: { answer: string | AsyncGenerator }
- DetectOutput: { objects: Array<Object> }
- PointOutput: { points: Array<Point> }
- Region: Bounding box with coordinates (x_min, y_min, x_max, y_max)
- Point: Coordinates (x, y) indicating the object center
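Assuming these coordinates are normalized to the 0-1 range relative to the image dimensions (check your server's actual output before relying on this), a small sketch converts a Region to pixel coordinates:

```javascript
// Hypothetical helper, assuming Region coordinates are normalized
// to 0-1 relative to the image width and height.
function regionToPixels(region, width, height) {
  return {
    x_min: Math.round(region.x_min * width),
    y_min: Math.round(region.y_min * height),
    x_max: Math.round(region.x_max * width),
    y_max: Math.round(region.y_max * height),
  };
}

console.log(
  regionToPixels({ x_min: 0.1, y_min: 0.2, x_max: 0.5, y_max: 0.8 }, 640, 480)
);
// { x_min: 64, y_min: 96, x_max: 320, y_max: 384 }
```

The same multiplication applies to a Point's x and y.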