The Model Context Protocol (MCP) is an open standard developed by Anthropic that defines how applications share context with large language models (LLMs).
MCP lets you give natural language instructions to a language model, and it can discover and run APIs automatically on your behalf.
This an npm package that bundles up an MCP server for Replicate's HTTP API so you can run Replicate API commands using prompts like these:
Search Replicate for upscaler models and compare them
Show me the latest Replicate models created by @fofr
Generate an image using black-forest-labs/flux-schnell
Upscale that image using the best upscaler model
To get started, see the following guides:
- Using Replicate MCP with Claude Desktop
- Using Replicate MCP with Cursor
- Using Replicate MCP with GitHub Copilot in VS Code
You can run the MCP server directly via npx
.
Create a Replicate API token at replicate.com/account/api-tokens, then run:
export REPLICATE_API_TOKEN="My Bearer Token"
npx -y replicate-mcp@latest
There is a partial list of existing clients at modelcontextprotocol.io. If you already have a client, consult their documentation to install the MCP server.
For clients with a configuration JSON, it might look something like this:
{
"mcpServers": {
"replicate": {
"command": "npx",
"args": ["-y", "replicate-mcp", "--client=claude", "--tools=dynamic"],
"env": {
"REPLICATE_API_TOKEN": "My Bearer Token"
}
}
}
}
There are two ways to expose endpoints as tools in the MCP server:
- Exposing one tool per endpoint, and filtering as necessary
- Exposing a set of tools to dynamically discover and invoke endpoints from the API
You can run the package on the command line to discover and filter the set of tools that are exposed by the MCP Server. This can be helpful for large APIs where including all endpoints at once is too much for your AI's context window.
You can filter by multiple aspects:
-
--tool
includes a specific tool by name -
--resource
includes all tools under a specific resource, and can have wildcards, e.g.my.resource*
-
--operation
includes just read (get/list) or just write operations
If you specify --tools=dynamic
to the MCP server, instead of exposing one tool per endpoint in the API, it will
expose the following tools:
-
list_api_endpoints
- Discovers available endpoints, with optional filtering by search query -
get_api_endpoint_schema
- Gets detailed schema information for a specific endpoint -
invoke_api_endpoint
- Executes any endpoint with the appropriate parameters
This allows you to have the full set of API endpoints available to your MCP Client, while not requiring that all of their schemas be loaded into context at once. Instead, the LLM will automatically use these tools together to search for, look up, and invoke endpoints dynamically. However, due to the indirect nature of the schemas, it can struggle to provide the correct properties a bit more than when tools are imported explicitly. Therefore, you can opt-in to explicit tools, the dynamic tools, or both.
See more information with --help
.
All of these command-line options can be repeated, combined together, and have corresponding exclusion versions (e.g. --no-tool
).
Use --list
to see the list of available tools, or see below.
Different clients have varying abilities to handle arbitrary tools and schemas.
You can specify the client you are using with the --client
argument, and the MCP server will automatically
serve tools and schemas that are more compatible with that client.
-
--client=<type>
: Set all capabilities based on a known MCP client- Valid values:
openai-agents
,claude
,claude-code
,cursor
- Example:
--client=cursor
- Valid values:
Additionally, if you have a client not on the above list, or the client has gotten better over time, you can manually enable or disable certain capabilities:
-
--capability=<name>
: Specify individual client capabilities- Available capabilities:
-
top-level-unions
: Enable support for top-level unions in tool schemas -
valid-json
: Enable JSON string parsing for arguments -
refs
: Enable support for $ref pointers in schemas -
unions
: Enable support for union types (anyOf) in schemas -
formats
: Enable support for format validations in schemas (e.g. date-time, email) -
tool-name-length=N
: Set maximum tool name length to N characters
-
- Example:
--capability=top-level-unions --capability=tool-name-length=40
- Example:
--capability=top-level-unions,tool-name-length=40
- Available capabilities:
- Filter for read operations on cards:
--resource=cards --operation=read
- Exclude specific tools while including others:
--resource=cards --no-tool=create_cards
- Configure for Cursor client with custom max tool name length:
--client=cursor --capability=tool-name-length=40
- Complex filtering with multiple criteria:
--resource=cards,accounts --operation=read --tag=kyc --no-tool=create_cards
// Import the server, generated endpoints, or the init function
import { server, endpoints, init } from "replicate-mcp/server";
// import a specific tool
import listCollections from "replicate-mcp/tools/collections/list-collections";
// initialize the server and all endpoints
init({ server, endpoints });
// manually start server
const transport = new StdioServerTransport();
await server.connect(transport);
// or initialize your own server with specific tools
const myServer = new McpServer(...);
// define your own endpoint
const myCustomEndpoint = {
tool: {
name: 'my_custom_tool',
description: 'My custom tool',
inputSchema: zodToJsonSchema(z.object({ a_property: z.string() })),
},
handler: async (client: replicate, args: any) => {
return { myResponse: 'Hello world!' };
})
};
// initialize the server with your custom endpoints
init({ server: myServer, endpoints: [listCollections, myCustomEndpoint] });
The following tools are available in this MCP server.
-
list_collections
(read
): Example cURL request:curl -s \ -H "Authorization: Bearer $REPLICATE_API_TOKEN" \ https://api.replicate.com/v1/collections
The response will be a paginated JSON list of collection objects:
{ "next": "null", "previous": null, "results": [ { "name": "Super resolution", "slug": "super-resolution", "description": "Upscaling models that create high-quality images from low-quality images." } ] }
-
get_collections
(read
): Example cURL request:curl -s \ -H "Authorization: Bearer $REPLICATE_API_TOKEN" \ https://api.replicate.com/v1/collections/super-resolution
The response will be a collection object with a nested list of the models in that collection:
{ "name": "Super resolution", "slug": "super-resolution", "description": "Upscaling models that create high-quality images from low-quality images.", "models": [...] }
-
create_deployments
(write
): Create a new deployment:Example cURL request:
curl -s \ -X POST \ -H "Authorization: Bearer $REPLICATE_API_TOKEN" \ -H "Content-Type: application/json" \ -d '{ "name": "my-app-image-generator", "model": "stability-ai/sdxl", "version": "da77bc59ee60423279fd632efb4795ab731d9e3ca9705ef3341091fb989b7eaf", "hardware": "gpu-t4", "min_instances": 0, "max_instances": 3 }' \ https://api.replicate.com/v1/deployments
The response will be a JSON object describing the deployment:
{ "owner": "acme", "name": "my-app-image-generator", "current_release": { "number": 1, "model": "stability-ai/sdxl", "version": "da77bc59ee60423279fd632efb4795ab731d9e3ca9705ef3341091fb989b7eaf", "created_at": "2024-02-15T16:32:57.018467Z", "created_by": { "type": "organization", "username": "acme", "name": "Acme Corp, Inc.", "avatar_url": "https://cdn.replicate.com/avatars/acme.png", "github_url": "https://github.com/acme" }, "configuration": { "hardware": "gpu-t4", "min_instances": 1, "max_instances": 5 } } }
-
update_deployments
(write
): Update properties of an existing deployment, including hardware, min/max instances, and the deployment's underlying model version.Example cURL request:
curl -s \ -X PATCH \ -H "Authorization: Bearer $REPLICATE_API_TOKEN" \ -H "Content-Type: application/json" \ -d '{"min_instances": 3, "max_instances": 10}' \ https://api.replicate.com/v1/deployments/acme/my-app-image-generator
The response will be a JSON object describing the deployment:
{ "owner": "acme", "name": "my-app-image-generator", "current_release": { "number": 2, "model": "stability-ai/sdxl", "version": "da77bc59ee60423279fd632efb4795ab731d9e3ca9705ef3341091fb989b7eaf", "created_at": "2024-02-15T16:32:57.018467Z", "created_by": { "type": "organization", "username": "acme", "name": "Acme Corp, Inc.", "avatar_url": "https://cdn.replicate.com/avatars/acme.png", "github_url": "https://github.com/acme" }, "configuration": { "hardware": "gpu-t4", "min_instances": 3, "max_instances": 10 } } }
Updating any deployment properties will increment the
number
field of thecurrent_release
. -
list_deployments
(read
): Get a list of deployments associated with the current account, including the latest release configuration for each deployment.Example cURL request:
curl -s \ -H "Authorization: Bearer $REPLICATE_API_TOKEN" \ https://api.replicate.com/v1/deployments
The response will be a paginated JSON array of deployment objects, sorted with the most recent deployment first:
{ "next": "http://api.replicate.com/v1/deployments?cursor=cD0yMDIzLTA2LTA2KzIzJTNBNDAlM0EwOC45NjMwMDAlMkIwMCUzQTAw", "previous": null, "results": [ { "owner": "replicate", "name": "my-app-image-generator", "current_release": { "number": 1, "model": "stability-ai/sdxl", "version": "da77bc59ee60423279fd632efb4795ab731d9e3ca9705ef3341091fb989b7eaf", "created_at": "2024-02-15T16:32:57.018467Z", "created_by": { "type": "organization", "username": "acme", "name": "Acme Corp, Inc.", "avatar_url": "https://cdn.replicate.com/avatars/acme.png", "github_url": "https://github.com/acme" }, "configuration": { "hardware": "gpu-t4", "min_instances": 1, "max_instances": 5 } } } ] }
-
delete_deployments
(write
): Delete a deploymentDeployment deletion has some restrictions:
- You can only delete deployments that have been offline and unused for at least 15 minutes.
Example cURL request:
curl -s -X DELETE \ -H "Authorization: Bearer $REPLICATE_API_TOKEN" \ https://api.replicate.com/v1/deployments/acme/my-app-image-generator
The response will be an empty 204, indicating the deployment has been deleted.
-
get_deployments
(read
): Get information about a deployment by name including the current release.Example cURL request:
curl -s \ -H "Authorization: Bearer $REPLICATE_API_TOKEN" \ https://api.replicate.com/v1/deployments/replicate/my-app-image-generator
The response will be a JSON object describing the deployment:
{ "owner": "acme", "name": "my-app-image-generator", "current_release": { "number": 1, "model": "stability-ai/sdxl", "version": "da77bc59ee60423279fd632efb4795ab731d9e3ca9705ef3341091fb989b7eaf", "created_at": "2024-02-15T16:32:57.018467Z", "created_by": { "type": "organization", "username": "acme", "name": "Acme Corp, Inc.", "avatar_url": "https://cdn.replicate.com/avatars/acme.png", "github_url": "https://github.com/acme" }, "configuration": { "hardware": "gpu-t4", "min_instances": 1, "max_instances": 5 } } }
-
create_deployments_predictions
(write
): Create a prediction for the deployment and inputs you provide.Example cURL request:
curl -s -X POST -H 'Prefer: wait' \ -d '{"input": {"prompt": "A photo of a bear riding a bicycle over the moon"}}' \ -H "Authorization: Bearer $REPLICATE_API_TOKEN" \ -H 'Content-Type: application/json' \ https://api.replicate.com/v1/deployments/acme/my-app-image-generator/predictions
The request will wait up to 60 seconds for the model to run. If this time is exceeded the prediction will be returned in a
"starting"
state and need to be retrieved using thepredictions.get
endpiont.For a complete overview of the
deployments.predictions.create
API check out our documentation on creating a prediction which covers a variety of use cases.
-
list_hardware
(read
): Example cURL request:curl -s \ -H "Authorization: Bearer $REPLICATE_API_TOKEN" \ https://api.replicate.com/v1/hardware
The response will be a JSON array of hardware objects:
[ { "name": "CPU", "sku": "cpu" }, { "name": "Nvidia T4 GPU", "sku": "gpu-t4" }, { "name": "Nvidia A40 GPU", "sku": "gpu-a40-small" }, { "name": "Nvidia A40 (Large) GPU", "sku": "gpu-a40-large" } ]
-
get_account
(read
): Returns information about the user or organization associated with the provided API token.Example cURL request:
curl -s \ -H "Authorization: Bearer $REPLICATE_API_TOKEN" \ https://api.replicate.com/v1/account
The response will be a JSON object describing the account:
{ "type": "organization", "username": "acme", "name": "Acme Corp, Inc.", "github_url": "https://github.com/acme" }
-
create_models
(write
): Create a model.Example cURL request:
curl -s -X POST \ -H "Authorization: Bearer $REPLICATE_API_TOKEN" \ -H 'Content-Type: application/json' \ -d '{"owner": "alice", "name": "hot-dog-detector", "description": "Detect hot dogs in images", "visibility": "public", "hardware": "cpu"}' \ https://api.replicate.com/v1/models
The response will be a model object in the following format:
{ "url": "https://replicate.com/alice/hot-dog-detector", "owner": "alice", "name": "hot-dog-detector", "description": "Detect hot dogs in images", "visibility": "public", "github_url": null, "paper_url": null, "license_url": null, "run_count": 0, "cover_image_url": null, "default_example": null, "latest_version": null }
Note that there is a limit of 1,000 models per account. For most purposes, we recommend using a single model and pushing new versions of the model as you make changes to it.
-
list_models
(read
): Get a paginated list of public models.Example cURL request:
curl -s \ -H "Authorization: Bearer $REPLICATE_API_TOKEN" \ https://api.replicate.com/v1/models
The response will be a pagination object containing a list of model objects.
See the
models.get
docs for more details about the model object. -
delete_models
(write
): Delete a modelModel deletion has some restrictions:
- You can only delete models you own.
- You can only delete private models.
- You can only delete models that have no versions associated with them. Currently you'll need to delete the model's versions before you can delete the model itself.
Example cURL request:
curl -s -X DELETE \ -H "Authorization: Bearer $REPLICATE_API_TOKEN" \ https://api.replicate.com/v1/models/replicate/hello-world
The response will be an empty 204, indicating the model has been deleted.
-
get_models
(read
): Example cURL request:curl -s \ -H "Authorization: Bearer $REPLICATE_API_TOKEN" \ https://api.replicate.com/v1/models/replicate/hello-world
The response will be a model object in the following format:
{ "url": "https://replicate.com/replicate/hello-world", "owner": "replicate", "name": "hello-world", "description": "A tiny model that says hello", "visibility": "public", "github_url": "https://github.com/replicate/cog-examples", "paper_url": null, "license_url": null, "run_count": 5681081, "cover_image_url": "...", "default_example": {...}, "latest_version": {...}, }
The model object includes the input and output schema for the latest version of the model.
Here's an example showing how to fetch the model with cURL and display its input schema with jq:
curl -s \ -H "Authorization: Bearer $REPLICATE_API_TOKEN" \ https://api.replicate.com/v1/models/replicate/hello-world \ | jq ".latest_version.openapi_schema.components.schemas.Input"
This will return the following JSON object:
{ "type": "object", "title": "Input", "required": ["text"], "properties": { "text": { "type": "string", "title": "Text", "x-order": 0, "description": "Text to prefix with 'hello '" } } }
The
cover_image_url
string is an HTTPS URL for an image file. This can be:- An image uploaded by the model author.
- The output file of the example prediction, if the model author has not set a cover image.
- The input file of the example prediction, if the model author has not set a cover image and the example prediction has no output file.
- A generic fallback image.
The
default_example
object is a prediction created with this model.The
latest_version
object is the model's most recently pushed version. -
search_models
(write
): Get a list of public models matching a search query.Example cURL request:
curl -s -X QUERY \ -H "Authorization: Bearer $REPLICATE_API_TOKEN" \ -H "Content-Type: text/plain" \ -d "hello" \ https://api.replicate.com/v1/models
The response will be a paginated JSON object containing an array of model objects.
See the
models.get
docs for more details about the model object.
-
list_models_examples
(read
): List example predictions made using the model. These are predictions that were saved by the model author as illustrative examples of the model's capabilities.If you want all the examples for a model, use this operation.
If you just want the model's default example, you can use the
models.get
operation instead, which includes adefault_example
object.Example cURL request:
curl -s \ -H "Authorization: Bearer $REPLICATE_API_TOKEN" \ https://api.replicate.com/v1/models/replicate/hello-world/examples
The response will be a pagination object containing a list of example predictions:
{ "next": "https://api.replicate.com/v1/models/replicate/hello-world/examples?cursor=...", "previous": "https://api.replicate.com/v1/models/replicate/hello-world/examples?cursor=...", "results": [...] }
Each item in the
results
list is a prediction object.
-
create_models_predictions
(write
): Create a prediction using an official model.If you're not running an official model, use the
predictions.create
operation instead.Example cURL request:
curl -s -X POST -H 'Prefer: wait' \ -d '{"input": {"prompt": "Write a short poem about the weather."}}' \ -H "Authorization: Bearer $REPLICATE_API_TOKEN" \ -H 'Content-Type: application/json' \ https://api.replicate.com/v1/models/meta/meta-llama-3-70b-instruct/predictions
The request will wait up to 60 seconds for the model to run. If this time is exceeded the prediction will be returned in a
"starting"
state and need to be retrieved using thepredictions.get
endpiont.For a complete overview of the
deployments.predictions.create
API check out our documentation on creating a prediction which covers a variety of use cases.
-
get_models_readme
(read
): Get the README content for a model.Example cURL request:
curl -s \ -H "Authorization: Bearer $REPLICATE_API_TOKEN" \ https://api.replicate.com/v1/models/replicate/hello-world/readme
The response will be the README content as plain text in Markdown format:
# Hello World Model This is an example model that...
-
list_models_versions
(read
): Example cURL request:curl -s \ -H "Authorization: Bearer $REPLICATE_API_TOKEN" \ https://api.replicate.com/v1/models/replicate/hello-world/versions
The response will be a JSON array of model version objects, sorted with the most recent version first:
{ "next": null, "previous": null, "results": [ { "id": "5c7d5dc6dd8bf75c1acaa8565735e7986bc5b66206b55cca93cb72c9bf15ccaa", "created_at": "2022-04-26T19:29:04.418669Z", "cog_version": "0.3.0", "openapi_schema": {...} } ] }
-
delete_models_versions
(write
): Delete a model version and all associated predictions, including all output files.Model version deletion has some restrictions:
- You can only delete versions from models you own.
- You can only delete versions from private models.
- You cannot delete a version if someone other than you has run predictions with it.
- You cannot delete a version if it is being used as the base model for a fine tune/training.
- You cannot delete a version if it has an associated deployment.
- You cannot delete a version if another model version is overridden to use it.
Example cURL request:
curl -s -X DELETE \ -H "Authorization: Bearer $REPLICATE_API_TOKEN" \ https://api.replicate.com/v1/models/replicate/hello-world/versions/5c7d5dc6dd8bf75c1acaa8565735e7986bc5b66206b55cca93cb72c9bf15ccaa
The response will be an empty 202, indicating the deletion request has been accepted. It might take a few minutes to be processed.
-
get_models_versions
(read
): Example cURL request:curl -s \ -H "Authorization: Bearer $REPLICATE_API_TOKEN" \ https://api.replicate.com/v1/models/replicate/hello-world/versions/5c7d5dc6dd8bf75c1acaa8565735e7986bc5b66206b55cca93cb72c9bf15ccaa
The response will be the version object:
{ "id": "5c7d5dc6dd8bf75c1acaa8565735e7986bc5b66206b55cca93cb72c9bf15ccaa", "created_at": "2022-04-26T19:29:04.418669Z", "cog_version": "0.3.0", "openapi_schema": {...} }
Every model describes its inputs and outputs with OpenAPI Schema Objects in the
openapi_schema
property.The
openapi_schema.components.schemas.Input
property for the replicate/hello-world model looks like this:{ "type": "object", "title": "Input", "required": ["text"], "properties": { "text": { "x-order": 0, "type": "string", "title": "Text", "description": "Text to prefix with 'hello '" } } }
The
openapi_schema.components.schemas.Output
property for the replicate/hello-world model looks like this:{ "type": "string", "title": "Output" }
For more details, see the docs on Cog's supported input and output types
-
create_predictions
(write
): Create a prediction for the model version and inputs you provide.If you're running an official model, use the
models.predictions.create
operation instead.Example cURL request:
curl -s -X POST -H 'Prefer: wait' \ -d '{"version": "replicate/hello-world:5c7d5dc6dd8bf75c1acaa8565735e7986bc5b66206b55cca93cb72c9bf15ccaa", "input": {"text": "Alice"}}' \ -H "Authorization: Bearer $REPLICATE_API_TOKEN" \ -H 'Content-Type: application/json' \ https://api.replicate.com/v1/predictions
The request will wait up to 60 seconds for the model to run. If this time is exceeded the prediction will be returned in a
"starting"
state and need to be retrieved using thepredictions.get
endpiont.For a complete overview of the
predictions.create
API check out our documentation on creating a prediction which covers a variety of use cases. -
list_predictions
(read
): Get a paginated list of all predictions created by the user or organization associated with the provided API token.This will include predictions created from the API and the website. It will return 100 records per page.
Example cURL request:
curl -s \ -H "Authorization: Bearer $REPLICATE_API_TOKEN" \ https://api.replicate.com/v1/predictions
The response will be a paginated JSON array of prediction objects, sorted with the most recent prediction first:
{ "next": null, "previous": null, "results": [ { "completed_at": "2023-09-08T16:19:34.791859Z", "created_at": "2023-09-08T16:19:34.907244Z", "data_removed": false, "error": null, "id": "gm3qorzdhgbfurvjtvhg6dckhu", "input": { "text": "Alice" }, "metrics": { "predict_time": 0.012683 }, "output": "hello Alice", "started_at": "2023-09-08T16:19:34.779176Z", "source": "api", "status": "succeeded", "urls": { "web": "https://replicate.com/p/gm3qorzdhgbfurvjtvhg6dckhu", "get": "https://api.replicate.com/v1/predictions/gm3qorzdhgbfurvjtvhg6dckhu", "cancel": "https://api.replicate.com/v1/predictions/gm3qorzdhgbfurvjtvhg6dckhu/cancel" }, "model": "replicate/hello-world", "version": "5c7d5dc6dd8bf75c1acaa8565735e7986bc5b66206b55cca93cb72c9bf15ccaa" } ] }
id
will be the unique ID of the prediction.source
will indicate how the prediction was created. Possible values areweb
orapi
.status
will be the status of the prediction. Refer to get a single prediction for possible values.urls
will be a convenience object that can be used to construct new API requests for the given prediction. If the requested model version supports streaming, this will have astream
entry with an HTTPS URL that you can use to construct anEventSource
.model
will be the model identifier string in the format of{model_owner}/{model_name}
.version
will be the unique ID of model version used to create the prediction.data_removed
will betrue
if the input and output data has been deleted. -
cancel_predictions
(write
): Cancel a prediction that is currently running.Example cURL request that creates a prediction and then cancels it:
# First, create a prediction PREDICTION_ID=$(curl -s -X POST \ -H "Authorization: Bearer $REPLICATE_API_TOKEN" \ -H "Content-Type: application/json" \ -d '{ "input": { "prompt": "a video that may take a while to generate" } }' \ https://api.replicate.com/v1/models/minimax/video-01/predictions | jq -r '.id') # Echo the prediction ID echo "Created prediction with ID: $PREDICTION_ID" # Cancel the prediction curl -s -X POST \ -H "Authorization: Bearer $REPLICATE_API_TOKEN" \ https://api.replicate.com/v1/predictions/$PREDICTION_ID/cancel
-
get_predictions
(read
): Get the current state of a prediction.Example cURL request:
curl -s \ -H "Authorization: Bearer $REPLICATE_API_TOKEN" \ https://api.replicate.com/v1/predictions/gm3qorzdhgbfurvjtvhg6dckhu
The response will be the prediction object:
{ "id": "gm3qorzdhgbfurvjtvhg6dckhu", "model": "replicate/hello-world", "version": "5c7d5dc6dd8bf75c1acaa8565735e7986bc5b66206b55cca93cb72c9bf15ccaa", "input": { "text": "Alice" }, "logs": "", "output": "hello Alice", "error": null, "status": "succeeded", "created_at": "2023-09-08T16:19:34.765994Z", "data_removed": false, "started_at": "2023-09-08T16:19:34.779176Z", "completed_at": "2023-09-08T16:19:34.791859Z", "metrics": { "predict_time": 0.012683 }, "urls": { "web": "https://replicate.com/p/gm3qorzdhgbfurvjtvhg6dckhu", "get": "https://api.replicate.com/v1/predictions/gm3qorzdhgbfurvjtvhg6dckhu", "cancel": "https://api.replicate.com/v1/predictions/gm3qorzdhgbfurvjtvhg6dckhu/cancel" } }
status
will be one of:-
starting
: the prediction is starting up. If this status lasts longer than a few seconds, then it's typically because a new worker is being started to run the prediction. -
processing
: thepredict()
method of the model is currently running. -
succeeded
: the prediction completed successfully. -
failed
: the prediction encountered an error during processing. -
canceled
: the prediction was canceled by its creator.
In the case of success,
output
will be an object containing the output of the model. Any files will be represented as HTTPS URLs. You'll need to pass theAuthorization
header to request them.In the case of failure,
error
will contain the error encountered during the prediction.Terminated predictions (with a status of
succeeded
,failed
, orcanceled
) will include ametrics
object with apredict_time
property showing the amount of CPU or GPU time, in seconds, that the prediction used while running. It won't include time waiting for the prediction to start.All input parameters, output values, and logs are automatically removed after an hour, by default, for predictions created through the API.
You must save a copy of any data or files in the output if you'd like to continue using them. The
output
key will still be present, but it's value will benull
after the output has been removed.Output files are served by
replicate.delivery
and its subdomains. If you use an allow list of external domains for your assets, addreplicate.delivery
and*.replicate.delivery
to it. -
-
create_trainings
(write
): Start a new training of the model version you specify.Example request body:
{ "destination": "{new_owner}/{new_name}", "input": { "train_data": "https://example.com/my-input-images.zip" }, "webhook": "https://example.com/my-webhook" }
Example cURL request:
curl -s -X POST \ -d '{"destination": "{new_owner}/{new_name}", "input": {"input_images": "https://example.com/my-input-images.zip"}}' \ -H "Authorization: Bearer $REPLICATE_API_TOKEN" \ -H 'Content-Type: application/json' \ https://api.replicate.com/v1/models/stability-ai/sdxl/versions/da77bc59ee60423279fd632efb4795ab731d9e3ca9705ef3341091fb989b7eaf/trainings
The response will be the training object:
{ "id": "zz4ibbonubfz7carwiefibzgga", "model": "stability-ai/sdxl", "version": "da77bc59ee60423279fd632efb4795ab731d9e3ca9705ef3341091fb989b7eaf", "input": { "input_images": "https://example.com/my-input-images.zip" }, "logs": "", "error": null, "status": "starting", "created_at": "2023-09-08T16:32:56.990893084Z", "urls": { "web": "https://replicate.com/p/zz4ibbonubfz7carwiefibzgga", "get": "https://api.replicate.com/v1/predictions/zz4ibbonubfz7carwiefibzgga", "cancel": "https://api.replicate.com/v1/predictions/zz4ibbonubfz7carwiefibzgga/cancel" } }
As models can take several minutes or more to train, the result will not be available immediately. To get the final result of the training you should either provide a
webhook
HTTPS URL for us to call when the results are ready, or poll the get a training endpoint until it has finished.When a training completes, it creates a new version of the model at the specified destination.
To find some models to train on, check out the trainable language models collection.
-
list_trainings
(read
): Get a paginated list of all trainings created by the user or organization associated with the provided API token.This will include trainings created from the API and the website. It will return 100 records per page.
Example cURL request:
curl -s \ -H "Authorization: Bearer $REPLICATE_API_TOKEN" \ https://api.replicate.com/v1/trainings
The response will be a paginated JSON array of training objects, sorted with the most recent training first:
{ "next": null, "previous": null, "results": [ { "completed_at": "2023-09-08T16:41:19.826523Z", "created_at": "2023-09-08T16:32:57.018467Z", "error": null, "id": "zz4ibbonubfz7carwiefibzgga", "input": { "input_images": "https://example.com/my-input-images.zip" }, "metrics": { "predict_time": 502.713876 }, "output": { "version": "...", "weights": "..." }, "started_at": "2023-09-08T16:32:57.112647Z", "source": "api", "status": "succeeded", "urls": { "web": "https://replicate.com/p/zz4ibbonubfz7carwiefibzgga", "get": "https://api.replicate.com/v1/trainings/zz4ibbonubfz7carwiefibzgga", "cancel": "https://api.replicate.com/v1/trainings/zz4ibbonubfz7carwiefibzgga/cancel" }, "model": "stability-ai/sdxl", "version": "da77bc59ee60423279fd632efb4795ab731d9e3ca9705ef3341091fb989b7eaf" } ] }
id
will be the unique ID of the training.source
will indicate how the training was created. Possible values areweb
orapi
.status
will be the status of the training. Refer to get a single training for possible values.urls
will be a convenience object that can be used to construct new API requests for the given training.version
will be the unique ID of model version used to create the training. -
cancel_trainings
(write
): Cancel a training -
get_trainings
(read
): Get the current state of a training.Example cURL request:
curl -s \ -H "Authorization: Bearer $REPLICATE_API_TOKEN" \ https://api.replicate.com/v1/trainings/zz4ibbonubfz7carwiefibzgga
The response will be the training object:
{ "completed_at": "2023-09-08T16:41:19.826523Z", "created_at": "2023-09-08T16:32:57.018467Z", "error": null, "id": "zz4ibbonubfz7carwiefibzgga", "input": { "input_images": "https://example.com/my-input-images.zip" }, "logs": "...", "metrics": { "predict_time": 502.713876 }, "output": { "version": "...", "weights": "..." }, "started_at": "2023-09-08T16:32:57.112647Z", "status": "succeeded", "urls": { "web": "https://replicate.com/p/zz4ibbonubfz7carwiefibzgga", "get": "https://api.replicate.com/v1/trainings/zz4ibbonubfz7carwiefibzgga", "cancel": "https://api.replicate.com/v1/trainings/zz4ibbonubfz7carwiefibzgga/cancel" }, "model": "stability-ai/sdxl", "version": "da77bc59ee60423279fd632efb4795ab731d9e3ca9705ef3341091fb989b7eaf" }
status
will be one of:-
starting
: the training is starting up. If this status lasts longer than a few seconds, then it's typically because a new worker is being started to run the training. -
processing
: thetrain()
method of the model is currently running. -
succeeded
: the training completed successfully. -
failed
: the training encountered an error during processing. -
canceled
: the training was canceled by its creator.
In the case of success,
output
will be an object containing the output of the model. Any files will be represented as HTTPS URLs. You'll need to pass theAuthorization
header to request them.In the case of failure,
error
will contain the error encountered during the training.Terminated trainings (with a status of
succeeded
,failed
, orcanceled
) will include ametrics
object with apredict_time
property showing the amount of CPU or GPU time, in seconds, that the training used while running. It won't include time waiting for the training to start. -
-
get_default_webhooks_secret
(read
): Get the signing secret for the default webhook endpoint. This is used to verify that webhook requests are coming from Replicate.Example cURL request:
curl -s \ -H "Authorization: Bearer $REPLICATE_API_TOKEN" \ https://api.replicate.com/v1/webhooks/default/secret
The response will be a JSON object with a
key
property:{ "key": "..." }
-
create_files
(write
): Create a file by uploading its content and optional metadata.Example cURL request:
curl -X POST https://api.replicate.com/v1/files \ -H "Authorization: Token $REPLICATE_API_TOKEN" \ -H 'Content-Type: multipart/form-data' \ -F 'content=@/path/to/archive.zip;type=application/zip;filename=example.zip' \ -F 'metadata={"customer_reference_id": 123};type=application/json'
The request must include:
-
content
: The file content (required) -
type
: The content / MIME type for the file (defaults toapplication/octet-stream
) -
filename
: The filename (required, ≤ 255 bytes, valid UTF-8) -
metadata
: User-provided metadata associated with the file (defaults to{}
, must be valid JSON)
-
-
list_files
(read
): Get a paginated list of all files created by the user or organization associated with the provided API token.Example cURL request:
curl -s \ -H "Authorization: Token $REPLICATE_API_TOKEN" \ https://api.replicate.com/v1/files
The response will be a paginated JSON array of file objects, sorted with the most recent file first.
-
delete_files
(write
): Delete a file. Once a file has been deleted, subsequent requests to the file resource return 404 Not found.Example cURL request:
curl -X DELETE \ -H "Authorization: Token $REPLICATE_API_TOKEN" \ https://api.replicate.com/v1/files/cneqzikepnug6xezperrr4z55o
-
download_files
(read
): Download a file by providing the file owner, access expiry, and a valid signature.Example cURL request:
curl -X GET "https://api.replicate.com/v1/files/cneqzikepnug6xezperrr4z55o/download?expiry=1708515345&owner=mattt&signature=zuoghqlrcnw8YHywkpaXQlHsVhWen%2FDZ4aal76dLiOo%3D"
-
get_files
(read
): Get the details of a file.Example cURL request:
curl -s \ -H "Authorization: Token $REPLICATE_API_TOKEN" \ https://api.replicate.com/v1/files/cneqzikepnug6xezperrr4z55o