> Universal client for LLM providers with OpenAI-compatible interface
llm-polyglot extends the OpenAI SDK to provide a consistent interface across different LLM providers. Use the same familiar OpenAI-style API with Anthropic, Google, and others.
Native API Support Status:
Provider API | Status | Chat | Basic Stream | Functions/Tool calling | Function streaming | Notes |
---|---|---|---|---|---|---|
OpenAI | ✅ | ✅ | ✅ | ✅ | ✅ | Direct SDK proxy |
Anthropic | ✅ | ✅ | ✅ | ❌ | ❌ | Claude models |
Google | ✅ | ✅ | ✅ | ✅ | ❌ | Gemini models + context caching |
Azure | 🚧 | ✅ | ✅ | ❌ | ❌ | OpenAI model hosting |
Cohere | ❌ | - | - | - | - | Not supported |
AI21 | ❌ | - | - | - | - | Not supported |
Stream Types:
- Basic Stream: Simple text streaming
- Partial JSON Stream: Progressive JSON object construction during streaming
- Function Stream: Streaming function/tool calls and their results (see the sketch below)
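Since the openai provider is a direct SDK proxy, a function stream looks like a standard OpenAI streaming tool call. A minimal sketch (the model name and the accumulation logic are illustrative, not part of llm-polyglot itself):

```typescript
import { createLLMClient } from "llm-polyglot";

const openai = createLLMClient({ provider: "openai" });

// Stream a tool call and collect the argument fragments as they arrive
const stream = await openai.chat.completions.create({
  model: "gpt-4o-mini",
  stream: true,
  messages: [{ role: "user", content: "Analyze this data" }],
  tools: [{
    type: "function",
    function: {
      name: "analyze",
      parameters: {
        type: "object",
        properties: { sentiment: { type: "string" } }
      }
    }
  }]
});

let args = "";
for await (const chunk of stream) {
  // With the OpenAI chunk format, streamed tool-call arguments arrive as string fragments
  args += chunk.choices[0]?.delta?.tool_calls?.[0]?.function?.arguments ?? "";
}
console.log(JSON.parse(args)); // e.g. { sentiment: "..." }
```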
OpenAI-Compatible Hosting Providers:
These providers use the OpenAI SDK format, so they work directly with the OpenAI client configuration (see the example after the table):
Provider | How to Use | Available Models |
---|---|---|
Together | Use OpenAI client with Together base URL | Mixtral, Llama, OpenChat, Yi, others |
Anyscale | Use OpenAI client with Anyscale base URL | Mistral, Llama, others |
Perplexity | Use OpenAI client with Perplexity base URL | pplx-* models |
Replicate | Use OpenAI client with Replicate base URL | Various open models |
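For example, a minimal sketch for Together, assuming its OpenAI-compatible endpoint at https://api.together.xyz/v1 and assuming the openai provider forwards standard OpenAI client options such as baseURL and apiKey:

```typescript
import { createLLMClient } from "llm-polyglot";

// Assumes createLLMClient passes baseURL/apiKey through to the underlying OpenAI SDK
const together = createLLMClient({
  provider: "openai",
  baseURL: "https://api.together.xyz/v1",        // Together's OpenAI-compatible endpoint
  apiKey: process.env.TOGETHER_API_KEY
});

const completion = await together.chat.completions.create({
  model: "mistralai/Mixtral-8x7B-Instruct-v0.1", // example Together model id
  messages: [{ role: "user", content: "Hello!" }]
});
```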
```bash
# Base installation
npm install llm-polyglot openai

# Provider-specific SDKs (as needed)
npm install @anthropic-ai/sdk       # For Anthropic
npm install @google/generative-ai   # For Google/Gemini
```
```typescript
import { createLLMClient } from "llm-polyglot";

// Initialize a provider-specific client
const client = createLLMClient({
  provider: "anthropic" // or "google", "openai", etc.
});

// Use the consistent OpenAI-style interface
const completion = await client.chat.completions.create({
  model: "claude-3-opus-20240229",
  messages: [{ role: "user", content: "Hello!" }],
  max_tokens: 1000
});
```
The llm-polyglot library provides support for Anthropic's API, including standard chat completions, streaming chat completions, and function calling. Both input parameters and responses match those of the OpenAI SDK exactly; for more detailed documentation, see the OpenAI docs: https://platform.openai.com/docs/api-reference
The Anthropic SDK is required when using the anthropic provider; only the types it provides are used.
```bash
bun add @anthropic-ai/sdk
```
```typescript
const client = createLLMClient({ provider: "anthropic" });

// Standard completion
const response = await client.chat.completions.create({
  model: "claude-3-opus-20240229",
  messages: [{ role: "user", content: "Hello!" }]
});

// Streaming
const stream = await client.chat.completions.create({
  model: "claude-3-opus-20240229",
  messages: [{ role: "user", content: "Hello!" }],
  stream: true
});

for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0]?.delta?.content ?? "");
}

// Tool/Function calling
const result = await client.chat.completions.create({
  model: "claude-3-opus-20240229",
  messages: [{ role: "user", content: "Analyze this data" }],
  tools: [{
    type: "function",
    function: {
      name: "analyze",
      parameters: {
        type: "object",
        properties: {
          sentiment: { type: "string" }
        }
      }
    }
  }]
});
```
The llm-polyglot library provides support for Google's Gemini API including:
- Standard chat completions with OpenAI-compatible interface
- Streaming chat completions with delta updates
- Function/tool calling with automatic schema conversion
- Context caching for token optimization (requires paid API key)
- Grounding support with Google Search integration
- Safety settings and model generation config
- Session management for stateful conversations
- Automatic response transformation with source attribution
The Google generative-ai SDK is required when using the google provider:
```bash
bun add @google/generative-ai
```
All of the above functionality uses OpenAI's schema; llm-polyglot translates the OpenAI params spec into Gemini's model spec.
const client = createLLMClient({ provider: "google" });
// Standard completion
const completion = await client.chat.completions.create({
model: "gemini-1.5-flash-latest",
messages: [{ role: "user", content: "Hello!" }],
max_tokens: 1000
});
// With grounding (Google Search)
const groundedCompletion = await client.chat.completions.create({
model: "gemini-1.5-flash-latest",
messages: [{ role: "user", content: "What are the latest AI developments?" }],
groundingThreshold: 0.7,
max_tokens: 1000
});
// With safety settings
const safeCompletion = await client.chat.completions.create({
model: "gemini-1.5-flash-latest",
messages: [{ role: "user", content: "Tell me a story" }],
additionalProperties: {
safetySettings: [{
category: "HARM_CATEGORY_HARASSMENT",
threshold: "BLOCK_MEDIUM_AND_ABOVE"
}]
}
});
// With session management
const sessionCompletion = await client.chat.completions.create({
model: "gemini-1.5-flash-latest",
messages: [{ role: "user", content: "Remember this: I'm Alice" }],
additionalProperties: {
sessionId: "user-123"
}
});
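Reusing the same sessionId in a later call continues that conversation (assuming the client keeps the earlier turns for the session). A short sketch building on the example above:

```typescript
// Follow-up turn in the same session; reuses the sessionId from the previous call
const followUp = await client.chat.completions.create({
  model: "gemini-1.5-flash-latest",
  messages: [{ role: "user", content: "What's my name?" }],
  additionalProperties: {
    sessionId: "user-123" // same id as above, so prior turns are part of the context
  }
});
```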
Context Caching is a feature specific to Gemini that helps cut down on duplicate token usage by allowing you to create a cache with a TTL:
```typescript
// Create a cache
const cache = await client.cacheManager.create({
  model: "gemini-1.5-flash-8b",
  messages: [{ role: "user", content: "Context to cache" }],
  ttlSeconds: 3600 // Cache for 1 hour
});

// Use the cached context
const completion = await client.chat.completions.create({
  model: "gemini-1.5-flash-8b",
  messages: [{ role: "user", content: "Follow-up question" }],
  additionalProperties: {
    cacheName: cache.name
  }
});
```
Function/tool calling uses the same OpenAI-style tools array; the schema is converted automatically to Gemini's function-declaration format:
```typescript
const completion = await client.chat.completions.create({
  model: "gemini-1.5-flash-latest",
  messages: [{ role: "user", content: "Analyze this data" }],
  tools: [{
    type: "function",
    function: {
      name: "analyze",
      parameters: {
        type: "object",
        properties: {
          sentiment: { type: "string" }
        }
      }
    }
  }],
  tool_choice: {
    type: "function",
    function: { name: "analyze" }
  }
});
```
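Since responses follow the OpenAI shape, the returned tool call can be read the same way as with the OpenAI SDK; a minimal sketch (field names below assume the OpenAI response format):

```typescript
// Extract the forced "analyze" tool call from the OpenAI-shaped response
const toolCall = completion.choices[0]?.message?.tool_calls?.[0];
if (toolCall?.type === "function") {
  const args = JSON.parse(toolCall.function.arguments); // e.g. { sentiment: "positive" }
  console.log(toolCall.function.name, args);
}
```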