A Model Context Protocol (MCP) server that provides access to OpenAI's latest image generation capabilities using both the GPT Image-1 model and the new Responses API. This server enables AI assistants like Claude to generate and manipulate images using natural language prompts with cutting-edge multimodal AI technology.
- 🎨 Latest Image Generation: Create stunning images using OpenAI's GPT Image-1 (2025's state-of-the-art model)
- 🆕 Dual API Support: Choose between Images API (gpt-image-1) and Responses API (gpt-4o with image tools)
- ✏️ Advanced Image Editing: Modify existing images with text prompts and optional masks
- 🔄 Multiple Transports: Supports both stdio (for Claude Desktop) and HTTP (for remote access)
- ⚡ Real-time Streaming: Server-Sent Events (SSE) for live progress updates and partial previews
- 💾 Smart Caching: Two-tier caching system (memory + disk) for instant repeated requests
- 🖼️ Image Optimization: Automatic compression with up to 80% size reduction
- 🚀 Production Ready: Docker support, session management, and comprehensive error handling
- 🔒 Secure: API key authentication via environment variables
- 📊 Flexible Options: Support for various sizes, quality levels, and output formats
- 🔄 Conversation Context: Multi-turn image editing with conversation history tracking
- 🎯 Superior Text Rendering: GPT Image-1's enhanced text-in-image capabilities
- Node.js 18 or higher
- npm or yarn
- OpenAI API key with access to image generation models
- API Organization Verification completed for image generation access
# Clone the repository
git clone https://github.com/pavelsukhachev/mcp-server-gpt-image.git
cd mcp-server-gpt-image
# Install dependencies (required for runtime)
npm install --production
# Clone the repository
git clone https://github.com/pavelsukhachev/mcp-server-gpt-image.git
cd mcp-server-gpt-image
# Install all dependencies
npm install
# Build the project
npm run build
Create a `.env` file in the root directory:
# Required
OPENAI_API_KEY=your-openai-api-key-here
# API Configuration
API_MODE=responses # 'responses' (default, latest) or 'images' (legacy)
RESPONSES_MODEL=gpt-4o # Model for Responses API (default: gpt-4o)
# Optional
PORT=3000
CORS_ORIGIN=*
# Cache Configuration
CACHE_DIR=.cache/images
CACHE_TTL=3600
CACHE_MAX_SIZE=100
# Feature Flags
ENABLE_CONVERSATION_CONTEXT=true # Multi-turn conversation support
ENABLE_STREAMING=true # Real-time streaming updates
ENABLE_OPTIMIZATION=true # Image optimization
Default Mode: `API_MODE=responses`
- Model: `gpt-4o` with `image_generation` tool
- Technology: Latest 2025 Responses API with integrated GPT Image-1 capabilities
- Features:
  - Native multimodal understanding
  - Better context awareness
  - Enhanced prompt following
  - Superior text rendering in images
  - Real-time streaming with partial previews
  - Multi-turn conversation support
Legacy Mode: `API_MODE=images`
- Model: `gpt-image-1` (dedicated image model)
- Technology: Traditional Images API endpoint
- Features:
  - Direct access to the GPT Image-1 model
  - Simple, focused image generation
  - Backward compatibility
| Feature | Responses API (gpt-4o) | Images API (gpt-image-1) |
| --- | --- | --- |
| Latest Technology | ✅ 2025 Responses API | |
| Text in Images | ✅ Superior | ✅ Good |
| Context Awareness | ✅ Excellent | |
| Streaming | ✅ Partial previews | |
| Multi-turn | ✅ Full support | |
| Performance | ✅ Optimized | ✅ Fast |
Add the following to your Claude Desktop MCP settings (`~/Library/Application Support/Claude/claude_desktop_config.json` on macOS):
{
"mcpServers": {
"gpt-image": {
"command": "node",
"args": ["/path/to/mcp-server-gpt-image/dist/index.js", "stdio"],
"env": {
"OPENAI_API_KEY": "your-openai-api-key-here"
}
}
}
}
# Run in HTTP mode for remote access
npm run start:http
# Or use Docker
docker-compose up
The server will be available at:
- Health check: http://localhost:3000/health
- MCP endpoint: http://localhost:3000/mcp
- Streaming endpoint: http://localhost:3000/mcp/stream
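To verify the server is up, here is a minimal smoke test (a sketch assuming the default port; the exact shape of the health payload is not documented here):
// Minimal smoke test against the HTTP transport on the default port.
// The health payload's shape is an assumption; log it to inspect.
const res = await fetch('http://localhost:3000/health');
console.log(res.status, await res.json());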
Generate images from text prompts with optional streaming support.
Parameters:
- `prompt` (required): Text description of the image to generate
- `size`: Image dimensions
  - `1024x1024` (default)
  - `1024x1536` (portrait)
  - `1536x1024` (landscape)
  - `auto`
- `quality`: Rendering quality
  - `low` (60% compression)
  - `medium` (80% compression)
  - `high` (95% quality)
  - `auto` (default, 85% quality)
- `format`: Output format (`png`, `jpeg`, `webp`)
- `background`: Background transparency (`transparent`, `opaque`, `auto`)
- `output_compression`: Explicit compression level (0-100)
- `n`: Number of images to generate (1-4)
- `partialImages`: Number of partial images to stream (1-3, enables streaming)
- `stream`: Enable streaming mode for real-time generation updates
- `conversationId`: ID for conversation context tracking (optional)
- `useContext`: Whether to use conversation context from previous interactions (default: false)
- `maxContextEntries`: Maximum number of context entries to consider (1-10, default: 5)
Example:
{
"prompt": "A serene Japanese garden with cherry blossoms at sunset",
"size": "1536x1024",
"quality": "high",
"format": "png",
"partialImages": 2,
"stream": true
}
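From an MCP client's perspective, the same request might look like the sketch below, using the official TypeScript SDK; the client name, version, and install path are illustrative placeholders:
import { Client } from '@modelcontextprotocol/sdk/client/index.js';
import { StdioClientTransport } from '@modelcontextprotocol/sdk/client/stdio.js';

// Spawn the built server over stdio, mirroring the Claude Desktop config above.
const transport = new StdioClientTransport({
  command: 'node',
  args: ['/path/to/mcp-server-gpt-image/dist/index.js', 'stdio'],
});

const client = new Client({ name: 'example-client', version: '1.0.0' });
await client.connect(transport);

// Invoke the generate_image tool with the parameters shown above.
const result = await client.callTool({
  name: 'generate_image',
  arguments: {
    prompt: 'A serene Japanese garden with cherry blossoms at sunset',
    size: '1536x1024',
    quality: 'high',
  },
});
console.log(result);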
Edit existing images using text prompts and optional masks.
Parameters:
- `prompt` (required): Text description of the desired edit
- `images` (required): Array of base64-encoded images to edit
- `mask`: Base64-encoded mask for inpainting (optional)
- Other parameters are the same as for `generate_image`
Example:
{
"prompt": "Add a red bridge over the stream",
"images": ["base64_encoded_image_data..."],
"mask": "base64_encoded_mask_data..."
}
API Mode Support:
- Responses API: Editing via conversation with image input (recommended)
- Images API: Direct image editing using traditional endpoint
Clear all cached images from memory and disk.
Example:
// No parameters required
{}
Get cache statistics including memory entries and disk usage.
Example:
// No parameters required
{}
List all active conversation IDs.
Example:
// No parameters required
{}
Get the full history of a specific conversation.
Parameters:
- `conversationId` (required): The conversation ID to retrieve
Example:
{
"conversationId": "design-session-123"
}
Clear the history of a specific conversation.
Parameters:
- `conversationId` (required): The conversation ID to clear
Example:
{
"conversationId": "design-session-123"
}
The server supports streaming image generation via Server-Sent Events (SSE) for real-time progress updates and partial image previews.
Endpoint: POST /mcp/stream
Request Body:
{
"prompt": "A beautiful sunset over mountains",
"partialImages": 3,
"size": "1024x1024",
"quality": "high"
}
Response: Server-Sent Events stream
Event Types:
- `progress`: Generation progress updates with percentage and message
- `partial`: Partial image preview (base64 encoded)
- `complete`: Final image with revised prompt
- `error`: Error information if generation fails
Responses API Streaming (Recommended):
// Using Responses API with gpt-4o
const response = await fetch('http://localhost:3000/mcp/stream', {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({
prompt: 'A futuristic city with flying cars',
partialImages: 2,
apiMode: 'responses' // Use latest Responses API
})
});
Images API Streaming (Legacy):
// Using traditional Images API with gpt-image-1
const response = await fetch('http://localhost:3000/mcp/stream', {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({
prompt: 'A futuristic city with flying cars',
partialImages: 2,
apiMode: 'images' // Use legacy Images API
})
});
Example Client:
const reader = response.body.getReader();
const decoder = new TextDecoder();
let buffer = '';

while (true) {
  const { done, value } = await reader.read();
  if (done) break;

  // SSE events can be split across network chunks, so accumulate
  // until a complete event (terminated by a blank line) arrives.
  buffer += decoder.decode(value, { stream: true });
  const events = buffer.split('\n\n');
  buffer = events.pop() ?? ''; // keep any trailing partial event

  for (const event of events) {
    if (event.startsWith('data: ')) {
      const data = JSON.parse(event.slice(6));
      console.log('Event:', data.type, data.data?.message);
    }
  }
}
See `examples/streaming-client.ts` for a complete implementation.
The server now supports maintaining conversation context across multiple image generation and editing operations. This allows for iterative refinement where each new prompt can build upon previous results.
How it works:
- Assign a `conversationId` to group related operations
- Enable `useContext: true` to enhance prompts with previous context
- The system automatically tracks prompts, revised prompts, and image metadata
- Context is persisted to disk for resuming sessions later
Example Workflow:
// Initial generation
{
"prompt": "Create a serene mountain landscape",
"conversationId": "landscape-design-001",
"useContext": false // First prompt doesn't need context
}
// Iterative refinement with context
{
"prompt": "Add a crystal clear lake in the foreground",
"conversationId": "landscape-design-001",
"useContext": true, // Will consider previous "mountain landscape" context
"maxContextEntries": 5
}
// Further editing
{
"prompt": "Make the sky more dramatic with sunset colors",
"images": ["previous_generated_image_base64..."],
"conversationId": "landscape-design-001",
"useContext": true // Considers both previous prompts for consistency
}
Benefits:
- Consistency: Maintains style and elements across iterations
- Context Awareness: Each generation considers previous prompts and results
- Session Persistence: Resume work later with full context preserved
- Flexible History: Control how much context to use with `maxContextEntries`
Managing Conversations:
- Use `list_conversations` to see all active sessions
- Use `get_conversation` to review the full history of a session
- Use `clear_conversation` to start fresh when needed
The server includes an intelligent caching system to reduce API calls and improve response times.
Features:
- Memory + Disk Cache: Two-tier caching for optimal performance
- Content-Based Keys: Cache keys based on prompt, size, quality, and other parameters (see the sketch below)
- TTL Support: Configurable time-to-live for cache entries
- Size Management: Automatic cleanup when cache exceeds size limits
- Cache Tools: Built-in tools for cache management
Configuration (via environment variables):
CACHE_DIR=.cache/images # Cache directory (default: .cache/images)
CACHE_TTL=3600 # Cache TTL in seconds (default: 1 hour)
CACHE_MAX_SIZE=100 # Max cache size in MB (default: 100MB)
Cache Behavior:
- Identical requests return cached results instantly
- Cache hits are logged for monitoring
- Expired entries are automatically cleaned up
- Edit operations cache based on image+mask+prompt combination
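A minimal sketch of how such a content-based key could be derived, assuming Node's built-in `crypto` module; the real logic in `src/utils/cache.ts` may differ:
import { createHash } from 'crypto';

// Illustrative only: hash the request parameters that affect the output,
// so identical requests map to the same cache entry.
function cacheKey(params: {
  prompt: string;
  size?: string;
  quality?: string;
  format?: string;
}): string {
  // A sorted replacer array makes equivalent objects serialize identically.
  const canonical = JSON.stringify(params, Object.keys(params).sort());
  return createHash('sha256').update(canonical).digest('hex');
}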
The server includes an intelligent image optimization engine powered by Sharp (a brief sketch follows at the end of this section).
Features:
- Format Conversion: Automatically convert between PNG, JPEG, and WebP
- Smart Compression: Adaptive quality based on image characteristics
- Size Constraints: Maintain dimensions while reducing file size
- Transparency Handling: Preserve alpha channels when needed
- Progressive Encoding: Better perceived loading performance
Optimization Results:
- Typical size reductions: 30-70% for JPEG, 20-50% for WebP
- Automatic format selection based on content type
- Preserved visual quality with smaller file sizes
- Logged optimization metrics for monitoring
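As a rough illustration of the kind of pipeline this implies (the actual optimizer in `src/utils/image-optimizer.ts` may apply more adaptive logic):
import sharp from 'sharp';

// Illustrative only: convert an image buffer to WebP with moderate
// compression. WebP preserves alpha channels, unlike JPEG.
async function optimize(input: Buffer): Promise<Buffer> {
  return sharp(input)
    .webp({ quality: 80 })
    .toBuffer();
}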
# Build and run
docker-compose up -d
# View logs
docker-compose logs -f
# Stop
docker-compose down
The included `docker-compose.yml` provides:
- Automatic container restart
- Health checks
- Volume mounting for generated images
- Environment variable configuration
The codebase follows SOLID principles and clean architecture patterns for maintainability and testability.
src/
├── index.ts # Entry point with transport selection
├── server.ts # MCP server setup and tool registration
├── types.ts # TypeScript interfaces and Zod schemas
├── interfaces/ # Contract definitions (Dependency Inversion)
│ └── image-generation.interface.ts # Core interfaces for DI
├── services/ # Business logic (Single Responsibility)
│ ├── image-generator.ts # Main image generation service
│ ├── streaming-image-generator.ts # Streaming implementation
│ ├── file-converter.ts # File conversion utilities
│ └── openai-client-adapter.ts # OpenAI API adapter
├── adapters/ # Interface adapters (Open/Closed)
│ ├── cache-adapter.ts # Cache interface implementation
│ └── optimizer-adapter.ts # Image optimizer interface
├── tools/
│ ├── image-generation.ts # Tool endpoints using services
│ └── image-generation-streaming.ts # Streaming endpoints
├── transport/
│ └── http.ts # HTTP/SSE transport with session management
└── utils/
├── cache.ts # Two-tier caching system
└── image-optimizer.ts # Sharp-based image optimization
- Dependency Injection: All services depend on interfaces, not concrete implementations (see the sketch after this list)
- Single Responsibility: Each class has one clear purpose
- Open/Closed Principle: Services are extensible through interfaces
- Interface Segregation: Focused interfaces for specific concerns
- Liskov Substitution: All implementations are interchangeable
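A hypothetical sketch of this interface-first wiring; the names below are illustrative, not the exact contracts in `src/interfaces/image-generation.interface.ts`:
// Hypothetical shapes; the real contracts live in
// src/interfaces/image-generation.interface.ts.
interface IImageCache {
  get(key: string): Promise<Buffer | undefined>;
  set(key: string, value: Buffer): Promise<void>;
}

interface IImageOptimizer {
  optimize(image: Buffer): Promise<Buffer>;
}

// The service depends only on interfaces, so tests can inject mocks
// and adapters can be swapped without touching this class.
class ImageGenerator {
  constructor(
    private cache: IImageCache,
    private optimizer: IImageOptimizer,
  ) {}
}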
The `examples/` directory contains complete, runnable examples:
- `test-client.ts`: Basic MCP client example
- `streaming-client.ts`: Streaming image generation with SSE
- `optimization-demo.ts`: Image optimization features demonstration
Run examples with:
npx tsx examples/streaming-client.ts
GPT Image-1 generates images by producing specialized image tokens. Cost and latency depend on:
| Quality | Square (1024×1024) | Portrait (1024×1536) | Landscape (1536×1024) |
| --- | --- | --- | --- |
| Low | 272 tokens | 408 tokens | 400 tokens |
| Medium | 1056 tokens | 1584 tokens | 1568 tokens |
| High | 4160 tokens | 6240 tokens | 6208 tokens |
Pricing: $5.00/1M text input tokens, $10.00/1M image input tokens, $40.00/1M image output tokens
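For example, a single high-quality square image consumes 4,160 image output tokens, which works out to roughly 4,160 × $40 / 1,000,000 ≈ $0.17 per image, before counting the text input tokens for the prompt.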
- API Key Management:
  - Never commit API keys to version control
  - Use environment variables for sensitive data
  - Rotate API keys regularly
- Network Security:
  - Configure CORS appropriately for production
  - Use HTTPS in production environments
  - Implement rate limiting for public deployments
- Input Validation:
  - All inputs are validated using Zod schemas (see the sketch below)
  - File size limits are enforced
  - Content moderation is applied by default
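For illustration, the validation layer looks roughly like this sketch; the fields mirror a subset of the `generate_image` parameters above, while the actual schemas live in `src/types.ts`:
import { z } from 'zod';

// Illustrative schema mirroring a subset of the generate_image parameters;
// the real schemas are defined in src/types.ts.
const GenerateImageSchema = z.object({
  prompt: z.string().min(1),
  size: z.enum(['1024x1024', '1024x1536', '1536x1024', 'auto']).optional(),
  quality: z.enum(['low', 'medium', 'high', 'auto']).optional(),
  n: z.number().int().min(1).max(4).optional(),
});

// parse() throws on invalid input; use safeParse() for a result object.
const args = GenerateImageSchema.parse({ prompt: 'A serene Japanese garden' });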
- Use Lower Quality: Start with `quality: "low"` for drafts, then regenerate with higher quality
- Enable Caching: Identical requests are served instantly from cache
- Use Streaming: Get partial results faster with the `partialImages` parameter
- Cache Results: Automatic caching prevents redundant API calls
- Optimize Images: Use compression to reduce storage and bandwidth
- Monitor Usage: Track cache stats to understand usage patterns
- Detailed Prompts: Include style, mood, lighting, and perspective details
- Reference Styles: Mention specific art styles or artists for consistency
- Iterative Refinement: Use edit_image tool to refine specific areas
- [x] Basic image generation and editing
- [x] Docker support
- [x] Pre-built distribution
- [x] Streaming Infrastructure (SSE-based)
- [x] Partial Image Simulation (1-3 previews)
- [x] Response Caching (memory + disk)
- [x] Image Optimization (format conversion & compression)
- [x] SOLID principles architecture refactoring
- [x] Comprehensive test suite with 90+ tests
- [x] Test coverage reporting (98%+ for core utilities)
- [x] TDD (Test-Driven Development) practices
- [x] Multi-turn editing with conversation context
- [x] OpenAI Responses API integration with GPT-4o + image_generation tool
- [x] Dual API support (Images API + Responses API) with seamless switching
- [ ] Batch processing with queue management
- [ ] WebSocket transport for bidirectional communication
- [ ] File upload support (direct image handling)
- [ ] Custom prompts library
- [ ] Usage analytics and cost tracking
- [ ] Web dashboard for server management
- [ ] Plugin system for custom processors
- [ ] Multi-model support (DALL-E 3 integration)
- Generation Time: Complex prompts may take up to 30 seconds
- Text Rendering: Generated text in images may have inconsistencies
- Response Format: Currently returns base64 images only (no URL support)
- Model Access: Requires organization verification for GPT Image-1
- Max Image Size: Limited by base64 encoding and transport constraints
- Concurrent Requests: Rate limited by OpenAI API quotas
- Cache Size: Limited by available disk space
The project uses Vitest for testing with comprehensive coverage:
# Run all tests
npm test
# Run tests with coverage
npm run test:coverage
# Run tests in watch mode
npm test -- --watch
# Run specific test file
npm test -- src/utils/cache.test.ts
- Overall: ~50% statements
- Core Services: 78.91% coverage
- Utilities: 98.88% coverage (Cache: 100%, ImageOptimizer: 97.69%)
- Server: 96.08% coverage
- Unit Tests: Comprehensive tests for all services and utilities
- Integration Tests: MCP server endpoint testing
- TDD Practice: Write tests first, then implementation
- Mocking: Proper dependency mocking for isolated testing (see the sketch below)
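A hypothetical illustration of the pattern; the interface and class here are stand-ins, not the project's actual exports:
import { describe, it, expect, vi } from 'vitest';

// Stand-in interface and service to show DI-friendly mocking.
interface IImageCache {
  get(key: string): Promise<Buffer | undefined>;
}

class CachedGenerator {
  constructor(private cache: IImageCache) {}
  async generate(key: string): Promise<Buffer | undefined> {
    return this.cache.get(key); // the real service would fall back to the API
  }
}

describe('CachedGenerator', () => {
  it('serves from the injected cache', async () => {
    const cache: IImageCache = { get: vi.fn().mockResolvedValue(Buffer.from('hit')) };
    const generator = new CachedGenerator(cache);
    expect((await generator.generate('k'))?.toString()).toBe('hit');
    expect(cache.get).toHaveBeenCalledWith('k');
  });
});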
Contributions are welcome! Please follow our development practices:
- Fork the repository
- Create your feature branch (`git checkout -b feature/amazing-feature`)
- Write tests first (TDD approach)
- Implement your feature following SOLID principles
- Ensure all tests pass (`npm test`)
- Check test coverage (`npm run test:coverage`)
- Commit your changes (`git commit -m 'Add some amazing feature'`)
- Push to the branch (`git push origin feature/amazing-feature`)
- Open a Pull Request
- Follow TypeScript best practices
- Maintain test coverage above 80%
- Use dependency injection for new services
- Follow existing code patterns and conventions
- Document complex logic with clear comments
This project is licensed under the MIT License - see the LICENSE file for details.
- Built with the Model Context Protocol SDK
- Powered by OpenAI's GPT Image-1
- Image optimization by Sharp
- Inspired by the MCP community
- Model Context Protocol Documentation
- OpenAI Image Generation Guide
- GPT Image-1 Documentation
- MCP Server Examples
Note: This is an unofficial implementation. GPT Image-1 is a product of OpenAI.