Overview
sub200 provides unified access to state-of-the-art open-source TTS models through a single WebSocket API. Our curated selection ranges from edge deployment to research-grade synthesis.
Available Models
- Enterprise Models
- Edge Models
Maya Research - Maya One
State-of-the-art research model with advanced neural architecture.
Specifications
- Parameters: Enterprise-grade
- Latency: ~400ms
- Quality: 4.9/5.0
- GPU Required: Yes
Canopy Labs - Orpheus Three Billion
Professional model with three billion parameters for high-quality synthesis.
Specifications
- Parameters: 3B
- Latency: ~350ms
- Quality: 4.8/5.0
- GPU Required: Yes
Nari Labs - Dia One Point Six Billion
Versatile conversational model with natural dialogue capabilities.
Specifications
- Parameters: 1.6B
- Latency: ~220ms
- Quality: 4.5/5.0
- GPU Required: Yes
Sesame - CSM One Billion
Balanced billion-parameter model for production use.
Specifications
- Parameters: 1B
- Latency: ~180ms
- Quality: 4.4/5.0
- GPU Required: Yes
Model Comparison
| Model | Parameters | Latency | Quality | GPU | Use Case |
|---|---|---|---|---|---|
| maya-research/maya1 | Enterprise | ~400ms | 4.9/5 | Yes | Research |
| canopylabs/orpheus-3b-0.1-ft | 3B | ~350ms | 4.8/5 | Yes | Production |
| nari-labs/Dia-1.6B | 1.6B | ~220ms | 4.5/5 | Yes | Dialogue |
| sesame/csm-1b | 1B | ~180ms | 4.4/5 | Yes | Balanced |
| hexgrad/Kokoro-82M | 82M | ~30ms | 4.0/5 | No | Edge |
| neuphonic/neutts-air | Compact | ~75ms | 4.2/5 | Optional | Cloud |
| ResembleAI/chatterbox | Medium | ~150ms | 4.3/5 | Recommended | Interactive |
| coqui/XTTS-v2 | Large | ~200ms | 4.5/5 | Yes | Multilingual |
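As a rough illustration of how the comparison table can drive model choice, the sketch below picks the highest-quality model that fits a latency budget. The latency and quality figures are copied from the table; the `ModelInfo` class and `pick_model` function are hypothetical helpers, not part of any sub200 SDK.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ModelInfo:
    model_id: str
    latency_ms: int  # approximate latency from the comparison table
    quality: float   # quality score out of 5 from the comparison table


# Values transcribed from the Model Comparison table.
MODELS = [
    ModelInfo("maya-research/maya1", 400, 4.9),
    ModelInfo("canopylabs/orpheus-3b-0.1-ft", 350, 4.8),
    ModelInfo("nari-labs/Dia-1.6B", 220, 4.5),
    ModelInfo("sesame/csm-1b", 180, 4.4),
    ModelInfo("hexgrad/Kokoro-82M", 30, 4.0),
    ModelInfo("neuphonic/neutts-air", 75, 4.2),
    ModelInfo("ResembleAI/chatterbox", 150, 4.3),
    ModelInfo("coqui/XTTS-v2", 200, 4.5),
]


def pick_model(max_latency_ms: int) -> Optional[ModelInfo]:
    """Return the best-quality model within the latency budget, or None."""
    candidates = [m for m in MODELS if m.latency_ms <= max_latency_ms]
    if not candidates:
        return None
    # Prefer higher quality; break ties in favor of lower latency.
    return max(candidates, key=lambda m: (m.quality, -m.latency_ms))
```

For example, `pick_model(100)` selects neuphonic/neutts-air, since it scores higher than hexgrad/Kokoro-82M and both fit within 100 ms.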
WebSocket API
Connect to any model through our unified WebSocket API endpoint.
Request Parameters
Model identifier for synthesis
Available models:
- maya-research/maya1
- hexgrad/Kokoro-82M
- neuphonic/neutts-air
- coqui/XTTS-v2
- ResembleAI/chatterbox
- sesame/csm-1b
- nari-labs/Dia-1.6B
- canopylabs/orpheus-3b-0.1-ft
Text to synthesize (max length varies by model and plan)
Voice customization parameters
Voice consistency (0.0 to 1.0)
Voice character strength (0.0 to 1.0)
Speaking style intensity (0.0 to 1.0)
Speech rate multiplier (0.5 to 2.0)
Pitch shift in semitones (-12 to 12)
Audio output format
Options:
- mp3_44100_128: MP3 at 44.1kHz, 128kbps
- mp3_22050_32: MP3 at 22.05kHz, 32kbps
- pcm_16000: Raw PCM at 16kHz
- pcm_22050: Raw PCM at 22.05kHz
- pcm_44100: Raw PCM at 44.1kHz
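The parameter ranges above can be checked on the client before a request is sent. The following is a minimal sketch only: the field names (`model`, `text`, `voice_settings`, `output_format`, `stability`, `similarity`, `style`, `speed`, `pitch`) are assumptions made for illustration, because the parameter names are not given in this section. Only the ranges and the output format identifiers come from the text.

```python
import json

# Output format identifiers listed in the documentation.
OUTPUT_FORMATS = {
    "mp3_44100_128", "mp3_22050_32", "pcm_16000", "pcm_22050", "pcm_44100",
}

# Hypothetical field names mapped to the documented ranges.
VOICE_RANGES = {
    "stability": (0.0, 1.0),   # voice consistency
    "similarity": (0.0, 1.0),  # voice character strength
    "style": (0.0, 1.0),       # speaking style intensity
    "speed": (0.5, 2.0),       # speech rate multiplier
    "pitch": (-12, 12),        # pitch shift in semitones
}


def build_request(model, text, output_format="mp3_44100_128", **voice):
    """Validate inputs and return a JSON string for the (assumed) request shape."""
    if output_format not in OUTPUT_FORMATS:
        raise ValueError(f"unknown output format: {output_format}")
    for name, value in voice.items():
        if name not in VOICE_RANGES:
            raise ValueError(f"unknown voice setting: {name}")
        low, high = VOICE_RANGES[name]
        if not low <= value <= high:
            raise ValueError(f"{name}={value} is outside [{low}, {high}]")
    return json.dumps({
        "model": model,
        "text": text,
        "voice_settings": voice,
        "output_format": output_format,
    })
```

Validating locally avoids a round trip that would otherwise end in an INVALID_PARAMETERS error.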
Response Format
Base64-encoded audio chunk
Stream metadata (sent with first chunk)
Total audio duration in milliseconds
Audio sample rate
Number of audio channels
Indicates if streaming is complete
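Since audio arrives as Base64-encoded chunks with metadata on the first chunk and a completion flag on the last, a client typically accumulates chunks until the stream ends. The sketch below assumes each message is a JSON object with the hypothetical keys `audio`, `metadata`, and `is_final`; the actual field names are not stated in this section.

```python
import base64
import json
from typing import Iterable, Optional, Tuple


def collect_audio(messages: Iterable[str]) -> Tuple[bytes, Optional[dict]]:
    """Decode streamed messages into raw audio bytes plus stream metadata."""
    audio = bytearray()
    metadata = None
    for raw in messages:
        msg = json.loads(raw)
        # Metadata is expected only with the first chunk; keep the first seen.
        if metadata is None and msg.get("metadata") is not None:
            metadata = msg["metadata"]
        chunk = msg.get("audio")
        if chunk:
            audio.extend(base64.b64decode(chunk))
        # Stop once the server signals that streaming is complete.
        if msg.get("is_final"):
            break
    return bytes(audio), metadata
```

`messages` can be any iterable of text frames, so the same function works with a live WebSocket receive loop or with recorded frames in a test.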
Model Selection Guide
By Use Case
Real-time Applications
- Recommended: hexgrad/Kokoro-82M or neuphonic/neutts-air
- Latency: 30-75ms
- Use for: Voice assistants, live translation
- Recommended: maya-research/maya1 or canopylabs/orpheus-3b-0.1-ft
- Quality: Studio-grade
- Use for: Audiobooks, podcasts, narration
- Recommended: ResembleAI/chatterbox or nari-labs/Dia-1.6B
- Features: Emotion, context awareness
- Use for: Chatbots, virtual assistants
- Recommended: coqui/XTTS-v2
- Languages: 25+
- Use for: International apps, dubbing
By Performance
Lowest Latency
- hexgrad/Kokoro-82M: ~30ms
- neuphonic/neutts-air: ~75ms
- ResembleAI/chatterbox: ~150ms
- maya-research/maya1: 4.9/5.0
- canopylabs/orpheus-3b-0.1-ft: 4.8/5.0
- coqui/XTTS-v2: 4.5/5.0
- hexgrad/Kokoro-82M: 82M params
- neuphonic/neutts-air: Optimized
- ResembleAI/chatterbox: Balanced
Advanced Features
- Voice Cloning
- Emotion Control
- SSML Support
Available with the coqui/XTTS-v2 model.
Requirements:
- 10 to 30 seconds of clean audio
- Single speaker only
- 16 kHz or higher sample rate
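Two of these requirements (clip length and sample rate) can be checked locally before uploading a reference clip. The sketch below uses Python's standard `wave` module and therefore assumes the clip is an uncompressed WAV file; `check_reference_clip` is a hypothetical helper name. Whether the clip contains a single speaker or clean audio cannot be verified this way.

```python
import wave

MIN_SECONDS = 10
MAX_SECONDS = 30
MIN_SAMPLE_RATE = 16000  # 16 kHz


def check_reference_clip(source):
    """Return a list of problems found in a WAV clip (empty if none).

    `source` may be a file path or a binary file-like object.
    """
    with wave.open(source, "rb") as wav:
        rate = wav.getframerate()
        duration = wav.getnframes() / rate
    problems = []
    if rate < MIN_SAMPLE_RATE:
        problems.append(f"sample rate {rate} Hz is below {MIN_SAMPLE_RATE} Hz")
    if not MIN_SECONDS <= duration <= MAX_SECONDS:
        problems.append(
            f"duration {duration:.1f}s is outside {MIN_SECONDS}-{MAX_SECONDS}s"
        )
    return problems
```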
Error Codes
| Code | Description | Resolution |
|---|---|---|
| INVALID_API_KEY | Invalid API key | Check credentials |
| RATE_LIMIT_EXCEEDED | Too many requests | Upgrade or wait |
| MODEL_NOT_FOUND | Invalid model ID | Check model name |
| TEXT_TOO_LONG | Exceeds limit | Split text |
| INVALID_PARAMETERS | Bad request | Check format |
| INSUFFICIENT_CREDITS | No credits | Add credits |
| SERVER_ERROR | Internal error | Retry |
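The resolutions in the table suggest that only some errors are worth retrying: SERVER_ERROR ("Retry") and RATE_LIMIT_EXCEEDED ("Upgrade or wait"). One possible way to encode that is a retry wrapper with exponential backoff, sketched below. The `TTSError` exception class, the `with_retries` helper, and the backoff timings are all assumptions for illustration; how error codes are actually delivered is not specified here.

```python
import time

# Codes whose documented resolution is to retry or wait.
RETRYABLE = {"SERVER_ERROR", "RATE_LIMIT_EXCEEDED"}


class TTSError(Exception):
    """Hypothetical exception carrying one of the documented error codes."""

    def __init__(self, code):
        super().__init__(code)
        self.code = code


def with_retries(request, max_attempts=3, base_delay=0.5, sleep=time.sleep):
    """Call `request()`, retrying retryable errors with exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return request()
        except TTSError as err:
            # Give up on non-retryable codes or once attempts are exhausted.
            if err.code not in RETRYABLE or attempt == max_attempts:
                raise
            sleep(base_delay * 2 ** (attempt - 1))
```

Errors such as INVALID_API_KEY or TEXT_TOO_LONG are raised immediately, because repeating the same request cannot fix them.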
Best Practices
Text Preprocessing
Connection Management
Error Handling
SDK Installation
Migration Guide
- From ElevenLabs
- From Google TTS
- From Amazon Polly
Model Mapping:
- eleven_multilingual_v2 → canopylabs/orpheus-3b-0.1-ft
- eleven_turbo_v2 → neuphonic/neutts-air
- eleven_flash_v2 → hexgrad/Kokoro-82M
- WebSocket vs REST API
- Real-time streaming by default
- Open-source models
- More granular control
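When porting existing code, the ElevenLabs model mapping listed above can be kept in one lookup table so model IDs are translated in a single place. This is only a sketch; `map_model` is a hypothetical helper, and the three pairs are the ones given in the mapping.

```python
# Mapping taken from the "Model Mapping" list for ElevenLabs.
ELEVENLABS_TO_SUB200 = {
    "eleven_multilingual_v2": "canopylabs/orpheus-3b-0.1-ft",
    "eleven_turbo_v2": "neuphonic/neutts-air",
    "eleven_flash_v2": "hexgrad/Kokoro-82M",
}


def map_model(elevenlabs_model_id: str) -> str:
    """Translate an ElevenLabs model ID to its suggested sub200 model ID."""
    try:
        return ELEVENLABS_TO_SUB200[elevenlabs_model_id]
    except KeyError:
        raise ValueError(
            f"no documented mapping for {elevenlabs_model_id!r}"
        ) from None
```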
Support
Documentation
Comprehensive guides and tutorials
Discord
Join our community
GitHub
SDKs and examples
Status
Service monitoring
Need help? Contact our support team at sumit@sub200.dev
sub200 - Democratizing voice AI with open-source models

