
Overview

sub200 provides unified access to state-of-the-art open-source TTS models through a single WebSocket API. Our curated selection spans everything from edge deployment to research-grade synthesis.

Available Models

  • Enterprise Models
  • Edge Models

Maya Research - Maya1

State-of-the-art research model with advanced neural architecture.

Specifications
  • Parameters: Enterprise-grade
  • Latency: ~400ms
  • Quality: 4.9/5.0
  • GPU Required: Yes
Best for: Research applications, premium audiobooks, high-end content production
{
  "model": "maya-research/maya1",
  "type": "research",
  "quality": "studio"
}

Canopy Labs - Orpheus 3B

Professional model with three billion parameters for high-quality synthesis.

Specifications
  • Parameters: 3B
  • Latency: ~350ms
  • Quality: 4.8/5.0
  • GPU Required: Yes
Best for: Commercial production, media content, professional narration
{
  "model": "canopylabs/orpheus-3b-0.1-ft",
  "type": "professional",
  "parameters": "3B"
}

Nari Labs - Dia 1.6B

Versatile conversational model with natural dialogue capabilities.

Specifications
  • Parameters: 1.6B
  • Latency: ~220ms
  • Quality: 4.5/5.0
  • GPU Required: Yes
Best for: Conversational AI, virtual assistants, interactive applications
{
  "model": "nari-labs/Dia-1.6B",
  "type": "conversational",
  "parameters": "1.6B"
}

Sesame - CSM 1B

Balanced billion-parameter model for production use.

Specifications
  • Parameters: 1B
  • Latency: ~180ms
  • Quality: 4.4/5.0
  • GPU Required: Yes
Best for: General purpose applications, stable performance, production systems
{
  "model": "sesame/csm-1b",
  "type": "balanced",
  "parameters": "1B"
}

Model Comparison

Model                        | Parameters | Latency | Quality | GPU         | Use Case
maya-research/maya1          | Enterprise | ~400ms  | 4.9/5   | Yes         | Research
canopylabs/orpheus-3b-0.1-ft | 3B         | ~350ms  | 4.8/5   | Yes         | Production
nari-labs/Dia-1.6B           | 1.6B       | ~220ms  | 4.5/5   | Yes         | Dialogue
sesame/csm-1b                | 1B         | ~180ms  | 4.4/5   | Yes         | Balanced
hexgrad/Kokoro-82M           | 82M        | ~30ms   | 4.0/5   | No          | Edge
neuphonic/neutts-air         | Compact    | ~75ms   | 4.2/5   | Optional    | Cloud
ResembleAI/chatterbox        | Medium     | ~150ms  | 4.3/5   | Recommended | Interactive
coqui/XTTS-v2                | Large      | ~200ms  | 4.5/5   | Yes         | Multilingual
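The latency figures in the table above can drive simple client-side model filtering. A minimal sketch, assuming the approximate latencies listed here (the `MODEL_LATENCY_MS` table and `models_within` helper are illustrative, not part of the API):

```python
# Approximate latencies from the comparison table above (milliseconds).
# These are rough figures, useful only for coarse filtering.
MODEL_LATENCY_MS = {
    "maya-research/maya1": 400,
    "canopylabs/orpheus-3b-0.1-ft": 350,
    "nari-labs/Dia-1.6B": 220,
    "sesame/csm-1b": 180,
    "hexgrad/Kokoro-82M": 30,
    "neuphonic/neutts-air": 75,
    "ResembleAI/chatterbox": 150,
    "coqui/XTTS-v2": 200,
}

def models_within(budget_ms):
    """Return model IDs whose approximate latency fits the budget, fastest first."""
    return sorted(
        (m for m, ms in MODEL_LATENCY_MS.items() if ms <= budget_ms),
        key=MODEL_LATENCY_MS.get,
    )
```

For example, a 100ms interactive budget leaves only the two edge-oriented models.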

WebSocket API

Connect to any model through our unified WebSocket API endpoint.
const WebSocket = require('ws');

const ws = new WebSocket('wss://api.sub200.dev/v1/tts', {
  headers: {
    'Authorization': 'Bearer YOUR_API_KEY'
  }
});

ws.on('open', () => {
  ws.send(JSON.stringify({
    model: 'hexgrad/Kokoro-82M',
    text: 'Hello, world!',
    voice_settings: {
      stability: 0.75,
      similarity_boost: 0.75,
      style: 0.5,
      speed: 1.0
    },
    output_format: 'mp3_44100_128'
  }));
});

ws.on('message', (data) => {
  // Handle audio chunks
  console.log('Received audio chunk:', data.length, 'bytes');
});

Request Parameters

model
string
required
Model identifier for synthesis. Available models:
  • maya-research/maya1
  • hexgrad/Kokoro-82M
  • neuphonic/neutts-air
  • coqui/XTTS-v2
  • ResembleAI/chatterbox
  • sesame/csm-1b
  • nari-labs/Dia-1.6B
  • canopylabs/orpheus-3b-0.1-ft
text
string
required
Text to synthesize (max length varies by model and plan)
voice_settings
object
Voice customization parameters
voice_settings.stability
number
default:"0.75"
Voice consistency (0.0 to 1.0)
voice_settings.similarity_boost
number
default:"0.75"
Voice character strength (0.0 to 1.0)
voice_settings.style
number
default:"0.5"
Speaking style intensity (0.0 to 1.0)
voice_settings.speed
number
default:"1.0"
Speech rate multiplier (0.5 to 2.0)
voice_settings.pitch
number
default:"0"
Pitch shift in semitones (-12 to 12)
output_format
string
default:"mp3_44100_128"
Audio output format. Options:
  • mp3_44100_128: MP3 at 44.1kHz, 128kbps
  • mp3_22050_32: MP3 at 22.05kHz, 32kbps
  • pcm_16000: Raw PCM at 16kHz
  • pcm_22050: Raw PCM at 22.05kHz
  • pcm_44100: Raw PCM at 44.1kHz
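The voice_settings ranges and defaults above can be enforced client-side before a request is sent. A minimal sketch; the `clamp_voice_settings` helper is illustrative, and the server may apply its own validation:

```python
# Documented voice_settings ranges: (min, max, default).
VOICE_SETTING_RANGES = {
    "stability": (0.0, 1.0, 0.75),
    "similarity_boost": (0.0, 1.0, 0.75),
    "style": (0.0, 1.0, 0.5),
    "speed": (0.5, 2.0, 1.0),
    "pitch": (-12, 12, 0),
}

def clamp_voice_settings(settings):
    """Fill in defaults and clamp each value to its documented range."""
    out = {}
    for name, (lo, hi, default) in VOICE_SETTING_RANGES.items():
        value = settings.get(name, default)
        out[name] = max(lo, min(hi, value))
    return out
```

For example, `clamp_voice_settings({"speed": 3.0})` clamps speed to 2.0 and fills the remaining fields with their defaults.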

Response Format

audio
string
Base64-encoded audio chunk
metadata
object
Stream metadata (sent with first chunk)
metadata.duration_ms
number
Total audio duration in milliseconds
metadata.sample_rate
number
Audio sample rate
metadata.channels
number
Number of audio channels
done
boolean
Indicates if streaming is complete
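Putting the response fields together, a receive loop decodes each base64 audio chunk and stops when `done` arrives. A minimal sketch following the documented response schema (the `handle_message` helper name is ours):

```python
import base64
import json

def handle_message(raw, chunks):
    """Process one WebSocket message per the response format above.

    Decodes the base64 `audio` field into `chunks` and returns True
    once the server signals completion via `done`.
    """
    msg = json.loads(raw)
    if "audio" in msg:
        chunks.append(base64.b64decode(msg["audio"]))
    if "metadata" in msg:
        # Metadata arrives with the first chunk.
        sample_rate = msg["metadata"].get("sample_rate")
    return bool(msg.get("done"))
```

Accumulated chunks can then be concatenated and written out (for PCM formats) or streamed to a decoder (for MP3).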

Model Selection Guide

Real-time Applications
  • Recommended: hexgrad/Kokoro-82M or neuphonic/neutts-air
  • Latency: 30-75ms
  • Use for: Voice assistants, live translation
Professional Content
  • Recommended: maya-research/maya1 or canopylabs/orpheus-3b-0.1-ft
  • Quality: Studio-grade
  • Use for: Audiobooks, podcasts, narration
Conversational AI
  • Recommended: ResembleAI/chatterbox or nari-labs/Dia-1.6B
  • Features: Emotion, context awareness
  • Use for: Chatbots, virtual assistants
Multilingual Projects
  • Recommended: coqui/XTTS-v2
  • Languages: 25+
  • Use for: International apps, dubbing
Lowest Latency
  • hexgrad/Kokoro-82M: ~30ms
  • neuphonic/neutts-air: ~75ms
  • ResembleAI/chatterbox: ~150ms
Highest Quality
  • maya-research/maya1: 4.9/5.0
  • canopylabs/orpheus-3b-0.1-ft: 4.8/5.0
  • coqui/XTTS-v2: 4.5/5.0
Best Efficiency
  • hexgrad/Kokoro-82M: 82M params
  • neuphonic/neutts-air: Optimized
  • ResembleAI/chatterbox: Balanced
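The selection guide above reduces to a simple lookup from use case to model ID. An illustrative sketch (the category names and fallback choice are ours, not an API feature):

```python
# Use-case -> model mapping distilled from the selection guide above.
GUIDE = {
    "realtime": "hexgrad/Kokoro-82M",
    "professional": "maya-research/maya1",
    "conversational": "nari-labs/Dia-1.6B",
    "multilingual": "coqui/XTTS-v2",
    "balanced": "sesame/csm-1b",
}

def pick_model(use_case):
    """Pick a model for a use case, falling back to the balanced option."""
    return GUIDE.get(use_case, "sesame/csm-1b")
```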

Advanced Features

  • Voice Cloning
  • Emotion Control
  • SSML Support
Available with coqui/XTTS-v2 model.
const request = {
  model: "coqui/XTTS-v2",
  text: "Hello with cloned voice",
  voice_clone: {
    audio_url: "https://example.com/sample.wav",
    language: "en"
  }
};
Requirements:
  • 10-30 seconds of clean audio
  • Single speaker only
  • 16 kHz or higher sample rate
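The requirements above can be checked before uploading a reference clip. A minimal sketch; `clone_sample_ok` is a hypothetical client-side helper, not part of the API:

```python
def clone_sample_ok(duration_s, sample_rate_hz, speakers=1):
    """Check a reference clip against the voice-cloning requirements:
    10-30 seconds of audio, a single speaker, and >= 16 kHz sample rate."""
    return (
        10 <= duration_s <= 30
        and sample_rate_hz >= 16000
        and speakers == 1
    )
```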

Error Codes

Code                 | Description       | Resolution
INVALID_API_KEY      | Invalid API key   | Check credentials
RATE_LIMIT_EXCEEDED  | Too many requests | Upgrade or wait
MODEL_NOT_FOUND      | Invalid model ID  | Check model name
TEXT_TOO_LONG        | Exceeds limit     | Split text
INVALID_PARAMETERS   | Bad request       | Check format
INSUFFICIENT_CREDITS | No credits        | Add credits
SERVER_ERROR         | Internal error    | Retry
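For TEXT_TOO_LONG, the fix is to split long input at sentence boundaries and synthesize each chunk separately. A sketch assuming a hypothetical 500-character limit (the real limit varies by model and plan):

```python
import re

def split_text(text, max_len=500):
    """Split text at sentence boundaries so each chunk stays within
    max_len characters. max_len is a placeholder; use your plan's limit."""
    sentences = re.split(r'(?<=[.!?])\s+', text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        if current and len(current) + len(sentence) + 1 > max_len:
            chunks.append(current)
            current = sentence
        else:
            current = f"{current} {sentence}".strip()
    if current:
        chunks.append(current)
    return chunks
```

Each chunk can then be sent as its own request, with the audio concatenated on receipt.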

Best Practices

Normalize text before synthesis so abbreviations and stray characters don't trip up pronunciation:

import re

def preprocess_text(text):
    # Expand abbreviations
    replacements = {
        "Dr.": "Doctor",
        "Mr.": "Mister",
        "Mrs.": "Missus",
        "vs.": "versus"
    }
    
    for abbr, full in replacements.items():
        text = text.replace(abbr, full)
    
    # Clean special characters
    text = re.sub(r'[^\w\s.,!?;:\-\'"()]', '', text)
    
    # Ensure proper ending punctuation
    if not text.endswith(('.', '!', '?')):
        text += '.'
    
    return text
Handle dropped connections with exponential-backoff reconnection:

class ConnectionManager {
  constructor(apiKey) {
    this.apiKey = apiKey;
    this.reconnectAttempts = 0;
    this.maxReconnects = 5;
  }

  connect() {
    this.ws = new WebSocket('wss://api.sub200.dev/v1/tts', {
      headers: { 'Authorization': `Bearer ${this.apiKey}` }
    });

    this.ws.on('close', () => {
      this.handleReconnect();
    });
  }

  handleReconnect() {
    if (this.reconnectAttempts < this.maxReconnects) {
      const delay = Math.pow(2, this.reconnectAttempts) * 1000;
      setTimeout(() => {
        this.reconnectAttempts++;
        this.connect();
      }, delay);
    }
  }
}
Dispatch on the error codes above rather than treating all failures alike:

class ErrorHandler:
    def handle_error(self, error):
        error_handlers = {
            'RATE_LIMIT_EXCEEDED': self.handle_rate_limit,
            'INVALID_API_KEY': self.handle_auth_error,
            'TEXT_TOO_LONG': self.handle_text_length,
            'SERVER_ERROR': self.handle_server_error
        }
        
        handler = error_handlers.get(
            error['code'], 
            self.handle_generic
        )
        return handler(error)
    
    def handle_rate_limit(self, error):
        reset_time = error['details']['reset_at']
        print(f"Rate limit hit. Resets at {reset_time}")
        # Implement backoff strategy

    def handle_auth_error(self, error):
        raise PermissionError("Invalid API key - check credentials")

    def handle_text_length(self, error):
        print("Text too long - split into smaller chunks")

    def handle_server_error(self, error):
        print("Server error - retry with backoff")

    def handle_generic(self, error):
        print(f"Unhandled error: {error['code']}")

SDK Installation

npm install @sub200/tts

Migration Guide

  • From ElevenLabs
  • From Google TTS
  • From Amazon Polly
Model Mapping:
  • eleven_multilingual_v2 → canopylabs/orpheus-3b-0.1-ft
  • eleven_turbo_v2 → neuphonic/neutts-air
  • eleven_flash_v2 → hexgrad/Kokoro-82M
Key Differences:
  • WebSocket vs REST API
  • Real-time streaming by default
  • Open-source models
  • More granular control
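The model mapping above can be applied mechanically during migration. An illustrative sketch (the fallback model is our assumption, not part of the guide):

```python
# ElevenLabs -> sub200 model mapping from the migration guide above.
ELEVENLABS_MAP = {
    "eleven_multilingual_v2": "canopylabs/orpheus-3b-0.1-ft",
    "eleven_turbo_v2": "neuphonic/neutts-air",
    "eleven_flash_v2": "hexgrad/Kokoro-82M",
}

def migrate_model(elevenlabs_id):
    """Translate an ElevenLabs model ID; unknown IDs fall back to the
    balanced sesame/csm-1b (our choice of default, not the guide's)."""
    return ELEVENLABS_MAP.get(elevenlabs_id, "sesame/csm-1b")
```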

Support

Need help? Contact our support team at sumit@sub200.dev
sub200 - Democratizing voice AI with open-source models