Introduction

TTP Agent SDK is a powerful JavaScript library for building AI-powered voice and chat interactions in your web applications.

Key Features

🔒
Simple Authentication

Direct connection using agentId and appId with domain whitelist access control

🎨
Fully Customizable

Colors, branding, languages, RTL support, and custom agent settings

📱
Mobile Optimized

Works seamlessly on desktop, tablet, and mobile devices

🌍
Multi-language

Built-in support for multiple languages and custom translations

Real-time Streaming

WebSocket-based audio streaming with low latency

🔧
Easy Integration

Simple CDN setup or NPM package with comprehensive API

Installation

NPM

npm install ttp-agent-sdk

CDN

<script src="https://unpkg.com/ttp-agent-sdk/dist/agent-widget.js"></script>

Import

// ES6 Import
import { VoiceSDK } from 'ttp-agent-sdk';

// CommonJS
const { VoiceSDK } = require('ttp-agent-sdk');

// Browser Global
const sdk = new window.TTPAgentSDK.VoiceSDK(config);

Quick Start

Get up and running in 5 minutes with this simple example.

1

Initialize the SDK

Create a VoiceSDK instance with your agent ID and app ID:

import { VoiceSDK } from 'ttp-agent-sdk';

const voiceSDK = new VoiceSDK({
  agentId: 'agent_123',     // The AI agent to connect to
  appId: 'your_app_id',     // Your application ID
  
  // Optional: Configure audio formats (v2 protocol)
  outputContainer: 'raw',      // 'raw' or 'wav'
  outputEncoding: 'pcm',       // 'pcm', 'pcmu', 'pcma'
  outputSampleRate: 24000,     // Typical server/TTS output (default)
  protocolVersion: 2           // Use v2 protocol for format negotiation
});

// Listen to events
voiceSDK.on('connected', () => {
  console.log('✅ Connected to agent');
});

voiceSDK.on('formatNegotiated', (format) => {
  console.log('✅ Format negotiated:', format);
  // Format contains: container, encoding, sampleRate, channels, bitDepth
});

voiceSDK.on('message', (msg) => {
  if (msg.t === 'agent_response') {
    console.log('Agent:', msg.agent_response);
  }
});
2

Connect & Start Recording

Connect to the agent and start capturing audio. If your agent uses a server-driven disclaimer, listen for disclaimersRequired, call sendDisclaimerAck(true) after the user accepts, then call startRecording().

// Connect
await voiceSDK.connect();

// Start recording
await voiceSDK.startRecording();

// Stop recording
await voiceSDK.stopRecording();

Authentication

The SDK connects directly using agentId and appId. No server-side authentication step is needed. Access control is managed via domain whitelist in your agent's admin panel.

How It Works

💡 Simple Setup: Just provide your agentId and appId in the SDK configuration. The SDK connects directly to the TTP backend via WebSocket.
const voiceSDK = new VoiceSDK({
  agentId: 'agent_123',     // Your AI agent ID
  appId: 'your_app_id'      // Your application ID
});

Domain Whitelist

To control which websites can use your agent, configure a domain whitelist in your agent's admin panel. Only requests originating from whitelisted domains will be accepted.

💡 Tip: During development, you can add localhost to the domain whitelist. Remove it before deploying to production.

Configuration Parameters

Parameter Type Required Description
agentId string Yes The AI agent identifier
appId string Yes Your application identifier

Agent Settings Override

NEW FEATURE

Dynamically customize agent behavior, voice, and personality on a per-session basis.

💡 Access Control: Agent settings override is available when the agent has domain whitelist configured in the admin panel. No additional authentication step is required.

How It Works

  1. Configure: Set up a domain whitelist for your agent in the admin panel
  2. Initialize: Pass agentSettingsOverride in the SDK configuration
  3. Connect: The SDK sends overrides in the hello message
  4. TTP Backend: Validates the domain and applies your overrides

Example

const voiceSDK = new VoiceSDK({
  agentId: 'agent_123',     // The AI agent to connect to
  appId: 'your_app_id',     // Your application ID
  
  // Override agent settings
  agentSettingsOverride: {
    // Core settings
    prompt: "You are a friendly Spanish-speaking travel assistant",
    language: "es",
    temperature: 0.9,
    maxTokens: 200,
    
    // Voice settings
    voiceSpeed: 1.2,
    voiceId: "nova",  // Use voiceId (not selectedVoice)
    
    // Behavior
    firstMessage: "¡Hola! ¿Cómo puedo ayudarte hoy?",
    disableInterruptions: false,
    autoDetectLanguage: true,
    
    // Tools (optional)
    toolIds: [123, 456, 789],              // Custom tool IDs
    internalToolIds: ['calendar', 'email'] // Internal tool IDs
  }
});

Available Override Settings

15 out of 16 settings can be overridden. Only model selection is not supported (requires infrastructure changes).

📝 Core Settings

  • prompt - System prompt/instructions
  • temperature - LLM temperature (0-2)
  • maxTokens - Maximum response tokens
  • model - ⚠️ NOT SUPPORTED
  • language - Response language code

🔊 Voice Settings

  • voiceId - Specific voice ID
  • voiceSpeed - Speed multiplier (0.5-2)

⚙️ Behavior

  • firstMessage - Initial greeting
  • disableInterruptions - Allow/prevent barge-in
  • autoDetectLanguage - Auto language detection
  • candidateLanguages - List of languages for auto-detection
  • maxCallDuration - Max session duration (seconds)

🛠️ Advanced

  • toolIds - Array of custom tool IDs
  • internalToolIds - Array of internal tool IDs
  • timezone - User timezone
⚠️ Validation: All overrides are validated and sanitized on the server. Invalid values will be rejected or clamped to safe ranges.

Variables in Hello Request

Overview

Variables allow you to pass dynamic values to your agent that will be used to replace placeholders in the system prompt and first message. Variables sent in the hello request take precedence over default variables stored in the agent configuration.

Hello Message Format

SDK v2 Format (Recommended)

When using VoiceSDK v2, variables are passed in the SDK constructor:

import { VoiceSDK_v2 } from 'ttp-agent-sdk';

const voiceSDK = new VoiceSDK_v2({
  agentId: 'agent_5a2b984c1',
  appId: 'app_Bc01EqMQt2Euehl4qqZSi6l3FJP42Q9vJ0pC',
  
  // Variables (optional)
  variables: {
    USER_NAME: 'John',
    ACCOUNT_TYPE: 'premium',
    LANGUAGE: 'en-US'
  },
  
  // Audio format configuration
  sampleRate: 16000,
  channels: 1,
  bitDepth: 16,
  outputContainer: 'raw',
  outputEncoding: 'pcm',
  outputSampleRate: 24000,
  outputChannels: 1,
  outputBitDepth: 16,
  outputFrameDurationMs: 600
});

await voiceSDK.connect();

Raw WebSocket Format

If connecting via raw WebSocket (without SDK), send variables in the hello message:

{
  "t": "hello",
  "v": 2,
  "variables": {
    "USER_NAME": "John",
    "ACCOUNT_TYPE": "premium",
    "LANGUAGE": "en-US"
  },
  "inputFormat": {
    "encoding": "pcm",
    "sampleRate": 16000,
    "channels": 1,
    "bitDepth": 16
  },
  "requestedOutputFormat": {
    "encoding": "pcm",
    "sampleRate": 24000,
    "channels": 1,
    "bitDepth": 16,
    "container": "raw"
  },
  "outputFrameDurationMs": 600
}

Variable Format

Variables are sent as a JSON object where:

  • Keys: Variable names (case-sensitive, e.g., USER_NAME)
  • Values: String values that will replace {{VARIABLE_NAME}} in the prompt

Example

{
  "variables": {
    "USER_NAME": "John",
    "ACCOUNT_TYPE": "premium",
    "LANGUAGE": "en-US",
    "COMPANY": "Acme Corp"
  }
}

Variable Replacement Priority

Variables are replaced in the following priority order:

  1. Hello Variables (highest priority) - Variables sent in the hello request
  2. Default Variables - Variables stored in agent configuration (Redis)
  3. Leave as-is - If no value found, {{VARIABLE_NAME}} remains unchanged

Example Priority

Agent Configuration (Redis):

{
  "USER_NAME": "David",
  "ACCOUNT_TYPE": "premium"
}

Hello Request:

{
  "variables": {
    "USER_NAME": "John"
  }
}

Result:

  • {{USER_NAME}}"John" (from hello - takes precedence)
  • {{ACCOUNT_TYPE}}"premium" (from defaults - hello doesn't override)

Usage Examples

Example 1: JavaScript/TypeScript with SDK v2

import { VoiceSDK_v2 } from 'ttp-agent-sdk';

const voiceSDK = new VoiceSDK_v2({
  agentId: 'agent_5a2b984c1',
  appId: 'app_Bc01EqMQt2Euehl4qqZSi6l3FJP42Q9vJ0pC',
  
  variables: {
    USER_NAME: 'John Doe',
    ACCOUNT_TYPE: 'premium',
    LANGUAGE: 'en-US'
  },
  
  // ... audio format config
});

await voiceSDK.connect();

Example 2: Raw WebSocket (JavaScript)

const ws = new WebSocket('wss://speech.talktopc.com/ws/conv?agentId=agent_5a2b984c1&appId=app_Bc01EqMQt2Euehl4qqZSi6l3FJP42Q9vJ0pC');

ws.onopen = () => {
  const helloMessage = {
    t: 'hello',
    v: 2,
    variables: {
      USER_NAME: 'John',
      ACCOUNT_TYPE: 'premium',
      LANGUAGE: 'en-US'
    },
    inputFormat: {
      encoding: 'pcm',
      sampleRate: 16000,
      channels: 1,
      bitDepth: 16
    },
    requestedOutputFormat: {
      encoding: 'pcm',
      sampleRate: 24000,
      channels: 1,
      bitDepth: 16,
      container: 'raw'
    },
    outputFrameDurationMs: 600
  };
  
  ws.send(JSON.stringify(helloMessage));
};

Example 3: Python WebSocket

import websocket
import json

def on_open(ws):
    hello_message = {
        "t": "hello",
        "v": 2,
        "variables": {
            "USER_NAME": "John",
            "ACCOUNT_TYPE": "premium",
            "LANGUAGE": "en-US"
        },
        "inputFormat": {
            "encoding": "pcm",
            "sampleRate": 16000,
            "channels": 1,
            "bitDepth": 16
        },
        "requestedOutputFormat": {
            "encoding": "pcm",
            "sampleRate": 24000,
            "channels": 1,
            "bitDepth": 16,
            "container": "raw"
        },
        "outputFrameDurationMs": 600
    }
    
    ws.send(json.dumps(hello_message))

ws = websocket.WebSocketApp(
    "wss://speech.talktopc.com/ws/conv?agentId=agent_5a2b984c1&appId=app_Bc01EqMQt2Euehl4qqZSi6l3FJP42Q9vJ0pC",
    on_open=on_open
)
ws.run_forever()

Example 4: cURL / wscat

# Using wscat
wscat -c "wss://speech.talktopc.com/ws/conv?agentId=agent_5a2b984c1&appId=app_Bc01EqMQt2Euehl4qqZSi6l3FJP42Q9vJ0pC"

# Then send:
{"t":"hello","v":2,"variables":{"USER_NAME":"John","ACCOUNT_TYPE":"premium"},"inputFormat":{"encoding":"pcm","sampleRate":16000,"channels":1,"bitDepth":16},"requestedOutputFormat":{"encoding":"pcm","sampleRate":24000,"channels":1,"bitDepth":16,"container":"raw"},"outputFrameDurationMs":600}

Agent Prompt Setup

To use variables in your agent, include placeholders in the system prompt or first message:

System Prompt Example

Your name is {{AGENT_NAME}}.
You are helping {{USER_NAME}} who has a {{ACCOUNT_TYPE}} account.
Speak in {{LANGUAGE}}.

First Message Example

Hello {{USER_NAME}}! Welcome to our {{ACCOUNT_TYPE}} service.

Backend Processing

When the hello request is received with variables:

  1. Variables are extracted from the hello message
  2. Default variables are loaded from agent configuration (Redis)
  3. Variables are merged (hello variables take precedence)
  4. Prompt is processed - {{VARIABLE_NAME}} placeholders are replaced
  5. First message is processed - Variables are replaced here too
  6. Metadata is added - Variables metadata section is appended to prompt

Server Logs

After sending hello with variables, check server logs for:

📝 Processing variables from hello message: [USER_NAME, ACCOUNT_TYPE, LANGUAGE]
✅ Variables processed and prompt updated
📝 FINAL PROCESSED PROMPT (agentId: ...):
[Prompt with variables replaced]
✅ Processed variables: 3 hello variables, 2 default variables, metadata: added

Variable Naming Conventions

  • Use UPPERCASE with underscores: USER_NAME, ACCOUNT_TYPE
  • Variable names are case-sensitive
  • Avoid special characters except underscores
  • Recommended format: {{VARIABLE_NAME}} in prompts

Common Use Cases

1. User Personalization

{
  "variables": {
    "USER_NAME": "John Doe",
    "USER_EMAIL": "john@example.com"
  }
}

2. Account Context

{
  "variables": {
    "ACCOUNT_TYPE": "premium",
    "SUBSCRIPTION_STATUS": "active"
  }
}

3. Language/Localization

{
  "variables": {
    "LANGUAGE": "en-US",
    "CURRENCY": "USD"
  }
}

4. Session Context

{
  "variables": {
    "SESSION_ID": "abc123",
    "PAGE_URL": "https://example.com/products"
  }
}

Error Handling

Missing Variables

If a variable is referenced in the prompt but not provided:

  • Has default value: Uses default from agent configuration
  • No default value: Placeholder remains unchanged ({{VARIABLE_NAME}})

Invalid Variable Format

  • Variables must be a JSON object
  • Values should be strings (will be converted to string if needed)
  • Empty object {} or null is valid (will use defaults only)

Best Practices

  1. Set defaults in agent configuration for all variables
  2. Override with hello variables only when you have dynamic values
  3. Use descriptive names that clearly indicate the variable's purpose
  4. Document variables in your agent's description or notes
  5. Test variables by checking server logs for "FINAL PROCESSED PROMPT"

API Reference

Hello Message Structure

interface HelloMessage {
  t: "hello";                    // Message type
  v?: number;                    // SDK version (2 for v2)
  variables?: {                   // Optional variables object
    [key: string]: string;        // Variable name -> value mapping
  };
  inputFormat?: AudioFormat;     // Input audio format
  requestedOutputFormat?: AudioFormat; // Output audio format
  outputFrameDurationMs?: number; // Frame duration for streaming
}

interface AudioFormat {
  encoding: "pcm" | "pcmu" | "pcma";
  sampleRate: number;
  channels: number;
  bitDepth: number;
  container?: "raw" | "wav";      // For output format only
}

Troubleshooting

Variables Not Being Replaced

  1. Check variable names match exactly (case-sensitive)
  2. Verify variables are sent in hello message (check logs)
  3. Check server logs for "FINAL PROCESSED PROMPT" to see actual replacement

Variables Not in Hello Message

  • SDK v2: Check if SDK supports variables in constructor
  • Raw WebSocket: Ensure variables field is included in JSON
  • Check WebSocket message is sent after connection opens

Default Variables Not Used

  • Verify variables are stored in Redis (check agent configuration)
  • Check extractVariablesFromAgentConfig is working
  • Look for "Default variables not found in state" warnings in logs

Events & Callbacks

The SDK emits events for all important state changes and interactions.

Event Categories

Connection Events

voiceSDK.on('connected', () => {
  console.log('✅ Connected to agent');
});

voiceSDK.on('disconnected', (event) => {
  console.log('❌ Disconnected:', event.reason);
  console.log('Close code:', event.code);
});

voiceSDK.on('error', (error) => {
  console.error('Error:', error);
});

Recording Events

voiceSDK.on('recordingStarted', () => {
  console.log('🎤 Recording started');
});

voiceSDK.on('recordingStopped', () => {
  console.log('⏹️ Recording stopped');
});

Message Events

voiceSDK.on('message', (msg) => {
  switch(msg.type) {
    case 'agent_response':
      console.log('Agent:', msg.agent_response);
      break;
    case 'transcription':
      console.log('You said:', msg.text);
      break;
    // ... other message types
  }
});

Audio Events

voiceSDK.on('playbackStarted', () => {
  console.log('🔊 Audio playback started');
});

voiceSDK.on('playbackStopped', () => {
  console.log('🔇 Audio playback stopped');
});

voiceSDK.on('audioData', (audioData) => {
  // Raw audio data (Uint8Array)
});

Pause Events

// Call paused (server acknowledged)
voiceSDK.on('callPaused', (data) => {
  console.log('⏸️ Call paused, timeout:', data.timeoutSeconds, 'seconds');
});

// Call resumed (server acknowledged, STT ready)
voiceSDK.on('callResumed', () => {
  console.log('▶️ Call resumed');
});

// Pause timeout (call auto-ended because pause lasted too long)
voiceSDK.on('pauseTimeout', () => {
  console.log('⏱️ Pause timeout — call ended');
});

Special Events

// Barge-in (user interrupts agent)
voiceSDK.on('bargeIn', (message) => {
  console.log('User interrupted the agent');
});

// Format negotiation (v2 protocol only)
voiceSDK.on('formatNegotiated', (format) => {
  console.log('Format negotiated:', format);
  // format: { container, encoding, sampleRate, channels, bitDepth }
});

// Greeting audio
voiceSDK.on('greetingStarted', () => {
  console.log('Playing greeting message');
});

// Domain whitelist error
voiceSDK.on('domainError', (error) => {
  console.error('Domain not whitelisted:', error.reason);
});

// Server-driven disclaimer (voice v2) — see #server-driven-disclaimer
voiceSDK.on('disclaimersRequired', (payload) => {
  // Show your UI, then call voiceSDK.sendDisclaimerAck(true|false)
});
voiceSDK.on('disclaimerRejected', ({ code, message }) => {
  // DISCLAIMER_DECLINED, DISCLAIMER_TIMEOUT, DISCLAIMER_HASH_MISMATCH, etc.
});

Protocol v2 - Format Negotiation

The SDK v2 introduces format negotiation, allowing you to specify exactly what audio format you want to receive from the server.

Server-driven disclaimer (voice)

Some deployments must show exact legal or policy copy from the server before speech recognition and the agent greeting run. The conversation server can require an explicit acknowledgement step after hello_ack. This applies to VoiceSDK v2 (protocol version 2) and the Voice & Chat Widget voice path.

When the gate is active

  • The agent has a non-empty disclaimers list in backend storage (Redis field disclaimers as a JSON array of plain-text strings). An empty array [] means no disclaimer gate.
  • The session is not a resumed voice call (resume skips the gate).

While the gate is open, the server:

  • Does not open STT or play the greeting.
  • Rejects start_continuous_mode with {"ok":false,"t":"error","code":"DISCLAIMER_PENDING",...}.
  • Drops uplink binary audio (microphone data) until the gate clears.
  • Starts a server-side timer; if the user never acknowledges, the session is closed with DISCLAIMER_TIMEOUT (duration is configured on the server, typically on the order of minutes).

Voice capture uses an AudioWorklet. On iPhone and iPad, WebKit often runs the capture AudioContext at the device hardware rate (commonly 44.1 kHz or 48 kHz) even when a lower rate was requested, while the server expects PCM at the input rate negotiated in hello_ack (typically 16 kHz). The SDK resamples uplink PCM to that negotiated rate before sending binary frames. The recorder connects the worklet through a zero-gain node to the destination so WebKit reliably pulls the processor (graphs that dead-end at the worklet alone may not run on some builds). On mobile, after getUserMedia succeeds, the SDK may prime the shared recorder context while user activation is still fresh, before the WebSocket handshake and hello_ack finish. For embedded widgets, use allow="microphone" on the iframe when the host page is cross-origin.

Embed sites: CSP and “Unable to load a worklet's module”

Strict Content-Security-Policy on the parent page can block audioWorklet.addModule() when the processor URL points at another host (for example cdn.talktopc.com), which surfaces as AbortError: Unable to load a worklet's module and voice never reaches the server. The SDK tries that URL first, then automatically retries using the capture worklet source bundled inside the widget script and a blob: URL—this works on many shops without whitelisting our CDN. If both attempts fail, relax CSP (often worker-src / script-src must allow blob:, and/or your CDN origin for the processor), or host audio-processor.js on your own domain and set voice.audioProcessorPath to that same-origin URL.

hello_ack fields (gate active)

When disclaimers apply, the server includes:

Field Type Description
disclaimersRequired boolean Must be true when the gate is active
disclaimerTexts string[] Plain-text lines to show the user (no HTML; escape/sanitize in your UI)
disclaimersHash string SHA-256 (hex) over the canonical text list; echoed in disclaimer_ack for verification
disclaimerTimeoutMs number Hint for UI (countdown copy); server enforces the real timeout independently

Client message: disclaimer_ack

After the user accepts or declines, send:

{
  "t": "disclaimer_ack",
  "accepted": true,
  "disclaimersHash": "sha256-from-hello_ack",
  "conversationId": "optional-matches-hello_ack"
}

For decline, set "accepted": false. Duplicate acks for the same session are ignored. The SDK method sendDisclaimerAck(accepted) builds this frame using the hash and conversation id from hello_ack.

Server error frames (t: "error")

code Meaning
DISCLAIMER_PENDING Client tried to start continuous mode or stream audio before sending a successful ack; session stays open—send disclaimer_ack then retry
DISCLAIMER_DECLINED User declined; server ends the conversation
DISCLAIMER_TIMEOUT No acknowledgement in time; connection closed
DISCLAIMER_HASH_MISMATCH Ack hash did not match server expectation; connection closed

Using VoiceSDK v2 (custom apps)

  1. Use protocol v2 (default in current SDK): protocolVersion: 2.
  2. After connect(), wait for hello_ack. The SDK sets voiceSDK.disclaimersPending === true when the gate is active.
  3. Listen for disclaimersRequired. The payload includes texts, disclaimersHash, disclaimerTimeoutMs, and conversationId for your UI.
  4. Show your own modal or screen with the given texts. Do not call startRecording() until the user has accepted and you have called sendDisclaimerAck(true) (calling startRecording() while disclaimersPending is still true throws with error.code === 'DISCLAIMER_PENDING').
  5. On accept: voiceSDK.sendDisclaimerAck(true). The server then opens STT, plays the greeting, and accepts start_continuous_mode.
  6. On decline: voiceSDK.sendDisclaimerAck(false). The server closes the session; the SDK emits disclaimerRejected and the raw message event for the error frame.
  7. Handle disclaimerRejected for terminal server outcomes (DISCLAIMER_DECLINED, DISCLAIMER_TIMEOUT, DISCLAIMER_HASH_MISMATCH). Handle error for DISCLAIMER_PENDING (ordering bug or race—fix by ack first).
const voiceSDK = new VoiceSDK({ agentId, appId, protocolVersion: 2 });

voiceSDK.on('disclaimersRequired', (payload) => {
  showMyModal({
    texts: payload.texts,
    onAccept: () => voiceSDK.sendDisclaimerAck(true),
    onDecline: () => voiceSDK.sendDisclaimerAck(false)
  });
});

voiceSDK.on('disclaimerRejected', ({ code, message }) => {
  console.warn('Disclaimer flow ended:', code, message);
});

voiceSDK.on('error', (err) => {
  if (err.code === 'DISCLAIMER_PENDING') {
    console.warn('Start recording only after sendDisclaimerAck(true)');
  }
});

await voiceSDK.connect();
// Only after ack (or if disclaimersRequired never fired):
await voiceSDK.startRecording();

SDK state you may read: disclaimersPending, disclaimersHash, lastDisclaimerPayload (set from hello_ack). After sendDisclaimerAck runs, the SDK clears disclaimersPending locally when the WebSocket send succeeds.

Voice & Chat Widget (built-in behavior)

No extra widget options are required. When the server sends the disclaimer gate:

  • VoiceInterface waits after the WebSocket is up and hello_ack is processed.
  • It opens a built-in modal (Notice / Accept / Decline) with disclaimerTexts from the server.
  • Accept: on desktop, the widget sends sendDisclaimerAck(true) immediately, then requests the microphone and startListening. On mobile, it waits until the user has granted microphone access (and the post-grant audio delay) before sending sendDisclaimerAck(true), so the server does not stream the greeting over the system permission sheet or get cut off when capture starts.
  • Decline (No thanks) calls sendDisclaimerAck(false), waits briefly so the ack reaches the server (which ends the conversation and closes the socket), then disconnects the client, invokes onConversationEnd on the wrapper SDK since recording never started, resets UI, and returns to landing (or idle voice in voice-only mode).

To match your site language, use the widget’s existing language / translation hooks for general UI; disclaimer body text always comes from the server (compliance copy).

Resume & text chat

Resume: Resumed voice sessions omit the disclaimer gate.

Text chat: Server-driven disclaimer is implemented for voice in this release. The text WebSocket hello path does not yet mirror these fields; extend the backend and TextChatSDK if you need the same gate for text-only sessions.

🎯 Key Benefits:
  • Format Control: Request specific audio formats (container, encoding, sample rate, bit depth)
  • Automatic Conversion: SDK automatically converts audio if backend sends different format
  • Quality Optimization: Choose optimal formats for your use case (e.g., 48kHz for high quality, 8kHz for bandwidth savings)
  • Protocol Support: Uses v2 protocol with format negotiation

Supported Formats

Input Formats (What SDK Sends)

Property Supported Values
encoding 'pcm', 'pcmu' (μ-law), 'pcma' (A-law)
sampleRate 8000, 16000, 22050, 24000, 44100, 48000 Hz
bitDepth 8, 16, 24 bits
channels 1 (mono only)

Output Formats (What SDK Receives)

Property Supported Values
container 'raw' (no header), 'wav' (with WAV header)
encoding 'pcm', 'pcmu' (μ-law), 'pcma' (A-law)
sampleRate 8000, 16000, 22050, 24000, 44100, 48000 Hz
bitDepth 8, 16, 24 bits
channels 1 (mono only)

Format Negotiation Flow

1. SDK Initialization

Configure requested output format

2. Connect & Send Hello

SDK sends format request in hello message

3. Server Response

Server sends hello_ack with negotiated format

4. Format Negotiated Event

SDK emits 'formatNegotiated' event

5. Automatic Conversion

If formats differ, SDK converts automatically

Example: High-Quality Audio

const voiceSDK = new VoiceSDK({
  agentId: 'agent_123',
  appId: 'your_app_id',
  
  // Request high-quality audio
  outputContainer: 'raw',        // Raw PCM for lower latency
  outputEncoding: 'pcm',         // Uncompressed PCM
  outputSampleRate: 48000,       // 48kHz for high quality
  outputBitDepth: 16,            // 16-bit depth
  outputChannels: 1,             // Mono
  outputFrameDurationMs: 600,    // 600ms frames
  
  protocolVersion: 2             // Enable format negotiation
});

voiceSDK.on('formatNegotiated', (format) => {
  console.log('Negotiated format:', format);
  // If backend sends different format, SDK will convert automatically
});

Example: Bandwidth-Optimized Audio

const voiceSDK = new VoiceSDK({
  agentId: 'agent_123',
  appId: 'your_app_id',
  
  // Request compressed, low-bandwidth audio
  outputContainer: 'raw',
  outputEncoding: 'pcmu',        // μ-law compression (8kHz equivalent)
  outputSampleRate: 8000,        // 8kHz for bandwidth savings
  outputBitDepth: 16,
  outputChannels: 1,
  
  protocolVersion: 2
});

Format Conversion

If the backend sends audio in a different format than requested, the SDK automatically converts it:

  • Container: WAV ↔ Raw PCM extraction/wrapping
  • Encoding: PCM ↔ PCMU/PCMA encoding/decoding
  • Sample Rate: Automatic resampling using Web Audio API
  • Bit Depth: 8-bit ↔ 16-bit ↔ 24-bit conversion
  • Channels: Mono/stereo conversion (if needed)
💡 Best Practices:
  • Use protocolVersion: 2 for new projects
  • Request formats that match your use case (quality vs. bandwidth)
  • 48kHz is recommended for best quality (matches most browser defaults)
  • Raw PCM is lower latency than WAV (no header overhead)
  • Listen to formatNegotiated event to verify format

Voice & Chat Widget

Pre-built, customizable widget with voice and text chat - perfect for adding AI conversation to any website.

Agent display name

  • Set the name on the voice idle hero (and the letter inside the avatar when no image URL is set) with the root property agentName only.
  • Use header.pillTitle for a short CTA on the desktop floating pill and the mobile FAB (e.g. “Talk to me” / “דברו איתי”); when empty, both use header.title. Override the mobile FAB line only with header.mobileLabel. Separate from agentName.
  • voice.agentName is not supported — it is removed from config when the widget merges settings.
  • Legacy root headerTitle is ignored. Use agentName for the hero name and header.title for the general assistant title line (pill / mobile landing fallback).
  • In unified mode, the text chat screen has a top bar with Voice or text (widget translation key backToModeChoice) to return to the voice/text choice inside the panel.
💬

Voice & Text Chat

Beautiful interface with voice recording, text chat, and message history

🎨

Fully Customizable

Colors, position, size, RTL support, and custom branding

📱

Mobile Optimized

Responsive design that works perfectly on all devices

🌍

Multi-language

Built-in support for multiple languages with custom translations

Installation

<!-- Add the SDK script to your page -->
<script src="https://unpkg.com/ttp-agent-sdk/dist/agent-widget.js"></script>

<!-- Initialize the widget -->
<script>
  const widget = new TTPAgentSDK.TTPChatWidget({
    agentId: 'agent_123',
    appId: 'your_app_id'
  });
</script>

Basic Configuration

const widget = new TTPAgentSDK.TTPChatWidget({
  // Required
  agentId: 'agent_123',       // Your AI agent ID
  appId: 'your_app_id',       // Your application ID

  // Optional — root only (voice idle hero name & avatar initial when no image)
  agentName: 'Alex',
  
  // Optional - Agent Settings Override (available when domain whitelist is configured)
  agentSettingsOverride: {
    prompt: "You are a helpful customer service assistant.",
    temperature: 0.8,
    voiceId: "F2",
    voiceSpeed: 1.2,
    firstMessage: "Hello! How can I help you today?",
    disableInterruptions: false,
    maxCallDuration: 600
  },
  
  // Optional - Appearance
  primaryColor: '#7C3AED',    // Widget theme color
  position: {                 // Or shorthand: 'bottom-right', 'bottom-left'
    vertical: 'bottom',
    horizontal: 'right',
    offset: { x: 20, y: 20 },
    draggable: false,         // true = user can drag launcher + panel
    draggablePersist: true    // remember drag position in localStorage
  },
  language: 'en',             // 'en', 'es', 'fr', 'de', 'he', etc.
  direction: 'ltr',           // 'ltr' or 'rtl' for right-to-left languages
  
  // Optional - Variables
  variables: {
    userName: 'John Doe',
    page: 'homepage',
    customData: 'value'
  }
});

Server-driven disclaimer (voice)

If your agent has non-empty disclaimers configured on the server, the widget shows a Notice modal with the server-provided text before microphone streaming and the greeting. The user must tap Accept or Decline; you do not need extra embed code. For protocol fields, SDK hooks, and custom implementations, see Server-driven disclaimer (voice).

Access Control

💡 Domain Whitelist: For production applications, configure a domain whitelist in your agent's admin panel to control which websites can connect to your agent.

The widget connects directly using agentId and appId. No backend authentication step is needed:

const widget = new TTPAgentSDK.TTPChatWidget({
  agentId: 'agent_123',
  appId: 'your_app_id',
  variables: {
    userName: 'John Doe',
    page: 'homepage'
  }
});
📋 Important:
  • The widget only needs agentId and appId to connect
  • Access control is managed via domain whitelist in the admin panel
  • WebSocket URLs use the production TalkToPC endpoint by default (same as voice). Pass websocketUrl only if you need a non-default backend.

Advanced Customization

Icon Customization

const widget = new TTPAgentSDK.TTPChatWidget({
  agentId: 'agent_123',
  appId: 'your_app_id',
  
  icon: {
    type: 'custom',                              // 'microphone', 'emoji', or 'custom'
    // Omit customImage (or use '') for default animated waveform on the desktop pill
    customImage: 'https://your-site.com/logo.png', // Optional: pill icon image URL
    size: 60,                                    // Icon size in pixels
    backgroundColor: '#FFFFFF',                  // Background color
    borderRadius: '50%'                         // Border radius (50% for circle)
  }
});

Chat Window Customization

const widget = new TTPAgentSDK.TTPChatWidget({
  agentId: 'agent_123',
  appId: 'your_app_id',
  
  chatWindow: {
    width: 400,                    // Width in pixels
    height: 600,                   // Height in pixels
    title: 'Chat with us!',        // Custom title
    subtitle: 'We reply instantly', // Custom subtitle
    placeholder: 'Type here...',   // Input placeholder
    borderRadius: 12               // Window border radius
  }
});

Branding

const widget = new TTPAgentSDK.TTPChatWidget({
  agentId: 'agent_123',
  appId: 'your_app_id',
  
  branding: {
    companyName: 'Your Company',
    logo: 'https://your-site.com/logo.png',
    showPoweredBy: false           // Hide "Powered by TTP" footer
  }
});

Agent Settings Override

💡 Access Control: Agent settings override is available when the agent has domain whitelist configured in the admin panel.

Dynamically customize agent behavior, voice, and personality on a per-session basis:

const widget = new TTPAgentSDK.TTPChatWidget({
  agentId: 'agent_123',
  appId: 'your_app_id',
  
  // Override agent settings dynamically
  agentSettingsOverride: {
    // Core settings
    prompt: "You are a friendly customer service assistant",
    temperature: 0.8,
    maxTokens: 200,
    
    // Voice settings
    voiceId: "F2",
    voiceSpeed: 1.2,
    
    // Behavior
    firstMessage: "Hello! How can I help you today?",
    disableInterruptions: false,
    maxCallDuration: 600,
    
    // Language
    language: "en",
    autoDetectLanguage: false
  }
});

See the Agent Settings Override section for complete documentation of all available override settings.

RTL (Right-to-Left) Support

// For Hebrew, Arabic, etc.
const widget = new TTPAgentSDK.TTPChatWidget({
  agentId: 'agent_123',
  appId: 'your_app_id',
  direction: 'rtl',
  language: 'he',                  // Hebrew
  position: 'bottom-left'          // Better for RTL
});

Widget Methods

widget.open()

Programmatically open the chat window.

// Open chat from your own button
document.getElementById('myButton').onclick = () => {
  widget.open();
};

widget.close()

Close the chat window.

widget.close();

widget.toggle()

Toggle chat window open/closed.

widget.toggle();

widget.minimize()

Collapse the chat panel back to the round launcher pill — same visual state as if the user had clicked the launcher while the panel was open. Idempotent (no-op when already minimized) and does not end an active voice call: the WebSocket and conversation state are preserved, the user just sees the bubble until they re-open. Also triggered automatically by a backend-pushed { t: 'minimize_widget' } control message, so partner integrations can choreograph the chat panel from the server side (e.g. minimize the widget when opening a native trolley drawer).

widget.minimize();

widget.maximize()

Expand the chat panel from the round launcher pill — same visual state as if the user had clicked the launcher while the panel was closed. Idempotent (no-op when already open) and runs the same auto-connect side-effect a real click would when configured. Also triggered automatically by a backend-pushed { t: 'maximize_widget' } control message.

widget.maximize();

widget.destroy()

Remove the widget from the page.

widget.destroy();

widget.updateConfig(config)

Update widget configuration dynamically.

widget.updateConfig({
  primaryColor: '#FF5733',
  language: 'es',
  agentName: 'Jordan'  // root only; voice.agentName in this object is ignored
});

Widget Event Callbacks

Pass these as top-level config properties when constructing TTPChatWidget:

const widget = new TTPAgentSDK.TTPChatWidget({
  agentId: 'agent_123',
  appId: 'your_app_id',

  onConversationStart: () => {
    console.log('Voice conversation started');
  },

  onConversationEnd: () => {
    console.log('Voice conversation ended');
  },

  onBargeIn: () => {
    console.log('User interrupted the agent');
  },

  onAudioStartPlaying: () => {
    console.log('Agent audio started');
  },

  onAudioStoppedPlaying: () => {
    console.log('Agent audio stopped');
  },

  onSubtitleDisplay: (subtitle) => {
    console.log('Subtitle:', subtitle);
  },

  onVoiceCallButtonClick: () => {
    // Return false to prevent starting the call
    return true;
  }
});

For lower-level VoiceSDK events (onConnected, onMessage, etc.), use widget.voiceInterface.sdk or instantiate VoiceSDK directly. See Events & Callbacks.

Complete Example

<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="UTF-8">
  <title>My Website with AI Chat</title>
</head>
<body>
  <h1>Welcome to my website!</h1>
  
  <!-- Load the SDK -->
  <script src="https://unpkg.com/ttp-agent-sdk/dist/agent-widget.js"></script>
  
  <!-- Initialize widget -->
  <script>
    const widget = new TTPAgentSDK.TTPChatWidget({
      agentId: 'agent_123',
      appId: 'your_app_id',
      
      // Customize appearance
      primaryColor: '#7C3AED',
      position: 'bottom-right',
      language: 'en',
      
      // Custom branding
      chatWindow: {
        title: 'Chat with us!',
        subtitle: 'We typically reply instantly'
      },
      
      // Pass context variables
      variables: {
        userName: 'Visitor',
        page: window.location.pathname,
        referrer: document.referrer
      },
      
      // Event handlers
      onReady: () => {
        console.log('Chat widget ready!');
      },
      
      onMessage: (message) => {
        // Track messages in analytics
        console.log('Message:', message);
      }
    });
    
    // Optional: Open chat programmatically
    // widget.open();
  </script>
</body>
</html>

Configuration Reference

🎨 Extensive Customization

The Voice & Chat Widget can be customized in almost every aspect - colors, text, icons, sizes, behaviors, and more!

Agent naming: use root agentName for the voice idle hero and avatar letter fallback. Do not use voice.agentName (removed on merge). Do not use legacy root headerTitle (ignored). See Agent display name above.

See the live customization demo to experiment with all options interactively, including quick themes (Default, Light, Sunset, Hebrew, S-Law).

Required Configuration
Property Type Description
agentId string Your AI agent identifier
appId string Your application identifier
General Configuration
Property Type Default Description
primaryColor string '#7C3AED' Main theme color (hex)
direction string 'ltr' (or 'rtl' when language is he / ar) 'ltr' or 'rtl'. If omitted, Hebrew and Arabic default to 'rtl'. Sets the shadow host dir so the desktop pill launcher matches the panel (fixes extra padding beside the logo on RTL sites). In 'rtl', transcript, bubbles, and mobile bar use RTL punctuation order.
language string 'en' Language code (en, es, fr, de, he, ar, etc.)
agentName string Root only. Name on the voice idle hero and first-letter avatar fallback when no header image is set. If omitted or empty, the widget defaults to Sasha. Color: voice.agentNameColor. Not the desktop pill text — use header.pillTitle or header.title for that.
headerTitle string Deprecated / ignored. Former optional root field; it is no longer read. Use agentName for the voice hero name and header.title for the shared assistant title line.
variables object {} Custom variables to pass to agent
websocketUrl string wss://speech.talktopc.com/ws/conv (built-in) Optional override for voice and text. Omit unless you point at a custom backend; text still uses /chat/text on the same host as this base.
agentSettingsOverride object null Override agent settings dynamically. See Agent Settings Override for details.
customStyles string '' Custom CSS to inject
useShadowDOM boolean true Enable Shadow DOM for CSS isolation. Set to false for Shopify compatibility. See Shadow DOM Configuration below.
mobileVoiceUI boolean auto Root-level override. Force mobile-style voice call UI (bottom bar + overlay) or desktop hero. Takes precedence over behavior.mobileVoiceUI when both are set. Omit for auto: native iOS/Android, or viewport ≤768px with touch/coarse pointer.
inputFormat object Optional v2 input audio format passed to the voice WebSocket hello message: { encoding, sampleRate, channels, bitDepth }. Same fields as top-level VoiceSDK config (sampleRate, etc.). See Protocol v2.
visualAssistant object null Enable browser-side visual assistant tools (page read, highlight, scroll, navigate, form fill, click, screenshot). Also accepted under agentSettingsOverride.visualAssistant. See Visual Assistant below.
whatsapp object null Optional WhatsApp handoff on the voice idle hero: { number: '972501234567', text: 'optional pre-filled message' }. Non-digits are stripped from number. Omit to hide the WhatsApp button.

Shadow DOM Configuration (useShadowDOM)

What is Shadow DOM?

Shadow DOM is a web standard that provides CSS isolation by creating a separate DOM tree that doesn't inherit styles from the parent page. This prevents theme CSS from interfering with the widget's appearance.

When to Use Shadow DOM:

  • ✅ WordPress: Use Shadow DOM (useShadowDOM: true or omit - defaults to true)
  • ✅ Most platforms: Shadow DOM works well on most websites and platforms
  • ❌ Shopify: Disable Shadow DOM (useShadowDOM: false) due to rendering issues

Why We Need This Option:

While Shadow DOM provides excellent CSS isolation, some platforms (notably Shopify) have rendering issues where Shadow DOM elements render with 0x0 dimensions, making the widget invisible. By setting useShadowDOM: false, the widget uses regular DOM with targeted CSS resets instead, ensuring visibility while still protecting against most theme conflicts.

Platform-Specific Recommendations:

Platform Recommended Setting Reason
WordPress useShadowDOM: true (default) Shadow DOM works perfectly and provides better CSS isolation
Shopify useShadowDOM: false Shadow DOM elements render with 0x0 dimensions, making widget invisible
Wix useShadowDOM: true (default) Shadow DOM works well on Wix
Custom Websites useShadowDOM: true (default) Use Shadow DOM unless you experience rendering issues

Example Usage:

// WordPress (default - Shadow DOM enabled)
const widget = new TTPChatWidget({
  agentId: 'your-agent-id',
  // useShadowDOM defaults to true, so widget is isolated from theme CSS
});

// Shopify (disable Shadow DOM)
const widget = new TTPChatWidget({
  agentId: 'your-agent-id',
  useShadowDOM: false  // Required for Shopify compatibility
});

// If widget is invisible, try disabling Shadow DOM
const widget = new TTPChatWidget({
  agentId: 'your-agent-id',
  useShadowDOM: false  // Fallback if Shadow DOM causes rendering issues
});

How It Works:

  • With Shadow DOM (useShadowDOM: true): Widget is rendered inside a Shadow DOM tree, completely isolated from page CSS. Styles are injected into the shadow root.
  • Without Shadow DOM (useShadowDOM: false): Widget is rendered in regular DOM. Styles are injected into the document <head> with high-specificity selectors to prevent theme CSS conflicts. Targeted CSS resets protect against common theme issues while preserving widget functionality.

⚠️ Important Notes:

  • When useShadowDOM: false, the widget uses targeted CSS resets instead of aggressive resets to preserve internal widget styles
  • If you experience layout issues with useShadowDOM: false, check if your theme's CSS is overriding widget styles - you may need to add more specific CSS rules
  • Unified mode: With useShadowDOM: false, the widget must show only one of voice or text at a time inside the panel. A previous reset rule forced display:flex on multiple roots (higher specificity than the hide rules), which made them stack vertically on Shopify; this is fixed in current builds.
  • Voice call UI: Call duration updates and the red recording dot use DOM queries scoped to #ttp-widget-container (not document), and extra CSS guards the dot and pulse when themes override spans — fixes static timers and missing dots on Shopify.
  • Orb waveform: Bars are injected under #ttp-widget-container. With useShadowDOM: false, the same CSS ttp-wave keyframes and per-bar delays are used as in Shadow DOM; high-specificity rules protect size and animation without fixing opacity (which would break the keyframe fade).
  • The widget automatically handles CSS injection differently based on this setting - no additional configuration needed
Positioning
Property Type Default Description
position string | object 'bottom-right' String: 'bottom-right', 'bottom-left'. Object: { vertical, horizontal, offset }
position.vertical string 'bottom' 'top' or 'bottom'
position.horizontal string 'right' 'left' or 'right'
position.offset object { x: 20, y: 20 } Offset from edges (pixels)
position.draggable boolean false When true, visitors can drag the launcher pill / mobile FAB and the open chat panel around the viewport. They move together as one unit (never split). Drag handles: the pill/FAB and panel header bars (voice idle header, active-call top bar, text chat top bar). Header buttons (close, etc.) still work normally.
position.draggablePersist boolean true When true (default), the dragged position is saved in localStorage per origin+path and restored on the next visit. Set false to reset to the configured corner on every load. The widget unit and the desktop minimized voice strip (flavor.callView: 'minimized') remember positions independently.
positionOffset object Legacy. Used only when position is a string (e.g. 'bottom-right') instead of an object. Prefer position.offset.

When position.draggable is enabled, the desktop minimized voice strip (flavor.callView: 'minimized') is also draggable by its body (controls and text input still work). Positions are clamped to stay on-screen and re-clamped on window resize.

Icon & Button
Property Type Default Description
icon.type string 'custom' 'microphone', 'custom', 'emoji', 'text'
icon.customImage string Optional HTTPS image URL for the desktop floating pill icon. When omitted or empty, the pill uses the same animated waveform as the mobile launcher and pre-chat landing (white bars on a frosted circle).
icon.size string 'medium' 'small', 'medium', 'large', 'xl'
icon.backgroundColor string '#FFFFFF' Icon background color
button.size string 'medium' 'small', 'medium', 'large'
button.shape string 'circle' 'circle', 'rounded', 'square'
button.backgroundColor string primaryColor Button background color
button.hoverColor string '#7C3AED' Button hover color
button.shadow boolean true Enable button shadow
Panel & Header
Property Type Default Description
panel.width number 350 Panel width (pixels)
panel.height number 500 Panel height (pixels)
panel.borderRadius number 12 Border radius (pixels)
panel.backgroundColor string '#FFFFFF' Panel background color
header.title string 'Chat Assistant' Default title when more specific labels are omitted: fallback for the desktop and mobile launchers if header.pillTitle is not set (unless header.mobileLabel is set), and for the mobile pre-chat landing name when root agentName is not set. Does not set the voice idle hero name — use root agentName for that.
header.pillTitle string '' Optional main line on the desktop floating pill and the mobile pill launcher (same fallback chain as desktop: when empty, uses header.title). Set header.mobileLabel if the FAB needs different copy than the desktop pill. Does not change the voice idle hero name — use root agentName for that.
header.showTitle boolean true Show/hide header title
header.backgroundColor string '#7C3AED' Header background color
header.textColor string '#FFFFFF' Header text color
header.mobileLabel string Optional: overrides the mobile pill launcher’s main line only. When omitted, the FAB uses the same text as the desktop pill (header.pillTitle or header.title).
header.showCloseButton boolean true Show or hide the panel close button in the header.
header.onlineIndicatorText string Auto Online status label on the desktop pill, mobile FAB, and headers. When omitted, uses the translated “Online” string for the widget language.
header.onlineIndicatorColor string header.textColor Text color for the online indicator label.
header.onlineIndicatorDotColor string '#10b981' Color of the online status dot next to the indicator text.
footer.show boolean true Show or hide the "Powered by" footer.
footer.brand string 'talktopc' Brand shown in the "Powered by" footer. 'talktopc' links to talktopc.com; 'speacart' links to speacart.com. Any other value defaults to TalkToPC.
footer.backgroundColor string '#f9fafb' Footer background color.
footer.textColor string '#9ca3af' Footer text color.
footer.hoverColor string '#7C3AED' Footer link hover color.
Messages & Chat

On desktop, when the text chat view is open, the panel uses a fixed height (up to min(520px, 100vh − 100px)) so new messages scroll inside the transcript instead of stretching the card. By default (text.useVoiceTheme not false), the text UI shares the voice theme: voice.heroGradient1 / heroGradient2 surface (same as the voice idle hero), voice.primaryBtnGradient* and startCallButtonColor for send/focus and user bubbles (#primary40 tint), avatarGradient1/2 on the assistant avatar, and transcript-style translucent inputs. The “Powered by” footer uses the same link accent as on the hero. Set text.useVoiceTheme: false for a light layout driven by solid panel.backgroundColor and messages.* colors.

Property Type Default Description
messages.userBackgroundColor string '#E5E7EB' User message background
messages.agentBackgroundColor string '#F3F4F6' Agent message background
messages.systemBackgroundColor string '#DCFCE7' System message background
messages.errorBackgroundColor string '#FEE2E2' Error message background
messages.textColor string '#1F2937' Message text color (fallback when role-specific colors are omitted)
messages.userTextColor string messages.textColor User message text color
messages.agentTextColor string messages.textColor Agent message text color
messages.userAvatarIcon string '👤' Emoji/icon shown on user message avatars
messages.agentAvatarIcon string '🤖' Emoji/icon shown on agent message avatars
messages.fontSize string '16px' Message font size
messages.borderRadius number 8 Message bubble radius
text.useVoiceTheme boolean true When true, text chat matches the voice idle hero gradient, primary gradients, avatar gradients, and dark translucent bubbles/inputs (see voice theme). When false, chrome follows solid panel.backgroundColor (hex) and messages.userBackgroundColor / agentBackgroundColor.
text.sendButtonColor string voice accent Send button fill; defaults to the first resolvable hex from voice.primaryBtnGradient1, voice.primaryBtnGradient2, or voice.startCallButtonColor (including the default start-call indigo when those are unset), then #7C3AED.
text.sendButtonHoverColor string voice accent 2 / shaded Hover state; uses voice.primaryBtnGradient2 when it resolves to a different hex than the send color, otherwise a slightly darker shade of the accent.
text.sendButtonActiveColor string same as hover Active press state; same resolution as hover.
text.sendButtonText string '➤' Send button text/icon
text.sendButtonTextColor string '#FFFFFF' Send button text color
text.sendButtonFontSize string '20px' Send button font size
text.sendButtonFontWeight string '500' Send button font weight
text.inputPlaceholder string 'Type your message...' Input placeholder text
text.inputBorderColor string '#E5E7EB' Input border color
text.inputFocusColor string voice accent Input focus border and ring; same default chain as text.sendButtonColor.
text.inputBackgroundColor string '#FFFFFF' Input background color
text.inputTextColor string '#1F2937' Input text color
text.inputFontSize string '16px' Input font size
text.inputBorderRadius number 20 Input border radius (pixels)
text.inputPadding string '8px 16px' Input padding
text.sendButtonHint.text string '' Optional hint text below or near the send button
text.sendButtonHint.color string '#6B7280' Send button hint text color
text.sendButtonHint.fontSize string '14px' Send button hint font size
Voice Configuration

The agent’s spoken/idle display name is configured with root agentName only. Any voice.agentName value in your JSON is stripped during merge and has no effect (widget.updateConfig({ voice: { agentName: '…' } }) is ignored for naming).

Property Type Default Description
voice.micButtonColor string primaryColor Microphone button color (inside panel)
voice.micButtonActiveColor string '#EF4444' Microphone button color when active
voice.micButtonHint.text string 'Click the button to start...' Hint text below mic button
voice.micButtonHint.color string '#6B7280' Hint text color
voice.avatarBackgroundColor string '#667eea' Voice avatar background
voice.avatarActiveBackgroundColor string '#667eea' Avatar background when active
voice.statusTitleColor string '#1e293b' Status title text color
voice.statusSubtitleColor string '#64748b' Status subtitle text color
voice.startCallTitle string null Custom text for "Click to Start Call" title (bypasses translations)
voice.startCallSubtitle string null Custom text for "Real-time voice conversation" subtitle (bypasses translations)
voice.startCallButtonText string null Custom text for "Start Call" button (bypasses translations)
voice.startCallButtonColor string '#667eea' Fills the start-call button gradient when primaryBtnGradient1/2 are omitted. The “TalkToPC” footer link uses the same solid accent as the start button (primaryBtnGradient1, then startCallButtonColor, then defaults).
voice.startCallButtonTextColor string '#FFFFFF' Start call button text color
voice.endCallButtonColor string '#ef4444' End call button color
voice.transcriptBackgroundColor string '#FFFFFF' Transcript background
voice.transcriptTextColor string '#1e293b' Transcript text color
voice.transcriptLabelColor string '#94a3b8' Transcript label color
voice.userTranscriptPrefix string | null null Prefix before live user speech (STT) in the collapsed transcript strip and mobile bar (e.g. "You: "). null uses the translation key userTranscriptPrefix for the widget language; empty string removes the prefix.
voice.controlButtonColor string '#FFFFFF' Control button color
voice.controlButtonSecondaryColor string '#64748b' Secondary control button color
voice.language string 'en' Voice language (overrides global)
voice.statusDotColor string '#10b981' Status dot color on the voice idle screen
voice.statusText string | null null Custom status line on the voice idle screen. null uses the translated default.
voice.outputContainer string 'raw' Output audio container for v2 format negotiation: 'raw' or 'wav'
voice.outputEncoding string 'pcm' Output encoding: 'pcm', 'pcmu', or 'pcma'
voice.outputSampleRate number 24000 Requested output sample rate (Hz): 8000, 16000, 22050, 24000, 44100, or 48000
voice.outputChannels number 1 Output channels (mono only supported)
voice.outputBitDepth number 16 Output bit depth: 8, 16, or 24
Voice Theming & Pill Launcher

🎨 Full Voice UI Theming

Customize every visual aspect of the voice interface — hero section, buttons, active call screen, pill launcher, and more. Use these properties to create branded themes or match your website's design.

The desktop pill has a fixed width (158px) on viewports ≥769px with a taller vertical layout (padding, 36px icon circle, 13px/11px title/status). Long titles ellipsize. RTL uses logical padding so the logo sits evenly on the start edge.

Pill Launcher

PropertyTypeDefaultDescription
voice.pillGradientstring''CSS gradient for the mobile floating pill, the mobile pre-chat landing sheet (.ttp-mobile-landing), and the mobile in-call bottom bar plus the expanded conversation header—same token everywhere. When unset, the widget uses the same default three-stop purple as the landing sheet (linear-gradient(135deg, #581c87, #312e81, #1e1b4b)). Example: linear-gradient(135deg, #7c3aed, #6d28d9).
voice.pillTextColorstring'#ffffff'Text color on the pill launcher
voice.pillDotColorstring'#4ade80'Online-status dot color on the pill

Hero / Idle Screen

PropertyTypeDefaultDescription
agentNamestringRoot only (duplicate of General Configuration). Idle hero name and avatar initial; default Sasha when unset/empty. Mobile pre-chat landing: agentName if set, else header.title.
voice.avatarGradient1string'#6d56f5'Avatar gradient start color
voice.avatarGradient2string'#a78bfa'Avatar gradient end color
voice.headerAvatarImageUrlstring''Optional https (or http) URL for the idle voice header circle (desktop and mobile pre-call). When missing or invalid, the UI shows the first letter of the agent name inside the circle (e.g. "S" for Sasha). Same key may be set at the top level as headerAvatarImageUrl. Legacy: voice.avatarImageUrl (and snake_case header_avatar_image_url / avatar_image_url on voice or root) are also accepted. Invalid or non-http(s) URLs are ignored.
voice.onlineDotColorstring'#22c55e'Online dot next to avatar
voice.heroGradient1string'#2a2550'Hero gradient start: desktop voice panel (idle and in-call), widget footer, and .voice-interface.active use linear-gradient(160deg, heroGradient1, heroGradient2). Mobile in-call chrome uses voice.pillGradient instead (see Pill Launcher).
voice.heroGradient2string'#1a1a2e'Hero gradient end; pairs with heroGradient1 for desktop/active surfaces above—not for the mobile minimized bar (use pillGradient).
voice.agentNameColorstring'#f0eff8'Agent name text color
voice.agentRoleColorstring'rgba(255,255,255,0.35)'Agent role text color
voice.agentRolestring'AI Voice Assistant'Agent role label
voice.headlineColorstring'#ffffff'Hero headline text color
voice.headlinestring'Hi there 👋'Hero headline text
voice.sublineColorstring'rgba(255,255,255,0.45)'Hero subline text color
voice.sublinestring'Ask me anything...'Hero subline text (supports HTML)

Primary & Secondary Buttons

PropertyTypeDefaultDescription
voice.primaryBtnGradient1string'#6d56f5'"Start Voice Call" button gradient start; footer “TalkToPC” link uses the same resolved accent (after startCallButtonColor fallback when gradients are omitted). Mobile user bubbles and send accent use this palette with primaryBtnGradient2 / sendButtonColor.
voice.primaryBtnGradient2string'#9d8df8'"Start Voice Call" button gradient end; pairs with sendButtonColor for mobile message/send styling.
voice.startCallButtonTextColorstring'#FFFFFF'Primary button text color
voice.startCallButtonTextstring'Start Voice Call'Primary button label
voice.sendMessageTextstring'Send a Message'Secondary button label
voice.secondaryBtnBgstring'rgba(255,255,255,0.05)'Secondary button background
voice.secondaryBtnBorderstring'rgba(255,255,255,0.09)'Secondary button border color
voice.secondaryBtnTextColorstring'rgba(255,255,255,0.6)'Secondary button text color

Active Call View

PropertyTypeDefaultDescription
voice.waveformBarColorstring'#7C3AED'Waveform bar color during call
voice.speakerButtonColorstring'#FFFFFF'Speaker button color

Quick Theme Example

Apply a complete light theme:

const widget = new TTPAgentSDK.TTPChatWidget({
  agentId: 'your-agent-id',
  appId: 'your-app-id',
  panel: {
    backgroundColor: '#ffffff',
    border: '1px solid rgba(0,0,0,0.06)'
  },
  voice: {
    pillGradient: 'linear-gradient(135deg, #7c3aed, #6d28d9)',
    pillTextColor: '#ffffff',
    heroGradient1: '#ede9fe',
    heroGradient2: '#f5f3ff',
    agentNameColor: '#1e1b4b',
    headlineColor: '#1e1b4b',
    sublineColor: '#6b7280',
    primaryBtnGradient1: '#7c3aed',
    primaryBtnGradient2: '#a78bfa',
    secondaryBtnBg: '#f5f3ff',
    secondaryBtnBorder: 'rgba(124,58,237,0.15)',
    secondaryBtnTextColor: '#6d28d9'
  }
});

RTL Hebrew theme example:

const widget = new TTPAgentSDK.TTPChatWidget({
  agentId: 'your-agent-id',
  appId: 'your-app-id',
  direction: 'rtl',
  header: { title: 'עוזרת חכמה', onlineIndicatorText: 'מחוברת' },
  panel: { backgroundColor: '#0f172a' },
  voice: {
    pillGradient: 'linear-gradient(135deg, #1e3a5f, #1e40af, #0f172a)',
    avatarGradient1: '#3b82f6',
    avatarGradient2: '#1d4ed8',
    heroGradient1: '#1a2744',
    heroGradient2: '#0f172a',
    primaryBtnGradient1: '#3b82f6',
    primaryBtnGradient2: '#1d4ed8',
    startCallButtonText: 'התחל שיחה קולית',
    sendMessageText: 'שלח הודעה',
    agentRole: 'עוזרת קולית חכמה',
    headline: 'היי, מה שלומך? 👋',
    subline: 'שאל/י אותי הכל — אני עונה מיידית בקול או בטקסט.'
  }
});
Behavior
Property Type Default Description
behavior.mode string 'unified' 'unified' (both), 'voice-only', 'text-only'
behavior.autoOpen boolean false Auto-open widget on page load
behavior.startOpen boolean false Start with widget open
behavior.hidden boolean false Hide the widget completely
behavior.mobileVoiceUI boolean auto Force mobile-style voice call UI (bottom bar + overlay) or desktop hero. Omit for auto: native iOS/Android, or viewport ≤768px with touch/coarse pointer (helps in-app browsers with desktop User-Agent). Root-level mobileVoiceUI overrides this if both are set.
behavior.autoConnect boolean false Auto-connect on widget open
behavior.showWelcomeMessage boolean true Show welcome message
behavior.welcomeMessage string 'Hello! How can I help...' Welcome message text
behavior.enableVoiceMode boolean true Enable voice mode option (in unified mode)
Animation, Prompt Bubble & Tooltips
Property Type Default Description
animation.enableHover boolean true Enable hover animations
animation.enablePulse boolean true Enable pulse animations
animation.enableSlide boolean true Enable slide animations
animation.duration number 0.3 Animation duration (seconds)
promptAnimation.enabled boolean false Show an animated “Try me!” prompt bubble next to the launcher pill. Must be explicitly set to true.
promptAnimation.text string 'Try me!' Prompt bubble label text
promptAnimation.backgroundColor string purple gradient Prompt bubble background (CSS color or gradient)
promptAnimation.textColor string '#ffffff' Prompt bubble text color
promptAnimation.animationType string 'bounce' 'bounce', 'pulse', 'float', or 'none'
promptAnimation.showShimmer boolean true Shimmer effect on the prompt bubble
promptAnimation.showPulseRings boolean true Pulse rings around the launcher while the prompt is visible
promptAnimation.hideAfterClick boolean true Hide the prompt after the user opens the widget
promptAnimation.hideAfterSeconds number | null null Auto-hide after N seconds. null = never auto-hide.
promptAnimation.position string 'top' Prompt placement relative to the launcher: 'top', 'left', or 'right'
tooltips.newChat string Auto New chat button tooltip
tooltips.back string Auto Back button tooltip
tooltips.close string Auto Close button tooltip
tooltips.mute string Auto Mute button tooltip
tooltips.speaker string Auto Speaker button tooltip
tooltips.endCall string Auto End call button tooltip
Mobile pre-chat overlay & legacy landing keys (Unified Mode)

On desktop, unified mode opens the voice idle hero inside the panel—there is no in-panel “Voice / Text” mode card screen. Back navigation and call end return to that hero. On mobile, the full-screen pre-chat sheet (ttpMobileLanding) still offers call vs text; the following options apply there. During an active call, the bottom minimized bar stays visible first; tapping the transcript row opens a conversation sheet (type while voice is active). Close the sheet to return to the bar—mute, speaker, and end call remain on the bar when the sheet is closed. On mobile, Back (header or text bar) closes the panel and opens the Call vs Chat landing overlay directly—the same sheet as tapping the FAB. If the user declines the server-driven disclaimer during voice setup, the widget also returns to that Call vs Chat overlay (and disconnects) instead of leaving the in-panel “Start call” card. After accepting the disclaimer, the minimized voice bar (waveform / mic / end call) stays the default view; the expandable conversation sheet with the text field does not open automatically. Other landing.* keys (e.g. mode card colors, in-panel title) remain in config merges for backward compatibility but are not rendered on desktop.

Property Type Default Description
landing.voiceCardTitle string null Fallback label for the mobile “call” action when landing.callButtonText is not set
landing.textCardTitle string null Fallback label for the mobile “chat” action when landing.chatButtonText is not set
landing.callButtonText string null Mobile overlay primary button text (falls back to voiceCardTitle / translation)
landing.chatButtonText string null Mobile overlay secondary button text (falls back to textCardTitle / translation)
landing.statusText string null Mobile overlay status line; when unset, uses online translation plus landing.subtitle (e.g. “Ready to help”)
landing.subtitle string Shown after the online dot in the mobile overlay status when landing.statusText is not set
Advanced Configuration
Property Type Default Description
websocketUrl string Optional Custom WebSocket base URL (defaults to wss://speech.talktopc.com/ws/conv). If not provided, URL is constructed from agentId/appId.
demo boolean true Enable demo mode
panel.backdropFilter string null CSS backdrop filter (e.g., 'blur(10px)')
panel.border string '1px solid rgba(0,0,0,0.1)' Panel border style
button.shadowColor string 'rgba(0,0,0,0.15)' Button shadow color
icon.emoji string '🎤' Emoji when icon.type = 'emoji'
icon.text string 'AI' Text when icon.type = 'text'
accessibility.ariaLabel string 'Chat Assistant' ARIA label for the widget
accessibility.ariaDescription string 'Click to open chat assistant' ARIA description
accessibility.keyboardNavigation boolean true Enable keyboard navigation
onConversationStart function Called when a voice conversation starts (after connect + recording begins).
onConversationEnd function Called when a voice conversation ends (including disclaimer decline before recording).
onBargeIn function Called when the user interrupts the agent (barge-in).
onAudioStartPlaying function Called when agent audio playback starts.
onAudioStoppedPlaying function Called when agent audio playback stops.
onSubtitleDisplay function Called when a subtitle/transcript line is displayed during a call.
onVoiceCallButtonClick function Called when the user taps “Start Voice Call”. Return false to cancel the call start.

Visual Assistant (visualAssistant)

When enabled, the SDK registers browser-side tools the agent can call to read the page, highlight elements, scroll, navigate, fill forms, click elements, and capture screenshots. Tools are always registered on the client; the backend agent config decides which are available.

visualAssistant: {
  enabled: true,           // Required to activate visual assistant
  allowHighlight: true,    // highlight_element
  allowScroll: true,       // scroll_to_element
  allowNavigate: true,     // navigate_to
  allowFillForm: true,     // fill_form
  allowClick: true         // click_element
}

May be set at the widget root or inside agentSettingsOverride.visualAssistant. See examples/test-client-tools.html for a live demo.

✅ Configuration Verified:

All configuration options listed above have been verified against the source code and are fully supported. The widget has 100+ customization options across 15+ categories:

  • General (incl. visualAssistant, whatsapp, inputFormat)
  • Positioning (incl. draggable, draggablePersist)
  • Icon & Button
  • Panel & Header (incl. online indicator, footer colors)
  • Messages & Text Chat
  • Voice Interface & output format
  • Voice Theming & Pill Launcher
  • Mobile landing / legacy landing
  • Prompt bubble (promptAnimation)
  • Tooltips & Animations
  • Behavior (incl. mobileVoiceUI)
  • Accessibility
  • Event callbacks (voice lifecycle)
  • Advanced & legacy keys
  • Visual Assistant

Experiment with all options interactively in the live demo.

📖 Tip: All configuration options support the spread operator (...), so you can pass additional custom properties that will be merged with defaults.

🎮 Live Demo: Try the fully functional widget with customization options at test-text-chat.html

Use Cases

💼 Customer Support

Add 24/7 AI-powered support to your website

🛒 E-commerce

Help customers find products and answer questions

📚 Documentation

Provide interactive help for your docs

🎓 Education

Create AI tutors and learning assistants

Vanilla JavaScript Guide

Use the SDK in any JavaScript application without frameworks.

Complete Example

import { VoiceSDK } from 'ttp-agent-sdk';

class VoiceAssistant {
  constructor() {
    this.sdk = null;
    this.isConnected = false;
    this.isRecording = false;
  }

  async initialize(agentId, overrides = {}) {
    this.sdk = new VoiceSDK({
      agentId: agentId,
      appId: 'your_app_id',
      agentSettingsOverride: overrides
    });

    // Setup event listeners
    this.setupEventListeners();

    // Connect
    await this.sdk.connect();
  }

  setupEventListeners() {
    this.sdk.on('connected', () => {
      this.isConnected = true;
      this.updateUI('connected');
    });

    this.sdk.on('disconnected', () => {
      this.isConnected = false;
      this.isRecording = false;
      this.updateUI('disconnected');
    });

    this.sdk.on('recordingStarted', () => {
      this.isRecording = true;
      this.updateUI('recording');
    });

    this.sdk.on('recordingStopped', () => {
      this.isRecording = false;
      this.updateUI('connected');
    });

    this.sdk.on('message', (msg) => {
      if (msg.type === 'agent_response') {
        this.displayMessage('agent', msg.agent_response);
      }
    });

    this.sdk.on('error', (error) => {
      this.handleError(error);
    });
  }

  async toggleRecording() {
    if (!this.isConnected) return;

    if (this.isRecording) {
      await this.sdk.stopRecording();
    } else {
      await this.sdk.startRecording();
    }
  }

  disconnect() {
    if (this.sdk) {
      this.sdk.disconnect();
      this.sdk = null;
    }
  }

  updateUI(state) {
    // Update your UI based on state
  }

  displayMessage(role, text) {
    // Display message in your UI
  }

  handleError(error) {
    // Handle errors
  }
}

// Usage
const assistant = new VoiceAssistant();

await assistant.initialize('agent_123', {
  language: 'es',
  temperature: 0.9
});

React Integration

Use the SDK in React applications with hooks and components.

Using Hooks

import React, { useState, useEffect, useRef } from 'react';
import { VoiceSDK } from 'ttp-agent-sdk';

function VoiceChat() {
  const [status, setStatus] = useState('disconnected');
  const [isRecording, setIsRecording] = useState(false);
  const [messages, setMessages] = useState([]);
  const sdkRef = useRef(null);

  // Initialize SDK
  useEffect(() => {
    async function initSDK() {
      const sdk = new VoiceSDK({
        agentId: 'agent_123',
        appId: 'your_app_id',
        agentSettingsOverride: {
          language: 'es',
          temperature: 0.9
        }
      });

      // Event listeners
      sdk.on('connected', () => setStatus('connected'));
      sdk.on('disconnected', () => {
        setStatus('disconnected');
        setIsRecording(false);
      });
      sdk.on('recordingStarted', () => setIsRecording(true));
      sdk.on('recordingStopped', () => setIsRecording(false));
      sdk.on('message', (msg) => {
        if (msg.type === 'agent_response') {
          setMessages(prev => [
            ...prev,
            { role: 'agent', text: msg.agent_response }
          ]);
        }
      });

      await sdk.connect();
      sdkRef.current = sdk;
    }

    initSDK();

    // Cleanup
    return () => {
      if (sdkRef.current) {
        sdkRef.current.disconnect();
      }
    };
  }, []);

  const toggleRecording = async () => {
    if (sdkRef.current) {
      await sdkRef.current.toggleRecording();
    }
  };

  return (
    <div>
      <div>Status: {status}</div>
      <button onClick={toggleRecording} disabled={status !== 'connected'}>
        {isRecording ? 'Stop' : 'Start'} Recording
      </button>
      <div>
        {messages.map((msg, i) => (
          <div key={i}>{msg.role}: {msg.text}</div>
        ))}
      </div>
    </div>
  );
}

export default VoiceChat;

VoiceButton Component

Pre-built React component for quick integration.

Installation

import { VoiceButton } from 'ttp-agent-sdk/react';

Basic Usage

import React from 'react';
import { VoiceButton } from 'ttp-agent-sdk/react';

function App() {
  return (
    <VoiceButton
      agentId="agent_123"
      appId="your_app_id"
      agentSettingsOverride={{
        language: 'es',
        temperature: 0.9
      }}
      onConnected={() => console.log('Connected')}
      onMessage={(msg) => console.log('Message:', msg)}
      onError={(error) => console.error('Error:', error)}
    />
  );
}

Props

Prop Type Required Description
agentId string Yes The AI agent identifier
appId string Yes Your application ID
websocketUrl string No Optional custom WebSocket base URL (defaults to wss://speech.talktopc.com/ws/conv)
agentSettingsOverride object No Custom agent settings
voice string No Voice preset (default: 'default')
language string No Language code (default: 'en')
autoReconnect boolean No Auto-reconnect on disconnect (default: true)
className string No Custom CSS class for the button
style object No Inline styles for the button
children React.Node No Custom button content (replaces default icon)

Event Callbacks

Callback Parameters Description
onConnected - Called when successfully connected
onDisconnected - Called when disconnected
onRecordingStarted - Called when recording starts
onRecordingStopped - Called when recording stops
onPlaybackStarted - Called when audio playback starts
onPlaybackStopped - Called when audio playback stops
onMessage message Called for all WebSocket messages
onError error Called on errors
onBargeIn message Called when user interrupts agent
onStopPlaying message Called when server requests to stop audio

Fly-to-Cart Animation

When users tap "Add to Cart" in the e-commerce widget, the product card flies down into the cart icon with a tornado funnel effect. Uses html2canvas to capture the card as an image, avoiding stacking/overflow clipping in iframes or containers with overflow:hidden. Built-in for TTPEcommerceWidget. For custom React product widgets, use the useFlyToCart hook.

React Hook (useFlyToCart)

import { useFlyToCart } from 'ttp-agent-sdk';

const cartIconRef = useRef(null);
const { triggerFly, isAnimating } = useFlyToCart(cartIconRef);

const handleAddToCart = (product) => {
  triggerFly(cardRef.current, product, () => {
    addToCartAPI(product);
  });
};

Vanilla JS (FlyToCart)

import { FlyToCart } from 'ttp-agent-sdk';

const flyToCart = new FlyToCart({
  getCartTarget: () => document.querySelector('.cart-icon'),
  onCartBump: () => { /* optional */ },
});

flyToCart.triggerFly(cardElement, product, () => {
  addToCartAPI(product);
});

E-commerce: Adding to cart (partner API & front-end cart)

Cart logic is owned by your integration (partner APIs, mock stores, or a Shopify storefront). The widget renders products, sends user intent over the WebSocket, and updates the cart summary when it receives the right messages. The SDK does not keep its own authoritative cart array: totals and counts should reflect what the backend (or the browser cart, then the backend) considers true.

Who does the "add"?

Model Where the line item is added How the widget learns the new totals
Server-side partner cart (mock store, custom API, headless stack) Conversation backend calls the partner API after product_selected or after the model runs the internal tool add_to_cart. Backend sends t: "cart_updated" with cartTotal, cartItemCount, and optional currency. For a visible "added" toast, include action: "added" and a product object (id, name, price).
Shopify Online Store (widget embedded on the store theme) The browser must call Shopify's Ajax Cart API (POST /cart/add.js) so the add uses the visitor's store session cookie. Server-only Storefront API carts are a different session and will not match the theme cart. The conversation backend sends t: "add_to_store_cart" with variantId and quantity. The ecommerce flavor runs the Ajax calls, refreshes the cart bar from GET /cart.js, and sends t: "cart_add_result" back so the backend can continue the turn. The backend may still send cart_updated afterward if you want one canonical sync message.
Custom client tool Your backend issues a client_tool_call with a tool name you defined (for example addToCart). Your page handles it via registerToolHandler on the widget or AgentSDK. The SDK replies with client_tool_result (or client_tool_error). The cart bar still expects cart_updated unless your handler updates UI itself.

Flow 1 — User taps Add / Update on a product card

  1. The widget stops playback (so the agent does not talk over the action) and sends t: "product_selected" on the voice WebSocket. Payload includes productId, productName, price, quantity (absolute units or weight), and sellBy (quantity or weight).
  2. Your backend decides what happens next:
    • Update the partner cart via API, then push cart_updated to the session, or
    • For Shopify on-domain, send add_to_store_cart and let the widget perform /cart/add.js, then consume cart_add_result on the server, or
    • Hybrid: tool or internal step on the server plus a follow-up cart_updated.
  3. EcommerceManager.handleCartUpdated updates the bottom cart bar when cartTotal and cartItemCount are present. Optional action: "added" drives the short confirmation UX.

Flow 2 — User asks in voice (or the agent adds without a card click)

For TTP's Java conversation backend, e-commerce uses internal tools such as search_products, add_to_cart, and get_cart (names match what the LLM sees; they are not the same as optional SDK client_tool_call names unless you wire them).

  1. The model calls add_to_cart with productId and quantity.
  2. The server runs the partner integration:
    • Partner API cart: The service calls the partner, receives success and line items, then typically sends cart_updated to the widget with totals and item count so the UI matches the server.
    • Shopify theme cart: The service sends add_to_store_cart to the widget instead of mutating cart only on the server; the widget performs the Ajax add and returns cart_add_result. Reading the cart may use get_store_cart from the server, which the widget answers with cart_state_result after GET /cart.js.
  3. The model receives a text summary of the tool result and can speak to the user.

Message cheat sheet (widget ↔ backend)

Direction t / type Role
Widget → backend product_selected User confirmed quantity and tapped Add/Update on a card.
Backend → widget show_products / show_items Render product cards (search or browse results).
Backend → widget cart_updated Sync cartTotal, cartItemCount, optional currency, optional action + product for toast.
Backend → widget add_to_store_cart Shopify: widget runs Ajax add for variantId / quantity. Optional verbalAck: true when the add was triggered by the user tapping Add (not an agent tool); echoed on cart_add_result so the server can prompt a brief spoken acknowledgment.
Widget → backend cart_add_result Outcome of Ajax add (success, counts, totals, currency). Includes verbalAck when the originating add_to_store_cart or hook add_to_site_cart requested it (UI add).
Backend → widget get_store_cart Shopify: widget fetches /cart.js and replies with cart contents.
Widget → backend cart_state_result Async cart snapshot after get_store_cart (implementation-specific).
Backend → widget client_tool_call Optional: run custom logic in the page; reply with client_tool_result.

Custom client tools: Register handlers with registerToolHandler on the widget or AgentSDK so the toolName in each client_tool_call matches what your backend defines.

Widget Flavors

The TTP SDK supports domain-specific "flavors" that customize the widget experience for different verticals. Each flavor provides specialized UI components, backend tools, and message handling.

Available Flavors

Flavor TypePartner IDsBackend ToolsKey Features
ecommerce mock-store, shopify search_products, add_to_cart, get_cart Product cards, cart bar, fly-to-cart animation
hotels mock-hotel search_rooms, select_room, add_extra, get_booking, show_media Room cards, booking bar, gallery
pharma mock-pharm search_medications, add_to_prescription, get_prescription Medication cards (with Rx/OTC badges), prescription summary bar
restaurants mock-restaurant search_menu, add_to_order, get_order, show_media Menu item cards (with allergen/dietary tags), order summary bar, gallery
tours mock-tour search_tours, book_tour, get_tour_booking, show_media Tour item cards, booking summary bar, gallery

E-commerce WebSocket messages: The widget listens for t: "show_items" and t: "show_products" (same handler; backend tools often send show_products with products, title, layout). Use TTPEcommerceWidget (or TTPChatWidget with flavor.type: 'ecommerce') so these handlers are registered. For end-to-end add-to-cart flows (partner API vs Shopify Ajax vs client tools), see E-commerce cart flows.

Desktop voice strip (flavor.callView: 'minimized'): When flavor.callView is 'minimized', a fixed bottom voice surface is used on viewports wider than 768px. This works with or without a flavor: it requires only a flavor object carrying callView: 'minimized' (e.g. flavor: { callView: 'minimized' }, no type/partner needed) — a flavor type is optional. It is a rounded floating dock with layered glass styling (rim light, ambient shadow, soft indigo outer glow), an inset transcript “well,” squircle control tiles, and a blurred LIVE capsule—same layout and controls as before. It is centered with a capped width (up to about 720px with horizontal inset) and bottom spacing plus env(safe-area-inset-bottom). It uses the same tokens as the floating pill (voice.pillGradient, pillTextColor, pillDotColor, endCallButtonColor). Mute, pause, speaker, keyboard (text inject), and end-call align with the in-widget voice UI. The in-panel desktop “active call” UI stays hidden; the floating panel collapses if it was open, and the launcher pill is hidden for the duration of the call. The “Powered by” footer (footer.show, footer.brand) is rendered inside the strip (below the call controls) for the duration of the call—not as a separate floating pill at the widget corner. Viewports 768px and below skip the strip and use the mobile minimized bar flow instead. A single transcript line shows either the assistant (streaming) or the user (speech-to-text interim/final), never both at once; user text uses a warm highlight color and no You: prefix. When the widget language is Hebrew or Arabic (base he / ar) or direction is 'rtl', chrome uses RTL. Transcript lines use explicit dir="rtl" when the string contains Hebrew or Arabic letters (even if the widget Language is English), so neutral punctuation follows the sentence; Latin-only lines use dir="ltr". Mic/pause/speaker/end layout is unchanged.

Configuration

Set the flavor when creating the widget:

const widget = new TTPChatWidget({
  agentId: 'agent_...',
  appId: 'app_...',
  flavor: {
    type: 'pharma',        // 'ecommerce' | 'hotels' | 'pharma' | 'restaurants' | 'tours'
    partnerId: 'mock-pharm' // partner-specific data source
  }
});

Pharma Flavor

The pharmacy flavor provides a medication search and prescription management experience.

// Pharma configuration
flavor: {
  type: 'pharma',
  partnerId: 'mock-pharm'
}

// Backend tools injected automatically:
// - search_medications: Search the medication catalog
// - add_to_prescription: Add a medication to the prescription
// - get_prescription: View current prescription contents

// Frontend message types:
// - show_items: Displays medication cards
// - prescription_updated: Updates the prescription summary bar

The pharma flavor does not include gallery support (show_media).

Restaurants Flavor

The restaurants flavor provides a menu browsing, ordering, and photo gallery experience.

// Restaurant configuration
flavor: {
  type: 'restaurants',
  partnerId: 'mock-restaurant'
}

// Backend tools injected automatically:
// - search_menu: Search the restaurant menu
// - add_to_order: Add a menu item to the order
// - get_order: View current order contents
// - show_media: Display restaurant photo gallery

// Frontend message types:
// - show_items: Displays menu item cards
// - order_updated: Updates the order summary bar
// - show_media / dismiss_media: Gallery with dish photos, ambiance, etc.

Menu item cards display allergen warnings and dietary tags automatically.

Tours Flavor

The tours flavor provides a tour browsing, booking, and photo gallery experience.

// Tour configuration
flavor: {
  type: 'tours',
  partnerId: 'mock-tour'
}

// Backend tools injected automatically:
// - search_tours: Search available tours and activities by query
// - book_tour: Book a tour or activity
// - get_tour_booking: View current booking contents
// - show_media: Display tour photo gallery

// Frontend message types:
// - show_items: Displays tour item cards
// - cart_updated: Updates the booking summary bar
// - show_media / dismiss_media: Gallery with tour photos, highlights, etc.

Tour cards display activity tags and pricing. Reuses the same UI infrastructure as the restaurants flavor.

Client-Script Tools

Client-script tools let you attach backend-authored JavaScript to a client tool (tool_type: 'client') that runs in the visitor's browser when the LLM calls the tool — no host-page registerToolHandler() code required. The script is configured in the dashboard (tool form → "Agent scripts"), stored with the tool, and delivered to the widget automatically at session start. Available since SDK v2.45.3.

When to use which: use registerToolHandler() when the host page owns the logic and ships its own JS. Use a client-script tool when the agent owner wants browser-side behavior (DOM reads, page API calls, UI nudges) configurable from the dashboard without touching the embedding site.

How it works

  1. Session init — the backend pushes a partner_bundle message with the reserved partner id __client_tools__, carrying every scripted tool attached to the agent (plus auto-run library scripts). The bundle is pushed on both the voice and text-chat channels, works with or without a widget flavor, and is re-pushed on reconnect (a new bundle replaces the previous one).
  2. Compile on load — each entry compiles into a strict-mode async function immediately when the bundle lands. A syntax error poisons only that entry (logged at load time); the rest of the bundle still works.
  3. Invocation — when the LLM calls the tool, the SDK runs the compiled script and returns its result to the backend.
  4. Auto-run — library scripts flagged auto_run execute exactly once when the bundle arrives (e.g. to set up page listeners).

Authoring styles

Two styles compile — the SDK detects the shape and picks the right wrapping:

// 1. Raw statement body (typical in the dashboard UI; `ctx` is in scope)
alert("hi");
return { ok: true, clicked: true };

// 2. Function expression
async (ctx) => {
  const res = await ctx.fetch('/api/something');
  return { ok: true, data: await res.json() };
}

Both run inside an async function, so await is always allowed. A script that returns nothing resolves to { ok: true }.

The ctx object

PropertyDescription
ctx.argsParameters from the LLM tool call.
ctx.hostwindow.location.hostname.
ctx.fetchBound window.fetch.
ctx.log(...)console.log prefixed with [adapter:<tool>].
ctx.runAdapter(action, args)Invoke another script in the same bundle (max nesting depth 16).
ctx.emit(msg)Send an unsolicited JSON message back to the backend over the active channel.

Client-tool scripts run with a generic context — the ecommerce-only helpers (ctx.platform, ctx.normalizeCart, ctx.refreshUI) are not available; those exist only in ecommerce partner-adapter bundles.

Pre / main / post steps

A scripted tool may carry up to three steps, executed in order: pre → main → post.

  • Pre is best-effort — if it throws, the error is logged and main still runs.
  • Main produces the tool result. A thrown error becomes an ok: false result.
  • Post runs only when main succeeded; its ctx.args._mainResult holds main's return value.

Sync vs async results

Configured per tool in the dashboard ("Wait for result"):

ModeBehavior
Wait ON (sync)The backend awaits the script result up to timeoutMs (1000–15000 ms, default 8000) and feeds it to the LLM as the tool result.
Wait OFF (async)The LLM immediately receives the configured pending message; the real result is injected later: SPOKEN_EVENT (spoken immediately), SPOKEN_DEFERRED (next natural turn, default), or SILENT_CONTEXT (silent context update).

Limits

LimitValue
Per-step code size32 KB
Result size512 KB serialized (oversize → ok: false, error: "RESULT_TOO_LARGE")
Sync timeout1000–15000 ms (default 8000)
Pending message500 chars

Wire protocol (reference)

Script invocations arrive on two wire forms, both handled by the SDK on both channels:

// Bundle (session init / reconnect)
{ "t": "partner_bundle", "partner_id": "__client_tools__",
  "adapters": { "my_tool": { "code_js": "...", "pre_code_js": "...", "post_code_js": "..." } },
  "autoRun":  { "my_script": { "code_js": "..." } } }

// Form A — dedicated envelope (async path / chain steps)
→ { "t": "run_partner_script", "requestId": "...", "partnerId": "__client_tools__",
    "action": "my_tool", "args": { ... } }
← { "t": "run_partner_script_result", "requestId": "...", "ok": true, "result": { ... } }

// Form B — client_tool_call piggyback (backend sync-await path)
→ { "t": "client_tool_call", "toolCallId": "...", "toolName": "run_partner_script",
    "parameters": { "action": "my_tool", "args": { ... }, "partner_id": "__client_tools__" } }
← { "t": "client_tool_result", "toolCallId": "...", "result": { ... } }

Routing rule: partner_id === '__client_tools__' → the flavor-independent ClientScriptManager; any other partner id → the ecommerce flavor's partner-adapter path. The two bundles coexist.

Troubleshooting

SymptomCause / fix
compile failed ... SyntaxError at bundle loadScript doesn't parse. On SDK < 2.45.1, raw statement bodies (e.g. alert("stop");) failed even when valid — check the SDK version banner in the console and hard-refresh if stale.
NO_HANDLER for run_partner_scriptSDK < 2.45.3 didn't route the client_tool_call piggyback form without an ecommerce flavor. Fixed in 2.45.3.
adapter_not_in_bundleTool not attached to the agent, or the bundle hasn't arrived yet. Check the session-start partner_bundle log.
Sync result is a timeoutScript exceeded timeoutMs. Raise it (max 15 s) or switch to async delivery.

For the full internals reference see CLIENT_SCRIPT_TOOLS_GUIDE.md in the repository root.

Chain Tools

A chain tool (tool_type: 'chain') runs a graph of steps — server tools, client tools, agent switches, and library scripts — as a single LLM tool call. Chains are edited visually in the dashboard (Chain Editor: nodes are steps, edges are data-flow dependencies) and executed by the backend; the widget participates only when a step targets the browser.

Execution model

  • Steps with no dependencies run first; steps at the same level run in parallel; levels run sequentially.
  • A step's input_from lists the upstream steps whose outputs feed it. Dependent steps receive the original LLM arguments shallow-merged with each upstream result (a non-object result lands under the upstream step's id).
  • Fail-fast: the first failing step aborts the chain with { "success": false, "error": ..., "step": ... }.
  • The LLM receives the terminal step's result (multiple terminals merge keyed by step id).
  • Inside a chain, every step is awaited — a referenced client tool's own "wait for result = off" setting is ignored.

Step types and the widget

Step typeRuns whereWidget involvement
server_toolBackend (webhook)None.
switch_agentBackendNone.
client_tool (plain)BrowserArrives as a normal client_tool_call → your registerToolHandler() handler.
client_tool (scripted)BrowserDispatched as a __client_tools__ bundle action keyed by tool name.
client_script (library)BrowserDispatched as a __client_tools__ bundle action keyed by script:{scriptId}.

Scripts and scripted client tools referenced by a chain are resolved into the session bundle automatically — they do not need to be attached to the agent.

Constraints

  • 1–30 steps per chain; no cycles; chains cannot reference other chain tools (no recursion).
  • Deleting a tool or library script that a chain references is blocked in the dashboard (the delete dialog lists the referencing chains).

VoiceSDK Class

Core class for voice interaction functionality. Server-driven disclaimer gating (compliance copy before STT/greeting) is described in detail under Server-driven disclaimer (voice) and applies when using protocol v2.

Constructor

new VoiceSDK(config)

Configuration Object

Property Type Required Description
agentId string Yes The AI agent identifier to connect to
appId string Yes Your application identifier
websocketUrl string No Optional custom WebSocket base URL (defaults to wss://speech.talktopc.com/ws/conv)
agentSettingsOverride object No Custom agent configuration
voice string No Voice preset name (default: 'default')
language string No Language code (default: 'en')
sampleRate number No Input audio sample rate: 8000, 16000, 22050, 24000, 44100, or 48000 Hz (default: 16000)
channels number No Input audio channels (default: 1, mono only)
bitDepth number No Input audio bit depth: 8, 16, or 24 bits (default: 16)
outputContainer string No Output container format: 'raw' or 'wav' (default: 'raw')
outputEncoding string No Output audio encoding: 'pcm', 'pcmu' (μ-law), or 'pcma' (A-law) (default: 'pcm')
outputSampleRate number No Output audio sample rate: 8000, 16000, 22050, 24000, 44100, or 48000 Hz (default: 24000)
outputChannels number No Output audio channels (default: 1, mono only)
outputBitDepth number No Output audio bit depth: 8, 16, or 24 bits (default: 16)
outputFrameDurationMs number No Frame duration for raw PCM streaming in milliseconds (default: 600)
protocolVersion number No Protocol version: 1 (legacy) or 2 (format negotiation) (default: 2)
autoReconnect boolean No Auto-reconnect on disconnect (default: true)

Audio Format Configuration (v2 Protocol)

The SDK v2 supports format negotiation with the backend. You can specify both input and output audio formats:

📋 Format Support:
  • Input Encodings: PCM, PCMU (μ-law), PCMA (A-law)
  • Input Sample Rates: 8000, 16000, 22050, 24000, 44100, 48000 Hz
  • Input Bit Depths: 8, 16, 24 bits
  • Output Containers: 'raw' (no header) or 'wav' (with header)
  • Output Encodings: PCM, PCMU (μ-law), PCMA (A-law)
  • Output Sample Rates: 8000, 16000, 22050, 24000, 44100, 48000 Hz
  • Output Bit Depths: 8, 16, 24 bits

Example: Custom Audio Format

const voiceSDK = new VoiceSDK({
  agentId: 'agent_123',
  appId: 'your_app_id',
  
  // Input format (what we send to server)
  sampleRate: 16000,
  channels: 1,
  bitDepth: 16,
  
  // Output format (what we want from server)
  outputContainer: 'raw',        // 'raw' or 'wav'
  outputEncoding: 'pcm',          // 'pcm', 'pcmu', 'pcma'
  outputSampleRate: 24000,       // Default; typical TTS/server output
  outputChannels: 1,
  outputBitDepth: 16,
  outputFrameDurationMs: 600,    // Frame duration for streaming
  
  // Protocol version
  protocolVersion: 2              // Use v2 protocol for format negotiation
});

// Listen for format negotiation
voiceSDK.on('formatNegotiated', (format) => {
  console.log('Format negotiated:', format);
  // format contains: { container, encoding, sampleRate, channels, bitDepth }
});

Methods

connect()

Connect to the voice agent.

Returns: Promise<boolean>

await voiceSDK.connect();

disconnect()

Disconnect from the voice agent.

Returns: void

voiceSDK.disconnect();

startRecording()

Start capturing and streaming audio.

Disclaimer (v2): If the server required a disclaimer and disclaimersPending is still true, this throws an Error with error.code === 'DISCLAIMER_PENDING'. Call sendDisclaimerAck(true) first. See Server-driven disclaimer.

Returns: Promise<boolean>

await voiceSDK.startRecording();

sendDisclaimerAck(accepted) (v2, voice)

Sends disclaimer_ack after the user accepts or declines server-shown disclaimer text. Uses disclaimersHash and conversationId from the last hello_ack.

Parameters: accepted (boolean) — true to continue the call, false to decline.

Returns: void (logs and clears disclaimersPending on successful send).

voiceSDK.sendDisclaimerAck(true);

stopRecording()

Stop capturing audio.

Returns: Promise<boolean>

await voiceSDK.stopRecording();

toggleRecording()

Toggle recording state (start/stop).

Returns: Promise<boolean>

await voiceSDK.toggleRecording();

pauseCall()

Pause the active call. Stops sending audio, clears the playback queue, and notifies the server to close STT/TTS connections. The WebSocket stays open and conversation history is preserved. A configurable timeout (default 5 minutes) will automatically end the call if not resumed.

Returns: void

voiceSDK.pauseCall();
// or via AgentSDK:
agentSDK.pauseCall();

resumeCall()

Resume a paused call. Restarts audio recording and notifies the server to re-create STT/TTS connections. The conversation continues from where it left off with full history.

Returns: void

voiceSDK.resumeCall();
// or via AgentSDK:
agentSDK.resumeCall();

isPaused (property)

Boolean indicating whether the call is currently paused.

Type: boolean

if (voiceSDK.isPaused) {
  console.log('Call is paused');
}
// or via AgentSDK:
if (agentSDK.isPaused) {
  console.log('Call is paused');
}

getStatus()

Get current connection and recording status.

Returns: Object

const status = voiceSDK.getStatus();
// Returns: {
//   version: '2.0.0',
//   isConnected: boolean,
//   isRecording: boolean,
//   isPlaying: boolean,
//   outputFormat: object,      // Negotiated output format (v2)
//   audioPlayer: object,       // AudioPlayer status
//   audioRecorder: object      // AudioRecorder status
// }

validateInputFormat(format)

v2 only: Validate input audio format configuration.

Parameters:

  • format (object) - Format object with encoding, sampleRate, bitDepth, channels

Returns: string|null - Error message if invalid, null if valid

const error = voiceSDK.validateInputFormat({
  encoding: 'pcm',
  sampleRate: 16000,
  bitDepth: 16,
  channels: 1
});
if (error) {
  console.error('Invalid format:', error);
}

validateOutputFormat(format)

v2 only: Validate output audio format configuration.

Parameters:

  • format (object) - Format object with container, encoding, sampleRate, bitDepth, channels

Returns: string|null - Error message if invalid, null if valid

const error = voiceSDK.validateOutputFormat({
  container: 'raw',
  encoding: 'pcm',
  sampleRate: 24000,
  bitDepth: 16,
  channels: 1
});
if (error) {
  console.error('Invalid format:', error);
}

updateConfig(newConfig)

Update SDK configuration dynamically.

Parameters:

  • newConfig (object) - Partial configuration object to merge with existing config

Returns: void

voiceSDK.updateConfig({
  outputSampleRate: 48000,
  outputEncoding: 'pcmu'
});

reconnect()

Manually reconnect to the agent.

Returns: Promise<boolean>

await voiceSDK.reconnect();

stopAudioPlayback()

Immediately stop audio playback (for barge-in).

Returns: void

voiceSDK.stopAudioPlayback();

on(event, callback)

Register an event listener.

Parameters:

  • event (string) - Event name
  • callback (function) - Event handler
voiceSDK.on('connected', () => {
  console.log('Connected!');
});

destroy()

Cleanup all resources and disconnect.

Returns: void

voiceSDK.destroy();

Events Reference

Event Parameters Description
connected - Emitted when successfully connected
disconnected event Emitted when disconnected (includes reason)
error error Emitted on errors
recordingStarted - Emitted when recording starts
recordingStopped - Emitted when recording stops
message message Emitted for all WebSocket messages
playbackStarted - Emitted when audio playback starts
playbackStopped - Emitted when audio playback stops
playbackError error Emitted on audio playback errors
bargeIn message Emitted when user interrupts agent
stopPlaying message Emitted when server requests to stop audio
formatNegotiated format v2 only: Emitted when audio format is negotiated with server. Format object contains: container, encoding, sampleRate, channels, bitDepth
greetingStarted - Emitted when greeting audio starts
domainError error Emitted when domain is not whitelisted
disclaimersRequired payload v2 voice: Server requires disclaimer acknowledgement. Payload: texts, disclaimersHash, disclaimerTimeoutMs, conversationId. Call sendDisclaimerAck after user decision.
disclaimerRejected { code, message } v2 voice: Terminal disclaimer failure from server (DISCLAIMER_DECLINED, DISCLAIMER_TIMEOUT, DISCLAIMER_HASH_MISMATCH). Not used for DISCLAIMER_PENDING (that uses error).

Configuration Options

Agent Settings Override

Complete reference for all overridable settings:

Core Settings

Setting Type Range/Values Description
prompt string Any text System prompt/instructions for the agent
temperature number 0.0 - 2.0 LLM creativity level
maxTokens number 1 - 4096 Maximum tokens per response
model string Model names ⚠️ NOT SUPPORTED - LLM model selection requires infrastructure changes
language string ISO codes Response language (e.g., 'en', 'es', 'fr')

Voice Settings

Setting Type Range/Values Description
voiceId string Voice IDs Specific voice identifier
voiceSpeed number 0.5 - 2.0 Voice speed multiplier

Behavior Settings

Setting Type Range/Values Description
firstMessage string Any text Initial greeting message
disableInterruptions boolean true/false Prevent user from interrupting agent
autoDetectLanguage boolean true/false Automatically detect user's language
candidateLanguages array Language codes List of candidate languages for auto-detection (e.g., ['en', 'es', 'fr'])
maxCallDuration number Seconds Maximum session duration

Advanced Settings

Setting Type Range/Values Description
toolIds array Array of numbers Array of custom tool IDs to enable for this agent (e.g., [123, 456, 789])
internalToolIds array Array of strings Array of internal tool IDs to enable for this agent (e.g., ['calendar', 'weather', 'email'])
timezone string TZ names User timezone (e.g., 'America/New_York')

Text-to-Speech API

PUBLIC REST API

Generate high-quality voice audio from text using our public REST API endpoint.

🔒 Authentication: This endpoint requires API key authentication. Never expose your API key in frontend code - always call from your backend.

Endpoint

POST https://backend.talktopc.com/api/public/agents/tts/generate

Authentication

Include your API key in the Authorization header:

Authorization: Bearer YOUR_API_KEY

Request Parameters

Parameter Type Required Description
text string Yes The text to convert to speech
voiceId string No Voice identifier (default: agent's configured voice)
voiceSpeed number No Voice speed multiplier: 0.5 - 2.0 (default: 1.0)
language string No Language code (e.g., 'en', 'es', 'fr')
agentId string No Agent ID to use voice settings from

Example Requests

curl -X POST https://backend.talktopc.com/api/public/agents/tts/generate \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
    "text": "Hello! Welcome to our service.",
    "voiceId": "nova",
    "voiceSpeed": 1.2,
    "language": "en"
  }' \
  --output speech.mp3
const response = await fetch('https://backend.talktopc.com/api/public/agents/tts/generate', {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
    'Authorization': `Bearer ${process.env.TTP_API_KEY}`
  },
  body: JSON.stringify({
    text: 'Hello! Welcome to our service.',
    voiceId: 'nova',
    voiceSpeed: 1.2,
    language: 'en'
  })
});

// Response is audio file
const audioBuffer = await response.arrayBuffer();
const audioBlob = new Blob([audioBuffer], { type: 'audio/mpeg' });

// Play or save the audio
const audioUrl = URL.createObjectURL(audioBlob);
const audio = new Audio(audioUrl);
audio.play();
import requests
import os

url = "https://backend.talktopc.com/api/public/agents/tts/generate"
headers = {
    "Content-Type": "application/json",
    "Authorization": f"Bearer {os.environ['TTP_API_KEY']}"
}
data = {
    "text": "Hello! Welcome to our service.",
    "voiceId": "nova",
    "voiceSpeed": 1.2,
    "language": "en"
}

response = requests.post(url, json=data, headers=headers)

if response.status_code == 200:
    # Save audio to file
    with open("speech.mp3", "wb") as f:
        f.write(response.content)
    print("Audio saved to speech.mp3")
else:
    print(f"Error: {response.status_code}")
<?php
$url = "https://backend.talktopc.com/api/public/agents/tts/generate";
$apiKey = getenv('TTP_API_KEY');

$data = [
    'text' => 'Hello! Welcome to our service.',
    'voiceId' => 'nova',
    'voiceSpeed' => 1.2,
    'language' => 'en'
];

$ch = curl_init($url);
curl_setopt($ch, CURLOPT_POST, 1);
curl_setopt($ch, CURLOPT_POSTFIELDS, json_encode($data));
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_HTTPHEADER, [
    'Content-Type: application/json',
    'Authorization: Bearer ' . $apiKey
]);

$response = curl_exec($ch);
$httpCode = curl_getinfo($ch, CURLINFO_HTTP_CODE);
curl_close($ch);

if ($httpCode === 200) {
    file_put_contents('speech.mp3', $response);
    echo "Audio saved to speech.mp3";
} else {
    echo "Error: HTTP $httpCode";
}
?>
import java.net.http.*;
import java.net.URI;
import java.nio.file.*;

public class TTSExample {
    public static void main(String[] args) throws Exception {
        String url = "https://backend.talktopc.com/api/public/agents/tts/generate";
        String apiKey = System.getenv("TTP_API_KEY");
        
        String json = """
            {
                "text": "Hello! Welcome to our service.",
                "voiceId": "nova",
                "voiceSpeed": 1.2,
                "language": "en"
            }
            """;
        
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder()
            .uri(URI.create(url))
            .header("Content-Type", "application/json")
            .header("Authorization", "Bearer " + apiKey)
            .POST(HttpRequest.BodyPublishers.ofString(json))
            .build();
        
        HttpResponse<byte[]> response = client.send(
            request, 
            HttpResponse.BodyHandlers.ofByteArray()
        );
        
        if (response.statusCode() == 200) {
            Files.write(Paths.get("speech.mp3"), response.body());
            System.out.println("Audio saved to speech.mp3");
        } else {
            System.out.println("Error: " + response.statusCode());
        }
    }
}

Response

The endpoint returns audio data directly with the following headers:

Header Value Description
Content-Type audio/mpeg Audio format (MP3)
Content-Length number Size of audio file in bytes

Voice Speed Examples

Speed Effect Use Case
0.5 50% slower (half speed) Educational content, accessibility
0.75 25% slower Clear pronunciation, language learning
1.0 Normal speed (default) Standard conversation
1.2 20% faster Quick updates, notifications
1.5 50% faster Rapid information delivery
2.0 2x speed (double speed) Maximum speed, time-saving

Backend Implementation Example

// Your backend endpoint
app.post('/api/generate-speech', async (req, res) => {
  const { text, voiceSpeed = 1.0 } = req.body;
  
  // Call TTP TTS API
  const ttpResponse = await fetch('https://backend.talktopc.com/api/public/agents/tts/generate', {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      'Authorization': `Bearer ${process.env.TTP_API_KEY}`  // 🔒 Secret!
    },
    body: JSON.stringify({
      text: text,
      voiceSpeed: voiceSpeed,
      voiceId: 'nova',
      language: 'en'
    })
  });
  
  if (!ttpResponse.ok) {
    return res.status(ttpResponse.status).json({ 
      error: 'TTS generation failed' 
    });
  }
  
  // Forward audio to client
  const audioBuffer = await ttpResponse.arrayBuffer();
  res.set('Content-Type', 'audio/mpeg');
  res.send(Buffer.from(audioBuffer));
});

Error Responses

Status Code Description
400 Bad Request - Invalid parameters
401 Unauthorized - Invalid or missing API key
429 Too Many Requests - Rate limit exceeded
500 Internal Server Error - TTS generation failed
⚠️ Security Best Practices:
  • Never expose your API key in frontend JavaScript
  • Always call this endpoint from your backend server
  • Implement rate limiting on your backend
  • Validate and sanitize text input to prevent abuse

Use Cases

📢 Announcements

Generate audio announcements for notifications

📚 Content Creation

Convert articles or books to audio format

♿ Accessibility

Provide audio alternatives for text content

🎓 E-Learning

Create voice-overs for educational materials

Java SDK

Server-side Java SDK for text-to-speech conversion. Perfect for backend applications, phone systems, and server-to-server integrations.

🎯 Use Cases:
  • Backend TTS: Generate speech on your server without exposing API keys
  • Phone Systems: Integrate with Twilio, Telnyx, or custom VoIP systems
  • Server-to-Server: Automated voice generation for notifications, alerts, or content
  • Audio Format Control: Request specific formats (PCMU, PCMA, PCM) for phone systems

Installation

Maven

<repositories>
    <repository>
        <id>github</id>
        <url>https://maven.pkg.github.com/TTP-GO/java-sdk</url>
    </repository>
</repositories>

<dependencies>
    <dependency>
        <groupId>com.talktopc</groupId>
        <artifactId>ttp-agent-sdk-java</artifactId>
        <version>1.0.5</version>
    </dependency>
</dependencies>
⚠️ GitHub Packages Authentication:

You'll need to authenticate with GitHub Packages. Add credentials to your ~/.m2/settings.xml:

<settings>
    <servers>
        <server>
            <id>github</id>
            <username>YOUR_GITHUB_USERNAME</username>
            <password>YOUR_GITHUB_TOKEN</password>
        </server>
    </servers>
</settings>

Gradle

repositories {
    maven {
        url = uri("https://maven.pkg.github.com/TTP-GO/java-sdk")
        credentials {
            username = project.findProperty("gpr.user") ?: System.getenv("USERNAME")
            password = project.findProperty("gpr.key") ?: System.getenv("TOKEN")
        }
    }
}

dependencies {
    implementation 'com.talktopc:ttp-agent-sdk-java:1.0.5'
}

Quick Start

1. Initialize SDK

import com.talktopc.sdk.VoiceSDK;

// Get API key from environment variable
String apiKey = System.getenv("TALKTOPC_API_KEY");

VoiceSDK sdk = VoiceSDK.builder()
    .apiKey(apiKey)
    .baseUrl("https://speech.talktopc.com")  // Optional
    .build();

2. Simple TTS (Blocking)

// Generate complete audio file
byte[] audio = sdk.textToSpeech("Hello world", "mamre");

// Save to file
Files.write(Paths.get("output.wav"), audio);

// Or send to phone system
phoneSystem.playAudio(audio);

3. Streaming TTS (Real-time)

// Stream audio chunks as they're generated
sdk.textToSpeechStream(
    "Hello world, this is a longer text that will be streamed",
    "mamre",
    audioChunk -> {
        // Receive chunks in real-time
        phoneSystem.playAudio(audioChunk);
    }
);

API Reference

VoiceSDK

Main SDK entry point for text-to-speech operations.

Builder Methods

Method Type Description
apiKey(String) String Your TalkToPC API key (required)
baseUrl(String) String API base URL (default: https://speech.talktopc.com)
connectTimeout(int) int Connection timeout in milliseconds (default: 30000)
readTimeout(int) int Read timeout in milliseconds (default: 60000)

Methods

Method Description
textToSpeech(String text, String voiceId) Simple TTS (blocking) - returns complete audio as byte array
textToSpeech(String text, String voiceId, double speed) TTS with speed control (0.1 - 3.0)
textToSpeech(TTSRequest request) TTS with full configuration (format, speed, etc.)
synthesize(TTSRequest request) Get full response with metadata (sample rate, duration, credits)
textToSpeechStream(String text, String voiceId, Consumer<byte[]> chunkHandler) Streaming TTS - chunks delivered to handler as they're generated
textToSpeechStream(TTSRequest request, Consumer<byte[]> chunkHandler, Consumer<StreamMetadata> onComplete, Consumer<Throwable> onError) Streaming TTS with completion and error callbacks

TTSRequest Builder

Configure TTS requests with audio format options.

Basic Configuration

TTSRequest request = TTSRequest.builder()
    .text("Hello world")              // Required
    .voiceId("mamre")                // Required
    .speed(1.0)                      // Optional (0.1 - 3.0)
    .build();

Audio Format Configuration

TTSRequest request = TTSRequest.builder()
    .text("Hello world")
    .voiceId("mamre")
    .outputContainer("raw")          // "raw" or "wav"
    .outputEncoding("pcm")           // "pcm", "pcmu", "pcma"
    .outputSampleRate(24000)         // Hz (8000, 16000, 22050, 24000, 44100, 48000)
    .outputBitDepth(16)              // bits (8, 16, 24)
    .outputChannels(1)               // 1 (mono) or 2 (stereo)
    .outputFrameDurationMs(600)     // ms per frame (for streaming)
    .build();

Preset Methods

Method Format Use Case
phoneSystem() PCMU @ 8kHz, 20ms frames Phone systems (Twilio, Telnyx, VoIP)
highQuality() WAV @ 44.1kHz High-quality audio files
standardQuality() PCM @ 22.05kHz Standard quality audio

TTSResponse

Response object containing audio and metadata.

Method Return Type Description
getAudio() byte[] Audio data
getSampleRate() int Sample rate in Hz
getDurationMs() long Playback duration in milliseconds
getAudioSizeBytes() long Audio size in bytes
getCreditsUsed() double Credits consumed
getConversationId() String Unique conversation ID

Examples

Basic Usage

VoiceSDK sdk = VoiceSDK.builder()
    .apiKey(System.getenv("TALKTOPC_API_KEY"))
    .build();

// Simple TTS
byte[] audio = sdk.textToSpeech("Welcome to TalkToPC", "mamre");
System.out.println("Generated " + audio.length + " bytes of audio");

// Save to file
Files.write(Paths.get("output.wav"), audio);

With Speed Control

// Faster speech (1.5x speed)
byte[] fastAudio = sdk.textToSpeech("Quick message", "mamre", 1.5);

// Slower speech (0.8x speed)
byte[] slowAudio = sdk.textToSpeech("Slow and clear", "mamre", 0.8);

Streaming with Metadata

sdk.textToSpeechStream(
    TTSRequest.builder()
        .text("Streaming example with full configuration")
        .voiceId("mamre")
        .speed(1.0)
        .build(),
    audioChunk -> {
        // Handle each audio chunk
        System.out.println("Received chunk: " + audioChunk.length + " bytes");
        phoneSystem.playAudio(audioChunk);
    },
    metadata -> {
        // Handle completion
        System.out.println("Stream completed:");
        System.out.println("  Total chunks: " + metadata.getTotalChunks());
        System.out.println("  Total bytes: " + metadata.getTotalBytes());
        System.out.println("  Duration: " + metadata.getDurationMs() + " ms");
        System.out.println("  Credits: " + metadata.getCreditsUsed());
    },
    error -> {
        // Handle errors
        System.err.println("Stream error: " + error.getMessage());
    }
);

Phone System Integration

Perfect for Twilio, Telnyx, or custom VoIP systems.

Standard Phone System (PCMU @ 8kHz)

// Using convenient phoneSystem() preset
TTSRequest request = TTSRequest.builder()
    .text("Hello, thank you for calling. How can I help you today?")
    .voiceId("en-US-female")
    .phoneSystem()  // ✅ PCMU @ 8kHz, 20ms frames
    .build();

sdk.textToSpeechStream(
    request,
    audioChunk -> {
        // audioChunk is PCMU @ 8kHz, 20ms frames (160 bytes)
        // Ready to send directly to phone connection
        phoneConnection.sendAudio(audioChunk);
    }
);

Twilio Integration

TTSRequest request = TTSRequest.builder()
    .text("Your appointment is confirmed for tomorrow at 3 PM")
    .voiceId("en-US-male")
    .outputContainer("raw")
    .outputEncoding("pcmu")      // μ-law for Twilio
    .outputSampleRate(8000)       // 8kHz
    .outputBitDepth(16)
    .outputChannels(1)            // Mono
    .outputFrameDurationMs(20)    // 20ms frames
    .build();

sdk.textToSpeechStream(
    request,
    audioChunk -> {
        // Send to Twilio Media Stream
        twilioStream.sendMedia(audioChunk);
    }
);

Custom Audio Format

TTSRequest request = TTSRequest.builder()
    .text("Custom format example")
    .voiceId("mamre")
    .outputContainer("raw")
    .outputEncoding("pcm")
    .outputSampleRate(16000)      // 16kHz
    .outputBitDepth(16)           // 16-bit
    .outputChannels(1)            // Mono
    .outputFrameDurationMs(100)   // 100ms frames
    .build();

byte[] audio = sdk.textToSpeech(request);
// Expected: 16kHz PCM, 16-bit, mono

High Quality Audio

TTSRequest request = TTSRequest.builder()
    .text("This is a high quality recording")
    .voiceId("mamre")
    .highQuality()  // WAV @ 44.1kHz
    .build();

byte[] audio = sdk.textToSpeech(request);
Files.write(Paths.get("high_quality.wav"), audio);

Error Handling

import com.talktopc.sdk.exception.TtsException;

try {
    byte[] audio = sdk.textToSpeech("Test", "mamre");
} catch (TtsException e) {
    System.err.println("TTS Error [" + e.getStatusCode() + "]: " + e.getErrorMessage());
    
    switch (e.getStatusCode()) {
        case 401:
            System.err.println("→ Invalid API key");
            break;
        case 402:
            System.err.println("→ Insufficient credits");
            break;
        case 400:
            System.err.println("→ Invalid parameters");
            break;
        default:
            System.err.println("→ Other error");
    }
}

Full Configuration Example

import com.talktopc.sdk.models.TTSRequest;
import com.talktopc.sdk.models.TTSResponse;

// Build request with all options
TTSRequest request = TTSRequest.builder()
    .text("Full configuration example")
    .voiceId("mamre")
    .speed(1.2)
    .outputContainer("wav")
    .outputEncoding("pcm")
    .outputSampleRate(24000)
    .outputBitDepth(16)
    .outputChannels(1)
    .build();

// Get response with metadata
TTSResponse response = sdk.synthesize(request);

System.out.println("Audio: " + response.getAudioSizeBytes() + " bytes");
System.out.println("Sample rate: " + response.getSampleRate() + " Hz");
System.out.println("Duration: " + response.getDurationMs() + " ms");
System.out.println("Credits: " + response.getCreditsUsed());

// Save audio
Files.write(Paths.get("output.wav"), response.getAudio());

Supported Audio Formats

Format Encoding Sample Rates Use Case
PCM pcm 8000, 16000, 22050, 24000, 44100 Hz General purpose, high quality
PCMU (μ-law) pcmu 8000 Hz Phone systems (Twilio, Telnyx, VoIP)
PCMA (A-law) pcma 8000 Hz Phone systems (European standards)

Requirements

  • Java 11 or higher
  • Valid TalkToPC API key
  • No external dependencies - Uses Java 11+ HttpClient
💡 Key Differences from Frontend SDK:
  • Backend-only: Designed for server-side use, not browser
  • Format Pass-through: Can request PCMU/PCMA and forward directly to phone systems
  • No Audio Playback: Returns raw audio bytes - you handle playback/forwarding
  • REST API: Uses REST endpoints instead of WebSocket

Resources

  • GitHub Repository: TTP-GO/java-sdk
  • Maven Package: com.talktopc:ttp-agent-sdk-java:1.0.5
  • Documentation: See README.md in the repository
  • Examples: Check src/main/java/com/talktopc/sdk/examples/