Speech

Convert text to speech (TTS) and transcribe audio to text (STT).

Text-to-Speech

php
use Atlasphp\Atlas\Atlas;

$response = Atlas::speech('openai', 'tts-1')
    ->instructions('Hello, welcome to Atlas!')
    ->withVoice('alloy')
    ->asAudio();

// $response->data contains the audio binary
$response->store('public');  // Store to disk

Voice Options

php
$response = Atlas::speech('openai', 'tts-1-hd')
    ->instructions('This is high quality speech.')
    ->withVoice('nova')
    ->withSpeed(1.2)
    ->withFormat('mp3')
    ->asAudio();

With ElevenLabs

php
$response = Atlas::speech('elevenlabs', 'eleven_multilingual_v2')
    ->instructions('Welcome to the future of AI.')
    ->withVoice('Rachel')
    ->withLanguage('en')
    ->asAudio();

Speed & Language

php
// Adjust playback speed (0.25 to 4.0)
$response = Atlas::speech('openai', 'tts-1')
    ->instructions('Slow and clear narration.')
    ->withVoice('onyx')
    ->withSpeed(0.8)
    ->asAudio();

// Specify language for multilingual models
$response = Atlas::speech('elevenlabs', 'eleven_multilingual_v2')
    ->instructions('Bonjour, bienvenue sur Atlas.')
    ->withVoice('Rachel')
    ->withLanguage('fr')
    ->asAudio();
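
When the speed value comes from user input, it may help to clamp it into the 0.25 to 4.0 range mentioned in the comment above before passing it to withSpeed(). The following is a minimal sketch; clampSpeechSpeed() is a hypothetical helper and not part of Atlas, and whether a provider rejects or silently adjusts out-of-range values is not stated here.

```php
<?php

/**
 * Clamp a requested speed into an allowed range.
 * Hypothetical helper for illustration; not part of the Atlas API.
 * The default bounds come from the "0.25 to 4.0" comment in the example above.
 */
function clampSpeechSpeed(float $speed, float $min = 0.25, float $max = 4.0): float
{
    // min() caps the value at the upper bound, max() raises it to the lower bound.
    return max($min, min($max, $speed));
}
```

The result can then be passed straight to the builder, for example ->withSpeed(clampSpeechSpeed($requestedSpeed)), where $requestedSpeed is whatever value your application received.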

Speech-to-Text (Transcription)

php
use Atlasphp\Atlas\Atlas;
use Atlasphp\Atlas\Input\Audio;

$response = Atlas::speech('openai', 'whisper-1')
    ->withMedia([Audio::fromPath('/path/to/recording.mp3')])
    ->asText();

echo $response->text;  // "Hello, this is the transcribed text..."

Audio Input Sources

php
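Audio::fromUrl('https://example.com/audio.mp3');   // Remote URL
Audio::fromPath('/path/to/file.wav');              // Local file path
Audio::fromStorage('recordings/meeting.mp3');      // File in application storage
Audio::fromUpload($request->file('audio'));        // Uploaded file from a request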

Storing Audio

php
$response = Atlas::speech('openai', 'tts-1')
    ->instructions('Hello world')
    ->withVoice('alloy')
    ->asAudio();

// Store to disk manually
$path = $response->store('public');
$path = $response->storeAs('audio/greeting.mp3', 'public');

Automatic Storage

When persistence is enabled, generated audio is automatically stored to disk and tracked as an Asset record, so no manual store() call is needed. Access the asset via $response->asset. See Media & Assets for details.

Persisted Asset

With persistence enabled, the stored file's path, MIME type, and disk are available on $response->asset:

php
$response = Atlas::speech('openai', 'tts-1')
    ->instructions('Welcome to Atlas')
    ->withVoice('alloy')
    ->asAudio();

if ($response->asset) {
    $response->asset->path;       // Storage path
    $response->asset->mime_type;  // "audio/mpeg"
    $response->asset->disk;       // Filesystem disk
}

See Media & Assets for the complete storage guide.

Queue Support

php
Atlas::speech('openai', 'tts-1')
    ->instructions('Generate a long audiobook chapter')
    ->withVoice('nova')
    ->queue()
    ->asAudio()
    ->then(function ($response) use ($user) {  // $user must be captured from the enclosing scope
        $path = $response->store('public');
        notify($user, "Audio ready: {$path}");
    });

php
// Transcription in background
Atlas::speech('openai', 'whisper-1')
    ->withMedia([Audio::fromStorage('recordings/meeting.mp3')])
    ->queue()
    ->asText()
    ->then(fn ($response) => Transcript::create(['text' => $response->text]));

xAI Voice Effects

xAI TTS supports expressive speech tags for natural-sounding output. Maximum 15,000 characters per request.
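
Text longer than the 15,000-character limit has to be split before it is sent. The sketch below shows one possible approach: break the text at sentence boundaries and pack sentences into chunks that stay within the limit. splitForTts() is a hypothetical helper, not part of Atlas; it assumes the mbstring extension is available and that the limit counts characters rather than bytes.

```php
<?php

/**
 * Split text into chunks of at most $limit characters, preferring sentence
 * boundaries. Hypothetical helper for illustration; not part of the Atlas API.
 *
 * @return string[]
 */
function splitForTts(string $text, int $limit = 15000): array
{
    if ($limit < 1) {
        throw new InvalidArgumentException('Limit must be at least 1.');
    }

    // Split after ., ! or ? when followed by whitespace.
    $sentences = preg_split('/(?<=[.!?])\s+/u', trim($text), -1, PREG_SPLIT_NO_EMPTY) ?: [];

    $chunks = [];
    $current = '';

    foreach ($sentences as $sentence) {
        // A single sentence longer than the limit is cut at the limit.
        while (mb_strlen($sentence) > $limit) {
            if ($current !== '') {
                $chunks[] = $current;
                $current = '';
            }
            $chunks[] = mb_substr($sentence, 0, $limit);
            $sentence = mb_substr($sentence, $limit);
        }

        if ($sentence === '') {
            continue;
        }

        $candidate = $current === '' ? $sentence : $current . ' ' . $sentence;

        if (mb_strlen($candidate) <= $limit) {
            $current = $candidate;   // Sentence still fits in the current chunk.
        } else {
            $chunks[] = $current;    // Close the current chunk and start a new one.
            $current = $sentence;
        }
    }

    if ($current !== '') {
        $chunks[] = $current;
    }

    return $chunks;
}
```

Each chunk can then be sent as its own request. Note that the hard cut for oversized sentences can land inside a speech tag, and that a wrapping tag spanning several sentences may end up split across chunks, so tagged text may need a more careful splitter.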

Inline Tags

Add vocal expressions at specific points in the text:

[pause], [long-pause], [hum-tune], [laugh], [chuckle], [giggle], [cry], [tsk], [tongue-click], [lip-smack], [breath], [inhale], [exhale], [sigh]

php
$response = Atlas::speech('xai', 'grok-2-audio')
    ->instructions('Well [pause] that was unexpected! [laugh] Let me think about that.')
    ->withVoice('eve')
    ->asAudio();
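
A mistyped inline tag is easy to miss in long text. The following sketch checks bracketed tags against the list above before a request is sent. XAI_INLINE_TAGS and findUnknownInlineTags() are hypothetical names introduced for illustration, and how xAI treats an unrecognized tag is not covered here.

```php
<?php

// Inline tags copied from the list above.
const XAI_INLINE_TAGS = [
    'pause', 'long-pause', 'hum-tune', 'laugh', 'chuckle', 'giggle', 'cry',
    'tsk', 'tongue-click', 'lip-smack', 'breath', 'inhale', 'exhale', 'sigh',
];

/**
 * Return every bracketed, lowercase tag in $text that is not in the list above.
 * Hypothetical helper for illustration; not part of the Atlas API.
 *
 * @return string[]
 */
function findUnknownInlineTags(string $text): array
{
    // Capture the name inside each [tag]; only lowercase letters and hyphens match.
    preg_match_all('/\[([a-z-]+)\]/', $text, $matches);

    // Keep names missing from the known list, without duplicates, reindexed from 0.
    return array_values(array_unique(array_diff($matches[1], XAI_INLINE_TAGS)));
}
```

Any ordinary bracketed word in the text, such as an editorial note, is reported as well, so treat the result as a warning rather than a hard error.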

Wrapping Tags

Wrap sections of text to change the delivery style:

Volume: <soft>, <whisper>, <loud>, <build-intensity>, <decrease-intensity>

Pitch & Speed: <higher-pitch>, <lower-pitch>, <slow>, <fast>

Style: <singing>, <sing-song>, <laugh-speak>, <emphasis>

php
$response = Atlas::speech('xai', 'grok-2-audio')
    ->instructions('<slow><soft>Goodnight, sleep well.</soft></slow>')
    ->withVoice('ara')
    ->asAudio();

// Combine styles
$response = Atlas::speech('xai', 'grok-2-audio')
    ->instructions('This is <emphasis>really important</emphasis> [pause] <whisper>but keep it quiet.</whisper>')
    ->withVoice('eve')
    ->asAudio();
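
Nested wrapping tags are easy to close in the wrong order when written by hand. As a rough sketch, a small helper can build them so that every opening tag gets its matching close. wrapSpeech() is a hypothetical function, not part of Atlas, and it does not check tag names against the lists above.

```php
<?php

/**
 * Wrap text in one or more tags, with the first tag outermost.
 * Hypothetical helper for illustration; not part of the Atlas API.
 */
function wrapSpeech(string $text, string ...$tags): string
{
    // Apply the innermost tag first so the first argument ends up outermost.
    foreach (array_reverse($tags) as $tag) {
        $text = "<{$tag}>{$text}</{$tag}>";
    }

    return $text;
}
```

For example, wrapSpeech('Goodnight, sleep well.', 'slow', 'soft') produces the same string that is passed to instructions() in the first example above.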

xAI Voices

eve (energetic), ara (warm), rex (confident), sal (smooth), leo (authoritative)

Supported Providers

| Provider | TTS | STT | Features |
| --- | --- | --- | --- |
| OpenAI | tts-1, tts-1-hd | whisper-1 | Voices, speed, format |
| ElevenLabs | eleven_multilingual_v2 | Yes | Voices, cloning, languages |
| xAI | grok-2-audio | | TTS |

Builder Reference

| Method | Description |
| --- | --- |
| instructions(string) | Text to convert to speech |
| withMedia(array) | Audio files for transcription |
| withVoice(string) | Voice name or ID |
| withVoiceClone(array) | Voice cloning configuration |
| withSpeed(float) | Playback speed multiplier |
| withLanguage(string) | Language code |
| withFormat(string) | Output format (mp3, wav, ogg, etc.) |
| withVariables(array) | Variables for instruction interpolation |
| withProviderOptions(array) | Provider-specific options |
| withMiddleware(array) | Per-request provider middleware |
| withMeta(array) | Metadata for middleware/events |
| queue() | Dispatch to queue |
| asAudio() | Terminal: returns AudioResponse |
| asText() | Terminal: returns transcription |

API Reference for SpeechRequest

| Property | Type | Description |
| --- | --- | --- |
| data | string | Raw audio binary data |
| format | ?string | Audio format (mp3, wav, etc.) |
| text | ?string | Transcribed text (STT only) |
| meta | array | Additional metadata |
| asset | ?Asset | Linked asset (when persistence enabled) |
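
Because format is nullable, code that derives a file name from it needs a fallback. The sketch below is one way to handle that; audioFilename() is a hypothetical helper, not part of Atlas, and the mp3 default is an assumption rather than documented behavior.

```php
<?php

/**
 * Build a file name from a title and a possibly missing audio format.
 * Hypothetical helper for illustration; not part of the Atlas API.
 */
function audioFilename(string $title, ?string $format, string $fallback = 'mp3'): string
{
    // Lowercase the title and collapse anything non-alphanumeric into hyphens.
    $slug = trim(preg_replace('/[^a-z0-9]+/', '-', strtolower($title)), '-');

    // Use the fallback extension when the format is null or empty.
    $extension = strtolower($format ?: $fallback);

    return ($slug !== '' ? $slug : 'audio') . '.' . $extension;
}
```

Combined with the storeAs() call shown under Storing Audio, this could look like $response->storeAs('audio/' . audioFilename('Greeting', $response->format), 'public').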

Released under the MIT License.