Enterprise Grade

Speech

English Conversational Speech

Stereo multi-speaker dialogue recordings with L/R speaker separation and emotion annotations

Languages

English

Quality Check

100% Verified

Overview

Natural English conversational recordings covering spontaneous discussions and formal business meetings. Each speaker is captured on a dedicated left or right channel of a stereo file, recorded via LiveKit with isolated per-participant tracks, so speakers can be separated cleanly. The collection includes casual dialogues, structured multi-participant meetings, and professional interactions across diverse topics. Annotations include speaker diarization labels, word-level timestamps, and per-utterance emotion labels across 18 categories. Suited to training multi-speaker ASR, speaker extraction, voice cloning, meeting transcription, and conversational AI models.

Key Highlights

Stereo speaker separation: L/R channel isolation for clean speaker extraction

Per-utterance emotion labels with confidence scores
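
Because each speaker sits on their own channel, extracting one speaker amounts to de-interleaving the stereo file. The following is a minimal sketch using only the Python standard library; the function name and file paths are illustrative and not part of the dataset, and it assumes uncompressed PCM WAV input as described in the specifications.

```python
import wave


def split_stereo_wav(src_path, left_path, right_path):
    """Write the left and right channels of a stereo PCM WAV as two mono files."""
    with wave.open(src_path, "rb") as src:
        if src.getnchannels() != 2:
            raise ValueError("expected a stereo (2-channel) WAV file")
        width = src.getsampwidth()   # bytes per sample, 2 for 16-bit audio
        rate = src.getframerate()    # samples per second, e.g. 48000
        frames = src.readframes(src.getnframes())

    # PCM frames are interleaved: [left sample][right sample][left sample]...
    frame_size = 2 * width
    n_frames = len(frames) // frame_size
    left = bytearray(n_frames * width)
    right = bytearray(n_frames * width)
    for k in range(width):
        # Copy byte k of every sample with a strided slice (no per-frame loop).
        left[k::width] = frames[k:n_frames * frame_size:frame_size]
        right[k::width] = frames[width + k:n_frames * frame_size:frame_size]

    for path, data in ((left_path, left), (right_path, right)):
        with wave.open(path, "wb") as out:
            out.setnchannels(1)
            out.setsampwidth(width)
            out.setframerate(rate)
            out.writeframes(bytes(data))
```

A call such as split_stereo_wav("session.wav", "speaker_left.wav", "speaker_right.wav") would produce one mono file per speaker; the file names here are placeholders.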

Technical Specifications

Files

Stereo WAV files with L/R speaker separation (48kHz, 16-bit)

JSON transcripts with speaker diarization labels

Word-level timestamps per speaker

Per-utterance emotion labels with confidence scores

Audio Specs

48kHz sample rate, 16-bit depth, 1536 kbps, stereo with L/R channel separation
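
The stated bitrate follows directly from the other three figures, since uncompressed PCM bitrate is sample rate times bit depth times channel count. A short check (the helper name is illustrative):

```python
def pcm_bitrate_kbps(sample_rate_hz: int, bit_depth: int, channels: int) -> float:
    """Bitrate of uncompressed PCM audio in kilobits per second."""
    return sample_rate_hz * bit_depth * channels / 1000


# 48,000 samples/s * 16 bits * 2 channels = 1,536,000 bits/s
print(pcm_bitrate_kbps(48_000, 16, 2))  # 1536.0
```

At this rate, one minute of stereo audio occupies roughly 11.5 MB before any container overhead (1,536,000 bits/s * 60 s / 8).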

Transcription Format

JSON with speaker labels, word timestamps, emotion annotations
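
The exact JSON schema is not specified here, so the sketch below is only an illustration of how such a transcript might be consumed. Every field name ("utterances", "speaker", "start", "end", "emotion", "label", "confidence", "words") and every value in the sample is an assumption made up for this example, not the dataset's documented format.

```python
import json
from collections import defaultdict

# Made-up sample: field names, emotion labels, and values are all hypothetical.
SAMPLE = """
{
  "utterances": [
    {
      "speaker": "L", "start": 0.0, "end": 1.2,
      "emotion": {"label": "neutral", "confidence": 0.91},
      "words": [
        {"word": "hello", "start": 0.0, "end": 0.45},
        {"word": "there", "start": 0.5, "end": 1.2}
      ]
    },
    {
      "speaker": "R", "start": 1.5, "end": 3.0,
      "emotion": {"label": "joy", "confidence": 0.62},
      "words": [
        {"word": "good", "start": 1.5, "end": 2.0},
        {"word": "morning", "start": 2.1, "end": 3.0}
      ]
    }
  ]
}
"""


def talk_time_by_speaker(transcript):
    """Sum utterance durations (seconds) per speaker label."""
    totals = defaultdict(float)
    for utt in transcript["utterances"]:
        totals[utt["speaker"]] += utt["end"] - utt["start"]
    return dict(totals)


def confident_emotions(transcript, min_confidence=0.8):
    """Return (speaker, emotion label) pairs at or above a confidence threshold."""
    return [
        (utt["speaker"], utt["emotion"]["label"])
        for utt in transcript["utterances"]
        if utt["emotion"]["confidence"] >= min_confidence
    ]


transcript = json.loads(SAMPLE)
print(talk_time_by_speaker(transcript))
print(confident_emotions(transcript))
```

Filtering on the confidence score, as in confident_emotions, is one way to keep only the more reliable emotion labels for training; the 0.8 threshold is arbitrary and would need tuning against the real data.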