Speech · ASR · SD · ER · NLP

English Conversational Speech

Multi-speaker dialogue recordings with speaker diarization and emotion annotations

Languages

English

Pricing

Enterprise

Overview

Natural English conversational recordings covering spontaneous discussions and formal business meetings, including casual dialogues, structured multi-participant meetings, and professional interactions. Each recording comes with speaker diarization, word-level timestamps, and per-utterance emotion labels. Suited to training multi-speaker ASR, speaker diarization, meeting transcription, and conversational AI models.

Highlights

  • Speaker diarization with labeled turns
  • Per-utterance emotion labels with confidence scores
  • 18 emotion categories: Joy, Determination, Interest, Calmness, Confusion, and more
  • Word-level timestamps for precise alignment
  • Natural conversation dynamics: turn-taking, interruptions, overlapping speech
  • Diverse contexts: casual discussions and formal business meetings
  • Custom annotations available
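As an illustration of how labeled turns can expose overlapping speech, here is a minimal Python sketch. The `Turn` structure and its field names are assumptions made for this example and are not the dataset's published schema; the times are in seconds.

```python
from typing import NamedTuple


class Turn(NamedTuple):
    # Hypothetical representation of one diarized speaker turn.
    speaker: str
    start: float  # seconds
    end: float    # seconds


def overlapping_pairs(turns):
    """Return (earlier_turn, later_turn, overlap_seconds) for turns by
    different speakers that overlap in time."""
    ordered = sorted(turns, key=lambda t: t.start)
    overlaps = []
    for i, first in enumerate(ordered):
        for second in ordered[i + 1:]:
            # Turns are sorted by start time, so once one starts after
            # `first` ends, no later turn can overlap `first` either.
            if second.start >= first.end:
                break
            if second.speaker != first.speaker:
                duration = min(first.end, second.end) - second.start
                overlaps.append((first, second, duration))
    return overlaps


example_turns = [
    Turn("A", 0.0, 2.0),
    Turn("B", 1.5, 3.0),  # starts before A finishes: an interruption
    Turn("A", 3.0, 4.0),
]
example_overlaps = overlapping_pairs(example_turns)
```

In this made-up example, only the first two turns overlap, by half a second. Real diarization output may need a tolerance for boundary jitter, which this sketch leaves out.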

Deliverables

Files

  • WAV audio files (48kHz 24-bit mono)
  • JSON transcripts with speaker diarization labels
  • Word-level timestamps per speaker
  • Per-utterance emotion labels with confidence scores

Audio Specs

48kHz sample rate, 24-bit depth, 1152 kbps, mono
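The listed bitrate follows from the other three figures: 48,000 samples per second × 24 bits × 1 channel = 1,152,000 bits per second, or 1152 kbps. The sketch below reproduces that arithmetic and checks a WAV header against the spec using Python's standard `wave` module. The function names are illustrative, and the check assumes plain PCM WAV files, which the listing does not state explicitly.

```python
import wave

EXPECTED_RATE_HZ = 48_000
EXPECTED_BIT_DEPTH = 24
EXPECTED_CHANNELS = 1


def pcm_bitrate_kbps(sample_rate_hz, bit_depth, channels):
    """Uncompressed PCM bitrate in kilobits per second."""
    return sample_rate_hz * bit_depth * channels / 1000


def matches_spec(path):
    """Return True if the WAV header at `path` matches the listed specs."""
    with wave.open(path, "rb") as wav:
        return (
            wav.getframerate() == EXPECTED_RATE_HZ
            and wav.getsampwidth() * 8 == EXPECTED_BIT_DEPTH  # bytes -> bits
            and wav.getnchannels() == EXPECTED_CHANNELS
        )


listed_bitrate = pcm_bitrate_kbps(
    EXPECTED_RATE_HZ, EXPECTED_BIT_DEPTH, EXPECTED_CHANNELS
)
```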

Transcription Format

JSON with speaker labels, word timestamps, emotion annotations
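The exact JSON schema is not published here, so the following Python sketch uses a hypothetical layout to show how such a transcript might be consumed. Every key name (`utterances`, `speaker`, `emotion`, `confidence`, `words`, and so on) and every value in the sample is an assumption made for illustration; request a real sample file before relying on any of them.

```python
import json

# Hypothetical transcript; the key names and values are assumed, not documented.
SAMPLE = """
{
  "utterances": [
    {"speaker": "S1", "start": 0.0, "end": 1.2, "text": "Shall we start",
     "emotion": {"label": "Determination", "confidence": 0.91},
     "words": [{"word": "Shall", "start": 0.0, "end": 0.3},
               {"word": "we", "start": 0.3, "end": 0.45},
               {"word": "start", "start": 0.45, "end": 1.2}]},
    {"speaker": "S2", "start": 1.1, "end": 2.0, "text": "Wait what",
     "emotion": {"label": "Confusion", "confidence": 0.48},
     "words": [{"word": "Wait", "start": 1.1, "end": 1.5},
               {"word": "what", "start": 1.5, "end": 2.0}]}
  ]
}
"""


def confident_emotions(transcript, min_confidence):
    """List (speaker, emotion label) for utterances at or above a threshold."""
    return [
        (u["speaker"], u["emotion"]["label"])
        for u in transcript["utterances"]
        if u["emotion"]["confidence"] >= min_confidence
    ]


def words_in_window(transcript, t_start, t_end):
    """Return words that fall entirely inside [t_start, t_end] seconds."""
    return [
        w["word"]
        for u in transcript["utterances"]
        for w in u["words"]
        if w["start"] >= t_start and w["end"] <= t_end
    ]


transcript = json.loads(SAMPLE)
high_confidence = confident_emotions(transcript, 0.5)
window_words = words_in_window(transcript, 0.3, 1.5)
```

Filtering on the confidence score is one plausible way to use it, for example to drop low-confidence emotion labels from a training set. The 0.5 threshold is arbitrary.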

Contact us for samples and volume pricing. See our monologue dataset for single-speaker content.