registry

Sample catalog and ground-truth registry.

SampleLang

Bases: StrEnum

Supported sample languages.
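The following is a minimal sketch of what a StrEnum-based language enum provides. It assumes hypothetical members EN = "en" and ES = "es", because the real members are not listed on this page. Each member is also a str, so it compares equal to its plain value and formats as that value. That behavior fits sample names such as en_hello_world. The try/except fallback exists only so the sketch also runs on interpreters older than Python 3.11.

```python
from enum import Enum

try:
    from enum import StrEnum  # Python 3.11+
except ImportError:  # older interpreters: minimal stand-in with the same str behavior
    class StrEnum(str, Enum):
        def __str__(self) -> str:
            return str(self.value)


class SampleLang(StrEnum):
    # Hypothetical members; the real enum's values are not documented here.
    EN = "en"
    ES = "es"


# Members are real str instances, so they work as dict keys, name prefixes,
# or request fields without calling .value.
prefix = f"{SampleLang.EN}_hello_world"
```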

AudioSample dataclass

AudioSample(
    name: str,
    lang: SampleLang,
    reference_text: str,
    audio_path: Path,
    sample_rate: int = 16000,
    duration_ms: int = 0,
)

Ground-truth pair: an audio file and its expected transcription.
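The sketch below mirrors the documented fields with a plain dataclass and constructs one sample. The path and reference text are hypothetical, and SampleLang is a stand-in with a single assumed member. The helper duration_ms_for is also hypothetical. It shows the usual relation between sample count, sample rate, and duration (duration_ms = num_samples * 1000 / sample_rate), which is one plausible way to fill duration_ms when it is left at 0.

```python
from dataclasses import dataclass
from enum import Enum
from pathlib import Path


class SampleLang(str, Enum):  # stand-in for the documented StrEnum
    EN = "en"


@dataclass
class AudioSample:
    """Ground-truth pair: an audio file and its expected transcription."""

    name: str
    lang: SampleLang
    reference_text: str
    audio_path: Path
    sample_rate: int = 16000
    duration_ms: int = 0


def duration_ms_for(num_samples: int, sample_rate: int = 16000) -> int:
    # Hypothetical helper: 24_000 samples at 16 kHz -> 1500 ms.
    return num_samples * 1000 // sample_rate


sample = AudioSample(
    name="en_hello_world",
    lang=SampleLang.EN,
    reference_text="hello world",  # hypothetical transcription
    audio_path=Path("fixtures/en_hello_world.wav"),  # hypothetical location
)
```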

audio_bytes

audio_bytes() -> bytes

Raw file bytes.
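Because audio_bytes returns the raw file contents, a WAV fixture comes back with its RIFF header included, not as a bare PCM payload. The sketch below uses a hypothetical free function in place of the method and assumes the method simply reads audio_path. It demonstrates that behavior with a temporary file.

```python
import tempfile
from pathlib import Path


def audio_bytes(audio_path: Path) -> bytes:
    # Read the file verbatim: for a WAV file this includes the header,
    # so the result is not directly usable as PCM samples.
    return audio_path.read_bytes()


payload = b"RIFF\x24\x00\x00\x00WAVE"  # first bytes of a typical WAV header
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "example.wav"
    path.write_bytes(payload)
    data = audio_bytes(path)
```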

audio_numpy

audio_numpy() -> np.ndarray

Load the audio as a float32 NumPy array.
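This page does not state the on-disk format or the scaling used. A common approach, sketched here under the assumption of mono 16-bit PCM WAV fixtures, decodes the frames with the standard-library wave module. It then divides by 2**15 so that samples land in [-1.0, 1.0). The function below is a hypothetical stand-in that takes bytes instead of reading audio_path.

```python
import io
import wave

import numpy as np


def audio_numpy(wav_bytes: bytes) -> np.ndarray:
    """Decode mono 16-bit PCM WAV bytes into float32 samples in [-1.0, 1.0)."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wf:
        if wf.getnchannels() != 1 or wf.getsampwidth() != 2:
            raise ValueError("expected mono 16-bit PCM WAV")
        frames = wf.readframes(wf.getnframes())
    pcm16 = np.frombuffer(frames, dtype="<i2")  # WAV stores samples little-endian
    # Divide by 2**15 so the int16 range [-32768, 32767] maps into [-1.0, 1.0).
    return pcm16.astype(np.float32) / np.float32(32768)


# Build a tiny in-memory WAV to exercise the decoder.
buf = io.BytesIO()
with wave.open(buf, "wb") as out:
    out.setnchannels(1)
    out.setsampwidth(2)
    out.setframerate(16000)
    out.writeframes(np.array([0, 16384, -32768], dtype="<i2").tobytes())

samples = audio_numpy(buf.getvalue())
```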

chunks

chunks(chunk_ms: int = 200) -> list[bytes]

Split the audio into float32 PCM chunks of chunk_ms milliseconds each, for streaming. O(n/chunk_size).
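Here is the arithmetic behind the default chunk size. At 16 kHz, 200 ms is 16000 * 200 / 1000 = 3200 samples, which is 12800 bytes at 4 bytes per float32 sample. The following sketch assumes the samples are already decoded and leaves the final chunk short instead of padding it; the real method may handle the tail differently. It uses the standard-library array module in native byte order. In this sketch the O(n/chunk_size) figure counts the slices produced, while the total bytes copied are still linear in the audio length.

```python
from array import array


def chunks(samples: array, sample_rate: int = 16000, chunk_ms: int = 200) -> list[bytes]:
    """Split float32 samples into chunk_ms-long byte chunks (the last may be short)."""
    per_chunk = sample_rate * chunk_ms // 1000  # 3200 samples at 16 kHz / 200 ms
    if per_chunk <= 0:
        raise ValueError("chunk_ms is too small for this sample rate")
    raw = samples.tobytes()
    step = per_chunk * samples.itemsize  # 4 bytes per float32 sample -> 12800 bytes
    # One slice per chunk: ceil(n / per_chunk) slices in total.
    return [raw[i : i + step] for i in range(0, len(raw), step)]


one_second = array("f", [0.0] * 16000)
parts = chunks(one_second)  # five 200 ms chunks
```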

chunks_pcm16

chunks_pcm16(chunk_ms: int = 200) -> list[bytes]

Split the audio into PCM16 (int16) chunks of chunk_ms milliseconds each, for streaming. O(n/chunk_size).
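PCM16 chunks are half the size of float32 chunks, for example 6400 bytes per 200 ms at 16 kHz. The sketch below shows one common float-to-int16 conversion, and several of its choices are assumptions not confirmed by this page. It clips to [-1.0, 1.0], scales by 32767, and forces little-endian output, which most streaming endpoints expect. As with the float32 version, the last chunk is left short.

```python
import sys
from array import array


def chunks_pcm16(samples: array, sample_rate: int = 16000, chunk_ms: int = 200) -> list[bytes]:
    """Convert float32 samples to little-endian int16 PCM and split into chunk_ms chunks."""
    # Clip first so out-of-range floats cannot overflow int16.
    pcm = array("h", (round(max(-1.0, min(1.0, s)) * 32767) for s in samples))
    if sys.byteorder == "big":
        pcm.byteswap()  # array uses native order; emit little-endian regardless
    per_chunk = sample_rate * chunk_ms // 1000  # 3200 samples at 16 kHz / 200 ms
    if per_chunk <= 0:
        raise ValueError("chunk_ms is too small for this sample rate")
    step = per_chunk * pcm.itemsize  # 2 bytes per int16 sample -> 6400 bytes
    raw = pcm.tobytes()
    return [raw[i : i + step] for i in range(0, len(raw), step)]


# Tiny rate so each 200 ms chunk holds just 2 samples.
parts = chunks_pcm16(array("f", [1.0, -1.0, 0.25, 2.0]), sample_rate=10, chunk_ms=200)
```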

SampleRegistry

SampleRegistry()

Catalog of embedded audio fixtures. O(1) lookup by name.
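The O(1) lookup is most naturally backed by a dict keyed on sample name, which gives average constant-time access. This is a minimal sketch under that assumption. Sample is a simplified stand-in for AudioSample, and get is a hypothetical helper that is not part of the documented API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:  # simplified stand-in for AudioSample
    name: str
    lang: str


class SampleRegistry:
    """Catalog keyed by sample name; dict lookup gives average O(1) access."""

    def __init__(self) -> None:
        self._by_name: dict[str, Sample] = {}

    def register(self, sample: Sample) -> None:
        self._by_name[sample.name] = sample

    def get(self, name: str) -> Sample:  # hypothetical helper, not in the documented API
        return self._by_name[name]

    def all(self) -> list[Sample]:
        # dicts preserve insertion order, so samples come back in registration order.
        return list(self._by_name.values())


registry = SampleRegistry()
registry.register(Sample("en_hello_world", "en"))
```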

__getattr__

__getattr__(name: str) -> AudioSample

Attribute-style access: samples.en_hello_world.
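Python only calls __getattr__ after normal attribute lookup fails, so real methods such as all and by_lang take precedence over any sample with the same name. The sketch below assumes the registry stores samples in a dict, as before. It uses a stand-in constructor that accepts samples directly, which differs from the documented no-argument SampleRegistry(). The hook raises AttributeError rather than KeyError so that hasattr and getattr with a default keep working.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:  # simplified stand-in for AudioSample
    name: str
    lang: str


class SampleRegistry:
    def __init__(self, *samples: Sample) -> None:  # stand-in constructor
        self._by_name = {s.name: s for s in samples}

    def __getattr__(self, name: str) -> Sample:
        # Read _by_name through __dict__ so a half-initialised instance
        # (e.g. during copy or unpickling) cannot recurse into __getattr__.
        try:
            return self.__dict__["_by_name"][name]
        except KeyError:
            # AttributeError keeps hasattr() and getattr(obj, x, default) correct.
            raise AttributeError(f"no sample named {name!r}") from None


samples = SampleRegistry(Sample("en_hello_world", "en"))
hello = samples.en_hello_world
```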

all

all() -> list[AudioSample]

All registered samples.

by_lang

by_lang(lang: SampleLang) -> list[AudioSample]

Filter samples by language. O(n) in the number of registered samples.
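The O(n) bound is consistent with a single linear scan over the catalog, sketched below. SampleLang and Sample are stand-ins, and the es_hola sample is hypothetical. If filtering became hot, a per-language index maintained in register() would cut each query to O(k) for k matches, at the cost of extra bookkeeping.

```python
from dataclasses import dataclass
from enum import Enum


class SampleLang(str, Enum):  # stand-in; members are hypothetical
    EN = "en"
    ES = "es"


@dataclass(frozen=True)
class Sample:  # simplified stand-in for AudioSample
    name: str
    lang: SampleLang


def by_lang(samples: list[Sample], lang: SampleLang) -> list[Sample]:
    # One pass over every registered sample: O(n) in the catalog size.
    return [s for s in samples if s.lang == lang]


catalog = [
    Sample("en_hello_world", SampleLang.EN),
    Sample("es_hola", SampleLang.ES),  # hypothetical sample
]
spanish = by_lang(catalog, SampleLang.ES)
```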

register

register(sample: AudioSample) -> None

Register a custom, project-specific sample.
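A project would typically add its own fixtures next to the embedded ones, as in the sketch below. This page does not say what happens on a name clash. This sketch therefore adopts a hypothetical policy of rejecting duplicates so a project fixture cannot silently replace an embedded sample; the real method may overwrite instead. The acme_order_status sample and its path are also hypothetical.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class Sample:  # simplified stand-in for AudioSample
    name: str
    reference_text: str
    audio_path: Path


class SampleRegistry:
    def __init__(self) -> None:
        self._by_name: dict[str, Sample] = {}

    def register(self, sample: Sample) -> None:
        # Hypothetical policy: refuse to shadow an existing sample.
        if sample.name in self._by_name:
            raise ValueError(f"sample {sample.name!r} is already registered")
        self._by_name[sample.name] = sample


registry = SampleRegistry()
registry.register(
    Sample(
        name="acme_order_status",  # hypothetical project-specific fixture
        reference_text="where is my order",
        audio_path=Path("tests/fixtures/acme_order_status.wav"),
    )
)
```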