stt

STT evaluation client — httpx + httpx-ws + httpx-sse under the hood.

AudioEncoding

Bases: StrEnum

Wire encoding for WebSocket audio frames.

STTResult dataclass

STTResult(
    hypothesis_text: str = "",
    text_metrics: TextMetrics | None = None,
    latency_ms: float = 0.0,
    chunks_received: int = 0,
    fragments: list[str] = list(),
)

STT evaluation result with optional metrics.

assert_quality

assert_quality(
    *, max_wer: float = 0.2, max_cer: float = 0.15
) -> Self

Assert STT quality. Chainable.
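A minimal sketch of the chainable check, written against stand-in classes rather than the real ones. `TextMetricsSketch`, `ResultSketch`, the `wer`/`cer` field names, and the strict `>` comparison are all assumptions made for illustration; only the keyword names and defaults come from the signature above.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class TextMetricsSketch:
    """Hypothetical stand-in for TextMetrics; real field names may differ."""

    wer: float
    cer: float


@dataclass
class ResultSketch:
    """Hypothetical stand-in for STTResult, reduced to what the check needs."""

    hypothesis_text: str = ""
    text_metrics: TextMetricsSketch | None = None

    def assert_quality(
        self, *, max_wer: float = 0.2, max_cer: float = 0.15
    ) -> ResultSketch:
        # Assumption: metrics must already be present before asserting.
        if self.text_metrics is None:
            raise AssertionError("metrics have not been computed")
        # Assumption: a value equal to the threshold still passes.
        if self.text_metrics.wer > max_wer:
            raise AssertionError(f"WER {self.text_metrics.wer:.3f} > {max_wer}")
        if self.text_metrics.cer > max_cer:
            raise AssertionError(f"CER {self.text_metrics.cer:.3f} > {max_cer}")
        # Returning self is what makes the call chainable.
        return self
```

Because the method returns the same object, a test can keep working with the result after the check, for example `text = result.assert_quality(max_wer=0.1).hypothesis_text`.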

compute_metrics

compute_metrics(reference: str) -> Self

Compute WER/CER against reference. Chainable.
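For orientation, WER and CER are conventionally defined as the Levenshtein edit distance between reference and hypothesis, divided by the reference length, counted in words and characters respectively. The sketch below shows that textbook definition in plain Python. It is not taken from this library: the function names are illustrative, and any text normalization the real `compute_metrics` may apply (casing, punctuation) is not modelled.

```python
from __future__ import annotations

from collections.abc import Sequence


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance: insertions, deletions, substitutions cost 1."""
    prev = list(range(len(hyp) + 1))  # distances from the empty reference
    for i, r in enumerate(ref, start=1):
        curr = [i]  # deleting the first i reference items
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(
                min(
                    prev[j] + 1,         # deletion
                    curr[j - 1] + 1,     # insertion
                    prev[j - 1] + cost,  # substitution or match
                )
            )
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate over whitespace-separated tokens."""
    ref_words = reference.split()
    hyp_words = hypothesis.split()
    if not ref_words:
        # Assumption: an empty reference scores 0.0 only for an empty hypothesis.
        return 0.0 if not hyp_words else 1.0
    return edit_distance(ref_words, hyp_words) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate over raw characters, spaces included."""
    if not reference:
        return 0.0 if not hypothesis else 1.0
    return edit_distance(reference, hypothesis) / len(reference)
```

Both rates can exceed 1.0 when the hypothesis is much longer than the reference, which is worth keeping in mind when choosing thresholds.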

STTSession

STTSession(
    *,
    session: AsyncWebSocketSession,
    sample: AudioSample | None,
)

Active WebSocket session for STT evaluation.

send_bytes async

send_bytes(data: bytes) -> None

Send binary audio data.

send_text async

send_text(data: str) -> None

Send text (JSON config, END_OF_AUDIO, etc.).

send_sample async

send_sample(
    sample: AudioSample,
    *,
    chunk_ms: int = 200,
    encoding: AudioEncoding = AudioEncoding.FLOAT32,
) -> None

Stream sample in chunks with realistic pacing.

encoding controls wire format

FLOAT32 → binary frame, raw float32 (default)
PCM16 → binary frame, raw int16
PCM16_BASE64 → text frame, base64-encoded int16
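The three wire formats can be sketched with the standard library alone. Everything below is an assumption made for illustration: the `Encoding` enum is a stand-in for `AudioEncoding` (its member values are invented), samples are taken to be floats in [-1.0, 1.0], byte order is little-endian, and int16 scaling uses 32767. The real `send_sample` may differ on any of these points.

```python
import base64
import struct
from enum import Enum


class Encoding(str, Enum):
    """Stand-in for AudioEncoding; the member values are assumptions."""

    FLOAT32 = "float32"
    PCM16 = "pcm16"
    PCM16_BASE64 = "pcm16_base64"


def chunk_samples(samples, sample_rate, chunk_ms=200):
    """Split samples into consecutive chunks of roughly chunk_ms each."""
    step = max(1, sample_rate * chunk_ms // 1000)  # samples per chunk
    return [samples[i:i + step] for i in range(0, len(samples), step)]


def to_pcm16(samples):
    """Clip floats to [-1.0, 1.0] and pack them as little-endian int16."""
    ints = [int(max(-1.0, min(1.0, s)) * 32767) for s in samples]
    return struct.pack(f"<{len(ints)}h", *ints)


def encode_frame(samples, encoding):
    """Return bytes for a binary frame, or str for a text frame."""
    if encoding is Encoding.FLOAT32:
        return struct.pack(f"<{len(samples)}f", *samples)
    pcm = to_pcm16(samples)
    if encoding is Encoding.PCM16:
        return pcm
    # PCM16_BASE64 travels as text, so decode the base64 bytes to str.
    return base64.b64encode(pcm).decode("ascii")
```

One plausible reading of "realistic pacing" is a pause of about `chunk_ms / 1000` seconds between frames, so that audio arrives at roughly real-time speed; that interpretation is a guess, not something this page states.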

receive_text async

receive_text(*, timeout: float | None = None) -> str

Receive a text frame and accumulate it as a fragment.

receive_bytes async

receive_bytes(*, timeout: float | None = None) -> bytes

Receive binary frame.

result

result() -> STTResult

Build STTResult from accumulated fragments.
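The receive-then-build flow can be pictured with a small stand-in. `FragmentCollector` is hypothetical: an `asyncio.Queue` plays the role of the WebSocket, the timeout is enforced with `asyncio.wait_for`, and fragments are joined with single spaces. How the real `STTSession` combines fragments or reports a timeout is not stated on this page.

```python
from __future__ import annotations

import asyncio


class FragmentCollector:
    """Hypothetical stand-in for the receive/accumulate part of STTSession."""

    def __init__(self, incoming: asyncio.Queue) -> None:
        self._incoming = incoming  # stands in for the WebSocket connection
        self.fragments: list[str] = []

    async def receive_text(self, *, timeout: float | None = None) -> str:
        # wait_for raises a timeout error if nothing arrives within `timeout`;
        # timeout=None waits indefinitely.
        text = await asyncio.wait_for(self._incoming.get(), timeout)
        self.fragments.append(text)
        return text

    def hypothesis(self) -> str:
        # Assumption: fragments are stripped and joined with single spaces.
        return " ".join(f.strip() for f in self.fragments if f.strip())


async def demo() -> str:
    queue: asyncio.Queue = asyncio.Queue()
    for piece in ("hello", " world "):
        queue.put_nowait(piece)
    collector = FragmentCollector(queue)
    await collector.receive_text(timeout=1.0)
    await collector.receive_text(timeout=1.0)
    return collector.hypothesis()
```

Running `asyncio.run(demo())` drains the two queued fragments and returns the joined text.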

STTClient

STTClient(*, url: str, timeout: float = 30.0)

STT evaluation client — HTTP batch + WebSocket streaming.

post async

post(
    *, data: bytes | None = None, **kwargs: Any
) -> httpx.Response

Batch POST audio to STT endpoint (e.g. OpenAI Whisper API). Returns raw httpx.Response.

stream async

stream(
    *, data: bytes | None = None, **kwargs: Any
) -> AsyncIterator[httpx.Response]

Chunked streaming POST. Yields httpx.Response for aiter_bytes/aiter_lines.

sse async

sse(
    *, data: bytes | None = None, **kwargs: Any
) -> AsyncIterator[EventSource]

SSE streaming POST. Yields EventSource for aiter_sse().
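As background on what an SSE stream carries, the sketch below parses the standard `text/event-stream` line format (`event:` and `data:` fields, blank line as event terminator, `:` for comments). It is a simplified illustration and not this library's parser: `SSEEvent` and `parse_sse` are made-up names, the `id:` and `retry:` fields are ignored, and the event names used when calling it are whatever the server chooses to send.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class SSEEvent:
    """Illustrative event record; not the library's own type."""

    event: str = "message"  # default event type per the SSE format
    data: str = ""


def parse_sse(lines):
    """Turn an iterable of text lines into a list of SSEEvent objects."""
    events = []
    event_type, data_lines = "message", []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line ends the current event; skip it if no data was seen.
            if data_lines:
                events.append(SSEEvent(event_type, "\n".join(data_lines)))
            event_type, data_lines = "message", []
            continue
        if line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]  # exactly one leading space is dropped
        if field == "event":
            event_type = value
        elif field == "data":
            data_lines.append(value)
    return events
```

Multiple `data:` lines within one event are joined with newlines, which matters when a server splits a long transcript across lines.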

ws async

ws(
    *, sample: AudioSample | None = None, **kwargs: Any
) -> AsyncIterator[STTSession]

Open WebSocket session for STT streaming (e.g. WhisperLive).

aclose async

aclose() -> None

No-op — clients are created per-call.