Tests for streaming responses

Streaming responses have their own contract. The tests verify the stream, not just the final text.

Yash Shah · March 5, 2026 · 2 min read

A team's streaming endpoint returned text that was eventually correct but arrived in chunks that broke their UI. The tests checked the assembled final text, and it was clean. The stream itself was broken: users saw partial words, missing punctuation, and jarring transitions.

Streaming has its own contract. The tests verify the stream, not just the destination.

The streaming contract

Things to assert about streams:

Token boundaries. Chunks shouldn't split a word in the middle when the consumer cares about word boundaries.
Latency. First token within a set threshold (X ms); subsequent tokens at an acceptable cadence.
Order. Tokens arrive in order.
Completeness. All tokens arrive; the stream completes.
Recoverability. If the stream drops, it can resume or restart.
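
As a minimal sketch, the order, completeness, and boundary checks above can run against a recorded list of chunks. The `Chunk` shape here (a `seq` number and a `final` marker) and the `contract_violations` helper are assumptions for illustration, not any particular framework's API; a real stream may signal ordering and completion differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    seq: int             # hypothetical sequence number set by the server
    text: str
    final: bool = False  # hypothetical end-of-stream marker


def contract_violations(chunks: list[Chunk]) -> list[str]:
    """Return a human-readable list of contract violations (empty if none)."""
    if not chunks:
        return ["completeness: stream produced no chunks"]
    problems = []

    # Order: sequence numbers count up from zero with no gaps or swaps.
    for index, chunk in enumerate(chunks):
        if chunk.seq != index:
            problems.append(f"order: expected seq {index}, got {chunk.seq}")
            break

    # Completeness: exactly one final marker, and it is the last chunk.
    if not chunks[-1].final:
        problems.append("completeness: stream ended without a final chunk")
    if any(chunk.final for chunk in chunks[:-1]):
        problems.append("completeness: chunks arrived after the final marker")

    # Boundaries: a word is split when one chunk ends and the next begins
    # with an alphanumeric character.
    for prev, nxt in zip(chunks, chunks[1:]):
        if prev.text and nxt.text and prev.text[-1].isalnum() and nxt.text[0].isalnum():
            problems.append(f"boundary: word split between seq {prev.seq} and {nxt.seq}")
    return problems


good = [Chunk(0, "Hello, "), Chunk(1, "world."), Chunk(2, "", final=True)]
bad = [Chunk(0, "Hel"), Chunk(1, "lo, world.")]  # mid-word split, never finishes
```

Returning a list of violations rather than failing on the first one lets a single test run report every way a stream broke its contract.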

Assertion patterns

Common streaming-test assertions:

Time-to-first-token under threshold.
Total stream duration under threshold.
Final assembled text matches expected.
Stream has no premature termination.
Stream chunks parse correctly (for structured streaming).
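
One way to make the timing assertions above deterministic is to inject the clock, so the test controls when each chunk "arrives". This is a sketch under assumptions: `measure_stream`, `FakeClock`, `fake_stream`, and the 0.2 s and 1.0 s thresholds are all hypothetical, and a real suite would pick thresholds from its own latency budget.

```python
import time
from typing import Callable, Iterable, Iterator


def measure_stream(chunks: Iterable[str], clock: Callable[[], float] = time.monotonic) -> dict:
    """Consume a stream, recording time-to-first-token and total duration."""
    start = clock()
    ttft = None
    parts = []
    for chunk in chunks:
        if ttft is None:
            ttft = clock() - start  # first chunk observed
        parts.append(chunk)
    return {"ttft": ttft, "total": clock() - start, "text": "".join(parts)}


class FakeClock:
    """Test clock that only moves when told to."""

    def __init__(self) -> None:
        self.now = 0.0

    def __call__(self) -> float:
        return self.now

    def advance(self, seconds: float) -> None:
        self.now += seconds


def fake_stream(clock: FakeClock, timed_chunks: list[tuple[float, str]]) -> Iterator[str]:
    """Yield each chunk after advancing the fake clock by its delay."""
    for delay, chunk in timed_chunks:
        clock.advance(delay)
        yield chunk


clock = FakeClock()
result = measure_stream(fake_stream(clock, [(0.12, "Hello, "), (0.03, "world.")]), clock)

assert result["ttft"] is not None, "stream ended before any chunk arrived"
assert result["ttft"] <= 0.2    # hypothetical time-to-first-token budget
assert result["total"] <= 1.0   # hypothetical total-duration budget
assert result["text"] == "Hello, world."
```

Against a live endpoint the same `measure_stream` runs with the default `time.monotonic` clock; the fake clock keeps the unit-level check from flaking on a slow CI machine.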

Reviewer ritual

PR review for streaming changes:

Streaming tests included.
Latency assertions verified.
Edge cases tested (slow consumers, dropped connections).

A real test

A team's streaming-test suite:

20 cases asserting time-to-first-token.
10 cases asserting completion.
10 cases asserting structured-stream parsing (tokens that need to be valid JSON when assembled).
5 cases asserting recovery from disconnects.

These ran on every PR for the streaming feature and caught issues that final-output tests missed.
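
The structured-stream cases could look roughly like the following, assuming the chunks are plain string fragments of a single JSON document. The `assemble_json` helper and the sample payload are made up for illustration and are not the team's actual suite.

```python
import json


def assemble_json(chunks: list[str]):
    """Join streamed fragments and parse them, failing loudly on invalid JSON."""
    text = "".join(chunks)
    try:
        return json.loads(text)
    except json.JSONDecodeError as exc:
        raise AssertionError(f"assembled stream is not valid JSON: {exc}") from exc


# Fragments may split anywhere, including inside an array.
complete = ['{"status": "ok", ', '"items": [1, 2', ', 3]}']
truncated = complete[:-1]  # simulates a stream that stopped early

assert assemble_json(complete) == {"status": "ok", "items": [1, 2, 3]}
```

Passing `truncated` to `assemble_json` raises an `AssertionError`, which is the failure a final-text comparison on a prematurely terminated stream would need to surface.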

Coverage

Streaming coverage:

Happy path (typical responses).
Long responses (sustained streaming).
Quick responses (latency edge).
Error mid-stream.
Disconnection mid-stream.
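
For the disconnection case, a sketch of a resume test is shown here. It assumes the endpoint can reopen a stream from a chunk offset; `open_stream(offset)`, `read_with_resume`, and `make_flaky_stream` are hypothetical names, and a server that only supports restarting from the beginning would need a different consumer.

```python
from typing import Callable, Iterator


def read_with_resume(open_stream: Callable[[int], Iterator[str]], max_retries: int = 3) -> list[str]:
    """Read a stream to completion, resuming from the last received offset."""
    received: list[str] = []
    retries = 0
    while True:
        try:
            for chunk in open_stream(len(received)):
                received.append(chunk)
            return received
        except ConnectionError:
            retries += 1
            if retries > max_retries:
                raise  # give up: the stream is not recoverable


def make_flaky_stream(chunks: list[str], drop_after: int) -> Callable[[int], Iterator[str]]:
    """Build a fake stream that disconnects once, just before chunk `drop_after`."""
    state = {"dropped": False}

    def open_stream(offset: int) -> Iterator[str]:
        for index, chunk in enumerate(chunks[offset:], start=offset):
            if index == drop_after and not state["dropped"]:
                state["dropped"] = True
                raise ConnectionError("simulated disconnect")
            yield chunk

    return open_stream


chunks = ["The ", "stream ", "resumes ", "cleanly."]
received = read_with_resume(make_flaky_stream(chunks, drop_after=2))

# After one disconnect, nothing is duplicated and nothing is lost.
assert received == chunks
```

The same fake can cover the error-mid-stream case by raising a different exception type and asserting that the consumer surfaces it instead of retrying.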

What we won't ship

Streaming endpoints tested only by final-text comparison.

Latency assertions missing.

Streams that don't recover gracefully from interruptions.

Tests that rely on real-time behaviour without timing assertions.

Close

Tests for streaming responses verify the stream itself, not just the assembled output. Latency, order, completion, and recovery each get asserted. Skip these and the user-facing stream can break in ways the team never catches.