A team's MCP server passed all its unit tests. An AI assistant connected and got malformed responses. The server was buggy at the protocol level — schemas didn't match runtime behaviour.
MCP servers need testing at the protocol level, not just the implementation level.
## The test plan
For each MCP server:
- Unit tests. Each tool's logic.
- Integration tests. Each tool against real or stubbed dependencies.
- Protocol tests. The server speaks MCP correctly.
- Regression tests. One test per bug that has shipped and been fixed.
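The layers map naturally onto pytest markers, so each can run as its own CI job. A sketch of the unit layer; `lookup_order` and the marker names are hypothetical, not from any particular server:

```python
# Hypothetical tool logic plus two unit-layer tests. Markers let CI select
# layers separately: `pytest -m unit`, `pytest -m protocol`, and so on.
import pytest

def lookup_order(order_id: str) -> dict:
    """Made-up tool: resolve an order id to a status record."""
    if not order_id.startswith("ord_"):
        raise ValueError(f"malformed order id: {order_id!r}")
    return {"id": order_id, "status": "shipped"}

@pytest.mark.unit
def test_lookup_order_returns_status():
    assert lookup_order("ord_123") == {"id": "ord_123", "status": "shipped"}

@pytest.mark.unit
def test_lookup_order_rejects_malformed_ids():
    with pytest.raises(ValueError):
        lookup_order("123")
```

Custom markers should be registered in `pytest.ini` (`markers = unit: ...`) so `--strict-markers` doesn't reject them.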
## Protocol tests
Test the MCP contract:
- `list_tools` returns the expected tools.
- Tool schemas match what the implementation expects.
- Errors are returned in the expected format.
- Stdio/HTTP transport handles edge cases.
These catch issues that unit tests miss.
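A minimal version of the first two checks, written as a pure validator over a raw JSON-RPC `tools/list` response. The field names (`tools`, `name`, `inputSchema`) follow the MCP specification; the validator itself is our sketch, not a standard harness:

```python
import json

def validate_tools_list(raw: str) -> list[str]:
    """Return a list of problems; empty means the response looks well-formed."""
    problems = []
    msg = json.loads(raw)
    if msg.get("jsonrpc") != "2.0":
        problems.append("missing jsonrpc version")
    if "id" not in msg:
        problems.append("response has no id")
    tools = msg.get("result", {}).get("tools")
    if not isinstance(tools, list):
        return problems + ["result.tools is not a list"]
    for i, tool in enumerate(tools):
        if "name" not in tool:
            problems.append(f"tool {i} has no name")
        schema = tool.get("inputSchema")
        if not isinstance(schema, dict) or schema.get("type") != "object":
            problems.append(f"tool {i} inputSchema is not an object schema")
    return problems

# A well-formed response passes cleanly.
good = json.dumps({
    "jsonrpc": "2.0", "id": 1,
    "result": {"tools": [{"name": "lookup_order",
                          "inputSchema": {"type": "object", "properties": {}}}]},
})
assert validate_tools_list(good) == []
```

In a real harness the `raw` string comes off the wire: spawn the server over stdio, send `initialize` and `tools/list`, and feed each response line through checks like these.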
## Reviewer ritual
PR review:
- Tests for new tools.
- Protocol tests still pass.
- Regression tests for fixed bugs.
## A real implementation
A team's MCP server testing:
- pytest for unit + integration.
- A protocol-test harness that connects to the server and exercises the contract.
- Regression test added per fixed bug.
- CI runs all three layers.
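The regression layer pins one test to each fixed bug. A sketch, with a made-up bug and a stand-in dispatcher; the error code follows JSON-RPC's `-32602` (invalid params), which MCP uses for unknown tools:

```python
# Hypothetical regression test: the bug was the server crashing on an unknown
# tool name instead of returning a structured JSON-RPC error.
def call_tool(name: str, arguments: dict) -> dict:
    """Stand-in for the server's tool dispatch (real code routes to handlers)."""
    handlers = {
        "lookup_order": lambda args: {"content": [{"type": "text", "text": "ok"}]},
    }
    if name not in handlers:
        # The fixed behaviour: a structured error, never an unhandled exception.
        return {"error": {"code": -32602, "message": f"unknown tool: {name}"}}
    return {"result": handlers[name](arguments)}

def test_regression_unknown_tool_returns_error_not_crash():
    response = call_tool("no_such_tool", {})
    assert response["error"]["code"] == -32602
```

Naming the test after the bug keeps the "one regression test per fixed bug" rule auditable in review.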
## Trade-offs
Protocol tests are slower than unit tests. They run in their own job. The trade-off is catching issues that unit tests can't.
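Split by marker, the layering becomes three CI jobs. A sketch; the marker names (`unit`, `integration`, `protocol`) are our assumption, not a standard:

```shell
pytest -m unit          # fast, runs on every push
pytest -m integration   # tools against stubbed dependencies
pytest -m protocol      # spawns the server, exercises the transport
```

The slow protocol job can gate merges without dragging down the inner loop.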
## What we won't ship

- MCP servers without protocol-level tests.
- Tests that don't exercise the actual transport.
- Servers that pass tests but fail in real assistant connections.
## Close
MCP testing covers all three layers — implementation, integration, protocol. The protocol layer is MCP-specific. Skip it and the server has subtle bugs that surface only in production.
## Related reading
- The new test pyramid — surrounding discipline.
- Integration tests for AI features — companion pattern.
- Tool design like APIs — surrounding discipline.
We build AI-enabled software and help businesses put AI to work. If you're testing MCP servers, we'd love to hear about it. Get in touch.