Some testing happens at launch. Other testing should never stop. The team that conflates these has either too much CI or too much production exposure.
This is the closing of the testing series. The post-launch test plan is what runs forever.
The forever suite
These tests run in production indefinitely:
- Drift detection: style, accuracy, and quality dimensions monitored over time.
- Performance monitoring: latency and cost tracked per call.
- Cost guardrails: budgets enforced automatically.
- PII detection: every output scanned for leaks.
- Prompt-injection monitoring: production traffic sampled for attack patterns.
- Eval cadence: periodic runs of the full eval suite.
Each runs continuously. Each has alerting on regression.
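As a concrete sketch of the first item: a drift check can be as small as a rolling-window comparison against a launch baseline. Everything below (the names, the baseline, the threshold) is an illustrative assumption, not a prescribed implementation:

```python
# Minimal drift check: compare a rolling window of production style
# scores against the baseline established at launch. All names and
# thresholds here are illustrative assumptions.
from statistics import mean

BASELINE_STYLE_SCORE = 0.87  # assumed launch baseline from the golden set
DRIFT_THRESHOLD = 0.05       # assumed tolerance before alerting

def send_alert(message: str) -> None:
    # Stand-in for the real alerting channel (pager, Slack, etc.).
    print(f"ALERT: {message}")

def check_style_drift(recent_scores: list[float]) -> None:
    """Alert if the rolling mean of production scores regresses past the threshold."""
    rolling = mean(recent_scores)
    if BASELINE_STYLE_SCORE - rolling > DRIFT_THRESHOLD:
        send_alert(
            f"style drift: rolling mean {rolling:.2f} "
            f"vs baseline {BASELINE_STYLE_SCORE:.2f}"
        )

check_style_drift([0.80, 0.79, 0.83, 0.78])  # rolling mean 0.80 -> fires the alert
```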
Cadence
Different tests run at different cadences:
- Real-time: cost guardrails, PII detection, prompt-injection detection.
- Daily: drift detection trends, eval-suite smoke runs.
- Weekly: full eval-suite runs.
- Monthly: comprehensive review of trends.
The cadence matches the cost of delayed detection for each issue type: a PII leak is expensive by the minute, so it's checked in real time; drift compounds slowly, so daily and weekly checks suffice.
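One way to encode this is a schedule table, sketched here with standard cron expressions. The check names and times are assumptions; the scheduler wiring is up to the team:

```python
# Hypothetical cadence table for the forever suite. "realtime" checks
# run inline on every request; the rest are standard cron expressions.
CADENCE = {
    "cost_guardrails":        "realtime",
    "pii_detection":          "realtime",
    "prompt_injection_check": "realtime",
    "drift_trend_check":      "0 6 * * *",  # daily at 06:00
    "eval_smoke_run":         "0 7 * * *",  # daily at 07:00
    "eval_full_run":          "0 5 * * 1",  # weekly, Monday at 05:00
    "trend_review_report":    "0 9 1 * *",  # monthly, first of the month
}
```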
Reviewer ritual
Trends reviewed weekly:
- Drift indicators.
- Performance metrics.
- Cost per call.
- Failure patterns.
Any significant move triggers an investigate-and-respond cycle.
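The mechanical half of that ritual can be automated: compute week-over-week deltas and flag the significant movers for the reviewer. A sketch, with illustrative metric names and a 10% threshold borrowed from the cost-alerting example in the plan below:

```python
# Weekly trend review helper: flag metrics that moved more than the
# threshold week-over-week. Metric names and threshold are assumptions.
SIGNIFICANT_MOVE = 0.10  # 10% week-over-week change triggers investigation

def weekly_review(this_week: dict[str, float],
                  last_week: dict[str, float]) -> list[str]:
    flagged = []
    for metric, current in this_week.items():
        previous = last_week.get(metric)
        if not previous:
            continue  # no baseline yet for this metric
        delta = (current - previous) / previous
        if abs(delta) > SIGNIFICANT_MOVE:
            flagged.append(f"{metric}: {delta:+.0%} week-over-week")
    return flagged

print(weekly_review(
    {"drift_score": 0.82, "p95_latency_ms": 1400, "cost_per_call": 0.013},
    {"drift_score": 0.86, "p95_latency_ms": 1150, "cost_per_call": 0.012},
))
# -> ['p95_latency_ms: +22% week-over-week']
```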
A real plan
A team's forever-running test plan:
- Production sampling: 1% of traffic gets re-evaluated against gold standard.
- Cost monitoring: per-feature, daily, alerts on >10% week-over-week movement.
- PII scanner: every output, real-time.
- Drift catcher: weekly style-eval against the team's voice rubric.
- Quarterly eval refresh: golden set updated based on production patterns.
The plan runs without consuming engineering bandwidth. Alerts route to the team. Issues surface fast.
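The first item, production sampling, is worth sketching because it is so cheap to build. A minimal version; `enqueue_for_eval` and the sampling rate are stand-ins, not the team's actual code:

```python
# Re-evaluate ~1% of production traffic against the gold standard.
# enqueue_for_eval is a stand-in for whatever queue feeds the eval job.
import random

SAMPLE_RATE = 0.01  # roughly 1% of traffic

def enqueue_for_eval(request_id: str, output: str) -> None:
    # Stand-in: a real system would persist the pair for the eval run.
    print(f"queued {request_id} for gold-standard re-evaluation")

def maybe_sample_for_eval(request_id: str, output: str) -> None:
    """Call on every production response; samples ~1% for re-evaluation."""
    if random.random() < SAMPLE_RATE:
        enqueue_for_eval(request_id, output)
```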
Trade-offs
Forever-running tests cost money and operational attention. The trade-off:
- Without them: the team learns about issues from customer complaints (slow, embarrassing).
- With them: the team learns from monitoring (fast, professional).
For any feature that matters to the business, "with them" is the right choice.
What we won't ship
- Features into production without a forever-running test plan.
- Test plans that aren't reviewed.
- Alerts that nobody acts on.
- Static eval sets that don't grow with production patterns.
Close
The post-launch test plan is what makes AI features durable. It covers drift, performance, cost, PII, security, and evals. Each check runs continuously. The team learns about issues fast. The product survives in production for years.
This concludes the testing series. The next series — evals & output testing — covers the eval discipline that powers most of these tests.
Related reading
- The new test pyramid
- Drift catchers — surrounding pattern.
We build AI-enabled software and help businesses put AI to work. If you're building post-launch test plans, we'd love to hear about it. Get in touch.