A team had a post-processor that "cleaned" model outputs. It stripped boilerplate, normalised whitespace, and corrected common issues. It also silently dropped fields the model occasionally produced incorrectly. The team thought their model accuracy was 98%. It was 91%; the post-processor was hiding the difference.
Post-processors are useful. Done badly, they hide failures. The discipline is making failures visible while still cleaning up legitimate variance.
The transparency rule
The rule: post-processors transform; they don't hide.
- Whitespace normalisation: yes.
- Case normalisation where appropriate: yes.
- Removing fields that fail validation: log the failure, don't silently drop.
- Replacing wrong values with defaults: log it, surface it as a metric.
The post-processor's actions are auditable. The team can see what it's doing.
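A minimal sketch of the rule in Python. The `Action` shape and `is_valid` placeholder are illustrative assumptions; a real implementation would check the actual schema:

```python
from dataclasses import dataclass

@dataclass
class Action:
    """One auditable post-processor intervention."""
    kind: str         # e.g. "dropped_field"
    field_name: str
    original: object  # captured so nothing disappears silently

def is_valid(key: str, value: object) -> bool:
    """Placeholder: a real implementation checks the schema."""
    return value not in (None, "")

def postprocess(output: dict) -> tuple[dict, list[Action]]:
    cleaned: dict = {}
    actions: list[Action] = []
    for key, value in output.items():
        if isinstance(value, str):
            value = value.strip()  # whitespace normalisation: silent is fine
        if not is_valid(key, value):
            # Transform, don't hide: the drop leaves an audit record.
            actions.append(Action("dropped_field", key, value))
            continue
        cleaned[key] = value
    return cleaned, actions
```

Cosmetic transforms stay silent; anything that changes meaning comes back as an `Action` the caller must log.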
Logging discipline
Each post-processor action is logged:
- What input came in.
- What transform was applied.
- What output went out.
For automated transforms (whitespace), the log is minimal. For substantive interventions (replacing values, dropping fields), the log captures the original.
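One way that distinction can look on the wire, as a sketch: one JSON log line per action, minimal for automated transforms, carrying the original for substantive ones. The field names are illustrative, not a standard:

```python
import json
import logging
import time

log = logging.getLogger("postprocessor")

def log_action(kind: str, transform: str, original=None, output=None) -> None:
    """Emit one post-processor action as a JSON log line.

    kind is "automated" (whitespace, case) or "substantive"
    (replaced value, dropped field); substantive entries carry
    the original so reviewers can see exactly what changed.
    """
    record = {"ts": time.time(), "kind": kind, "transform": transform}
    if kind == "substantive":
        record["original"] = original  # the point of the audit trail
        record["output"] = output
    log.info(json.dumps(record, default=str))

log_action("automated", "strip_whitespace")                      # minimal
log_action("substantive", "dropped_field", original={"id": ""})  # full
```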
Reviewer ritual
The team reviews post-processor logs:
- What's the "save rate" (the post-processor rescued a bad output)?
- What's the "hide rate" (the post-processor masked something the team should have seen)?
The team adjusts. Post-processors that were quietly hiding issues get changed to surface them.
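A small aggregation over the action log can compute both rates. This sketch assumes the JSONL format above plus a `verdict` field that a human reviewer has set to "save" or "hide" (both are assumptions, not a given schema):

```python
import json
from collections import Counter

def review_rates(log_path: str) -> dict[str, float]:
    """Tally reviewer verdicts over substantive actions.

    Assumes one JSON record per line, and that a reviewer has
    added a "verdict" field: "save" (rescued a bad output) or
    "hide" (masked something the team should have seen).
    """
    verdicts: Counter = Counter()
    with open(log_path) as fh:
        for line in fh:
            record = json.loads(line)
            if record.get("kind") == "substantive":
                verdicts[record.get("verdict", "unreviewed")] += 1
    total = sum(verdicts.values()) or 1
    return {v: n / total for v, n in verdicts.items()}

# e.g. {"save": 0.7, "hide": 0.2, "unreviewed": 0.1}
```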
Where post-processors go wrong
- Silently dropping invalid outputs. Better: validate, log, retry.
- Replacing wrong outputs with defaults. Better: log the wrongness, decide whether the default is appropriate.
- Coercing types beyond what the schema allows. Better: reject and surface (sketched below).
- Adding inferred fields the model didn't produce. Worse than a wrong output: it's fabrication.
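The "reject and surface" alternative for type coercion, as a sketch; the schema-as-dict shape is an assumption for illustration:

```python
def check_types(output: dict, schema: dict[str, type]) -> list[str]:
    """Reject, don't coerce: report every field whose type is wrong.

    A lenient post-processor would quietly turn "42" into 42;
    this surfaces the mismatch so the failure mode stays visible.
    """
    errors = []
    for field_name, expected in schema.items():
        value = output.get(field_name)
        if not isinstance(value, expected):
            errors.append(f"{field_name}: expected {expected.__name__}, "
                          f"got {type(value).__name__} ({value!r})")
    return errors

print(check_types({"count": "42"}, {"count": int}))
# ["count: expected int, got str ('42')"]
```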
A real post-processor
A team's setup:
- Schema validation at the boundary.
- For valid outputs: minor cleanup (whitespace, case).
- For invalid outputs: log the failure, retry, fall back if retries run out.
- Weekly review of the action log.
Outputs are clean. Failures are visible. Both happen.
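Wired together, the setup might look like this sketch; `generate`, `schema_check`, `cleanup`, and `fallback` are stand-ins for the team's actual components:

```python
import json
import logging

log = logging.getLogger("postprocessor")

def run_pipeline(generate, schema_check, cleanup, fallback, max_retries=2):
    """Boundary validation, minor cleanup, logged retries, visible fallback.

    All four arguments are caller-supplied stand-ins: generate()
    produces a raw output, schema_check() returns an error or None,
    cleanup() does whitespace/case fixes, fallback is the safe value.
    """
    for attempt in range(max_retries + 1):
        raw = generate()
        error = schema_check(raw)      # validation at the boundary
        if error is None:
            return cleanup(raw)        # valid: minor cleanup only
        log.warning(json.dumps({       # invalid: logged, then retried
            "transform": "schema_failure",
            "attempt": attempt + 1,
            "original": raw,
            "error": error,
        }, default=str))
    log.warning(json.dumps({"transform": "fallback_used"}))
    return fallback                    # retries exhausted: fail visibly
```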
What we won't ship
- Post-processors that silently drop outputs.
- Post-processors without action logging.
- Post-processors that don't get reviewed periodically.
- "Cleaning" by replacing the model's output with what the team would have written. That's hiding the model's failure mode.
Close
Post-processors are useful when transparent. The discipline is logging actions, reviewing patterns, and treating substantive interventions as failures to investigate, not features to celebrate. The team that knows what its post-processor does ships reliable systems. The team that doesn't ships hidden risks.
Related reading
- Output validation libs — preceding layer.
- Drift catchers — similar visibility discipline.
We build AI-enabled software and help businesses put AI to work. If you're tightening post-processing, we'd love to hear about it. Get in touch.