A team had a post-processor that "cleaned" model outputs. It stripped boilerplate, normalised whitespace, and corrected common issues. It also silently dropped fields the model occasionally produced incorrectly. The team thought their model accuracy was 98%. It was 91%; the post-processor was hiding the difference.
Post-processors are useful. Done badly, they hide failures. The discipline is making failures visible while still cleaning up legitimate variance.
The transparency rule
The rule: post-processors transform; they don't hide.
- Whitespace normalisation: yes.
- Case normalisation where appropriate: yes.
- Removing fields that fail validation: log the failure, don't silently drop.
- Replacing wrong values with defaults: log it, surface it as a metric.
The post-processor's actions are auditable. The team can see what it's doing.
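A minimal sketch of the rule in Python. The `Action` shape and `is_valid` placeholder are illustrative assumptions; a real implementation would check the actual schema:

```python
from dataclasses import dataclass

@dataclass
class Action:
    """One auditable post-processor intervention."""
    kind: str         # e.g. "dropped_field"
    field_name: str
    original: object  # captured so nothing disappears silently

def is_valid(key: str, value: object) -> bool:
    """Placeholder: a real implementation checks the schema."""
    return value not in (None, "")

def postprocess(output: dict) -> tuple[dict, list[Action]]:
    cleaned: dict = {}
    actions: list[Action] = []
    for key, value in output.items():
        if isinstance(value, str):
            value = value.strip()  # whitespace normalisation: silent is fine
        if not is_valid(key, value):
            # Transform, don't hide: the drop leaves an audit record.
            actions.append(Action("dropped_field", key, value))
            continue
        cleaned[key] = value
    return cleaned, actions
```

Cosmetic transforms stay silent; anything that changes meaning comes back as an `Action` the caller must log.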
Logging discipline
Each post-processor action is logged:
- What input came in.
- What transform was applied.
- What output went out.
For automated transforms (whitespace), the log is minimal. For substantive interventions (replacing values, dropping fields), the log captures the original.
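One way that distinction can look on the wire, as a sketch: one JSON log line per action, minimal for automated transforms, carrying the original for substantive ones. The field names are illustrative, not a standard:

```python
import json
import logging
import time

log = logging.getLogger("postprocessor")

def log_action(kind: str, transform: str, original=None, output=None) -> None:
    """Emit one post-processor action as a JSON log line.

    kind is "automated" (whitespace, case) or "substantive"
    (replaced value, dropped field); substantive entries carry
    the original so reviewers can see exactly what changed.
    """
    record = {"ts": time.time(), "kind": kind, "transform": transform}
    if kind == "substantive":
        record["original"] = original  # the point of the audit trail
        record["output"] = output
    log.info(json.dumps(record, default=str))

log_action("automated", "strip_whitespace")                      # minimal
log_action("substantive", "dropped_field", original={"id": ""})  # full
```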
Reviewer ritual
The team reviews post-processor logs:
- What's the "save rate" (the post-processor rescued a bad output)?
- What's the "hide rate" (the post-processor masked something the team should have seen)?
The team adjusts. Post-processors that were quietly hiding issues get changed to surface them.
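A small aggregation over the action log can compute both rates. This sketch assumes the JSONL format above plus a `verdict` field that a human reviewer has set to "save" or "hide" (both are assumptions, not a given schema):

```python
import json
from collections import Counter

def review_rates(log_path: str) -> dict[str, float]:
    """Tally reviewer verdicts over substantive actions.

    Assumes one JSON record per line, and that a reviewer has
    added a "verdict" field: "save" (rescued a bad output) or
    "hide" (masked something the team should have seen).
    """
    verdicts: Counter = Counter()
    with open(log_path) as fh:
        for line in fh:
            record = json.loads(line)
            if record.get("kind") == "substantive":
                verdicts[record.get("verdict", "unreviewed")] += 1
    total = sum(verdicts.values()) or 1
    return {v: n / total for v, n in verdicts.items()}

# e.g. {"save": 0.7, "hide": 0.2, "unreviewed": 0.1}
```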
Where post-processors go wrong
- Silently dropping invalid outputs. Better: validate, log, retry.
- Replacing wrong outputs with defaults. Better: log the wrongness, decide whether the default is appropriate.
- Coercing types beyond what the schema allows. Better: reject and surface (sketched below).
- Adding inferred fields the model didn't produce. Worse than a wrong output: it's fabrication.
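The "reject and surface" alternative for type coercion, as a sketch; the schema-as-dict shape is an assumption for illustration:

```python
def check_types(output: dict, schema: dict[str, type]) -> list[str]:
    """Reject, don't coerce: report every field whose type is wrong.

    A lenient post-processor would quietly turn "42" into 42;
    this surfaces the mismatch so the failure mode stays visible.
    """
    errors = []
    for field_name, expected in schema.items():
        value = output.get(field_name)
        if not isinstance(value, expected):
            errors.append(f"{field_name}: expected {expected.__name__}, "
                          f"got {type(value).__name__} ({value!r})")
    return errors

print(check_types({"count": "42"}, {"count": int}))
# ["count: expected int, got str ('42')"]
```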
A real post-processor
A team's setup:
- Schema validation at the boundary.
- For valid outputs: minor cleanup (whitespace, case).
- For invalid outputs: log the failure, retry, fall back if retries run out.
- Weekly review of the action log.
Outputs are clean. Failures are visible. Both happen.
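Wired together, the setup might look like this sketch; `generate`, `schema_check`, `cleanup`, and `fallback` are stand-ins for the team's actual components:

```python
import json
import logging

log = logging.getLogger("postprocessor")

def run_pipeline(generate, schema_check, cleanup, fallback, max_retries=2):
    """Boundary validation, minor cleanup, logged retries, visible fallback.

    All four arguments are caller-supplied stand-ins: generate()
    produces a raw output, schema_check() returns an error or None,
    cleanup() does whitespace/case fixes, fallback is the safe value.
    """
    for attempt in range(max_retries + 1):
        raw = generate()
        error = schema_check(raw)      # validation at the boundary
        if error is None:
            return cleanup(raw)        # valid: minor cleanup only
        log.warning(json.dumps({       # invalid: logged, then retried
            "transform": "schema_failure",
            "attempt": attempt + 1,
            "original": raw,
            "error": error,
        }, default=str))
    log.warning(json.dumps({"transform": "fallback_used"}))
    return fallback                    # retries exhausted: fail visibly
```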
What we won't ship
- Post-processors that silently drop outputs.
- Post-processors without action logging.
- Post-processors that don't get reviewed periodically.
- "Cleaning" by replacing the model's output with what the team would have written. That's hiding the model's failure mode.
Close
Post-processors are useful when transparent. The discipline is logging actions, reviewing patterns, and treating substantive interventions as failures to investigate, not features to celebrate. The team that knows what its post-processor does ships reliable systems. The team that doesn't ships hidden risks.
Related reading
- Output validation libs — preceding layer.
- Drift catchers — similar visibility discipline.
We build AI-enabled software and help businesses put AI to work. If you're tightening post-processing, we'd love to hear about it. Get in touch.