Why JSON breaks LLM streaming

Strict JSON output looks clean on paper. The moment you try to parse it token-by-token, you find out how many ways an LLM can hand you a half-finished object.

Why I cared about streaming in the first place

In my AI site editor, the planner’s output is a JSON object wrapping a list of operations such as update_block, add_block, and move_block. The orchestrator validates them, applies them in order, and the site updates.

If I wait for the model to finish before parsing, the user stares at a spinner for two to eight seconds. If I stream and apply ops as soon as they’re complete, edits start landing on the page within ~800ms. That difference is the entire feel of the product. So I wanted streaming. The problem is JSON wasn’t designed for it.

What “streaming JSON” actually looks like on the wire

Anthropic’s tool-call API delivers tool inputs as partial_json deltas: fragments of the JSON serialization, in order, with no guarantee that any particular fragment ends on a JSON token boundary, let alone on a complete value. A typical sequence looks like this:

{"ops":[{"op":
"update_props","blockId":"hero-1","props":{"title":"Hello
 world"}}

At every point before the very last delta, the buffer is invalid JSON. JSON.parse throws. The SDK itself sometimes throws on message_stop if the final assembled buffer doesn’t parse — which is its way of telling you the model truncated mid-token.
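
For concreteness, here’s a minimal sketch of the buffering side in TypeScript. The content_block_delta / input_json_delta event shapes come from the Anthropic SDK; the function around them, and the decision to keep buffers on failure, are mine.

import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic();

// Accumulate partial_json deltas into one buffer per tool-use block.
// Nothing here parses anything: the buffer is usually invalid JSON
// until the final delta arrives, and sometimes even then.
async function collectToolBuffers(
  params: Parameters<typeof client.messages.stream>[0],
): Promise<Map<number, string>> {
  const buffers = new Map<number, string>();
  const stream = client.messages.stream(params);
  try {
    for await (const event of stream) {
      if (
        event.type === "content_block_delta" &&
        event.delta.type === "input_json_delta"
      ) {
        const prev = buffers.get(event.index) ?? "";
        buffers.set(event.index, prev + event.delta.partial_json);
      }
    }
  } catch {
    // The SDK can throw around message_stop when the assembled input
    // never parses. Keep the raw buffers anyway so the repair
    // pipeline described below still gets a chance at them.
  }
  return buffers;
}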

So you can’t naively parse the whole buffer. You also can’t parse each delta on its own. You need something in between.

The failure modes I actually hit

Once I started capturing real production traces, the same handful of failure shapes kept showing up:

  • Bare minus signs. The model emits - as the start of a number, then the stream cuts off, and a lone - isn’t valid JSON.
  • Unescaped control characters inside strings. With eager input streaming, the model sometimes drops a raw newline or tab straight into a string value, which JSON forbids. This happens constantly when the LLM is writing markdown body copy.
  • Unclosed brackets and strings. Stream ends mid-array, or mid-string, with no closing " ] }.
  • Markdown bullets in arrays. The model occasionally serializes a list as [- "one", - "two"], with the bullet dash outside the quotes, because its prose habits leak into its serialization.
  • Trailing commas. ["one", "two",] shows up just often enough to need handling.

None of these are exotic edge cases. Together they accounted for most of the parse errors I was logging.
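
In fixture form (illustrative strings I keep as regression inputs for the repair pipeline below, not real traces), those shapes look like this:

// One malformed buffer per failure shape above. Note that in
// TypeScript source, "\n" inside a string literal produces a real
// newline character, which is exactly the unescaped-control case.
const fixtures = [
  '{"ops":[{"op":"update_props","props":{"width":-',   // bare minus, then cut off
  '{"ops":[{"props":{"body":"line one\nline two"}}]}', // raw newline inside a string
  '{"ops":[{"op":"add_block","props":{"title":"He',    // unclosed brackets and string
  '{"ops":[- "one", - "two"]}',                        // markdown bullets in an array
  '{"ops":["one", "two",]}',                           // trailing comma
];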

The pattern that actually works

I ended up with a two-layer approach.

Layer one: extract completed ops from a still-streaming array. As partial_json deltas arrive, I append them to a per-tool buffer and run a small extractor that walks the buffer looking for completed { … } entries inside the ops array. Each completed entry gets emitted downstream and applied immediately, regardless of whether the rest of the JSON is valid yet. The user sees ops land progressively while the model is still typing the next one.
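
A sketch of that extractor: it assumes ops is the first array in the buffer and that the model serializes the key compactly as "ops":[, tracks only string state and brace depth, and hands back a resume offset so each op is emitted exactly once.

// Walk the buffer from `from`, returning every completed {...} entry
// of the ops array plus the offset where the next scan should resume.
function extractCompletedOps(
  buffer: string,
  from = 0,
): { ops: string[]; next: number } {
  const marker = '"ops":[';
  const arrayStart = buffer.indexOf(marker);
  if (arrayStart === -1) return { ops: [], next: from }; // array hasn't streamed in yet

  const ops: string[] = [];
  let next = Math.max(from, arrayStart + marker.length);
  let depth = 0;      // brace depth relative to the ops array
  let start = -1;     // where the current op's opening '{' sits
  let inString = false;
  let escaped = false;

  for (let i = next; i < buffer.length; i++) {
    const ch = buffer[i];
    if (inString) {
      if (escaped) escaped = false;
      else if (ch === "\\") escaped = true;
      else if (ch === '"') inString = false;
      continue;
    }
    if (ch === '"') inString = true;
    else if (ch === "{") {
      if (depth === 0) start = i;
      depth++;
    } else if (ch === "}") {
      depth--;
      if (depth === 0 && start !== -1) {
        ops.push(buffer.slice(start, i + 1)); // a completed op
        next = i + 1;
        start = -1;
      }
    } else if (ch === "]" && depth === 0) break; // ops array closed
  }
  return { ops, next };
}

Each returned slice is brace-balanced on its own, so it usually parses directly; the rare one that doesn’t (a raw newline inside a title, say) goes through the same repair pipeline as the final buffer.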

Layer two: a repair pipeline for the final buffer. When the stream ends — successfully or with the SDK throwing on message_stop — I run the buffer through repair strategies in order: escape unescaped control characters inside strings, fix markdown-bulleted arrays, drop trailing commas, replace bare - with 0 outside of strings, close unclosed brackets and strings, and as a last resort truncate at the last valid array boundary. The first strategy that produces something parseable wins, and I log which one fired so I can see drift over time.
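
A condensed sketch of that pipeline. The strategies below are simplified versions of mine (the regex ones don’t track string state, and the last-resort truncation is omitted); they’re applied cumulatively, and the first candidate that JSON.parse accepts wins.

// The two stateful repairs written out in full; the simpler ones are
// one-line regexes in the strategy list below.
function escapeControlCharsInStrings(s: string): string {
  let out = "";
  let inString = false;
  let escaped = false;
  for (const ch of s) {
    if (inString && !escaped && (ch === "\n" || ch === "\t")) {
      out += ch === "\n" ? "\\n" : "\\t"; // escape raw control chars, only inside strings
      continue;
    }
    if (inString) {
      if (escaped) escaped = false;
      else if (ch === "\\") escaped = true;
      else if (ch === '"') inString = false;
    } else if (ch === '"') {
      inString = true;
    }
    out += ch;
  }
  return out;
}

function closeUnclosed(s: string): string {
  const closers: string[] = [];
  let inString = false;
  let escaped = false;
  for (const ch of s) {
    if (inString) {
      if (escaped) escaped = false;
      else if (ch === "\\") escaped = true;
      else if (ch === '"') inString = false;
      continue;
    }
    if (ch === '"') inString = true;
    else if (ch === "{") closers.push("}");
    else if (ch === "[") closers.push("]");
    else if (ch === "}" || ch === "]") closers.pop();
  }
  if (inString && escaped) s = s.slice(0, -1); // drop a dangling backslash
  return s + (inString ? '"' : "") + closers.reverse().join("");
}

const repairs: Array<{ name: string; fn: (s: string) => string }> = [
  { name: "escape-control-chars", fn: escapeControlCharsInStrings },
  { name: "strip-markdown-bullets", fn: (s) => s.replace(/([\[,]\s*)-\s+"/g, '$1"') },
  { name: "drop-trailing-commas", fn: (s) => s.replace(/,(\s*[\]}])/g, "$1") },
  { name: "zero-bare-minus", fn: (s) => s.replace(/-(\s*[,\]}]|\s*$)/g, "0$1") },
  { name: "close-unclosed", fn: closeUnclosed },
];

function repairAndParse(buffer: string): { value: unknown; fired?: string } | null {
  try { return { value: JSON.parse(buffer) }; } catch {}
  let candidate = buffer;
  for (const { name, fn } of repairs) {
    candidate = fn(candidate); // repairs compose left to right
    try {
      return { value: JSON.parse(candidate), fired: name }; // `fired` is what I log
    } catch {}
  }
  return null; // give up and surface the error to the user
}

Run against the fixtures above, the newline case is caught by the escape pass, the bullet and trailing-comma cases by their regexes, and both truncated buffers fall through to close-unclosed.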

The combination handles roughly 99% of the malformed payloads I see in production. The remaining 1% gets surfaced to the user as “the model returned something I couldn’t parse — try rephrasing,” which is the right failure mode.

Takeaway

Strict JSON output is a great contract for a finished response. It’s a terrible contract for a stream. If you want fast UX from a streaming planner, treat the model output as a tolerant byte stream and only require strict JSON at the moments when you actually need to make a decision — when emitting a completed op, or when finalizing.

The operational lesson is simpler: don’t treat tool-call JSON parse errors as fatal. Treat them as the default. Build a buffer, build an extractor, build a repair pipeline, and log which repair fired. The reliability you get is worth more than the formatting purity you give up.