Agent UX mirrors model limitations

Agent UX tends to signpost model limitations and is heavily shaped by technical constraints.

We stream responses token by token because inference can be slow. We accept run-to-run variation because dynamic batching changes the order of floating-point reductions inside the kernels, which introduces nondeterminism. Latency and stochasticity shape the experience in significant ways.
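To see why reduction order matters at all, here is a minimal, hypothetical illustration (plain Python, nothing GPU-specific): floating-point addition is not associative, so summing the same values grouped differently, as different batch sizes and kernel tilings effectively do, can produce different results.

```python
# Floating-point addition is not associative: regrouping the same
# values can change the result. This is the root cause of
# batch-size-dependent nondeterminism in parallel reductions.
vals = [0.1, 1e16, -1e16, 0.1]

# Strict left-to-right: the small 0.1 is absorbed by the huge 1e16.
left_to_right = ((vals[0] + vals[1]) + vals[2]) + vals[3]

# Regrouped: the huge values cancel first, so both 0.1s survive.
regrouped = (vals[0] + (vals[1] + vals[2])) + vals[3]

print(left_to_right)  # 0.1
print(regrouped)      # 0.2
```

Same inputs, same arithmetic, different answer; scale that up to millions of parallel adds whose grouping depends on how requests were batched, and identical prompts can yield different logits.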

But now inference hardware is improving, and research is pushing toward reproducible, batch-invariant execution. If models became consistently sub-second and deterministic, streaming cursors and “thinking…” animations would lose much of their current utility.

If agents responded to complex tasks as quickly and repeatably as a local function call, it would enable instant feedback loops. Agent UX would cater for flow, not user patience!

But all that said… if this did happen, would we just come back to add artificial pauses, or prefer higher temperatures, so that responses feel considered and creative rather than mechanical? For typical GUI applications you would always prefer speed and predictability, but conversational interfaces are a different style of interaction. A question I’m sure conversation designers will be pondering 🤔
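For readers who haven’t looked under the hood, “higher temperature” has a concrete mechanical meaning. This is a minimal sketch (the function name and logits are illustrative, not any particular library’s API): logits are divided by the temperature before the softmax, so a higher temperature flattens the distribution and makes unlikely tokens more reachable, while a temperature near zero collapses toward the single most likely token.

```python
import math
import random

def sample_with_temperature(logits, temperature, rng=None):
    """Sample an index from softmax(logits / temperature).

    temperature -> 0 approaches greedy argmax (mechanical, repeatable);
    higher temperature flattens the distribution (varied, 'creative').
    """
    rng = rng or random.Random()
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    r = rng.random()
    acc = 0.0
    for i, p in enumerate(probs):
        acc += p
        if r <= acc:
            return i
    return len(probs) - 1  # guard against float rounding

# Illustrative logits for three candidate tokens.
logits = [2.0, 1.0, 0.0]
# Near-zero temperature: effectively always picks the top token.
cold = sample_with_temperature(logits, 0.05, random.Random(0))
```

So “prefer higher temperatures” is just a one-line dial: the same model, the same latency, deliberately made less predictable.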