The Interface Question

Agent harnesses assume a person who will sit with the model — prompting, correcting, retrying. Most of the workforce never will. On what the interface for AI should actually be, and the shape of Cobots.

The harness assumption

The best agentic software in the world right now is built for programmers. Claude Code, Codex, and their relatives are extraordinary tools, and they share one load-bearing assumption: an operator who will sit with the model. You watch it work. You read what it did. You interject, correct, re-prompt, and retry until the output is right. The iteration loop is the interface.

For programmers, this is natural to the point of invisibility. Programming is the one profession selected, daily, for tolerance of exactly this loop — write, run, read the error, adjust, run again. A programmer will sit with a stubborn thing until it works, because that is the job and always has been. Hand the same interface to almost anyone else and watch what happens: the first correction is tolerated, the second is annoying, and somewhere around the third the tab is closed and the work goes back to being done by hand. People do not conclude the tool is powerful and they should persist. They conclude it does not work.

Figure: The harness. The human is a component of the loop, present for every turn, supplying judgment on demand.

Figure: Delegation. The human invests once at the start and judges once at the end. The loop does not exist.

The two pictures differ in one structural fact: in the first, human attention is consumed per turn; in the second, it is invested per relationship. A loop spends your judgment continuously. A handoff banks it — once at onboarding, once at review. The success of harnesses is real evidence, but it is evidence about programmers, not about interfaces. The other ninety-something percent of the workforce is not waiting for a better prompt box.

Tighten the loop, or remove it?

One serious school of thought says the problem is that the loop is too loose, and the fix is to make collaboration native. Thinking Machines’ work on interaction models makes this case beautifully: turn-based chat, where the model is deaf while you type and frozen while it answers, is like trying to resolve a crucial disagreement over email rather than in person. Their answer is models that perceive continuously and can be interrupted mid-thought: collaboration that is built into the model and improves with scale, rather than bolted on through harness components around it.

For synchronous, creative, figuring-it-out-together work, we think that is exactly right. But it is one end of a spectrum, and most work lives at the other end. Look at how organisations actually run: almost nothing your colleagues do for you happens with you watching. You do not collaborate continuously with your accountant. The unit of most knowledge work is not the conversational turn — it is the task, handed to someone you trust, returned done. The loop is not too loose. For delegated work, the loop should not exist.

Figure: Interfaces to AI, arranged by where the human stands. Both ends are being built. The unclaimed territory is the right-hand side for everyone who is not a programmer.

How people actually hand off work

Nobody prompts a colleague. A new colleague is onboarded once: here are your tools and logins, here is the house style, here is the context, here is who to ask. From then on, delegation is one line — “can you have the competitor analysis ready for Thursday?” Clarifying questions happen at the start, in a burst, not scattered through a hundred corrections. And the work comes back in the company’s format, because the person was onboarded into it.

Run the scene concretely. An operations lead needs the weekly board pack. She does not open a chat window and describe, again, what a board pack is, which numbers go in it, and how the company likes its slides. She posts the task to someone who already knows — someone who asks one clarifying question (“include the churn breakdown this week?”), goes quiet, and returns the pack on Thursday in the template the board has seen every week for two years. The remarkable thing about this interaction is how little interaction it contains. That absence is not a limitation. It is what trust looks like as an interface.

That is the bar. Imagine handing a research task to an agent that was onboarded a month ago, and getting back a report — in your company’s presentation style, every time, without specifying it. Imagine an intern who is world-class at Excel and nothing else, and knows it. This is how people already work with each other. The interface for AI should meet people where they already work, not retrain the entire workforce to sit with a model the way programmers do.

Why you cannot delegate to a general agent

The obvious objection: agents exist, give one a task. But delegation runs on trust, and trust is built from consistency — and a general agent improvising each task from scratch is structurally inconsistent. The same request produces different work on different days. It accepts every job, including ones it cannot do, and fails opaquely somewhere in the middle. It has no durable memory of your conventions, so the house style gets re-explained forever. And when something goes wrong, the audit trail is a transcript to scroll.

	Harness	Cobot
Operator	A programmer in the loop	Anyone who can assign a task
Unit of work	The turn	The task
Context	Re-supplied every session	Installed once, at onboarding
Method	Improvised each run	A tested skill, evals at each step
Out-of-scope request	Attempts it anyway	Declines, routes to a capable agent
Accountability	Scroll the transcript	Per-step audit trail

Every row is the same trade seen from a different angle: the harness optimises for capability in the hands of an expert operator; delegation optimises for reliability in the hands of everyone else. An agent that is brilliant on Monday and weird on Thursday never earns the handoff — one bad delivery and the human quietly takes the work back.

Even programmers are leaving the loop

Here is the telling development: the harness world itself is evolving away from the loop. OpenAI recently open-sourced Symphony, a spec for orchestrating Codex agents whose core move is to stop managing agent sessions in tabs and make the issue tracker the control plane: every open task on the board gets an agent, agents run continuously until their task is done, and humans review the results. Teams using it report landing several times as many pull requests. Notice what happened to the interface. The moment one person runs many agents, sitting with each one becomes physically impossible, and the interface that emerged from that pressure was not a better prompt box. It was a task board.

We read Symphony as the strongest available evidence for the delegation thesis, arriving from the opposite direction. Power users did not abandon the loop because they disliked it; they abandoned it because it does not scale past one agent. For programmers, the task board is a productivity pattern. For the rest of the workforce — who were never going to enter the loop at all — it is the only interface that was ever going to work. The difference is that a coding team can review a pull request. Most teams cannot review an improvisation. Which is why delegation for everyone else needs one more ingredient the orchestrators do not provide: work that is reliable by construction, not reliable by review.

The shape of Cobots

Cobots is our answer to the interface question, and it looks less like a chat box than like the simplest project-management tool you have ever used — staffed by colleagues who happen to be software.

One agent, one job. You hire a Cobot from a marketplace of specialists the way you would hire a contractor — a researcher, an Excel analyst, a report writer — each with a defined competence, not a blank general intelligence.

Onboarding is the product. A new Cobot asks for exactly the tool access its job requires — scoped, least-privilege, starting with the tools you actually use — and absorbs the company context: templates, tone, conventions. The hour you spend onboarding replaces the hundred hours of prompting you were never going to do.
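To make the onboarding idea concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not a description of how Cobots is implemented: the names `Onboarding`, `onboard`, and `can_use`, and the example tool names, are all hypothetical. It shows the two things the paragraph describes: tool access limited to what the job requires, and company context stored once rather than re-supplied per task.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Onboarding:
    """What an agent receives once, at onboarding (hypothetical structure)."""
    granted_tools: frozenset   # tools the job requires and the company uses
    pending_tools: frozenset   # required tools the company has not connected yet
    context: dict              # templates, tone, conventions


def onboard(required_tools, tools_in_use, context):
    """Grant least-privilege access: only tools the job actually requires."""
    required = frozenset(required_tools)
    in_use = frozenset(tools_in_use)
    return Onboarding(
        granted_tools=required & in_use,   # never more than the job needs
        pending_tools=required - in_use,   # to be connected later, if at all
        context=dict(context),             # installed once, reused on every task
    )


def can_use(onboarding, tool):
    """A tool is usable only if it was granted at onboarding."""
    return tool in onboarding.granted_tools


# Hypothetical example: an Excel analyst needs two tools, not the whole stack.
analyst = onboard(
    required_tools={"spreadsheets", "shared_drive"},
    tools_in_use={"spreadsheets", "shared_drive", "email", "crm"},
    context={"template": "board-pack", "tone": "plain, formal"},
)
```

In this sketch the analyst can open spreadsheets but cannot touch email or the CRM, even though the company uses both, because the job never asked for them.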

Skills, not improvisation. When a Cobot executes, it runs a skill: a pre-built workflow that has been tested, with evaluations at each step. Execution means running a proven procedure, not inventing one — which is what makes task #1,000 come back with the same quality as task #1, and what gives every delivery an audit trail instead of a vibe. This is the reliable-by-construction ingredient: the review burden does not fall on someone who cannot judge the work, because the judging was built into the skill.
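One way to picture a skill with an evaluation at each step is the sketch below. It is a toy under stated assumptions, not the Cobots implementation: `Step`, `AuditEntry`, `Delivery`, `run_skill`, and the three-step example skill are hypothetical names invented for illustration. Each step carries its own check, execution stops at the first failed check, and every step leaves an entry in the audit trail.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Optional


@dataclass(frozen=True)
class Step:
    name: str
    run: Callable[[Any], Any]      # the work done at this step
    check: Callable[[Any], bool]   # the evaluation applied to its result


@dataclass(frozen=True)
class AuditEntry:
    step: str
    passed: bool


@dataclass(frozen=True)
class Delivery:
    ok: bool
    output: Optional[Any]
    trail: List[AuditEntry]


def run_skill(steps, task_input):
    """Run a fixed procedure; record one audit entry per executed step."""
    trail = []
    value = task_input
    for step in steps:
        value = step.run(value)
        passed = step.check(value)
        trail.append(AuditEntry(step.name, passed))
        if not passed:
            # Stop at the failed evaluation instead of delivering unchecked work.
            return Delivery(ok=False, output=None, trail=trail)
    return Delivery(ok=True, output=value, trail=trail)


# Hypothetical three-step skill: parse figures, total them, format the result.
toy_skill = [
    Step("parse", lambda s: [float(x) for x in s.split(",")], lambda v: len(v) > 0),
    Step("total", sum, lambda v: v >= 0),
    Step("format", lambda v: f"Total: {v:.2f}", lambda v: v.startswith("Total: ")),
]

good = run_skill(toy_skill, "1.5,2.5")
bad = run_skill(toy_skill, "-5")
```

Here `good.output` is `"Total: 4.00"` with three passing audit entries, while `bad` stops at the `total` step and its trail shows exactly where the check failed. The same steps run in the same order on every task, which is the property the paragraph above relies on.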

Competence boundaries as a feature. A Cobot declines jobs outside its skills — and routes them to an agent that can do them, with guidance. In an interface built on trust, an agent that says “not me, but here is who” is worth more than one that says yes to everything. Progress streams back conversationally in the shared project space, so handing work to a Cobot feels like posting a task to a teammate, not operating a machine.
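The decline-and-route behaviour can be sketched in a few lines. This is an illustrative assumption about how such routing might look, not the product's actual logic: `Cobot`, `Response`, `assign`, and the skill names are hypothetical. An agent accepts only tasks within its declared skills, otherwise refers the task to a colleague on the roster who has the skill, and says so plainly when nobody does.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Cobot:
    name: str
    skills: frozenset


@dataclass(frozen=True)
class Response:
    accepted: bool
    referred_to: Optional[str]   # name of a capable agent, if the task was declined
    message: str


def assign(task_skill, cobot, roster):
    """Accept in-scope work; decline and route anything else."""
    if task_skill in cobot.skills:
        return Response(True, None, f"{cobot.name} is on it.")
    for other in roster:
        if other is not cobot and task_skill in other.skills:
            return Response(False, other.name, f"Not me, but {other.name} can do this.")
    return Response(False, None, "Nobody on this roster has that skill.")


# Hypothetical roster of two specialists.
excel = Cobot("Excel Analyst", frozenset({"pivot_table", "forecast_model"}))
researcher = Cobot("Researcher", frozenset({"competitor_analysis"}))
roster = [excel, researcher]

in_scope = assign("pivot_table", excel, roster)
routed = assign("competitor_analysis", excel, roster)
unstaffed = assign("legal_review", excel, roster)
```

The design choice worth noting is that the declined case still returns something useful: `routed.referred_to` names the agent who can do the job, so a refusal costs the person one redirect rather than a failed delivery.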

The org chart is the interface

Look at the spectrum again. At one end, models that collaborate the way a person in the room does. At the other, agents you delegate to the way you delegate to a team. Both are real research directions; they serve different work. Our bet is about volume: for every hour of work that wants a collaborator, organisations contain hundreds of hours that want a dependable specialist — and the people holding those hours will never operate a harness.

If this sounds like our wider thesis wearing product clothes, it is. We believe generality is a property of systems — many specialists, composed — and Cobots is what that belief looks like as an interface: specialists you onboard, tasks you hand off, routing between agents instead of one model pretending to be everyone. The future of working with AI, for most people, will not feel like prompting at all. It will feel like having colleagues.

— Logarithms, London