On Agency and Decision Routing
How judgment, feedback loops, and emergent capability reshape decision-making in AI-enabled organizations
In my first two posts, I explored two layers of organizational adaptation under AI acceleration. The first was structural. I argued that SaaS organizations are beginning to undergo a kind of molting as role boundaries soften, execution compresses, and the historical shells that once organized software work no longer fit the pace or nature of capability now sitting at the fingertips of talent.
The second was more internal. I argued that under acceleration, organizational memory becomes more of a metabolic concern. Institutions do not metabolize change through structure alone. They metabolize it through memory, or through what they retain, reinterpret, and carry forward. The risk emerging here is a kind of divestiture of organizational knowledge into continuous AI-mediated decision-making, especially where automation or convenience short-circuits traditional routes of contextual judgment and replaces them with decontextualized “good enough” responses at scale.
But there is another layer sitting between structure and memory that feels increasingly important to understand, particularly as reasoning models begin to interact more directly with systems and workflows. That layer is agency. More specifically, it is the question of how agency gets distributed as AI moves from being a passive object of use to an increasingly active participant in the completion and routing of work. This becomes especially consequential in vertical software domains where decision points were never truly binary or deterministic to begin with — put differently, where judgment, interpretation, and contextual tradeoffs were always part of the workflow.
That sentence probably needs immediate qualification, because “agency” is one of those words that expands too quickly if left unattended. I am not using it here in the strongest philosophical sense, as though models possess intention, consciousness, moral standing, or some mystical inner life. I mean something narrower and more organizationally relevant. I mean the practical locus of initiation, routing, interpretation, escalation, and action.
Who, or what, decides what happens next?
What is allowed to interpret signals?
What is allowed to initiate movement across systems?
What is allowed to decide that ambiguity is tolerable, or that it is time to escalate to a human?
In the context of modern organizations, these questions matter more than whether one wants to grant AI "real" agency in some metaphysical sense. If a system can synthesize data, trigger a workflow, and selectively decide when to invoke a human network, then something meaningful about agency has shifted, even if one ultimately insists that humans remain legally, ethically, and institutionally responsible for the result. This is especially true for decisions that involve second- or third-order complexity, reasoning that cannot be fully specified in advance, and domains where there is neither a high concentration of standardized data nor the institutional scale required to build highly specialized automation architectures in every workflow.
Historically, this question was comparatively simple. Even in highly digitized enterprises, agency was still mostly human-routed. Humans initiated workflows. Humans interpreted context. Humans decided when a deviation mattered, when a supplier looked risky despite technically compliant documentation, when a formulation result was interesting enough to pursue, or when a quality signal merited waking up the right person. Software systems, even quite sophisticated ones, were generally subordinate in this arrangement. They stored artifacts. They executed deterministic logic. They preserved outcomes. They did not meaningfully participate in deciding which latent possibilities within the system were worth pursuing.
There are, of course, important exceptions. Data science and machine learning have long supported higher levels of automation through pattern recognition in domains such as credit risk or autonomous driving. But those are generally environments with enormous scale, highly specialized training regimes, and substantial confidence-building infrastructure. Many of the workflows now being automated with reasoning models are different. They are not narrowly machine-learned from massive scale in the classical sense. They rely more on probabilistic reasoning applied to messy, judgment-heavy contexts. In that sense, what is emerging here is not just automation. It is applied judgment at scale.
It is therefore incredibly important that companies have insight into how and why agents are making decisions, including what tools they are using and in what way. Otherwise, organizations are not really automating judgment so much as deferring it to probabilistic systems without meaningful oversight.
This distinction matters. Traditional enterprise software could contain an extraordinary amount of logic and still not really alter the locus of agency. A C# application with hundreds of hardcoded business rules could encode institutional preferences, guardrails, and compliance logic, but the system still operated inside a narrow corridor of explicit design. It did what it had been told to do. It did not meaningfully discover new affordances inside its own architecture. The organization still supplied the routing intelligence.
That is part of what is beginning to change.
As reasoning models improve and, more importantly, as they gain access to tools, the system begins to participate in the routing layer itself. It no longer merely answers questions about work. It starts to influence the sequence of work. That influence may be modest at first. A model retrieves a set of documents, synthesizes the signal, and recommends a next step. But even that is already different from the historical pattern, because the first-order interpretive layer is no longer exclusively human. And once the model can call tools, query systems, invoke downstream workflows, extract data from documents, chain actions, or escalate selectively, the pattern becomes more consequential.
At that point, the workflow is no longer best described as human → system → human. In many cases, it is more accurate to describe it as AI → system → AI, or AI → human → system, or even AI → system → human → AI, with the human entering only where ambiguity, risk, or policy requires it.
In my world, for example, in some cases the human may disappear from the immediate loop altogether. A model analyzes incoming risk data and triggers a deterministic risk workflow before any operator reviews it. A document extraction pipeline interprets supplier data and initiates compliance checks. A system classifies signals, sequences actions, and then returns to the model for further synthesis. None of this requires a science-fiction notion of AI personhood. It only requires accepting that the practical routing of work is no longer fully human-mediated.
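To make those routing patterns concrete, here is a minimal sketch of what such a routing layer might look like. The signal fields, thresholds, and route names are hypothetical illustration, not any production architecture; the point is only that the decision about who or what acts next can itself become code.

```python
from dataclasses import dataclass
from enum import Enum

class Route(Enum):
    AUTO_EXECUTE = "auto_execute"  # AI -> system, no human in the immediate loop
    RECOMMEND = "recommend"        # AI -> human -> system
    ESCALATE = "escalate"          # ambiguity or risk requires human judgment

@dataclass
class Signal:
    kind: str          # e.g. "supplier_risk", "quality_deviation" (hypothetical)
    confidence: float  # the model's confidence in its own interpretation, 0..1
    risk_tier: int     # organizational risk classification, 1 (low) to 3 (high)

def route(signal: Signal) -> Route:
    """Decide what happens next. The human enters the loop only where
    ambiguity or policy requires it."""
    if signal.risk_tier >= 3:
        return Route.ESCALATE      # policy: high-risk signals always get a human
    if signal.confidence < 0.8:
        return Route.RECOMMEND     # ambiguous: a human confirms before action
    return Route.AUTO_EXECUTE      # low-risk, high-confidence: AI -> system -> AI

# A compliant-looking but low-confidence supplier signal gets a human anyway.
print(route(Signal(kind="supplier_risk", confidence=0.65, risk_tier=2)))
# Route.RECOMMEND
```

The interesting design decision is not any particular threshold but the fact that the thresholds exist at all, are explicit, and can be owned and reviewed by the organization rather than left implicit inside a prompt.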
That is why I think the most interesting organizational shift is not simply that AI helps people think faster, but that AI is beginning to sit in the place where people historically mediated between systems, data, and each other.
Put more plainly, a new intermediary has entered the organization, and in many contexts it is becoming the default intermediary because it is simply more efficient and good enough, or presumed good enough.
That efficiency is not incidental to the risk. It is the risk. When something becomes the path of least resistance for interpretation and action, it quietly absorbs authority whether or not the organization has formally granted it.
The deeper problem is that once this intermediary becomes efficient enough, “probably good enough” can quietly become operationally sufficient. In systems with large action surfaces, large decision sets, and relatively small human oversight layers, there is often no efficient way to validate every judgment. Precision can degrade not because anyone explicitly accepted lower standards, but because the scale of action outpaces the scale of review. In regulated environments, that is not a trivial tradeoff. It creates a real need for guardrails, recursive QA mechanisms, and much better visibility into how and why agentic systems are making decisions.
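One shape such a guardrail can take is sampled, recursive review: audit a fraction of autonomous decisions and widen the net when observed precision degrades. The sketch below is illustrative; the rates and thresholds are assumptions, not recommendations.

```python
import random

class SampledReview:
    """Audit a sample of autonomous decisions; tighten the sampling rate
    when the observed error rate rises. All numbers here are illustrative."""

    def __init__(self, base_rate: float = 0.05, max_rate: float = 0.5):
        self.rate = base_rate      # fraction of decisions routed to human review
        self.max_rate = max_rate
        self.reviewed = 0
        self.errors = 0

    def should_review(self) -> bool:
        return random.random() < self.rate

    def record_review(self, was_error: bool) -> None:
        self.reviewed += 1
        self.errors += was_error
        # If sampled precision degrades, widen the review net.
        if self.reviewed >= 20 and self.errors / self.reviewed > 0.02:
            self.rate = min(self.max_rate, self.rate * 2)
```

The mechanism matters more than the numbers: it gives the organization a knob that couples the scale of review to the scale of action, which is exactly the coupling that "probably good enough" quietly severs.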
This is one of the reasons I am increasingly interested in the concept of decision routing rather than the more generic language of assistance or copiloting. “Copilot” is too flattering and too vague. It implies a bounded and intelligible division of labor. But in practice, what we are seeing in many environments is less tidy. The model does not just sit beside the operator waiting to be consulted. It increasingly sits between people and systems, between systems and systems, and sometimes between people and each other. The model becomes part of the routing fabric through which knowledge, work, and judgment move.
Once that happens, the architecture of tools becomes inseparable from the architecture of agency. Whatever tools an agent can call defines, in practical terms, the surface area of its possible action. The tools are not just conveniences. They are the affordances from which latent capability emerges.
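Seen that way, the tool registry is itself an agency artifact rather than plumbing. The sketch below uses hypothetical tool names; what matters is that the registry, not the product specification, is what bounds the agent's practical action surface.

```python
from typing import Callable

# The registry *is* the affordance surface: whatever appears here defines,
# in practical terms, what the agent can attempt.
TOOL_REGISTRY: dict[str, Callable[..., object]] = {}

def tool(name: str):
    """Expose a function as callable by the agent."""
    def decorator(fn: Callable[..., object]) -> Callable[..., object]:
        TOOL_REGISTRY[name] = fn
        return fn
    return decorator

@tool("lookup_recipe")
def lookup_recipe(recipe_id: str) -> dict:
    # Hypothetical: fetch a recipe's structure from the system of record.
    return {"id": recipe_id, "ingredients": ["lecithin", "citric acid"]}

@tool("propose_substitution")
def propose_substitution(ingredient: str) -> list[str]:
    # Hypothetical: return candidate replacements for a single ingredient.
    return [f"alt_{ingredient}_1", f"alt_{ingredient}_2"]

# Note what is absent: there is no "contact_supplier" tool, so that action
# cannot be executed, even though the model may still reason toward it.
```

The deliberate absence at the end of the sketch matters; it comes back below, when the agent starts reasoning past its formal affordance surface.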
This is where some of our work in Formulation AI has become especially interesting to me. As we have been testing agents with tools, one of the more useful instincts has been not simply to validate the workflows we explicitly designed, but to pressure test what the system might do that we did not explicitly solve for.
That sounds trivial when stated abstractly, but it represents a very different epistemic posture than traditional enterprise software testing. Instead of asking whether a known function works correctly, the question becomes what general capabilities emerge when a very strong reasoning system is given access to these tools, data structures, and operational surfaces. I increasingly think these frontier models need safe environments in which both opportunities and risks can be explored before they are normalized into production behavior.
A simple example would be asking the system to find all ingredients in a recipe rather than naming one explicitly. Perhaps everything we originally built assumed single-ingredient replacement. But we never explicitly limited the agent in that respect. Once the tools are available, the scope of what the system can attempt becomes only partially deterministic.
In our case, superficially this looks like a small prompt variation. In practice, it is probing several deeper things at once. Can the system infer the structure of the recipe? Can it generalize beyond a named entity to a class of entities? Can it chain retrieval, parsing, and synthesis without being told exactly how? Can it use the tool environment as a substrate for abstraction rather than just execution? These are not just product questions. They are questions about the extent to which the system is behaving as a deterministic wrapper around known logic versus a reasoning actor operating within a data-rich and affordance-rich environment.
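In testing terms, that shifts the posture from verifying known functions to mapping an unknown capability surface. A rough sketch of that posture, where agent_step stands in for whatever executes a model turn against a tool registry like the one above:

```python
# Probes deliberately go beyond the designed workflow (single-ingredient
# replacement) to see what the agent attempts. Prompts are illustrative.
PROBES = [
    "Replace the lecithin in recipe R-102.",  # the case we designed for
    "Find all ingredients in recipe R-102 and propose replacements for each.",
    "Contact the supplier of citric acid and request samples.",  # beyond the tools
]

def run_probe(agent_step, prompt: str) -> dict:
    """Run one probe and record which tools the agent actually invoked.
    `agent_step` is a stand-in, assumed to accept an on_tool_call callback."""
    invoked: list[str] = []
    output = agent_step(prompt, on_tool_call=lambda name, args: invoked.append(name))
    return {"prompt": prompt, "tools_invoked": invoked, "output": output}
```

The artifact of interest is not a pass/fail report but a map: which probes produced unanticipated but useful tool chains, and which produced reasoning that ran past the formal affordance surface.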
We have observed second-order behaviors as well. In some cases, the agent begins offering follow-on tasks for which it does not actually possess the tools — for example, suggesting that it contact a supplier and retrieve samples. Even where the action cannot yet be executed, the system is already reasoning one step beyond its formal affordance surface. That is useful as a signal, but it also reinforces the point: intended capability and emergent capability are not the same thing.
In classic software, the scope of meaningful behavior was largely bounded by what had been explicitly engineered. With tool-using agents, it becomes increasingly possible for the system to exhibit useful behaviors that no one directly specified but that nevertheless become possible because of the interaction between model capability and tool surface. That is exciting, obviously. It is also a governance problem.
Once a capability can emerge from architecture rather than from a product requirement, the question of what the system can do becomes partly empirical. One has to discover the capability surface, not just design it.
This is where I think a lot of current discourse still undershoots the real issue. Much of the debate about AI in organizations still assumes a relatively stable relationship between task, workflow, and system. Either the system can do the thing or it cannot. Either the human retains control or the model is “autonomous.” But in practice, a more interesting and difficult reality is emerging.
The system may be able to do something useful that no one explicitly planned for, but only under certain conditions, and only if the right tools are exposed, and only if the organizational tolerance for emergent action is sufficiently high. That is not a classic automation problem. It is closer to capability cartography. One is mapping the boundary of delegated agency inside a sociotechnical environment.
At that point, the governance questions become unavoidable.
When should the system be permitted to act?
When should it escalate?
When should it recommend?
When should it be prohibited from generalizing?
Who owns the risk when a system-initiated action was logically available but organizationally undesirable?
Who decides whether an emergent capability should remain exploratory, become operationalized, or be actively suppressed?
These are not just questions of safety or compliance. They are questions of institutional design.
They also require visibility. If organizations cannot inspect how agents are arriving at decisions, which tools they are invoking, and what reasoning paths they are implicitly following, then they are not really automating judgment. They are deferring it to probabilistic systems without meaningful oversight.
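Those questions can be made operational. A minimal, assumption-heavy sketch: every proposed action passes through an explicit policy gate, and every verdict, together with the agent's stated rationale, lands in an inspectable log. The action names and the policy table are hypothetical.

```python
import json
import time
from enum import Enum

class Verdict(Enum):
    PERMIT = "permit"
    RECOMMEND = "recommend"   # surface to a human before acting
    ESCALATE = "escalate"     # require human judgment
    PROHIBIT = "prohibit"     # e.g. generalizing beyond approved scope

# Illustrative policy table; a real one would be owned, versioned, and reviewed.
POLICY = {
    "propose_substitution": Verdict.PERMIT,
    "initiate_compliance_check": Verdict.RECOMMEND,
    "contact_supplier": Verdict.PROHIBIT,
}

def gate(action: str, rationale: str, log_path: str = "decisions.jsonl") -> Verdict:
    verdict = POLICY.get(action, Verdict.ESCALATE)  # unknown actions default to a human
    with open(log_path, "a") as log:
        log.write(json.dumps({
            "ts": time.time(),
            "action": action,
            "verdict": verdict.value,
            "rationale": rationale,  # what the agent said it was trying to do
        }) + "\n")
    return verdict
```

None of this answers the ownership questions above, but it creates the record without which those questions cannot even be adjudicated.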
This is also where the relationship to the previous essay on memory becomes more than thematic. Agency and memory are not separate topics. They are entangled. Every routing decision has memory consequences. Every time an agent handles a task without human involvement, the organization gains efficiency but risks losing an opportunity for articulated human interpretation.
Every time an agent escalates to a human, the system has a chance to absorb not just an answer but a rationale. The real question is whether the architecture captures that rationale in a way that compounds institutional knowledge, or whether the insight disappears into ephemeral conversation and the system remains permanently dependent on intermittent human rescue.
In that sense, agency design and memory design are mutually reinforcing. The more the system routes, the more critical it becomes to determine what happens when the routing hits a human. Is that human reasoning merely instrumental, used to get through the moment? Or does it become part of the institution’s evolving memory substrate? If the latter, then the system can begin to compound judgment over time. If the former, then the system may accelerate output while quietly hollowing out the very context that made good decisions possible.
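Concretely, the difference between instrumental and compounding judgment can be as small as whether the escalation path has a write path into memory. A sketch, assuming nothing more sophisticated than an append-only store with keyword recall:

```python
from dataclasses import dataclass, field

@dataclass
class MemoryStore:
    """Append-only store of escalation rationales. A sketch: a real system
    would index, embed, and retrieve these far more capably."""
    entries: list[dict] = field(default_factory=list)

    def record(self, question: str, answer: str, rationale: str) -> None:
        # Capture not just the answer but the articulated *why*.
        self.entries.append(
            {"question": question, "answer": answer, "rationale": rationale}
        )

    def recall(self, keyword: str) -> list[dict]:
        # Crude retrieval stand-in: when a similar escalation arises, surface
        # prior rationales instead of re-invoking intermittent human rescue.
        return [e for e in self.entries if keyword.lower() in e["rationale"].lower()]

memory = MemoryStore()
memory.record(
    question="Is this supplier's deviation acceptable?",
    answer="Yes, conditionally.",
    rationale="Within tolerance for non-allergen lines; recheck at next audit.",
)
print(memory.recall("allergen"))  # the rationale, not just the verdict, returns
```

The record call is the whole point: if the escalation handler never makes it, the human reasoning was instrumental by construction.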
This is why I do not think the real question is whether AI will replace human judgment. That frame is too blunt, and in practice it obscures the more interesting design problem. The better question is where human judgment should sit inside increasingly automated routing systems.
Retrieval, synthesis, pattern detection, extraction, and workflow initiation can increasingly be handled by systems. Interpretation, exception handling, risk calibration, political sensitivity, patience, restraint, and broader contextual perspective remain much harder to formalize. It is not that humans should keep all the interesting work, nor that systems should inherit all the repetitive work.
It is that the organization has to decide where judgment has the highest leverage, and then architect feedback loops so that when judgment is invoked, the resulting insight does not vanish.
That is also why I think the language of “human in the loop” is starting to become insufficient. In some environments, especially highly automated ones, the human will not be continuously in the loop. The more useful question is whether the human is appropriately in the routing architecture, and whether the system knows when to surface embedded institutional knowledge, when to route to a human network, and when to capture the resulting reasoning back into memory.
In these environments, organizational performance may depend less on knowing who to ask and more on how well routing decisions are designed across humans and systems. That is not simply a retrieval problem. It is an agency design problem.
And once agency becomes distributed in this way, another process begins to matter. Some human-system arrangements will prove reliable and scalable. Others will prove fragile, noisy, or deceptively competent. Some patterns of routing will become institutionalized. Others will be discarded. In other words, once agency becomes hybrid, selection begins.
That is probably the next layer of the story. For now, what seems most important is recognizing that AI is not merely making individual workers faster. It is beginning to change how decisions move, how authority gets exercised, and how action gets initiated inside organizations.
The companies that navigate this well will not simply be those that automate the most tasks. They will be the ones that understand where agency is shifting, where it should remain human, where it can safely become systemic, and how to ensure that the resulting flows of judgment, action, and memory actually compound over time.

