Enterprise AI

AI Does Not Work Well Inside Human-Designed Workflows

AI-native workflow design is not about replacing human tasks with AI tasks. It is about redesigning information flow, judgment units, verification points, and decision records so AI can perform consistently.

Field note

I have been redesigning part of my own work around an AI-native operating model. That experience changed how I think about enterprise AI.

The first question many organizations ask is natural: which parts of the existing human workflow can AI automate?

A person reads information, summarizes it, classifies it, judges whether it matters, writes a memo, and passes it to the next step. If that is the workflow, the obvious move is to ask AI to do some of those tasks.

That can help. Work gets faster. Drafts appear. Repetitive work becomes less painful.

But when I started designing the workflow with AI in mind from the beginning, a deeper problem became visible.

A workflow that is easy for humans to read is not always a workflow where AI can perform well.

Humans can handle a lot of mess. We can look at a loose bundle of documents, notes, conversations, and old context, and still infer what probably matters. We remember what we have seen before. We can sense that something is similar to a previous case, or that a weak signal may matter in a different context, even when the data is not labeled cleanly.

AI does not carry that kind of implicit operating context unless the workflow gives it the context explicitly.

If we insert AI into a workflow that was designed around human intuition, the AI inherits the ambiguity. The output may still look polished, but the process underneath can be unstable. The AI may make different assumptions from one run to the next. The review burden may barely fall. In some cases, human review becomes more fragmented because the human now has to supervise many small AI steps.

That was the main lesson for me.

AI-native workflow design is not about replacing human tasks with AI tasks. It is about redesigning the flow of information, the unit of judgment, the points of verification, and the way decisions are recorded so that AI can perform consistently.

The wrong version of "human in the loop"

The problem often appears when an organization keeps the existing human workflow intact and gives one piece of it to AI.

Meetings, reports, spreadsheets, market notes, research memos, internal comments, and external documents are natural units for humans. They are easy to name, easy to browse, and easy to discuss.

But they are not always the right units for AI.

For AI, the important units are often more granular: input, question, allowed context, source, evidence, output format, escalation condition, confidence, next action, and record of judgment.

Without that design, AI will still produce something that looks complete. That is part of the risk. The output can look finished even when the underlying process is not reliable.

This is why the phrase "human in the loop" is not enough.

If the loop is poorly designed, the human is not really governing the workflow. The human becomes a constant proofreader, correction layer, and instruction giver. Instead of reducing work, AI creates a new burden: checking every intermediate step.

The question is not only where to put a human in the loop. The better question is what the human needs to see in order to make a good decision.

If the workflow preserves AI-to-AI checks, decision history, evidence, exceptions, and the path from input to output, the human can review the final output and the process behind it. The human can make a GO / NO GO decision instead of manually steering every small step.

That distinction matters. A bad AI workflow asks humans to keep rescuing the process. A better AI-native workflow gives humans enough traceability to govern the process.

Five design principles

At this stage, I think AI-native workflow design requires at least five elements.

The first is input decomposition. AI should not receive a vague bundle of information and be asked to "make sense of it." Source, metadata, context, instruction, expected output, and allowed use should be separated. This separation alone can reduce output variance.
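As a minimal sketch of what that separation could look like, the snippet below keeps each of the six parts in its own field and renders them as labeled sections. The class name WorkItem, the function render_prompt, and the section labels are my own illustrative assumptions, not a standard.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WorkItem:
    """One decomposed input. All field names are illustrative assumptions."""
    source: str            # where the material came from
    metadata: dict         # e.g. date, author, document type
    context: str           # background the model is allowed to rely on
    instruction: str       # the single question being asked
    expected_output: str   # the shape of the answer, not the answer itself
    allowed_use: str       # e.g. "internal_only" or "client_shareable"


def render_prompt(item: WorkItem) -> str:
    """Render the item as labeled sections so no field bleeds into another."""
    metadata_line = ", ".join(f"{k}={v}" for k, v in sorted(item.metadata.items()))
    sections = [
        ("SOURCE", item.source),
        ("METADATA", metadata_line),
        ("CONTEXT", item.context),
        ("INSTRUCTION", item.instruction),
        ("EXPECTED OUTPUT", item.expected_output),
        ("ALLOWED USE", item.allowed_use),
    ]
    return "\n\n".join(f"[{label}]\n{value}" for label, value in sections)
```

The point of the design is that every run receives the same six slots in the same order, so a missing slot is visible as a gap rather than silently filled by the model.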

The second is state management. Humans can remember what has already been reviewed, rejected, deferred, or connected to another theme. AI needs that state to be explicit. Otherwise the workflow keeps rediscovering the same material.
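A rough sketch of explicit state, assuming an in-memory dictionary stands in for whatever durable store a real workflow would use. The names ReviewState and StateStore are hypothetical, and the rule that deferred items come back for processing is my assumption.

```python
from enum import Enum


class ReviewState(Enum):
    NEW = "new"
    REVIEWED = "reviewed"
    REJECTED = "rejected"
    DEFERRED = "deferred"


class StateStore:
    """In-memory stand-in for a durable store (an assumption for this sketch)."""

    def __init__(self) -> None:
        # item_id -> (state, reason the state was assigned)
        self._states: dict[str, tuple[ReviewState, str]] = {}

    def record(self, item_id: str, state: ReviewState, reason: str) -> None:
        """Store the latest state for an item together with the reason."""
        self._states[item_id] = (state, reason)

    def needs_processing(self, item_id: str) -> bool:
        """Unseen and deferred items are processed; reviewed and rejected are skipped."""
        state, _ = self._states.get(item_id, (ReviewState.NEW, ""))
        return state in (ReviewState.NEW, ReviewState.DEFERRED)
```

With a check like needs_processing in front of each step, the workflow stops rediscovering material it has already handled.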

The third is verification. It is not enough for AI to produce an answer. The workflow needs to show what the answer depends on. In research, advisory, compliance, and investment-adjacent work, facts, inference, and opinion need to be separable.
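One way to make that separation checkable is to tag every claim with its kind and require a source for anything labeled as fact. This is only a sketch: the Claim structure, the three labels, and the rule itself are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Optional

KINDS = {"fact", "inference", "opinion"}


@dataclass
class Claim:
    text: str
    kind: str                         # expected to be one of KINDS
    source_id: Optional[str] = None   # required when kind == "fact"


def unsupported_claims(claims: list[Claim]) -> list[Claim]:
    """Return claims with an unknown kind, or facts that cite no source."""
    return [
        c for c in claims
        if c.kind not in KINDS or (c.kind == "fact" and not c.source_id)
    ]
```

An empty result does not prove the answer is right. It only shows that every stated fact points at something a reviewer can open.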

The fourth is escalation. Not every input should be treated with the same weight. Some items can be summarized automatically. Some need additional checks. Some should stop the workflow. High-impact judgments, weak-but-relevant signals, low-confidence outputs, and regulated decisions should return to humans.
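The routing described above can be sketched as a small function. The thresholds 0.5 and 0.8, the field names, and the route labels are assumptions chosen for illustration. The "stop the workflow" case is left out because its trigger depends on the institution's own rules.

```python
from dataclasses import dataclass


@dataclass
class Assessment:
    confidence: float              # 0.0 to 1.0, as reported by the upstream step
    high_impact: bool
    regulated: bool
    weak_but_relevant: bool = False


def route(a: Assessment, low: float = 0.5, high: float = 0.8) -> str:
    """Pick the next handler for an item. Thresholds are illustrative."""
    # These categories return to a human regardless of confidence.
    if a.regulated or a.high_impact or a.weak_but_relevant:
        return "human_review"
    # Low-confidence outputs also return to a human.
    if a.confidence < low:
        return "human_review"
    # The middle band gets extra automated checks before release.
    if a.confidence < high:
        return "additional_checks"
    return "auto_summarize"
```

The ordering matters: the impact and regulatory checks come first so that a confident model cannot bypass them.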

The fifth is memory. If an AI workflow cannot preserve why a judgment was made, it remains a temporary assistant. If it can preserve the trace of judgment, it can become part of organizational learning.
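A minimal sketch of preserving the trace of judgment, assuming an append-only JSON Lines file is an acceptable store. The function names and the entry fields are hypothetical.

```python
import json
from datetime import datetime, timezone


def log_decision(path: str, item_id: str, decision: str,
                 reason: str, evidence_ids: list[str]) -> dict:
    """Append one decision, with its reason and evidence, to a JSON Lines file."""
    entry = {
        "item_id": item_id,
        "decision": decision,
        "reason": reason,
        "evidence_ids": evidence_ids,
        "recorded_at": datetime.now(timezone.utc).isoformat(),
    }
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(entry) + "\n")
    return entry


def decisions_for(path: str, item_id: str) -> list[dict]:
    """Read back every logged decision for one item, oldest first."""
    with open(path, encoding="utf-8") as f:
        return [e for e in map(json.loads, f) if e["item_id"] == item_id]
```

Because entries are appended and never overwritten, a later step can see that a judgment was reversed as well as where it ended up.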

This is the difference between using AI inside a workflow and designing the workflow for AI.

AI-friendly information is not the same as human-friendly information

The same issue appears in database design.

For a human-facing database, a title, status, tag, date, owner, and memo may be enough. A person can look at a Notion database and infer what the page is, where it sits, and who should look at it next.

But when one AI reads the database, another AI processes the next step, and another AI reviews the result, the meaning of the database changes.

It is no longer only a table for humans to browse. It becomes an interface through which AI systems pass work state to each other.

That changes the design problem.

Human-friendly databases help people find information. AI-friendly databases help machines avoid the wrong next action.

In other words, AI-friendly information is information that tells AI what it may do next, what it must not do, which evidence it can use, when it should escalate to a human, and why a prior judgment was made.

That means clean prose is not always the most important thing. Separation of meaning matters more.

Is this a fact or an interpretation? Is it a primary source or a view? Is it confirmed, tentative, deferred, or rejected? Why was that judgment made? What is the next allowed action? What is outside the AI's authority? Where is human confirmation required?
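Those questions can be turned into fields on the record itself. The sketch below is one possible shape. The field names, the status values, and the rule that a pending human confirmation blocks every action are my assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    content: str
    kind: str                     # "fact" or "interpretation"
    source_type: str              # "primary" or "view"
    status: str                   # "confirmed", "tentative", "deferred", "rejected"
    rationale: str                # why the status was assigned
    allowed_actions: set[str] = field(default_factory=set)
    needs_human_confirmation: bool = False


def may_perform(record: Record, action: str) -> bool:
    """Allow an action only if it is listed and no human confirmation is pending."""
    if record.needs_human_confirmation:
        return False
    return action in record.allowed_actions
```

A downstream AI step would call may_perform before acting, so anything not listed on the record is outside its authority by default.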

Humans organize information to read it. AI needs information organized so that it can be reused in the next step.

If AI participates across multiple steps, the database stops being just a repository. It becomes a handoff note, an instruction layer, an exception handling system, and a decision log. In that sense, database design starts to look much more like workflow design.

One counterintuitive point is that rejected or deferred information can matter. Human information management often deletes what is no longer useful. But if AI is used continuously, the reason something was ignored, rejected, or deferred can improve future judgment. The negative decision is also part of the organization's memory.

Before asking AI to read more information, the organization should ask whether the information is held in a form AI can act on responsibly.

What financial institutions are already showing

This shift is visible in how large financial institutions talk about AI.

JPMorgan Chase wrote in its 2023 annual report that it had more than 2,000 AI and machine learning experts and data scientists, with more than 400 AI use cases in production across areas such as marketing, fraud, and risk. The more important point is not the number of use cases. It is the bank's suggestion that generative AI could reimagine entire workflows.

That is a different framing from simple task automation.

Morgan Stanley's AI work in wealth management points in the same direction. Axios reported in 2024 that the firm's advisor-facing AI assistant was grounded in internal data, separated information that could be used internally from information that could be shared with clients, and attached citations to answers. The same report noted that the tool required months of tuning and more than 20,000 pieces of advisor feedback.

That is not just a chatbot story. It is a workflow story: content, feedback, citations, governance, review, and permissioning are being arranged so that AI can operate inside a regulated advisory environment.

Technical research is moving in the same direction. A 2025 paper on agentic workflows and enterprise APIs argues that many enterprise APIs were designed for deterministic, human-led interactions, not for dynamic, goal-oriented AI agents. A 2026 paper on LLM agents argues that reliability does not come only from model quality, but also from specialized agents, deterministic tools, and structured coordination.

In finance, this matters because the organization must be able to explain both the output and the process behind the output. The SEC's 2024 actions against two investment advisers for allegedly misleading statements about AI use are a reminder that the issue is not whether financial institutions should avoid AI. The issue is whether they can describe and govern how AI is used.

How I see it

The first wave of generative AI adoption was naturally tool-centric. Which model should we use? Which AI assistant should employees get? Which vendor has the best interface? Which department can reduce hours?

Those questions still matter. But they are no longer the most important questions.

My view is that the focus of AI adoption in financial institutions will move from deploying AI tools to converting unstructured information into operating protocols that AI can handle.

This is especially important in finance because valuable work is often not cleanly structured. Investment research, manager monitoring, private market sourcing, client proposal preparation, risk review, market intelligence, and regulatory response all mix documents, conversations, signals and noise, exceptions, prior context, judgment, and memory.

That is why these areas are promising for AI, and also why naive automation can fail.

If a bank, insurer, asset manager, or institutional investor asks AI to imitate only the visible work of an analyst, summaries and drafts will get faster. But that is only the entry point.

The more important difference appears after information enters the organization.

Which information is stored? In what form? What is kept? What is dropped as noise? Where is evidence checked? What returns to humans? What becomes organizational memory?

This conversion capability is not just an operational detail.

For financial institutions in the AI era, it may become an organizational capability that supports research, monitoring, judgment, and accountability.

Implications

If this view is right, the difference between AI adopters will not be determined only by access to models. Many organizations will have access to similar models, vendors, and productivity tools. The difference will appear after that.

The advantage will come from how work is decomposed, how judgment units are defined, how evidence is recorded, how exceptions are routed, and how outputs become auditable.

For institutional investors, this may matter most in areas where information is fragmented. Private markets, external manager research, GP monitoring, thematic research, and cross-border market intelligence all require repeated judgment over incomplete information. AI can expand the range of information humans can cover, but only if the workflow is designed so the organization can trust the output.

For vendors selling AI to financial institutions, the implication is clear. A polished UI is not enough. The product has to fit into data permissioning, review gates, audit logs, evidence presentation, workflow state management, and exception handling. In regulated or fiduciary contexts, the product that wins may not be the one that looks most magical. It may be the one that makes AI work governable.

For leaders, the question is not "How much work can we give to AI?"

The better question is: "Which workflows must be redesigned before AI can be useful safely?"

That question is less exciting than a demo. But it is closer to where durable advantage may be created.

Open question

The question I would watch is whether financial institutions can move from AI adoption to AI-native operating design.

Adoption means giving employees tools. Operating design means changing the way work enters the organization, how it is decomposed, how evidence is preserved, how judgment escalates, and how learning accumulates.

The first improves productivity. The second changes organizational capability.

The next phase of enterprise AI will not be determined only by better models. It will be determined by whether organizations can build workflows that are readable to humans, usable by AI, and governable by the institution.

To put it simply: AI does not work well inside human-designed workflows. If an organization wants AI to do real work, it has to redesign the workflow and the data so AI can process, judge, verify, and hand off work without getting lost.