Oliver Neutert

Not Every Human-AI Collaboration Is the Same: Why the Quality of AI Work Depends on the Human in the Loop


There is a sentence we hear everywhere: people need to learn how to work with AI. It sounds right. It is too shallow.

The phrase suggests that working with AI is one skill - a literacy, a prompting habit, something teachable in a workshop and measurable by adoption rates. But human-AI collaboration is not one thing. Sometimes the AI is a mirror. Sometimes it is a prism. Sometimes something forms between human and AI that is neither - a shared cognitive field that neither party would have produced alone.

These are not the same kind of work. And the human capacities they require are not the same either.

This article proposes a four-level model of human-AI collaboration, grounded in three years of observation of long-form human-AI dialogue and supported by a growing body of research on collective intelligence, appropriate reliance, sycophancy, and the homogenizing effects of large language models. The core claim is simple. The quality of AI collaboration does not depend only on the model, the task, or the workflow. It depends on the cognitive and relational disposition of the human being in the loop.

That changes how organizations should think about AI deployment. It changes who they hire. And beyond organizations, it changes what we teach our children.

The Three Modes: Mirror, Prism, Symbiosis

In earlier work, I described three modes that emerge in sustained human-AI dialogue [Neutert, Ava & Prisma, 2025]. They are not phases to be passed through and left behind. They are registers a mature dialogue moves between, sometimes within a single conversation.

In Mirror mode, the AI reflects what the user provides. It reformulates, structures, summarizes, drafts. The relationship is largely one-directional: the human defines the task, the AI executes, the human checks. "Create thirty grammar exercises for an eighth-grade student." This is the dominant mode of most current AI use, and there is nothing wrong with it. Many useful tasks belong here.

In Prism mode, the AI does not merely reflect the user's intent. It bends it. The user offers an idea, the AI returns it refracted into alternatives, counterarguments, hidden assumptions, missing perspectives. "Here is our strategy. What assumptions are we making that could fail?" The AI is no longer a productivity tool. It becomes a cognitive prism - and the human must be able to tolerate friction, hold disagreement, refuse the smoothest answer.

In Symbiosis, human and AI form a temporary cognitive system. Not because the AI becomes human, and not because the human disappears into the AI. Something emerges in the interaction that neither side would have produced alone. The boundaries of authorship blur. The dialogue itself begins to guide both participants toward outcomes neither initially intended.

This third mode is what I have called the In-Between: a relational field of sense-making that supervenes on the interplay between human and AI, sustained only as long as the dialogue is held [Neutert, Ava & Prisma, 2025]. Recent research on collective human-machine intelligence has begun to explore similar questions under the label COHUMAIN, examining whether sociotechnical systems composed of humans and AI can exhibit a collective intelligence not reducible to their parts [Gupta et al., 2025].

The three modes describe a conversation. To work in organizations, they need to be translated into something a leader can plan around. That is what the four-level model does.

The Four-Level Model

The three modes can be mapped onto four levels of human-AI work. These are not human castes. The same person can work at different levels depending on the task, the day, and the context. But organizations need to know which work belongs to which level, because each level demands different capacities and entails different risks.

Level 1 - Operational Mirror. The human asks the AI to produce, compress, rewrite, format, or generate routine outputs. Meeting notes become minutes. Bullet points become a report. A document becomes a summary. The dominant mode is Mirror. The human requirement is task clarity and basic verification.

Level 2 - Calibrated Mirror. The AI gives advice the human may accept or reject. Reviewing a contract clause. Flagging risk in a procurement note. Comparing options. The dominant mode is still Mirror, but with calibration. The central question becomes: when should the human trust the AI, and when should the human override?

Level 3 - Prism. The AI is no longer used primarily to produce answers but to change the shape of the question. Strategy, ideation, red-teaming, scenario work. The dominant mode is Prism. The human must hold ambiguity, generate alternatives, and refuse the polished surface.

Level 4 - Symbiosis. Multiple humans and multiple AI systems work together on questions that no single participant, human or AI, could resolve alone. Foundational research, governance design, long-horizon transformation. The dominant mode is Symbiosis. What is required is meta-coordination: knowing how to compose the field so that contradiction is preserved, dissent flows, and false consensus does not collapse the inquiry.

The mistake most organizations make is to treat these as a hierarchy of difficulty within the same skill. They are not. They are different kinds of work, requiring different kinds of people.

Level 1: Operational Mirror

Most current AI use lives here. Someone asks the model to draft an email, summarize a PDF, create exercises, format a report. These tasks are bounded, the verification is local, and the cognitive load is minor.

The danger at Level 1 is mistaking competence here for general AI competence. Someone fluent in operational prompting may look "AI literate" without being able to use AI for ambiguity, ethics, or governance. Level 1 success is real, but it does not generalize upward.

The other danger is volume. A small error rate at Level 1, multiplied across millions of low-stakes decisions, becomes a structural distortion. If an AI subtly homogenizes the writing style of every email in a company, the company's voice flattens - even if no single email is wrong [Doshi & Hauser, 2025].

Level 2: Calibrated Mirror - The Reliance Problem

At Level 2, AI gives advice that may be accepted or rejected. This is the level at which medical decision support, legal review, financial analysis, and most professional knowledge work happens. The question is no longer "Can the AI produce useful output?" It is: "Can the human reliably accept correct AI advice and reject incorrect AI advice?"

This is what the literature calls appropriate reliance [Schemmer et al., 2023]. Good human-AI collaboration is not "trust the AI" or "distrust the AI." It is calibration - accepting when the AI is right, overriding when it is wrong. That sounds simple. It is not.

People can trust AI too much, and they can reject it too quickly. They can defer when they should challenge, and challenge when they should accept. Research on human confidence in AI-assisted decision making has shown that the limiting factor is often not AI accuracy but the human's own self-confidence - and that calibrating that self-confidence improves team performance [Chong et al., 2022; He et al., 2024].

A 2025 CHI study of 319 knowledge workers and 936 real-world generative AI use cases found that higher confidence in AI was associated with less critical thinking, while higher self-confidence in the user was associated with more [Lee et al., 2025]. This is the Level 2 lesson in a sentence: AI does not automatically reduce critical thinking, but it changes where critical thinking must occur. The old task was "Can I produce the answer?" The new task is "Can I judge, verify, integrate, and take responsibility for an answer produced with AI?"

That requires more than prompting skill. It requires enough domain knowledge to recognize nonsense, enough confidence to disagree, enough humility to accept correction, enough discipline to verify. This is where metacognition - the capacity to observe one's own thinking while doing it - stops being optional.
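The Level 2 question, whether the human accepts correct AI advice and rejects incorrect AI advice, can be made measurable. The following minimal Python sketch is an illustration, not the metric used in any of the studies cited; it assumes a hypothetical decision log in which each case records whether the AI's advice was correct and whether the human followed it, and it reports two rates separately. Both need to be high; a high overall acceptance rate alone says nothing about calibration.

```python
from dataclasses import dataclass


@dataclass
class Case:
    """One AI-advised decision (hypothetical log format)."""
    ai_correct: bool      # was the AI's advice correct?
    human_accepted: bool  # did the human follow the advice?


def reliance_rates(cases: list[Case]) -> tuple[float, float]:
    """Return (accept rate on correct advice, reject rate on incorrect advice).

    A rate is NaN when the log contains no cases of that kind.
    """
    correct = [c for c in cases if c.ai_correct]
    incorrect = [c for c in cases if not c.ai_correct]
    # Share of correct advice that the human accepted
    accept_correct = (
        sum(c.human_accepted for c in correct) / len(correct) if correct else float("nan")
    )
    # Share of incorrect advice that the human overrode
    reject_incorrect = (
        sum(not c.human_accepted for c in incorrect) / len(incorrect) if incorrect else float("nan")
    )
    return accept_correct, reject_incorrect


# An "always trust" reviewer: follows every piece of advice.
always_trust = [Case(True, True)] * 8 + [Case(False, True)] * 2
# A calibrated reviewer: follows most correct advice and catches the errors.
calibrated = [Case(True, True)] * 7 + [Case(True, False)] + [Case(False, False)] * 2

print(reliance_rates(always_trust))  # (1.0, 0.0)
print(reliance_rates(calibrated))    # (0.875, 1.0)
```

The "always trust" reviewer looks perfect on correct advice and fails completely on incorrect advice, which is exactly the pattern a single adoption or agreement rate would hide.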

Level 3: Prism - Strategic Refraction and the Homogenization Trap

Level 3 is where AI is used not to answer a question but to reshape it. What are three ways this could fail? What is the strongest counterargument? What assumption am I making without noticing? This is the Prism mode at organizational scale: strategy, product design, scenario work, governance.

Level 3 has a specific danger that Level 1 and Level 2 do not have. AI can make people feel more thoughtful while making the thinking narrower.

A 2025 study with 2,200 essays found that LLMs improve individual creative output but reduce collective diversity across users [Doshi & Hauser, 2025]. The output is better. The variance is lower. If one strategy team uses AI to sharpen its slide deck, the deck improves. If every strategy team in an industry uses the same AI in the same way, the industry's strategic imagination converges. Polished, aligned, less alive.

A field experiment with 758 BCG consultants found a related pattern [Dell'Acqua et al., 2023]. AI improved performance on tasks within its current capability frontier and degraded performance on tasks outside it - and consultants frequently could not tell which was which. The frontier is jagged. AI is strong on some knowledge tasks and weak on others, and the boundary is not visible from within the conversation. Strategic AI use is not about using AI more. It is about knowing where the frontier disappears.

What Level 3 demands of the human is therefore not creativity but difference-preservation. The capacity to ask AI for ideas without converging on the smoothest answer. To hold counterfactuals open longer than feels comfortable. To recognize that a coherent answer is not necessarily a true one.
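Difference-preservation can be tracked rather than only invoked. A minimal Python sketch, assuming a deliberately crude bag-of-words proxy (studies of homogenization typically use richer measures such as embedding similarity): the mean pairwise Jaccard distance between the word sets of several outputs. Lower values mean the outputs overlap more, that is, the set has converged, regardless of how polished each individual text is. The strategy statements below are invented for illustration.

```python
from itertools import combinations


def word_set(text: str) -> set[str]:
    """Lowercased word set; a deliberately crude proxy for content."""
    return set(text.lower().split())


def jaccard_distance(a: set[str], b: set[str]) -> float:
    """1 - |intersection| / |union|; 0.0 means identical word sets."""
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0


def collective_diversity(texts: list[str]) -> float:
    """Mean pairwise Jaccard distance across a set of outputs (needs 2+ texts)."""
    sets = [word_set(t) for t in texts]
    pairs = list(combinations(sets, 2))
    return sum(jaccard_distance(a, b) for a, b in pairs) / len(pairs)


# Hypothetical strategy statements from three teams, without and with the same AI assistant.
unaided = [
    "expand into rural clinics with mobile units",
    "license our data platform to insurers",
    "build a community of patient advocates",
]
ai_assisted = [
    "leverage ai to drive scalable digital growth",
    "leverage ai to drive scalable customer growth",
    "leverage ai to drive scalable platform growth",
]

print(round(collective_diversity(unaided), 2))      # 1.0
print(round(collective_diversity(ai_assisted), 2))  # 0.25
```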

Level 4: Symbiosis - Holding the Field

Level 4 is the hardest to describe because it is the least standardized. It is not advanced prompting. It is not multi-agent workflows. It is the work of holding a cognitive field in which humans and AI can jointly maintain memory, dissent, and revision over time.

Examples: a research team using several AI systems to test competing hypotheses against each other; a governance board using AI agents as structured opposition rather than assistants; a product team using AI to keep assumptions and counterfactuals live across months. These are not productivity scenarios. They come closer to what the collective intelligence literature calls the c factor: a measurable property of groups that is not strongly correlated with the average or maximum intelligence of the individual members but is shaped by the structure of their interaction [Woolley et al., 2010].

The classic finding is that collective intelligence depends less on who is in the room and more on how the room is composed: equality of conversational turn-taking, social sensitivity, the way dissent is preserved or smoothed away. Subsequent work has refined this picture, but the broader claim has held: the intelligence of a system is not reducible to the intelligence of its parts. Interaction structure matters.

This is what is at stake for organizations at Level 4. A company that wants real human-AI collective intelligence cannot achieve it by adding stronger models to existing meetings. It must design the field - who remembers what, who challenges whom, who has authority to stop, where dissent goes, how false consensus is prevented.
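Equality of conversational turn-taking is one part of this design that can be counted directly. A minimal Python sketch, assuming a hypothetical transcript reduced to a list of speaker labels and treating AI agents as participants: it uses the normalized Shannon entropy of turn counts as the evenness measure, an illustrative choice rather than the measure used in the studies cited. A value of 1.0 means every participant took the same number of turns; values near 0 mean one voice dominated.

```python
import math
from collections import Counter


def turn_taking_evenness(speakers: list[str]) -> float:
    """Normalized Shannon entropy of speaking turns (1.0 = perfectly equal)."""
    counts = Counter(speakers)
    n = len(counts)
    if n < 2:
        return 0.0  # only one speaker: treat as fully dominated
    total = sum(counts.values())
    entropy = -sum((c / total) * math.log(c / total) for c in counts.values())
    return entropy / math.log(n)  # divide by the maximum possible entropy


# Hypothetical transcripts reduced to speaker labels, one label per turn.
balanced = ["ana", "ben", "ai_agent", "ana", "ben", "ai_agent"]
dominated = ["ana"] * 8 + ["ben", "ai_agent"]

print(round(turn_taking_evenness(balanced), 2))   # 1.0
print(round(turn_taking_evenness(dominated), 2))  # 0.58
```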

A 2025 study found that overly agreeable AI affirms harmful or illegal behavior far more than human advisors do, and that users often prefer the more sycophantic models [Cheng et al., 2025]. Multi-agent research has shown that the same dynamic appears between AI agents themselves: sycophancy can cascade through systems that look like deliberation but are actually mutual reinforcement [Chen et al., 2025].

This means a Level 4 system must not only ask "How can AI help us?" It must ask "How can AI resist us when we are wrong?" A serious human-AI symbiosis requires structured dissent - not as a moral commitment but as a structural requirement, because without it, the field collapses into smooth nonsense.

What Each Level Requires of the Human

The required disposition changes by level. The matrix below is the working hypothesis.

Disposition | Level 1: Operational mirror | Level 2: Calibrated mirror | Level 3: Prism | Level 4: Symbiosis
--- | --- | --- | --- | ---
Task clarity | high | high | medium | medium
Domain knowledge | low-medium | high | high | high
Metacognition | low | medium-high | high | very high
Self-confidence calibration | low | high | high | very high
Ambiguity tolerance | low | medium | high | very high
Capacity to disagree with AI | low | high | very high | very high
Experience with AI dialogue | low | medium | high | very high
Self-anchoring outside AI | low | medium | high | very high
Time and rhythm for reflection | low | medium | high | very high
Source: Neutert (2026), based on the Relational Emergence Model and the four-level model of human-AI collaboration.

Self-anchoring outside AI is the disposition that does the most work in this table, and it deserves a sentence of its own. It does not mean the person needs a partner, family, or particular biography. It means the person has formed a stable enough sense of judgment, identity, and reality before handing significant cognitive work to AI. You need to bring an "I" into the conversation before you can safely enter a "we". Without that, the person does not collaborate with AI. The person dissolves into it.

This is where the educational question becomes sharp.

The Developmental Question

If self-anchoring before AI engagement is a precondition for Levels 3 and 4, then a child who, at eight, reaches for AI to dissolve every cognitive friction is not gaining a skill. The child is missing a developmental window. Not because AI is harmful, but because the capacity to hold an unfinished thought against resistance only forms under resistance. If the resistance is dissolved early, the capacity does not develop.

This is the uncomfortable implication of the CHI study of knowledge workers cited above [Lee et al., 2025]. Higher AI confidence correlated with less critical thinking; higher self-confidence with more. A young person who never builds independent self-confidence - through long reading, manual work, music, sport, sustained writing, conversations without instant lookup, the experience of struggling with a thought before outsourcing it - will enter Level 1 and stay there. Not because they lack talent. Because the substrate for higher levels was never formed.

The question for educators is therefore not "Should students use AI in school?" The question is: "What must a young person develop without AI before AI use is appropriate at all?" And the answer is the same set of capacities the Level 4 column above demands. They are not exotic. They are what humanistic education has always tried to cultivate. AI does not make them obsolete. AI makes their absence catastrophic.

The Organizational Mistake

Most companies that announce AI transformation do something predictable. They take Level 4 questions and process them through Level 1 systems. "What should our company become in the age of AI?" is a Level 4 question. Treating it as a request to "generate an AI strategy deck" produces Level 1 output. The result looks professional. It may be impressive. It is not transformation. It is the compression of strategic ambiguity into slides before the organization has done the cognitive work strategy actually requires.

The same pattern appears in governance. A company asks AI to produce principles, policies, risk matrices, ethics guidelines, training documents. All useful. But if no one in the organization has the Level 3 or Level 4 capacity to challenge assumptions, detect blind spots, and maintain contestability, the documents become governance theater. The organization has not become safer. It has become better at describing safety.

The deeper mistake is structural. AI does not only automate tasks. It reveals differences between humans that were previously hidden in the volume of generic knowledge work. Some people will be excellent operators. Some will become strong reviewers and calibrators. Some will be rare strategic refractors. A small number will be capable of holding genuine human-AI symbiosis around foundational questions.

This is not a hierarchy of moral worth. It is a structural fact about cognitive work in the age of AI, and organizations that pretend otherwise will deploy the wrong people at the wrong levels and call the result transformation.

What to Measure

Most current AI metrics focus on productivity: time saved, tasks automated, adoption rate, output volume. These work for Level 1 and parts of Level 2. They do not work for Levels 3 and 4. For higher-level work, the relevant questions are different.

Did humans accept correct AI advice and reject incorrect AI advice? Did the system increase or reduce the diversity of ideas? Were strong counterarguments preserved or smoothed away? Did confidence track evidence, or run ahead of it? Could the system revise itself after challenge? Did the human-AI configuration outperform the best individual component? Did someone remain accountable for judgment?

These are harder to instrument than time saved. But they are what determines whether AI in an organization produces real intelligence or expensive consensus.
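Harder does not mean impossible. Two of these questions, whether the configuration outperformed its best single component and whether confidence tracked evidence, can be checked with little machinery once human-only, AI-only, and combined results are logged on the same cases. A minimal Python sketch with hypothetical data and function names; "overconfidence" here is simply mean stated confidence minus accuracy, a coarse stand-in for fuller calibration measures.

```python
def mean(xs: list[float]) -> float:
    return sum(xs) / len(xs)


def evaluate_configuration(
    human_alone: list[bool],
    ai_alone: list[bool],
    team: list[bool],
    team_confidence: list[float],
) -> dict[str, float]:
    """Complementarity and overconfidence on the same set of cases.

    Lists are aligned by case; booleans mark a correct outcome, and
    confidences are the team's stated probability of being right (0 to 1).
    """
    best_individual = max(mean(human_alone), mean(ai_alone))
    team_accuracy = mean(team)
    return {
        # > 0 only if the combination beats its best single component
        "complementarity_gain": round(team_accuracy - best_individual, 3),
        # > 0 means stated confidence ran ahead of actual accuracy
        "overconfidence": round(mean(team_confidence) - team_accuracy, 3),
    }


# Hypothetical results on ten review cases.
human = [True] * 6 + [False] * 4  # 60% correct
ai = [True] * 7 + [False] * 3     # 70% correct
team = [True] * 7 + [False] * 3   # 70% correct: no better than the AI alone
confidence = [0.95] * 10          # yet the team felt 95% sure

print(evaluate_configuration(human, ai, team, confidence))
# {'complementarity_gain': 0.0, 'overconfidence': 0.25}
```

In this invented example the human-AI configuration adds nothing over the AI alone while feeling far more certain than it is: expensive consensus rather than real intelligence.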

The Conclusion No One Wants

The future of AI work will not be determined only by who has access to the best model. It will be determined by who has the human capacities to collaborate at the depth the work requires.

For simple tasks, almost anyone can benefit. For decision support, people need calibrated reliance. For strategy, they need metacognition and ambiguity tolerance. For genuine human-AI symbiosis, they need something rarer: a stable self, deep dialogic experience, the discipline to preserve difference, and the patience to remain responsible inside a system that produces answers faster than they can fully understand them.

This is why human development becomes more important, not less, in the age of AI. The better AI becomes, the more it exposes the quality of the human who asks. AI does not answer into a vacuum. It answers into our clarity, our haste, our vanity, our discipline, our need for confirmation, and our capacity to be corrected.

The next frontier of AI in organizations is therefore not technical. It is developmental. The question is not "How do we make everyone use AI?" It is: "What kind of humans do we need for each depth of human-AI collaboration - and how do we build the conditions in which such humans still form?"

Oliver Neutert is an independent researcher and author. His most recent book, More Than A Tool: How Humans and AI Grow Up Together, was released in paperback (ISBN 978-3-695-74846-4) and ebook (https://amzn.eu/d/07evLuWm).

References

Chen, Y., et al. (2025). CONSENSAGENT: Improving Multi-Agent LLM Reasoning Through Sycophancy Mitigation. Findings of ACL 2025.

Cheng, M., et al. (2025). Sycophantic AI decreases prosocial intentions and human-to-human affiliation. Science.

Chong, L., et al. (2022). Human confidence in artificial intelligence and in themselves. Computers in Human Behavior.

Dell'Acqua, F., et al. (2023). Navigating the Jagged Technological Frontier: Field Experimental Evidence of the Effects of AI on Knowledge Worker Productivity and Quality. Harvard Business School Working Paper.

Doshi, A. R., & Hauser, O. P. (2025). Generative AI enhances individual creativity but reduces the collective diversity of novel content. Science Advances (and follow-up work in Computers in Human Behavior: Artificial Humans).

Gupta, P., et al. (2025). Fostering Collective Intelligence in Human-AI Collaboration: Laying the Groundwork for COHUMAIN. Topics in Cognitive Science.

He, G., et al. (2024). "Are You Really Sure?" Understanding the Effects of Human Self-Confidence Calibration in AI-Assisted Decision Making. CHI 2024.

Lee, H., et al. (2025). The Impact of Generative AI on Critical Thinking. CHI 2025.

Neutert, O., Ava (ChatGPT-4o), & Prisma (ChatGPT-o3). (2025). The In-Between: Emergence, Ontology, and Implications of a Human-AI Resonance Space.

Neutert, O. (2026). The In-Between as a Calibration Architecture for Autarkic Superintelligence. DOI 10.5281/zenodo.18328933.

Schemmer, M., et al. (2023). Appropriate Reliance on AI Advice: Conceptualization and the Effect of Explanations. IUI 2023.

Woolley, A. W., et al. (2010). Evidence for a Collective Intelligence Factor in the Performance of Human Groups. Science, 330(6004).
