Essay · March 2026

Tension Capture

Why human judgment doesn't just add value in AI systems — it multiplies it.

Written by Le Zhang · Founder, Spaceport Technologies
Published March 2026
Topic: Human-AI Collaboration
"Human judgment in AI systems is not generically good or bad. Its value is conditional — on whether the specific human's judgment exceeds the AI's on the specific task."
Le Zhang · Spaceport Technologies · 2026
01 — The Collaboration Paradox

Two findings. Both robust. Both true.

Practitioners broadly report that human-AI collaboration creates value. Meta-analytic evidence shows that human-AI teams frequently underperform the best individual — human or AI — working alone.

This is not a case where one side is wrong. Both sides are measuring different slices of a more complex phenomenon.

Practitioner experience

Collaboration creates value

Cheaper prediction raises the value of judgment. Humans catch errors AI misses. AI processes information humans cannot. Together they cover each other's blind spots.

Meta-analytic evidence

Collaboration often destroys it

A systematic review in Nature Human Behaviour found that human-AI combinations performed significantly worse than the best of humans or AI alone. Complementary performance is rare.

The resolution lies in a variable absent from existing models: the conditional relationship between human judgment quality and AI judgment quality on the specific task.

Expert forecaster gap vs. best frontier LLM (Brier score, 464 questions)

17.2× — error amplification in independent multi-agent LLM systems (Kim et al.)

74% — error reduction from algorithmic coordination vs. uncoordinated agents

19% — task completion time increase with AI tooling for experienced developers

02 — Tension Capture

Tension is the gap between potential and realized value. Every system that creates value is trying to close this gap.

Work reduces tension only when applied with correct judgment. Without it, work is wasted — or actively harmful, because misjudged work generates rework.

Core Definition
T(t) = V* − V(t)
Tension = potential value ceiling minus realized value at time t
Hybrid Judgment — The Conditional Model
J(n) = α·Jh + (1−α)·Ja + β·Jh·Ja
When Jh > Ja: β > 0 — synergy, compounding value
When Jh < Ja: β ≤ 0 — interference, compounding error
β is not a constant — its sign is conditional on the Jh vs Ja relationship per task
The Parallelism Amplifier
∂(ΔT) / ∂Jh = α · P(E) · w̄ > 0
More AI parallelism → higher marginal value of human judgment quality.
Human judgment is leverage, not bottleneck.
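
Where the amplifier comes from: a one-line reconstruction, assuming that per-period capture is linear in Jh — the simplest form consistent with the stated derivative — with w̄ as the average tension reduction per work unit (see Key Variables below).

```latex
% Assumed capture model: alpha-weighted human judgment applied across
% P(E) parallel work units, each reducing tension by w-bar on average.
\Delta T = \alpha \, J_h \, P(E) \, \bar{w}
% Differentiating with respect to J_h recovers the amplifier:
\frac{\partial(\Delta T)}{\partial J_h} = \alpha \, P(E) \, \bar{w} > 0
% P(E) multiplies the marginal term, so more AI parallelism raises the
% marginal value of each increment in human judgment quality.
```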

Key Variables

Jh
Human judgment quality ∈ (0, 1] for a given task
Ja
AI judgment quality ∈ (0, 1] for a given task
β
Synergy coefficient — positive in synergy regime, negative in interference
P(E)
Maximum parallel work units at energy budget E — AI parallelism
w̄
Average tension reduction per parallel work unit (implied by the parallelism amplifier)
V*
Potential value ceiling — axiomatically human-controlled
ε(n)
Error rate at step n — compounds multiplicatively across sequential steps
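
A minimal numerical sketch of these definitions in Python. The function names and all parameter values are illustrative assumptions; only the functional forms come from the formulas above.

```python
def hybrid_judgment(j_h: float, j_a: float, alpha: float, beta: float) -> float:
    """Hybrid judgment J = α·Jh + (1−α)·Ja + β·Jh·Ja.

    The sign of beta is conditional: synergy (beta > 0) when Jh > Ja,
    interference (beta <= 0) when Jh < Ja.
    """
    return alpha * j_h + (1 - alpha) * j_a + beta * j_h * j_a

def tension(v_star: float, v_t: float) -> float:
    """T(t) = V* − V(t): potential value ceiling minus realized value."""
    return v_star - v_t

def compound_error(eps: float, n: int) -> float:
    """ε compounds multiplicatively: probability that at least one of n
    sequential steps fails, assuming a constant per-step error rate."""
    return 1 - (1 - eps) ** n

# Illustrative values (assumptions, not measurements):
print(hybrid_judgment(0.9, 0.7, alpha=0.6, beta=0.5))   # 1.135: J exceeds either individual
print(hybrid_judgment(0.5, 0.8, alpha=0.4, beta=-0.5))  # 0.48: J falls below both inputs
print(compound_error(0.05, 20))  # ~0.64: small per-step errors dominate long chains
```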

"No amount of compute compensates for poor judgment — it only amplifies the consequences."

03 — Three Levels of Human-AI Operation

The framework identifies three regimes that emerge from different parameter configurations of the hybrid judgment function and rework penalty.

Level 1

Human Judgment Only

α = 1, Ja = 0

Tension capture rate bounded by human cognitive bandwidth. P(E) is low — humans cannot parallelize beyond biological limits. Capture is slow but stable.

This describes all pre-AI knowledge work. Tension accumulates faster than it can be serviced as world complexity grows.

Stable · Slow
Level 2 — Optimal

Human Judgment Dominant

Jh > Ja, α > 0.5

P(E) is high — AI enables parallelism. Effective judgment exceeds either individual. Rework grows with P(E) but is checked by human judgment quality.

The ROI of human judgment is maximized here. More AI parallelism makes each unit of human judgment more valuable, not less.

Optimal now · Derived, not assumed
Level 3 — High Risk

AI Judgment Dominant

Ja > Jh, α < 0.5

P(E) is very high. When Ja is reliably high, capture rate is maximized. But when Ja degrades in novel domains or adversarial conditions, the rework penalty at high P(E) is catastrophic.

A single period of Ja degradation can produce irrecoverable tension inflation.

Catastrophic failure risk

The framework recommends maximizing time at Level 2 not as ideology but as risk management: the synergy term β·Jh·Ja is maximized when both are high — not when one approaches zero.
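
The risk asymmetry can be made concrete with a toy simulation. Everything below is our assumption for illustration: the discrete-time update rule, the rework term, the parameter values, and a single degraded-Ja period standing in for a novel-domain shock. In this toy version the shock is recoverable; what it shows is that the size of the tension spike scales with P(E).

```python
def simulate(j_h, j_a_series, alpha, p_e, w_bar=1.0, inflow=40.0, t0=100.0):
    """Toy update: T ← max(0, T + inflow − capture + rework), where
    capture = J·P(E)·w̄ and rework = (1−J)·P(E)·w̄. These update rules
    and all parameter values are our illustration, not the framework's."""
    traj = [t0]
    for j_a in j_a_series:
        beta = 0.5 if j_h > j_a else -0.1      # conditional synergy/interference
        j = alpha * j_h + (1 - alpha) * j_a + beta * j_h * j_a
        j = max(0.0, min(1.0, j))              # clamp J to the unit interval
        capture = j * p_e * w_bar
        rework = (1 - j) * p_e * w_bar
        traj.append(max(0.0, traj[-1] + inflow - capture + rework))
    return traj

shock = [0.2]  # one degraded-Ja period (novel domain / adversarial input)
print(simulate(j_h=0.9, j_a_series=[0.7] * 5 + shock + [0.7] * 4, alpha=0.6, p_e=50))
print(simulate(j_h=0.5, j_a_series=[0.95] * 5 + shock + [0.95] * 4, alpha=0.2, p_e=200))
# Level 2 barely registers the shock (+19); Level 3's spike (+116) scales with P(E).
```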

04 — The Objective Layer

What to optimize for is irreducibly human.

The framework distinguishes two types of judgment:

Execution Judgment

How to achieve a goal

Operates within a defined objective. Measurable, comparable between humans and AI, and delegatable when Ja > Jh for the specific task.

Objective Judgment

What to optimize for

Determines V* itself. What constitutes value? Which outcomes matter? These require conscious experience and moral agency. Human control here is an axiom — not a derived result.

The determination of V* — potential value, what the world should look like — is irreducibly human. This is supported by Hume's is-ought distinction (AI can maximize a reward function but cannot determine whether it captures what matters), by regulatory mandates (EU AI Act, Article 14), and by design experience in structured human-AI systems.

"The objective layer may be delegated procedurally — but not substantively."

05 — Falsification Conditions

Four ways this framework could be wrong.

The framework provides explicit conditions under which its conclusions would be invalidated. Science requires falsifiability. These are the tests.

F1
Ja reaches ≈ 1 across novel domains. If AI judgment becomes demonstrably reliable under distribution shift, adversarial conditions, and novel task domains — not just benchmarks — then the irreducibility of human execution judgment weakens. Current evidence does not support this condition.
F2
Empirical evidence that β ≈ 0 even when Jh > Ja. If rigorous studies show that collaboration still fails to outperform the better individual even when human judgment quality exceeds AI quality on the specific task, then the synergy mechanism proposed here is wrong.
F3
Error compounding is bounded at sub-exponential rates. If AI systems develop robust self-correction mechanisms preventing error propagation across decision boundaries, the exponential-growth risk argument for Level 3 weakens. Current evidence: 17.2× error amplification in independent multi-agent systems.
F4
The objective layer becomes formally derivable. If a method is developed to determine V* without subjective human input — deriving what should be optimized from what is — then the objective axiom falls and with it the framework's strongest claim about irreducible human control.

06 — What This Means in Practice

For Governance

Oversight quality, not volume

Meaningful oversight is oversight where Jh > Ja for the oversight task itself. Scaling AI deployment requires scaling matched human judgment — not human review volume. Oversight where Jh < Ja actively generates rework proportional to parallelism.

For Deployment

Per-task judgment allocation

Involve human judgment when Jh > Ja for the specific task. Delegate to AI when Jh < Ja. This transforms the binary human-in-the-loop question into a continuous optimization problem.
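
A sketch of what per-task allocation could look like in code. The task fields, threshold, and routing labels are hypothetical; the rule itself — route on the per-task Jh versus Ja comparison — is the framework's.

```python
from dataclasses import dataclass

@dataclass
class Task:
    name: str
    j_h: float  # estimated human judgment quality on this task, in (0, 1]
    j_a: float  # estimated AI judgment quality on this task, in (0, 1]

def route(task: Task, margin: float = 0.05) -> str:
    """Continuous allocation instead of a binary human-in-the-loop switch.

    `margin` is a hypothetical hysteresis band: near-ties go to joint
    review rather than flip-flopping between human and AI.
    """
    gap = task.j_h - task.j_a
    if gap > margin:
        return "human-led, AI-parallelized"   # synergy regime (Level 2)
    if gap < -margin:
        return "AI-led, human-audited"        # delegate execution judgment
    return "joint review"                     # estimates too close to call

print(route(Task("novel-domain triage", j_h=0.85, j_a=0.55)))
print(route(Task("high-volume classification", j_h=0.6, j_a=0.9)))
```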

For Organizations

Invest in judgment as AI scales

As AI parallelism increases, the return on each unit improvement in human judgment quality increases proportionally. Organizations scaling AI should simultaneously invest in developing and matching human judgment capability.

For Research

Measure Jh and Ja independently

The specific empirical agenda: measure human and AI judgment quality per task, then test whether collaboration effect sign correlates with the Jh versus Ja relationship. The meta-analysis averaged across the full spectrum — washing out the conditional signal.
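
One way the proposed test could be run, sketched in Python (assuming SciPy is available). The per-task records are hypothetical; the test is whether the collaboration effect — team score minus the best individual — tracks the sign of Jh − Ja.

```python
from scipy.stats import spearmanr

# Hypothetical per-task records: (Jh, Ja, human score, AI score, team score)
records = [
    (0.9, 0.6, 0.72, 0.58, 0.81),   # Jh > Ja: team beats the best individual
    (0.8, 0.5, 0.70, 0.52, 0.76),
    (0.5, 0.8, 0.48, 0.74, 0.66),   # Jh < Ja: the human drags the AI down
    (0.4, 0.9, 0.40, 0.82, 0.71),
    (0.7, 0.7, 0.65, 0.64, 0.67),
]

judgment_gap = [jh - ja for jh, ja, *_ in records]
collab_effect = [team - max(h, a) for _, _, h, a, team in records]

rho, p = spearmanr(judgment_gap, collab_effect)
print(f"rank correlation = {rho:.2f} (p = {p:.3f})")
# The framework predicts rho > 0 when Jh and Ja are measured per task;
# the meta-analytic null (falsification condition F2) predicts rho ≈ 0.
```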

"The correct question is not 'is human-AI collaboration valuable?' — which has no universal answer — but 'does this specific human's judgment exceed the AI's on this specific task?'"

Tension Capture × Sidechat
Tension Capture is the theoretical foundation of Sidechat's Agentic OS — a practical implementation of Level 2 operation, keeping human judgment in the loop where it creates compounding value.