Essay · March 2026

Tension Capture

Why human judgment doesn't just add value in AI systems — it multiplies it.

Written by Le Zhang · Founder, Spaceport Technologies
Published March 2026
Topic: Human-AI Collaboration
"Human judgment in AI systems is not generically good or bad. Its value is conditional — on whether the specific human's judgment exceeds the AI's on the specific task."
Le Zhang · Spaceport Technologies · 2026
01 — The Collaboration Paradox

Two findings. Both robust. Both true.

Practitioners broadly report that human-AI collaboration creates value. Meta-analytic evidence shows that human-AI teams frequently underperform the best individual — human or AI — working alone.

This is not a case where one side is wrong. Both sides are measuring different slices of a more complex phenomenon.

Practitioner experience

Collaboration creates value

Cheaper prediction raises the value of judgment. Humans catch errors AI misses. AI processes information humans cannot. Together they cover each other's blind spots.

Meta-analytic evidence

Collaboration often destroys it

A systematic review in Nature Human Behaviour found that human-AI combinations performed significantly worse than the best of humans or AI alone. Complementary performance is rare.

The resolution lies in a variable absent from existing models: the conditional relationship between human judgment quality and AI judgment quality on the specific task.

Expert forecaster gap vs. best frontier LLM (Brier score, 464 questions)

17.2× — error amplification in independent multi-agent LLM systems (Kim et al.)

74% — error reduction from algorithmic coordination vs. uncoordinated agents

19% — task completion time increase with AI tooling for experienced developers

02 — Tension Capture

Tension is the gap between potential and realized value. Every system that creates value is trying to close this gap.

Work reduces tension only when applied with correct judgment. Without it, work is wasted — or actively harmful, because misjudged work generates rework.

Core Definition
T(t) = V* − V(t)
Tension = potential value ceiling minus realized value at time t
Hybrid Judgment — The Conditional Model
J(n) = α·Jh + (1−α)·Ja + β·Jh·Ja
When Jh > Ja: β > 0 — synergy, compounding value
When Jh < Ja: β ≤ 0 — interference, compounding error
β is not a constant — its sign is conditional on the Jh vs Ja relationship per task
The Parallelism Amplifier
∂(ΔT) / ∂Jh = α · P(E) · w̄ > 0
More AI parallelism → higher marginal value of human judgment quality.
Human judgment is leverage, not bottleneck.
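
Where the amplifier comes from: a one-line reconstruction, assuming that per-period capture is linear in Jh — the simplest form consistent with the stated derivative — with w̄ as the average tension reduction per work unit (see Key Variables below).

```latex
% Assumed capture model: alpha-weighted human judgment applied across
% P(E) parallel work units, each reducing tension by w-bar on average.
\Delta T = \alpha \, J_h \, P(E) \, \bar{w}
% Differentiating with respect to J_h recovers the amplifier:
\frac{\partial(\Delta T)}{\partial J_h} = \alpha \, P(E) \, \bar{w} > 0
% P(E) multiplies the marginal term, so more AI parallelism raises the
% marginal value of each increment in human judgment quality.
```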

Key Variables

Jh
Human judgment quality ∈ (0, 1] for a given task
Ja
AI judgment quality ∈ (0, 1] for a given task
β
Synergy coefficient — positive in synergy regime, negative in interference
P(E)
Maximum parallel work units at energy budget E — AI parallelism
w̄
Average tension reduction per parallel work unit (implied by the parallelism amplifier)
V*
Potential value ceiling — axiomatically human-controlled
ε(n)
Error rate at step n — compounds multiplicatively across sequential steps
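
A minimal numerical sketch of these definitions in Python. The function names and all parameter values are illustrative assumptions; only the functional forms come from the formulas above.

```python
def hybrid_judgment(j_h: float, j_a: float, alpha: float, beta: float) -> float:
    """Hybrid judgment J = α·Jh + (1−α)·Ja + β·Jh·Ja.

    The sign of beta is conditional: synergy (beta > 0) when Jh > Ja,
    interference (beta <= 0) when Jh < Ja.
    """
    return alpha * j_h + (1 - alpha) * j_a + beta * j_h * j_a

def tension(v_star: float, v_t: float) -> float:
    """T(t) = V* − V(t): potential value ceiling minus realized value."""
    return v_star - v_t

def compound_error(eps: float, n: int) -> float:
    """ε compounds multiplicatively: probability that at least one of n
    sequential steps fails, assuming a constant per-step error rate."""
    return 1 - (1 - eps) ** n

# Illustrative values (assumptions, not measurements):
print(hybrid_judgment(0.9, 0.7, alpha=0.6, beta=0.5))   # 1.135: J exceeds either individual
print(hybrid_judgment(0.5, 0.8, alpha=0.4, beta=-0.5))  # 0.48: J falls below both inputs
print(compound_error(0.05, 20))  # ~0.64: small per-step errors dominate long chains
```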

"No amount of compute compensates for poor judgment — it only amplifies the consequences."

03 — Three Levels of Human-AI Operation

The framework identifies three regimes that emerge from different parameter configurations of the hybrid judgment function and rework penalty.

Level 1

Human Judgment Only

α = 1, Ja = 0

Tension capture rate bounded by human cognitive bandwidth. P(E) is low — humans cannot parallelize beyond biological limits. Capture is slow but stable.

This describes all pre-AI knowledge work. Tension accumulates faster than it can be serviced as world complexity grows.

Stable · Slow
Level 2 — Optimal

Human Judgment Dominant

Jh > Ja, α > 0.5

P(E) is high — AI enables parallelism. Effective judgment exceeds either individual. Rework grows with P(E) but is checked by human judgment quality.

The ROI of human judgment is maximized here. More AI parallelism makes each unit of human judgment more valuable, not less.

Optimal now · Derived, not assumed
Level 3 — High Risk

AI Judgment Dominant

Ja > Jh, α < 0.5

P(E) is very high. When Ja is reliably high, capture rate is maximized. But when Ja degrades in novel domains or adversarial conditions, the rework penalty at high P(E) is catastrophic.

A single period of Ja degradation can produce irrecoverable tension inflation.

Catastrophic failure risk

The framework recommends maximizing time at Level 2 not as ideology but as risk management: the synergy term β·Jh·Ja is maximized when both are high — not when one approaches zero.
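
The risk asymmetry can be made concrete with a toy simulation. Everything below is our assumption for illustration: the discrete-time update rule, the rework term, the parameter values, and a single degraded-Ja period standing in for a novel-domain shock. In this toy version the shock is recoverable; what it shows is that the size of the tension spike scales with P(E).

```python
def simulate(j_h, j_a_series, alpha, p_e, w_bar=1.0, inflow=40.0, t0=100.0):
    """Toy update: T ← max(0, T + inflow − capture + rework), where
    capture = J·P(E)·w̄ and rework = (1−J)·P(E)·w̄. These update rules
    and all parameter values are our illustration, not the framework's."""
    traj = [t0]
    for j_a in j_a_series:
        beta = 0.5 if j_h > j_a else -0.1      # conditional synergy/interference
        j = alpha * j_h + (1 - alpha) * j_a + beta * j_h * j_a
        j = max(0.0, min(1.0, j))              # clamp J to the unit interval
        capture = j * p_e * w_bar
        rework = (1 - j) * p_e * w_bar
        traj.append(max(0.0, traj[-1] + inflow - capture + rework))
    return traj

shock = [0.2]  # one degraded-Ja period (novel domain / adversarial input)
print(simulate(j_h=0.9, j_a_series=[0.7] * 5 + shock + [0.7] * 4, alpha=0.6, p_e=50))
print(simulate(j_h=0.5, j_a_series=[0.95] * 5 + shock + [0.95] * 4, alpha=0.2, p_e=200))
# Level 2 barely registers the shock (+19); Level 3's spike (+116) scales with P(E).
```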

04 — The Objective Layer

What to optimize for is irreducibly human.

The framework distinguishes two types of judgment:

Execution Judgment

How to achieve a goal

Operates within a defined objective. Measurable, comparable between humans and AI, and delegatable when Ja > Jh for the specific task.

Objective Judgment

What to optimize for

Determines V* itself. What constitutes value? Which outcomes matter? These require conscious experience and moral agency. Human control here is an axiom — not a derived result.

The determination of V* — potential value, what the world should look like — is irreducibly human. This is supported by Hume's is-ought distinction (AI can maximize a reward function but cannot determine whether it captures what matters), by regulatory mandates (EU AI Act, Article 14), and by design experience in structured human-AI systems.

"The objective layer may be delegated procedurally — but not substantively."

05 — Falsification Conditions

Four ways this framework could be wrong.

The framework provides explicit conditions under which its conclusions would be invalidated. Science requires falsifiability. These are the tests.

F1
Ja reaches ≈ 1 across novel domains. If AI judgment becomes demonstrably reliable under distribution shift, adversarial conditions, and novel task domains — not just benchmarks — then the irreducibility of human execution judgment weakens. Current evidence does not support this condition.
F2
Empirical evidence that β ≈ 0 even when Jh > Ja. If rigorous studies show that collaboration still fails to outperform the better individual even when human judgment quality exceeds AI quality on the specific task, then the synergy mechanism proposed here is wrong.
F3
Error compounding is bounded at sub-exponential rates. If AI systems develop robust self-correction mechanisms preventing error propagation across decision boundaries, the exponential-growth risk argument for Level 3 weakens. Current evidence: 17.2× error amplification in independent multi-agent systems.
F4
The objective layer becomes formally derivable. If a method is developed to determine V* without subjective human input — deriving what should be optimized from what is — then the objective axiom falls and with it the framework's strongest claim about irreducible human control.

06 — What This Means in Practice

For Governance

Oversight quality, not volume

Meaningful oversight is oversight where Jh > Ja for the oversight task itself. Scaling AI deployment requires scaling matched human judgment — not human review volume. Oversight where Jh < Ja actively generates rework proportional to parallelism.

For Deployment

Per-task judgment allocation

Involve human judgment when Jh > Ja for the specific task. Delegate to AI when Jh < Ja. This transforms the binary human-in-the-loop question into a continuous optimization problem.
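
A sketch of what per-task allocation could look like in code. The task fields, threshold, and routing labels are hypothetical; the rule itself — route on the per-task Jh versus Ja comparison — is the framework's.

```python
from dataclasses import dataclass

@dataclass
class Task:
    name: str
    j_h: float  # estimated human judgment quality on this task, in (0, 1]
    j_a: float  # estimated AI judgment quality on this task, in (0, 1]

def route(task: Task, margin: float = 0.05) -> str:
    """Continuous allocation instead of a binary human-in-the-loop switch.

    `margin` is a hypothetical hysteresis band: near-ties go to joint
    review rather than flip-flopping between human and AI.
    """
    gap = task.j_h - task.j_a
    if gap > margin:
        return "human-led, AI-parallelized"   # synergy regime (Level 2)
    if gap < -margin:
        return "AI-led, human-audited"        # delegate execution judgment
    return "joint review"                     # estimates too close to call

print(route(Task("novel-domain triage", j_h=0.85, j_a=0.55)))
print(route(Task("high-volume classification", j_h=0.6, j_a=0.9)))
```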

For Organizations

Invest in judgment as AI scales

As AI parallelism increases, the return on each unit improvement in human judgment quality increases proportionally. Organizations scaling AI should simultaneously invest in developing and matching human judgment capability.

For Research

Measure Jh and Ja independently

The specific empirical agenda: measure human and AI judgment quality per task, then test whether collaboration effect sign correlates with the Jh versus Ja relationship. The meta-analysis averaged across the full spectrum — washing out the conditional signal.
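
One way the proposed test could be run, sketched in Python (assuming SciPy is available). The per-task records are hypothetical; the test is whether the collaboration effect — team score minus the best individual — tracks the sign of Jh − Ja.

```python
from scipy.stats import spearmanr

# Hypothetical per-task records: (Jh, Ja, human score, AI score, team score)
records = [
    (0.9, 0.6, 0.72, 0.58, 0.81),   # Jh > Ja: team beats the best individual
    (0.8, 0.5, 0.70, 0.52, 0.76),
    (0.5, 0.8, 0.48, 0.74, 0.66),   # Jh < Ja: the human drags the AI down
    (0.4, 0.9, 0.40, 0.82, 0.71),
    (0.7, 0.7, 0.65, 0.64, 0.67),
]

judgment_gap = [jh - ja for jh, ja, *_ in records]
collab_effect = [team - max(h, a) for _, _, h, a, team in records]

rho, p = spearmanr(judgment_gap, collab_effect)
print(f"rank correlation = {rho:.2f} (p = {p:.3f})")
# The framework predicts rho > 0 when Jh and Ja are measured per task;
# the meta-analytic null (falsification condition F2) predicts rho ≈ 0.
```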

"The correct question is not 'is human-AI collaboration valuable?' — which has no universal answer — but 'does this specific human's judgment exceed the AI's on this specific task?'"

Tension Capture × Sidechat
Tension Capture is the theoretical foundation of Sidechat's Agentic OS — a practical implementation of Level 2 operation, keeping human judgment in the loop where it creates compounding value.