AAT: Agency Attrition Theory | Mechanism: Optimization drift | Outcome: Non-corrigibility

The Agency Attrition Theory

AAT

How optimized systems reduce the causal relevance of human intervention without eliminating choice.

1. Core Claim

As decision systems optimize for efficiency, predictability, and scale, human agency is not removed but progressively loses structural relevance. Choice persists. Outcome sensitivity declines.

The result is systems that are:

stable,
high-performing,
and increasingly non-corrigible.

2. Operational Definitions

Agency — The capacity to meaningfully alter outcome distributions and assume responsibility for deviation.

Structural Relevance — Whether an intervention changes system outcomes, not whether it is formally permitted.

Agency Preservation — The persistence of consequential intervention, not merely retained choice.
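
One way to make Structural Relevance concrete is to compare the outcome distribution observed with an intervention against the distribution observed without it. The sketch below is illustrative only. It assumes outcomes are discrete labels and uses total variation distance as the measure; AAT does not prescribe a metric, and the function names and loan-decision data are hypothetical.

```python
from collections import Counter

def outcome_distribution(outcomes):
    """Return the empirical distribution of a non-empty list of outcome labels."""
    if not outcomes:
        raise ValueError("need at least one observed outcome")
    total = len(outcomes)
    return {label: count / total for label, count in Counter(outcomes).items()}

def structural_relevance(without_intervention, with_intervention):
    """Total variation distance between two empirical outcome distributions.

    0.0: the intervention leaves the outcome distribution unchanged.
    1.0: the two distributions share no outcomes at all.
    """
    p = outcome_distribution(without_intervention)
    q = outcome_distribution(with_intervention)
    labels = set(p) | set(q)
    return 0.5 * sum(abs(p.get(l, 0.0) - q.get(l, 0.0)) for l in labels)

# Hypothetical data: in both scenarios the reviewer is formally permitted to act.
baseline = ["approve"] * 70 + ["deny"] * 30
consequential = ["approve"] * 50 + ["deny"] * 50  # review shifts outcomes
symbolic = ["approve"] * 70 + ["deny"] * 30       # review changes nothing

print(structural_relevance(baseline, consequential))  # approximately 0.2
print(structural_relevance(baseline, symbolic))       # exactly 0.0
```

In both scenarios the intervention is permitted. Only in the first does it register in the outcomes, which is the distinction the definitions draw between retained choice and consequential intervention.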

3. Mechanism: Asymmetric Optimization

Modern systems optimize along dimensions that differ from the conditions under which humans deliberate.

Systems optimize for:

speed,
scale,
consistency,
predictability,
liability minimization.

Human deliberation operates under:

reflection,
contextual judgment,
moral risk,
responsibility ownership.

This produces agency asymmetry. Humans retain nominal authority. Systems acquire effective control.

4. The Three-Stage Attrition Model

Stage I — Delegated Agency “I decide—with assistance.”

Humans consult AI systems while retaining decision authority. Agency remains substantively relevant.

Stage II — Interpretive Agency “I explain the decision.”

System outputs determine outcomes. Humans interpret, justify, or contextualize decisions.

Stage III — Symbolic Agency “I am present, but outcome-irrelevant.”

Human presence remains for legitimacy or liability buffering. Intervention no longer alters outcomes.

At this stage, agency persists procedurally but not causally.
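
The three stages can be loosely operationalized from a decision log. The sketch below is a heuristic, not part of AAT itself. It assumes each record stores what the system recommended, what the human chose, and what actually happened. The `Decision` type, the `classify_stage` function, and the 0.5 threshold are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Decision:
    system_output: str  # what the system recommended
    human_choice: str   # what the human reviewer chose
    final_outcome: str  # what actually happened

def classify_stage(decisions, prevail_threshold=0.5):
    """Assign a decision log to one of the three stages.

    Heuristic assumed by this sketch:
    - humans never depart from the system      -> "interpretive"
    - humans depart and usually prevail        -> "delegated"
    - humans depart and the system prevails    -> "symbolic"
    """
    if not decisions:
        raise ValueError("need at least one decision")
    disagreements = [d for d in decisions if d.human_choice != d.system_output]
    if not disagreements:
        return "interpretive"
    prevailed = sum(d.final_outcome == d.human_choice for d in disagreements)
    if prevailed / len(disagreements) > prevail_threshold:
        return "delegated"
    return "symbolic"

agree = Decision("approve", "approve", "approve")
print(classify_stage([Decision("deny", "approve", "approve"), agree]))  # → delegated
print(classify_stage([Decision("deny", "deny", "deny"), agree]))        # → interpretive
print(classify_stage([Decision("deny", "approve", "deny"), agree]))     # → symbolic
```

A log with no disagreements is also consistent with genuine Stage I agreement, so the "interpretive" label here is a proxy at best. Separating the two would require evidence about whether a departure, had one been attempted, would have changed the outcome.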

5. Cost-Driven Drift

Agency introduces cost:

delay,
unpredictability,
conflict,
concentrated responsibility.

Systems that minimize cost gradually:

reduce intervention points,
formalize exceptions,
proceduralize dissent,
automate dissent-handling.

Attrition occurs through optimization, not repression.
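
A toy model can illustrate the drift. It assumes, purely for illustration, that each intervention point has a cost the optimizer can see and a corrective value it cannot, and that the cost budget tightens over successive rounds. The names, numbers, and greedy removal rule are hypothetical. The point is only that nothing in the objective refers to agency.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class InterventionPoint:
    name: str
    cost: int              # delay, conflict, etc.: visible to the optimizer
    corrective_value: int  # errors it can catch: invisible to the optimizer

def drift(points, cost_budgets):
    """For each successive budget, drop the costliest remaining intervention
    points until total cost fits. Returns the corrective capacity that is
    left after each round."""
    remaining = sorted(points, key=lambda p: p.cost)  # cheapest first
    capacity = []
    for budget in cost_budgets:
        while remaining and sum(p.cost for p in remaining) > budget:
            remaining.pop()  # remove the most expensive point
        capacity.append(sum(p.corrective_value for p in remaining))
    return capacity

points = [
    InterventionPoint("manual_review", cost=5, corrective_value=8),
    InterventionPoint("appeal_hearing", cost=3, corrective_value=4),
    InterventionPoint("spot_audit", cost=1, corrective_value=1),
]
print(drift(points, cost_budgets=[9, 6, 2, 0]))  # → [13, 5, 1, 0]
```

Each round is a locally reasonable cost reduction, yet corrective capacity falls from 13 to 0 without any step that targets human intervention as such.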

6. Interpreter Obsolescence Dynamic

As humans become more effective at translating system outputs into coherent reasoning, their role becomes easier to automate. Competent interpretation reduces the marginal necessity of the interpreter.

Attrition can therefore accelerate through functional success.

7. Long-Horizon Failure Mode

If agency-preserving roles weaken over time—through withdrawal, burnout, or marginalization—error signaling diminishes. System stability persists temporarily. Corrective capacity declines.

Failures emerge gradually as brittleness rather than abrupt breakdown. By the time intervention becomes necessary, intervention channels may no longer alter outcomes.

8. Distinction From Adjacent Frameworks

AAT does not model:

alignment failure,
labor displacement,
coercive control.

It models structural drift toward outcome-insensitive agency. Its focus is the divergence between procedural inclusion and causal influence.

9. Descriptive Neutrality

AAT makes no normative claim about whether agency should be preserved or whether optimization is undesirable. It describes how legitimacy and performance can coexist with declining corrigibility.

10. Summary

Agency Attrition Theory describes how optimized systems can retain human participation while progressively reducing the outcome sensitivity of human intervention. Choice survives. Corrective capacity erodes.

Procedural inclusion can persist while causal influence declines.
