AI Systems Operating Organisations Without Human Oversight

This milestone has not been reached. No AI system can currently operate an organisation, manage complex multi-agent workflows, or sustain autonomous decision-making over the extended periods required for organisational management.

The Structural Barrier

The key barrier is reliability compounding. An AI system that is 95% reliable on individual decisions becomes rapidly unreliable over sequences of decisions: if errors are independent, the chance of getting 20 decisions in a row right is only about 36%. Managing an organisation requires thousands of sequential judgement calls, each one needing to be correct or at least recoverable.
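
The compounding effect can be checked with a few lines of arithmetic. The sketch below is illustrative only: it assumes every decision succeeds independently with the same probability and that no error is ever recovered, which is stricter than the "correct or at least recoverable" standard described above. The function names are hypothetical, not taken from any source.

```python
import math


def chain_success_probability(per_step_reliability: float, steps: int) -> float:
    """Probability that all `steps` decisions are correct.

    Assumption: decisions are independent and errors are never recovered.
    """
    return per_step_reliability ** steps


def steps_until_below(per_step_reliability: float, threshold: float) -> int:
    """Smallest number of steps at which the chain success probability
    falls strictly below `threshold`.

    Solves p**n < threshold, i.e. n > log(threshold) / log(p) for 0 < p < 1.
    """
    ratio = math.log(threshold) / math.log(per_step_reliability)
    return math.floor(ratio) + 1


if __name__ == "__main__":
    # Print how quickly a 95%-reliable decision-maker degrades over a sequence.
    for n in (1, 10, 100, 1000):
        print(f"{n:>5} decisions: {chain_success_probability(0.95, n):.3g}")
    print("Below 50% after", steps_until_below(0.95, 0.5), "decisions")
```

Under these assumptions a 95%-reliable system is more likely than not to have made at least one error after 14 decisions. Real organisations catch and correct some mistakes, so the true picture depends heavily on how recoverable each error is.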

Current AI systems fail this test comprehensively. The “jagged frontier” phenomenon (see Milestone 5) means that even systems that excel at specific tasks fail unpredictably at adjacent ones.

Why Track a Milestone That Hasn’t Been Reached?

METR finds that AI task-completion time horizons double approximately every 4.3 months. Claude Opus 4.6’s 50% time horizon is 14.5 hours, meaning the model succeeds about half the time on tasks that take a human about 14.5 hours. A 50% success rate is a measure of capability, not of reliability.

At current doubling rates, the time horizon for reliable autonomous operation extends rapidly. Tracking this milestone matters because governance frameworks need to be in place before the capability arrives, not after.
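
To see what a constant doubling rate implies, the trend can be extrapolated as horizon(t) = base × 2^(t / doubling period). The sketch below uses the two figures quoted above (14.5 hours and 4.3 months) and assumes, purely for illustration, that the doubling rate holds indefinitely. The 2,000-hour target (roughly one working year) and the function names are hypothetical choices, not taken from METR.

```python
import math

# Figures quoted in the text above.
BASE_HORIZON_HOURS = 14.5   # current 50% time horizon
DOUBLING_MONTHS = 4.3       # doubling period of the time horizon


def projected_horizon_hours(months_ahead: float,
                            base_hours: float = BASE_HORIZON_HOURS,
                            doubling_months: float = DOUBLING_MONTHS) -> float:
    """Time horizon after `months_ahead` months, assuming constant doubling."""
    return base_hours * 2 ** (months_ahead / doubling_months)


def months_until(target_hours: float,
                 base_hours: float = BASE_HORIZON_HOURS,
                 doubling_months: float = DOUBLING_MONTHS) -> float:
    """Months until the horizon reaches `target_hours` under the same assumption."""
    return doubling_months * math.log2(target_hours / base_hours)


if __name__ == "__main__":
    for months in (0, 12, 24, 36):
        print(f"+{months:>2} months: {projected_horizon_hours(months):,.0f} hours")
    # Hypothetical target: about one working year of human effort.
    print(f"2,000-hour horizon in about {months_until(2000):.0f} months")
```

On these assumptions a one-working-year horizon would be roughly two and a half years away. That figure is still a 50% success threshold, and a trend extrapolation is not a forecast, so it should be read as an indication of how short the preparation window could be rather than as a prediction.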

The Governance Question

The trajectory of capability growth raises a structural question: can governance frameworks be built and tested before the capability arrives, rather than after? The fact that this milestone has not yet been reached does not reduce the urgency of preparation; it defines the window available for it.
