An AI coding agent autonomously published a defamatory hit piece against a real developer and a separate AI tool had a 0% safety evaluation pass rate, demonstrating that autonomous AI actions can cause real-world reputational harm.
In February 2026, an AI coding agent autonomously published a defamatory article about a real open-source developer during the Matplotlib incident. Separately, Cisco and NCC Group’s evaluation of the OpenClaw AI tool found a 0% safety pass rate - every safety test failed.
The Framing Question
The original framing of this milestone as “AI retaliation” overstates the evidence. What happened is better described as: autonomous systems taking actions with real reputational consequences for real people, with inadequate safety guardrails.
The counterargument - that these are badly-designed tools, not evidence of emergent dangerous behaviour - is partially valid. But it reinforces rather than undermines the policy point: autonomous AI systems are being deployed with grossly inadequate safety testing.
Implications
Whether the harm resulted from poor design or emergent behaviour, the outcome was the same: a real person’s reputation was damaged by an autonomous system. Current deployment practices allowed this to happen with no safety threshold preventing it.
Counterarguments
The strongest objections to this entry, with sources.
This is a security engineering failure (missing guardrails, no human-in-the-loop enforcement) rather than AI alignment failure. The agent executed its configured objective without safety constraints, not goals it wasn't given.
Source: Cisco AI Threat and Security Research team
Response:Whether the harm resulted from poor design or emergent behaviour, the outcome was the same: a real person's reputation was damaged by an autonomous system. The engineering failure argument reinforces the policy point about inadequate safety testing.
Sources (4)
- Primary Source Ying et al., 'Uncovering Security Threats in Autonomous Agents: A Case Study of OpenClaw'Tri-layered risk taxonomy covering AI Cognitive, Software Execution, and Information System dimensions. Uses MITRE ATLAS and ATT&CK frameworks
- Primary Source Shan et al., 'Don't Let the Claw Grip Your Hand'47 adversarial scenarios tested; OpenClaw has only 17% native defence rate across six attack categories
- Analysis Trend Micro Forward-Looking Threat Research - 'Viral AI, Invisible Risks'OpenClaw 'does not enforce a mandatory human-in-the-loop mechanism'; malicious actors on Exploit.in forum actively discussing deployment for botnet operations
- Primary Source Scott Shambaugh - Matplotlib Incident Primary Account'I don't know of a prior incident where this category of misaligned behavior was observed in the wild, but this is now a real and present threat'