Second Key Update: Technical Safeguards and Risk Management
Summary
Examines technical approaches to risk management for general-purpose AI, covering training safeguards, deployment monitoring, provenance tracking, and institutional governance frameworks.
Key quotes
sophisticated attackers can often bypass current defences, and the real-world effectiveness of many safeguards is uncertain.
the number of companies publishing Frontier AI Safety Frameworks has more than doubled in 2025
This update focuses on technical mitigations for AI risks, assessing the gap between the safety measures being developed and the ability of sophisticated actors to circumvent them. It specifically analyzes the risks associated with open-weight models and the emergence of accountability frameworks for AI agents.