Second Key Update: Technical Safeguards and Risk Management

Summary

Examines technical approaches to general-purpose AI risk management, covering training safeguards, deployment monitoring, provenance tracking, and institutional governance frameworks.

Key quotes

sophisticated attackers can often bypass current defences, and the real-world effectiveness of many safeguards is uncertain.

the number of companies publishing Frontier AI Safety Frameworks has more than doubled in 2025

This update focuses on technical mitigations for AI risks, assessing the gap between the safety measures under development and the ability of sophisticated actors to circumvent them. It specifically analyzes the risks associated with open-weight models and the emergence of accountability frameworks for AI agents.