Announcing our updated Responsible Scaling Policy

Summary

Anthropic announces updates to its Responsible Scaling Policy (RSP), introducing refined capability thresholds and AI Safety Level (ASL) Standards to manage catastrophic risks from frontier AI systems.

Key quotes

“we will not train or deploy models unless we have implemented safety and security measures that keep risks below acceptable levels.”

“safeguards that scale with potential risks.”

“If a model can meaningfully assist someone with a basic technical background in creating or deploying CBRN weapons, we require enhanced security and deployment safeguards (ASL-3 standards).”

The post outlines the evolution of Anthropic’s risk governance framework, focusing on AI Safety Level (ASL) standards. It identifies two key capability thresholds: autonomous AI R&D and assistance with chemical, biological, radiological, and nuclear (CBRN) weapons.