AI Data Transparency: An Exploration Through the Lens of AI Incidents

Summary

An exploration of AI data transparency across systems associated with public AI incidents, finding persistently low levels of public documentation about training data and how it is curated.

Key quotes

low data transparency persists across a wide range of systems

the presence of model or system transparency documentation does not necessarily lead to presence of all desired transparency information

The study uses the AI Incident Database (AIID) and a search protocol from the Stanford Foundation Model Transparency Index to analyze 54 AI systems. It identifies a ‘hierarchy of transparency’ in which generative AI systems are better documented than autonomous driving or facial recognition systems.