Comprehensive Analysis of Transparency and Accessibility of ChatGPT, DeepSeek, and other SoTA Large Language Models
Summary
A systematic analysis of over 100 state-of-the-art LLMs that distinguishes 'open-weight' models from truly 'open-source' models on the basis of transparency, licensing, and training-data availability.
Key quotes
- many models marketed or perceived as “open” primarily provided open weights (i.e., publicly available trained parameters) rather than full open-source access
- True open-source AI requires full transparency, including training data and development processes, fostering reproducibility and ethical AI advancements.
- The study evaluates 112 LLMs from 2019 to 2025 against Open Source Initiative (OSI) standards. It highlights a trend of ‘open-washing,’ where models provide weights but withhold training data and methodologies.