What Is ChatGPT Doing … and Why Does It Work?
Summary
Stephen Wolfram explains the mechanics of large language models (LLMs), covering next-word probability, neural-network training, embeddings, and the limits imposed by computational irreducibility.
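As a rough illustration of that core loop (a sketch, not how ChatGPT is actually implemented): at each step the model assigns a probability to every possible next word and samples from that distribution, usually with a "temperature" parameter that trades predictability for variety. The vocabulary and probabilities below are invented for the example.

```python
import random

# Toy next-word distribution. These words and probabilities are
# invented for illustration, not taken from any real model.
next_word_probs = {
    "cat": 0.4,
    "dog": 0.3,
    "model": 0.2,
    "banana": 0.1,
}

def sample_next_word(probs: dict[str, float], temperature: float = 0.8) -> str:
    """Sample a next word, sharpening or flattening the distribution
    with a temperature parameter before drawing from it."""
    # Rescale each probability as p^(1/T), then sample in proportion
    # to the rescaled weights (no renormalization needed for sampling).
    weights = {w: p ** (1.0 / temperature) for w, p in probs.items()}
    r = random.uniform(0.0, sum(weights.values()))
    cumulative = 0.0
    for word, weight in weights.items():
        cumulative += weight
        if r <= cumulative:
            return word
    return word  # fallback for floating-point edge cases

print(sample_next_word(next_word_probs))
```

At temperature 1 this samples from the model's probabilities exactly; lower temperatures sharpen the distribution toward the most likely word, higher ones flatten it.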
Key quotes
what ChatGPT is always fundamentally trying to do is to produce a “reasonable continuation” of whatever text it’s got so far
there’s no “theory” being used here; it’s just a matter of what’s been found to work in practice
Learning involves in effect compressing data by leveraging regularities. But computational irreducibility implies that ultimately there’s a limit to what regularities there may be.
This article provides a high-level conceptual overview of how large language models work, moving from simple n-gram statistics to modern neural networks. It emphasizes the empirical, trial-and-error nature of current AI training and the limits that computational irreducibility places on what learning can achieve.
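To make the n-gram starting point concrete, here is a minimal bigram sketch in the spirit of the article's early examples: count which word follows which in a corpus, then extend text by sampling in proportion to those counts. The toy corpus is invented, and a real model would need vastly more text and longer contexts.

```python
import random
from collections import defaultdict, Counter

# Toy corpus, invented for illustration.
corpus = "the cat sat on the mat and the dog sat on the rug".split()

# Bigram statistics: for each word, count the words that follow it.
follower_counts: dict[str, Counter] = defaultdict(Counter)
for current, nxt in zip(corpus, corpus[1:]):
    follower_counts[current][nxt] += 1

def continue_text(word: str, length: int = 5) -> list[str]:
    """Extend a text by repeatedly sampling the next word in
    proportion to how often it followed the current word."""
    out = [word]
    for _ in range(length):
        counts = follower_counts.get(out[-1])
        if not counts:
            break  # dead end: this word never appeared mid-corpus
        words, weights = zip(*counts.items())
        out.append(random.choices(words, weights=weights)[0])
    return out

print(" ".join(continue_text("the")))
```

The article's point is that this table-lookup approach collapses as n grows, since almost no specific longer word sequence ever appears in any real corpus, which is what motivates neural networks that generalize rather than memorize.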