What Is ChatGPT Doing … and Why Does It Work?
Summary
Stephen Wolfram explains the mechanics of large language models (LLMs), covering next-word probability, neural-network training, embeddings, and the limits imposed by computational irreducibility.
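As a rough illustration of that core loop (a sketch, not how ChatGPT is actually implemented): at each step the model assigns a probability to every possible next word and samples from that distribution, usually with a "temperature" parameter that trades predictability for variety. The vocabulary and probabilities below are invented for the example.

```python
import random

# Toy next-word distribution. These words and probabilities are
# invented for illustration, not taken from any real model.
next_word_probs = {
    "cat": 0.4,
    "dog": 0.3,
    "model": 0.2,
    "banana": 0.1,
}

def sample_next_word(probs: dict[str, float], temperature: float = 0.8) -> str:
    """Sample a next word, sharpening or flattening the distribution
    with a temperature parameter before drawing from it."""
    # Rescale each probability as p^(1/T), then sample in proportion
    # to the rescaled weights (no renormalization needed for sampling).
    weights = {w: p ** (1.0 / temperature) for w, p in probs.items()}
    r = random.uniform(0.0, sum(weights.values()))
    cumulative = 0.0
    for word, weight in weights.items():
        cumulative += weight
        if r <= cumulative:
            return word
    return word  # fallback for floating-point edge cases

print(sample_next_word(next_word_probs))
```

At temperature 1 this samples from the model's probabilities exactly; lower temperatures sharpen the distribution toward the most likely word, higher ones flatten it.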
Key quotes
what ChatGPT is always fundamentally trying to do is to produce a “reasonable continuation” of whatever text it’s got so far
there’s no “theory” being used here; it’s just a matter of what’s been found to work in practice
Learning involves in effect compressing data by leveraging regularities. But computational irreducibility implies that ultimately there’s a limit to what regularities there may be.
This article provides a high-level conceptual overview of how large language models work, moving from simple n-gram statistics to modern neural networks. It emphasizes the empirical, trial-and-error nature of current AI training and the limits that computational irreducibility places on what learning can achieve.
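To make the n-gram starting point concrete, here is a minimal bigram sketch in the spirit of the article's early examples: count which word follows which in a corpus, then extend text by sampling in proportion to those counts. The toy corpus is invented, and a real model would need vastly more text and longer contexts.

```python
import random
from collections import defaultdict, Counter

# Toy corpus, invented for illustration.
corpus = "the cat sat on the mat and the dog sat on the rug".split()

# Bigram statistics: for each word, count the words that follow it.
follower_counts: dict[str, Counter] = defaultdict(Counter)
for current, nxt in zip(corpus, corpus[1:]):
    follower_counts[current][nxt] += 1

def continue_text(word: str, length: int = 5) -> list[str]:
    """Extend a text by repeatedly sampling the next word in
    proportion to how often it followed the current word."""
    out = [word]
    for _ in range(length):
        counts = follower_counts.get(out[-1])
        if not counts:
            break  # dead end: this word never appeared mid-corpus
        words, weights = zip(*counts.items())
        out.append(random.choices(words, weights=weights)[0])
    return out

print(" ".join(continue_text("the")))
```

The article's point is that this table-lookup approach collapses as n grows, since almost no specific longer word sequence ever appears in any real corpus, which is what motivates neural networks that generalize rather than memorize.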