Summary

An analysis estimating that a typical GPT-4o query consumes roughly 0.3 watt-hours; based on updated hardware and usage assumptions, this challenges the widely cited earlier estimate of 3 watt-hours.

Key quotes

We find that typical ChatGPT queries using GPT-4o likely consume roughly 0.3 watt-hours, which is ten times less than the older estimate.
This difference comes from more efficient models and hardware compared to early 2023, and an overly pessimistic estimate of token counts in the original estimate.

The article provides a detailed breakdown of the compute and energy costs of LLM inference, using GPT-4o and Nvidia H100 GPUs as reference points. It also discusses how input length affects energy use and how standard and reasoning models differ in energy consumption.
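The kind of back-of-envelope estimate described above can be sketched as follows. The specific numbers here are illustrative assumptions, not figures from the article: a hypothetical ~500 output tokens per query, ~100 GFLOP of active compute per token, and an H100 at its published ~700 W TDP and ~989 TFLOP/s dense BF16 peak, running at an assumed 10% utilization during inference.

```python
# Rough per-query inference energy: (time spent on GPU) x (power draw).
# All constants below are illustrative assumptions, not article values.
H100_PEAK_FLOPS = 989e12   # dense BF16 peak, per NVIDIA's H100 datasheet
H100_POWER_W = 700         # SXM TDP
UTILIZATION = 0.10         # assumed effective FLOP utilization at inference

def energy_wh(output_tokens: float, flops_per_token: float) -> float:
    """Energy in watt-hours for one query's GPU compute."""
    seconds = (output_tokens * flops_per_token) / (H100_PEAK_FLOPS * UTILIZATION)
    return seconds * H100_POWER_W / 3600.0

# ~500 tokens at ~100 GFLOP/token gives on the order of 0.1 Wh,
# before server and datacenter overheads are added.
print(f"{energy_wh(500, 100e9):.3f} Wh")
```

Under these assumptions the GPU compute alone lands near 0.1 Wh, so overheads (CPU, networking, cooling, idle capacity) plausibly bring the total toward the ~0.3 Wh figure; halving utilization or doubling token counts shifts the result proportionally.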