Summary

Evaluation of the feasibility, performance, and energy impact of deploying Large Language Models (LLMs) on mobile devices and local edge hardware using the BLaDE benchmarking infrastructure.

Key quotes

Tractability of the LLM inference workload does not imply deployability.
Deploying LLMs on device is possible, but with noticeable impact on latency, comfort, and accuracy, especially on mid-tier devices.

The post describes the MELT (Mobile Evaluation of Language Transformers) framework used to measure latency, accuracy, and energy consumption across iOS and Android devices. It compares on-device execution with offloading to local edge devices such as NVIDIA Jetson boards.
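To make the latency side of such a benchmark concrete, here is a minimal sketch of how one might time the two phases of LLM inference (prompt prefill and per-token decode) and derive a tokens-per-second figure. This is an illustrative harness only, not MELT's or BLaDE's actual API; the `generate_token` callable and all names are hypothetical stand-ins for a real on-device inference call.

```python
import time
from dataclasses import dataclass

@dataclass
class InferenceStats:
    prefill_s: float   # time to process the prompt
    decode_s: float    # time spent generating new tokens
    tokens: int        # number of tokens generated

    @property
    def tokens_per_second(self) -> float:
        # Decode throughput, the metric users feel as "typing speed"
        return self.tokens / self.decode_s if self.decode_s else 0.0

def benchmark(generate_token, prompt_tokens, max_new_tokens=32):
    """Time a prefill + autoregressive decode loop.

    `generate_token` is a hypothetical stand-in for a real
    on-device inference call (e.g. a quantized LLM runtime).
    """
    start = time.perf_counter()
    state = list(prompt_tokens)            # "prefill": consume the prompt
    prefill_end = time.perf_counter()
    for _ in range(max_new_tokens):        # "decode": one token per step
        state.append(generate_token(state))
    end = time.perf_counter()
    return InferenceStats(prefill_end - start, end - prefill_end, max_new_tokens)

# Usage with a trivial dummy "model" that always emits token 0:
stats = benchmark(lambda s: 0, [1, 2, 3], max_new_tokens=16)
print(f"{stats.tokens_per_second:.1f} tok/s")
```

Energy measurement, by contrast, needs hardware-level instrumentation (external power monitors or platform power rails), which is precisely what dedicated infrastructure like BLaDE provides and a pure software loop like this cannot.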