A Survey on the Real Power of ChatGPT — arXiv

Summary

A survey that evaluates ChatGPT's performance across seven categories of NLP tasks and compares it against fine-tuned models. It also examines ChatGPT's social implications, safety issues, and the degradation of its performance over time.

Key quotes

ChatGPT’s performance tends to be good in the zero and few shot settings, but still under-perform the fine tuned models.

the performance of ChatGPT degrades with time.

it is not clear whether LLMs are [making] predictions based on true reasoning or heuristics.

The report analyzes ChatGPT's capabilities in several NLP domains, including classification, generation, and reasoning. It highlights a disparity between what human evaluators prefer and how the model actually performs: people favor its fluent outputs, even though those outputs can fall short on factual accuracy and on task performance relative to supervised models.