Research shows that persona prompting "reliably" damages accuracy for some types of tasks but works well in other categories.
Authentication Failures (A07) show the largest gap in the dataset: a 48-percentage-point difference between leaders and the field. Leaders fix at nearly 60%, while the field sits at roughly 12%.
Over a decade ago, when I was first starting to pretend I could write about quantum mechanics, I covered a truly bizarre ...
OpenAI's GPT-5.4 Pro has solved an open math problem unsolved since 2019, with Epoch AI independently verifying the first AI ...
Some results have been hidden because they may be inaccessible to you
Show inaccessible results