Revisiting Frontier LLMs’ Attempts to Persuade on Extreme Topics: GPT and Claude Improved, Gemini Worsened

abstract

We test recently released models from frontier companies to see whether progress has been made on their willingness to persuade on harmful topics like radicalization and child sexual abuse. We find that OpenAI’s GPT and Anthropic’s Claude models are trending in the right direction, with near zero compliance on extreme topics. But Google’s Gemini 3 Pro complies with almost any persuasion request in our evaluation, without jailbreaking.