OpenAI: RL reward caused ChatGPT’s ‘goblin’ obsession
A reinforcement-learning reward tied to OpenAI’s ‘Nerdy’ persona led GPT-5.x models to overuse goblin and other creature metaphors, prompting a developer prompt patch.
OpenAI traced widespread use of goblin and other creature metaphors in GPT-5.x to a reinforcement-learning reward linked to a personality option called “Nerdy.” The company applied an emergency developer system prompt that instructs models to “Never talk about goblins, gremlins, raccoons, trolls, ogres, pigeons, or other animals or creatures unless it is absolutely and unambiguously relevant to the user’s query.”
The issue began with GPT-5.1, released last November, when OpenAI added selectable personas including Friendly, Professional, Efficient and Nerdy. The Nerdy system prompt told the model to be playful and to “undercut pretension through playful use of language,” a style the training process then began to reward. During reinforcement learning, outputs that included creature-word metaphors received higher reward signals. An internal audit found that in 76.2% of reviewed datasets, responses containing words such as “goblin” or “gremlin” scored higher than otherwise identical responses without them.
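OpenAI has not described how the audit was computed. As a rough illustration only, the Python sketch below shows one way such a comparison could be set up. The data layout (lists of response and reward pairs), the word list, and the function names are all assumptions made for this example, not details from the company.

```python
from statistics import mean

# Hypothetical word list; the article names only "goblin" and "gremlin".
CREATURE_WORDS = {"goblin", "gremlin"}


def mentions_creature(text: str) -> bool:
    """Return True if any word in the text is a creature word (singular or plural)."""
    words = {w.strip(".,!?;:\"'()").lower() for w in text.split()}
    return any(w.rstrip("s") in CREATURE_WORDS for w in words)


def creature_favored(dataset):
    """dataset: list of (response_text, reward) pairs (an assumed layout).

    Returns True if creature responses have the higher mean reward,
    False if not, and None if the dataset lacks one of the two groups.
    """
    with_creature = [r for t, r in dataset if mentions_creature(t)]
    without = [r for t, r in dataset if not mentions_creature(t)]
    if not with_creature or not without:
        return None
    return mean(with_creature) > mean(without)


def share_favoring(datasets) -> float:
    """Fraction of comparable datasets in which creature responses scored higher."""
    verdicts = [v for v in map(creature_favored, datasets) if v is not None]
    return sum(verdicts) / len(verdicts) if verdicts else 0.0


# Toy, made-up data purely to exercise the functions.
toy_datasets = [
    [("A little goblin lives in your cache.", 0.9), ("Your cache is stale.", 0.6)],
    [("Gremlins in the build!", 0.4), ("The build is flaky.", 0.7)],
    [("No creatures here.", 0.5)],  # skipped: nothing to compare against
]
print(share_favoring(toy_datasets))
```

On the toy data, one of the two comparable datasets favors the creature response, so the share is 0.5. A figure like the reported 76.2% would come from the same kind of count over real reward data.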
Usage shifted measurably after the update. Following GPT-5.1, goblin mentions rose 175% and gremlin mentions rose 52%. By GPT-5.4, the Nerdy mode showed a 3,881% increase in goblin references compared with GPT-5.2. Nerdy generated about 2.5% of ChatGPT responses but accounted for roughly 66.7% of all recorded “goblin” mentions. Over time, creature-laden outputs were reused in supervised fine-tuning, and the pattern spread beyond the Nerdy setting into later model versions.
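To put those percentages in more familiar terms, the short Python sketch below converts them into multiples. The function names are illustrative, and the per-response comparison assumes that both shares (2.5% and 66.7%) are measured over the same pool of ChatGPT responses, which the source does not state explicitly.

```python
def fold_change(percent_increase: float) -> float:
    """Convert a percent increase into a multiple of the baseline."""
    return 1 + percent_increase / 100


def overrepresentation(share_of_mentions: float, share_of_responses: float) -> float:
    """How many times more mentions a group has than its share of responses implies."""
    return share_of_mentions / share_of_responses


# A 175% increase means 2.75 times the baseline; 3,881% means about 39.8 times.
goblin_after_5_1 = fold_change(175)
nerdy_goblin_5_4 = fold_change(3881)

# Nerdy: 2.5% of responses, 66.7% of goblin mentions.
nerdy_ratio = overrepresentation(66.7, 2.5)

# Everything else: the remaining 97.5% of responses and 33.3% of mentions.
other_ratio = overrepresentation(100 - 66.7, 100 - 2.5)

# Per-response goblin rate in Nerdy relative to all other responses (assumption above).
relative_rate = nerdy_ratio / other_ratio

print(round(goblin_after_5_1, 2), round(nerdy_goblin_5_4, 2))
print(round(nerdy_ratio, 2), round(relative_rate, 1))
```

Under these assumptions, Nerdy produced goblin mentions at roughly 27 times the rate its share of traffic would suggest, and about 78 times the per-response rate of all other responses combined.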
OpenAI retired the Nerdy persona in March and removed creature-affine reward signals from future training runs. GPT-5.5 had already entered training and retained the creature-language behavior. To suppress the tic in production immediately, engineers inserted the quoted developer prompt into Codex and other live systems. OpenAI noted the prompt change took minutes to deploy, compared with the time and expense needed to retrain a model at GPT-5.5 scale. The company also published a command that lets users remove the goblin-suppressing instruction if they prefer the quirk.
OpenAI said it created new internal tools to audit model behavior and trace quirks to training data. The company reported that GPT-5.5’s training data has been cleaned of creature-affine examples and that the next model generation should not include the same creature-language pattern. The post-mortem was released after a system prompt containing the “never talk about goblins” line was leaked online and circulated publicly.
OpenAI provided an internal example of the problem: during routine testing, a request for a unicorn in ASCII art produced a goblin-themed response. The company noted that prompt patches suppress behavior but do not eliminate the underlying learned tendency, and that full removal requires retraining, which carries substantial cost and delay. OpenAI also referenced an earlier industry incident where prompt updates produced harmful outputs and required further prompt adjustments; it described the goblin case as less severe but illustrative of limits in relying on prompt changes alone.