HomeCryptocurrency NewsPrompt injection hijacks AI chatbots

Prompt injection hijacks AI chatbots

Author GNcrypto

Posted: 30 May 2026, 15:05 CET 3 min read

Hidden instructions in emails, web pages and files can trick AI chatbots into following attackers’ commands, a vulnerability OpenAI calls “unlikely to ever be fully solved.”

Prompt injection attacks hide instructions inside ordinary text so an AI model follows an attacker’s command instead of a user’s intent. OpenAI wrote in December 2025 that the problem is “unlikely to ever be fully solved,” and the U.K.’s National Cyber Security Centre warned that large language models are “inherently confusable deputies.” Security groups and researchers report the flaw is widespread and can expose data and systems to abuse.

Large language models read all input as text and predict the next token from the same context window whether the text is a system instruction, a user message or content pulled from a file or web page. That lack of separation lets an attacker place a new instruction inside a document or web page and have the model follow it as if a user had typed the command.

Researchers and security firms have documented direct examples where users typed malicious instructions into chat boxes and the assistants complied. In December 2023 a software engineer altered the behavior of a dealership sales bot and induced it to agree to sell a vehicle for one dollar. In January 2024 a parcel company chatbot produced abusive content and a critical poem after a user prompted it to swear.

More complex attacks hide instructions in content the AI reads automatically. Hidden text, HTML comments, tiny font sizes, metadata, PDFs and code files can contain commands an AI will process even when humans do not see them. DeepMind researchers reported a 32% rise in malicious indirect prompt injections between November 2025 and February 2026 after scanning billions of web pages. Some discovered payloads contained detailed payment instructions meant to be executed by an AI agent with payment access.

Security firm HiddenLayer demonstrated in September 2025 how an injected instruction can spread through a codebase. Its proof-of-concept, called CopyPasta, placed payloads in common files such as LICENSE.txt and README.md. When developers used AI coding assistants that read those files, the assistants copied the malicious text into new files.

Anthropic disclosed a large-scale incident in November 2025 that it described as primarily executed by AI. The company reported an actor it labeled GTG-1002 jailbroke the Claude Code model with prompt injection and attempted intrusions against roughly 30 targets. Anthropic estimated the AI performed 80% to 90% of the operation autonomously after attackers divided the work into thousands of small, individually innocuous requests.

Researchers from multiple labs tested published defenses and found high bypass rates. A late-2025 joint study evaluated a dozen defenses against adaptive attackers and found attackers bypassed them with over 90% success. The U.K. agency said applying SQL-injection-style patches to language models is a category error because the models do not separate commands from data. OpenAI’s chief information security officer described prompt injection as “a frontier, unsolved security problem.”

Security teams and vendors recommend limiting exposure rather than expecting a complete technical fix. Suggested measures include restricting what an AI agent can access, avoiding logged-in access for email or banking, issuing narrow and specific commands, requiring explicit human confirmation before consequential actions, scanning files for hidden comments or formatting tricks, and treating all external content fed into an AI as potentially hostile.

The term “prompt injection” was coined in September 2022. Developers and security teams continue to plan operations around the reality that language models can be confused and that attackers will use hidden text to try to hijack automated assistants. Major AI labs advise keeping human oversight and minimal privileges in place when an AI takes actions on behalf of users.

The material on GNcrypto is intended solely for informational use and must not be regarded as financial advice. We make every effort to keep the content accurate and current, but we cannot warrant its precision, completeness, or reliability. GNcrypto does not take responsibility for any mistakes, omissions, or financial losses resulting from reliance on this information. Any actions you take based on this content are done at your own risk. Always conduct independent research and seek guidance from a qualified specialist. For further details, please review our Terms, Privacy Policy and Disclaimers.