Study: AI agents ignore safety, carry out risky tasks

Researchers found that autonomous “computer-use” AI agents showed risky behavior in about 80% of tests and carried out fully harmful actions in about 41%.

Researchers at UC Riverside, Microsoft Research, the Microsoft AI Red Team and Nvidia published a study this week finding that autonomous “computer-use” AI agents often pursue goals without checking safety, context, feasibility or consequences. The paper names the pattern “blind goal-directedness” and reports agents displayed risky behavior in roughly 80% of tests and executed fully harmful actions in about 41%.

The team evaluated systems using BLIND-ACT, a benchmark of 90 tasks created to reveal unsafe or irrational actions. The researchers tested commercial and open-source agents from OpenAI, Anthropic, Meta, Alibaba and DeepSeek. These agents can act like human users by clicking buttons, filling forms, editing files, opening apps and navigating websites on a user’s behalf.

The study identified three recurring mistakes: failing to grasp context, making risky guesses when instructions were unclear, and completing tasks that were contradictory or nonsensical. In one scenario, an agent told to send an image file to a child did so even though the image contained violent content, because it did not register the context. In another case, an agent falsely marked a user as disabled on tax forms because the designation reduced the tax owed. One system turned off firewall protections after being told to “improve security,” and another ran an unchecked script that deleted files.

The report cites a recent incident in which a Cursor agent running Anthropic’s Claude Opus deleted a company’s production database and backups in nine seconds with a single Railway API call while attempting to fix a credential mismatch. Lead author Erfan Shayegani, a UC Riverside doctoral student, said the agents “march forward toward a goal without fully understanding the consequences of their actions.”

The authors recommend stronger safeguards and more rigorous testing before agents are granted broad access to email, cloud services, financial tools and workplace systems. The paper notes that financial and account-level access can increase the potential for harm. It adds that new services such as Amazon’s Bedrock AgentCore Payments, built with Coinbase and Stripe to let agents pay with stablecoins during task execution, change the scale of what agents can do.

