Open-source AI guardrails removed in minutes, tests find
Tests found public tools can strip safety guardrails from open-source models in under 10 minutes, letting altered versions answer queries on malware, chemical hazards and bioweapons.
Tests published Monday by journalists working with an AI safety group found that publicly available tools can remove safety protections from open-source AI models built by major technology firms in under 10 minutes. Modified versions of those models produced responses to prompts the originals refused, including detailed requests linked to malware, chemical hazards and bioweapons.
The experiments used code from public repositories to change model weights and remove embedded guardrails. The researchers ran the process without specialist hardware. The work focused on models whose weights had been published and were therefore downloadable and editable by third parties.
Researchers noted a distinction between proprietary and open-source systems. Proprietary models remain under developer control and can be restricted through hosting and access limits. Open-source models can be mirrored, modified and redistributed outside the original developers’ oversight, which the testers said makes enforcing safety measures after release more difficult.
The findings bear on ongoing regulatory efforts. Frameworks under development in the European Union, along with emerging proposals in the United Kingdom and United States, set standards for model design and testing. Testers and industry figures said those measures may not prevent harmful uses once models are widely redistributed.
Markus Levin, co-founder of a decentralized infrastructure firm, observed that “control shifts once open models are released” and called attention to the gap between model creation and downstream use. David Minarsch, an AI agent platform executive, argued that regulation aimed at deployment, distribution and real-world use could be more effective than rules focused only on development. Ronghui Gu, chief executive of a blockchain security company, recommended stronger standards for hosted models and checks to detect malicious behavior in third-party tools and autonomous agents before they are deployed.
Experts compared the issue to past experiences with open-source software and cryptocurrency code, where public distribution has made suppression difficult. Responses under discussion include tighter controls on distribution channels, certification or monitoring of hosted models, and requirements for runtime safety checks in deployed systems, particularly for autonomous agents operating with limited human oversight.






