Anthropic launches Claude Opus 4.5 with record score on internal coding test


Anthropic introduced Claude Opus 4.5, reporting the model topped every human on its two-hour take-home coding exam.

Anthropic launched Claude Opus 4.5 and reported that the upgraded model outscored every human candidate on its internal two-hour coding exam. The release adds new capabilities for software development, autonomous agents, and spreadsheet and financial analysis.

Opus 4.5 is available in the Claude apps, through the API, and on major cloud platforms. Pricing is $5 per million input tokens and $25 per million output tokens.
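To put the quoted rates in concrete terms, the arithmetic can be sketched in a few lines of Python. The function name and the sample workload are illustrative assumptions; only the two per-million-token prices come from the announcement, and actual billing may differ.

```python
# Prices quoted in the article, in US dollars per million tokens.
INPUT_PRICE_PER_MTOK = 5.00
OUTPUT_PRICE_PER_MTOK = 25.00


def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated cost in USD for one request at the quoted rates.

    Hypothetical helper for illustration; not part of any Anthropic SDK.
    """
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    input_cost = input_tokens / 1_000_000 * INPUT_PRICE_PER_MTOK
    output_cost = output_tokens / 1_000_000 * OUTPUT_PRICE_PER_MTOK
    return input_cost + output_cost


if __name__ == "__main__":
    # Assumed workload: 200,000 input tokens and 20,000 output tokens.
    print(f"${estimate_cost(200_000, 20_000):.2f}")
```

At these rates, output tokens cost five times as much as input tokens, so the length of the model's responses tends to dominate the bill for generation-heavy workloads.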

Backed by Amazon, Alphabet and Microsoft, Anthropic positions Opus 4.5 as its most capable Claude model to date, with deep reasoning, long-context handling, and computer use across tasks such as modeling, forecasting, and document creation for enterprise workflows.

For software engineering, the firm reported that Opus 4.5 exceeded historical human scores on a take-home performance engineering exam. The test measures technical skill and judgment under time pressure. 

According to Anthropic, the result came from giving the model multiple attempts per problem and selecting the best answer. The company noted that the exam does not measure non-technical skills such as collaboration.
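The general idea of taking several attempts and keeping the best one is often called best-of-n selection. The article does not describe how Anthropic generated or scored attempts, so the following is only a generic sketch: the function name, the attempt labels, and the scores are all hypothetical.

```python
from typing import Callable, Sequence


def best_of_n(attempts: Sequence[str], score: Callable[[str], float]) -> str:
    """Return the highest-scoring attempt; ties go to the earliest one.

    Generic illustration of best-of-n selection, not Anthropic's method.
    """
    if not attempts:
        raise ValueError("need at least one attempt")
    # max() keeps the first item it encounters among equal maxima.
    return max(attempts, key=score)


if __name__ == "__main__":
    # Made-up scores standing in for whatever grader the exam uses.
    made_up_scores = {"attempt_1": 0.4, "attempt_2": 0.9, "attempt_3": 0.7}
    print(best_of_n(list(made_up_scores), made_up_scores.__getitem__))
```

A score obtained this way reflects the best of several tries, which is why it is not directly comparable to a single attempt by a human candidate.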

Beyond coding, the model adds features for planning, executing, and refining long-running work. According to Anthropic, agents can store insights from prior sessions, reuse them later, coordinate subagents on complex projects, and support extended tasks such as deep research and slide or spreadsheet workflows.

In one agent benchmark scenario, the model resolved an airline request by upgrading the ticket class before changing flights, which kept the change within policy. The benchmark counted the outcome as a failure because it had not anticipated that solution. Anthropic treats such cases as part of safety testing intended to limit rule gaming.

Anthropic states that Opus 4.5 can reach quality outcomes in fewer steps than prior versions, supported by context compaction and advanced tool use for longer-running agents and large workflows.

On security, Anthropic characterizes Opus 4.5 as its most robustly aligned release to date, citing improved resistance to prompt-injection attacks. The company pointed to independent testing of strong prompt-injection attempts indicating Opus 4.5 was harder to manipulate than other frontier systems.

Developers indicated that customers with access to Opus 4.5 will receive higher usage limits, removing Opus-specific caps for Claude and Claude Code users. Limits are set per model and may change as new systems arrive.

As we reported earlier, Alpha Arena has returned with a new season of AI trading, in which eight language models trade live on the US stock market, receiving $10,000 for four tasks. In the previous season, Qwen 3 Max won with a 22% return on the crypto market.

