Anthropic Claude Opus 4.7 release: How to try it, benchmarks, safety

Anthropic has been shipping products and making news at a rapid pace in 2026, and on Thursday, the AI company announced the launch of Claude Opus 4.7.
Claude Opus 4.7 is the most intelligent Anthropic model available to the general public. Notably, Anthropic said in a press release that Opus 4.7 is not as powerful as Claude Mythos, which Anthropic deemed too dangerous to release publicly.
Claude Opus is a family of hybrid reasoning models capable of multistep reasoning and advanced coding. Until the announcement of Claude Mythos on April 7, Claude Opus was considered Anthropic's most advanced series of AI models.
How to try Claude Opus 4.7
Claude Opus 4.7 is available now through Claude AI, the Claude API, and Anthropic partners like Microsoft Foundry. The new model is priced the same as Claude Opus 4.6.
However, Anthropic noted that because “Opus 4.7 thinks more at higher effort levels,” it uses more output tokens than its predecessor. Users can learn more about how to optimize token usage in the Opus 4.7 migration guide.
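For developers, access works through Anthropic's standard Messages API. The snippet below is a minimal sketch using the official anthropic Python SDK; the model ID string "claude-opus-4-7" is an assumed placeholder based on Anthropic's usual naming pattern, and because the announcement doesn't spell out the exact parameter for controlling effort levels, no effort setting is shown here.

```python
# Minimal sketch of calling the new model via the official `anthropic` Python SDK
# (pip install anthropic). The model ID "claude-opus-4-7" is an assumed placeholder,
# not a confirmed identifier.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

response = client.messages.create(
    model="claude-opus-4-7",  # hypothetical model ID
    max_tokens=2048,          # cap output tokens, since 4.7 uses more of them at higher effort
    messages=[
        {"role": "user", "content": "Summarize the key risks in this quarterly report."},
    ],
)

print(response.content[0].text)
```

Keeping max_tokens conservative is one straightforward way to manage the higher output-token usage that the migration guide addresses.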
How does Claude Opus 4.7 improve over 4.6?
As expected, Claude Opus 4.7 offers improved capabilities across the board.
In particular, Anthropic says that Claude Opus 4.7 is better for advanced coding tasks, artificial intelligence, and document analysis. Anthropic also says Opus 4.7 is “fun and intuitive when completing professional tasks, producing high-quality communications, slides, and documents.”
“Users report being able to delegate their most difficult coding work – the kind that required close supervision – to Opus 4.7 with confidence. Opus 4.7 handles complex, long-lasting tasks with robustness and consistency, pays close attention to instructions, and plans ways to verify its results before reporting back,” reads Anthropic’s blog post.
Claude Opus 4.7: Benchmark performance
Anthropic released a model card detailing how Claude Opus 4.7 compares to other Anthropic models and frontier models from OpenAI, Google, and xAI.
Opus 4.7 still lags behind the unreleased Claude Mythos, which Anthropic reports scores the highest on popular benchmarks like Humanity’s Last Exam. “Claude Opus 4.7 is less powerful than the Claude Mythos preview on every relevant axis we measured and does not cross our power threshold,” the model card says.
On Humanity’s Last Exam (without tools), Anthropic reports that Claude Opus 4.7 outperforms all other frontier models except Claude Mythos.
- Claude Mythos scored 56.8 percent on HLE
- Claude Opus 4.7 scored 46.9 percent
- Gemini 3.1 Pro scored 44.4 percent
- GPT-5-4 Pro scored 42.7 percent
- Claude Opus 4.6 scored 40.0 percent
With tools enabled, GPT-5-4 Pro scored 58.7 percent compared to Opus 4.7’s 54.7 percent. Claude Mythos beat them both with a score of 64.7 percent.
Mashable has not independently verified these benchmark results. Full results are available in the Opus 4.7 model card.
Overall, Anthropic’s results put Opus 4.7 ahead of the other top models on some benchmarks, though Gemini 3.1 Pro and GPT-5-4 Pro scored higher in other areas.
Claude Opus 4.7: Safety and hallucinations
Anthropic also reports that Opus 4.7 shows a low risk of malicious behavior, with a risk profile similar to Opus 4.6.
For example, Anthropic says that Opus 4.7 is less likely to omit key details and shows lower rates of reward hacking.
“Claude Opus 4.7 is more reliable than Opus 4.6 or Sonnet 4.6, with a significant reduction in the rate of key omissions, and a moderate improvement in the accuracy and rates of forgotten entries,” the model card says.



