Anthropic's Claude 3.5 Sonnet Outshines GPT-4o in Benchmarks

Anthropic has recently introduced Claude 3.5 Sonnet, a mid-tier AI model that has quickly established itself as a standout performer, surpassing not only competitors such as GPT-4o but also Anthropic's own higher-tier Claude 3 Opus in various benchmark evaluations.

Accessible Platforms and Pricing

Claude 3.5 Sonnet is now available for free on Claude.ai and the Claude iOS app. Users who need higher rate limits can get the model through the Claude Pro and Team plans. It can also be accessed through the Anthropic API, Amazon Bedrock, and Google Cloud’s Vertex AI. Pricing is set at $3 per million input tokens and $15 per million output tokens, with a 200K token context window.
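To put the per-token pricing in concrete terms, here is a minimal Python sketch that estimates what a single request might cost at the rates quoted above. The function names and the example token counts are hypothetical, chosen only for illustration. The sketch also assumes that "200K" means 200,000 tokens and applies a simplified size check. It is not an official billing calculator.

```python
from decimal import Decimal

# Prices quoted in the article, in US dollars per one million tokens.
INPUT_USD_PER_MTOK = Decimal("3")
OUTPUT_USD_PER_MTOK = Decimal("15")
TOKENS_PER_MILLION = Decimal(1_000_000)

# Assumption: the article's "200K" is read as 200,000 tokens.
CONTEXT_WINDOW_TOKENS = 200_000


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> Decimal:
    """Estimate the cost of one request at the quoted per-token rates."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total = input_tokens * INPUT_USD_PER_MTOK + output_tokens * OUTPUT_USD_PER_MTOK
    return total / TOKENS_PER_MILLION


def fits_in_context(total_tokens: int) -> bool:
    """Simplified check: does a token count fit in the assumed window size?"""
    return 0 <= total_tokens <= CONTEXT_WINDOW_TOKENS


if __name__ == "__main__":
    # Hypothetical request: 10,000 input tokens and 2,000 output tokens.
    print(estimate_cost_usd(10_000, 2_000))  # prints 0.06
    print(fits_in_context(12_000))           # prints True
```

In this hypothetical request, the 2,000 output tokens cost as much as the 10,000 input tokens, because output is priced at five times the input rate.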

Performance and Capabilities

Anthropic claims that Claude 3.5 Sonnet sets new standards across several key areas:

Graduate-level reasoning (GPQA)
Undergraduate-level knowledge (MMLU)
Coding proficiency (HumanEval)

The model excels in understanding nuances, humor, and complex instructions, producing high-quality content with a natural tone. It runs at twice the speed of Claude 3 Opus, making it ideal for complex tasks such as context-sensitive customer support and multi-step workflow orchestration. In an internal agentic coding evaluation, Claude 3.5 Sonnet solved 64% of problems, significantly outperforming Claude 3 Opus, which achieved a 38% success rate.

Vision Capabilities

Claude 3.5 Sonnet also demonstrates enhanced vision capabilities, surpassing Claude 3 Opus in standard vision benchmarks. This improvement is particularly notable in tasks requiring visual reasoning, such as interpreting charts and graphs. The model can accurately transcribe text from imperfect images, a valuable feature for industries like retail, logistics, and financial services.

New Features and Safety Measures

Alongside the model launch, Anthropic has introduced a new feature called Artifacts on Claude.ai. It lets users view, edit, and build upon Claude’s generated content in real time, making work with the model more collaborative.

Despite the significant advancements, Claude 3.5 Sonnet maintains Anthropic’s commitment to safety and privacy. The models undergo rigorous testing to mitigate misuse, with external experts such as the UK’s AI Safety Institute (UK AISI) and child safety experts at Thorn involved in the process. Anthropic emphasizes its dedication to user privacy, stating that no customer or user-submitted data has been used to train their generative models without explicit permission.

Future Plans

Looking ahead, Anthropic plans to expand the Claude 3.5 model family with the release of Claude 3.5 Haiku and Claude 3.5 Opus later this year. The company is also developing new modalities and features to support more business use cases, including integrations with enterprise applications and a memory feature for more personalized user experiences.


Author Bio

Sarah Wilson – Author and Content Writer at AiBlock Insider