On Tuesday, Anthropic unveiled substantial upgrades to its already capable Claude 3.5 Sonnet model. Alongside this update, the startup introduced a lighter model, Claude 3.5 Haiku. The new Sonnet iteration also comes with a public beta feature that lets the AI exert basic control over the computer it operates on.
Claude 3.5 Sonnet has established itself as a top contender for coding tasks, and the latest version shows noteworthy gains across various metrics, outperforming both Gemini 1.5 Pro and GPT-4o on a range of industry benchmarks. The one exception was the MATH benchmark, where Gemini 1.5 Pro came out ahead of 3.5 Sonnet.
The new 3.5 Haiku is impressive in its own right, despite its smaller size. Scheduled for release later this month, 3.5 Haiku outperforms Claude 3 Opus, the largest model from the previous generation. The model is also particularly adept at coding tasks, scoring 40.6% on SWE-bench Verified, ahead of both GPT-4o and the first iteration of 3.5 Sonnet.
The upgraded Claude 3.5 Sonnet can now interact with desktop applications through the “Computer Use” API. This lets the AI generate keystrokes, mouse clicks, and cursor movements, mimicking human interaction. Anthropic emphasizes that the system is still quite experimental and can make mistakes. The public beta release aims to gather feedback from developers to accelerate improvements in the API’s functionality.
“Our aim was to train Claude to understand what’s visible on a screen and effectively use the software tools to complete tasks,” stated Anthropic in a blog post. “When a developer directs Claude to utilize a software application and provides appropriate access, Claude assesses the screenshots displayed to the user, determining how much to move the cursor both vertically and horizontally to click the correct area.”
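To make the cursor arithmetic in that description concrete, here is a minimal Python sketch of the idea: given the cursor's current position and the on-screen box of the element to click, work out how far to move horizontally and vertically. The names `Point`, `cursor_offset`, and `center_of`, and all coordinates, are hypothetical illustrations; this is not Anthropic's implementation or API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Point:
    x: int  # pixels from the left edge of the screenshot
    y: int  # pixels from the top edge of the screenshot


def cursor_offset(cursor: Point, target: Point) -> tuple[int, int]:
    """Return (dx, dy): pixels to move right (+) or left (-), and down (+) or up (-)."""
    return target.x - cursor.x, target.y - cursor.y


def center_of(left: int, top: int, width: int, height: int) -> Point:
    """Center of a rectangular UI element, such as a button's bounding box."""
    return Point(left + width // 2, top + height // 2)


# Hypothetical example: cursor at (100, 400); a "Submit" button whose
# bounding box starts at (600, 300) and measures 120 x 40 pixels.
button = center_of(left=600, top=300, width=120, height=40)
dx, dy = cursor_offset(Point(100, 400), button)
print(dx, dy)  # 560 -80
```

In this sketch a negative `dy` means moving up, because screenshot coordinates are assumed to grow downward from the top-left corner.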
Essentially, it operates as an AI agent capable of automating various software tasks, ranging from generating and evaluating marketing leads to analyzing medical data trends or simply navigating websites to fill out forms. You can consider this a more sophisticated evolution of existing Robotic Process Automation systems.
Companies such as Asana, Canva, Cognition, DoorDash, Replit, and The Browser Company are among the initial users of this new capability. For instance, Replit is looking to use Computer Use to “develop a crucial feature for evaluating applications as they are created for the Replit Agent product,” according to the announcement.
There’s no need to worry about the AI seizing control autonomously; Anthropic insists that human oversight remains crucial. “Users maintain control by providing precise prompts that guide Claude’s actions, such as ‘utilize data from my computer and online to complete this form,’” an Anthropic representative explained to TechCrunch. “Individuals dictate and restrict access as necessary. Claude translates user prompts into computer commands (e.g., cursor movement, clicking, typing) to fulfill those specific tasks.”
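One way to picture that kind of restriction is a gate that sits between the model's proposed commands and the machine, running only the kinds of action a person has approved. The Python sketch below is an illustration under that assumption; `Action`, `ActionGate`, and the action names are hypothetical and do not describe how Anthropic's API enforces access.

```python
from dataclasses import dataclass, field


@dataclass
class Action:
    kind: str  # e.g. "move", "click", "type" (hypothetical names)
    payload: dict = field(default_factory=dict)


class ActionGate:
    """Runs only the action kinds a human operator has explicitly allowed."""

    def __init__(self, allowed: set[str]):
        self.allowed = set(allowed)
        self.log: list[str] = []

    def submit(self, action: Action) -> bool:
        if action.kind not in self.allowed:
            self.log.append(f"blocked {action.kind}")
            return False
        # A real system would drive the mouse or keyboard here; this sketch only logs.
        self.log.append(f"ran {action.kind}")
        return True


# The operator allows cursor movement and clicking, but not typing.
gate = ActionGate(allowed={"move", "click"})
results = [
    gate.submit(Action("move", {"dx": 560, "dy": -80})),
    gate.submit(Action("click")),
    gate.submit(Action("type", {"text": "hello"})),
]
print(results)   # [True, True, False]
print(gate.log)  # ['ran move', 'ran click', 'blocked type']
```

The design choice here is a default-deny allowlist: anything the operator has not named is refused and recorded, rather than run.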
However, Anthropic acknowledges that the Computer Use feature could be exploited for actions like generating spam, spreading misinformation, or engaging in fraudulent activities. In response, the company has implemented new classifiers designed to detect when the API is used and assess whether that usage could cause harm.