Computer Use Agents

SCUBA: Salesforce Computer Use Benchmark (arXiv, 2025)

We introduce SCUBA, a benchmark designed to evaluate computer-use agents on customer relationship management (CRM) workflows within the Salesforce platform. SCUBA contains 300 task instances derived from real user interviews, spanning three primary …

CoAct-1: Computer-using Agents with Coding as Actions (arXiv, 2025)

Autonomous agents that operate computers via Graphical User Interfaces (GUIs) often struggle with efficiency and reliability on complex, long-horizon tasks. While augmenting these agents with planners can improve task decomposition, they remain …

GTA1: GUI Test-time Scaling Agent (arXiv, 2025)

Graphical user interface (GUI) agents autonomously complete tasks across platforms (e.g., Linux) by sequentially decomposing user instructions into action proposals that iteratively interact with visual elements in the evolving environment. However, …