We introduce SCUBA, a benchmark designed to evaluate computer-use agents on customer relationship management (CRM) workflows within the Salesforce platform. SCUBA contains 300 task instances derived from real user interviews, spanning three primary …
Autonomous agents that operate computers via Graphical User Interfaces (GUIs) often struggle with efficiency and reliability on complex, long-horizon tasks. While augmenting these agents with planners can improve task decomposition, they remain …
Graphical user interface (GUI) agents autonomously complete tasks across platforms (e.g., Linux) by sequentially decomposing user instructions into action proposals that iteratively interact with visual elements in the evolving environment. However, …