Computer Use Tool
Status: Planned
The most powerful tool: it gives the daemon direct control over the computer's screen, mouse, and keyboard.
Capabilities
- Take screenshots
- Move mouse cursor
- Click (left, right, double)
- Type text
- Press keyboard shortcuts
- Scroll
- Drag and drop
Interface
```typescript
interface ComputerUseTool extends Tool {
  name: "computer_use";
  actions: {
    // Vision
    screenshot(): Promise<{ image: Buffer; dimensions: Size }>;

    // Mouse
    click(x: number, y: number): Promise<void>;
    doubleClick(x: number, y: number): Promise<void>;
    rightClick(x: number, y: number): Promise<void>;
    moveMouse(x: number, y: number): Promise<void>;
    drag(fromX: number, fromY: number, toX: number, toY: number): Promise<void>;
    scroll(direction: "up" | "down", amount: number): Promise<void>;

    // Keyboard
    type(text: string): Promise<void>;
    keyPress(key: string, modifiers?: string[]): Promise<void>;
  };
}
```
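If the model drives this tool by emitting actions as data (for example JSON tool calls), the daemon has to route each request to the matching method. A minimal sketch, assuming a hypothetical `ComputerAction` request shape and `dispatch` helper that are not part of the interface above; the `Actions` type restates the interface's `actions` member with the screenshot result loosened to `unknown` so the block compiles on its own:

```typescript
// Hypothetical request shape: one variant per method on the `actions` object.
type ComputerAction =
  | { action: "screenshot" }
  | { action: "click" | "doubleClick" | "rightClick" | "moveMouse"; x: number; y: number }
  | { action: "drag"; fromX: number; fromY: number; toX: number; toY: number }
  | { action: "scroll"; direction: "up" | "down"; amount: number }
  | { action: "type"; text: string }
  | { action: "keyPress"; key: string; modifiers?: string[] };

// Mirrors the interface's `actions`, with the screenshot result left as `unknown`.
interface Actions {
  screenshot(): Promise<unknown>;
  click(x: number, y: number): Promise<void>;
  doubleClick(x: number, y: number): Promise<void>;
  rightClick(x: number, y: number): Promise<void>;
  moveMouse(x: number, y: number): Promise<void>;
  drag(fromX: number, fromY: number, toX: number, toY: number): Promise<void>;
  scroll(direction: "up" | "down", amount: number): Promise<void>;
  type(text: string): Promise<void>;
  keyPress(key: string, modifiers?: string[]): Promise<void>;
}

// Route one request to the corresponding action method.
async function dispatch(actions: Actions, req: ComputerAction): Promise<unknown> {
  switch (req.action) {
    case "screenshot":
      return actions.screenshot();
    case "click":
    case "doubleClick":
    case "rightClick":
    case "moveMouse":
      // All four share the (x, y) signature, so one call covers them.
      return actions[req.action](req.x, req.y);
    case "drag":
      return actions.drag(req.fromX, req.fromY, req.toX, req.toY);
    case "scroll":
      return actions.scroll(req.direction, req.amount);
    case "type":
      return actions.type(req.text);
    case "keyPress":
      return actions.keyPress(req.key, req.modifiers);
    default: {
      // Fails to type-check if a new variant is added but not handled above.
      const unhandled: never = req;
      throw new Error(`Unknown action: ${JSON.stringify(unhandled)}`);
    }
  }
}
```

Because `ComputerAction` is a discriminated union on `action`, the compiler checks the `switch` for exhaustiveness; requests parsed from model output would still need runtime validation before they reach `dispatch`.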
Implementation Options
macOS
- AppleScript - Basic automation
- Accessibility APIs - Full control (requires permissions)
- CGEvent - Low-level input events
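Of the macOS options, AppleScript is the quickest to prototype with, since System Events can synthesize keystrokes. A minimal sketch, assuming the daemon shells out to `osascript`; `keystrokeScript` and `appleScriptString` are hypothetical helper names, and the block only builds the script text so it can be inspected without running it:

```typescript
// Modifier names as AppleScript spells them in a `using {...}` clause.
type Modifier = "command" | "control" | "option" | "shift";

// Quote a string as an AppleScript literal, escaping backslashes and double quotes.
function appleScriptString(s: string): string {
  return `"${s.replace(/\\/g, "\\\\").replace(/"/g, '\\"')}"`;
}

// Build a System Events script that types `text` with the given modifiers held,
// e.g. keystrokeScript("c", ["command"]) for Command-C.
function keystrokeScript(text: string, modifiers: Modifier[] = []): string {
  const using =
    modifiers.length > 0 ? ` using {${modifiers.map((m) => `${m} down`).join(", ")}}` : "";
  return `tell application "System Events" to keystroke ${appleScriptString(text)}${using}`;
}
```

The resulting script could be run with Node's `execFile("osascript", ["-e", script])` from `node:child_process`; System Events only delivers the keystrokes once the daemon (or the app hosting it) has been granted Accessibility permission.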
Cross-Platform
- nut-tree/nut.js - Node.js native automation
- RobotJS - Older but stable
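Whichever backend is chosen, the higher-level actions in the interface can be composed from a few input primitives. A minimal sketch of `drag` built that way, assuming a hypothetical `InputBackend` with `moveTo`, `buttonDown`, and `buttonUp`; these are not the actual nut.js or RobotJS method names, and an adapter would map them onto whichever library is used:

```typescript
type MouseButton = "left" | "right";

// Hypothetical backend primitives; an adapter would map these onto the
// chosen library (nut.js, RobotJS, or raw CGEvent calls).
interface InputBackend {
  moveTo(x: number, y: number): Promise<void>;
  buttonDown(button: MouseButton): Promise<void>;
  buttonUp(button: MouseButton): Promise<void>;
}

// Press at the start point, move toward the end point in `steps` increments,
// then release. Intermediate moves are sent because some applications only
// register a drag after seeing motion while the button is held.
async function drag(
  backend: InputBackend,
  fromX: number,
  fromY: number,
  toX: number,
  toY: number,
  steps = 10,
): Promise<void> {
  if (!Number.isInteger(steps) || steps < 1) {
    throw new RangeError("steps must be a positive integer");
  }
  await backend.moveTo(fromX, fromY);
  await backend.buttonDown("left");
  try {
    for (let i = 1; i <= steps; i++) {
      const t = i / steps;
      await backend.moveTo(
        Math.round(fromX + (toX - fromX) * t),
        Math.round(fromY + (toY - fromY) * t),
      );
    }
  } finally {
    // Release even if a move fails, so the button is not left held down.
    await backend.buttonUp("left");
  }
}
```

The `finally` block matters for a long-running daemon: if a move fails mid-drag, the button is still released instead of staying pressed for every other application.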
Visual Understanding
- MLX grounding model - Understand what's on screen, e.g. locate a described UI element in a screenshot
- OCR - Extract text from screenshots
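OCR output is typically a list of recognized strings with bounding boxes, which is enough to turn "click Save" into coordinates. A minimal sketch, assuming a hypothetical `OcrBox` shape (text plus a bounding box in screenshot pixels) rather than any particular OCR engine's output format:

```typescript
// Hypothetical OCR result: one entry per recognized text box,
// with its bounding box in screenshot pixel coordinates.
interface OcrBox {
  text: string;
  x: number; // left edge
  y: number; // top edge
  width: number;
  height: number;
}

interface ScreenPoint {
  x: number;
  y: number;
}

// Pick a click target for visible text: prefer an exact (case-insensitive)
// match, fall back to the first box containing the query, and return the
// center of that box. Returns null when nothing matches.
function findTextTarget(boxes: OcrBox[], query: string): ScreenPoint | null {
  const q = query.trim().toLowerCase();
  if (q === "") return null;
  const exact = boxes.find((b) => b.text.trim().toLowerCase() === q);
  const match = exact ?? boxes.find((b) => b.text.toLowerCase().includes(q));
  if (!match) return null;
  return { x: match.x + match.width / 2, y: match.y + match.height / 2 };
}
```

The returned point is in screenshot pixels; on a high-DPI display it would still need scaling to the coordinate space the input API expects before calling `click`.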
Security Considerations
- Requires accessibility permissions on macOS
- Should ask the user to confirm before sensitive actions
- Log all actions for audit trail
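The confirmation and audit points can be enforced in one place by wrapping the action methods. A minimal sketch, assuming a hypothetical `withAudit` helper, an injected `confirm` callback (for example a user prompt), and a caller-chosen set of sensitive action names; which actions count as sensitive is left open here:

```typescript
// Hypothetical audit record; a real daemon would persist these durably.
interface AuditEntry {
  time: string; // ISO 8601 timestamp of the request
  action: string; // method name, e.g. "click"
  args: unknown[];
  outcome: "ok" | "denied" | "error";
}

// `any[]` so that methods with specific parameter types are accepted.
type ActionFn = (...args: any[]) => Promise<unknown>;

// Wrap a plain object of action functions (Object.entries skips class
// prototype methods) so every call is logged, and calls named in
// `sensitive` must be approved by `confirm` before they run.
function withAudit<T extends { [K in keyof T]: ActionFn }>(
  actions: T,
  log: (entry: AuditEntry) => void,
  confirm: (action: string, args: unknown[]) => Promise<boolean>,
  sensitive: ReadonlySet<string>,
): T {
  const wrapped: Record<string, ActionFn> = {};
  for (const [name, fn] of Object.entries(actions) as [string, ActionFn][]) {
    wrapped[name] = async (...args: unknown[]) => {
      const time = new Date().toISOString();
      if (sensitive.has(name) && !(await confirm(name, args))) {
        log({ time, action: name, args, outcome: "denied" });
        throw new Error(`Action "${name}" was not confirmed`);
      }
      try {
        const result = await fn(...args);
        log({ time, action: name, args, outcome: "ok" });
        return result;
      } catch (err) {
        log({ time, action: name, args, outcome: "error" });
        throw err;
      }
    };
  }
  return wrapped as unknown as T;
}
```

Logging `type` arguments verbatim would also record anything the daemon types, including passwords, so a real audit log may need to redact them.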
Open Questions
- How to handle multi-monitor setups?
- How to deal with high-DPI/Retina displays?
- Should we support window-specific actions?
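One possible direction for the first two questions, not a settled design: keep a per-display record of logical bounds and screenshot resolution, and convert coordinates explicitly. A minimal sketch, assuming a hypothetical `DisplayInfo` shape with bounds in the OS's global logical coordinate space (points on macOS, where a Retina screenshot typically has twice as many pixels per axis as points):

```typescript
// Hypothetical per-display record. Bounds are in global logical coordinates;
// secondary displays may have negative origins. The screenshot of this
// display is assumed to be captured at its native pixel resolution.
interface DisplayInfo {
  id: string;
  originX: number;
  originY: number;
  widthPoints: number;
  heightPoints: number;
  imageWidth: number; // screenshot width in pixels
  imageHeight: number; // screenshot height in pixels
}

// Convert a pixel position in one display's screenshot (e.g. where OCR or a
// grounding model located a button) into global logical coordinates.
function screenshotToGlobal(d: DisplayInfo, px: number, py: number): { x: number; y: number } {
  const scaleX = d.imageWidth / d.widthPoints; // typically 2 on Retina
  const scaleY = d.imageHeight / d.heightPoints;
  return { x: d.originX + px / scaleX, y: d.originY + py / scaleY };
}

// Find the display containing a global point, e.g. to decide which
// screenshot to take before acting there.
function displayAt(displays: DisplayInfo[], x: number, y: number): DisplayInfo | undefined {
  return displays.find(
    (d) =>
      x >= d.originX && x < d.originX + d.widthPoints &&
      y >= d.originY && y < d.originY + d.heightPoints,
  );
}
```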