# agent-rdp

A CLI tool for AI agents to control Windows Remote Desktop sessions, built on IronRDP.
Claude Code automating SQLite database and table creation via RDP:
https://github.com/user-attachments/assets/91892b39-4edb-412b-b265-55ccd75d7421
## Features

- Connect to RDP servers - Full RDP protocol support with TLS and CredSSP authentication
- Take screenshots - Capture the remote desktop as PNG or JPEG
- Mouse control - Click, double-click, right-click, drag, scroll
- Keyboard input - Type text, press key combinations (Ctrl+C, Alt+Tab, etc.)
- Clipboard sync - Copy/paste text between the local machine and the remote Windows session
- Drive mapping - Map local directories as network drives on the remote machine
- UI Automation - Interact with Windows applications via accessibility API (click, select, toggle, expand)
- OCR text location - Find text on screen using OCR when UI Automation isn't available
- JSON output - Structured output for AI agent consumption
- Session management - Multiple named sessions with automatic daemon lifecycle
## Installation

Install globally from npm:

```bash
npm install -g agent-rdp
```

Or add agent-rdp as a skill for your AI agent:

```bash
npx add-skill https://github.com/thisnick/agent-rdp
```

Or build from source:

```bash
git clone https://github.com/thisnick/agent-rdp
cd agent-rdp
pnpm install
pnpm build      # Build native binary
pnpm build:ts   # Build TypeScript
```

## Usage

### Connect

```bash
# Using command line (password visible in process list - not recommended)
agent-rdp connect --host 192.168.1.100 --username Administrator --password 'secret'
```

### Screenshot

```bash
# Save to file
agent-rdp screenshot --output desktop.png

# Output as base64 (for AI agents)
agent-rdp screenshot --base64

# With JSON output
agent-rdp --json screenshot --base64
```
### Mouse

```bash
# Click at position
agent-rdp mouse click 500 300

# Right-click
agent-rdp mouse right-click 500 300

# Double-click
agent-rdp mouse double-click 500 300

# Move cursor
agent-rdp mouse move 100 200

# Drag from (100,100) to (500,500)
agent-rdp mouse drag 100 100 500 500
```
### Keyboard

```bash
# Type text (supports Unicode)
agent-rdp keyboard type "Hello, World!"

# Press key combinations
agent-rdp keyboard press "ctrl+c"
agent-rdp keyboard press "alt+tab"
agent-rdp keyboard press "ctrl+shift+esc"

# Press single keys (use the press command)
agent-rdp keyboard press enter
agent-rdp keyboard press escape
agent-rdp keyboard press f5
```
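Key combinations are `+`-separated strings such as `ctrl+shift+esc`. If an agent generates these strings, validating them before calling the CLI catches typos early. The sketch below is a hypothetical helper, not part of agent-rdp: the modifier set (`ctrl`, `alt`, `shift`, `win`) and the lowercasing are assumptions based on the examples above, not a documented key list.

```typescript
// Hypothetical helper (not part of agent-rdp): validate a key combination
// such as "ctrl+shift+esc" before passing it to `agent-rdp keyboard press`.
// The modifier names below are an ASSUMPTION based on the README examples.
const KNOWN_MODIFIERS = new Set(["ctrl", "alt", "shift", "win"]);

interface KeyCombo {
  modifiers: string[];
  key: string;
}

function parseKeyCombo(combo: string): KeyCombo {
  // Lowercase to match the style of the README examples ("ctrl+c").
  const parts = combo
    .toLowerCase()
    .split("+")
    .map((p) => p.trim());
  if (parts.some((p) => p.length === 0)) {
    throw new Error(`Empty key in combination: "${combo}"`);
  }
  // The last part is the key; everything before it must be a modifier.
  const key = parts.pop();
  if (key === undefined) {
    throw new Error("Empty key combination");
  }
  for (const m of parts) {
    if (!KNOWN_MODIFIERS.has(m)) {
      throw new Error(`Unknown modifier "${m}" in "${combo}"`);
    }
  }
  return { modifiers: parts, key };
}

// Normalised argument list for `agent-rdp keyboard press <combo>`.
function pressArgs(combo: string): string[] {
  const { modifiers, key } = parseKeyCombo(combo);
  return ["keyboard", "press", [...modifiers, key].join("+")];
}

console.log(pressArgs("Ctrl+Shift+Esc").join(" ")); // keyboard press ctrl+shift+esc
```

Single keys such as `enter` parse as a key with no modifiers, matching the "use the press command" note above.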
### Scroll

```bash
agent-rdp scroll up --amount 3
agent-rdp scroll down --amount 5
agent-rdp scroll left
agent-rdp scroll right
```
### Locate Text (OCR)

Find text on screen using OCR (powered by ocrs). This is useful when UI Automation can't access certain elements (WebView content, some dialogs).

```bash
# Find lines containing text
agent-rdp locate "Cancel"

# Pattern matching (glob-style)
agent-rdp locate "Save*" --pattern

# Get all text on screen
agent-rdp locate --all

# JSON output
agent-rdp locate "OK" --json
```

Returns text lines with coordinates for clicking:

```
Found 1 line(s) containing 'Cancel':
  'Cancel Button' at (650, 420) size 80x14 - center: (690, 427)

To click the first match: agent-rdp mouse click 690 427
```
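Each result line gives the top-left corner, the size, and the center of the matched text box; in the example the center is the origin plus half the size (650 + 80/2 = 690, 420 + 14/2 = 427). The sketch below is a hypothetical parser for that human-readable line format, assuming it stays exactly as shown; agents should prefer `--json` output where possible. The integer rounding in `boxCenter` for odd sizes is also an assumption.

```typescript
// Hypothetical parser for the human-readable `agent-rdp locate` output shown
// above. Prefer `--json` in real agents; this text format is only an example.
interface LocatedLine {
  text: string;
  x: number; // left edge
  y: number; // top edge
  width: number;
  height: number;
  centerX: number;
  centerY: number;
}

const LOCATE_LINE =
  /^\s*'(.*)' at \((\d+), (\d+)\) size (\d+)x(\d+) - center: \((\d+), (\d+)\)\s*$/;

function parseLocateOutput(output: string): LocatedLine[] {
  const results: LocatedLine[] = [];
  for (const line of output.split("\n")) {
    const m = LOCATE_LINE.exec(line);
    if (!m) continue; // skip headers such as "Found 1 line(s) ..."
    const [x, y, width, height, centerX, centerY] = m.slice(2).map(Number);
    results.push({ text: m[1], x, y, width, height, centerX, centerY });
  }
  return results;
}

// Center = origin + half the size (integer division assumed for odd sizes).
function boxCenter(l: LocatedLine): [number, number] {
  return [l.x + Math.floor(l.width / 2), l.y + Math.floor(l.height / 2)];
}

function clickCommand(l: LocatedLine): string {
  return `agent-rdp mouse click ${l.centerX} ${l.centerY}`;
}

const sample = [
  "Found 1 line(s) containing 'Cancel':",
  "  'Cancel Button' at (650, 420) size 80x14 - center: (690, 427)",
].join("\n");

const [first] = parseLocateOutput(sample);
console.log(clickCommand(first)); // agent-rdp mouse click 690 427
```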
### Clipboard

```bash
# Set clipboard text (available when you paste on Windows)
agent-rdp clipboard set "Hello from CLI"

# Get clipboard text (after copying on Windows)
agent-rdp clipboard get

# With JSON output
agent-rdp --json clipboard get
```
### Drive Mapping

Map local directories as network drives on the remote Windows machine. Drives must be mapped at connect time. Multiple drives can be specified by repeating `--drive`.

```bash
# Map local directories during connection
agent-rdp connect --host 192.168.1.100 -u Administrator -p secret \
  --drive /home/user/documents:Documents \
  --drive /tmp/shared:Shared

# List mapped drives
agent-rdp drive list
```

On the remote Windows machine, mapped drives appear in File Explorer as network locations.
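Each `--drive` value has the form `<local-path>:<name>`. A wrapper that starts agent-rdp from code can build these flags from the same `{ path, name }` shape the Node.js API's `drives` option uses. The helpers below are hypothetical; in particular, splitting on the last colon so Windows-style local paths like `C:\data` survive is an assumption, since how agent-rdp itself handles such paths is not documented here.

```typescript
// Hypothetical helper (not part of agent-rdp): build repeated `--drive`
// flags from { path, name } pairs.
interface DriveSpec {
  path: string; // local directory to share
  name: string; // name shown on the remote Windows machine
}

function driveArgs(drives: DriveSpec[]): string[] {
  const args: string[] = [];
  for (const d of drives) {
    // A colon in the name would make "<path>:<name>" ambiguous.
    if (d.name.length === 0 || d.name.indexOf(":") !== -1) {
      throw new Error(`Invalid drive name: "${d.name}"`);
    }
    args.push("--drive", `${d.path}:${d.name}`);
  }
  return args;
}

// Reverse direction: split on the LAST colon so a local path that itself
// contains a colon (e.g. C:\data) stays intact. This is an assumption.
function parseDriveArg(arg: string): DriveSpec {
  const i = arg.lastIndexOf(":");
  if (i <= 0 || i === arg.length - 1) {
    throw new Error(`Expected <path>:<name>, got "${arg}"`);
  }
  return { path: arg.slice(0, i), name: arg.slice(i + 1) };
}

console.log(
  driveArgs([
    { path: "/home/user/documents", name: "Documents" },
    { path: "/tmp/shared", name: "Shared" },
  ]).join(" "),
); // --drive /home/user/documents:Documents --drive /tmp/shared:Shared
```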
### UI Automation

Interact with Windows applications programmatically through the Windows UI Automation API, using native patterns (InvokePattern, SelectionItemPattern, TogglePattern, etc.). When enabled, a PowerShell agent is injected into the remote session; it captures the accessibility tree and performs actions. The CLI and the agent communicate over a Dynamic Virtual Channel (DVC) for fast bidirectional IPC.

For detailed documentation, see AUTOMATION.md.

```bash
# Connect with automation enabled
agent-rdp connect --host 192.168.1.100 -u Admin -p secret --enable-win-automation

# Take an accessibility tree snapshot (refs are always included)
agent-rdp automate snapshot

# Snapshot filtering options (like agent-browser)
agent-rdp automate snapshot -i                # Interactive elements only
agent-rdp automate snapshot -c                # Compact (remove empty structural elements)
agent-rdp automate snapshot -d 3              # Limit depth to 3 levels
agent-rdp automate snapshot -s "~Notepad"     # Scope to a window/element
agent-rdp automate snapshot -i -c -d 5        # Combine options

# Pattern-based element operations (refs use @eN format)
agent-rdp automate click "#SaveButton"        # Click button
agent-rdp automate click "@e5"                # Click by ref number from snapshot
agent-rdp automate click "@e5" -d             # Double-click (for file list items)
agent-rdp automate select "@e10"              # Select item (SelectionItemPattern)
agent-rdp automate toggle "@e7"               # Toggle checkbox (TogglePattern)
agent-rdp automate expand "@e3"               # Expand menu (ExpandCollapsePattern)
agent-rdp automate context-menu "@e5"         # Open context menu (Shift+F10)

# Fill text fields
agent-rdp automate fill ".Edit" "Hello World"

# Window operations
agent-rdp automate window list
agent-rdp automate window focus "~Notepad"

# Run PowerShell commands
agent-rdp automate run "Get-Process" --wait
agent-rdp automate run "Get-Process" --wait --process-timeout 5000   # With 5s timeout
```

Selector Types:

- `@e5` or `@5` - Reference number from snapshot (`e` prefix recommended)
- `#SaveButton` - Automation ID
- `.Edit` - Win32 class name
- `~pattern` - Wildcard name match
- `File` - Element name (exact match)
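The selector prefixes above are easy to classify up front, which lets an agent reject a malformed selector before a round trip to the remote session. The sketch below is a hypothetical classifier, not agent-rdp's implementation. The `~pattern` semantics are not spelled out here, so `wildcardMatches` assumes a case-insensitive substring match with `*` and `?` as glob wildcards (so `~Notepad` would match a window titled "Untitled - Notepad").

```typescript
// Hypothetical classifier (not agent-rdp's implementation) mirroring the
// selector prefixes listed above.
type Selector =
  | { kind: "ref"; ref: string } // @e5 or @5, normalised to "e5"
  | { kind: "automationId"; id: string } // #SaveButton
  | { kind: "className"; className: string } // .Edit
  | { kind: "wildcard"; pattern: string } // ~Notepad
  | { kind: "name"; name: string }; // File (exact match)

function parseSelector(sel: string): Selector {
  if (sel.charAt(0) === "@") {
    const m = /^@e?(\d+)$/.exec(sel);
    if (!m) throw new Error(`Invalid ref selector: ${sel}`);
    return { kind: "ref", ref: `e${m[1]}` };
  }
  const rest = sel.slice(1);
  switch (sel.charAt(0)) {
    case "#":
      return { kind: "automationId", id: rest };
    case ".":
      return { kind: "className", className: rest };
    case "~":
      return { kind: "wildcard", pattern: rest };
    default:
      return { kind: "name", name: sel };
  }
}

// ASSUMPTION: `~pattern` matches case-insensitively anywhere in the element
// name, with `*` and `?` as glob wildcards. agent-rdp's rules may differ.
function wildcardMatches(pattern: string, elementName: string): boolean {
  // Escape regex metacharacters except the glob characters * and ?.
  const escaped = pattern.replace(/[.+^${}()|[\]\\]/g, "\\$&");
  const source = escaped.replace(/\*/g, ".*").replace(/\?/g, ".");
  return new RegExp(source, "i").test(elementName);
}

console.log(parseSelector("@5").kind, wildcardMatches("Notepad", "Untitled - Notepad")); // ref true
```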
Snapshot Output Format:

```
- Window "Notepad" [ref=e1, id=Notepad]
  - MenuBar "Application" [ref=e2]
    - MenuItem "File" [ref=e3]
  - Edit "Text Editor" [ref=e5, value="Hello"]
```
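The snapshot is an indented tree whose `ref` values feed the `@eN` selectors. If you post-process snapshots yourself (for example to look up an element's current `value` before acting on it), a small parser is enough. This is a hypothetical sketch, not an agent-rdp API: it assumes nesting is expressed purely by indentation and that attributes are `key=value` pairs with string values optionally in double quotes, as in the example above.

```typescript
// Hypothetical parser for the snapshot text format shown above (not an
// agent-rdp API). Assumes indentation encodes nesting.
interface SnapshotNode {
  role: string;
  name: string;
  attrs: Record<string, string>;
  children: SnapshotNode[];
}

const NODE_LINE = /^(\s*)- (\S+) "([^"]*)"(?: \[(.*)\])?\s*$/;

function parseAttrs(s: string): Record<string, string> {
  const attrs: Record<string, string> = {};
  // key="quoted value" or key=bare-value, separated by commas.
  const re = /(\w+)=(?:"([^"]*)"|([^,]*))/g;
  let m: RegExpExecArray | null;
  while ((m = re.exec(s)) !== null) {
    attrs[m[1]] = m[2] !== undefined ? m[2] : m[3];
  }
  return attrs;
}

function parseSnapshot(text: string): SnapshotNode[] {
  const roots: SnapshotNode[] = [];
  // Open ancestors together with their indentation width.
  const stack: { indent: number; node: SnapshotNode }[] = [];
  for (const line of text.split("\n")) {
    const m = NODE_LINE.exec(line);
    if (!m) continue; // skip blank or unrecognised lines
    const indent = m[1].length;
    const node: SnapshotNode = {
      role: m[2],
      name: m[3],
      attrs: parseAttrs(m[4] !== undefined ? m[4] : ""),
      children: [],
    };
    // Pop siblings and deeper nodes; what remains on top is the parent.
    while (stack.length > 0 && stack[stack.length - 1].indent >= indent) {
      stack.pop();
    }
    if (stack.length === 0) {
      roots.push(node);
    } else {
      stack[stack.length - 1].node.children.push(node);
    }
    stack.push({ indent, node });
  }
  return roots;
}

// Depth-first search for the element an `@eN` ref points at.
function findByRef(nodes: SnapshotNode[], ref: string): SnapshotNode | undefined {
  for (const n of nodes) {
    if (n.attrs["ref"] === ref) return n;
    const hit = findByRef(n.children, ref);
    if (hit) return hit;
  }
  return undefined;
}
```

For example, `findByRef(parseSnapshot(text), "e5")` on the snapshot above returns the `Edit` node with `attrs.value` equal to `"Hello"`.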
### Session Management

```bash
# List active sessions
agent-rdp session list

# Get current session info
agent-rdp session info

# Close a session
agent-rdp session close

# Use a named session
agent-rdp --session work connect --host work-pc.local ...
agent-rdp --session work screenshot
```
### Disconnect

```bash
agent-rdp disconnect
```
### Web Viewer

Open the web-based viewer to see the remote desktop in your browser:

```bash
# Open viewer (connects to default streaming port 9224)
agent-rdp view

# Specify the port explicitly
agent-rdp view --port 9224
```

The viewer requires WebSocket streaming to be enabled. Start a session with streaming:

```bash
agent-rdp --stream-port 9224 connect --host 192.168.1.100 -u Admin -p secret
agent-rdp view
```
## JSON Output

All commands support `--json` for structured output:

```bash
agent-rdp --json screenshot --base64
```

Success response:

```json
{
  "success": true,
  "data": {
    "type": "screenshot",
    "width": 1920,
    "height": 1080,
    "format": "png",
    "base64": "iVBORw0KGgo..."
  }
}
```

Error response:

```json
{
  "success": false,
  "error": {
    "code": "not_connected",
    "message": "Not connected to an RDP server"
  }
}
```
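Because every response carries a boolean `success` field, the envelope maps naturally onto a discriminated union in TypeScript. The types below are a sketch that models only the fields visible in the two examples; `RdpResponse`, `ScreenshotData`, and `unwrap` are hypothetical names, and other commands will return other `data` shapes.

```typescript
// Type sketch of the `--json` envelope shown above. Only fields visible in
// the examples are modelled; names here are hypothetical.
interface RdpSuccess<T> {
  success: true;
  data: T;
}

interface RdpFailure {
  success: false;
  error: { code: string; message: string };
}

type RdpResponse<T> = RdpSuccess<T> | RdpFailure;

interface ScreenshotData {
  type: "screenshot";
  width: number;
  height: number;
  format: string; // "png" in the example above
  base64: string;
}

// Return `data` on success; otherwise throw with the error code and message.
function unwrap<T>(stdout: string): T {
  const res = JSON.parse(stdout) as RdpResponse<T>;
  if (res.success) {
    return res.data; // narrowed to RdpSuccess<T>
  }
  throw new Error(`${res.error.code}: ${res.error.message}`);
}

const okJson =
  '{"success":true,"data":{"type":"screenshot","width":1920,"height":1080,"format":"png","base64":"iVBORw0KGgo..."}}';
const shot = unwrap<ScreenshotData>(okJson);
console.log(`${shot.width}x${shot.height} ${shot.format}`); // 1920x1080 png
```

In a Node.js script, `stdout` would typically come from running `agent-rdp --json screenshot --base64` via `child_process.execFile`.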
## Environment Variables

| Variable | Description |
|----------|-------------|
| `AGENT_RDP_HOST` | RDP server hostname or IP |
| `AGENT_RDP_PORT` | RDP server port (default: 3389) |
| `AGENT_RDP_USERNAME` | RDP username |
| `AGENT_RDP_PASSWORD` | RDP password |
| `AGENT_RDP_SESSION` | Session name (default: "default") |
| `AGENT_RDP_STREAM_PORT` | WebSocket streaming port (0 = disabled) |
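A wrapper that launches agent-rdp may want to read the same variables, for example to decide whether a stream URL exists. The sketch below is hypothetical and is not agent-rdp's own configuration code. Port 3389 and session `"default"` are the documented defaults; treating an unset `AGENT_RDP_STREAM_PORT` as `0` (streaming disabled) is an assumption.

```typescript
// Sketch (not agent-rdp's implementation): read the variables in the table
// above. Treating an unset AGENT_RDP_STREAM_PORT as 0 is an ASSUMPTION.
interface RdpEnvConfig {
  host?: string;
  port: number;
  username?: string;
  password?: string;
  session: string;
  streamPort: number; // 0 = streaming disabled
}

type Env = Record<string, string | undefined>;

function parsePort(value: string | undefined, fallback: number): number {
  if (value === undefined || value === "") return fallback;
  const n = Number(value);
  if (!Number.isInteger(n) || n < 0 || n > 65535) {
    throw new Error(`Invalid port: "${value}"`);
  }
  return n;
}

function readEnvConfig(env: Env): RdpEnvConfig {
  return {
    host: env.AGENT_RDP_HOST,
    port: parsePort(env.AGENT_RDP_PORT, 3389),
    username: env.AGENT_RDP_USERNAME,
    password: env.AGENT_RDP_PASSWORD,
    session: env.AGENT_RDP_SESSION || "default",
    streamPort: parsePort(env.AGENT_RDP_STREAM_PORT, 0),
  };
}

// URL in the "ws://localhost:<port>" form returned by getStreamUrl() in the
// WebSocket streaming example, or null when streaming is disabled.
function streamUrlFor(cfg: RdpEnvConfig): string | null {
  return cfg.streamPort === 0 ? null : `ws://localhost:${cfg.streamPort}`;
}

const cfg = readEnvConfig({ AGENT_RDP_HOST: "192.168.1.100", AGENT_RDP_STREAM_PORT: "9224" });
console.log(cfg.port, cfg.session, streamUrlFor(cfg)); // 3389 default ws://localhost:9224
```

In Node.js you would pass `process.env` to `readEnvConfig`.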
## Node.js API

Use agent-rdp programmatically from Node.js/TypeScript:

```typescript
import { RdpSession } from 'agent-rdp';

const rdp = new RdpSession({ session: 'default' });

await rdp.connect({
  host: '192.168.1.100',
  username: 'Administrator',
  password: 'secret',
  width: 1280,
  height: 800,
  drives: [{ path: '/tmp/share', name: 'Share' }],
  enableWinAutomation: true, // Enable UI Automation
});

// Screenshot
const { base64, width, height } = await rdp.screenshot({ format: 'png' });

// Mouse
await rdp.mouse.click({ x: 100, y: 200 });
await rdp.mouse.rightClick({ x: 100, y: 200 });
await rdp.mouse.doubleClick({ x: 100, y: 200 });
await rdp.mouse.move({ x: 150, y: 250 });
await rdp.mouse.drag({ from: { x: 100, y: 100 }, to: { x: 500, y: 500 } });

// Keyboard
await rdp.keyboard.type({ text: 'Hello World' });
await rdp.keyboard.press({ keys: 'ctrl+c' });
await rdp.keyboard.press({ keys: 'enter' }); // Single keys use press()

// Scroll
await rdp.scroll.up(); // Default amount: 3
await rdp.scroll.down({ amount: 5 }); // Custom amount
await rdp.scroll.up({ x: 500, y: 300 }); // Scroll at position

// Clipboard
await rdp.clipboard.set({ text: 'text to copy' });
const text = await rdp.clipboard.get();

// Locate text using OCR
const matches = await rdp.locate({ text: 'Cancel' });
if (matches.length > 0) {
  await rdp.mouse.click({ x: matches[0].center_x, y: matches[0].center_y });
}

// Get all text on screen
const allText = await rdp.locate({ all: true });

// Automation (requires enableWinAutomation: true at connect)
const snapshot = await rdp.automation.snapshot({ interactive: true });
await rdp.automation.click('@e5'); // Click button by ref
await rdp.automation.click('@e5', { doubleClick: true }); // Double-click
await rdp.automation.select('@e10'); // Select item
await rdp.automation.toggle('@e7'); // Toggle checkbox
await rdp.automation.expand('@e3'); // Expand menu
await rdp.automation.contextMenu('@e5'); // Open context menu
await rdp.automation.fill('#input', 'text'); // Fill text field
await rdp.automation.run('notepad.exe'); // Run command
await rdp.automation.waitFor('#SaveButton', { timeout: 5000 });

// Window management
const windows = await rdp.automation.listWindows();
await rdp.automation.focusWindow('~Notepad');
await rdp.automation.maximizeWindow();

// Drives
const drives = await rdp.drives.list();

// Session info
const info = await rdp.getInfo();

// Disconnect
await rdp.disconnect();
```
### WebSocket Streaming

Enable WebSocket streaming for real-time screen capture and bidirectional clipboard support:

```typescript
import { RdpSession } from 'agent-rdp';

const rdp = new RdpSession({
  session: 'viewer',
  streamPort: 9224, // Enable streaming
});

await rdp.connect({...});

// Connect your WebSocket client to receive JPEG frames
const streamUrl = rdp.getStreamUrl(); // "ws://localhost:9224"
```

For the complete WebSocket protocol specification (message types, clipboard flow, input handling), see WEBSOCKET.md.
## Architecture

agent-rdp uses a daemon-per-session architecture:

1. CLI (`agent-rdp`) - Parses commands and communicates with the daemon
2. Daemon - Maintains the RDP connection and processes commands
3. IPC - Unix sockets (macOS/Linux) or TCP (Windows)

The daemon starts automatically on the first command and persists until it is explicitly closed or the session times out.
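The key property of a daemon-per-session design is that every session name maps to its own daemon endpoint, so `--session work` and the default session never share a connection. The sketch below illustrates that mapping using the transports listed above (Unix socket on macOS/Linux, loopback TCP on Windows). It is purely illustrative: the socket directory, the `<session>.sock` naming, the hash, and the 40000-49999 port range are hypothetical and are not agent-rdp's actual values.

```typescript
// Illustrative sketch only: per-session IPC endpoints for a daemon-per-session
// design. Directory, file naming and port range are HYPOTHETICAL.
type IpcEndpoint =
  | { kind: "unix"; path: string }
  | { kind: "tcp"; host: string; port: number };

// Small deterministic string hash (multiplier 31, kept within 32 bits).
function hashName(s: string): number {
  let h = 0;
  for (let i = 0; i < s.length; i++) {
    h = (h * 31 + s.charCodeAt(i)) >>> 0;
  }
  return h;
}

// `platform` takes Node's process.platform values ("win32", "darwin", "linux").
function ipcEndpoint(session: string, platform: string, socketDir: string): IpcEndpoint {
  // Restrict names so a session cannot escape the socket directory.
  if (!/^[A-Za-z0-9_-]+$/.test(session)) {
    throw new Error(`Unsupported session name: "${session}"`);
  }
  if (platform === "win32") {
    // Hypothetical: a stable loopback port derived from the session name.
    return { kind: "tcp", host: "127.0.0.1", port: 40000 + (hashName(session) % 10000) };
  }
  return { kind: "unix", path: `${socketDir}/${session}.sock` };
}

const ep = ipcEndpoint("work", "linux", "/tmp/agent-rdp");
if (ep.kind === "unix") {
  console.log(ep.path); // /tmp/agent-rdp/work.sock
}
```

A hash-derived port can collide between two session names; a real implementation would need to handle that, for example by recording the chosen port alongside the session.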
## Limitations

### UI Automation

- WebViews: UI Automation cannot interact with WebView content (e.g., Windows Start menu search, Edge browser content, Electron apps). Use `Win+R` or `automate run` to launch programs directly instead of clicking through menus.
- UAC Dialogs: User Account Control elevation prompts run on a secure desktop and are not accessible via UI Automation. There is no good workaround: the remote user must interact with UAC manually, or UAC must be disabled (not recommended for security reasons).
### OCR Fallback

When UI Automation cannot access certain elements, the `locate` command provides OCR-based text detection:

```bash
agent-rdp locate "Button Text"    # Find text and get coordinates
agent-rdp mouse click <x> <y>     # Click at returned coordinates
```

This is not highly reliable (OCR can misread characters, miss text, or return imprecise coordinates), but it may work for simple cases like dialog buttons.
Claude models (in non-computer-use mode, such as Claude Code) are poor at estimating pixel coordinates from screenshots. Do not ask Claude to look at a screenshot and guess where to click - it will likely be inaccurate.
Gemini models are generally good at pixel coordinate estimation from images.
If you need vision-based coordinate detection with Claude, implement your own harness using Claude's Computer Use Tool which is specifically designed for this purpose.
## Requirements

- Rust 1.75 or later
- Target RDP server with Network Level Authentication (NLA) enabled
## License

MIT OR Apache-2.0 (same as IronRDP)