Midscene.js Computer Desktop Automation

AI-powered desktop automation for Windows, macOS, and Linux.
- Desktop Automation: Control mouse, keyboard, and screen
- Screenshot Capture: Take screenshots of any display
- Mouse Operations: Click, double-click, right-click, hover, drag & drop
- Keyboard Input: Type text, press keys, shortcuts
- Scroll Operations: Scroll in any direction
- Multi-Display Support: Work with multiple monitors
- AI-Powered: Use natural language to control your desktop
- MCP Server: Expose capabilities via Model Context Protocol
```bash
npm install @midscene/computer
# or
pnpm add @midscene/computer
```
This package uses native modules for desktop control:
- screenshot-desktop: For capturing screenshots
- @computer-use/libnut: For mouse and keyboard control
These modules require compilation on installation. Make sure you have the necessary build tools:
macOS: Install Xcode Command Line Tools
```bash
xcode-select --install
```
Linux: Install build essentials and ImageMagick
```bash
# Ubuntu/Debian
sudo apt-get install build-essential libx11-dev libxtst-dev libpng-dev imagemagick
```
Note: ImageMagick is required for screenshot capture on Linux.
Windows: Install Windows Build Tools
```bash
npm install --global windows-build-tools
```

Quick Start

Basic usage

```typescript
import { agentFromComputer } from '@midscene/computer';

// Create an agent
const agent = await agentFromComputer({
  aiActionContext: 'You are controlling a desktop computer.',
});

// Use AI to perform actions
await agent.aiAct('move mouse to center of screen');
await agent.aiAct('click on the desktop');
await agent.aiAct('type "Hello World"');

// Query information
const screenInfo = await agent.aiQuery(
  '{width: number, height: number}, get screen resolution',
);

// Assert conditions
await agent.aiAssert('There is a desktop visible');
```
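
The shape requested in an `aiQuery` prompt is only described in natural language, so it can help to validate the result before relying on it. The following is a minimal sketch, not part of `@midscene/computer`: `ScreenInfo` and `isScreenInfo` are hypothetical names, and it assumes the query resolves to a plain object.

```typescript
// Hypothetical shape matching the '{width: number, height: number}' prompt.
interface ScreenInfo {
  width: number;
  height: number;
}

// Type guard: true only for objects with positive, finite width and height.
function isScreenInfo(value: unknown): value is ScreenInfo {
  if (typeof value !== 'object' || value === null) return false;
  const { width, height } = value as Record<string, unknown>;
  return (
    typeof width === 'number' &&
    Number.isFinite(width) &&
    width > 0 &&
    typeof height === 'number' &&
    Number.isFinite(height) &&
    height > 0
  );
}
```

A caller could check `isScreenInfo(screenInfo)` right after the `aiQuery` call and throw or retry when it returns `false`.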

Multi-display support

```typescript
import { ComputerDevice, agentFromComputer } from '@midscene/computer';

// List all displays
const displays = await ComputerDevice.listDisplays();
console.log('Available displays:', displays);

// Connect to a specific display
const agent = await agentFromComputer({
  displayId: displays[0].id,
});
```
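
When several monitors are attached, the display to control usually has to be chosen from the list. A minimal sketch of such a selection step follows. It assumes each entry returned by `listDisplays()` has an `id` field, as used above; the `name` field, the `DisplayLike` type, and the `pickDisplay` helper are hypothetical and not part of the package.

```typescript
// Assumed minimal shape of a display entry; only `id` is taken from the example above.
interface DisplayLike {
  id: string | number;
  name?: string; // hypothetical field, for illustration only
}

// Return the display whose id matches `preferredId`, else fall back to the first one.
function pickDisplay<T extends DisplayLike>(
  displays: readonly T[],
  preferredId?: string | number,
): T {
  const first = displays[0];
  if (first === undefined) {
    throw new Error('No displays available');
  }
  if (preferredId === undefined) {
    return first;
  }
  // Compare as strings so that 2 and '2' refer to the same display.
  const match = displays.find((d) => String(d.id) === String(preferredId));
  return match ?? first;
}
```

The result's `id` could then be passed as `displayId` to `agentFromComputer`. Falling back to the first entry is a design choice of this sketch; throwing on an unknown id may be preferable in stricter scripts.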

Environment check

```typescript
import { checkComputerEnvironment } from '@midscene/computer';

const env = await checkComputerEnvironment();
console.log('Platform:', env.platform);
console.log('Available:', env.available);
console.log('Displays:', env.displays);
```

Available Actions
The ComputerDevice supports the following actions:
- Tap: Single click at element center
- DoubleClick: Double click at element center
- RightClick: Right click at element center
- Hover: Move mouse to element center
- Input: Type text with different modes (replace/clear/append)
- Scroll: Scroll in any direction (up/down/left/right)
- KeyboardPress: Press keyboard keys with modifiers
- DragAndDrop: Drag from one element to another
- ClearInput: Clear input field content
- ListDisplays: Get all available displays
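
For scripts or tests that refer to these actions by name, the list can be captured as constants. The sketch below only mirrors the names and options in this section; `ACTION_NAMES`, `INPUT_MODES`, `SCROLL_DIRECTIONS`, and `isOneOf` are hypothetical helpers, not exports of `@midscene/computer`.

```typescript
// Action names copied from the list above.
const ACTION_NAMES = [
  'Tap',
  'DoubleClick',
  'RightClick',
  'Hover',
  'Input',
  'Scroll',
  'KeyboardPress',
  'DragAndDrop',
  'ClearInput',
  'ListDisplays',
] as const;

// Options named in the Input and Scroll entries.
const INPUT_MODES = ['replace', 'clear', 'append'] as const;
const SCROLL_DIRECTIONS = ['up', 'down', 'left', 'right'] as const;

// Generic guard: narrows `value` to one of the allowed string literals.
function isOneOf<T extends string>(
  allowed: readonly T[],
  value: string,
): value is T {
  return (allowed as readonly string[]).indexOf(value) !== -1;
}
```

For example, `isOneOf(SCROLL_DIRECTIONS, input)` can guard user-supplied input before it is built into a prompt.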
Platform-Specific Shortcuts

macOS

- Modifier key: Cmd (Command)
- Open search: Cmd+Space
- Select all: Cmd+A
- Copy: Cmd+C
- Paste: Cmd+V

Windows/Linux

- Modifier key: Ctrl (Control)
- Open search: Windows key or Super key
- Select all: Ctrl+A
- Copy: Ctrl+C
- Paste: Ctrl+V

Testing

Unit tests

```bash
pnpm test
```

AI tests

```bash
# Set AI_TEST_TYPE environment variable
AI_TEST_TYPE=computer pnpm test:ai
```

Available AI tests:
- basic.test.ts: Basic desktop interactions
- multi-display.test.ts: Multi-display support
- web-browser.test.ts: Browser automation
- text-editor.test.ts: Text editor operations

MCP Server
Start the MCP server for AI assistant integration:
```typescript
import { mcpServerForAgent } from '@midscene/computer/mcp-server';
import { agentFromComputer } from '@midscene/computer';

const agent = await agentFromComputer();
const { server } = mcpServerForAgent(agent);
await server.launch();
```

Available MCP tools:
- computer_connect: Connect to desktop display
- computer_list_displays: List all available displays
- Plus all standard Midscene tools (aiAct, aiQuery, aiAssert, etc.)

Architecture
This package follows the same architecture pattern as @midscene/android and @midscene/ios:

```
packages/computer/
├── src/
│   ├── device.ts        # ComputerDevice - core device implementation
│   ├── agent.ts         # ComputerAgent - agent wrapper
│   ├── utils.ts         # Utility functions
│   ├── mcp-server.ts    # MCP server
│   └── mcp-tools.ts     # MCP tools definitions
├── tests/
│   ├── unit-test/       # Unit tests (no native dependencies)
│   └── ai/              # AI-powered integration tests
└── README.md
```

API Reference

ComputerDevice

```typescript
class ComputerDevice implements AbstractInterface {
  constructor(options?: ComputerDeviceOpt);

  static listDisplays(): Promise;
  async connect(): Promise;
  async screenshotBase64(): Promise;
  async size(): Promise;
  actionSpace(): DeviceAction[];
  async destroy(): Promise;
}
```

ComputerAgent

```typescript
class ComputerAgent extends PageAgent {
  // Inherits all PageAgent methods
  async aiAct(action: string): Promise;
  async aiQuery(query: string): Promise;
  async aiAssert(assertion: string): Promise;
  async aiWaitFor(condition: string): Promise;
}
```
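
Because the agent methods take plain-language strings, a short scripted flow can be expressed as data. This is a minimal sketch that assumes only the `aiAct` and `aiAssert` methods listed above; `AgentLike`, `Step`, and `runSteps` are hypothetical names, and a stub agent stands in for a real `ComputerAgent` so the snippet runs without a desktop session or model.

```typescript
// Minimal structural view of an agent; return values are ignored in this sketch.
interface AgentLike {
  aiAct(action: string): Promise<unknown>;
  aiAssert(assertion: string): Promise<unknown>;
}

type Step =
  | { kind: 'act'; prompt: string }
  | { kind: 'assert'; prompt: string };

// Run steps in order. A rejected step propagates and stops the run;
// on success the promise resolves to the number of steps executed.
async function runSteps(
  agent: AgentLike,
  steps: readonly Step[],
): Promise<number> {
  let completed = 0;
  for (const step of steps) {
    if (step.kind === 'act') {
      await agent.aiAct(step.prompt);
    } else {
      await agent.aiAssert(step.prompt);
    }
    completed += 1;
  }
  return completed;
}

// Stub agent that records calls instead of driving a real desktop.
const calls: string[] = [];
const stubAgent: AgentLike = {
  aiAct: async (action) => {
    calls.push(`act:${action}`);
  },
  aiAssert: async (assertion) => {
    calls.push(`assert:${assertion}`);
  },
};
```

Any object with matching `aiAct` and `aiAssert` methods can be passed in place of `stubAgent`, which keeps the flow definition separate from how the agent is created.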

Utility functions

```typescript
async function agentFromComputer(
  opts?: ComputerAgentOpt
): Promise;

async function checkComputerEnvironment(): Promise;
async function getConnectedDisplays(): Promise;
```

License

MIT

See the main Midscene.js repository for contributing guidelines.