GUI agent for web applications - add intelligent automation to any webpage with a single script
npm install @page-agent/core

    
The GUI Agent Living in Your Webpage. Control web interfaces with natural language.
English | 中文
Demo | Documentation
---
- Easy Integration
  - No Python. No headless browser. No browser extension. Just in-page scripts.
- Client-Side Processing
- DOM Extraction
- Natural Language Interface
- UI with human in the loop

And:

- Cross-page control with an experimental Chrome extension (`packages/extension`)
Roadmap
Fastest way to try PageAgent with our free Demo LLM:
```html
<script
    src="https://cdn.jsdelivr.net/npm/page-agent@1.1.1/dist/iife/page-agent.demo.js"
    crossorigin="true"
></script>
```
> - ⚠️ For technical evaluation only. The Demo LLM has rate limits and usage restrictions, and may change without notice.
> - Bring your own LLM API.
| Mirrors | URL |
| ------- | ---------------------------------------------------------------------------------- |
| Global | https://cdn.jsdelivr.net/npm/page-agent@1.1.1/dist/iife/page-agent.demo.js |
| China | https://registry.npmmirror.com/page-agent/1.1.1/files/dist/iife/page-agent.demo.js |
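
If you prefer to inject the demo script from code rather than paste the tag by hand, the mirror table above can be wrapped in a small loader. This is only a sketch: `demoScriptUrl` and `loadDemoScript` are hypothetical helper names, not part of PageAgent, and the assumption that other versions follow the same path layout as `1.1.1` is untested.

```javascript
// URL templates copied from the mirror table; `{version}` is substituted below.
const MIRRORS = new Map([
    ['global', 'https://cdn.jsdelivr.net/npm/page-agent@{version}/dist/iife/page-agent.demo.js'],
    ['china', 'https://registry.npmmirror.com/page-agent/{version}/files/dist/iife/page-agent.demo.js'],
])

// Hypothetical helper: build the demo script URL for a given mirror.
// Assumption: versions other than 1.1.1 use the same path layout.
function demoScriptUrl(mirror = 'global', version = '1.1.1') {
    const template = MIRRORS.get(mirror)
    if (!template) throw new Error(`Unknown mirror: ${mirror}`)
    return template.replace('{version}', version)
}

// Hypothetical helper: append the demo script to the current page.
// Browser-only, since it relies on `document`.
function loadDemoScript(mirror = 'global') {
    const script = document.createElement('script')
    script.src = demoScriptUrl(mirror)
    script.setAttribute('crossorigin', 'true') // same attribute as the tag above
    document.head.appendChild(script)
    return script
}
```

Calling `loadDemoScript('china')` in the browser console would load the script from the China mirror instead of the global one.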
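```bash
npm install page-agent
```

```javascript
import { PageAgent } from 'page-agent'

// Configure the agent with your own LLM endpoint and credentials
const agent = new PageAgent({
    model: 'deepseek-chat',
    baseURL: 'https://api.deepseek.com',
    apiKey: 'YOUR_API_KEY', // replace with your own key
    language: 'en-US',
})

// Describe the task in natural language
await agent.execute('Click the login button')
```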
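
Multi-step flows can be driven by calling `execute` once per instruction. The sketch below shows one possible way to do that. `runTasks` is a hypothetical helper, not part of PageAgent, and it assumes only that `agent.execute(text)` returns a promise, as in the example above. What that promise resolves to is not specified here, so the value is passed through untouched.

```javascript
// Hypothetical helper: run natural-language instructions one at a time.
// Assumption: `agent.execute(text)` returns a promise that rejects on failure.
async function runTasks(agent, instructions) {
    const results = []
    for (const instruction of instructions) {
        try {
            const value = await agent.execute(instruction)
            results.push({ instruction, ok: true, value })
        } catch (error) {
            // Stop at the first failure, since later steps usually depend on earlier ones.
            results.push({ instruction, ok: false, error })
            break
        }
    }
    return results
}

// Usage (inside an async context, with `agent` created as shown above):
// const results = await runTasks(agent, ['Click the login button', 'Open the settings page'])
```

Running the steps sequentially and stopping on the first error keeps the page in a predictable state, which makes it easier for a human in the loop to take over.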
PageAgent adopts a simplified monorepo structure:
```
packages/
├── core/                # Core agent logic without UI (npm: @page-agent/core)
├── page-agent/          # Exported agent and demo (npm: page-agent)
├── llms/                # LLM client (npm: @page-agent/llms)
├── page-controller/     # DOM operations & visual mask (npm: @page-agent/page-controller)
├── ui/                  # Panel & i18n (npm: @page-agent/ui)
└── website/             # Demo & documentation site
```
We welcome contributions from the community! Follow the instructions in CONTRIBUTING.md for environment setup and local development.
Please read the Code of Conduct before contributing.
This project builds upon the excellent work of browser-use.
PageAgent is designed for client-side web enhancement, not server-side automation.
```
DOM processing components and prompt are derived from browser-use:

Browser Use
Copyright (c) 2024 Gregor Zunic
Licensed under the MIT License

Original browser-use project:

We gratefully acknowledge the browser-use project and its contributors for their
excellent work on web automation and DOM interaction patterns that helped make
this project possible.

Third-party dependencies and their licenses can be found in the package.json
file and in the node_modules directory after installation.
```
---
⭐ Star this repo if you find PageAgent helpful!