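PyAutoGUI Desktop Automation Tool
Applies to: 4.0.8.1+
PyAutoGUI controls the browser through a real desktop session, which suits scenarios that need to simulate user actions.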
It provides 3 built-in capabilities:
- open_url(url): opens a URL
- read_active_tab(): reads the content of the current tab
- open_and_read_url(url): opens the URL, then reads it
1. Initialization parameters
python
from agently.builtins.tools import PyAutoGUI
pyauto = PyAutoGUI(
    pause=0.05,
    fail_safe=True,
    new_tab=True,
    wait_seconds=1.5,
    dry_run=True,
    type_interval=0.01,
    open_mode="hotkey",  # "hotkey" | "system"
    activate_browser=False,
    browser_app=None,
    activate_wait_seconds=0.4,
    read_wait_seconds=0.4,
    max_content_length=24000,
    response_mode="markdown",  # "markdown" | "text"
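)
The most important parameters:
- dry_run: defaults to True; only returns the planned actions without actually performing them
- open_mode:
  - hotkey: uses keyboard shortcuts to operate the address bar
  - system: opens the URL through the operating system
- response_mode: whether reading a tab returns markdown or text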
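
As a conceptual illustration of what "only returns the planned actions" can mean in hotkey mode, the sketch below builds a list of keystroke steps without sending any of them. It is not the tool's actual implementation: `plan_open_url`, the action dictionary layout, and the specific shortcuts are all assumptions made for this example.

```python
import platform
from typing import Dict, List, Optional


def plan_open_url(
    url: str,
    new_tab: bool = True,
    system: Optional[str] = None,
) -> List[Dict[str, object]]:
    """Return a keystroke plan for opening `url`; nothing is executed.

    Hypothetical helper: the real tool's dry-run output may look different.
    """
    system = system or platform.system()
    # Common browser shortcuts use Command on macOS and Ctrl elsewhere.
    modifier = "command" if system == "Darwin" else "ctrl"
    actions: List[Dict[str, object]] = []
    if new_tab:
        actions.append({"action": "hotkey", "keys": [modifier, "t"]})
    # Focus the address bar, type the URL, then confirm with Enter.
    actions.append({"action": "hotkey", "keys": [modifier, "l"]})
    actions.append({"action": "write", "text": url})
    actions.append({"action": "press", "key": "enter"})
    return actions


# Inspect the plan for a macOS session before enabling real execution.
plan = plan_open_url("https://agently.tech", new_tab=True, system="Darwin")
for step in plan:
    print(step)
```

Reviewing a plan like this first makes it easier to spot an unintended shortcut before it reaches a live desktop session.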
2. Direct invocation
python
import asyncio
from agently.builtins.tools import PyAutoGUI
pyauto = PyAutoGUI(dry_run=False, open_mode="hotkey", activate_browser=True)
async def main():
    opened = await pyauto.open_url("https://agently.tech")
    print("OPEN:", opened)
    page = await pyauto.read_active_tab()
    print("READ:", page)
asyncio.run(main())
3. Using it as an Agent tool
python
from agently import Agently
from agently.builtins.tools import PyAutoGUI
agent = Agently.create_agent()
pyauto = PyAutoGUI(dry_run=False, open_mode="hotkey", activate_browser=True)
agent.use_tools([
    pyauto.open_url,
    pyauto.read_active_tab,
    pyauto.open_and_read_url,
])
result = agent.input("Open agently.tech and read the key points of the page").start()
print(result)
When registered through tool_info_list, the tool names are pyautogui_open_url, pyautogui_read_active_tab, and pyautogui_open_and_read_url.
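4. Platform and permission limitations
- hotkey mode requires a real GUI session
- On Linux, an error is raised if DISPLAY is not set
- read_active_tab() currently supports macOS (Darwin) only
- macOS may require granting permissions in the system privacy settings:
  - Accessibility / Input Monitoring
  - Automation (allow controlling the browser)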
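
The limitations above can be checked before any tool call. The sketch below is one possible preflight check using only the standard library; `preflight_issues` and its parameters are hypothetical names for this example and are not part of the tool.

```python
import os
import platform
from typing import List, Mapping, Optional


def preflight_issues(
    open_mode: str = "hotkey",
    needs_read: bool = False,
    system: Optional[str] = None,
    env: Optional[Mapping[str, str]] = None,
) -> List[str]:
    """Return human-readable problems found for the current platform.

    Hypothetical helper; it only encodes the limitations listed above.
    """
    system = system or platform.system()
    env = os.environ if env is None else env
    issues: List[str] = []
    # hotkey mode needs a GUI session; on Linux that implies DISPLAY is set.
    if open_mode == "hotkey" and system == "Linux" and not env.get("DISPLAY"):
        issues.append("hotkey mode needs a GUI session, but DISPLAY is not set")
    # Reading the active tab is documented as macOS-only for now.
    if needs_read and system != "Darwin":
        issues.append("read_active_tab() currently supports macOS (Darwin) only")
    return issues


# Check the machine this script is running on.
for issue in preflight_issues(open_mode="hotkey", needs_read=True):
    print("WARNING:", issue)
```

macOS privacy permissions cannot be verified this way, so they still need to be granted manually in the system settings.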
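5. Recommendations
- Validate the action sequence with dry_run=True first, then switch to real execution
- For production tasks, add timeouts and result validation (for example, URL, title, or key fields)
- Treat PyAutoGUI as a fallback tool; prefer Playwright for routine scenarios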
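
The timeout and result-validation advice can be sketched with `asyncio.wait_for` from the standard library. `call_with_checks`, `missing_keywords`, and the stand-in coroutine `fake_read_active_tab` below are hypothetical names for this example; the stand-in exists only so the sketch runs without a GUI session.

```python
import asyncio
from typing import Awaitable, Callable, List, Sequence


def missing_keywords(text: str, required: Sequence[str]) -> List[str]:
    """Return the required keywords that do not appear in `text`."""
    return [keyword for keyword in required if keyword not in text]


async def call_with_checks(
    call: Callable[[], Awaitable[object]],
    timeout_seconds: float,
    required_keywords: Sequence[str] = (),
) -> object:
    """Await `call()` with a timeout, then validate the result's content."""
    # asyncio.wait_for raises asyncio.TimeoutError when the limit is exceeded.
    result = await asyncio.wait_for(call(), timeout=timeout_seconds)
    missing = missing_keywords(str(result), required_keywords)
    if missing:
        raise ValueError(f"result is missing expected content: {missing}")
    return result


async def fake_read_active_tab() -> str:
    # Stand-in for a real tool call so the example is self-contained.
    await asyncio.sleep(0.01)
    return "# Agently\nExample page content"


page = asyncio.run(
    call_with_checks(
        fake_read_active_tab,
        timeout_seconds=2.0,
        required_keywords=["Agently"],
    )
)
print(page)
```

Since the direct-invocation example awaits `pyauto.read_active_tab()`, passing `pyauto.read_active_tab` in place of the stand-in should work the same way, though that has not been verified against the tool itself.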