
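PyAutoGUI Desktop Automation Tool

Applies to: 4.0.8.1+

PyAutoGUI controls a browser through a real desktop session, which suits scenarios where user actions need to be simulated.

It has three built-in capabilities:

  • open_url(url): opens a URL
  • read_active_tab(): reads the content of the current tab
  • open_and_read_url(url): opens the URL, then reads it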

1. Initialization parameters

python
from agently.builtins.tools import PyAutoGUI

pyauto = PyAutoGUI(
    pause=0.05,
    fail_safe=True,
    new_tab=True,
    wait_seconds=1.5,
    dry_run=True,
    type_interval=0.01,
    open_mode="hotkey",         # "hotkey" | "system"
    activate_browser=False,
    browser_app=None,
    activate_wait_seconds=0.4,
    read_wait_seconds=0.4,
    max_content_length=24000,
    response_mode="markdown",   # "markdown" | "text"
)

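The most important parameters:

  • dry_run: defaults to True; only returns the actions that would be performed, without executing them
  • open_mode
    • hotkey: uses keyboard shortcuts to drive the address bar
    • system: opens the URL through the operating system
  • response_mode: whether reading a tab returns markdown or text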
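
Because dry_run decides whether anything actually happens on the desktop, it can help to make real execution an explicit opt-in. The following is a minimal sketch of that idea, not part of the library: the helper build_pyautogui_kwargs and the environment variable names PYAUTO_LIVE and PYAUTO_OPEN_MODE are hypothetical and exist only for illustration.

```python
import os

VALID_OPEN_MODES = {"hotkey", "system"}


def build_pyautogui_kwargs(env=None):
    """Build keyword arguments for PyAutoGUI(...), defaulting to a dry run.

    PYAUTO_LIVE and PYAUTO_OPEN_MODE are hypothetical variable names
    chosen for this sketch; they are not read by the library itself.
    """
    env = os.environ if env is None else env

    # Real execution must be requested explicitly; anything else stays a dry run.
    live = env.get("PYAUTO_LIVE", "").strip().lower() in {"1", "true", "yes"}

    open_mode = env.get("PYAUTO_OPEN_MODE", "hotkey")
    if open_mode not in VALID_OPEN_MODES:
        raise ValueError(f"open_mode must be one of {sorted(VALID_OPEN_MODES)}")

    return {"dry_run": not live, "open_mode": open_mode}
```

The result would then be passed on as PyAutoGUI(**build_pyautogui_kwargs()), so a missing or misspelled variable falls back to the safe dry-run behavior.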

2. Calling it directly

python
import asyncio
from agently.builtins.tools import PyAutoGUI

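# dry_run=False performs the actions for real instead of only describing them
pyauto = PyAutoGUI(dry_run=False, open_mode="hotkey", activate_browser=True)

async def main():
    # Open the URL in the browser
    opened = await pyauto.open_url("https://agently.tech")
    print("OPEN:", opened)

    # Read the content of the tab that is now active
    page = await pyauto.read_active_tab()
    print("READ:", page)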

asyncio.run(main())

3. Using it as an Agent tool

python
from agently import Agently
from agently.builtins.tools import PyAutoGUI

agent = Agently.create_agent()
pyauto = PyAutoGUI(dry_run=False, open_mode="hotkey", activate_browser=True)

agent.use_tools([
    pyauto.open_url,
    pyauto.read_active_tab,
    pyauto.open_and_read_url,
])

result = agent.input("Open agently.tech and read the key points of the page").start()
print(result)

When registered through tool_info_list, the tool names are pyautogui_open_url, pyautogui_read_active_tab, and pyautogui_open_and_read_url.

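4. Platform and permission restrictions

  • hotkey mode requires a real GUI session
  • On Linux, an error is raised if DISPLAY is not set
  • read_active_tab() currently supports macOS (Darwin) only
  • macOS may require granting permissions in the system privacy settings:
    • Accessibility / Input Monitoring
    • Automation (allow control of the browser)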
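
The platform restrictions above can be checked before the tool is constructed. The sketch below uses only the Python standard library; the function name preflight is hypothetical, and it covers just the two conditions that can be detected from the environment (a missing DISPLAY on Linux, and a non-macOS system for read_active_tab()). macOS privacy permissions cannot be verified this way and still need a manual check.

```python
import os
import platform


def preflight(system=None, env=None):
    """Return a list of known blockers; an empty list means none were detected.

    `system` and `env` can be injected for testing; by default the
    current platform and process environment are inspected.
    """
    system = platform.system() if system is None else system
    env = os.environ if env is None else env
    problems = []

    # Linux needs a display to drive a GUI session.
    if system == "Linux" and not env.get("DISPLAY"):
        problems.append("DISPLAY is not set; no GUI session is available")

    # Per the restrictions above, reading the active tab is macOS-only for now.
    if system != "Darwin":
        problems.append("read_active_tab() currently supports macOS (Darwin) only")

    return problems


if __name__ == "__main__":
    for problem in preflight():
        print("WARNING:", problem)
```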

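5. Recommendations

  • Validate the action sequence with dry_run=True first, then switch to real execution
  • For production tasks, add timeouts and result checks (for example on the URL, title, or key fields)
  • Treat PyAutoGUI as a fallback tool; prefer Playwright for routine scenarios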
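
The timeout and result-check recommendation can be sketched as a small wrapper around any async tool call. This is an illustrative pattern under stated assumptions, not the library's own mechanism: call_with_checks is a hypothetical helper, and fake_open_and_read_url is a stand-in whose return shape is invented, since the real return format of open_and_read_url is not described here.

```python
import asyncio


async def call_with_checks(tool, *args, timeout=10.0, must_contain=()):
    """Await tool(*args) under a timeout, then verify expected substrings.

    Raises asyncio.TimeoutError if the call does not finish in time and
    ValueError if any expected substring is missing from the result.
    """
    result = await asyncio.wait_for(tool(*args), timeout=timeout)
    text = str(result)
    missing = [kw for kw in must_contain if kw not in text]
    if missing:
        raise ValueError(f"result is missing expected content: {missing}")
    return result


# Stand-in for a tool such as pyauto.open_and_read_url (hypothetical return shape).
async def fake_open_and_read_url(url):
    await asyncio.sleep(0.01)
    return {"url": url, "content": "# Agently\nAI agent framework"}


page = asyncio.run(
    call_with_checks(
        fake_open_and_read_url,
        "https://agently.tech",
        timeout=2.0,
        must_contain=["agently.tech", "Agently"],
    )
)
print(page["url"])
```

Note that asyncio.wait_for can only interrupt the call at an await point, so a tool that blocks the event loop synchronously would not be cut off by this timeout alone.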