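Playwright Browser Tool
Applies to: 4.0.8.1+
Playwright is Agently's built-in browser tool. It is meant for pages where a plain HTTP fetch is not enough:
- The front-end JS must run before the content can be read
- You need the page title, the final URL after redirects, or the status code
- You want an optional screenshot or link extraction
1. Initialization parameters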
python
from agently.builtins.tools import Playwright

playwright = Playwright(
    headless=True,
    timeout=30000,
    proxy=None,
    user_agent=None,
    response_mode="markdown",  # "markdown" | "text"
    max_content_length=8000,
    include_links=False,
    max_links=120,
    screenshot_path=None,
)

Core parameters:
- response_mode: markdown converts <a> elements into markdown links; text returns plain text
- include_links: whether to additionally return links
- screenshot_path: when set, a full-page screenshot is saved
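
To make the difference between the two response_mode values more concrete, here is a minimal standard-library sketch of what "convert <a> into markdown links" versus "return plain text" could look like. This is not Agently's implementation: the class LinkAwareExtractor and the function render are hypothetical names used only for illustration, and the max_links cap here merely mirrors the idea of the parameter above.

```python
from html.parser import HTMLParser


class LinkAwareExtractor(HTMLParser):
    """Hypothetical helper: collects page text, optionally keeping links as markdown."""

    def __init__(self, mode="markdown", max_links=120):
        super().__init__()
        self.mode = mode
        self.max_links = max_links
        self.parts = []    # output text fragments, in document order
        self.links = []    # collected href values, capped at max_links
        self._href = None  # href of the <a> currently open, if any
        self._anchor = []  # text seen inside the current <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._anchor = []

    def handle_data(self, data):
        if self._href is not None:
            self._anchor.append(data)
        else:
            self.parts.append(data)

    def handle_endtag(self, tag):
        if tag != "a" or self._href is None:
            return
        label = "".join(self._anchor).strip()
        if self.mode == "markdown":
            self.parts.append(f"[{label}]({self._href})")
        else:
            self.parts.append(label)
        if len(self.links) < self.max_links:
            self.links.append(self._href)
        self._href = None


def render(html, mode="markdown", max_links=120):
    """Return (text, links) for an HTML fragment in the requested mode."""
    parser = LinkAwareExtractor(mode=mode, max_links=max_links)
    parser.feed(html)
    parser.close()
    return "".join(parser.parts), parser.links


snippet = '<p>See <a href="https://agently.tech">Agently</a> docs</p>'
print(render(snippet, mode="markdown"))
print(render(snippet, mode="text"))
```

The markdown mode keeps the link target next to its label, which is useful when a model needs to decide which page to open next; the text mode drops the targets and leaves only readable text.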
2. Calling it directly
python
import asyncio

from agently.builtins.tools import Playwright

playwright = Playwright(headless=True, response_mode="markdown")

async def main():
    # open() is awaited, so it has to run inside an event loop
    result = await playwright.open("https://agently.tech")
    print(result)

asyncio.run(main())

3. Using it as an Agent tool
python
from agently import Agently
from agently.builtins.tools import Playwright

agent = Agently.create_agent()
playwright = Playwright(headless=True, response_mode="markdown")

# Register the open method as a tool the agent can call
agent.use_tools([playwright.open])

result = agent.input("Browse the Agently website first, then summarize what TriggerFlow does").start()
print(result)

When registered through tool_info_list, the tool name is playwright_open.
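4. Return structure (success)
Typical fields:
- ok
- requested_url
- normalized_url
- url (the final URL)
- status
- title
- content_format
- content
- screenshot_path
- links (only when include_links=True)

On failure, the result contains ok=False and error.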
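
A caller will usually branch on ok before touching any other field. The sketch below assumes the result is a dict-like mapping keyed by the field names listed in this section (the page names the fields but not the container type). summarize_result is a hypothetical helper, and the sample values are made up for illustration.

```python
from typing import Any, Mapping


def summarize_result(result: Mapping[str, Any]) -> str:
    """Return a one-line summary of a page result, or an error line on failure."""
    if not result.get("ok"):
        # On failure only `ok` and `error` are expected to be present.
        return f"failed: {result.get('error', 'unknown error')}"
    links = result.get("links") or []  # present only when include_links=True
    content = result.get("content") or ""
    return (
        f"{result.get('status')} {result.get('url')} "
        f"title={result.get('title')!r} "
        f"{result.get('content_format')} chars={len(content)} "
        f"links={len(links)}"
    )


# Made-up sample values, for illustration only.
sample_ok = {
    "ok": True,
    "requested_url": "https://agently.tech",
    "normalized_url": "https://agently.tech",
    "url": "https://agently.tech/",
    "status": 200,
    "title": "Agently",
    "content_format": "markdown",
    "content": "# Agently",
    "screenshot_path": None,
}
sample_failed = {"ok": False, "error": "timeout"}

print(summarize_result(sample_ok))
print(summarize_result(sample_failed))
```

Reading optional fields with get keeps the helper safe when links is absent because include_links was left at False.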
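5. Usage recommendations
- Install the browser driver before first use (playwright install)
- When scraping stability is the priority, set timeout and proxy
- If you later need to read specific elements precisely, do not rely on the content text alone; combine it with a dedicated scraping workflow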