type
Post
status
Published
date
Apr 1, 2026
slug
blog-advanced-tool-use
summary
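Explores the pain points of agent tool calling (context bloat from loading all MCP tools upfront and from multi-step execution, intermediate results polluting the context, and high latency from repeated round trips), studies Anthropic's tool-calling solutions (the Tool Search Tool for on-demand loading, Programmatic Tool Calling to protect the context), and discusses how LLMs can move from filling in JSON to orchestrating programs for more elegant tool use.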
tags
Development
category
Tech Sharing
TL;DR
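A close reading of the Advanced Tool Use features described on Anthropic's blog, together with the related engineering practice.
Pain Points: When an agent loads every tool upfront and runs many steps, it easily hits context bloat. The pile of intermediate results pollutes the context and degrades model performance, and repeated round-trip calls are expensive in both latency and tokens.
Solutions: Introduce the Tool Search Tool to load tools dynamically on demand; use Programmatic Tool Calling to push data processing down into an isolated code sandbox and protect the reasoning window; provide Tool Use Examples to resolve the parameter-format ambiguity of plain JSON Schema.
Evolution: Move the LLM from filling in JSON to orchestrating programs, and treat the context as a pure logical-reasoning workspace rather than a temporary database.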

Advanced Tool Use in Practice

  • Reference source
Anthropic Just Changed How Agents Call Tools. I Stole It for My Qwen3.5 Agent
Video description:
What if your AI agent could cut its token usage dramatically by searching for tools on demand and writing code to orchestrate dozens of tool calls, instead of letting the LLM handle each one individually? In this video, we implement two advanced tool calling patterns from Anthropic's engineering post into our custom Python and React AI agent: the tool search tool and programmatic tool calling via code execution sandboxes. These aren't Claude-exclusive features. They're universal agent design patterns you can use with any model or framework. We test both approaches side by side, comparing traditional tool calling against programmatic execution, and we run the same task on Claude Haiku and the brand new Qwen 3.5 27B running locally on an RTX 5090.
Links:
  • GitHub Repo: https://github.com/theaiautomators/claude-code-agentic-rag-series
  • Episode 5 PRDs: https://github.com/theaiautomators/claude-code-agentic-rag-series/tree/main/ep5-advanced-tool-use
  • Anthropic Advanced Tool Use Post: https://www.anthropic.com/engineering/advanced-tool-use
  • LLM Sandbox: https://github.com/vndee/llm-sandbox
  • Cloudflare Code Mode: https://blog.cloudflare.com/code-mode/
  • Episode 4 (Skills & Sandboxes): https://www.youtube.com/watch?v=4Tp6nPZa5is
What's covered:
  • The problem: tool definitions eating context before you even send a message
  • Tool search: deferring tool loading so the agent discovers what it needs on demand
  • Programmatic tool calling: letting the LLM write and execute code to orchestrate tool calls in a loop
  • The tool bridge architecture: how the sandbox calls tools securely without direct internet access or API credentials
  • Side-by-side comparison of traditional vs programmatic approaches on a budget compliance task (20 team members, 3 tools)
  • Real results: traditional approach missed answers at 76K tokens; programmatic got them all correct
  • Running Qwen 3.5 27B locally and comparing it against Claude Haiku
  • Tool use examples: multi-shot prompting for complex parameter handling (72% to 90% accuracy improvement)
  • When to use each pattern: context bloat, intermediate result pollution, or parameter accuracy issues
Tech stack:
  • Python backend / React frontend
  • LLM Sandbox (Docker-based isolated code execution)
  • Langfuse (observability and token tracking)
  • Qwen 3.5 27B via Ollama (local, RTX 5090) vs Claude Haiku via OpenRouter
  • Tool bridge pattern for secure sandbox-to-API communication
  • gVisor (recommended for production sandbox hardening)
Key takeaway: Traditional tool calling breaks down as you scale to dozens of tools and hundreds of data points. Tool search keeps your context lean, and programmatic tool calling lets the agent write efficient loops instead of burning tokens on repetitive LLM round trips. These are model-agnostic patterns you can build into any agent system today.
This is Episode 5 of our AI Builder series where we're building a full AI agent web app from scratch using Claude Code.
Timestamps: 00:00 Intro & Tool Search · 03:56 Programmatic Tool Calling · 10:19 Architecture Overview · 15:47 Tool Use Examples

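Pain points addressed

Pain point 1: context bloat from preloading

Scenario: every tool definition the agent might need is packed into the system prompt upfront. This leads to:
  • Rapidly growing token consumption: with several MCP (Model Context Protocol) servers integrated, the context is eaten up quickly. For example, GitHub contributes 35 tools (about 26K tokens by Anthropic's figures; later versions have been slimmed down), Slack contributes 11 tools (about 21K tokens), and so on. In total, 58 tools can use up 55K tokens of context before the conversation even starts.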
    • notion image
    • In the YouTube video's test with 60 tools loaded, a single "hello" message cost about 13K tokens.
    • notion image
      notion image
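      (All images are from the YouTube video.)
  • The wrong-tool problem: faced with a large, cluttered tool library, the model's attention is spread thin, and it also tends to pick the wrong tool among similarly named ones (for example, confusing notification-send-user with notification-send-channel).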
But token cost isn't the only issue. The most common failures are wrong tool selection and incorrect parameters, especially when tools have similar names like notification-send-user vs. notification-send-channel.
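To get a feel for how quickly definitions add up, here is a minimal sketch that estimates the token footprint of a tool list. It assumes the common name/description/input_schema shape and a rough ratio of 4 characters per token; both `make_tool` and `estimate_tokens` are hypothetical helpers for illustration, and a real tokenizer will give different numbers.

```python
import json


def make_tool(name: str, description: str) -> dict:
    """Build a made-up tool definition in the usual name/description/schema shape."""
    return {
        "name": name,
        "description": description,
        "input_schema": {
            "type": "object",
            "properties": {
                "target": {"type": "string"},
                "message": {"type": "string"},
            },
            "required": ["target", "message"],
        },
    }


def estimate_tokens(tools: list, chars_per_token: float = 4.0) -> int:
    """Very rough estimate: serialized JSON length / assumed chars-per-token ratio."""
    serialized = json.dumps(tools, ensure_ascii=False)
    return int(len(serialized) / chars_per_token)


# Two similarly named tools, then the same pair repeated to simulate 60 definitions.
pair = [
    make_tool(f"notification-send-{kind}", f"Send a notification to a {kind}.")
    for kind in ("user", "channel")
]
few = estimate_tokens(pair)
many = estimate_tokens(pair * 30)
print(f"2 tools: ~{few} tokens, 60 tools: ~{many} tokens")
```

The cost grows roughly linearly with the number of definitions, and it is paid on every request, whether or not any tool is used.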

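Pain point 2: piled-up intermediate results and wasted round trips

Besides tool definitions, the intermediate results produced by repeated back-and-forth calls also take up a sizable share of the context.
notion image
(Image source: the YouTube video)
  • Business scenario: a Q3 travel-budget compliance check. The system provides three tools, get_team_members, get_expenses, and get_budget_by_level, and the goal is to find the employees who overspent.
  • Inefficiency of the traditional mode: to check the data of 20 employees, the agent falls into a loop of "one API call -> result returned -> call again". In the YouTube test, Haiku 4.5 made 56 tool calls, burned 76,000+ tokens, took 29 s, and still missed one result.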
notion image
notion image
notion image
notion image
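The traditional loop can be imitated with a small simulation. This is only a sketch: the three functions are in-memory stand-ins with made-up data, and `call_tool` is a hypothetical helper that models one model-to-tool round trip. It does not reproduce the 56 calls seen in the video; it only shows that every raw result ends up in the context.

```python
# Hypothetical in-memory stand-ins for the three tools named above.
TEAM = [{"id": i, "level": "junior" if i % 2 else "senior"} for i in range(20)]
BUDGETS = {"junior": 1000, "senior": 2000}


def get_team_members():
    return TEAM


def get_expenses(member_id):
    # Five made-up line items per member.
    return [{"member_id": member_id, "amount": 150 + 20 * member_id} for _ in range(5)]


def get_budget_by_level(level):
    return {"level": level, "limit": BUDGETS[level]}


context = []      # stands in for the model's message history
round_trips = 0   # one increment per model -> tool -> model cycle


def call_tool(fn, *args):
    """Simulate one round trip: the raw result is appended to the context."""
    global round_trips
    round_trips += 1
    result = fn(*args)
    context.append(repr(result))
    return result


members = call_tool(get_team_members)
for member in members:
    call_tool(get_expenses, member["id"])
    call_tool(get_budget_by_level, member["level"])

context_chars = sum(len(chunk) for chunk in context)
print(f"{round_trips} round trips, {context_chars} characters of raw results in context")
```

Even in this toy version, 20 members cost 41 round trips, and none of the raw expense rows are needed once the totals are known.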

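Tool Search Tool

Anthropic's first answer is deferred loading with on-demand discovery. On the implementation side, you set defer_loading to true on tools in the API call and still pass the full TOOLS list as before.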
The Tool Search Tool lets Claude dynamically discover tools instead of loading all definitions upfront. You provide all your tool definitions to the API, but mark tools with defer_loading: true to make them discoverable on-demand. Deferred tools aren't loaded into Claude's context initially. Claude only sees the Tool Search Tool itself plus any tools with defer_loading: false (your most critical, frequently-used tools).
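As a rough illustration of the request shape, the sketch below builds a tools list in which everything outside a small allowlist is marked as deferred. `build_tools_payload` is a hypothetical helper, the two tool definitions are invented, and the search tool's type and name strings are written as I recall them from Anthropic's post, so check them against the current API documentation before use.

```python
def build_tools_payload(tool_defs, always_loaded):
    """Return a tools list where every tool outside `always_loaded` is deferred."""
    payload = [
        # Assumption: type/name as shown in Anthropic's post; may change over time.
        {"type": "tool_search_tool_regex_20251119", "name": "tool_search_tool_regex"}
    ]
    for tool in tool_defs:
        entry = dict(tool)  # shallow copy so the caller's definitions stay untouched
        entry["defer_loading"] = tool["name"] not in always_loaded
        payload.append(entry)
    return payload


# Invented tool definitions for illustration only.
tools = [
    {
        "name": "github_list_commits",
        "description": "List commits in a repository.",
        "input_schema": {"type": "object", "properties": {"repo": {"type": "string"}}},
    },
    {
        "name": "slack_send_message",
        "description": "Send a message to a Slack channel.",
        "input_schema": {"type": "object", "properties": {"text": {"type": "string"}}},
    },
]

payload = build_tools_payload(tools, always_loaded={"slack_send_message"})
for entry in payload:
    print(entry["name"], entry.get("defer_loading"))
```

The full list is still sent with every request; only the tools with defer_loading set to false are placed in the model's context upfront.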
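  • In the YouTube test, the "hello" scenario (no tool call yet; keep reading) drops to 6.3K tokens.
notion image
  • A tool named tool_search is exposed.
notion image
  • The request needs list_commits, one of the tools in the GitHub MCP. The model calls tool_search on its own to discover the tool it needs and load its full tool definition.
notion image
  • The tool is then executed.
notion image
  • Inspecting the trace shows the corresponding tool definition.
notion image
  • The second time, the tool does not need to be loaded again.
notion image
  • Comparing the tools loaded before and after tool_search confirms on-demand loading.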
notion image
notion image
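The behaviour walked through above (only tool_search visible at first, definitions loaded on demand, no reload on the second use) can be sketched without any model API. This is a minimal, hypothetical registry with a naive keyword match; it is not the implementation from the video or from Anthropic.

```python
class ToolRegistry:
    """Keeps all tool definitions out of context until tool_search loads them."""

    def __init__(self, all_tools):
        self._all = {tool["name"]: tool for tool in all_tools}
        self.loaded = {}  # definitions currently exposed to the model

    def tool_search(self, query, limit=3):
        """Naive keyword match over name + description; loads the matching tools."""
        words = query.lower().split()
        hits = [
            tool for tool in self._all.values()
            if any(w in f"{tool['name']} {tool['description']}".lower() for w in words)
        ][:limit]
        for tool in hits:
            self.loaded.setdefault(tool["name"], tool)  # already loaded: no-op
        return [tool["name"] for tool in hits]

    def visible_tools(self):
        """What the model sees: the search tool plus whatever has been loaded."""
        return ["tool_search"] + sorted(self.loaded)


# Invented definitions for illustration.
registry = ToolRegistry([
    {"name": "list_commits", "description": "List commits in a GitHub repository."},
    {"name": "create_issue", "description": "Create an issue in a GitHub repository."},
    {"name": "send_message", "description": "Send a Slack message."},
])

before = registry.visible_tools()
found = registry.tool_search("commits")
after = registry.visible_tools()
registry.tool_search("commits")  # second search does not load anything new
print(before, found, after)
```

A production version would more likely use regex, BM25, or embedding search, but the bookkeeping is the same: the set of visible tools grows only when a search result is loaded.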
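Anthropic's results chart
notion image
Anthropic's description of the results
  • Irrelevant tools such as Jira and Slack no longer need to be piled into the context. Consumption drops from roughly 77K tokens in the traditional setup to roughly 8.7K, and with the distractors gone, tool selection accuracy improves markedly (in Anthropic's tests, Opus 4.5 rose from 79.5% to 88.1%, and Opus 4 from 49% to 74%).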

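Programmatic Tool Calling

This part deserves a cautious look: replacing MCP calls with generated code can waste context on repeated iterations when the model does not know the API format (this happened in practice with Haiku 4.5).
  • Code generation instead of multi-turn dialogue: for complex traversal and comparison tasks, the agent directly generates a Python or TypeScript script containing for loops, concurrency (asyncio.gather), conditionals, and error handling.
  • Isolated execution and result convergence: the script is sent to a sandbox environment (the Code Execution tool) to run. The 2,000+ expense line items flow through and are summed only in the script's runtime memory. The model ultimately sees just a minimal stdout result, so the context is not polluted by useless intermediate data.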
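The kind of script an agent might generate for the budget check could look like the sketch below. The three async functions are hypothetical stubs with made-up data (100 line items per member, so 2,000 records in total); in a real setup they would be provided by the sandbox and backed by the actual tools.

```python
import asyncio


# Hypothetical async stubs standing in for the real tools.
async def get_team_members():
    return [
        {"id": i, "name": f"member-{i}", "level": "junior" if i % 2 else "senior"}
        for i in range(20)
    ]


async def get_expenses(member_id):
    # 100 made-up line items per member; these never reach the model's context.
    return [{"amount": 10 + (member_id % 5)} for _ in range(100)]


async def get_budget_by_level(level):
    return {"junior": 1200, "senior": 1300}[level]


async def find_over_budget():
    members = await get_team_members()

    # Fetch each distinct budget once, and all expense lists concurrently.
    levels = sorted({m["level"] for m in members})
    limits = await asyncio.gather(*(get_budget_by_level(lv) for lv in levels))
    budgets = dict(zip(levels, limits))
    expenses = await asyncio.gather(*(get_expenses(m["id"]) for m in members))

    over = []
    for member, items in zip(members, expenses):
        total = sum(item["amount"] for item in items)
        if total > budgets[member["level"]]:
            over.append({"name": member["name"], "spent": total,
                         "limit": budgets[member["level"]]})
    return over


result = asyncio.run(find_over_budget())
print(result)  # only this short summary goes back to the model
```

The design point is that the loop, the joins, and the arithmetic run as ordinary code, so the model spends one turn writing the script and one turn reading a few lines of output.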
 
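Haiku 4.5 going back and forth awkwardly because it could not get the script right
notion image
The result was very poor
notion image
Second attempt: it looked up some definitions first, then wrote the script
notion image
Results with a local Qwen 3.5 27B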
notion image

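Zero-trust sandbox and Tool Bridge

Letting the LLM happily YOLO with no constraints invites security risks. Building an execution environment that is safe enough for production requires strict isolation and a bridging mechanism:
  • Lightweight LLM sandbox: use an open-source solution along the lines of llm-sandbox. As soon as the backend receives the generated code, it starts a lightweight Docker/Podman container.
  • Strict network isolation: for security, the sandbox container has no internet access at all and cannot connect directly to the production database (gVisor can be added for further hardening).
  • Tool Bridge
    • If the sandbox has no network, how does get_expenses in the code query data? This is where the bridge comes in:
      1. The system converts the large, complex MCP tools into plain Python stub functions and injects them into the sandbox at initialization.
      2. When the model's script reaches a stub function, the call is intercepted and routed, together with the current session ID, to the highly trusted host (the FastAPI backend).
      3. The backend performs authentication and the real API access on the script's behalf, then passes the resulting JSON back into the sandbox for the Python code to use.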
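The three bridge steps above can be sketched in-process. Everything here is hypothetical: `ToolBridgeHost`, `make_stub`, and the session check are illustrative names, and the transport is a direct function call, whereas the setup described above routes requests from the container to a FastAPI backend.

```python
import json


class ToolBridgeHost:
    """Trusted side: owns credentials and the real tool implementations."""

    def __init__(self):
        self._tools = {}
        self._sessions = set()

    def register(self, name, fn):
        self._tools[name] = fn

    def open_session(self, session_id):
        self._sessions.add(session_id)

    def handle(self, request_json):
        """Validate the session, run the real tool, return a JSON response."""
        request = json.loads(request_json)
        if request["session_id"] not in self._sessions:
            return json.dumps({"error": "unknown session"})
        if request["tool"] not in self._tools:
            return json.dumps({"error": "unknown tool"})
        result = self._tools[request["tool"]](**request["args"])
        return json.dumps({"result": result})


def make_stub(tool_name, session_id, transport):
    """Sandbox side: a plain function that forwards the call over the bridge."""
    def stub(**kwargs):
        request = json.dumps(
            {"session_id": session_id, "tool": tool_name, "args": kwargs}
        )
        response = json.loads(transport(request))
        if "error" in response:
            raise RuntimeError(response["error"])
        return response["result"]
    stub.__name__ = tool_name
    return stub


# Host setup; the lambda stands in for an authenticated API call.
host = ToolBridgeHost()
host.register("get_expenses", lambda member_id: [{"member_id": member_id, "amount": 42}])
host.open_session("session-123")

# Inside the sandbox, generated code only ever sees the stub.
get_expenses = make_stub("get_expenses", "session-123", host.handle)
rows = get_expenses(member_id=7)
print(rows)
```

Because the stub carries only a session ID and arguments, API keys and database credentials never enter the container; the host decides which tools a given session may reach.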
 
 
2024-2026CamelliaV.
