[2026.4.1] Blog Close Reading - Anthropic Advanced Tool Use in Practice

type

Post

status

Published

date

Apr 1, 2026

slug

blog-advanced-tool-use

summary

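Examines the pain points of agent tool calling (context bloat from loading every MCP tool up front and from multi-step execution, intermediate results polluting the context, and high latency from repeated round trips), studies Anthropic's solutions (a Tool Search Tool for on-demand loading, and programmatic tool calling to protect the context), and discusses how LLMs can move from filling in JSON to orchestrating programs for cleaner tool calling.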

Advanced Tool Use in Practice

References

Introducing advanced tool use on the Claude Developer Platform

Claude can now discover, learn, and execute tools dynamically to enable agents that take action in the real world. Here’s how.

https://www.anthropic.com/engineering/advanced-tool-use

Anthropic Just Changed How Agents Call Tools. I Stole It for My Qwen3.5 Agent

In this video, we implement two advanced tool calling patterns from Anthropic's engineering post into our custom Python and React AI agent: the tool search tool and programmatic tool calling via code execution sandboxes. We test both approaches side by side, comparing traditional tool calling against programmatic execution, and we run the same task on Claude Haiku and the brand new Qwen 3.5 27B running locally on an RTX 5090.

GitHub Repo: https://github.com/theaiautomators/claude-code-agentic-rag-series/tree/main/ep5-advanced-tool-use

https://www.youtube.com/watch?v=R7OCrqyGMeY

Pain Points Addressed

Pain point 1: Context bloat from preloading

Scenario: every tool definition the agent might need is packed into the system prompt up front, in full. This leads to:

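Rapidly growing token consumption: when several MCP (Model Context Protocol) servers are integrated, the context is consumed quickly. For example, GitHub exposes 35 tools (about 26K tokens, per Anthropic's figures; later versions have since been slimmed down), Slack exposes 11 tools (about 21K tokens), and so on, so 58 tools can use up 55K tokens of context before the conversation even starts.

In the YouTube video's test with 60 tools, a single "hello" message used 13K tokens.

(All figures are from the YouTube video.)

Wrong-tool problem: faced with a large, cluttered tool library, the model's attention is spread thin, and it is prone to "hallucinating" among similarly named tools (for example, picking the wrong one of notification-send-user and notification-send-channel).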

But token cost isn't the only issue. The most common failures are wrong tool selection and incorrect parameters, especially when tools have similar names like notification-send-user vs. notification-send-channel.
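To make the overhead concrete, here is a minimal sketch that estimates how many tokens a set of tool definitions costs before any user message is sent. The ratio of about 4 characters per token is a crude heuristic assumed for illustration, not Anthropic's tokenizer, and the sample tool schema is hypothetical. The per-tool averages are plain arithmetic on the figures quoted above.

```python
import json

# Assumption: ~4 characters per token is only a rough rule of thumb,
# not the real tokenizer; use it for order-of-magnitude estimates.
CHARS_PER_TOKEN = 4


def estimate_tool_tokens(tools: list[dict]) -> int:
    """Roughly estimate the tokens consumed by serialized tool definitions."""
    serialized = json.dumps(tools, ensure_ascii=False)
    return len(serialized) // CHARS_PER_TOKEN


# Hypothetical tool definition, for illustration only.
sample_tool = {
    "name": "notification-send-user",
    "description": "Send a notification to a single user.",
    "input_schema": {
        "type": "object",
        "properties": {
            "user_id": {"type": "string"},
            "message": {"type": "string"},
        },
        "required": ["user_id", "message"],
    },
}

one = estimate_tool_tokens([sample_tool])
many = estimate_tool_tokens([sample_tool] * 58)
print(f"1 tool ~ {one} tokens, 58 tools ~ {many} tokens")

# Per-tool averages implied by the figures quoted above.
github_avg = 26_000 / 35
slack_avg = 21_000 / 11
print(f"GitHub ~ {github_avg:.0f} tokens/tool, Slack ~ {slack_avg:.0f} tokens/tool")
```

Real MCP tool definitions tend to carry much longer descriptions and schemas than this toy one, which is why the quoted per-tool averages are far higher than what the sample produces.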

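Pain point 2: Accumulated intermediate results and wasteful round trips

Beyond the space taken by tool definitions, the intermediate results produced by repeated round trips also occupy a sizable share of the context.

(Figure from the YouTube video.)

Business scenario: a Q3 travel budget compliance check. The system provides three tools, get_team_members, get_expenses, and get_budget_by_level, and the goal is to find the employees who overspent.

Inefficiency of the traditional pattern: to check the data for 20 employees, the agent falls into a loop of "single API call -> result returned -> call again". In the YouTube video's test, Haiku 4.5 made 56 tool calls, burned 76,000+ tokens, took 29 s, and the final answer still missed one.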
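The shape of that loop can be sketched as a small simulation. This is not the video's agent: the three tool bodies, the data, and the budget figure are hypothetical stand-ins, and no LLM is involved. It only illustrates why the number of calls and the amount of retained context grow with the number of employees when every result is fed back into the conversation.

```python
import json


# Hypothetical stand-ins for the three tools named above; all data is made up.
def get_team_members() -> list[dict]:
    return [{"id": i, "level": "senior" if i % 2 else "junior"} for i in range(20)]


def get_expenses(member_id: int) -> list[dict]:
    return [{"member_id": member_id, "amount": 400.0 + 50 * member_id, "quarter": "Q3"}]


def get_budget_by_level(level: str) -> dict:
    return {"level": level, "travel_budget": 1000.0}


def run_traditional_loop() -> tuple[int, int, list[int]]:
    """Simulate one-call-per-turn orchestration where every result stays in context."""
    context: list[str] = []  # everything the model would have to re-read each turn
    calls = 0

    def call(tool, *args):
        nonlocal calls
        calls += 1
        result = tool(*args)
        context.append(json.dumps(result))  # the raw result is appended to context
        return result

    over_budget = []
    for member in call(get_team_members):
        spent = sum(e["amount"] for e in call(get_expenses, member["id"]))
        budget = call(get_budget_by_level, member["level"])["travel_budget"]
        if spent > budget:
            over_budget.append(member["id"])
    return calls, sum(len(c) for c in context), over_budget


calls, context_chars, over_budget = run_traditional_loop()
print(f"{calls} tool calls, {context_chars} characters of results kept in context")
print("over budget:", over_budget)
```

With this toy data the simulation makes 41 calls (1 + 20 + 20), fewer than the 56 reported in the video, since a real model does not necessarily pick the shortest sequence of calls.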

Tool Search Tool

Anthropic's first solution is deferred loading with on-demand discovery. In practice, you set defer_loading to true on the tools in the API call and still pass the full tools list as before.

The Tool Search Tool lets Claude dynamically discover tools instead of loading all definitions upfront. You provide all your tool definitions to the API, but mark tools with defer_loading: true to make them discoverable on-demand. Deferred tools aren't loaded into Claude's context initially. Claude only sees the Tool Search Tool itself plus any tools with defer_loading: false (your most critical, frequently-used tools).
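A minimal sketch of what that tools list could look like, built as plain Python dicts without calling the API. The tool-search type string and the beta flag are written as best recalled from the Anthropic post and should be checked against the current API documentation. The helper build_tools and the three example tools are hypothetical.

```python
# Assumption: these two identifiers follow the Anthropic post as best recalled;
# verify them against the current API docs before use.
TOOL_SEARCH_TOOL = {
    "type": "tool_search_tool_regex_20251119",
    "name": "tool_search_tool_regex",
}
BETA_FLAG = "advanced-tool-use-2025-11-20"


def build_tools(all_tools: list[dict], always_loaded: set[str]) -> list[dict]:
    """Return the full tools list, marking everything not in always_loaded as deferred."""
    tools = [dict(TOOL_SEARCH_TOOL)]
    for tool in all_tools:
        entry = dict(tool)
        if tool["name"] not in always_loaded:
            entry["defer_loading"] = True  # discoverable on demand, not preloaded
        tools.append(entry)
    return tools


def make_tool(name: str, description: str) -> dict:
    """Hypothetical helper producing a tool definition with an empty schema."""
    return {
        "name": name,
        "description": description,
        "input_schema": {"type": "object", "properties": {}},
    }


# Hypothetical tool catalogue, for illustration only.
catalogue = [
    make_tool("get_team_members", "List the members of a team."),
    make_tool("github_create_pull_request", "Create a pull request."),
    make_tool("slack_send_message", "Send a message to a channel."),
]

tools = build_tools(catalogue, always_loaded={"get_team_members"})
for t in tools:
    print(t["name"], "deferred" if t.get("defer_loading") else "loaded")
```

The resulting list would be passed as the tools argument of a Messages API request with the beta enabled. All definitions are still sent to the API, but only the search tool and the non-deferred tools appear in the model's initial context.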