Domain 2
Task 2.1

Tool Schema Design


What You Need to Know

Tool descriptions are the PRIMARY mechanism LLMs use for tool selection. This is not supplementary metadata. It is not an afterthought. It is THE mechanism. When a model receives a set of tools, it reads the descriptions to decide which tool to call. If those descriptions are minimal — something like "Retrieves customer information" — the model lacks the context to differentiate between tools that serve overlapping purposes.

What Makes a Good Tool Description

A production-grade tool description includes five elements:

  1. What the tool does — its primary purpose, stated unambiguously
  2. What inputs it expects — data types, formats, constraints, and required versus optional fields
  3. Example queries it handles well — concrete use cases that anchor the model's understanding
  4. Edge cases and limitations — what the tool does NOT do, and what happens when inputs fall outside expected ranges
  5. Explicit boundaries — when to use THIS tool versus similar tools in the same toolkit

Here is the difference between a minimal and a production-grade description:

Minimal (causes misrouting):

get_customer: "Retrieves customer information"

lookup_order: "Retrieves order details"

Production-grade (reliable selection):

get_customer: "Looks up a customer account by email address, phone number, or customer ID. Returns customer profile (name, contact details, account status, loyalty tier). Use this when you need to verify who the customer is. Do NOT use for order-specific queries — use lookup_order for those."

lookup_order: "Retrieves order details by order number (format: #NNNNN) or tracking ID. Returns order status, items, shipping details, and refund eligibility. Use this when a customer asks about a specific order. Do NOT use for customer identity verification — use get_customer for that."

The second version gives the model explicit disambiguation. It knows which identifiers each tool accepts, what each returns, and crucially, when NOT to use each tool.
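The production-grade pair above can be written out in the shape an MCP server advertises via tools/list: a name, a description, and a JSON Schema for inputs. The descriptions are taken from the example; the inputSchema fields are illustrative assumptions, not a definitive server implementation.

```python
# The two production-grade tools as a tools/list response might carry them.
# Descriptions come from the example above; inputSchema fields are assumed.
TOOLS = [
    {
        "name": "get_customer",
        "description": (
            "Looks up a customer account by email address, phone number, "
            "or customer ID. Returns customer profile (name, contact "
            "details, account status, loyalty tier). Use this when you "
            "need to verify who the customer is. Do NOT use for "
            "order-specific queries — use lookup_order for those."
        ),
        "inputSchema": {
            "type": "object",
            "properties": {
                "identifier": {
                    "type": "string",
                    "description": "Email address, phone number, or customer ID",
                }
            },
            "required": ["identifier"],
        },
    },
    {
        "name": "lookup_order",
        "description": (
            "Retrieves order details by order number (format: #NNNNN) or "
            "tracking ID. Returns order status, items, shipping details, "
            "and refund eligibility. Use this when a customer asks about "
            "a specific order. Do NOT use for customer identity "
            "verification — use get_customer for that."
        ),
        "inputSchema": {
            "type": "object",
            "properties": {
                "order_ref": {
                    "type": "string",
                    "description": "Order number (#NNNNN) or tracking ID",
                }
            },
            "required": ["order_ref"],
        },
    },
]
```

Note that the disambiguation lives entirely in the description strings; the schemas only constrain the inputs once the model has already picked a tool.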

The Misrouting Problem

Two tools with overlapping or near-identical descriptions cause selection confusion. The exam's Q2 presents exactly this scenario: get_customer and lookup_order with minimal descriptions, causing the agent to route "check my order #12345" to the wrong tool.
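A toy sketch can make the failure concrete. The scorer below is a crude stand-in for the model's selection: bag-of-words overlap with each description, with matches against a tool's "Do NOT use for" clause counted negatively. This is not how an LLM actually routes, and the query is illustrative rather than the exam's exact wording; the point is that near-identical descriptions produce a tie the selector can only break arbitrarily.

```python
import re

def _words(text: str) -> set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def select_tool(query: str, tools: dict[str, str]) -> str:
    """Crude stand-in for LLM routing: overlap with each description,
    minus overlap with its 'Do NOT use for' clause. Ties keep the first tool."""
    qw = _words(query)
    best_name, best_score = None, float("-inf")
    for name, desc in tools.items():
        positive, _, negative = desc.partition("Do NOT use for")
        score = len(qw & _words(positive)) - len(qw & _words(negative))
        if score > best_score:
            best_name, best_score = name, score
    return best_name

minimal = {
    "get_customer": "Retrieves customer information",
    "lookup_order": "Retrieves order details",
}
detailed = {
    "get_customer": (
        "Looks up a customer account by email address, phone number, or "
        "customer ID. Use this when you need to verify who the customer is. "
        "Do NOT use for order-specific queries — use lookup_order for those."
    ),
    "lookup_order": (
        "Retrieves order details by order number (format: #NNNNN) or tracking "
        "ID. Use this when a customer asks about a specific order. "
        "Do NOT use for customer identity verification — use get_customer for that."
    ),
}

query = "i am a customer checking on my order 12345"
print(select_tool(query, minimal))   # tie on overlap -> arbitrary pick: get_customer
print(select_tool(query, detailed))  # boundary clauses break the tie: lookup_order
```

With minimal descriptions both tools score identically and the misroute is a coin flip frozen by registration order; with detailed descriptions the boundary statements actively push order queries away from get_customer.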

The exam tests your ability to identify the correct fix. There are four plausible options, and three of them are wrong:

  • Expand descriptions — correct. Low effort, high leverage, directly addresses the root cause.
  • Few-shot examples — wrong. Adds token overhead without fixing why the model is confused. You are treating symptoms, not the disease.
  • Routing classifier — wrong. Over-engineered as a first step. Bypasses the LLM's natural language understanding and adds infrastructure complexity.
  • Tool consolidation — wrong as a first step. It is a valid architectural choice long-term, but requires significantly more effort than expanding descriptions.

The exam consistently favours low-effort, high-leverage fixes. Better descriptions before routing classifiers. Scoped access before full access. Community servers before custom builds.

Tool Splitting

Generic tools with broad responsibilities create ambiguity. The fix is to split them into purpose-specific tools with defined input/output contracts.

Before splitting:

analyze_document: "Analyses a document and returns results"

After splitting:

extract_data_points: "Extracts structured data fields (dates, amounts, names) from a document"

summarize_content: "Produces a concise summary of a document's key arguments and conclusions"

verify_claim_against_source: "Checks whether a specific claim is supported by the source document, returning supporting/contradicting evidence"

Each resulting tool has a narrow, clearly described purpose. The model can select the right one based on what the user actually needs.
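The split can also be expressed as explicit input/output contracts. The descriptions come from the example above; the schema fields are illustrative assumptions about what each split tool might accept and return.

```python
# The document split written out as narrow contracts per tool.
# Input/output fields are assumed for illustration.
SPLIT_TOOLS = {
    "extract_data_points": {
        "description": "Extracts structured data fields (dates, amounts, names) from a document",
        "input": {"document": "string", "fields": "list of field names to extract"},
        "output": "mapping of field name -> extracted value (or null if absent)",
    },
    "summarize_content": {
        "description": "Produces a concise summary of a document's key arguments and conclusions",
        "input": {"document": "string", "max_sentences": "integer, optional"},
        "output": "summary string",
    },
    "verify_claim_against_source": {
        "description": "Checks whether a specific claim is supported by the source document, returning supporting/contradicting evidence",
        "input": {"document": "string", "claim": "string"},
        "output": "verdict (supported / contradicted / not_found) plus evidence excerpts",
    },
}

# Each tool now answers exactly one kind of request, so "pull the invoice
# total" maps to extract_data_points and nothing else.
for name, contract in SPLIT_TOOLS.items():
    print(name, "->", sorted(contract["input"]))
```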

Tool Renaming for Clarity

When two tools have confusingly similar names, renaming eliminates functional overlap at the interface level. For example, renaming analyze_content to extract_web_results with a web-specific description makes the tool's purpose unambiguous without changing its implementation.

System Prompt Interactions

Keyword-sensitive instructions in system prompts can create unintended tool associations that override well-written descriptions. If your system prompt says "always check customer details before proceeding", the model may associate any customer-related query with get_customer regardless of what the tool descriptions say.

Always review system prompts for conflicts after updating tool descriptions. This is a subtle failure mode that the exam tests.
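One way to operationalise that review is a simple lint pass over the system prompt. Everything here (the sentence splitting, the trigger-word list, the matching on tool-name parts) is a hypothetical sketch, not a standard tool or API.

```python
import re

def find_prompt_conflicts(system_prompt: str, tool_names: list[str]) -> list[tuple[str, str]]:
    """Flag imperative sentences whose keywords echo a tool's name parts,
    which can steer the model toward that tool regardless of descriptions.
    Heuristic sketch only."""
    conflicts = []
    for sentence in re.split(r"(?<=[.!?])\s+", system_prompt):
        lowered = sentence.lower()
        # Only imperative, keyword-sensitive phrasings (assumed trigger words)
        if not lowered.startswith(("always", "never", "before")):
            continue
        for name in tool_names:
            # "get_customer" -> match on its meaningful parts ("customer")
            for part in name.split("_"):
                if len(part) > 3 and part in lowered:
                    conflicts.append((name, sentence.strip()))
    return conflicts

prompt = (
    "You are a support agent. "
    "Always check customer details before proceeding. "
    "Answer concisely."
)
print(find_prompt_conflicts(prompt, ["get_customer", "lookup_order"]))
```

Here the lint would surface the "Always check customer details" sentence as a conflict with get_customer, exactly the kind of keyword association that can override an otherwise well-written lookup_order description.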

Key Concept

Tool descriptions are the primary mechanism LLMs use for tool selection. When misrouting occurs, the first fix is always to improve descriptions — not to add few-shot examples, routing classifiers, or tool consolidation.

Exam Traps

EXAM TRAP

Choosing few-shot examples to fix tool misrouting caused by minimal descriptions

Few-shot examples add token overhead without addressing the root cause. The model is confused because descriptions do not differentiate the tools — fix the descriptions first.

EXAM TRAP

Implementing a routing classifier as the first step to fix tool selection

A routing classifier is over-engineered as a first response. It bypasses the LLM's natural language understanding and adds infrastructure the exam does not consider proportionate.

EXAM TRAP

Consolidating similar tools into one as the first step

Tool consolidation is a valid long-term architectural choice, but it requires more effort than expanding descriptions. The exam favours low-effort, high-leverage first steps.

EXAM TRAP

Ignoring system prompt wording after updating tool descriptions

Keyword-sensitive instructions in system prompts can silently override well-written tool descriptions, creating unintended tool associations.

Practice Scenario

Production logs show an agent frequently calls get_customer when users ask about orders (e.g. 'check my order #12345'), instead of calling lookup_order. Both tools have minimal descriptions ('Retrieves customer information' / 'Retrieves order details') and accept similar identifier formats. What is the most effective first step to improve tool selection reliability?

Build Exercise

Design Tool Descriptions That Eliminate Misrouting

Beginner
30 minutes

What you'll learn

  • Understand that tool descriptions are the primary mechanism LLMs use for tool selection
  • Write production-grade descriptions with purpose, inputs, examples, edge cases, and boundaries
  • Diagnose misrouting caused by ambiguous or overlapping descriptions
  • Identify system prompt conflicts that override well-written tool descriptions

  1. Create two MCP tools with intentionally ambiguous descriptions (e.g. get_customer: "Retrieves customer information" and lookup_order: "Retrieves order details")

    Why: Reproducing a misrouting scenario first-hand builds intuition for why minimal descriptions fail. The exam tests your ability to identify ambiguous descriptions as the root cause of tool selection errors.

    You should see: Two tool definitions registered with your MCP server, each having a single-sentence description that does not mention input formats, example queries, or boundaries.

  2. Test with 10 queries covering different user intents and log which tool the model selects for each

    Why: Quantifying selection accuracy before and after description changes gives you concrete evidence of the impact. The exam expects you to know that description quality directly affects selection reliability.

    You should see: A log showing at least 2-3 misrouted queries where the model selected get_customer for order-related queries or vice versa, demonstrating the ambiguity problem.

  3. Rewrite both descriptions to include: purpose, expected inputs with formats, example queries, edge cases, and explicit boundaries against the other tool

    Why: This is the core exam skill — the lowest-effort, highest-leverage fix for misrouting. Production-grade descriptions include all five elements: purpose, inputs, examples, edge cases, and boundaries.

    You should see: Each tool description is 3-5 sentences long, explicitly states accepted identifier formats, gives example queries, and includes a boundary statement like "Do NOT use for order-specific queries — use lookup_order for those."

  4. Re-run the same 10 queries and compare selection accuracy before and after

    Why: Measuring improvement validates that description quality is the root cause. The exam expects you to understand that better descriptions produce measurably better selection without any architectural changes.

    You should see: Selection accuracy improves to 9/10 or 10/10 correct, with previously misrouted queries now hitting the correct tool. A clear before/after comparison showing the improvement.

  5. Review your system prompt for keyword-sensitive instructions that could override the improved descriptions

    Why: System prompt conflicts are a subtle failure mode the exam tests. Keywords like "always check customer details" can create unintended tool associations that override even well-written descriptions.

    You should see: A list of any keyword-sensitive phrases in your system prompt that could trigger incorrect tool associations, along with rewritten versions that avoid the conflict.
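Steps 2 and 4 boil down to one measurement: selection accuracy over the same query set before and after the rewrite. A minimal sketch, with made-up log entries standing in for your real logs:

```python
# Compare the tool the model actually picked against the expected tool.
# Log entries are illustrative, not real production data.
def selection_accuracy(log: list[dict]) -> float:
    correct = sum(1 for entry in log if entry["selected"] == entry["expected"])
    return correct / len(log)

before = [
    {"query": "check my order #12345", "expected": "lookup_order", "selected": "get_customer"},
    {"query": "what's my loyalty tier?", "expected": "get_customer", "selected": "get_customer"},
    {"query": "where is my package?", "expected": "lookup_order", "selected": "get_customer"},
    {"query": "update my email", "expected": "get_customer", "selected": "get_customer"},
]
# After the description rewrite, every query hits the expected tool.
after = [dict(entry, selected=entry["expected"]) for entry in before]

print(f"before: {selection_accuracy(before):.0%}, after: {selection_accuracy(after):.0%}")
```

In your own run the "after" log comes from re-executing the same queries, not from copying the expected column; the point is a single comparable number per run.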

Sources