Structured Output and Tool Calling: Making an LLM Act on Your Systems

A chatbot that only talks is a novelty. The moment an AI feature earns its keep is when it does something: looks up a real order, books a real slot, files a real ticket, updates a real record. To get there you need two things the casual demo skips: getting structured data out of the model reliably, and letting it call your code safely. Done well, this is what turns a clever conversation into an actual assistant.

This guide covers both. It explains why you cannot just parse the model's prose, how to get validated structured output, how tool calling lets the model use your functions while your code stays in charge, and the guardrails that keep it from doing something it should not. The examples target a Node or Vercel app.

Why "just parse the text" fails

The tempting first approach is to ask the model for an answer and pick the pieces out of its reply with string matching or a regex. It works in testing and breaks in production, because natural language is not a stable format. The model phrases things differently each time, adds a friendly sentence you did not expect, or formats a date three ways across three runs. Any parser built on prose is built on sand. You need the model to emit a defined structure, and you need to verify it.
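To make the failure concrete, here is a small sketch. The three replies are invented for illustration, and the regex is the kind of parser you might write after seeing only the first of them.

```typescript
// Three replies a model might plausibly give to the same request.
// The wording is invented for illustration.
const replies: string[] = [
  "Your order ID is 48213.",
  "Sure! I found order #48213 for you.",
  "Happy to help. The order (48213) shipped on 3 March.",
];

// A parser written against the first phrasing only.
const orderIdPattern = /order ID is (\d+)/;

function extractOrderId(reply: string): string | null {
  const match = orderIdPattern.exec(reply);
  return match?.[1] ?? null;
}

const extracted = replies.map(extractOrderId);
console.log(extracted); // → ["48213", null, null]
```

The same order id is present in all three replies, but only one phrasing survives the parser. Loosening the pattern helps until the next rewording arrives.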

Structured output: ask for a schema, then validate

Modern models can return JSON constrained to a schema you specify, which removes most of the guesswork. Define the exact shape you want, request it, and then validate the result anyway before trusting it, because constrained output reduces errors but does not eliminate them. On a validation failure, retry or fall back rather than letting malformed data flow downstream.

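import { z } from "zod";

// The exact shape we expect back from the model.
const Extract = z.object({
  intent: z.enum(["track_order", "return", "question"]),
  orderId: z.string().optional(),
});

const raw = await model.chat({ messages, responseFormat: { schema: Extract } });
let parsed;
try {
  parsed = Extract.safeParse(JSON.parse(raw));   // never skip this step
} catch {
  return retryOrFallback();                       // JSON.parse throws on malformed JSON
}
if (!parsed.success) return retryOrFallback();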

This single discipline, a defined schema plus validation, is what lets the rest of your code rely on the model's output the way it relies on any other typed function.
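One way the retry-or-fall-back step might look is sketched below. To keep it free of dependencies, parseExtracted is a hand-written stand-in for the schema check. The CallModel signature, the two stub models, and the choice to fall back to a general "question" intent are all assumptions made for illustration, not part of any particular SDK.

```typescript
type Intent = "track_order" | "return" | "question";

interface Extracted {
  intent: Intent;
  orderId?: string;
}

const INTENTS: readonly Intent[] = ["track_order", "return", "question"];

function isIntent(value: unknown): value is Intent {
  return typeof value === "string" && (INTENTS as readonly string[]).includes(value);
}

// Hand-written stand-in for schema validation: returns null on any mismatch.
function parseExtracted(raw: string): Extracted | null {
  let data: unknown;
  try {
    data = JSON.parse(raw); // throws on malformed JSON
  } catch {
    return null;
  }
  if (typeof data !== "object" || data === null) return null;
  const rec = data as Record<string, unknown>;
  const intent = rec["intent"];
  const orderId = rec["orderId"];
  if (!isIntent(intent)) return null;
  if (orderId === undefined) return { intent };
  return typeof orderId === "string" ? { intent, orderId } : null;
}

// Hypothetical signature: takes the attempt number, resolves to the raw reply.
type CallModel = (attempt: number) => Promise<string>;

async function extractWithRetry(callModel: CallModel, maxAttempts = 2): Promise<Extracted> {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const parsed = parseExtracted(await callModel(attempt));
    if (parsed) return parsed;
  }
  // Assumed fallback: treat the message as a general question.
  return { intent: "question" };
}

// Stub model: prose on the first attempt, valid JSON on the second.
const flakyModel: CallModel = async (attempt) =>
  attempt === 1
    ? "Sure! Here is the JSON you asked for."
    : '{"intent":"track_order","orderId":"48213"}';

// Stub model that never returns an allowed intent.
const brokenModel: CallModel = async () => '{"intent":"cancel_everything"}';
```

With these stubs, extractWithRetry(flakyModel) resolves to the track_order result on the second attempt, and extractWithRetry(brokenModel) resolves to the fallback. Either way, the caller only ever sees a value of type Extracted.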

Tool calling: letting the model use your functions

Tool calling (also called function calling) flips the interaction. Instead of the model answering directly, you describe a set of functions it may use, each with a name, a description, and a parameter schema. The model reads the conversation, decides which tool fits and with what arguments, and returns that choice as structured data. Crucially, the model does not run anything. It proposes, and your code executes. That separation is the whole safety model: the model decides, your application acts.

const tools = [{
  name: "get_order_status",
  description: "Look up the current status of a customer order by its id.",
  parameters: {
    type: "object",
    properties: { orderId: { type: "string" } },
    required: ["orderId"],
  },
}];
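Because the model's choice comes back as plain data, you can check it against the tool definition before anything runs. The sketch below is a deliberately small checker that handles only flat objects with primitive fields. The ToolDef type and checkArguments are hypothetical names, and a real project would more likely use a schema library.

```typescript
type PrimitiveType = "string" | "number" | "boolean";

interface ToolDef {
  name: string;
  description: string;
  parameters: {
    type: "object";
    properties: Record<string, { type: PrimitiveType }>;
    required: string[];
  };
}

const getOrderStatus: ToolDef = {
  name: "get_order_status",
  description: "Look up the current status of a customer order by its id.",
  parameters: {
    type: "object",
    properties: { orderId: { type: "string" } },
    required: ["orderId"],
  },
};

// Returns a list of problems; an empty list means the arguments are acceptable.
function checkArguments(tool: ToolDef, args: unknown): string[] {
  if (typeof args !== "object" || args === null || Array.isArray(args)) {
    return ["arguments must be an object"];
  }
  const rec = args as Record<string, unknown>;
  const problems: string[] = [];
  for (const key of tool.parameters.required) {
    if (!(key in rec)) problems.push(`missing required field: ${key}`);
  }
  for (const key of Object.keys(rec)) {
    const spec = tool.parameters.properties[key];
    if (!spec) {
      problems.push(`unexpected field: ${key}`);
    } else if (typeof rec[key] !== spec.type) {
      problems.push(`${key} should be ${spec.type}`);
    }
  }
  return problems;
}

const okProblems = checkArguments(getOrderStatus, { orderId: "A100" });
const missingProblems = checkArguments(getOrderStatus, {});
const wrongTypeProblems = checkArguments(getOrderStatus, { orderId: 42 });
console.log(okProblems, missingProblems, wrongTypeProblems);
```

Rejecting unexpected fields is a design choice in this sketch. It keeps a proposed call from smuggling in parameters the tool never declared.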

Design tools like an API you would hand a junior developer

The model is only as safe and capable as the tools you give it, so design them with care. Keep each one small and single-purpose, name it clearly, and describe precisely what it does and when to use it, because the description is how the model decides. Validate every argument before acting, make tools idempotent where you can, and give each the least privilege it needs. A tool called refund_order should refund one order with checks, not expose raw database access. If you would not hand a function to a new developer without guardrails, do not hand it to the model.
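As a rough illustration of those properties, here is what a narrow refund_order tool could look like. The in-memory orders map, the order id format, and the result shape are assumptions made for the sketch. A real version would call your payment provider and order store.

```typescript
interface Order {
  id: string;
  paid: boolean;
  refunded: boolean;
}

type RefundResult =
  | { ok: true; orderId: string; alreadyRefunded: boolean }
  | { ok: false; error: string };

// In-memory stand-in for an order store (hypothetical data).
const orders = new Map<string, Order>([
  ["A100", { id: "A100", paid: true, refunded: false }],
]);

let refundsIssued = 0; // counts real side effects, for demonstration

// Accept only the one field this tool needs, in the assumed id format.
function validateRefundArgs(input: unknown): { orderId: string } | null {
  if (typeof input !== "object" || input === null) return null;
  const orderId = (input as Record<string, unknown>)["orderId"];
  if (typeof orderId !== "string" || !/^[A-Z]\d{3}$/.test(orderId)) return null;
  return { orderId };
}

// Single purpose: refund exactly one order, with checks, and nothing else.
function refundOrder(input: unknown): RefundResult {
  const args = validateRefundArgs(input);
  if (!args) return { ok: false, error: "invalid arguments" };

  const order = orders.get(args.orderId);
  if (!order) return { ok: false, error: "order not found" };
  if (!order.paid) return { ok: false, error: "order was never paid" };

  // Idempotent: a repeated call reports success without refunding twice.
  if (order.refunded) return { ok: true, orderId: order.id, alreadyRefunded: true };

  order.refunded = true;
  refundsIssued += 1;
  return { ok: true, orderId: order.id, alreadyRefunded: false };
}

const first = refundOrder({ orderId: "A100" });
const second = refundOrder({ orderId: "A100" });
const bad = refundOrder({ orderId: "A100; DROP TABLE orders" });
console.log(first, second, bad, refundsIssued);
```

The tool takes an order id and nothing else, so the model never gets to supply a query, an amount, or a customer. The second call succeeds without issuing a second refund, which matters because a retry inside the execution loop is always possible.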

The execution loop

Putting it together is a loop. The model returns either a final answer or a tool call. When it calls a tool, you validate the arguments, run the function, feed the result back, and let the model continue with that new information, repeating until it produces an answer.

let res = await model.chat({ messages, tools });
while (res.toolCall) {
  const args = ToolArgs[res.toolCall.name].parse(res.toolCall.arguments); // validate first
  const result = await runTool(res.toolCall.name, args);                  // your code acts
  messages.push(toolResult(res.toolCall.id, result));
  res = await model.chat({ messages, tools });                            // model continues
}
return res.text;

Bound the loop so a confused model cannot spin forever, and log each tool call so you can see what the assistant actually did.
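A bounded, logged version of the loop might look like the sketch below. The ChatFn and ToolFn types, the MAX_STEPS value, and the two stub models are assumptions standing in for a real client. Each tool here is expected to validate its own arguments.

```typescript
interface ToolCall { id: string; name: string; arguments: unknown }
interface ModelReply { text?: string; toolCall?: ToolCall }
interface Message { role: "user" | "assistant" | "tool"; content: string; toolCallId?: string }

// Hypothetical signatures standing in for a real model client and tool set.
type ChatFn = (messages: Message[]) => Promise<ModelReply>;
type ToolFn = (args: unknown) => Promise<unknown>;

const MAX_STEPS = 5;
const GAVE_UP = "Sorry, I could not complete that request.";

async function runAssistant(
  chat: ChatFn,
  tools: Map<string, ToolFn>,
  messages: Message[],
  log: (entry: string) => void = console.log,
): Promise<string> {
  for (let step = 1; step <= MAX_STEPS; step++) {
    const res = await chat(messages);
    if (!res.toolCall) return res.text ?? ""; // final answer

    const { id, name, arguments: args } = res.toolCall;
    log(`step ${step}: ${name} ${JSON.stringify(args)}`); // audit trail

    const tool = tools.get(name);
    const result = tool ? await tool(args) : { error: `unknown tool: ${name}` };
    messages.push({ role: "tool", toolCallId: id, content: JSON.stringify(result) });
  }
  return GAVE_UP; // step budget exhausted
}

const demoTools = new Map<string, ToolFn>([
  ["get_order_status", async () => ({ status: "shipped" })],
]);

// Stub: a confused model that proposes the same call forever.
const confusedModel: ChatFn = async () => ({
  toolCall: { id: "call_1", name: "get_order_status", arguments: { orderId: "A100" } },
});

// Stub: a well-behaved model that calls the tool once, then answers.
const normalModel: ChatFn = async (messages) =>
  messages.some((m) => m.role === "tool")
    ? { text: "Your order has shipped." }
    : { toolCall: { id: "call_1", name: "get_order_status", arguments: { orderId: "A100" } } };
```

With confusedModel the loop stops after MAX_STEPS logged calls and returns the give-up message. With normalModel it returns the answer after a single tool call.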

Guardrails and keeping humans in the loop

Validation of arguments is the baseline, not the finish line. Maintain an allowlist of what the model may call, and for anything irreversible or sensitive (charging a card, deleting data, emailing a customer), require an explicit confirmation step rather than letting the model act autonomously. A good pattern is to let the assistant handle reads and low-risk actions on its own and to route high-stakes actions to human approval before they run. The model proposing an action and a person approving it is both safer and, often, exactly the workflow users want.
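One possible shape for that policy is a per-tool table consulted at execution time. The tool names and risk levels below are examples, and how approval is actually collected (a UI prompt, a review queue) is left out of the sketch.

```typescript
type Risk = "auto" | "needs_approval";

// The allowlist: any tool not listed here can never run.
const TOOL_POLICY: ReadonlyMap<string, Risk> = new Map<string, Risk>([
  ["get_order_status", "auto"],
  ["refund_order", "needs_approval"],
  ["email_customer", "needs_approval"],
]);

type Decision =
  | { action: "run" }
  | { action: "ask_human"; reason: string }
  | { action: "reject"; reason: string };

// Decide what to do with a proposed call before executing anything.
function decide(toolName: string, approvedByHuman: boolean): Decision {
  const risk = TOOL_POLICY.get(toolName);
  if (risk === undefined) {
    return { action: "reject", reason: `tool not on allowlist: ${toolName}` };
  }
  if (risk === "needs_approval" && !approvedByHuman) {
    return { action: "ask_human", reason: `${toolName} requires confirmation` };
  }
  return { action: "run" };
}

console.log(decide("get_order_status", false).action); // → run
console.log(decide("refund_order", false).action);     // → ask_human
console.log(decide("refund_order", true).action);      // → run
console.log(decide("drop_database", true).action);     // → reject
```

Because the check lives in your execution code, no prompt wording can talk the model past it. An unlisted tool is rejected even when a human has approved the call.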

A structured-output and tool-calling checklist

Never parse the model's prose; request structured output against a schema and validate it.
Retry or fall back on validation failure instead of passing malformed data on.
Describe tools clearly; the description is how the model chooses correctly.
Keep tools small, single-purpose, validated, idempotent, and least-privilege.
Run the model-then-execute loop in your code: the model proposes, you act.
Bound the loop and log every tool call.
Allowlist tools, and require human confirmation for irreversible or sensitive actions.

FAQ

How do I get reliable JSON out of an LLM?

Ask for it with a defined schema using the model's structured-output mode, then validate the result against that schema in your code before using it. Constrained output greatly reduces malformed responses but does not eliminate them, so the validation step is not optional. On a failure, retry the call or fall back, rather than letting unverified data flow into the rest of your system. Never extract structured data by parsing the model's prose.

What is tool calling and how is it different from just asking the model?

With tool calling you describe functions the model may use, each with a name, a description, and a parameter schema, and instead of answering directly the model returns a structured request to call one with specific arguments. Your code, not the model, runs the function and feeds the result back. That separation is the point: the model decides what to do and your application controls whether and how it actually happens, which is what makes acting on real systems safe.

How do I stop the model from doing something dangerous with a tool?

Layer the guardrails. Validate every argument against a schema, keep an allowlist of callable tools, give each tool the least privilege it needs, and make irreversible or sensitive actions require explicit human confirmation rather than running autonomously. Because your code executes tools, not the model, you hold the final say on every action, so put the real checks at execution time instead of trusting the model to behave.

Should the AI execute actions on its own or ask first?

Let it act autonomously on reads and low-risk operations where a mistake is cheap, and require human approval for anything irreversible or high-stakes like refunds, deletions, or outbound messages. The model proposing an action and a person confirming it is both safer and frequently the workflow users actually prefer. Decide per tool where the line sits, and enforce it in your execution code rather than in the prompt.

Why not just use a regex to pull data from the response?

Because natural language is not a stable format. The model rephrases, adds extra words, and formats values differently across runs, so any parser built on its prose breaks as soon as the wording shifts. Structured output with schema validation gives you a defined, checkable shape your code can rely on like any typed function, which is the difference between a feature that works in a demo and one that holds up in production.

If you want an AI feature that does real work in your systems rather than just chatting, tell me what it should be able to do and I will map out the tools and guardrails to build it safely.
