Structured Output: Getting LLMs to Return Valid JSON

LLMs Generating JSON: Why Does It Keep Going Wrong?

Give an LLM a simple task: return JSON containing a list of products. The result: a missing field, a wrong type, a trailing comma. Try again and the JSON is valid, but the schema is completely different.

JSON is text, and the model generates text one token at a time. Nothing guarantees that every response matches the schema.
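To make those two failure modes concrete, here is a minimal TypeScript sketch. The sample replies and the `tryParseJson` helper are made up for illustration; only `JSON.parse` is a real API.

```typescript
// Hypothetical helper: wraps JSON.parse so a malformed reply does not throw.
type ParseResult =
  | { ok: true; value: unknown }
  | { ok: false; error: string };

function tryParseJson(text: string): ParseResult {
  try {
    return { ok: true, value: JSON.parse(text) };
  } catch (err) {
    return { ok: false, error: err instanceof Error ? err.message : String(err) };
  }
}

// Made-up model replies, one per failure mode.
const trailingComma =
  '{"products": [{"name": "iPhone 16 Pro", "price": 29000000},]}';
const wrongShape =
  '{"items": [{"title": "iPhone 16 Pro", "cost": "29 million"}]}';

// Syntax error: the trailing comma makes JSON.parse throw, so ok is false.
const first = tryParseJson(trailingComma);

// Parses without error, but the keys and types are not the ones requested.
const second = tryParseJson(wrongShape);
```

The second case is the more dangerous one: nothing throws, so the mismatch only surfaces later unless the parsed value is checked against a schema.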

Structured output is the solution: use code to force the model to follow a schema.

Tool Calling: The Most Reliable Approach

With tool calling, instead of asking the model to write JSON, the developer defines a function with a concrete schema. The model then calls that function with arguments in the expected format.

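import OpenAI from "openai";

const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment

const tool: OpenAI.Chat.ChatCompletionTool = {
  type: "function",
  function: {
    name: "extract_products",
    description: "Extract products from text",
    parameters: {
      type: "object",
      properties: {
        products: {
          type: "array",
          items: {
            type: "object",
            properties: {
              name: { type: "string" },
              price: { type: "number" },
            },
            required: ["name", "price"],
          },
        },
      },
      required: ["products"],
    },
  },
};

const completion = await openai.chat.completions.create({
  model: "gpt-4o",
  // "29 triệu" is Vietnamese for 29 million (VND)
  messages: [{ role: "user", content: "iPhone 16 Pro 29 triệu" }],
  tools: [tool], // tool already carries the { type, function } wrapper
  tool_choice: { type: "function", function: { name: "extract_products" } },
});

// Guard instead of a non-null assertion: fail loudly if no function call came back
const toolCall = completion.choices[0].message.tool_calls?.[0];
if (toolCall?.type !== "function") {
  throw new Error("Model did not call extract_products");
}
const result = JSON.parse(toolCall.function.arguments);
// e.g. { products: [{ name: "iPhone 16 Pro", price: 29000000 }] }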

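tool_choice forces the model to call the named function, so the reply arrives as function arguments rather than free text. On its own this does not guarantee that the arguments match the schema; on OpenAI, strict mode (strict: true on the function definition) is what enforces it, so validate the parsed result either way.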

Zod: Type-Safe End to End

Hand-written schema objects are error-prone. Zod validates the output at runtime and gives you a type-safe schema:

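import { z } from "zod";

const ProductSchema = z.object({
  name: z.string(),
  price: z.number(),
  category: z.string().optional(),
});

const ResponseSchema = z.object({
  products: z.array(ProductSchema),
});

// Sample value; in practice this is the JSON string from the tool call arguments
const toolCallArguments =
  '{"products":[{"name":"iPhone 16 Pro","price":29000000}]}';

// Validate: if the model returns the wrong type, parse() throws a ZodError at runtime
const result = ResponseSchema.parse(JSON.parse(toolCallArguments));
result.products[0].price; // number, type-safe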
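When adding a dependency is not an option, roughly the same check can be written as a hand-rolled type guard. The sketch below mirrors the product schema above; the `Product` and `ProductResponse` interfaces and the `is...` helpers are hypothetical names, and it deliberately covers only these three fields.

```typescript
interface Product {
  name: string;
  price: number;
  category?: string;
}

interface ProductResponse {
  products: Product[];
}

// Narrow unknown to a plain object so its properties can be inspected.
function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

function isProduct(value: unknown): value is Product {
  return (
    isRecord(value) &&
    typeof value.name === "string" &&
    typeof value.price === "number" &&
    (value.category === undefined || typeof value.category === "string")
  );
}

function isProductResponse(value: unknown): value is ProductResponse {
  return (
    isRecord(value) &&
    Array.isArray(value.products) &&
    value.products.every(isProduct)
  );
}

// Usage: the parsed value is unknown until the guard narrows it.
const parsed: unknown = JSON.parse(
  '{"products":[{"name":"iPhone 16 Pro","price":29000000}]}'
);
const firstPrice = isProductResponse(parsed) ? parsed.products[0].price : null;
```

The trade-off is that the interface and the guard have to be kept in sync by hand, which is exactly the duplication a schema library removes.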

Anthropic Claude

Anthropic supports structured output through tool calling:

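import Anthropic from "@anthropic-ai/sdk";

const anthropic = new Anthropic(); // reads ANTHROPIC_API_KEY from the environment

const response = await anthropic.messages.create({
  model: "claude-sonnet-4-20250514",
  max_tokens: 1024, // required by the Messages API
  messages: [{ role: "user", content: "Phân tích sentiment: 'Sản phẩm tuyệt vời'" }],
  tools: [{
    name: "sentiment_analysis",
    description: "Record the sentiment of the given text",
    input_schema: {
      type: "object",
      properties: {
        sentiment: { type: "string", enum: ["positive", "negative", "neutral"] },
        confidence: { type: "number" },
      },
      required: ["sentiment", "confidence"],
    },
  }],
  // Force the model to call this tool instead of replying in prose
  tool_choice: { type: "tool", name: "sentiment_analysis" },
});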
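The structured result comes back as a tool_use block inside the response's content array, alongside any text blocks. Below is a minimal sketch of pulling it out. The `ContentBlock` type is a simplified stand-in for the SDK's own types (which include more block kinds), and `findToolInput` plus the sample data are hypothetical.

```typescript
// Simplified mirror of the Messages API content blocks; not the SDK's full type.
type ContentBlock =
  | { type: "text"; text: string }
  | { type: "tool_use"; id: string; name: string; input: unknown };

// Return the input of the first tool_use block for the given tool, if any.
function findToolInput(content: ContentBlock[], toolName: string): unknown {
  for (const block of content) {
    if (block.type === "tool_use" && block.name === toolName) {
      return block.input;
    }
  }
  return undefined;
}

// Made-up response content for illustration.
const sampleContent: ContentBlock[] = [
  { type: "text", text: "Analyzing the review." },
  {
    type: "tool_use",
    id: "toolu_example",
    name: "sentiment_analysis",
    input: { sentiment: "positive", confidence: 0.95 },
  },
];

const sentimentInput = findToolInput(sampleContent, "sentiment_analysis");
```

Unlike the OpenAI response, `input` is already a parsed object rather than a JSON string, so there is no `JSON.parse` step, but it should still be validated before use.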

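When to Use What?

Tool calling: the default. The most reliable option; providers that offer a strict mode enforce the schema on their side.
JSON mode: for simple JSON with a loosely constrained schema. Faster, but the guarantee is partial: you get valid JSON, not necessarily JSON that matches your schema.
System prompt + few-shot: the fallback when the model does not support tool calling (small or local models). The least reliable option.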
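With the prompt-only fallback, the model often wraps its JSON in extra prose, so the reply needs cleaning before it can be parsed. A rough sketch of one way to do that follows; `extractJsonObject` is a hypothetical helper, and taking the span from the first `{` to the last `}` is a heuristic that assumes the reply contains a single JSON object.

```typescript
// Heuristic: take the text between the first "{" and the last "}" and try to parse it.
// Returns undefined when no parsable object is found.
function extractJsonObject(reply: string): unknown {
  const start = reply.indexOf("{");
  const end = reply.lastIndexOf("}");
  if (start === -1 || end <= start) {
    return undefined;
  }
  try {
    return JSON.parse(reply.slice(start, end + 1));
  } catch (_err) {
    return undefined;
  }
}

// Made-up reply from a model that ignored the "JSON only" instruction.
const chattyReply =
  'Here is the result: {"sentiment":"positive","confidence":0.9} Hope this helps!';

const extracted = extractJsonObject(chattyReply);
```

This only recovers the syntax. The extracted value still has to pass schema validation, which is why this path sits at the bottom of the reliability ranking.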

Practical Tips

Always validate the output: use Zod or a type guard. Even a strong model can return the wrong thing on an edge case.
Keep the schema simple: the more complex the schema, the more likely the model gets it wrong. Split it into several tools if you need deep nesting.
Describe fields clearly: a description in the schema helps the model understand what to return.
Low temperature: temperature: 0 makes the output more stable. Creativity is not the goal here.
Retry on parse errors: send the error back to the model. In most cases it corrects itself within 1-2 attempts.
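The validate-and-retry tips can be combined into one loop. The sketch below is provider-agnostic and built on assumptions: `ModelCall` is a hypothetical wrapper around whichever SDK you use, messages are simplified to plain strings, and the default of three attempts is arbitrary.

```typescript
// Hypothetical abstraction over a provider SDK: takes the conversation so far, returns the reply text.
type ModelCall = (messages: string[]) => Promise<string>;

async function generateWithRetry<T>(
  callModel: ModelCall,
  prompt: string,
  validate: (value: unknown) => value is T,
  maxAttempts = 3
): Promise<T> {
  const messages = [prompt];
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const reply = await callModel(messages);
    let problem: string;
    try {
      const parsed: unknown = JSON.parse(reply);
      if (validate(parsed)) {
        return parsed; // narrowed to T by the type guard
      }
      problem = "JSON parsed but did not match the expected schema";
    } catch (err) {
      problem = err instanceof Error ? err.message : String(err);
    }
    // Feed the failure back so the next attempt can correct it.
    messages.push(
      reply,
      `Your previous reply was rejected: ${problem}. Reply with corrected JSON only.`
    );
  }
  throw new Error(`No valid output after ${maxAttempts} attempts`);
}
```

Capping the attempts matters: a model that keeps failing on the same input is usually a sign that the schema or the prompt needs changing, not that more retries will help.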

This article is part 5 of the AI For Developers series.
