How to Get a Large Language Model to Output JSON

Outdated: OpenAI has since released JSON Mode for getting an LLM to output JSON. See: https://platform.openai.com/docs/guides/text-generation/json-mode

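Preface

For some time now we seem to have been riding a wave of artificial intelligence. Large language models (LLMs) are landing in more and more application scenarios and are gradually becoming an indispensable part of life and work. We all know that LLMs are good at answering and analyzing questions in natural language. To connect an LLM to business code, however, it also needs to output structured data such as JSON or YAML reliably.

We can of course stress the desired output format in the prompt, for example JSON: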

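Extract the following information from the given text and return it as a JSON object:
- name
- major
- school

The given text is:
"""
The student's name is Zhang San, major: Computer Science, from Tsinghua University
"""

The LLM may do what you expect and output the JSON structure you want, like this:

{
  "name": "Zhang San",
  "major": "Computer Science",
  "school": "Tsinghua University"
}

It may also wrap the JSON in extra descriptive text, as shown below. Output this unstable is hard to hand directly to code.

The following information was extracted from the given text:
```json
{
  "name": "Zhang San",
  "major": "Computer Science",
  "school": "Tsinghua University"
}
```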
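
One way to cope with a reply like the one above is to cut out the span between the first { and the last } before parsing it. The following is a minimal sketch of that idea rather than a robust parser; extractJsonObject and the sample reply are hypothetical names made up for illustration.

```typescript
// Pulls a JSON object out of a reply that may be wrapped in prose or a code fence.
// Returns undefined when no parsable object is found.
function extractJsonObject(reply: string): unknown {
  const start = reply.indexOf("{");
  const end = reply.lastIndexOf("}");
  if (start < 0 || end <= start) {
    return undefined;
  }
  try {
    return JSON.parse(reply.slice(start, end + 1));
  } catch {
    return undefined;
  }
}

// "\x60" is a backtick, so this is the three-backtick fence an LLM often adds.
const fence = "\x60\x60\x60";
const sampleReply = [
  "The following information was extracted from the given text:",
  fence + "json",
  '{ "name": "Zhang San", "major": "Computer Science", "school": "Tsinghua University" }',
  fence,
].join("\n");

const extracted = extractJsonObject(sampleReply) as {
  name: string;
  major: string;
  school: string;
};
```

This only recovers syntactically valid JSON. It does not check that the expected properties are present, and it will fail if the surrounding prose itself contains braces.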

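TypeChat

TypeChat is a Node.js package that Microsoft open-sourced to address the JSON output problem described above. Website: https://microsoft.github.io/TypeChat/ . Before talking to the LLM, you describe the schema of the data you need in TypeScript.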

export interface Student {
  // The student's name
  name: string;
  // The student's major
  major: string;
  // The name of the student's university
  school: string;
}

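Behind the scenes, TypeChat works in three steps:

1. Read the TypeScript types in the schema and fill them into a built-in prompt.
2. Send that prompt to the LLM together with the user's request, and wait for the LLM to finish replying.
3. Validate the LLM's reply against the definitions in the schema (the excerpt below builds its validator with createTypeScriptJsonValidator). If the reply does not conform, send one follow-up prompt built by createRepairPrompt to try to repair the JSON.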
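
The three steps above can be sketched roughly as follows. This is a simplified illustration and not TypeChat's actual implementation: the Model type, validateStudent and translateStudent are hypothetical names, and a hand-written check stands in for a real schema validator.

```typescript
// Hypothetical model interface: takes the conversation so far, returns the reply text.
type Message = { role: "user" | "assistant"; content: string };
type Model = (messages: Message[]) => Promise<string>;

interface Student {
  name: string;
  major: string;
  school: string;
}

// Schema text that is pasted into the prompt (step 1).
const studentSchema = `interface Student {
  name: string;
  major: string;
  school: string;
}`;

// Hand-written check standing in for a schema validator (step 3).
// Returns an error message, or undefined when the value is acceptable.
function validateStudent(value: unknown): string | undefined {
  if (typeof value !== "object" || value === null) {
    return "Expected a JSON object";
  }
  const record = value as Record<string, unknown>;
  for (const key of ["name", "major", "school"]) {
    if (typeof record[key] !== "string") {
      return `Property "${key}" must be a string`;
    }
  }
  return undefined;
}

async function translateStudent(model: Model, request: string): Promise<Student> {
  const messages: Message[] = [
    {
      role: "user",
      content:
        `Translate the user request into JSON matching this TypeScript type:\n` +
        `${studentSchema}\n` +
        `User request:\n${request}\n` +
        `JSON:`,
    },
  ];
  // One initial attempt plus at most one repair attempt.
  for (let attempt = 0; attempt < 2; attempt++) {
    const reply = await model(messages); // step 2
    let problem: string | undefined;
    try {
      const start = reply.indexOf("{");
      const end = reply.lastIndexOf("}");
      const parsed: unknown = JSON.parse(reply.slice(start, end + 1));
      problem = validateStudent(parsed);
      if (problem === undefined) {
        return parsed as Student;
      }
    } catch {
      problem = "Response is not valid JSON";
    }
    // Feed the error back to the model and ask for a corrected object.
    messages.push({ role: "assistant", content: reply });
    messages.push({
      role: "user",
      content: `The JSON is invalid: ${problem}\nReturn a revised JSON object:`,
    });
  }
  throw new Error("Model did not return a valid Student after one repair attempt");
}
```

Passing the validation error back to the model gives it something concrete to fix, and capping the loop at one repair keeps latency and cost bounded.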

export function createProgramTranslator(
  model: TypeChatLanguageModel,
  schema: string
): TypeChatJsonTranslator<Program> {
  const validator = createTypeScriptJsonValidator<Program>(schema, "Program");
  validator.createModuleTextFromJson = createModuleTextFromProgram;
  const translator = createJsonTranslator<Program>(model, validator);
  translator.createRequestPrompt = createRequestPrompt;
  translator.createRepairPrompt = createRepairPrompt;
  return translator;

  function createRequestPrompt(request: string) {
    return (
      `You are a service that translates user requests into programs represented as JSON using the following TypeScript definitions:\n` +
      `\`\`\`\n${programSchemaText}\`\`\`\n` +
      `The programs can call functions from the API defined in the following TypeScript definitions:\n` +
      `\`\`\`\n${validator.getSchemaText()}\`\`\`\n` +
      `The following is a user request:\n` +
      `"""\n${request}\n"""\n` +
      `The following is the user request translated into a JSON program object with 2 spaces of indentation and no properties with the value undefined:\n`
    );
  }

  function createRepairPrompt(validationError: string) {
    return (
      `The JSON program object is invalid for the following reason:\n` +
      `"""\n${validationError}\n"""\n` +
      `The following is a revised JSON program object:\n`
    );
  }
}

async function translate(
  request: string,
  promptPreamble?: string | PromptSection[]
) {
  const preamble: PromptSection[] =
    typeof promptPreamble === "string"
      ? [{ role: "user", content: promptPreamble }]
      : promptPreamble ?? [];
  let prompt: PromptSection[] = [
    ...preamble,
    { role: "user", content: typeChat.createRequestPrompt(request) },
  ];
  let attemptRepair = typeChat.attemptRepair;
  while (true) {
    const response = await model.complete(prompt);
    if (!response.success) {
      return response;
    }
    const responseText = response.data;
    const startIndex = responseText.indexOf("{");
    const endIndex = responseText.lastIndexOf("}");
    if (!(startIndex >= 0 && endIndex > startIndex)) {
      return error(`Response is not JSON:\n${responseText}`);
    }
    const jsonText = responseText.slice(startIndex, endIndex + 1);
    let jsonObject;
    try {
      jsonObject = JSON.parse(jsonText) as object;
    } catch (e) {
      return error(e instanceof SyntaxError ? e.message : "JSON parse error");
    }
    if (typeChat.stripNulls) {
      stripNulls(jsonObject);
    }
    const schemaValidation = validator.validate(jsonObject);
    const validation = schemaValidation.success
      ? typeChat.validateInstance(schemaValidation.data)
      : schemaValidation;
    if (validation.success) {
      return validation;
    }
    if (!attemptRepair) {
      return error(
        `JSON validation failed: ${validation.message}\n${jsonText}`
      );
    }
    prompt.push({ role: "assistant", content: responseText });
    prompt.push({
      role: "user",
      content: typeChat.createRepairPrompt(validation.message),
    });
    attemptRepair = false;
  }
}

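As the code shows, TypeChat's approach is quite clever: it resembles Function Calling, and it handles validation and a repair retry behind the scenes. Whether JSON comes out reliably, however, still depends on the capability of the LLM itself. With a weaker model, createRepairPrompt may fire frequently or the task may fail outright, which hurts the end-user experience. The suggestions in the "TypeChat - Tips" chapter can help here.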

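In addition, if you want fewer retries and failures and do not depend heavily on JSON, consider having the model output YAML or even Markdown instead. Compared with JSON, these formats are more forgiving: a single wrong symbol does not make extraction and validation of the whole output fail. The downside is that parsing and validating them costs more than it does for JSON.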
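
To illustrate the difference in tolerance, here is a minimal sketch that reads flat "key: value" lines. It covers only a tiny YAML-like subset and is not a real YAML parser; parseKeyValueLines and the sample output are hypothetical and exist only for this example.

```typescript
// Parses flat "key: value" lines (a tiny YAML-like subset, not a real YAML parser).
// Lines that do not match are collected instead of failing the whole parse.
function parseKeyValueLines(text: string): {
  fields: Record<string, string>;
  skipped: string[];
} {
  const fields: Record<string, string> = {};
  const skipped: string[] = [];
  for (const rawLine of text.split("\n")) {
    const line = rawLine.trim();
    if (line === "") continue;
    const match = /^([A-Za-z_][\w-]*)\s*:\s*(.+)$/.exec(line);
    if (match) {
      fields[match[1]] = match[2];
    } else {
      skipped.push(line);
    }
  }
  return { fields, skipped };
}

const lenientOutput = [
  "Here is the extracted information:",
  "name: Zhang San",
  "major: Computer Science",
  "school Tsinghua University", // colon missing: only this line is lost
].join("\n");

const lenientResult = parseKeyValueLines(lenientOutput);
// lenientResult.fields holds name and major; lenientResult.skipped holds the other two lines.
```

By contrast, a single missing quote or comma in a JSON reply makes JSON.parse throw for the entire text. The price, as noted above, is that the caller has to decide what to do with skipped lines and missing fields.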

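OpenAI has of course considered this need as well and released JSON Mode to make the LLM output JSON. It does not, however, support constraining the output with a JSON Schema, so for now its practical value seems limited.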
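
As a rough sketch, enabling JSON Mode comes down to setting response_format in the Chat Completions request body. The model name below is an assumption (use any model that supports JSON Mode), and buildJsonModeRequest and hasStudentShape are hypothetical helpers. Since JSON Mode only promises syntactically valid JSON, the shape of the result still has to be checked separately.

```typescript
// Builds a request body for OpenAI's Chat Completions endpoint with JSON Mode enabled.
// The model name is an assumption; JSON Mode needs a model that supports it.
function buildJsonModeRequest(text: string) {
  return {
    model: "gpt-3.5-turbo-1106",
    response_format: { type: "json_object" },
    messages: [
      {
        role: "system",
        // JSON Mode expects the instructions to ask for JSON explicitly.
        content: "Extract name, major and school from the user's text. Reply in JSON.",
      },
      { role: "user", content: text },
    ],
  };
}

// Manual shape check, needed because JSON Mode does not enforce a schema.
function hasStudentShape(
  value: unknown
): value is { name: string; major: string; school: string } {
  if (typeof value !== "object" || value === null) return false;
  const record = value as Record<string, unknown>;
  return ["name", "major", "school"].every((key) => typeof record[key] === "string");
}

const jsonModeBody = buildJsonModeRequest(
  "The student's name is Zhang San, major: Computer Science, from Tsinghua University"
);
```

The body would be sent as JSON in a POST to https://api.openai.com/v1/chat/completions with an Authorization header carrying the API key, and the parsed message content would then be passed through hasStudentShape before use.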