Formatting LLM Output as JSON

How to parse JSON output

Prerequisites

This guide assumes familiarity with the following concepts:

Chat Models
Output Parsers
Prompt Templates
Structured Output
Chaining Runnables Together

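Why parse JSON output?

Some model providers have built-in support for returning structured output, but not all do. An output parser fills that gap: it lets you specify an arbitrary JSON schema through the prompt, query the model for output that conforms to that schema, and finally parse that output as JSON.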

Note: large language models are leaky abstractions! You need an LLM with sufficient capacity to generate well-formed JSON.
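Because a model may wrap its answer in a code fence or stop mid-object, it helps to see what defensive parsing looks like. The following is a minimal, stdlib-only sketch under that assumption; `extract_json` is a hypothetical helper written for illustration and is not LangChain's implementation.

```python
import json
import re

# Matches an optional ```json ... ``` wrapper around the payload.
_FENCE = re.compile(r"```(?:json)?\s*(.*?)\s*```", re.DOTALL)


def extract_json(text: str):
    """Return the parsed JSON found in `text`, or None if nothing parses."""
    match = _FENCE.search(text)
    candidate = match.group(1) if match else text.strip()
    try:
        return json.loads(candidate)
    except json.JSONDecodeError:
        return None


clean = extract_json('{"setup": "a", "punchline": "b"}')
fenced = extract_json('Sure!\n```json\n{"setup": "a", "punchline": "b"}\n```')
broken = extract_json('{"setup": "a", "punchline": ')

print(clean)
print(fenced)
print(broken)
```

Returning None on failure keeps the sketch short; in a real pipeline you might instead raise an error or retry the model call.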

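JsonOutputParser is a built-in option for prompting for JSON output and then parsing it. It is similar in functionality to PydanticOutputParser, but it also supports streaming back partial JSON objects.

The following example uses it together with Pydantic to conveniently declare the expected schema:

Install dependencies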

bash
%pip install -qU langchain langchain-openai

Example code

python
import os
from getpass import getpass

from langchain_core.output_parsers import JsonOutputParser
from langchain_core.prompts import PromptTemplate
from langchain_openai import ChatOpenAI
from pydantic import BaseModel, Field

# Prompt for the OpenAI API key if it is not already set
if "OPENAI_API_KEY" not in os.environ:
    os.environ["OPENAI_API_KEY"] = getpass()

# Initialize the model
model = ChatOpenAI(temperature=0)

# Define the desired data structure
class Joke(BaseModel):
    setup: str = Field(description="question to set up a joke")
    punchline: str = Field(description="answer to resolve the joke")

# Define the query
joke_query = "Tell me a joke."

# Set up the parser and inject its instructions into the prompt template
parser = JsonOutputParser(pydantic_object=Joke)

prompt = PromptTemplate(
    template="Answer the user query.\n{format_instructions}\n{query}\n",
    input_variables=["query"],
    partial_variables={"format_instructions": parser.get_format_instructions()},
)

# Build the chain: prompt -> model -> parser
chain = prompt | model | parser

# Invoke the chain
result = chain.invoke({"query": joke_query})
print(result)

Example output

json
{
    "setup": "Why couldn't the bicycle stand up by itself?",
    "punchline": "Because it was two tired!"
}
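As the printed result suggests, the chain yields a plain Python dict, so nothing in the result itself guarantees that every field is present. Below is a minimal, stdlib-only sketch of a post-check. The names `REQUIRED` and `missing_fields` are hypothetical and not part of LangChain.

```python
# Assumed to mirror the fields declared on the Joke model.
REQUIRED = {"setup": str, "punchline": str}


def missing_fields(result: dict, required: dict = REQUIRED) -> list:
    """List required keys that are absent or hold a value of the wrong type."""
    return [
        key
        for key, expected_type in required.items()
        if not isinstance(result.get(key), expected_type)
    ]


ok = {
    "setup": "Why couldn't the bicycle stand up by itself?",
    "punchline": "Because it was two tired!",
}
partial = {"setup": "Why couldn't the bicycle stand up by itself?"}

print(missing_fields(ok))       # []
print(missing_fields(partial))  # ['punchline']
```

If you already depend on Pydantic, validating the dict against the Joke model is likely the more thorough option; this sketch only checks presence and top-level type.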

The parser's format instructions

parser.get_format_instructions() returns the following instructions:

plaintext
The output should be formatted as a JSON instance that conforms to the JSON schema below.

As an example, for the schema {"properties": {"foo": {"title": "Foo", "description": "a list of strings", "type": "array", "items": {"type": "string"}}}, "required": ["foo"]}
the object {"foo": ["bar", "baz"]} is a well-formatted instance of the schema. The object {"properties": {"foo": ["bar", "baz"]}} is not well-formatted.

Here is the output schema:
```json
{
    "properties": {
        "setup": {"title": "Setup", "description": "question to set up a joke", "type": "string"},
        "punchline": {"title": "Punchline", "description": "answer to resolve the joke", "type": "string"}
    },
    "required": ["setup", "punchline"]
}
```

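You can also experiment with adding your own formatting hints elsewhere in the prompt, either to augment the default instructions or to replace them.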
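To make that last point concrete, here is a sketch of writing your own format instructions and filling them into the same template string with plain `str.format`. The helper `build_format_instructions` and its wording are assumptions for illustration, not text produced by LangChain.

```python
import json

TEMPLATE = "Answer the user query.\n{format_instructions}\n{query}\n"


def build_format_instructions(schema: dict) -> str:
    """Render a JSON schema into instructions the model can follow."""
    return (
        "Respond with a single JSON object and no surrounding prose.\n"
        "It must conform to this JSON schema:\n"
        + json.dumps(schema, indent=2)
    )


joke_schema = {
    "properties": {
        "setup": {"type": "string", "description": "question to set up a joke"},
        "punchline": {"type": "string", "description": "answer to resolve the joke"},
    },
    "required": ["setup", "punchline"],
}

prompt_text = TEMPLATE.format(
    format_instructions=build_format_instructions(joke_schema),
    query="Tell me a joke.",
)
print(prompt_text)
```

With PromptTemplate, the same string could be passed through partial_variables in place of parser.get_format_instructions().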

Streaming

JsonOutputParser supports streaming partial JSON objects. The following example streams the chain's output:

python
for s in chain.stream({"query": joke_query}):
    print(s)

Example output:

json
{}
{'setup': ''}
{'setup': 'Why'}
{'setup': 'Why couldn'}
{'setup': "Why couldn't"}
{'setup': "Why couldn't the"}
{'setup': "Why couldn't the bicycle"}
{'setup': "Why couldn't the bicycle stand"}
{'setup': "Why couldn't the bicycle stand up"}
{'setup': "Why couldn't the bicycle stand up by"}
{'setup': "Why couldn't the bicycle stand up by itself"}
{'setup': "Why couldn't the bicycle stand up by itself?"}
{'setup': "Why couldn't the bicycle stand up by itself?", 'punchline': ''}
{'setup': "Why couldn't the bicycle stand up by itself?", 'punchline': 'Because'}
{'setup': "Why couldn't the bicycle stand up by itself?", 'punchline': 'Because it'}
{'setup': "Why couldn't the bicycle stand up by itself?", 'punchline': 'Because it was'}
{'setup': "Why couldn't the bicycle stand up by itself?", 'punchline': 'Because it was two'}
{'setup': "Why couldn't the bicycle stand up by itself?", 'punchline': 'Because it was two tired'}
{'setup': "Why couldn't the bicycle stand up by itself?", 'punchline': 'Because it was two tired!'}
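One way to picture how growing states like these can be produced: after each chunk, close whatever string, object, or array is still open and try to parse the result. The sketch below is a simplified stdlib-only illustration of that idea, not LangChain's actual implementation; `close_and_parse`, `stream_parsed`, and the sample chunks are all hypothetical.

```python
import json


def close_and_parse(text: str):
    """Try to parse a JSON prefix by closing whatever is still open."""
    closers = []       # stack of pending '}' / ']' characters
    in_string = False
    escaped = False
    for ch in text:
        if in_string:
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
        elif ch == "{":
            closers.append("}")
        elif ch == "[":
            closers.append("]")
        elif ch in "}]" and closers:
            closers.pop()
    candidate = text + ('"' if in_string else "") + "".join(reversed(closers))
    try:
        return json.loads(candidate)
    except json.JSONDecodeError:
        return None  # prefix ends mid-key, after a comma, mid-escape, etc.


def stream_parsed(chunks):
    """Yield each new parseable state as chunks accumulate."""
    buffer, last = "", None
    for chunk in chunks:
        buffer += chunk
        parsed = close_and_parse(buffer)
        if parsed is not None and parsed != last:
            last = parsed
            yield parsed


chunks = ['{"set', 'up": "Why', " couldn't", '", "punch', 'line": "Because', '!"}']
states = list(stream_parsed(chunks))
for state in states:
    print(state)
```

Prefixes that cannot be completed this way, such as one ending in the middle of a key, are simply skipped here, so the sketch emits fewer intermediate states than the output shown above.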

Without Pydantic

You can also use JsonOutputParser without Pydantic. This prompts the model to return JSON but gives it no specifics about what the schema should be.

python
joke_query = "Tell me a joke."

parser = JsonOutputParser()

prompt = PromptTemplate(
    template="Answer the user query.\n{format_instructions}\n{query}\n",
    input_variables=["query"],
    partial_variables={"format_instructions": parser.get_format_instructions()},
)

chain = prompt | model | parser

result = chain.invoke({"query": joke_query})
print(result)