Formatting LLM Output as XML

How to Parse XML Output: Using LangChain and Anthropic's Claude-2 Model

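In this tutorial, we use LangChain and Anthropic's Claude-2 model to generate XML-formatted output and parse it into a format that is easier to work with. We walk through setting up the environment, building a prompt template, calling the model, and parsing the output.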

Prerequisites

Before you begin, make sure you are familiar with the following concepts:

Chat Models
Output Parsers
Prompt Templates
Structured Output
Chaining Runnables

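In addition, large language models (LLMs) differ in how well they handle particular data formats, and some produce XML output more reliably than others. This tutorial uses Anthropic's Claude-2 model, which handles XML tags well.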

Installing Dependencies

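First, install the required Python packages. The `%pip` magic below assumes a Jupyter notebook; in a regular shell, use `pip install` instead: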

bash
%pip install -qU langchain langchain-anthropic

Setting the API Key

Next, set your Anthropic API key:

python
import os
from getpass import getpass

if "ANTHROPIC_API_KEY" not in os.environ:
    os.environ["ANTHROPIC_API_KEY"] = getpass("Enter your Anthropic API key: ")

Calling the Model to Generate XML Output

We start with a simple request: ask the model for a short filmography of Tom Hanks, with each movie wrapped in <movie> tags.

python
from langchain_anthropic import ChatAnthropic
from langchain_core.output_parsers import XMLOutputParser
from langchain_core.prompts import PromptTemplate

# Initialize the model
model = ChatAnthropic(model="claude-2.1", max_tokens_to_sample=512, temperature=0.1)

# Define the query
actor_query = "Generate a short filmography for Tom Hanks."

# Call the model and ask for XML output
output = model.invoke(
    f"""{actor_query}
Please wrap each movie in <movie></movie> tags."""
)

print(output.content)

The output may look like this:

xml
<movie>Splash</movie>
<movie>Big</movie>
<movie>A League of Their Own</movie>
<movie>Sleepless in Seattle</movie>
<movie>Forrest Gump</movie>
<movie>Toy Story</movie>
<movie>Apollo 13</movie>
<movie>Saving Private Ryan</movie>
<movie>Cast Away</movie>
<movie>The Da Vinci Code</movie>
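
Before bringing in an output parser, it can help to see what parsing this raw text by hand involves. The following is a minimal sketch using only Python's standard library, not part of the LangChain workflow. The helper name `parse_movie_fragment` and the wrapping `<root>` element are assumptions made for illustration, and the sample string is a shortened copy of the output above.

```python
import xml.etree.ElementTree as ET


def parse_movie_fragment(fragment: str) -> list[str]:
    """Return the text of every <movie> element in a rootless XML fragment."""
    # A bare sequence of sibling elements is not a well-formed XML document,
    # so wrap it in a hypothetical <root> element before parsing.
    root = ET.fromstring(f"<root>{fragment}</root>")
    return [(el.text or "").strip() for el in root.iter("movie")]


# Shortened copy of the model output shown above.
raw_output = """<movie>Splash</movie>
<movie>Big</movie>
<movie>Forrest Gump</movie>"""

movies = parse_movie_fragment(raw_output)
print(movies)
```

This approach is fragile: if the model emits an unescaped `&` or an unclosed tag, `ET.fromstring` raises `ParseError`, which is one reason to prefer a dedicated output parser.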

Parsing the Output with XMLOutputParser

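The model produced XML, but we want it in a form that is easier to use in code. XMLOutputParser does two things for us: it supplies default format instructions to add to the prompt, and it parses the returned XML into a dictionary.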

python
# Initialize the XMLOutputParser
parser = XMLOutputParser()

# Get the format instructions
format_instructions = parser.get_format_instructions()

# Build the prompt template
prompt = PromptTemplate(
    template="""{query}\n{format_instructions}""",
    input_variables=["query"],
    partial_variables={"format_instructions": format_instructions},
)

# Chain the prompt template, model, and parser together
chain = prompt | model | parser

# Invoke the chain and parse the output
output = chain.invoke({"query": actor_query})
print(output)

The output may look like this:

python
{'filmography': [{'movie': [{'title': 'Big'}, {'year': '1988'}]}, {'movie': [{'title': 'Forrest Gump'}, {'year': '1994'}]}, {'movie': [{'title': 'Toy Story'}, {'year': '1995'}]}, {'movie': [{'title': 'Saving Private Ryan'}, {'year': '1998'}]}, {'movie': [{'title': 'Cast Away'}, {'year': '2000'}]}]}
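
The parsed result represents each element's children as a list of single-key dictionaries, which is awkward to index directly. The sketch below shows one way to flatten it into plain records. The helper name `movie_records` is hypothetical, and the tag names `filmography` and `movie` are assumed from the sample output above; the model chooses them, so they may differ between runs.

```python
def movie_records(parsed: dict, root_tag: str = "filmography",
                  item_tag: str = "movie") -> list[dict]:
    """Merge each item's list of single-key dicts into one flat dict."""
    records = []
    for child in parsed.get(root_tag, []):
        # Skip anything that is not an {item_tag: [...]} entry.
        if not isinstance(child, dict):
            continue
        fields = child.get(item_tag)
        if not isinstance(fields, list):
            continue
        record = {}
        for field in fields:
            record.update(field)
        records.append(record)
    return records


# Shortened copy of the sample parser output shown above.
parsed = {
    "filmography": [
        {"movie": [{"title": "Big"}, {"year": "1988"}]},
        {"movie": [{"title": "Forrest Gump"}, {"year": "1994"}]},
    ]
}

records = movie_records(parsed)
print(records)
```

Note that merging with `update` keeps only the last value if the same tag appears twice inside one item.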

Custom XML Tags

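We can customize the output further by specifying our own tags. The tags passed to XMLOutputParser are included in the format instructions as a hint to the model, and you can add to or replace the default format instructions as needed.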

python
# Initialize the XMLOutputParser with custom tags
parser = XMLOutputParser(tags=["movies", "actor", "film", "name", "genre"])

# Get the format instructions
format_instructions = parser.get_format_instructions()

# Build the prompt template
prompt = PromptTemplate(
    template="""{query}\n{format_instructions}""",
    input_variables=["query"],
    partial_variables={"format_instructions": format_instructions},
)

# Chain the prompt template, model, and parser together
chain = prompt | model | parser

# Invoke the chain and parse the output
output = chain.invoke({"query": actor_query})
print(output)

The output may look like this:

python
{'movies': [{'actor': [{'name': 'Tom Hanks'}, {'film': [{'name': 'Forrest Gump'}, {'genre': 'Drama'}]}, {'film': [{'name': 'Cast Away'}, {'genre': 'Adventure'}]}, {'film': [{'name': 'Saving Private Ryan'}, {'genre': 'War'}]}]}]}
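
With custom tags the nesting gets deeper, so a small recursive search can be handier than indexing level by level. The sketch below assumes the result has the shape shown above; the generator `find_all` is a hypothetical helper, not part of LangChain.

```python
def find_all(node, tag):
    """Yield the value stored under every occurrence of `tag`, depth-first."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key == tag:
                yield value
            yield from find_all(value, tag)
    elif isinstance(node, list):
        for item in node:
            yield from find_all(item, tag)


# Copy of the sample parser output shown above.
parsed = {
    "movies": [
        {
            "actor": [
                {"name": "Tom Hanks"},
                {"film": [{"name": "Forrest Gump"}, {"genre": "Drama"}]},
                {"film": [{"name": "Cast Away"}, {"genre": "Adventure"}]},
                {"film": [{"name": "Saving Private Ryan"}, {"genre": "War"}]},
            ]
        }
    ]
}

# Each <film> value is a list of single-key dicts; merge them into one record.
films = [
    {key: value for field in film for key, value in field.items()}
    for film in find_all(parsed, "film")
]
print(films)
```

Searching by tag name rather than by position keeps the code working if the model adds or omits a wrapper element, though it still depends on the model using the requested tags.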

Streaming Output Parsing

XMLOutputParser also supports streaming, yielding partial results as the model produces them. Here is an example:

python
for s in chain.stream({"query": actor_query}):
    print(s)

The output may look like this:

python
{'movies': [{'actor': [{'name': 'Tom Hanks'}]}]}
{'movies': [{'actor': [{'film': [{'name': 'Forrest Gump'}]}]}]}
{'movies': [{'actor': [{'film': [{'genre': 'Drama'}]}]}]}
{'movies': [{'actor': [{'film': [{'name': 'Cast Away'}]}]}]}
{'movies': [{'actor': [{'film': [{'genre': 'Adventure'}]}]}]}
{'movies': [{'actor': [{'film': [{'name': 'Saving Private Ryan'}]}]}]}
{'movies': [{'actor': [{'film': [{'genre': 'War'}]}]}]}
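
Each streamed chunk carries a single leaf value together with the path of tags leading to it, so the consumer has to reassemble records itself. The sketch below is one possible way to do that with plain Python, using the chunks above as hard-coded input instead of a live `chain.stream` call. The helpers `leaf_of` and `collect_films` are hypothetical, and the rule "a repeated field starts a new film" is an assumption that only holds if every film lists its fields in the same order.

```python
def leaf_of(chunk):
    """Follow a single-branch chunk down to its leaf; return (path, text)."""
    path = []
    node = chunk
    while isinstance(node, (dict, list)):
        if isinstance(node, list):
            node = node[0]  # assumes one entry per level, as in the sample
        else:
            key, node = next(iter(node.items()))
            path.append(key)
    return tuple(path), node


def collect_films(chunks):
    """Group streamed leaf values under 'film' into one dict per film."""
    films = []
    for chunk in chunks:
        path, text = leaf_of(chunk)
        if "film" not in path:
            continue  # e.g. the actor's own <name>
        field = path[-1]
        # Assumption: seeing a field again means a new film has started.
        if not films or field in films[-1]:
            films.append({})
        films[-1][field] = text
    return films


# Shortened copy of the sample stream shown above.
chunks = [
    {"movies": [{"actor": [{"name": "Tom Hanks"}]}]},
    {"movies": [{"actor": [{"film": [{"name": "Forrest Gump"}]}]}]},
    {"movies": [{"actor": [{"film": [{"genre": "Drama"}]}]}]},
    {"movies": [{"actor": [{"film": [{"name": "Cast Away"}]}]}]},
    {"movies": [{"actor": [{"film": [{"genre": "Adventure"}]}]}]},
]

films = collect_films(chunks)
print(films)
```

In real use, the list of chunks would be replaced by the iterator returned from `chain.stream({"query": actor_query})`.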

Next Steps

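You have now learned how to prompt a model to generate XML output and how to parse it. From here, you can consult the broader guides on other techniques for obtaining structured output.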

I hope this tutorial was helpful. If you have any questions or suggestions, feel free to share your feedback.
