2025-02-11
Backend

Contents

Implementing Self-RAG with a Local LLM
Setup
Local LLM and Embeddings
Creating the Index
Retrieval Grader
Generation
Hallucination Grader
Answer Grader
Question Rewriter
Building the Graph
Running
Summary

Implementing Self-RAG with a Local LLM

Self-RAG is a RAG (retrieval-augmented generation) strategy that adds self-reflection / self-grading on top of retrieval and generation. In the paper, Self-RAG structures the pipeline around the following decision points:

  1. Should I retrieve from the retriever (R)?

    • Input: question (x), or question (x) plus a prior generation (y)
    • Decides when to retrieve D chunks with R
    • Output: yes, no, continue
  2. Is the retrieved chunk d relevant to the question x?

    • Input: question (x) and a chunk (d)
    • Checks whether d provides useful information for answering x
    • Output: relevant, irrelevant
  3. Is the generation the LLM produces from each chunk in D grounded in that chunk (i.e., free of hallucinations)?

    • Input: question (x), chunk (d), and generation (y)
    • Checks whether all verifiable statements in y are supported by d
    • Output: fully supported, partially supported, no support
  4. Is the generation the LLM produces from each chunk in D a useful answer to x?

    • Input: question (x) and generation (y)
    • Checks whether y is a useful response to x
    • Output: 5, 4, 3, 2, 1

We will implement these ideas from scratch using LangGraph.


Setup

First, install the required packages and set the API key.

python
%%capture --no-stderr
%pip install -U langchain-nomic langchain_community tiktoken langchainhub chromadb langchain langgraph nomic[local]

import getpass
import os


def _set_env(key: str):
    if key not in os.environ:
        os.environ[key] = getpass.getpass(f"{key}:")


_set_env("NOMIC_API_KEY")

Local LLM and Embeddings

  1. Local embeddings
    Use Nomic's GPT4AllEmbeddings(), which supports Nomic's v1 and v1.5 embedding models (the indexing code below uses NomicEmbeddings with inference_mode="local").
    See the reference documentation.
    A quick sanity check is sketched after this list.

  2. Local LLM

    • Download the Ollama app.
    • Download a model: both Mistral and Mixtral versions are available.
    • Pull the model with:
      bash
      ollama pull mistral
    • Set the local LLM name:
      python
      local_llm = "mistral"
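
As an optional sanity check (my addition, not part of the original notebook), you can embed a short string locally with the same model the index below uses and confirm a vector comes back:

python
from langchain_nomic.embeddings import NomicEmbeddings

# Embed one query locally; the vector length is the embedding dimensionality
embeddings = NomicEmbeddings(model="nomic-embed-text-v1.5", inference_mode="local")
vector = embeddings.embed_query("agent memory")
print(len(vector))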

Creating the Index

Index three blog posts.

python
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.document_loaders import WebBaseLoader
from langchain_community.vectorstores import Chroma
from langchain_nomic.embeddings import NomicEmbeddings

urls = [
    "https://lilianweng.github.io/posts/2023-06-23-agent/",
    "https://lilianweng.github.io/posts/2023-03-15-prompt-engineering/",
    "https://lilianweng.github.io/posts/2023-10-25-adv-attack-llm/",
]

docs = [WebBaseLoader(url).load() for url in urls]
docs_list = [item for sublist in docs for item in sublist]

text_splitter = RecursiveCharacterTextSplitter.from_tiktoken_encoder(
    chunk_size=250, chunk_overlap=0
)
doc_splits = text_splitter.split_documents(docs_list)

# Add to the vector database
vectorstore = Chroma.from_documents(
    documents=doc_splits,
    collection_name="rag-chroma",
    embedding=NomicEmbeddings(model="nomic-embed-text-v1.5", inference_mode="local"),
)
retriever = vectorstore.as_retriever()
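
Optionally (again my addition, not in the original notebook), confirm that retrieval returns chunks before wiring up the graders:

python
# Quick check: retrieve a few chunks for a sample query
sample_docs = retriever.invoke("agent memory")
print(len(sample_docs))
print(sample_docs[0].page_content[:200])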

Retrieval Grader

python
from langchain.prompts import PromptTemplate
from langchain_community.chat_models import ChatOllama
from langchain_core.output_parsers import JsonOutputParser

# LLM
llm = ChatOllama(model=local_llm, format="json", temperature=0)

prompt = PromptTemplate(
    template="""You are a grader assessing relevance of a retrieved document to a user question. \n
    Here is the retrieved document: \n\n {document} \n\n
    Here is the user question: {question} \n
    If the document contains keywords related to the user question, grade it as relevant. \n
    It does not need to be a stringent test. The goal is to filter out erroneous retrievals. \n
    Give a binary score 'yes' or 'no' score to indicate whether the document is relevant to the question. \n
    Provide the binary score as a JSON with a single key 'score' and no preamble or explanation.""",
    input_variables=["question", "document"],
)

retrieval_grader = prompt | llm | JsonOutputParser()
question = "agent memory"
docs = retriever.invoke(question)
doc_txt = docs[1].page_content
print(retrieval_grader.invoke({"question": question, "document": doc_txt}))
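
If the sampled chunk really does touch on agent memory, the grader should print a small JSON such as {'score': 'yes'}; an off-topic chunk should come back as {'score': 'no'}. Exact behavior depends on the local model, so results can vary between runs.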

Generation

python
from langchain import hub
from langchain_core.output_parsers import StrOutputParser

# Prompt
prompt = hub.pull("rlm/rag-prompt")

# LLM
llm = ChatOllama(model=local_llm, temperature=0)

# Post-processing
def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)

# Chain
rag_chain = prompt | llm | StrOutputParser()

# Run
generation = rag_chain.invoke({"context": docs, "question": question})
print(generation)
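
Note that the chain is invoked here with the raw list of documents as the context; the prompt simply stringifies it. The format_docs helper instead joins the chunk texts into one clean context string, and the generate node sketched in the graph section below uses it that way.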

Hallucination Grader

python
# LLM
llm = ChatOllama(model=local_llm, format="json", temperature=0)

# Prompt
prompt = PromptTemplate(
    template="""You are a grader assessing whether an answer is grounded in / supported by a set of facts. \n
    Here are the facts:
    \n ------- \n
    {documents}
    \n ------- \n
    Here is the answer: {generation}
    Give a binary score 'yes' or 'no' score to indicate whether the answer is grounded in / supported by a set of facts. \n
    Provide the binary score as a JSON with a single key 'score' and no preamble or explanation.""",
    input_variables=["generation", "documents"],
)

hallucination_grader = prompt | llm | JsonOutputParser()
hallucination_grader.invoke({"documents": docs, "generation": generation})

Answer Grader

python
# LLM
llm = ChatOllama(model=local_llm, format="json", temperature=0)

# Prompt
prompt = PromptTemplate(
    template="""You are a grader assessing whether an answer is useful to resolve a question. \n
    Here is the answer:
    \n ------- \n
    {generation}
    \n ------- \n
    Here is the question: {question}
    Give a binary score 'yes' or 'no' to indicate whether the answer is useful to resolve a question. \n
    Provide the binary score as a JSON with a single key 'score' and no preamble or explanation.""",
    input_variables=["generation", "question"],
)

answer_grader = prompt | llm | JsonOutputParser()
answer_grader.invoke({"question": question, "generation": generation})

Question Rewriter

python
# LLM
llm = ChatOllama(model=local_llm, temperature=0)

# Prompt
re_write_prompt = PromptTemplate(
    template="""You are a question re-writer that converts an input question to a better version that is optimized \n
    for vectorstore retrieval. Look at the initial question and formulate an improved question. \n
    Here is the initial question: \n\n {question}. Improved question with no preamble: \n """,
    input_variables=["question"],
)

question_rewriter = re_write_prompt | llm | StrOutputParser()
question_rewriter.invoke({"question": question})

Building the Graph

Now wrap the workflow above into a graph. The graph code references a shared state and several node/edge functions; a sketch of those definitions follows, and then the graph is assembled.
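
The graph-building code uses GraphState plus the functions retrieve, grade_documents, generate, transform_query, decide_to_generate, and grade_generation_v_documents_and_question, which are not shown above. Below is a minimal sketch of what they can look like, assuming the retriever, graders, rag_chain, and question_rewriter defined in the previous sections; treat it as one reasonable implementation rather than the definitive one.

python
from typing import List
from typing_extensions import TypedDict


class GraphState(TypedDict):
    """State carried between graph nodes."""
    question: str
    generation: str
    documents: List[str]


def retrieve(state):
    # Fetch chunks for the current question
    question = state["question"]
    documents = retriever.invoke(question)
    return {"documents": documents, "question": question}


def grade_documents(state):
    # Keep only the chunks the retrieval grader marks as relevant
    question = state["question"]
    filtered_docs = []
    for d in state["documents"]:
        score = retrieval_grader.invoke({"question": question, "document": d.page_content})
        if score["score"] == "yes":
            filtered_docs.append(d)
    return {"documents": filtered_docs, "question": question}


def generate(state):
    # Answer the question from the filtered chunks
    question = state["question"]
    documents = state["documents"]
    generation = rag_chain.invoke({"context": format_docs(documents), "question": question})
    return {"documents": documents, "question": question, "generation": generation}


def transform_query(state):
    # Rewrite the question for better retrieval
    better_question = question_rewriter.invoke({"question": state["question"]})
    return {"documents": state["documents"], "question": better_question}


def decide_to_generate(state):
    # Edge: if nothing survived grading, rewrite the query; otherwise generate
    return "generate" if state["documents"] else "transform_query"


def grade_generation_v_documents_and_question(state):
    # Edge: check grounding first, then usefulness
    question = state["question"]
    documents = state["documents"]
    generation = state["generation"]
    grounded = hallucination_grader.invoke({"documents": documents, "generation": generation})
    if grounded["score"] != "yes":
        return "not supported"
    useful = answer_grader.invoke({"question": question, "generation": generation})
    return "useful" if useful["score"] == "yes" else "not useful"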

python
from langgraph.graph import END, StateGraph, START

workflow = StateGraph(GraphState)

# Define the nodes
workflow.add_node("retrieve", retrieve)                  # retrieve
workflow.add_node("grade_documents", grade_documents)    # grade documents
workflow.add_node("generate", generate)                  # generate
workflow.add_node("transform_query", transform_query)    # re-write the question

# Build the graph
workflow.add_edge(START, "retrieve")
workflow.add_edge("retrieve", "grade_documents")
workflow.add_conditional_edges(
    "grade_documents",
    decide_to_generate,
    {
        "transform_query": "transform_query",
        "generate": "generate",
    },
)
workflow.add_edge("transform_query", "retrieve")
workflow.add_conditional_edges(
    "generate",
    grade_generation_v_documents_and_question,
    {
        "not supported": "generate",
        "useful": END,
        "not useful": "transform_query",
    },
)

# Compile
app = workflow.compile()

Running

python
from pprint import pprint

# Run
inputs = {"question": "Explain how the different types of agent memory work?"}
for output in app.stream(inputs):
    for key, value in output.items():
        # Node
        pprint(f"Node '{key}':")
        # Optional: print the full state at each node
        # pprint.pprint(value["keys"], indent=2, width=80, depth=None)
    pprint("\n---\n")

# Final generation
pprint(value["generation"])
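
As the graph streams, each node's name is printed as it finishes (typically retrieve, then grade_documents, then either generate or transform_query followed by another retrieval round), and the final pprint shows the generated answer about the different types of agent memory. The exact routing and wording depend on the local model, so expect some variation between runs.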

Summary

In this tutorial, we implemented the Self-RAG strategy with a local LLM and LangGraph, optimizing the retrieval and generation process. Through its self-reflection and self-grading mechanisms, Self-RAG significantly improves the quality and relevance of the generated results.

Author: yowayimono


Copyright notice: unless otherwise stated, all articles on this blog are licensed under the BY-NC-SA license. Please credit the source when republishing!