Code Assistant

Code generation with RAG and self-correction

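AlphaCodium presented an approach to code generation that uses control flow.

Main idea: construct the answer to a coding question iteratively.

AlphaCodium iteratively tests and improves an answer, using public and AI-generated tests to validate the solution to a particular question.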

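We will implement some of these ideas from scratch using LangGraph:

Start with a set of documentation specified by the user.
Use a long-context LLM to ingest the documentation and answer questions through RAG.
Invoke a tool to produce structured output.
Run two unit tests (an import check and a code execution check) before returning the solution to the user.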
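
The steps above amount to a generate, check, retry loop. As a rough sketch in plain Python, without LangGraph or an LLM: `solve_with_retries`, `generate_fn`, and `check_fn` are hypothetical names introduced here for illustration, and the stub generator merely stands in for a model call.

```python
from typing import Callable, List, Optional, Tuple


def solve_with_retries(
    generate_fn: Callable[[List[str]], str],
    check_fn: Callable[[str], Optional[str]],
    max_iterations: int = 3,
) -> Tuple[Optional[str], List[str]]:
    """Generate a candidate, check it, and feed any error into the next attempt.

    Returns (solution, feedback); solution is None if every attempt failed.
    """
    feedback: List[str] = []
    for _ in range(max_iterations):
        candidate = generate_fn(feedback)
        error = check_fn(candidate)
        if error is None:
            return candidate, feedback
        feedback.append(error)
    return None, feedback


def stub_generate(feedback: List[str]) -> str:
    """Stand-in for an LLM call: it corrects its answer once it sees feedback."""
    return "x = 1 + 1" if feedback else "x = 1 +"


def compiles(candidate: str) -> Optional[str]:
    """Return None if the candidate compiles, otherwise the error text."""
    try:
        compile(candidate, "<candidate>", "exec")
        return None
    except SyntaxError as exc:
        return f"SyntaxError: {exc.msg}"


solution, feedback = solve_with_retries(stub_generate, compiles)
```

Here the first candidate fails to compile, the error is recorded as feedback, and the second candidate passes. The graph built later in this walkthrough follows the same pattern, with an LLM as the generator and `exec`-based checks as the test.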

Setup

First, install the required packages and set the API keys we need.

python
! pip install -U langchain_community langchain-openai langchain-anthropic langchain langgraph bs4

import getpass
import os

def _set_env(var: str):
    if not os.environ.get(var):
        os.environ[var] = getpass.getpass(f"{var}: ")

_set_env("OPENAI_API_KEY")
_set_env("ANTHROPIC_API_KEY")

Docs

Load the LangChain Expression Language (LCEL) documentation as an example.

python
from bs4 import BeautifulSoup as Soup
from langchain_community.document_loaders.recursive_url_loader import RecursiveUrlLoader

# LCEL docs
url = "https://python.langchain.com/docs/concepts/lcel/"
loader = RecursiveUrlLoader(
    url=url, max_depth=20, extractor=lambda x: Soup(x, "html.parser").text
)
docs = loader.load()

# Sort the pages by URL and extract their text
d_sorted = sorted(docs, key=lambda x: x.metadata["source"])
d_reversed = list(reversed(d_sorted))
concatenated_content = "\n\n\n --- \n\n\n".join(
    [doc.page_content for doc in d_reversed]
)

LLMs

We will try code generation with both OpenAI and Claude 3.

python
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI
from pydantic import BaseModel, Field

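# OpenAI setup
code_gen_prompt = ChatPromptTemplate.from_messages(
    [
        (
            "system",
            """You are a coding assistant with expertise in LCEL, the LangChain Expression Language.\n
            Here is the full LCEL documentation:\n ------- \n {context} \n ------- \n
            Answer the user question based on the documentation provided above. Ensure any code you provide can be executed, with all required imports and variables defined.
            Structure your answer as: 1) a description of the code solution, 2) the imports, 3) the functioning code block. Here is the user question:""",
        ),
        ("placeholder", "{messages}"),
    ]
)

# Data model
class code(BaseModel):
    """Schema for code solutions to questions about LCEL."""
    prefix: str = Field(description="Description of the problem and approach")
    imports: str = Field(description="Code block import statements")
    code: str = Field(description="Code block not including import statements")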

expt_llm = "gpt-4o-mini"
llm = ChatOpenAI(temperature=0, model=expt_llm)
code_gen_chain_oai = code_gen_prompt | llm.with_structured_output(code)

Code generation and testing

Generate code and run the unit tests.

python
from langchain_anthropic import ChatAnthropic

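# Anthropic setup
code_gen_prompt_claude = ChatPromptTemplate.from_messages(
    [
        (
            "system",
            """<instructions> You are a coding assistant with expertise in LCEL.\n
            Here is the LCEL documentation:\n ------- \n {context} \n ------- \n
            Answer the user question based on the documentation provided above. Ensure any code you provide can be executed, with all required imports and variables defined.
            Structure your answer as: 1) a description of the code solution, 2) the imports, 3) the functioning code block.
            Invoke the code tool to structure the output correctly.</instructions> \n Here is the user question:""",
        ),
        ("placeholder", "{messages}"),
    ]
)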

expt_llm = "claude-3-opus-20240229"
llm = ChatAnthropic(
    model=expt_llm,
    default_headers={"anthropic-beta": "tools-2024-04-04"},
)

structured_llm_claude = llm.with_structured_output(code, include_raw=True)
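
Because `include_raw=True` is set, the structured LLM returns a dictionary with `raw`, `parsed`, and `parsing_error` keys rather than a `code` object directly. A minimal sketch of guarding against tool-use failures before using the result follows. The helper names `check_claude_output` and `parse_output` are illustrative, and the sample dictionaries are hand-made stand-ins for a model response.

```python
from typing import Any, Dict


def check_claude_output(tool_output: Dict[str, Any]) -> Dict[str, Any]:
    """Raise if the model failed to produce a parseable tool call."""
    if tool_output.get("parsing_error"):
        raise ValueError(f"Error parsing tool output: {tool_output['parsing_error']}")
    if not tool_output.get("parsed"):
        raise ValueError("The model did not call the tool, so there is no structured output.")
    return tool_output


def parse_output(tool_output: Dict[str, Any]) -> Any:
    """Extract the structured object from the raw/parsed/parsing_error dictionary."""
    return tool_output["parsed"]


# Hand-made stand-ins for what the structured LLM might return.
ok = {
    "raw": "<AIMessage>",
    "parsed": {"prefix": "p", "imports": "import os", "code": "print(1)"},
    "parsing_error": None,
}
bad = {"raw": "<AIMessage>", "parsed": None, "parsing_error": None}

parsed = parse_output(check_claude_output(ok))

try:
    check_claude_output(bad)
    failed = False
except ValueError:
    failed = True
```

Under these assumptions, the two helpers could be piped after `code_gen_prompt_claude | structured_llm_claude`, so that downstream nodes only ever see a parsed solution or an explicit error.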

Code checking and correction

Define the logic for checking and correcting the code.

python
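def code_check(state: GraphState):
    """Check that the generated imports and code execute without raising."""
    print("---CHECKING CODE---")
    messages = state["messages"]
    code_solution = state["generation"]
    iterations = state["iterations"]

    # Check the imports
    try:
        exec(code_solution.imports)
    except Exception as e:
        print("---CODE IMPORT CHECK: FAILED---")
        messages += [("user", f"Your solution failed the import test: {e}")]
        # Return the updated messages so the next attempt sees the error
        return {
            "generation": code_solution,
            "messages": messages,
            "iterations": iterations,
            "error": "yes",
        }

    # Check that the imports plus the code block execute
    try:
        exec(code_solution.imports + "\n" + code_solution.code)
    except Exception as e:
        print("---CODE BLOCK CHECK: FAILED---")
        messages += [("user", f"Your solution failed the code execution test: {e}")]
        return {
            "generation": code_solution,
            "messages": messages,
            "iterations": iterations,
            "error": "yes",
        }

    print("---NO CODE TEST FAILURES---")
    return {
        "generation": code_solution,
        "messages": messages,
        "iterations": iterations,
        "error": "no",
    }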
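
`code_check` takes a `GraphState` that is not defined in this section. A plausible definition, inferred from the keys the function reads and returns, is the typed dictionary below. The field types and the sample question are assumptions.

```python
from typing import Any, List, TypedDict


class GraphState(TypedDict):
    """State passed between graph nodes (field types are assumed)."""

    error: str  # "yes" or "no": whether the last check failed
    messages: List[Any]  # conversation history as (role, content) tuples
    generation: Any  # latest structured solution (prefix, imports, code)
    iterations: int  # number of generation attempts so far


# A hypothetical starting state for one question.
initial_state: GraphState = {
    "error": "no",
    "messages": [("user", "How do I build a RAG chain in LCEL?")],
    "generation": None,
    "iterations": 0,
}
```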

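Full code

The workflow below wires the generation, checking, and reflection nodes into a LangGraph graph. It assumes GraphState, generate, reflect, and decide_to_finish are already defined.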

python
from langgraph.graph import END, StateGraph, START

workflow = StateGraph(GraphState)

# Define the nodes
workflow.add_node("generate", generate)  # generate a solution
workflow.add_node("check_code", code_check)  # check the code
workflow.add_node("reflect", reflect)  # reflect on the error

# Build the graph
workflow.add_edge(START, "generate")
workflow.add_edge("generate", "check_code")
workflow.add_conditional_edges(
    "check_code",
    decide_to_finish,
    {
        "end": END,
        "reflect": "reflect",
        "generate": "generate",
    },
)
workflow.add_edge("reflect", "generate")
app = workflow.compile()
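
The conditional edge relies on a `decide_to_finish` router that returns one of the three keys in the mapping. One way it might look, assuming a cap of three attempts and a module-level switch for the reflection step (both `max_iterations` and `flag` are assumed names and values, not taken from the text above):

```python
from typing import Any, Dict

max_iterations = 3  # assumed retry cap
flag = "do not reflect"  # assumed switch; set to "reflect" to route through the reflect node


def decide_to_finish(state: Dict[str, Any]) -> str:
    """Route to "end", "reflect", or "generate" based on the last check."""
    if state["error"] == "no" or state["iterations"] >= max_iterations:
        return "end"
    return "reflect" if flag == "reflect" else "generate"
```

With this router, a passing check or an exhausted retry budget ends the run. Otherwise the graph loops back to generation, optionally through the reflection node.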

Testing and evaluation

Use LangSmith for testing and evaluation.

python
from langsmith import Client
from langsmith.schemas import Example, Run

client = Client()

# Clone the public dataset
try:
    public_dataset = "https://smith.langchain.com/public/326674a6-62bd-462d-88ae-eea49d503f9d/d"
    client.clone_public_dataset(public_dataset)
except Exception:
    print("Please set up LangSmith")

# Custom evaluators
def check_import(run: Run, example: Example) -> dict:
    """Score 1 if the generated imports execute, else 0."""
    imports = run.outputs.get("imports")
    try:
        exec(imports)
        return {"key": "import_check", "score": 1}
    except Exception:
        return {"key": "import_check", "score": 0}

def check_execution(run: Run, example: Example) -> dict:
    """Score 1 if the imports plus the code block execute, else 0."""
    imports = run.outputs.get("imports")
    code = run.outputs.get("code")
    try:
        exec(imports + "\n" + code)
        return {"key": "code_execution_check", "score": 1}
    except Exception:
        return {"key": "code_execution_check", "score": 0}
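
Both evaluators run `exec` in the evaluator's own namespace, so names defined by one solution can leak into the next check. A hedged variant that executes each snippet in a fresh dictionary is sketched below. `run_isolated` and `score_solution` are hypothetical helpers, and this still runs untrusted code in-process, so it is not a sandbox.

```python
from typing import Dict, Optional


def run_isolated(source: str) -> Optional[str]:
    """Execute source in a fresh namespace; return None on success or the error text."""
    try:
        exec(source, {"__name__": "__solution__"})
        return None
    except Exception as exc:
        return f"{type(exc).__name__}: {exc}"


def score_solution(imports: str, code: str) -> Dict[str, int]:
    """Run the import check first, then the execution check only if imports pass."""
    import_ok = run_isolated(imports) is None
    exec_ok = import_ok and run_isolated(imports + "\n" + code) is None
    return {"import_check": int(import_ok), "code_execution_check": int(exec_ok)}


good = score_solution("import math", "value = math.sqrt(16)")
bad_import = score_solution("import not_a_real_module_xyz", "value = 1")
bad_code = score_solution("import math", "value = math.sqrt(-1)")
```

Running the import check first mirrors the order used in `code_check`, so a failing import is reported as an import problem rather than a generic execution failure.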

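Results

LangGraph outperforms the base case: adding the retry loop improved performance.
Reflection did not help: reflecting before retrying actually lowered performance.
GPT-4 outperforms Claude 3: Claude 3 failed on some runs because of tool-use errors.

Summary: with LangGraph and RAG we can iteratively generate and refine code solutions, checking that their imports resolve and that the code executes before returning them.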
