agentic_huge_data_base / wiki
页面 Graphiti · 6.1 LLM 客户端架构·DeepWiki 中文全文译文

6.1 · LLM 客户端架构(LLM Client Architecture)

时序知识图谱与动态事实记忆 · 聚焦本章的模块关系、源码依据与实现要点。

项目Graphiti 章节6.1 状态全文译文 模块模型调用与提供方适配、接口与服务契约、界面与交互、入库与解析
源码线索
  • examples/azure-openai/azure_openai_neo4j.py
  • examples/gliner2/.env.example
  • examples/gliner2/README.md
  • examples/gliner2/gliner2_neo4j.py
  • graphiti_core/cross_encoder/gemini_reranker_client.py
  • graphiti_core/embedder/azure_openai.py
  • graphiti_core/embedder/gemini.py
  • graphiti_core/llm_client/anthropic_client.py
  • graphiti_core/llm_client/azure_openai_client.py
  • graphiti_core/llm_client/client.py
模块标签
  • 模型调用与提供方适配
  • 接口与服务契约
  • 界面与交互
  • 入库与解析
  • 系统架构

章节正文

LLM 客户端架构

大语言模型(LLM)客户端架构

相关源文件

本章引用的主要源码文件:

  • examples/azure-openai/azure_openai_neo4j.py
  • examples/gliner2/.env.example
  • examples/gliner2/README.md
  • examples/gliner2/gliner2_neo4j.py
  • graphiti_core/cross_encoder/gemini_reranker_client.py
  • graphiti_core/embedder/azure_openai.py
  • graphiti_core/embedder/gemini.py
  • graphiti_core/llm_client/anthropic_client.py
  • graphiti_core/llm_client/azure_openai_client.py
  • graphiti_core/llm_client/client.py
  • graphiti_core/llm_client/config.py
  • graphiti_core/llm_client/errors.py
  • graphiti_core/llm_client/gemini_client.py
  • graphiti_core/llm_client/gliner2_client.py
  • graphiti_core/llm_client/groq_client.py
  • graphiti_core/llm_client/openai_base_client.py
  • graphiti_core/llm_client/openai_client.py
  • graphiti_core/llm_client/openai_generic_client.py
  • graphiti_core/llm_client/token_tracker.py
  • mcp_server/config/mcp_config_stdio_example.json
  • mcp_server/src/services/factories.py
  • tests/cross_encoder/test_gemini_reranker_client.py
  • tests/llm_client/test_anthropic_client.py
  • tests/llm_client/test_azure_openai_client.py
  • tests/llm_client/test_errors.py
  • tests/llm_client/test_gemini_client.py
  • tests/test_text_utils.py

本文档记录了 graphiti-core 中的大语言模型(LLM)客户端子系统:包括抽象基类、配置对象、所有具体提供商实现,以及包裹每次大语言模型(LLM)调用的横切行为(重试逻辑、缓存、追踪和 Token 追踪)。

有关嵌入向量和重排序服务的集成,请参见 6.2。有关传入这些客户端的提示模板,请参见 6.3。有关如何端到端配置特定提供商,请参见 9.3

类层次结构

Graphiti 中的所有大语言模型(LLM)客户端共享一个以 LLMClient 为根的继承树。

大语言模型(LLM)客户端的类层次结构

Graphiti · 类层次结构 · 图 1
Graphiti · 类层次结构 · 图 1

来源:graphiti_core/llm_client/client.py:71-147, graphiti_core/llm_client/openai_base_client.py:40-95, graphiti_core/llm_client/openai_client.py:27-125, graphiti_core/llm_client/azure_openai_client.py:31-167, graphiti_core/llm_client/openai_generic_client.py:37-214, graphiti_core/llm_client/anthropic_client.py:103-150, graphiti_core/llm_client/gemini_client.py:72-127, graphiti_core/llm_client/groq_client.py:48-85, graphiti_core/llm_client/gliner2_client.py:34-118

LLMConfig

LLMConfig 是传递给每个客户端构造函数的配置对象。它是一个普通的 Python 类(不是 Pydantic 模型)。

字段类型默认值描述
api_keystr | NoneNone提供商 API 密钥
modelstr | NoneNone主模型标识符
small_modelstr | NoneNone用于简单提示的较小/较便宜模型
base_urlstr | NoneNone覆盖 API 基础 URL(例如,用于本地端点)
temperaturefloat1.0采样温度 graphiti_core/llm_client/config.py:20
max_tokensint16384最大输出 Token 数 graphiti_core/llm_client/config.py:19

ModelSize 是一个枚举,包含两个值:smallmedium graphiti_core/llm_client/config.py:23-25。所有对 generate_response 的调用都接受一个 model_size 参数;客户端会将 ModelSize.small 路由到 small_model,将 ModelSize.medium 路由到 model

来源:graphiti_core/llm_client/config.py:19-69

LLMClient 抽象基类

graphiti_core/llm_client/client.py:71-147 中的 LLMClient 是所有提供商实现的抽象基类。

构造函数
LLMClient(config: LLMConfig | None, cache: bool = False)

如果 configNone,则会使用默认的 LLMConfig() graphiti_core/llm_client/client.py:73-74。当 cache=True 时,会创建一个指向 ./llm_cacheLLMCache 实例 graphiti_core/llm_client/client.py:35, graphiti_core/llm_client/client.py:87-88

generate_response — 公共接口

这是调用者的唯一公共入口点。其签名如下:

async generate_response(
    messages: list[Message],
    response_model: type[BaseModel] | None = None,
    max_tokens: int | None = None,
    model_size: ModelSize = ModelSize.medium,
    group_id: str | None = None,
    prompt_name: str | None = None,
) -> dict[str, Any]

基类实现按顺序执行以下步骤 graphiti_core/llm_client/client.py:155-247

  1. 如果提供了 response_model,则将其 JSON 模式追加到最后一条消息中 graphiti_core/llm_client/client.py:167-173
  2. 将多语言提取指令(来自 get_extraction_language_instruction(group_id))追加到第一条消息中 graphiti_core/llm_client/client.py:176
  3. 对每条消息调用 _clean_input,以去除无效的 Unicode 和控制字符 graphiti_core/llm_client/client.py:178-179
  4. 打开一个追踪跨度(llm.generate)并设置属性,包括 llm.providermodel.sizemax_tokenscache.enabled,以及可选的 prompt.name graphiti_core/llm_client/client.py:182-191
  5. 检查缓存;如果命中,则立即返回 graphiti_core/llm_client/client.py:194-197
  6. 调用 _generate_response_with_retry,该方法使用 Tenacity 重试逻辑包装了抽象的 _generate_response graphiti_core/llm_client/client.py:202-212
  7. 如果启用了缓存,则将结果存储到缓存中 graphiti_core/llm_client/client.py:214-216
抽象方法:_generate_response
@abstractmethod
async def _generate_response(
    self,
    messages: list[Message],
    response_model: type[BaseModel] | None = None,
    max_tokens: int = DEFAULT_MAX_TOKENS,
    model_size: ModelSize = ModelSize.medium,
) -> dict[str, typing.Any]:
    pass

来源:graphiti_core/llm_client/client.py:139-147

具体实现

具体客户端类比较表

上游 SDK默认主模型结构化输出方法
OpenAIClientopenaigpt-4.1-miniresponses.parse(推理)/ chat.completions(标准)
AzureOpenAILLMClientopenai(Azure)_(由调用者设置)_responses.parse(o1/o3/gpt-5)/ beta.chat.completions.parse(标准)
OpenAIGenericClientopenaigpt-4.1-minijson_schema 响应格式
AnthropicClientanthropicclaude-haiku-4-5-latest工具使用(_create_tool
GeminiClientgoogle-genaigemini-3-flash-previewresponse_mime_type=application/json
GroqClientgroqllama-3.1-70b-versatilejson_object 响应格式
GLiNER2Clientglinergliner_medium-v2.1本地模型推理

来源:graphiti_core/llm_client/openai_client.py:27-125, graphiti_core/llm_client/azure_openai_client.py:31-167, graphiti_core/llm_client/openai_generic_client.py:37-214, graphiti_core/llm_client/anthropic_client.py:103-150, graphiti_core/llm_client/gemini_client.py:72-127, graphiti_core/llm_client/groq_client.py:48-85, graphiti_core/llm_client/gliner2_client.py:34-118

OpenAI 系列(BaseOpenAIClientOpenAIClientAzureOpenAILLMClient

BaseOpenAIClient 持有 OpenAI 兼容 API 的共享逻辑 graphiti_core/llm_client/openai_base_client.py:40-58。它定义了两个抽象钩子:_create_structured_completion_create_completion

OpenAIClient 通过前缀(gpt-5o1o3)检测推理模型 graphiti_core/llm_client/openai_client.py:77-79。对于这些模型,它会调用 client.responses.parse graphiti_core/llm_client/openai_client.py:99;对于标准模型,它会调用 client.chat.completions.create,并设置 response_format={'type': 'json_object'} graphiti_core/llm_client/openai_client.py:119-125

AzureOpenAILLMClient 根据 _supports_reasoning_features(model) 将请求路由到 responses.parsebeta.chat.completions.parse graphiti_core/llm_client/azure_openai_client.py:74-104

OpenAIGenericClient

专为本地模型(Ollama、LM Studio)设计。它使用 json_schema 响应格式 graphiti_core/llm_client/openai_generic_client.py:115-121。默认 max_tokens 为 16,384,以确保兼容性 graphiti_core/llm_client/openai_generic_client.py:75-76

AnthropicClient

使用工具使用 API 进行结构化输出。_create_toolresponse_model 生成工具定义 graphiti_core/llm_client/anthropic_client.py:177-220。它通过 ANTHROPIC_MODEL_MAX_TOKENS 处理模型特定的 Token 限制 graphiti_core/llm_client/anthropic_client.py:75-97

GeminiClient

google-genai 集成。它通过 _check_safety_blocks 处理安全过滤器 graphiti_core/llm_client/gemini_client.py:128-152,并通过 _check_prompt_blocks 处理提示拦截 graphiti_core/llm_client/gemini_client.py:154-162。它支持 Gemini 2.5+ 模型的 thinking_config graphiti_core/llm_client/gemini_client.py:109-110

横切行为

通过 generate_response 的调用流程

Graphiti · 横切行为 · 图 2
Graphiti · 横切行为 · 图 2

来源:graphiti_core/llm_client/client.py:155-247

重试逻辑

客户端使用 Tenacity 进行自动重试。is_server_or_retry_error 决定某个异常(如 RateLimitError 或 5xx 状态码)是否需要进行重试 graphiti_core/llm_client/client.py:62-69

客户端策略尝试次数
LLMClient指数退避(5-120 秒)4 graphiti_core/llm_client/client.py:117-118
BaseOpenAIClient类常量2 graphiti_core/llm_client/openai_base_client.py:49
AnthropicClientSDK 内部1 graphiti_core/llm_client/anthropic_client.py:146
GeminiClient类常量2 graphiti_core/llm_client/gemini_client.py:93

来源:graphiti_core/llm_client/client.py:116-126, graphiti_core/llm_client/openai_base_client.py:49, graphiti_core/llm_client/anthropic_client.py:146, graphiti_core/llm_client/gemini_client.py:93

Token 追踪

TokenUsageTracker graphiti_core/llm_client/token_tracker.py 记录每个提示的使用情况。具体客户端在收到 API 响应后会记录使用情况,以追踪输入和输出 Token graphiti_core/llm_client/openai_base_client.py:127-130, graphiti_core/llm_client/anthropic_client.py:417-422

响应缓存

LLMCache graphiti_core/llm_client/cache.py 将响应存储在 ./llm_cachegraphiti_core/llm_client/client.py:35。缓存键是模型和消息的 MD5 哈希值 graphiti_core/llm_client/client.py:149-153

提供商到代码的映射

每个提供商的文件和类位置

Graphiti · 提供商到代码的映射 · 图 3
Graphiti · 提供商到代码的映射 · 图 3

来源:graphiti_core/llm_client/client.py:1-147, graphiti_core/llm_client/openai_base_client.py:1-38, graphiti_core/llm_client/anthropic_client.py:1-44, graphiti_core/llm_client/gemini_client.py:1-43, graphiti_core/llm_client/groq_client.py:1-34, graphiti_core/llm_client/gliner2_client.py:1-32