LLM Client Architecture
Relevant source files
The main source files referenced in this chapter are:
- examples/azure-openai/azure_openai_neo4j.py
- examples/gliner2/.env.example
- examples/gliner2/README.md
- examples/gliner2/gliner2_neo4j.py
- graphiti_core/cross_encoder/gemini_reranker_client.py
- graphiti_core/embedder/azure_openai.py
- graphiti_core/embedder/gemini.py
- graphiti_core/llm_client/anthropic_client.py
- graphiti_core/llm_client/azure_openai_client.py
- graphiti_core/llm_client/client.py
- graphiti_core/llm_client/config.py
- graphiti_core/llm_client/errors.py
- graphiti_core/llm_client/gemini_client.py
- graphiti_core/llm_client/gliner2_client.py
- graphiti_core/llm_client/groq_client.py
- graphiti_core/llm_client/openai_base_client.py
- graphiti_core/llm_client/openai_client.py
- graphiti_core/llm_client/openai_generic_client.py
- graphiti_core/llm_client/token_tracker.py
- mcp_server/config/mcp_config_stdio_example.json
- mcp_server/src/services/factories.py
- tests/cross_encoder/test_gemini_reranker_client.py
- tests/llm_client/test_anthropic_client.py
- tests/llm_client/test_azure_openai_client.py
- tests/llm_client/test_errors.py
- tests/llm_client/test_gemini_client.py
- tests/test_text_utils.py
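This document covers the LLM client subsystem in graphiti-core. It describes the abstract base class, the configuration object, and every concrete provider implementation. It also covers the cross-cutting behavior that wraps each LLM call: retry logic, caching, tracing, and token tracking.
For embedder and reranker integration, see 6.2. For the prompt templates passed to these clients, see 6.3. For end-to-end configuration of a specific provider, see 9.3.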
Class hierarchy
Every LLM client in Graphiti belongs to a single inheritance tree rooted at LLMClient.
LLM client class hierarchy
Sources: graphiti_core/llm_client/client.py:71-147, graphiti_core/llm_client/openai_base_client.py:40-95, graphiti_core/llm_client/openai_client.py:27-125, graphiti_core/llm_client/azure_openai_client.py:31-167, graphiti_core/llm_client/openai_generic_client.py:37-214, graphiti_core/llm_client/anthropic_client.py:103-150, graphiti_core/llm_client/gemini_client.py:72-127, graphiti_core/llm_client/groq_client.py:48-85, graphiti_core/llm_client/gliner2_client.py:34-118
LLMConfig
LLMConfig is the configuration object that every client constructor receives. It is a plain Python class, not a Pydantic model.
| Field | Type | Default | Description |
|---|---|---|---|
| api_key | str \| None | None | Provider API key |
| model | str \| None | None | Primary model identifier |
| small_model | str \| None | None | Smaller, cheaper model used for simple prompts |
| base_url | str \| None | None | Overrides the API base URL (e.g. for local endpoints) |
| temperature | float | 1.0 | Sampling temperature (graphiti_core/llm_client/config.py:20) |
| max_tokens | int | 16384 | Maximum number of output tokens (graphiti_core/llm_client/config.py:19) |
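The sketch below shows the table's fields and defaults as a plain class. It is a stand-in named LLMConfigSketch, not graphiti_core's LLMConfig. The keyword-argument order and the example endpoint are assumptions made for illustration.

```python
from __future__ import annotations

# Defaults copied from the table above; this is a sketch, not graphiti_core's LLMConfig.
DEFAULT_MAX_TOKENS = 16384
DEFAULT_TEMPERATURE = 1.0


class LLMConfigSketch:
    """A plain Python class (not a Pydantic model) holding the fields from the table."""

    def __init__(
        self,
        api_key: str | None = None,
        model: str | None = None,
        small_model: str | None = None,
        base_url: str | None = None,
        temperature: float = DEFAULT_TEMPERATURE,
        max_tokens: int = DEFAULT_MAX_TOKENS,
    ) -> None:
        self.api_key = api_key
        self.model = model
        self.small_model = small_model
        self.base_url = base_url
        self.temperature = temperature
        self.max_tokens = max_tokens


# Example: point an OpenAI-compatible client at a local endpoint (hypothetical values).
local = LLMConfigSketch(model='my-local-model', base_url='http://localhost:8000/v1')
```

Any field left unset keeps the default listed in the table.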
ModelSize is an enum with two values, small and medium (graphiti_core/llm_client/config.py:23-25). Every call to generate_response accepts a model_size argument. The client routes ModelSize.small to small_model and ModelSize.medium to model.
Sources: graphiti_core/llm_client/config.py:19-69
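The routing rule can be sketched as a single function. ModelSize here is a local stand-in, and its string values are assumed. The helper resolve_model is hypothetical. This section does not say what a real client does when small_model is unset, so the sketch simply returns None in that case.

```python
from __future__ import annotations

from enum import Enum


class ModelSize(Enum):
    # Stand-in for graphiti_core's ModelSize; the string values are assumed.
    small = 'small'
    medium = 'medium'


def resolve_model(model: str | None, small_model: str | None, model_size: ModelSize) -> str | None:
    """Hypothetical helper: route ModelSize.small to small_model, ModelSize.medium to model."""
    if model_size is ModelSize.small:
        return small_model
    return model
```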
LLMClient abstract base class
LLMClient (graphiti_core/llm_client/client.py:71-147) is the abstract base class for all provider implementations.
Constructor
LLMClient(config: LLMConfig | None, cache: bool = False)
If config is None, a default LLMConfig() is used (graphiti_core/llm_client/client.py:73-74). When cache=True, the constructor creates an LLMCache instance that stores entries in ./llm_cache (graphiti_core/llm_client/client.py:35, graphiti_core/llm_client/client.py:87-88).
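The sketch below mirrors the two constructor rules described above. All classes in it are hypothetical stand-ins: LLMConfigStub, LLMCacheStub and LLMClientSketch are not the library's types. The stub cache only records its directory and does not persist anything.

```python
from __future__ import annotations

DEFAULT_CACHE_DIR = './llm_cache'  # directory named in the text above


class LLMConfigStub:
    """Hypothetical stand-in for LLMConfig (only what this sketch needs)."""

    def __init__(self, model: str | None = None) -> None:
        self.model = model


class LLMCacheStub:
    """Hypothetical stand-in for LLMCache; it only remembers its directory."""

    def __init__(self, directory: str) -> None:
        self.directory = directory


class LLMClientSketch:
    def __init__(self, config: LLMConfigStub | None, cache: bool = False) -> None:
        # Rule 1: fall back to a default config when none is supplied.
        if config is None:
            config = LLMConfigStub()
        self.config = config
        self.cache_enabled = cache
        # Rule 2: only create a cache when caching is requested.
        self.cache = LLMCacheStub(DEFAULT_CACHE_DIR) if cache else None
```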
generate_response: the public interface
This method is the only public entry point for callers. Its signature is:
```python
async def generate_response(
    self,
    messages: list[Message],
    response_model: type[BaseModel] | None = None,
    max_tokens: int | None = None,
    model_size: ModelSize = ModelSize.medium,
    group_id: str | None = None,
    prompt_name: str | None = None,
) -> dict[str, Any]: ...
```
The base implementation performs the following steps in order (graphiti_core/llm_client/client.py:155-247):
- If response_model is provided, append its JSON schema to the last message (graphiti_core/llm_client/client.py:167-173).
- Append the multilingual extraction instruction, obtained from get_extraction_language_instruction(group_id), to the first message (graphiti_core/llm_client/client.py:176).
- Call _clean_input on each message to strip invalid Unicode and control characters (graphiti_core/llm_client/client.py:178-179).
- Open a tracing span (llm.generate) and set its attributes: llm.provider, model.size, max_tokens, cache.enabled and, optionally, prompt.name (graphiti_core/llm_client/client.py:182-191).
- Check the cache and return immediately on a hit (graphiti_core/llm_client/client.py:194-197).
- Call _generate_response_with_retry, which wraps the abstract _generate_response in Tenacity retry logic (graphiti_core/llm_client/client.py:202-212).
- If caching is enabled, store the result in the cache (graphiti_core/llm_client/client.py:214-216).
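The step order above can be sketched as a standalone coroutine. The sketch rests on several assumptions:
- Message is a local stand-in dataclass.
- The schema wording appended to the prompt is invented.
- clean_input only approximates _clean_input by dropping Unicode control characters and lone surrogates.
- The cache is an in-memory dict keyed by the serialized messages.

Tracing and retries are omitted. This is not the library's implementation.

```python
from __future__ import annotations

import asyncio
import json
import unicodedata
from dataclasses import dataclass
from typing import Any, Awaitable, Callable


@dataclass
class Message:
    """Hypothetical stand-in for graphiti's Message type."""
    role: str
    content: str


def clean_input(text: str) -> str:
    """Approximation of _clean_input: drop control chars (keeping newline, CR, tab) and lone surrogates."""
    return ''.join(
        ch
        for ch in text
        if ch in '\n\r\t' or unicodedata.category(ch) not in ('Cc', 'Cs')
    )


async def generate_response_sketch(
    messages: list[Message],
    call_provider: Callable[[list[Message]], Awaitable[dict[str, Any]]],
    cache: dict[str, dict[str, Any]] | None = None,
    schema: dict[str, Any] | None = None,
    language_instruction: str = '',
) -> dict[str, Any]:
    # Step 1: append the response JSON schema to the last message (wording is hypothetical).
    if schema is not None:
        messages[-1].content += '\n\nRespond with JSON matching this schema:\n' + json.dumps(schema)
    # Step 2: append the extraction-language instruction to the first message.
    messages[0].content += language_instruction
    # Step 3: clean every message.
    for message in messages:
        message.content = clean_input(message.content)
    # Step 4 (tracing span) is omitted from this sketch.
    key = json.dumps([[m.role, m.content] for m in messages])
    # Step 5: on a cache hit, return immediately.
    if cache is not None and key in cache:
        return cache[key]
    # Step 6: call the provider (the retry wrapper is omitted here).
    result = await call_provider(messages)
    # Step 7: store the result when caching is enabled.
    if cache is not None:
        cache[key] = result
    return result
```

Calling the sketch twice with identical messages and a shared dict invokes the provider only once. The second call is answered from the cache in step 5.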
Abstract method: _generate_response
```python
@abstractmethod
async def _generate_response(
    self,
    messages: list[Message],
    response_model: type[BaseModel] | None = None,
    max_tokens: int = DEFAULT_MAX_TOKENS,
    model_size: ModelSize = ModelSize.medium,
) -> dict[str, typing.Any]:
    pass
```
Sources: graphiti_core/llm_client/client.py:139-147
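The pattern below shows a public method that delegates to a provider-specific abstract hook. It uses a trimmed-down stand-in base class, and messages are plain dicts. EchoClient is a hypothetical provider that echoes the last message. LLMClientBase and EchoClient are not library classes.

```python
from __future__ import annotations

import asyncio
import typing
from abc import ABC, abstractmethod
from enum import Enum

DEFAULT_MAX_TOKENS = 16384  # stand-in value taken from the LLMConfig table


class ModelSize(Enum):
    small = 'small'
    medium = 'medium'


class LLMClientBase(ABC):
    """Hypothetical, trimmed-down stand-in for LLMClient."""

    async def generate_response(self, messages: list[dict[str, str]]) -> dict[str, typing.Any]:
        # The real base class also injects schemas, cleans input, traces,
        # caches, and retries; this sketch only delegates to the hook.
        return await self._generate_response(messages)

    @abstractmethod
    async def _generate_response(
        self,
        messages: list[dict[str, str]],
        response_model: type | None = None,
        max_tokens: int = DEFAULT_MAX_TOKENS,
        model_size: ModelSize = ModelSize.medium,
    ) -> dict[str, typing.Any]:
        ...


class EchoClient(LLMClientBase):
    """Hypothetical provider that echoes the last message back as a dict."""

    async def _generate_response(self, messages, response_model=None,
                                 max_tokens=DEFAULT_MAX_TOKENS, model_size=ModelSize.medium):
        return {'echo': messages[-1]['content'], 'model_size': model_size.value}
```

Because _generate_response is abstract, the base class itself cannot be instantiated. Only subclasses that implement the hook can be used.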
Concrete implementations
Comparison of concrete client classes
| Class | Upstream SDK | Default primary model | Structured output method |
|---|---|---|---|
| OpenAIClient | openai | gpt-4.1-mini | responses.parse (reasoning models) / chat.completions (standard models) |
| AzureOpenAILLMClient | openai (Azure) | _(set by the caller)_ | responses.parse (o1/o3/gpt-5) / beta.chat.completions.parse (standard models) |
| OpenAIGenericClient | openai | gpt-4.1-mini | json_schema response format |
| AnthropicClient | anthropic | claude-haiku-4-5-latest | Tool use (_create_tool) |
| GeminiClient | google-genai | gemini-3-flash-preview | response_mime_type=application/json |
| GroqClient | groq | llama-3.1-70b-versatile | json_object response format |
| GLiNER2Client | gliner | gliner_medium-v2.1 | Local model inference |
Sources: graphiti_core/llm_client/openai_client.py:27-125, graphiti_core/llm_client/azure_openai_client.py:31-167, graphiti_core/llm_client/openai_generic_client.py:37-214, graphiti_core/llm_client/anthropic_client.py:103-150, graphiti_core/llm_client/gemini_client.py:72-127, graphiti_core/llm_client/groq_client.py:48-85, graphiti_core/llm_client/gliner2_client.py:34-118
OpenAI family (BaseOpenAIClient, OpenAIClient, AzureOpenAILLMClient)
BaseOpenAIClient holds the logic shared by clients for OpenAI-compatible APIs (graphiti_core/llm_client/openai_base_client.py:40-58). It defines two abstract hooks: _create_structured_completion and _create_completion.
OpenAIClient detects reasoning models by their name prefix: gpt-5, o1 or o3 (graphiti_core/llm_client/openai_client.py:77-79). For these models it calls client.responses.parse (graphiti_core/llm_client/openai_client.py:99). For standard models it calls client.chat.completions.create with response_format={'type': 'json_object'} (graphiti_core/llm_client/openai_client.py:119-125).
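The prefix rule above can be sketched in a few lines. The helpers is_reasoning_model and plan_openai_call are hypothetical. The sketch returns method names as strings instead of calling the openai SDK, and OpenAIClient's exact check may differ. Note that a bare prefix test matches any model name that begins with those characters.

```python
from __future__ import annotations

from typing import Any

# Prefixes listed above; OpenAIClient's exact check may differ.
REASONING_PREFIXES = ('gpt-5', 'o1', 'o3')


def is_reasoning_model(model: str) -> bool:
    # str.startswith accepts a tuple and returns True if any prefix matches.
    return model.startswith(REASONING_PREFIXES)


def plan_openai_call(model: str) -> tuple[str, dict[str, Any]]:
    """Hypothetical helper: return (SDK method path, extra kwargs) per the routing described above."""
    if is_reasoning_model(model):
        return 'responses.parse', {}
    return 'chat.completions.create', {'response_format': {'type': 'json_object'}}
```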
AzureOpenAILLMClient uses the result of _supports_reasoning_features(model) to route each request to either responses.parse or beta.chat.completions.parse (graphiti_core/llm_client/azure_openai_client.py:74-104).
OpenAIGenericClient
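OpenAIGenericClient is designed for local models, such as those served by Ollama or LM Studio. It requests the json_schema response format (graphiti_core/llm_client/openai_generic_client.py:115-121). Its default max_tokens is 16,384 for compatibility (graphiti_core/llm_client/openai_generic_client.py:75-76).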
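The sketch below builds request keyword arguments that carry a json_schema response format. The outer shape, {'type': 'json_schema', 'json_schema': {'name': ..., 'schema': ...}}, is the OpenAI Chat Completions structured-output format. The fields OpenAIGenericClient actually sets are not shown in this section. The schema, the model name and the helper json_schema_response_format are hypothetical. With the openai SDK, such kwargs would be passed to client.chat.completions.create(**request_kwargs).

```python
from __future__ import annotations

from typing import Any


def json_schema_response_format(name: str, schema: dict[str, Any]) -> dict[str, Any]:
    # OpenAI-compatible `json_schema` response format; local servers vary
    # in how strictly they enforce the schema.
    return {
        'type': 'json_schema',
        'json_schema': {'name': name, 'schema': schema},
    }


# Hypothetical schema, e.g. what a Pydantic model's JSON schema might look like.
entity_schema = {
    'type': 'object',
    'properties': {'name': {'type': 'string'}},
    'required': ['name'],
}

request_kwargs = {
    'model': 'my-local-model',  # hypothetical local model name
    'messages': [{'role': 'user', 'content': 'Extract the entity.'}],
    'max_tokens': 16384,  # default noted above
    'response_format': json_schema_response_format('ExtractedEntity', entity_schema),
}
```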
AnthropicClient
AnthropicClient uses the tool-use API for structured output: _create_tool builds a tool definition from response_model (graphiti_core/llm_client/anthropic_client.py:177-220). Model-specific token limits are handled through ANTHROPIC_MODEL_MAX_TOKENS (graphiti_core/llm_client/anthropic_client.py:75-97).
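The sketch below shows tool use for structured output with the Anthropic Messages API. It defines a single tool whose input_schema is the desired output schema, then reads the input field of the tool_use block that comes back. It makes the following assumptions:
- Forcing the tool through tool_choice is an assumption; this section does not say whether _create_tool does so.
- Response content blocks are modeled as plain dicts, whereas the anthropic SDK returns objects.
- create_tool_sketch and extract_tool_input are hypothetical helpers.

The tools and tool_choice values would be passed to client.messages.create.

```python
from __future__ import annotations

from typing import Any


def create_tool_sketch(
    name: str, description: str, input_schema: dict[str, Any]
) -> tuple[list[dict[str, Any]], dict[str, Any]]:
    """Hypothetical analogue of _create_tool: a one-tool list plus a tool_choice forcing it."""
    tools = [{'name': name, 'description': description, 'input_schema': input_schema}]
    # Forcing this tool makes the model answer with tool input that follows the schema.
    tool_choice = {'type': 'tool', 'name': name}
    return tools, tool_choice


def extract_tool_input(content_blocks: list[dict[str, Any]]) -> dict[str, Any]:
    """Return the `input` of the first tool_use block (blocks modeled as plain dicts)."""
    for block in content_blocks:
        if block.get('type') == 'tool_use':
            return block['input']
    raise ValueError('response contained no tool_use block')
```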
GeminiClient
GeminiClient integrates with google-genai. It handles safety filters in _check_safety_blocks (graphiti_core/llm_client/gemini_client.py:128-152) and blocked prompts in _check_prompt_blocks (graphiti_core/llm_client/gemini_client.py:154-162). It supports thinking_config for Gemini 2.5+ models (graphiti_core/llm_client/gemini_client.py:109-110).
Cross-cutting behavior
Call flow through generate_response
Sources: graphiti_core/llm_client/client.py:155-247
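Retry logic
The clients use Tenacity for automatic retries. is_server_or_retry_error decides whether an exception should be retried, for example a RateLimitError or an error carrying a 5xx status code (graphiti_core/llm_client/client.py:62-69).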
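The sketch below shows the retry idea using only asyncio, without Tenacity. A predicate selects retryable errors (rate limits and 5xx responses), and the caller retries up to 4 attempts with a delay that starts at 5 s, doubles, and is capped at 120 s. RateLimitError and StatusError are hypothetical stand-ins, not SDK classes. The plain doubling schedule is a simplification; Tenacity's wait strategies, some of which add jitter, behave differently.

```python
from __future__ import annotations

import asyncio
from typing import Any, Awaitable, Callable


class RateLimitError(Exception):
    """Stand-in for a provider SDK's rate-limit exception."""


class StatusError(Exception):
    """Stand-in for an SDK error that carries an HTTP status code."""

    def __init__(self, status_code: int) -> None:
        super().__init__(f'HTTP {status_code}')
        self.status_code = status_code


def is_server_or_retry_error(exc: BaseException) -> bool:
    # Retry rate limits and 5xx server errors; fail fast on everything else.
    if isinstance(exc, RateLimitError):
        return True
    return isinstance(exc, StatusError) and 500 <= exc.status_code < 600


async def call_with_retry(
    fn: Callable[[], Awaitable[Any]],
    attempts: int = 4,
    min_wait: float = 5.0,
    max_wait: float = 120.0,
) -> Any:
    """Retry `fn` with a doubling delay capped at max_wait (no jitter)."""
    if attempts < 1:
        raise ValueError('attempts must be >= 1')
    delay = min_wait
    for attempt in range(1, attempts + 1):
        try:
            return await fn()
        except Exception as exc:
            # Re-raise on the last attempt or when the error is not retryable.
            if attempt == attempts or not is_server_or_retry_error(exc):
                raise
            await asyncio.sleep(delay)
            delay = min(delay * 2, max_wait)
    raise RuntimeError('unreachable')
```

A 4xx error such as 400 is raised on the first attempt, while a 503 is retried.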
| Client | Strategy | Attempts |
|---|---|---|
| LLMClient | Exponential backoff (5-120 s) | 4 (graphiti_core/llm_client/client.py:117-118) |
| BaseOpenAIClient | Class constant | 2 (graphiti_core/llm_client/openai_base_client.py:49) |
| AnthropicClient | Handled inside the SDK | 1 (graphiti_core/llm_client/anthropic_client.py:146) |
| GeminiClient | Class constant | 2 (graphiti_core/llm_client/gemini_client.py:93) |
Sources: graphiti_core/llm_client/client.py:116-126, graphiti_core/llm_client/openai_base_client.py:49, graphiti_core/llm_client/anthropic_client.py:146, graphiti_core/llm_client/gemini_client.py:93
Token tracking
TokenUsageTracker (graphiti_core/llm_client/token_tracker.py) records token usage per prompt. After receiving an API response, each concrete client records the input and output token counts (graphiti_core/llm_client/openai_base_client.py:127-130, graphiti_core/llm_client/anthropic_client.py:417-422).
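The sketch below shows per-prompt token accounting. TokenUsageTrackerSketch, PromptUsage and the record/get/total methods are hypothetical and do not describe the real TokenUsageTracker API. Bucketing calls that have no prompt name under 'unknown' is also an assumption.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass


@dataclass
class PromptUsage:
    input_tokens: int = 0
    output_tokens: int = 0
    calls: int = 0


class TokenUsageTrackerSketch:
    """Hypothetical per-prompt token tally; not the real TokenUsageTracker API."""

    def __init__(self) -> None:
        self._usage: dict[str, PromptUsage] = defaultdict(PromptUsage)

    def record(self, prompt_name: str | None, input_tokens: int, output_tokens: int) -> None:
        # Calls without a prompt name are grouped under 'unknown' (assumption).
        entry = self._usage[prompt_name or 'unknown']
        entry.input_tokens += input_tokens
        entry.output_tokens += output_tokens
        entry.calls += 1

    def get(self, prompt_name: str) -> PromptUsage:
        return self._usage[prompt_name]

    def total(self) -> PromptUsage:
        out = PromptUsage()
        for usage in self._usage.values():
            out.input_tokens += usage.input_tokens
            out.output_tokens += usage.output_tokens
            out.calls += usage.calls
        return out
```

A client would call record once per API response, using the usage counts the provider returns.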
Response caching
LLMCache (graphiti_core/llm_client/cache.py) stores responses in ./llm_cache (graphiti_core/llm_client/client.py:35). The cache key is an MD5 hash of the model and the messages (graphiti_core/llm_client/client.py:149-153).
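The sketch below derives an MD5 cache key from the model and the messages. Serializing to JSON with sorted keys is an assumption made so that equal inputs always produce the same string; the real key format may differ. The helper cache_key is hypothetical. MD5 serves here only as a fast fingerprint, not for any security purpose.

```python
from __future__ import annotations

import hashlib
import json


def cache_key(model: str, messages: list[dict[str, str]]) -> str:
    # Deterministic serialization (sorted keys) so equal inputs hash equally.
    payload = json.dumps({'model': model, 'messages': messages}, sort_keys=True)
    return hashlib.md5(payload.encode('utf-8')).hexdigest()
```

Changing either the model or any message produces a different key, so a cached response is only reused for an identical request.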
Provider-to-code mapping
File and class locations for each provider
Sources: graphiti_core/llm_client/client.py:1-147, graphiti_core/llm_client/openai_base_client.py:1-38, graphiti_core/llm_client/anthropic_client.py:1-44, graphiti_core/llm_client/gemini_client.py:1-43, graphiti_core/llm_client/groq_client.py:1-34, graphiti_core/llm_client/gliner2_client.py:1-32