6 · LLM and Embeddings

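Memory pipelines and knowledge graph construction · A look at this chapter's module relationships, source references, and implementation notes.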

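Project: Cognee · Chapter: 6 · Status: full translation · Modules: model invocation and provider adaptation, system architecture, interface and service contracts, UI and interaction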

Source references

.env.template
README.md
assets/cognee_benefits.png
cognee/api/v1/config/config.py
cognee/infrastructure/llm/config.py
cognee/infrastructure/llm/structured_output_framework/baml/baml_src/extraction/acreate_structured_output.py
cognee/infrastructure/llm/structured_output_framework/litellm_instructor/llm/anthropic/adapter.py
cognee/infrastructure/llm/structured_output_framework/litellm_instructor/llm/azure_openai/adapter.py
cognee/infrastructure/llm/structured_output_framework/litellm_instructor/llm/bedrock/adapter.py
cognee/infrastructure/llm/structured_output_framework/litellm_instructor/llm/gemini/adapter.py

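Module tags

Model invocation and provider adaptation
System architecture
Interface and service contracts
UI and interaction
Configuration governance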

Chapter text

Original DeepWiki page: https://deepwiki.com/topoteretes/cognee/6-llm-and-embeddings

Large Language Models (LLMs) and Embeddings

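Purpose and Scope

This document describes Cognee's large language model (LLM) integration layer, which provides a unified interface across multiple LLM providers (OpenAI, Anthropic, Gemini, Mistral, Ollama, Bedrock, LlamaCpp) for structured output generation, text embeddings, and multimodal processing. The system uses the adapter pattern with provider-specific implementations, enforces Pydantic schemas through the instructor or BAML framework, and integrates error handling that covers retries, fallback models, and rate limiting.

For configuration details, see LLM Provider Configuration. For the embedding service, see Embedding Services. For structured output, see Structured Output Frameworks. For architecture details, see LLM Adapter Architecture. For resilience features, see Error Handling and Fallbacks.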

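Architecture Overview

Cognee's LLM system consists of three main layers:

Configuration layer: the LLMConfig class loads settings from environment variables.
Adapter layer: provider-specific implementations of the LLMInterface protocol.
Structured output layer: integrates with the instructor or BAML framework to produce type-safe responses.

High-Level Architecture

Cognee · High-level architecture · Figure 1

Sources: cognee/infrastructure/llm/config.py:15-88, cognee/infrastructure/llm/structured_output_framework/litellm_instructor/llm/get_llm_client.py:70-93, cognee/infrastructure/llm/structured_output_framework/litellm_instructor/llm/openai/adapter.py:37-107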

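LLM Configuration

The LLMConfig class is a Pydantic settings model that loads LLM configuration from environment variables. It covers everything from API keys to rate-limit thresholds and the choice of structured output framework.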

Field	Default	Description
`structured_output_framework`	"instructor"	Either "instructor" or "baml"
`llm_provider`	"openai"	Provider identifier
`llm_model`	"openai/gpt-5-mini"	Model identifier (litellm format)
`llm_api_key`	None	API authentication key
`llm_max_completion_tokens`	16384	Maximum number of output tokens
`llm_rate_limit_enabled`	False	Enables rate limiting

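Special behaviors:

Quote stripping: the strip_quotes_from_strings validator removes quotes from environment variable values (cognee/infrastructure/llm/config.py:91-125).
Ollama validation: ensure_env_vars_for_ollama ensures that required variables such as LLM_ENDPOINT are set when Ollama is used (cognee/infrastructure/llm/config.py:155-200).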
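
The configuration behavior described in this section can be illustrated with a minimal, standard-library-only sketch. It is not the LLMConfig implementation: the real class is a Pydantic settings model, whereas this sketch uses a dataclass. The names LLMSettingsSketch, strip_quotes, and load_settings are hypothetical, and every environment variable name other than LLM_ENDPOINT, as well as the llm_endpoint field and the boolean parsing rule, is an assumption made for illustration.

```python
from dataclasses import dataclass
from typing import Mapping, Optional


def strip_quotes(value: str) -> str:
    """Drop one pair of matching surrounding quotes: '"openai"' -> 'openai'."""
    if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
        return value[1:-1]
    return value


@dataclass(frozen=True)
class LLMSettingsSketch:
    # Defaults mirror the table above; llm_endpoint is an assumed extra field.
    structured_output_framework: str = "instructor"
    llm_provider: str = "openai"
    llm_model: str = "openai/gpt-5-mini"
    llm_api_key: Optional[str] = None
    llm_endpoint: Optional[str] = None
    llm_max_completion_tokens: int = 16384
    llm_rate_limit_enabled: bool = False


def load_settings(env: Mapping[str, str]) -> LLMSettingsSketch:
    """Build settings from an environment-like mapping (os.environ in practice)."""
    defaults = LLMSettingsSketch()

    def read(name: str) -> Optional[str]:
        raw = env.get(name)
        return None if raw is None else strip_quotes(raw.strip())

    provider = read("LLM_PROVIDER") or defaults.llm_provider
    endpoint = read("LLM_ENDPOINT")
    # Mirrors the idea of the Ollama check: fail early if the endpoint is missing.
    if provider == "ollama" and not endpoint:
        raise ValueError("LLM_ENDPOINT must be set when the provider is 'ollama'")

    max_tokens = read("LLM_MAX_COMPLETION_TOKENS")
    rate_limit = read("LLM_RATE_LIMIT_ENABLED")
    return LLMSettingsSketch(
        structured_output_framework=(
            read("STRUCTURED_OUTPUT_FRAMEWORK") or defaults.structured_output_framework
        ),
        llm_provider=provider,
        llm_model=read("LLM_MODEL") or defaults.llm_model,
        llm_api_key=read("LLM_API_KEY"),
        llm_endpoint=endpoint,
        llm_max_completion_tokens=(
            int(max_tokens) if max_tokens else defaults.llm_max_completion_tokens
        ),
        # Assumed parsing rule: only these spellings count as "enabled".
        llm_rate_limit_enabled=(rate_limit or "").lower() in {"1", "true", "yes"},
    )
```

Passing a plain mapping instead of reading os.environ directly keeps the sketch easy to test. For example, load_settings({"LLM_PROVIDER": '"ollama"'}) raises ValueError because no endpoint is given, while load_settings({}) returns the defaults from the table.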

For details, see LLM Provider Configuration.

Sources: cognee/infrastructure/llm/config.py:15-200, .env.template:6-91

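Provider Factory and Interface

The system uses the factory pattern through get_llm_client() to instantiate the correct adapter based on the configuration. The factory is cached for performance (cognee/infrastructure/llm/structured_output_framework/litellm_instructor/llm/get_llm_client.py:198-200).

All adapters must implement the LLMInterface protocol, whose main requirement is the acreate_structured_output method for generating type-safe responses from the LLM (cognee/infrastructure/llm/structured_output_framework/litellm_instructor/llm/llm_interface.py:8-35).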
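
To make the factory and protocol relationship concrete, here is a small sketch using only the standard library. It is an illustration, not Cognee's code: LLMInterfaceSketch, EchoAdapter, and get_llm_client_sketch are hypothetical names, the parameter names of acreate_structured_output are assumed, and the adapter builds its result locally instead of calling a provider.

```python
import asyncio
from dataclasses import dataclass
from functools import lru_cache
from typing import Dict, Protocol, Type, TypeVar

T = TypeVar("T")


class LLMInterfaceSketch(Protocol):
    """Hypothetical stand-in for the protocol every adapter implements."""

    async def acreate_structured_output(
        self, text_input: str, system_prompt: str, response_model: Type[T]
    ) -> T: ...


class EchoAdapter:
    """Placeholder adapter: constructs the response model from the input text."""

    def __init__(self, model: str) -> None:
        self.model = model

    async def acreate_structured_output(
        self, text_input: str, system_prompt: str, response_model: Type[T]
    ) -> T:
        # A real adapter would send the prompts to its provider here.
        return response_model(text_input)


class OllamaEchoAdapter(EchoAdapter):
    """Second placeholder so the registry has more than one entry."""


# Assumed registry: maps a provider identifier to an adapter class.
_ADAPTERS: Dict[str, Type[EchoAdapter]] = {
    "openai": EchoAdapter,
    "ollama": OllamaEchoAdapter,
}


@lru_cache(maxsize=None)
def get_llm_client_sketch(provider: str, model: str) -> LLMInterfaceSketch:
    """Return one cached adapter instance per (provider, model) pair."""
    adapter_cls = _ADAPTERS.get(provider)
    if adapter_cls is None:
        raise ValueError(f"Unsupported LLM provider: {provider}")
    return adapter_cls(model)


@dataclass
class Summary:
    text: str


async def demo() -> Summary:
    client = get_llm_client_sketch("openai", "openai/gpt-5-mini")
    return await client.acreate_structured_output("hello", "Summarize.", Summary)
```

Running asyncio.run(demo()) returns a Summary built from the input text. Because the factory is wrapped in lru_cache, repeated calls with the same arguments return the same adapter object, which is one simple way to get the caching behavior described above.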

For details, see LLM Adapter Architecture.

Sources: cognee/infrastructure/llm/structured_output_framework/litellm_instructor/llm/get_llm_client.py:160-200, cognee/infrastructure/llm/structured_output_framework/litellm_instructor/llm/llm_interface.py:8-35

Adapter Implementations

Cognee provides dedicated adapters for a range of LLM ecosystems:

GenericAPIAdapter: base class for LiteLLM-compatible providers; implements the shared retry and fallback logic (cognee/infrastructure/llm/structured_output_framework/litellm_instructor/llm/generic_llm_api/adapter.py:58-115).
OpenAIAdapter: targets GPT models and includes special handling for the gpt-5 family (cognee/infrastructure/llm/structured_output_framework/litellm_instructor/llm/openai/adapter.py:37-107).
AnthropicAdapter: uses the native Anthropic SDK patched with instructor (cognee/infrastructure/llm/structured_output_framework/litellm_instructor/llm/anthropic/adapter.py:29-67).
MistralAdapter: integrates with Mistral AI and supports native transcription (cognee/infrastructure/llm/structured_output_framework/litellm_instructor/llm/mistral/adapter.py:32-73).
OllamaAPIAdapter: enables local LLM use through Ollama's OpenAI-compatible endpoint (cognee/infrastructure/llm/structured_output_framework/litellm_instructor/llm/ollama/adapter.py:33-77).

For details, see LLM Adapter Architecture.

Sources: cognee/infrastructure/llm/structured_output_framework/litellm_instructor/llm/openai/adapter.py:37-107, cognee/infrastructure/llm/structured_output_framework/litellm_instructor/llm/anthropic/adapter.py:29-67, cognee/infrastructure/llm/structured_output_framework/litellm_instructor/llm/mistral/adapter.py:32-73

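Structured Output Frameworks

Cognee uses two main frameworks to ensure that LLM output is structured and validates against Pydantic models:

Instructor: the default framework; wraps provider clients to handle JSON extraction and validation (cognee/infrastructure/llm/structured_output_framework/litellm_instructor/llm/openai/adapter.py:96-104).
BAML: an alternative framework for high-performance extraction. When enabled, it uses ClientRegistry and dynamic type building (cognee/infrastructure/llm/structured_output_framework/baml/baml_src/extraction/acreate_structured_output.py:36-76).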
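
The general idea behind schema-enforced output (parse the model's reply, validate it against a declared type, and ask again if validation fails) can be sketched without either framework. The code below is a conceptual illustration only: it uses a standard-library dataclass in place of a Pydantic model, and Entity, parse_as, and structured_output are hypothetical names that do not correspond to instructor or BAML APIs.

```python
import json
from dataclasses import dataclass, fields
from typing import Callable, Optional, Type, TypeVar

T = TypeVar("T")


@dataclass
class Entity:
    """Example schema; stands in for a Pydantic response model."""

    name: str
    kind: str


def parse_as(model: Type[T], raw: str) -> T:
    """Parse JSON text and check it against the dataclass fields of `model`."""
    data = json.loads(raw)  # raises json.JSONDecodeError, a ValueError subclass
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    expected = {f.name: f.type for f in fields(model)}
    if set(data) != set(expected):
        raise ValueError(f"expected keys {sorted(expected)}, got {sorted(data)}")
    for key, expected_type in expected.items():
        # Only plain classes such as str or int are checked in this sketch.
        if isinstance(expected_type, type) and not isinstance(data[key], expected_type):
            raise ValueError(f"field {key!r} should be {expected_type.__name__}")
    return model(**data)


def structured_output(
    call_llm: Callable[[str], str], prompt: str, model: Type[T], max_retries: int = 2
) -> T:
    """Call the LLM, validate the reply, and re-prompt with the error on failure."""
    last_error: Optional[ValueError] = None
    for _ in range(max_retries + 1):
        raw = call_llm(prompt)
        try:
            return parse_as(model, raw)
        except ValueError as error:
            last_error = error
            prompt = f"{prompt}\nThe previous answer was invalid: {error}"
    raise ValueError(f"no valid output after {max_retries + 1} attempts") from last_error
```

Here call_llm is any function that maps a prompt to raw text, so a fake that first returns malformed text and then valid JSON is enough to exercise the retry path.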

For details, see Structured Output Frameworks.

Sources: cognee/infrastructure/llm/config.py:42-58, cognee/infrastructure/llm/structured_output_framework/baml/baml_src/extraction/acreate_structured_output.py:36-76

Error Handling and Fallbacks

Cognee is designed for production reliability and has resilience mechanisms built in:

Retry logic: all LLM calls use the tenacity retry mechanism with exponential backoff (cognee/infrastructure/llm/structured_output_framework/litellm_instructor/llm/openai/adapter.py:109-117).
Rate limiting: the integrated llm_rate_limiter_context_manager prevents excessive API usage (cognee/infrastructure/llm/structured_output_framework/litellm_instructor/llm/openai/adapter.py:144-145).
Fallback models: when a content policy violation or certain API errors occur, the system can switch automatically to fallback_model (cognee/infrastructure/llm/structured_output_framework/litellm_instructor/llm/openai/adapter.py:164-190).
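
The retry and fallback behavior listed above can be sketched with asyncio alone. This is a hand-rolled illustration, not the adapters' actual code: the adapters use tenacity, which is replaced here by an explicit loop, and rate limiting is left out. TransientLLMError, ContentPolicyError, and call_with_resilience are hypothetical names, and the delays and attempt counts are arbitrary.

```python
import asyncio
import random
from typing import Awaitable, Callable, Optional


class TransientLLMError(Exception):
    """Hypothetical error worth retrying (for example a timeout)."""


class ContentPolicyError(Exception):
    """Hypothetical error raised when the primary model refuses a request."""


async def call_with_resilience(
    call_model: Callable[[str, str], Awaitable[str]],
    primary: str,
    fallback: Optional[str],
    prompt: str,
    max_attempts: int = 3,
    base_delay: float = 0.01,
) -> str:
    """Retry transient errors with exponential backoff; switch models on refusal."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return await call_model(primary, prompt)
        except ContentPolicyError:
            # Retrying the same model would not help, so try the fallback once.
            if fallback is None:
                raise
            return await call_model(fallback, prompt)
        except TransientLLMError:
            if attempt == max_attempts - 1:
                raise
            # Exponential backoff with jitter: base, 2x base, 4x base, ...
            delay = base_delay * (2 ** attempt)
            await asyncio.sleep(delay + random.uniform(0, delay))
    raise RuntimeError("unreachable: the loop always returns or raises")
```

The two error classes are handled differently on purpose: a transient failure is likely to succeed on a later attempt against the same model, whereas a policy refusal is deterministic, so the sketch goes straight to the fallback model.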

For details, see Error Handling and Fallbacks.

Sources: cognee/infrastructure/llm/structured_output_framework/litellm_instructor/llm/openai/adapter.py:109-190, cognee/infrastructure/llm/config.py:64-71

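Embedding Services

Embeddings are managed by EmbeddingEngine (documented in a subpage) and configured through LLMConfig.

Provider support: OpenAI, FastEmbed, Ollama, and others are supported (.env.template:19-22).
Configuration options: dimensions, batch size, and rate limits are all configurable (.env.template:70-80).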
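
As a rough illustration of how a batch size and a dimension setting typically shape an embedding call, the sketch below splits texts into batches and produces fixed-length vectors. It does not reflect the EmbeddingEngine interface: batched, fake_embed_batch, and embed_texts are hypothetical names, and the hash-based vectors are placeholders with no semantic meaning.

```python
import hashlib
from typing import Iterator, List, Sequence


def batched(texts: Sequence[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `batch_size` texts."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(texts), batch_size):
        yield list(texts[start:start + batch_size])


def fake_embed_batch(batch: List[str], dimensions: int) -> List[List[float]]:
    """Placeholder embedder: deterministic pseudo-vectors derived from a hash."""
    vectors = []
    for text in batch:
        digest = hashlib.sha256(text.encode("utf-8")).digest()
        vectors.append([digest[i % len(digest)] / 255.0 for i in range(dimensions)])
    return vectors


def embed_texts(
    texts: Sequence[str], batch_size: int = 2, dimensions: int = 8
) -> List[List[float]]:
    """Embed all texts batch by batch, preserving input order."""
    vectors: List[List[float]] = []
    for batch in batched(texts, batch_size):
        # A real engine would issue one provider request per batch here.
        vectors.extend(fake_embed_batch(batch, dimensions))
    return vectors
```

With three texts and a batch size of 2, the sketch makes two batch calls and returns three vectors of the configured length.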

For details, see Embedding Services.

Sources: .env.template:19-22, cognee/infrastructure/llm/config.py:68-71