6.5 · Error Handling and Fallbacks


Source Files

cognee/exceptions/__init__.py
cognee/exceptions/exceptions.py
cognee/infrastructure/databases/exceptions/__init__.py
cognee/infrastructure/databases/exceptions/exceptions.py
cognee/infrastructure/databases/vector/exceptions/exceptions.py
cognee/infrastructure/llm/__init__.py
cognee/infrastructure/llm/utils.py
cognee/modules/engine/operations/setup.py
cognee/modules/pipelines/layers/setup_and_check_environment.py
cognee/tests/unit/infrastructure/llm/test_embedding_connection_dimensions.py

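Module Tags

Testing, release, and operations
Retrieval, recall, and indexing
Model invocation and provider adaptation
Configuration governance
Storage and persistence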


Original DeepWiki page: https://deepwiki.com/topoteretes/cognee/6.5-error-handling-and-fallbacks


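Tenacity-Based Retry Mechanism

Cognee uses the tenacity library to implement exponential-backoff retry logic for large language model (LLM) API calls. Every adapter implementation applies the retry decorator to its acreate_structured_output method to handle transient failures.

Retry Configuration

The retry decorator is configured as follows:

Stop condition: retrying stops after 128 seconds.
Wait strategy: exponential backoff with jitter (8-128 seconds).
Excluded errors: NotFoundError and AuthenticationError do not trigger a retry, because they are treated as unrecoverable configuration problems.
Logging: sleep events are logged at DEBUG level.

Sources: cognee/infrastructure/llm/structured_output_framework/litellm_instructor/llm/openai/adapter.py:109-117, cognee/infrastructure/llm/structured_output_framework/litellm_instructor/llm/generic_llm_api/adapter.py:109-117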
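
The retry policy listed above can be illustrated with a standard-library sketch. This is not Cognee's decorator (Cognee relies on tenacity); it is a hand-written loop that reproduces the same four rules. The function name `call_with_backoff`, the `jitter`, `sleep`, and `clock` parameters, and the two stand-in exception classes are assumptions introduced for illustration.

```python
import asyncio
import logging
import random

logger = logging.getLogger("retry_sketch")


class NotFoundError(Exception):
    """Hypothetical stand-in for the provider's 'model not found' error."""


class AuthenticationError(Exception):
    """Hypothetical stand-in for the provider's 'invalid credentials' error."""


# Errors that point at configuration problems; retrying cannot fix them.
NON_RETRYABLE = (NotFoundError, AuthenticationError)


async def call_with_backoff(func, *, initial=8.0, maximum=128.0, deadline=128.0,
                            jitter=1.0, sleep=asyncio.sleep, clock=None):
    """Await func() until it succeeds, raises a non-retryable error,
    or `deadline` seconds have passed since the first attempt."""
    now = clock or asyncio.get_running_loop().time
    start = now()
    attempt = 0
    while True:
        try:
            return await func()
        except NON_RETRYABLE:
            raise  # excluded errors are surfaced immediately
        except Exception as error:
            if now() - start >= deadline:
                raise  # time budget exhausted: surface the last error
            # Exponential growth (8, 16, 32, ...) plus jitter, capped at `maximum`.
            delay = min(maximum, initial * 2 ** attempt + random.uniform(0, jitter))
            logger.debug("Attempt %d failed (%r); sleeping %.1fs",
                         attempt + 1, error, delay)
            await sleep(delay)
            attempt += 1
```

The `sleep` and `clock` parameters exist only so the loop can be exercised without real waiting; in ordinary use the defaults apply and the first retry waits roughly 8 seconds.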

Retry Flow Diagram

Figure 1 shows how the tenacity retry logic interacts with API requests and error handling.

Figure 1: LLM request retry flow

Sources: cognee/infrastructure/llm/structured_output_framework/litellm_instructor/llm/openai/adapter.py:109-205, cognee/infrastructure/llm/structured_output_framework/litellm_instructor/llm/generic_llm_api/adapter.py:109-217

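Connection Testing and Automatic Recovery

During environment setup, Cognee runs proactive connection checks to confirm that the LLM and embedding services are reachable before pipeline execution begins.

Connection Tests

During setup_and_check_environment, Cognee performs the following checks:

LLM connection: calls test_llm_connection(), which sends a simple structured-output request to the LLM gateway (cognee/infrastructure/llm/utils.py:81-94).
Embedding connection: calls test_embedding_connection(), which embeds a sample text using the configured vector_engine.embedding_engine (cognee/infrastructure/llm/utils.py:116-134).
Timeout handling: both tests are wrapped in a timeout of CONNECTION_TEST_TIMEOUT_SECONDS (30 seconds by default) (cognee/infrastructure/llm/utils.py:14).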
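
The timeout wrapping can be sketched with `asyncio.wait_for`. Only the constant name and its 30-second default come from the text above; `run_connection_test` and the two probe coroutines are hypothetical stand-ins, and the real tests in cognee/infrastructure/llm/utils.py may be structured differently.

```python
import asyncio

CONNECTION_TEST_TIMEOUT_SECONDS = 30  # default named in the text above


async def run_connection_test(probe, timeout=CONNECTION_TEST_TIMEOUT_SECONDS):
    """Await a connectivity probe, failing fast if it exceeds `timeout` seconds."""
    try:
        return await asyncio.wait_for(probe(), timeout=timeout)
    except asyncio.TimeoutError as error:
        raise TimeoutError(
            f"Connection test timed out after {timeout} seconds"
        ) from error


async def fake_embedding_probe():
    """Hypothetical probe: pretends to embed a sample text."""
    await asyncio.sleep(0)
    return [0.0] * 1536


async def hanging_probe():
    """Hypothetical probe that simulates an unreachable service."""
    await asyncio.sleep(10)
```

A probe that returns in time hands its result back to the caller, while `run_connection_test(hanging_probe, timeout=0.01)` raises `TimeoutError` instead of blocking pipeline startup.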

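Automatic Detection and Fallback

If test_embedding_connection() succeeds, it returns the detected vector dimension. The system then calls determine_embedding_dimensions():

If EMBEDDING_DIMENSIONS is set explicitly in the environment, that value is treated as the source of truth (cognee/infrastructure/llm/utils.py:162-164).
Otherwise, the system automatically synchronizes the dimensions of embedding_config and the active embedding_engine with the size detected from the provider (cognee/infrastructure/llm/utils.py:170-182).

Sources: cognee/modules/pipelines/layers/setup_and_check_environment.py:22-58, cognee/infrastructure/llm/utils.py:81-183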
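
The precedence between the explicit setting and the detected value can be sketched as follows. `EmbeddingSettings` and `resolve_dimensions` are hypothetical names, and writing the explicit value back into the settings object is an assumption of this sketch rather than confirmed Cognee behavior.

```python
import os
from dataclasses import dataclass


@dataclass
class EmbeddingSettings:
    """Hypothetical stand-in for the embedding_config / embedding_engine state."""
    dimensions: int


def resolve_dimensions(detected, settings, env=None):
    """Apply the precedence described above and return the final dimension."""
    env = os.environ if env is None else env
    explicit = env.get("EMBEDDING_DIMENSIONS")
    if explicit:
        # An explicit environment value wins over whatever the provider reports.
        # Storing it on `settings` is this sketch's assumption.
        settings.dimensions = int(explicit)
    elif detected != settings.dimensions:
        # No explicit value: align the configuration with the detected size.
        settings.dimensions = detected
    return settings.dimensions
```

With an empty environment, `resolve_dimensions(1536, EmbeddingSettings(dimensions=3072), env={})` updates the settings to the detected size, whereas an explicit `EMBEDDING_DIMENSIONS` leaves the detected value unused.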

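Content Policy Violation Handling

Cognee handles content policy violations from LLM providers in two stages: it detects the violation, then tries a fallback model before raising an error.

Detected Policy Violation Errors

Error type	Source	Description
`ContentFilterFinishReasonError`	OpenAI SDK	Raised when OpenAI's content filter blocks a response.
`ContentPolicyViolationError`	LiteLLM	Raised when LiteLLM detects a content policy violation.
`InstructorRetryException`	Instructor	Raised when structured-output extraction fails; the message is checked for policy-related strings.

Sources: cognee/infrastructure/llm/structured_output_framework/litellm_instructor/llm/openai/adapter.py:5-7, cognee/infrastructure/llm/structured_output_framework/litellm_instructor/llm/generic_llm_api/adapter.py:9-11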

Content Policy Handling Logic

When a content policy violation is detected, the adapter:

Catches ContentFilterFinishReasonError, ContentPolicyViolationError, or InstructorRetryException.
Checks that the error is actually policy-related (for InstructorRetryException, it verifies that the error message contains "content management policy").
Attempts a fallback if fallback_model and fallback_api_key are configured in LLMConfig.
Retries the same request with the fallback model configuration.
Raises ContentPolicyFilterError if the fallback also fails or no fallback is configured.

Sources: cognee/infrastructure/llm/structured_output_framework/litellm_instructor/llm/openai/adapter.py:164-205, cognee/infrastructure/llm/structured_output_framework/litellm_instructor/llm/generic_llm_api/adapter.py:167-217
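
The five steps above can be sketched as a single coroutine. The exception classes here are local stand-ins that only share names with the real OpenAI, LiteLLM, Instructor, and Cognee classes; `FallbackSettings`, `complete_with_fallback`, the `call_model` callback, and the model names are hypothetical and do not reflect the adapters' actual signatures.

```python
import asyncio
from dataclasses import dataclass
from typing import Optional


class ContentFilterFinishReasonError(Exception):
    """Stand-in for the OpenAI SDK error of the same name."""


class ContentPolicyViolationError(Exception):
    """Stand-in for the LiteLLM error of the same name."""


class InstructorRetryException(Exception):
    """Stand-in for the Instructor error of the same name."""


class ContentPolicyFilterError(Exception):
    """Stand-in for the error Cognee raises once every option is exhausted."""


POLICY_ERRORS = (ContentFilterFinishReasonError, ContentPolicyViolationError,
                 InstructorRetryException)


@dataclass
class FallbackSettings:
    """Hypothetical subset of LLMConfig used by this sketch."""
    fallback_model: Optional[str] = None
    fallback_api_key: Optional[str] = None


def is_policy_error(error):
    """InstructorRetryException counts only if its message names the policy."""
    if isinstance(error, InstructorRetryException):
        return "content management policy" in str(error)
    return True


async def complete_with_fallback(call_model, prompt, settings,
                                 primary_model="primary-model"):
    try:
        return await call_model(primary_model, prompt)
    except POLICY_ERRORS as error:
        if not is_policy_error(error):
            raise  # an ordinary extraction failure, not a policy block
        if not (settings.fallback_model and settings.fallback_api_key):
            raise ContentPolicyFilterError("Blocked; no fallback configured") from error
        try:
            # Same request, sent to the fallback model.
            return await call_model(settings.fallback_model, prompt)
        except POLICY_ERRORS as fallback_error:
            raise ContentPolicyFilterError("Fallback model also failed") from fallback_error
```

Keeping the policy check in a separate function means an unrelated InstructorRetryException propagates unchanged instead of being converted into a policy error.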

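Fallback Model Configuration

Fallback models provide an alternative LLM when the primary model fails because of a content policy violation. They are configured through environment variables and stored in the LLMConfig class.

Configuration Fields

The LLMConfig class defines the following fallback-related fields:

Field	Type	Environment variable	Description
`fallback_api_key`	`str`	`FALLBACK_API_KEY`	API key for the fallback model.
`fallback_endpoint`	`str`	`FALLBACK_ENDPOINT`	Endpoint URL for the fallback model.
`fallback_model`	`str`	`FALLBACK_MODEL`	Identifier of the fallback model (for example, "gpt-3.5-turbo").

Source: cognee/infrastructure/llm/config.py:75-78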
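
A minimal sketch of how these three fields could be read from the environment is shown below. It uses a plain dataclass rather than Cognee's actual LLMConfig; the class name `FallbackConfig`, the `from_env` constructor, the empty-string defaults, and the `is_enabled` helper are all assumptions made for illustration.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class FallbackConfig:
    """Hypothetical container mirroring the three fallback fields."""
    fallback_api_key: str = ""
    fallback_endpoint: str = ""
    fallback_model: str = ""

    @classmethod
    def from_env(cls, env=None):
        """Read the fields from environment variables (empty string if unset)."""
        env = os.environ if env is None else env
        return cls(
            fallback_api_key=env.get("FALLBACK_API_KEY", ""),
            fallback_endpoint=env.get("FALLBACK_ENDPOINT", ""),
            fallback_model=env.get("FALLBACK_MODEL", ""),
        )

    @property
    def is_enabled(self):
        """A fallback attempt needs both a model and an API key."""
        return bool(self.fallback_model and self.fallback_api_key)
```

For example, `FallbackConfig.from_env({"FALLBACK_MODEL": "gpt-3.5-turbo", "FALLBACK_API_KEY": "example-key"})` yields a configuration whose `is_enabled` is true, while a configuration with only a model set is not usable for fallback.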

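Exception Hierarchy

Cognee uses a structured exception hierarchy to distinguish validation errors, system failures, and transient issues. All custom exceptions inherit from CogneeApiError, which automatically logs the error at the specified level.

System Exception Structure

Figure 2: Cognee exception hierarchy

Sources: cognee/exceptions/exceptions.py:7-92, cognee/infrastructure/databases/exceptions/exceptions.py:5-200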

Key Database Exceptions

Exception class	Purpose	HTTP status code	Source
`DatabaseNotCreatedError`	Raised when the database is accessed before `setup()` is called.	422	`cognee/infrastructure/databases/exceptions/exceptions.py:5-20`
`EntityNotFoundError`	Raised when the requested record does not exist.	404	`cognee/infrastructure/databases/exceptions/exceptions.py:23-48`
`CacheConnectionError`	Raised when the Redis/cache backend is unreachable.	503	`cognee/infrastructure/databases/exceptions/exceptions.py:137-150`
`EmbeddingException`	Generic error raised when embedding generation fails.	422	`cognee/infrastructure/databases/exceptions/exceptions.py:89-105`
`MissingQueryParameterError`	Raised when a search is called without query text or a query vector.	400	`cognee/infrastructure/databases/exceptions/exceptions.py:107-120`

Sources: cognee/infrastructure/databases/exceptions/exceptions.py:5-200
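
The pattern of a logging base class with status-carrying subclasses can be sketched as follows. The class names and status codes come from the table above, but the constructor signatures, default messages, and the direct inheritance from CogneeApiError are assumptions; the real classes may sit under intermediate base classes.

```python
import logging

logger = logging.getLogger("cognee_exception_sketch")


class CogneeApiError(Exception):
    """Base error that logs itself at construction time (signature assumed)."""

    def __init__(self, message, name="CogneeApiError", status_code=500,
                 log_level=logging.ERROR):
        self.message = message
        self.name = name
        self.status_code = status_code
        # Logging happens here so every subclass gets it without extra code.
        logger.log(log_level, "%s: %s (status code: %d)", name, message, status_code)
        super().__init__(message)


class EntityNotFoundError(CogneeApiError):
    def __init__(self, message="The requested entity does not exist."):
        super().__init__(message, name="EntityNotFoundError", status_code=404)


class DatabaseNotCreatedError(CogneeApiError):
    def __init__(self, message="The database has not been created. Call setup() first."):
        super().__init__(message, name="DatabaseNotCreatedError", status_code=422)


def to_http_response(error):
    """Translate any CogneeApiError into a (status, body) pair."""
    return error.status_code, {"error": error.name, "detail": error.message}
```

Because each subclass fixes its own status code, an API layer can catch the base class once and still return the correct HTTP status for each failure.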

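API and Configuration Error Recovery

Cognee provides mechanisms for recovering from configuration errors or large data payloads:

Usage logging fallback: the _log_usage_async function is designed to handle its own internal errors silently. If the cache engine (Redis) is unavailable, it logs a warning but lets the main function continue without interruption (cognee/shared/usage_logger.py:178-193).
Configuration validation: the config._update_config method validates attributes with hasattr() before updating runtime settings. If an invalid attribute is supplied, it raises InvalidConfigAttributeError instead of allowing system state to become corrupted (cognee/api/v1/config/config.py:189-210).
Skipping connection tests: users can set COGNEE_SKIP_CONNECTION_TEST to true to skip the LLM and embedding reachability checks when running in isolated or restricted environments (cognee/modules/pipelines/layers/setup_and_check_environment.py:38-46).

Sources: cognee/shared/usage_logger.py:178-193, cognee/api/v1/config/config.py:189-210, cognee/modules/pipelines/layers/setup_and_check_environment.py:38-46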
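
Two of the mechanisms above, silent usage logging and hasattr()-based validation, can be sketched together. `RuntimeSettings`, `update_config`, and `log_usage_safely` are hypothetical names, and validating every key before applying any of them is a design choice of this sketch that the source does not confirm.

```python
import asyncio
import logging

logger = logging.getLogger("recovery_sketch")


class InvalidConfigAttributeError(Exception):
    """Stand-in for the Cognee exception of the same name."""


class RuntimeSettings:
    """Hypothetical settings object with a fixed set of attributes."""

    def __init__(self):
        self.chunk_size = 1024
        self.llm_model = "primary-model"


def update_config(settings, updates):
    """Reject unknown attributes before changing anything."""
    for key in updates:
        if not hasattr(settings, key):
            raise InvalidConfigAttributeError(f"'{key}' is not a valid attribute")
    # Every key is known, so the update cannot leave a half-applied state.
    for key, value in updates.items():
        setattr(settings, key, value)


async def log_usage_safely(write_entry, entry):
    """Record usage, but never let a logging failure reach the caller."""
    try:
        await write_entry(entry)
        return True
    except Exception as error:
        logger.warning("Usage logging skipped: %s", error)
        return False
```

In this sketch a failed cache write costs only a warning, and a misspelled configuration key fails loudly before any setting has changed.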