6.2 · Embedding Services

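Memory Pipeline and Knowledge Graph Construction · Covers this chapter's module relationships, source-code evidence, and implementation details.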

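Project: Cognee · Section: 6.2 · Status: full translation · Modules: model invocation and provider adaptation; system architecture; retrieval, recall and indexing; storage and persistence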


Source References

cognee/infrastructure/databases/vector/embeddings/FastembedEmbeddingEngine.py
cognee/infrastructure/databases/vector/embeddings/LiteLLMEmbeddingEngine.py
cognee/infrastructure/databases/vector/embeddings/OllamaEmbeddingEngine.py
cognee/infrastructure/databases/vector/embeddings/OpenAICompatibleEmbeddingEngine.py
cognee/infrastructure/databases/vector/embeddings/config.py
cognee/infrastructure/databases/vector/embeddings/get_embedding_engine.py
cognee/tasks/storage/index_data_points.py
cognee/tests/unit/infrastructure/databases/vector/test_embedding_config.py
cognee/tests/unit/infrastructure/test_embedding_context_window_fallbacks.py
cognee/tests/unit/infrastructure/test_openai_compatible_embedding_engine.py

Module Tags

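Model invocation and provider adaptation
System architecture
Retrieval, recall and indexing
Storage and persistence
Interfaces and service contracts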

Section Body

Embedding Services

Original DeepWiki page: https://deepwiki.com/topoteretes/cognee/6.2-embedding-services


Architecture Overview

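EmbeddingEngine is a core infrastructure component. Every vector database adapter (PGVectorAdapter, LanceDBAdapter, and others) uses it, and so does the graph indexing system. It hides the differences between embedding providers behind the unified interface defined by the EmbeddingEngine base class cognee/infrastructure/databases/vector/embeddings/EmbeddingEngine.py:1-25.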
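The following Python sketch shows what a shared interface buys the adapters. It assumes the interface exposes an async `embed_text(list[str]) -> list[list[float]]` and a `get_vector_size()` method. The names `EmbeddingEngineLike`, `ZeroEmbeddingEngine`, and `embed_for_index` are hypothetical, so check EmbeddingEngine.py for the actual signatures.

```python
import asyncio
from typing import List, Protocol, runtime_checkable

Vector = List[float]


@runtime_checkable
class EmbeddingEngineLike(Protocol):
    """Assumed shape of the shared interface (a sketch, not copied from Cognee)."""

    async def embed_text(self, text: List[str]) -> List[Vector]: ...

    def get_vector_size(self) -> int: ...


class ZeroEmbeddingEngine:
    """Hypothetical engine returning zero vectors, in the spirit of MOCK_EMBEDDING."""

    def __init__(self, dimensions: int) -> None:
        self.dimensions = dimensions

    async def embed_text(self, text: List[str]) -> List[Vector]:
        return [[0.0] * self.dimensions for _ in text]

    def get_vector_size(self) -> int:
        return self.dimensions


async def embed_for_index(engine: EmbeddingEngineLike, texts: List[str]) -> List[Vector]:
    """What an adapter might do: embed, then check shapes before writing to storage."""
    vectors = await engine.embed_text(texts)
    expected = engine.get_vector_size()
    for vector in vectors:
        if len(vector) != expected:
            raise ValueError(f"expected {expected} dimensions, got {len(vector)}")
    return vectors


if __name__ == "__main__":
    print(asyncio.run(embed_for_index(ZeroEmbeddingEngine(4), ["hello", "world"])))
```

Because the adapter depends only on the interface, switching providers never requires changes to adapter code.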

Component Dependency Flow

Cognee · Component Dependency Flow · Figure 1

Sources: cognee/infrastructure/databases/vector/embeddings/get_embedding_engine.py:9-38, cognee/infrastructure/databases/vector/embeddings/config.py:62-85, cognee/infrastructure/databases/vector/embeddings/get_embedding_engine.py:42-128, cognee/tasks/storage/index_data_points.py:10-28
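The flow above is a factory: the configuration names a provider, and get_embedding_engine turns that name into a concrete engine. Here is a minimal sketch of that dispatch. It assumes engines are cached with `functools.lru_cache`, so identical settings reuse one instance. It also assumes that any unlisted provider falls through to the litellm-based default. `EngineHandle` and `create_engine` are hypothetical, and the handle only records which engine class would be built.

```python
from dataclasses import dataclass
from functools import lru_cache
from typing import Optional


@dataclass(frozen=True)
class EngineHandle:
    """Hypothetical placeholder for a constructed engine."""

    engine_class: str
    model: str
    dimensions: int
    endpoint: Optional[str] = None


# Providers that talk to a user-run server and therefore need EMBEDDING_ENDPOINT.
_ENDPOINT_REQUIRED = {
    "ollama": "OllamaEmbeddingEngine",
    "openai_compatible": "OpenAICompatibleEmbeddingEngine",
}


@lru_cache(maxsize=None)
def create_engine(provider: str, model: str, dimensions: int,
                  endpoint: Optional[str] = None) -> EngineHandle:
    """Map a provider name to an engine class; cached so equal configs share one engine."""
    if provider in _ENDPOINT_REQUIRED:
        if not endpoint:
            raise ValueError(f"EMBEDDING_ENDPOINT is required for provider {provider!r}")
        engine_class = _ENDPOINT_REQUIRED[provider]
    elif provider == "fastembed":
        engine_class = "FastembedEmbeddingEngine"
    else:
        # Everything else (e.g. "openai") goes through the litellm-based default engine.
        engine_class = "LiteLLMEmbeddingEngine"
    return EngineHandle(engine_class, model, dimensions, endpoint)
```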

Configuration and Dimension Resolution

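Embedding configuration is managed by the EmbeddingConfig class. A key Cognee feature is that it can infer embedding dimensions automatically, which prevents shape mismatches in the vector database.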

Automatic Dimension Resolution

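The _resolve_embedding_dimensions function makes a best-effort lookup of the dimensions for a given provider and model by querying the registries in litellm and fastembed cognee/infrastructure/databases/vector/embeddings/config.py:19-59. If the model is unknown, it falls back to a default of 3072 (matching OpenAI's text-embedding-3-large) and logs a warning cognee/infrastructure/databases/vector/embeddings/config.py:87-102.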
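The lookup order can be sketched as follows. This is an illustration, not the real function. The two dictionaries are hypothetical stand-ins for the litellm and fastembed registries (the listed sizes are the published dimensions of those models). Stripping a `provider/` prefix before a second lookup is an assumption about how model ids are matched.

```python
import logging
from typing import Dict, Optional

logger = logging.getLogger("embedding_config_sketch")

DEFAULT_DIMENSIONS = 3072  # text-embedding-3-large, the documented fallback

# Hypothetical stand-ins for the litellm and fastembed model registries.
LITELLM_LIKE: Dict[str, int] = {
    "text-embedding-3-large": 3072,
    "text-embedding-3-small": 1536,
}
FASTEMBED_LIKE: Dict[str, int] = {"BAAI/bge-small-en-v1.5": 384}


def resolve_dimensions(provider: str, model: str, explicit: Optional[int] = None) -> int:
    """Best-effort dimension lookup; an explicit EMBEDDING_DIMENSIONS always wins."""
    if explicit is not None:
        return explicit
    registry = FASTEMBED_LIKE if provider == "fastembed" else LITELLM_LIKE
    # Try the full id first, then the id without a "provider/" prefix such as "openai/".
    for candidate in (model, model.split("/", 1)[-1]):
        if candidate in registry:
            return registry[candidate]
    logger.warning("Unknown embedding model %r; falling back to %d dimensions",
                   model, DEFAULT_DIMENSIONS)
    return DEFAULT_DIMENSIONS
```

The warning matters because a wrong fallback is not harmless. If the real model returns, say, 768-dimensional vectors, a collection created for 3072 will reject them at write time.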

Key Configuration Parameters

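Variable	Default	Description
`EMBEDDING_PROVIDER`	`"openai"`	Provider service (openai, ollama, fastembed, openai_compatible)
`EMBEDDING_MODEL`	`"openai/text-embedding-3-large"`	Model identifier used by the provider
`EMBEDDING_DIMENSIONS`	`None` (auto-derived)	Output vector size; derived automatically if unset
`EMBEDDING_BATCH_SIZE`	`36`	Number of texts embedded in a single API call
`EMBEDDING_ENDPOINT`	`None`	Custom API endpoint (required for Ollama and other local servers)
`MOCK_EMBEDDING`	`"false"`	If "true", returns zero vectors instead of calling the API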

Sources: cognee/infrastructure/databases/vector/embeddings/config.py:62-108, cognee/infrastructure/databases/vector/embeddings/LiteLLMEmbeddingEngine.py:83-86, cognee/infrastructure/databases/vector/embeddings/OpenAICompatibleEmbeddingEngine.py:102-103
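The table can be read as a small settings loader. This is a plain-Python stand-in, not Cognee's EmbeddingConfig. The names `EmbeddingSettings`, `load_settings`, and `batches` are hypothetical. Treating only the string "true" (case-insensitive) as enabling MOCK_EMBEDDING is also an assumption.

```python
import os
from dataclasses import dataclass
from typing import Iterator, List, Mapping, Optional


@dataclass
class EmbeddingSettings:
    provider: str = "openai"
    model: str = "openai/text-embedding-3-large"
    dimensions: Optional[int] = None  # None means "resolve automatically"
    batch_size: int = 36
    endpoint: Optional[str] = None
    mock: bool = False


def load_settings(env: Mapping[str, str] = os.environ) -> EmbeddingSettings:
    """Read the variables from the table, falling back to the listed defaults."""
    dims = env.get("EMBEDDING_DIMENSIONS")
    return EmbeddingSettings(
        provider=env.get("EMBEDDING_PROVIDER", "openai"),
        model=env.get("EMBEDDING_MODEL", "openai/text-embedding-3-large"),
        dimensions=int(dims) if dims else None,
        batch_size=int(env.get("EMBEDDING_BATCH_SIZE", "36")),
        endpoint=env.get("EMBEDDING_ENDPOINT") or None,
        mock=env.get("MOCK_EMBEDDING", "false").strip().lower() == "true",
    )


def batches(texts: List[str], size: int) -> Iterator[List[str]]:
    """Split texts into chunks of at most `size`; each chunk is one API call."""
    for start in range(0, len(texts), size):
        yield texts[start:start + size]
```

With the default batch size of 36, embedding 80 texts takes three calls of 36, 36, and 8 texts.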

EmbeddingEngine Implementations

1. LiteLLMEmbeddingEngine

The default engine. It uses the litellm library to provide one interface to many cloud providers.

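Recursive splitting: if a text exceeds the context window, the engine splits it recursively and pools the resulting embeddings by averaging the vectors cognee/infrastructure/databases/vector/embeddings/LiteLLMEmbeddingEngine.py:161-189.
Rate limiting: calls are wrapped in embedding_rate_limiter_context_manager to avoid API throttling cognee/infrastructure/databases/vector/embeddings/LiteLLMEmbeddingEngine.py:140-140.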
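This page does not describe the internals of embedding_rate_limiter_context_manager. The following is therefore only a generic sliding-window limiter exposed as an async context manager, which is one common way to keep embedding calls under a provider's quota. `SlidingWindowLimiter` and `rate_limited` are hypothetical names, and the limits are illustrative.

```python
import asyncio
import time
from collections import deque
from contextlib import asynccontextmanager
from typing import AsyncIterator, Deque


class SlidingWindowLimiter:
    """Allow at most `max_calls` acquisitions in any `period`-second window."""

    def __init__(self, max_calls: int, period: float) -> None:
        self.max_calls = max_calls
        self.period = period
        self._calls: Deque[float] = deque()
        self._lock = asyncio.Lock()  # serializes waiters so they are served in order

    async def acquire(self) -> None:
        async with self._lock:
            while True:
                now = time.monotonic()
                # Forget calls that have left the window.
                while self._calls and now - self._calls[0] >= self.period:
                    self._calls.popleft()
                if len(self._calls) < self.max_calls:
                    self._calls.append(now)
                    return
                # Sleep until the oldest call leaves the window, then re-check.
                await asyncio.sleep(self.period - (now - self._calls[0]))


@asynccontextmanager
async def rate_limited(limiter: SlidingWindowLimiter) -> AsyncIterator[None]:
    await limiter.acquire()
    yield
```

An engine would wrap each provider request in `async with rate_limited(limiter): ...`, so bursts are delayed instead of rejected by the API.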

2. OllamaEmbeddingEngine

Optimized for local Ollama instances.

Direct HTTP: uses aiohttp to talk directly to Ollama's /api/embed endpoint cognee/infrastructure/databases/vector/embeddings/OllamaEmbeddingEngine.py:181-185.
Error handling: looks for context-length patterns in error responses (for example "context length" or "too long") and triggers recursive splitting when one matches cognee/infrastructure/databases/vector/embeddings/OllamaEmbeddingEngine.py:114-147.
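The request/response shape and the error classification can be sketched like this. The sketch assumes `url` is the full /api/embed URL, and it checks only the two patterns quoted above. `ContextLengthError` and the helper names are hypothetical. The payload shape (`model` and `input` in, `embeddings` out) follows Ollama's /api/embed API. aiohttp is imported inside the network function, so the pure helpers work without it.

```python
import json
from typing import Any, Dict, List

# The two patterns quoted above; a real engine may check more.
CONTEXT_LENGTH_PATTERNS = ("context length", "too long")


class ContextLengthError(Exception):
    """Hypothetical signal telling the caller to split the input and retry."""


def build_embed_payload(model: str, texts: List[str]) -> Dict[str, Any]:
    # /api/embed accepts a list of strings under "input".
    return {"model": model, "input": texts}


def parse_embed_response(body: Dict[str, Any]) -> List[List[float]]:
    # ...and returns one vector per input under "embeddings".
    return body["embeddings"]


def is_context_length_error(message: str) -> bool:
    lowered = message.lower()
    return any(pattern in lowered for pattern in CONTEXT_LENGTH_PATTERNS)


async def embed_via_ollama(url: str, model: str, texts: List[str]) -> List[List[float]]:
    import aiohttp  # imported lazily so the helpers above work without it

    async with aiohttp.ClientSession() as session:
        async with session.post(url, json=build_embed_payload(model, texts)) as resp:
            status = resp.status
            raw = await resp.text()
    if status != 200:
        if is_context_length_error(raw):
            raise ContextLengthError(raw)
        raise RuntimeError(f"Ollama returned {status}: {raw}")
    return parse_embed_response(json.loads(raw))
```

Raising a dedicated exception type keeps the split-and-retry logic separate from the transport code.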

3. OpenAICompatibleEmbeddingEngine

Designed for local servers that expose the standard OpenAI /v1/embeddings API (for example llama.cpp, vLLM, or TEI).

Direct SDK: calls openai.AsyncOpenAI directly. This avoids litellm's parameter injection (such as sending encoding_format: null), which crashes some local servers cognee/infrastructure/databases/vector/embeddings/OpenAICompatibleEmbeddingEngine.py:1-13.
Endpoint normalization: ensures the base URL ends with /v1 and strips any /embeddings suffix supplied by the user cognee/infrastructure/databases/vector/embeddings/OpenAICompatibleEmbeddingEngine.py:107-112.
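The normalization rule fits in a few lines. This is a sketch of the described behavior, not the engine's code, and `normalize_base_url` is a hypothetical name. The rule matters because the OpenAI SDK appends `/embeddings` to `base_url` itself. A user-supplied `.../v1/embeddings` would otherwise become `.../v1/embeddings/embeddings`.

```python
def normalize_base_url(endpoint: str) -> str:
    """Return a base URL ending in /v1, dropping a trailing /embeddings if present."""
    url = endpoint.strip().rstrip("/")
    if url.endswith("/embeddings"):
        url = url[: -len("/embeddings")].rstrip("/")
    if not url.endswith("/v1"):
        url += "/v1"
    return url
```

The result would then be passed as `base_url` to `openai.AsyncOpenAI`, and vectors requested with `await client.embeddings.create(model=..., input=[...])`.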

4. FastembedEmbeddingEngine

A local, CPU-optimized embedding engine built on the fastembed library.

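Local execution: runs in-process through fastembed's TextEmbedding, with no external server cognee/infrastructure/databases/vector/embeddings/FastembedEmbeddingEngine.py:78-78.
Context fallback: when fastembed raises a context-window error, the engine applies the same recursive splitting as the cloud engines cognee/infrastructure/databases/vector/embeddings/FastembedEmbeddingEngine.py:127-164.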

Sources: cognee/infrastructure/databases/vector/embeddings/LiteLLMEmbeddingEngine.py:40-103, cognee/infrastructure/databases/vector/embeddings/OllamaEmbeddingEngine.py:34-83, cognee/infrastructure/databases/vector/embeddings/OpenAICompatibleEmbeddingEngine.py:56-112, cognee/infrastructure/databases/vector/embeddings/FastembedEmbeddingEngine.py:41-83

Data Flow: From Text to Vector Space

The diagram below shows how Cognee maps natural-language text fields to vector representations and indexes them in the system.

Cognee · Data Flow: From Text to Vector Space · Figure 2

Sources: cognee/tasks/storage/index_data_points.py:10-68, cognee/infrastructure/databases/vector/embeddings/LiteLLMEmbeddingEngine.py:111-159, cognee/infrastructure/databases/vector/embeddings/OllamaEmbeddingEngine.py:107-154
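Here is a minimal sketch of the text-to-vector step. It assumes each data point declares which of its text fields to index, and that every (type, field) pair becomes its own collection named `Type_field`. Both are assumptions about index_data_points, not confirmed details. `Point` and `index_points` are hypothetical, and `embed` stands for any engine's `embed_text`.

```python
import asyncio
from collections import defaultdict
from dataclasses import dataclass
from typing import Awaitable, Callable, Dict, List, Tuple

Vector = List[float]
EmbedFn = Callable[[List[str]], Awaitable[List[Vector]]]


@dataclass
class Point:
    """Hypothetical stand-in for a DataPoint: id, type, text fields, fields to index."""

    id: str
    type_name: str
    fields: Dict[str, str]
    index_fields: List[str]


async def index_points(points: List[Point], embed: EmbedFn,
                       batch_size: int = 36) -> Dict[str, List[Tuple[str, Vector]]]:
    # 1. Group (id, text) pairs by (type, field); each group becomes one collection.
    groups: Dict[Tuple[str, str], List[Tuple[str, str]]] = defaultdict(list)
    for point in points:
        for field_name in point.index_fields:
            text = point.fields.get(field_name)
            if text:
                groups[(point.type_name, field_name)].append((point.id, text))

    # 2. Embed each group in batches and pair the vectors back with their ids.
    collections: Dict[str, List[Tuple[str, Vector]]] = {}
    for (type_name, field_name), items in groups.items():
        vectors: List[Vector] = []
        for start in range(0, len(items), batch_size):
            batch = items[start:start + batch_size]
            vectors.extend(await embed([text for _, text in batch]))
        collections[f"{type_name}_{field_name}"] = [
            (point_id, vector) for (point_id, _), vector in zip(items, vectors)
        ]
    return collections


if __name__ == "__main__":
    async def toy_embed(texts: List[str]) -> List[Vector]:
        return [[float(len(t))] for t in texts]

    demo = [Point("e1", "Entity", {"name": "Alice"}, ["name"])]
    print(asyncio.run(index_points(demo, toy_embed)))
```

Keeping one collection per field means a search can target, for example, only entity names without scanning chunk text.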

Tokenization and Context Management

Every EmbeddingEngine exposes a tokenizer through get_tokenizer(), so text can be prepared correctly and measured against the model's limits.

Tokenizer Adapters

TikTokenTokenizer: used for OpenAI models cognee/infrastructure/llm/tokenizer/TikToken/adapter.py:7-25.
HuggingFaceTokenizer: uses transformers.AutoTokenizer for Mistral or local models cognee/infrastructure/llm/tokenizer/HuggingFace/adapter.py:6-33.
MistralTokenizer: a dedicated adapter for Mistral's tokenization logic cognee/infrastructure/llm/tokenizer/Mistral/adapter.py:5-20.
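The adapters share one job: count tokens so that text can be checked against a model limit. The sketch below is illustrative only. It assumes a `count_tokens` method on the adapters. Its provider-to-adapter routing (OpenAI to tiktoken, Mistral to the dedicated adapter, everything else to Hugging Face) is also an assumption. `WhitespaceTokenizer` is a hypothetical stand-in that avoids pulling in tiktoken or transformers.

```python
from typing import Protocol


class TokenCounter(Protocol):
    def count_tokens(self, text: str) -> int: ...


class WhitespaceTokenizer:
    """Hypothetical stand-in; the real adapters wrap tiktoken, transformers, or Mistral's tokenizer."""

    def count_tokens(self, text: str) -> int:
        return len(text.split())


def choose_tokenizer_adapter(provider: str) -> str:
    """Assumed routing from provider name to adapter class name."""
    name = provider.lower()
    if "openai" in name:
        return "TikTokenTokenizer"
    if "mistral" in name:
        return "MistralTokenizer"
    return "HuggingFaceTokenizer"


def exceeds_context(text: str, tokenizer: TokenCounter, max_tokens: int) -> bool:
    """Measure the text against the model's token limit."""
    return tokenizer.count_tokens(text) > max_tokens
```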

Handling Oversized Inputs

The engines handle inputs that exceed the context window through recursive splitting and pooling:

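If a batch of strings fails, the batch is split in half and each half is retried cognee/infrastructure/databases/vector/embeddings/LiteLLMEmbeddingEngine.py:162-169.
If a single string fails, it is split into three overlapping parts so that semantic context carries across the cuts. Each part is embedded separately, and the resulting vectors are averaged (pooled) cognee/infrastructure/databases/vector/embeddings/LiteLLMEmbeddingEngine.py:173-188.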

Sources: cognee/infrastructure/databases/vector/embeddings/LiteLLMEmbeddingEngine.py:161-188, cognee/infrastructure/databases/vector/embeddings/OllamaEmbeddingEngine.py:124-147, cognee/infrastructure/databases/vector/embeddings/FastembedEmbeddingEngine.py:138-164
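The two rules for oversized inputs can be combined into one recursive function. This is an illustration, not Cognee's code. `ContextWindowExceeded`, `split_with_overlap`, and `embed_with_fallback` are hypothetical names, and the 10% overlap ratio is an assumption. The engine's real split boundaries may differ.

```python
import asyncio
from typing import Awaitable, Callable, List

Vector = List[float]
EmbedFn = Callable[[List[str]], Awaitable[List[Vector]]]


class ContextWindowExceeded(Exception):
    """Hypothetical stand-in for a provider's context-length error."""


def split_with_overlap(text: str, parts: int = 3, overlap_ratio: float = 0.1) -> List[str]:
    """Cut text into `parts` pieces, each extended by a small overlap on both sides."""
    size = -(-len(text) // parts)  # ceiling division
    overlap = int(size * overlap_ratio)
    pieces = []
    for i in range(parts):
        start = max(0, i * size - overlap)
        end = min(len(text), (i + 1) * size + overlap)
        if start < end:
            pieces.append(text[start:end])
    return pieces


async def embed_with_fallback(texts: List[str], embed: EmbedFn) -> List[Vector]:
    try:
        return await embed(texts)
    except ContextWindowExceeded:
        if len(texts) > 1:
            # A batch failed: halve it and retry both halves.
            mid = len(texts) // 2
            left, right = await asyncio.gather(
                embed_with_fallback(texts[:mid], embed),
                embed_with_fallback(texts[mid:], embed),
            )
            return left + right
        if len(texts[0]) < 2:
            raise  # a single character cannot be split further
        # A single string failed: embed overlapping thirds, then mean-pool them.
        parts = await asyncio.gather(
            *(embed_with_fallback([piece], embed) for piece in split_with_overlap(texts[0]))
        )
        vectors = [result[0] for result in parts]
        return [[sum(column) / len(vectors) for column in zip(*vectors)]]
```

Mean pooling returns a vector with the same shape as a normal embedding, so callers never see the split. The trade-off is that the pooled vector only approximates what an unsplit embedding would have been.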