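16.2 · Embedchain Configuration

Long-term memory and context management · This chapter focuses on module relationships, source-code evidence, and implementation notes.

Project: Mem0 · Chapter: 16.2 · Status: full translation · Modules: document objects and metadata, memory and context, model invocation and provider adaptation, configuration governance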

Source references

embedchain/docs/api-reference/app/add.mdx
embedchain/embedchain/chunkers/base_chunker.py
embedchain/embedchain/config/llm/base.py
embedchain/embedchain/config/model_prices_and_context_window.json
embedchain/embedchain/data_formatter/data_formatter.py
embedchain/embedchain/embedchain.py
embedchain/embedchain/loaders/base_loader.py
embedchain/embedchain/loaders/web_page.py
embedchain/embedchain/vectordb/chroma.py
embedchain/poetry.lock

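Module tags

Document objects and metadata
Memory and context
Model invocation and provider adaptation
Configuration governance
System architecture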

Chapter body

Embedchain Configuration

Original DeepWiki page: https://deepwiki.com/mem0ai/mem0/16.2-embedchain-configuration

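Configuration architecture

Embedchain uses a layered configuration system in which the EmbedChain class (defined at embedchain/embedchain/embedchain.py:38) acts as the central orchestrator. It is initialized with configuration objects for the app, the large language model (LLM), the vector database, and the embedder.

Component initialization flow

The diagram below shows how the EmbedChain class links high-level configuration names to concrete code entities during initialization.

Mem0 · Component initialization flow · Figure 1

Sources: embedchain/embedchain/embedchain.py:38-62, embedchain/embedchain/config/llm/base.py:111-142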
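The layering described above can be pictured as a single dictionary with one section per component. The sketch below is illustrative only: the `provider`/`config` nesting is assumed to match the shape used in Embedchain configuration files, and `KNOWN_SECTIONS` and `validate_config` are hypothetical helpers, not part of the library.

```python
from typing import Any

# Sections the orchestrator is described as consuming (names are illustrative).
KNOWN_SECTIONS = ("app", "llm", "vectordb", "embedder")


def validate_config(config: dict[str, Any]) -> list[str]:
    """Return the known sections present in `config`, rejecting unknown keys."""
    unknown = sorted(set(config) - set(KNOWN_SECTIONS))
    if unknown:
        raise ValueError(f"unknown config sections: {unknown}")
    # Provider-backed sections must say which provider they configure.
    for name in ("llm", "vectordb", "embedder"):
        section = config.get(name)
        if section is not None and "provider" not in section:
            raise ValueError(f"section {name!r} needs a 'provider' key")
    return [name for name in KNOWN_SECTIONS if name in config]


config = {
    "llm": {"provider": "openai", "config": {"model": "gpt-4o-mini", "temperature": 0}},
    "vectordb": {"provider": "chroma", "config": {"dir": "db"}},
    "embedder": {"provider": "openai", "config": {"model": "text-embedding-3-large"}},
}
print(validate_config(config))
```

In Embedchain itself, a dictionary of this shape would normally be passed to the app's configuration entry point; check the installed version for the exact call before relying on it.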

Large language model (LLM) configuration

The BaseLlmConfig class manages the parameters for language model interactions. It holds settings for randomness, token limits, and prompt templates.

Key parameters of `BaseLlmConfig`

Parameter	Type	Default	Description
`number_documents`	`int`	3	Number of context documents to retrieve `embedchain/embedchain/config/llm/base.py:118`
`model`	`str`	`None`	Specific model ID (for example `gpt-4o-mini`) `embedchain/embedchain/config/llm/base.py:121`
`temperature`	`float`	0	Controls randomness (0 means deterministic) `embedchain/embedchain/config/llm/base.py:122`
`max_tokens`	`int`	1000	Maximum number of tokens to generate `embedchain/embedchain/config/llm/base.py:123`
`stream`	`bool`	`False`	Enables streaming of the response `embedchain/embedchain/config/llm/base.py:125`
`system_prompt`	`str`	`None`	Custom system instruction `embedchain/embedchain/config/llm/base.py:129`
`token_usage`	`bool`	`False`	Whether to return token usage and cost `embedchain/embedchain/config/llm/base.py:127`
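To make the defaults in the table easier to see at a glance, here is a minimal sketch of a config object carrying the same fields. `LlmConfigSketch` is a hypothetical stand-in, not the real `BaseLlmConfig`; the validation in `__post_init__` is an illustrative assumption and the real class may check different things.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LlmConfigSketch:
    """Hypothetical stand-in for BaseLlmConfig; defaults copied from the table."""

    number_documents: int = 3
    model: Optional[str] = None
    temperature: float = 0
    max_tokens: int = 1000
    stream: bool = False
    system_prompt: Optional[str] = None
    token_usage: bool = False

    def __post_init__(self) -> None:
        # Illustrative sanity checks only.
        if self.temperature < 0:
            raise ValueError("temperature must not be negative")
        if self.max_tokens <= 0:
            raise ValueError("max_tokens must be positive")
        if self.number_documents <= 0:
            raise ValueError("number_documents must be positive")


# Override only what differs from the defaults.
cfg = LlmConfigSketch(model="gpt-4o-mini", stream=True)
print(cfg)
```

Keeping every field optional with a default means a caller only has to name the settings that differ from the table.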

Default prompt templates

Embedchain ships several built-in prompt templates that guide the LLM's behavior:

Standard Q&A: DEFAULT_PROMPT (embedchain/embedchain/config/llm/base.py:15-29)
With history: DEFAULT_PROMPT_WITH_HISTORY (embedchain/embedchain/config/llm/base.py:31-52)
Mem0-enhanced: DEFAULT_PROMPT_WITH_MEM0_MEMORY (embedchain/embedchain/config/llm/base.py:54-81)
Developer support: DOCS_SITE_DEFAULT_PROMPT (embedchain/embedchain/config/llm/base.py:83-99)

Sources: embedchain/embedchain/config/llm/base.py:111-142, embedchain/embedchain/config/llm/base.py:15-104
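As a rough illustration of how such templates could be selected and filled, the sketch below uses Python's `string.Template`. The template text, the `$context`, `$query`, and `$history` placeholder names, and the `build_prompt` helper are all assumptions made for this example; the real constants in base.py are longer and may use different wording.

```python
from string import Template
from typing import Optional

# Placeholder template text, not the library's actual prompt wording.
QA_TEMPLATE = Template("Context: $context\n\nQuery: $query\n\nAnswer:")
HISTORY_TEMPLATE = Template(
    "Context: $context\n\nHistory: $history\n\nQuery: $query\n\nAnswer:"
)


def build_prompt(query: str, contexts: list[str], history: Optional[list[str]] = None) -> str:
    """Pick the history-aware template only when history is supplied."""
    context = "\n".join(contexts)
    if history:
        return HISTORY_TEMPLATE.substitute(
            context=context, history="\n".join(history), query=query
        )
    return QA_TEMPLATE.substitute(context=context, query=query)


plain = build_prompt("What is Mem0?", ["Mem0 is a memory layer."])
with_history = build_prompt(
    "What is Mem0?", ["Mem0 is a memory layer."], history=["user: hi", "ai: hello"]
)
print(plain)
print(with_history)
```

`substitute` raises `KeyError` when a placeholder is left unfilled, which surfaces a mismatched custom template early instead of sending a half-filled prompt to the model.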

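Vector database configuration

Embedchain supports several vector stores, with ChromaDB as the primary implementation. Configuration covers the persistence directory and the connection parameters.

ChromaDB configuration example

The ChromaDB class (embedchain/embedchain/vectordb/chroma.py:29) uses ChromaDbConfig to initialize the client. It supports both local persistence and server-based connections (embedchain/embedchain/vectordb/chroma.py:51-62).

Mem0 · ChromaDB configuration example · Figure 2

Sources: embedchain/embedchain/vectordb/chroma.py:32-64, embedchain/embedchain/vectordb/chroma.py:94-110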
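The choice between local persistence and a server connection can be sketched as a small decision function. Everything here is hypothetical: `ChromaConfigSketch`, its field names (`dir`, `host`, `port`, `collection_name`), its default values, and `connection_mode` are stand-ins for illustration and do not call the real ChromaDB client.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ChromaConfigSketch:
    """Hypothetical config; field names and defaults are assumptions."""

    collection_name: str = "embedchain_store"
    dir: str = "db"
    host: Optional[str] = None
    port: Optional[int] = None


def connection_mode(config: ChromaConfigSketch) -> dict:
    """Use a server connection only when both host and port are set."""
    if config.host and config.port:
        return {"mode": "server", "host": config.host, "port": config.port}
    # Otherwise fall back to on-disk persistence in the configured directory.
    return {"mode": "persistent", "path": config.dir}


local = connection_mode(ChromaConfigSketch(dir="./chroma-data"))
remote = connection_mode(ChromaConfigSketch(host="localhost", port=8000))
print(local)
print(remote)
```

Requiring both `host` and `port` before switching modes keeps a half-filled server configuration from silently pointing at the wrong place; in this sketch it simply falls back to local storage.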

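Data ingestion configuration

When data is added through EmbedChain.add(), the system uses DataFormatter to select the right loader and chunker for the source's DataType.

Ingestion data flow

Detection: the system detects the DataType of the source (embedchain/embedchain/embedchain.py:162-181).
Mapping: DataFormatter maps the DataType to a specific BaseLoader and BaseChunker (embedchain/embedchain/data_formatter/data_formatter.py:34-35).
Loading: the loader (for example WebPageLoader) fetches the raw content and generates a doc_id (embedchain/embedchain/loaders/web_page.py:15-32).
Chunking: the chunker splits the content into manageable chunks, filters them by min_chunk_size, and generates a unique chunk_id for each (embedchain/embedchain/chunkers/base_chunker.py:18-74).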
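The loading and chunking steps above can be sketched as follows. This is a simplified illustration under stated assumptions: the fixed-size splitter, the SHA-256 hashing of content plus URL for `doc_id` and `chunk_id`, and the helper names `load` and `chunk` are hypothetical, and the real loaders and chunkers may split and hash differently.

```python
import hashlib


def sha256_hex(text: str) -> str:
    """Hex digest used here as a stable identifier."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def load(source: str, content: str) -> dict:
    """Stand-in loader: pairs already-fetched content with a derived doc_id."""
    return {"doc_id": sha256_hex(content + source), "url": source, "content": content}


def chunk(doc: dict, chunk_size: int = 40, min_chunk_size: int = 10) -> list[dict]:
    """Split into fixed-size pieces, drop short ones, and assign chunk ids."""
    text = doc["content"]
    pieces = [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    chunks = []
    for piece in pieces:
        if len(piece) < min_chunk_size:
            continue  # filtered out by min_chunk_size
        chunks.append(
            {
                "chunk_id": sha256_hex(piece + doc["url"]),
                "doc_id": doc["doc_id"],
                "text": piece,
            }
        )
    return chunks


text = "Embedchain splits documents into chunks. " * 2 + "tail"
doc = load("https://example.com/page", text)
chunks = chunk(doc)
print(len(chunks), [c["chunk_id"][:8] for c in chunks])
```

Deriving identifiers from the content makes re-ingesting an unchanged source idempotent: the same text at the same URL yields the same ids, so a store can detect duplicates.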

Supported data types (partial list)

PDF_FILE: uses PdfFileLoader and PdfFileChunker (embedchain/embedchain/data_formatter/data_formatter.py:63,111)
WEB_PAGE: uses WebPageLoader and WebPageChunker (embedchain/embedchain/data_formatter/data_formatter.py:64,112)
JSON: uses JSONLoader and JSONChunker (embedchain/embedchain/data_formatter/data_formatter.py:75,123)
GMAIL: uses GmailLoader and GmailChunker (embedchain/embedchain/data_formatter/data_formatter.py:77,125)

Sources: embedchain/embedchain/embedchain.py:117-154, embedchain/embedchain/data_formatter/data_formatter.py:61-91, embedchain/embedchain/chunkers/base_chunker.py:18-74
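The type-to-component mapping listed above amounts to a pair of lookup tables. The sketch below models it with plain strings in place of real classes; `DataTypeSketch`, `detect`, and `resolve` are hypothetical names, and the detection heuristic is a deliberately naive assumption rather than the library's actual logic.

```python
from enum import Enum


class DataTypeSketch(Enum):
    PDF_FILE = "pdf_file"
    WEB_PAGE = "web_page"
    JSON = "json"
    GMAIL = "gmail"


# Class names are taken from the list above; strings stand in for the classes.
LOADERS = {
    DataTypeSketch.PDF_FILE: "PdfFileLoader",
    DataTypeSketch.WEB_PAGE: "WebPageLoader",
    DataTypeSketch.JSON: "JSONLoader",
    DataTypeSketch.GMAIL: "GmailLoader",
}
CHUNKERS = {
    DataTypeSketch.PDF_FILE: "PdfFileChunker",
    DataTypeSketch.WEB_PAGE: "WebPageChunker",
    DataTypeSketch.JSON: "JSONChunker",
    DataTypeSketch.GMAIL: "GmailChunker",
}


def detect(source: str) -> DataTypeSketch:
    """Naive heuristic: file extension first, then URL scheme."""
    lowered = source.lower()
    if lowered.endswith(".pdf"):
        return DataTypeSketch.PDF_FILE
    if lowered.endswith(".json"):
        return DataTypeSketch.JSON
    if lowered.startswith(("http://", "https://")):
        return DataTypeSketch.WEB_PAGE
    raise ValueError(f"cannot detect data type for {source!r}")


def resolve(data_type: DataTypeSketch) -> tuple[str, str]:
    """Return the (loader, chunker) pair registered for a data type."""
    return LOADERS[data_type], CHUNKERS[data_type]


print(resolve(detect("https://example.com/docs")))
print(resolve(detect("report.pdf")))
```

Keeping loaders and chunkers in parallel tables keyed by the same enum means a new data type is added by registering one entry in each table.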

Model metadata and pricing

Embedchain maintains a registry of model capabilities, including context window sizes and pricing, in model_prices_and_context_window.json.

Model ID	Max input tokens	Max output tokens	Input cost (per token)
`openai/gpt-4o`	128,000	4,096	$0.000005 `embedchain/embedchain/config/model_prices_and_context_window.json:13`
`openai/gpt-4o-mini`	128,000	4,096	$0.00000015 `embedchain/embedchain/config/model_prices_and_context_window.json:20`
`openai/text-embedding-3-large`	8,191	N/A (vector size: 3072)	$0.00000013 `embedchain/embedchain/config/model_prices_and_context_window.json:157`
`openai/gpt-3.5-turbo-0125`	16,385	4,096	$0.0000005 `embedchain/embedchain/config/model_prices_and_context_window.json:136`

Sources: embedchain/embedchain/config/model_prices_and_context_window.json:9-22, embedchain/embedchain/config/model_prices_and_context_window.json:153-159, embedchain/embedchain/config/model_prices_and_context_window.json:132-138
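A registry like this is typically used to check that a prompt fits the context window and to estimate its cost. The sketch below hard-codes two rows from the table; the key names `max_input_tokens` and `input_cost_per_token` and the `input_cost` helper are assumptions for illustration and may not match the JSON file's actual schema.

```python
import math

# Values copied from the table above; key names are assumed.
MODEL_REGISTRY = {
    "openai/gpt-4o": {"max_input_tokens": 128_000, "input_cost_per_token": 0.000005},
    "openai/gpt-4o-mini": {"max_input_tokens": 128_000, "input_cost_per_token": 0.00000015},
}


def input_cost(model: str, prompt_tokens: int) -> float:
    """Estimate input cost in USD, rejecting prompts beyond the context window."""
    entry = MODEL_REGISTRY[model]
    if prompt_tokens > entry["max_input_tokens"]:
        raise ValueError(
            f"{prompt_tokens} tokens exceed the {entry['max_input_tokens']}-token input limit"
        )
    return prompt_tokens * entry["input_cost_per_token"]


cost_4o = input_cost("openai/gpt-4o", 1_000)
cost_mini = input_cost("openai/gpt-4o-mini", 1_000)
print(f"gpt-4o: ${cost_4o:.6f}, gpt-4o-mini: ${cost_mini:.6f}")
assert math.isclose(cost_4o, 0.005)
```

At the listed rates, 1,000 input tokens cost about $0.005 on gpt-4o and about $0.00015 on gpt-4o-mini. Output-token pricing is not shown in the table, so it is left out here.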

Environment and dependencies

The legacy Embedchain system relies on several optional extras to support specific providers; these are managed through pyproject.toml.

Open source: sentence-transformers, torch, gpt4all (embedchain/pyproject.toml:161)
Vector stores: qdrant-client, chromadb, pymilvus, lancedb (embedchain/pyproject.toml:101,121,122,162)
Cloud providers: google-cloud-aiplatform, langchain-aws, langchain-google-vertexai (embedchain/pyproject.toml:123,143,169)
Integrations: gmail (google-api-python-client), notion, slack (embedchain/pyproject.toml:171-178)

Sources: embedchain/pyproject.toml:95-144, embedchain/pyproject.toml:160-185
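Because these dependencies are optional, it can help to check which ones are importable before selecting a provider. The sketch below uses only the standard library. The group names in `EXTRA_MODULES` are hypothetical labels rather than the extras names declared in pyproject.toml, and the import names are the usual module names for those packages, stated here as an assumption.

```python
from importlib.util import find_spec

# Hypothetical grouping; import names may differ from the package names on PyPI.
EXTRA_MODULES = {
    "opensource": ["sentence_transformers", "torch", "gpt4all"],
    "vector_stores": ["qdrant_client", "chromadb", "pymilvus", "lancedb"],
}


def missing_modules(modules: list[str]) -> list[str]:
    """Return the top-level modules that cannot be found in this environment."""
    return [name for name in modules if find_spec(name) is None]


for group, modules in EXTRA_MODULES.items():
    missing = missing_modules(modules)
    status = "ready" if not missing else f"missing {', '.join(missing)}"
    print(f"{group}: {status}")
```

`find_spec` locates a module without importing it, so the check stays cheap even for heavy packages such as torch.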
