16.1 · Embedchain Overview

Long-term memory and context management · This chapter focuses on module relationships, source references, and implementation notes.

Relevant source files

.gitignore
embedchain/configs/chroma.yaml
embedchain/configs/full-stack.yaml
embedchain/configs/opensearch.yaml
embedchain/docs/api-reference/advanced/configuration.mdx
embedchain/docs/components/llms.mdx
embedchain/docs/examples/rest-api/create.mdx
embedchain/docs/get-started/faq.mdx
embedchain/embedchain/config/llm/base.py
embedchain/embedchain/config/model_prices_and_context_window.json

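Module tags

Model invocation and provider adaptation
System architecture
Memory and context
Retrieval, recall, and indexing
Testing, release, and operations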

Original DeepWiki page: https://deepwiki.com/mem0ai/mem0/16.1-embedchain-overview

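Purpose and Scope

This document introduces Embedchain, the legacy retrieval-augmented generation (RAG) framework that preceded the Mem0 project. Embedchain provides the infrastructure for loading, chunking, embedding, and querying unstructured data sources using vector databases and large language models (LLMs).

Embedchain is currently in maintenance mode because the project has evolved into Mem0, which provides a more sophisticated memory layer architecture. For information on the modern Mem0 system, see the Core Architecture documentation. For Embedchain-specific configuration details, see Embedchain Configuration. For supported data sources, see Embedchain Data Sources.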

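What Is Embedchain

Embedchain is described as "the simplest open source retrieval (RAG) framework" (embedchain/pyproject.toml:4). It provides a complete pipeline for building RAG applications that answer questions based on custom data sources.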

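Core Capabilities

Embedchain enables developers to:

Ingest multiple data sources - web pages, PDFs, YouTube videos, documents, and more (embedchain/docs/components/llms.mdx:43).
Automatic chunking - splits documents into semantically meaningful chunks, with configurable chunk_size and chunk_overlap (embedchain/docs/api-reference/advanced/configuration.mdx:60-65).
Vector embeddings - generates embeddings using providers such as OpenAI (embedchain/embedchain/embedder/openai.py:12-14).
Vector storage - stores embeddings in one of several vector databases, such as Chroma or Qdrant (embedchain/docs/api-reference/advanced/configuration.mdx:46-51).
LLM querying - answers questions using context retrieved from the vector database (embedchain/embedchain/config/llm/base.py:111-114).
Caching - optional gptcache integration for response caching (embedchain/pyproject.toml:106).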
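
To make the roles of chunk_size and chunk_overlap concrete, the sketch below splits text into overlapping character windows. It is a simplified stand-in, not Embedchain's own chunker: the function name chunk_text and the character-based splitting are assumptions made for illustration, whereas the capability list above describes splitting on semantically meaningful boundaries.

```python
def chunk_text(text: str, chunk_size: int = 200, chunk_overlap: int = 20) -> list[str]:
    """Split text into character windows of chunk_size that share chunk_overlap characters.

    Hypothetical helper for illustration; not part of the Embedchain API.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the current window already reaches the end of the text
    return chunks


if __name__ == "__main__":
    for piece in chunk_text("abcdefghij", chunk_size=4, chunk_overlap=1):
        print(piece)
```

A larger overlap repeats more text across neighbouring chunks, which costs storage but reduces the chance that a sentence relevant to a query is cut in half at a chunk boundary.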

Package Information

The Embedchain package is distributed via PyPI, with the following key metadata:

Attribute	Value
Package name	`embedchain`
Current version	0.1.128 `embedchain/pyproject.toml:3`
License	Apache License `embedchain/pyproject.toml:9`
Python support	>=3.9, <=3.13.2 `embedchain/pyproject.toml:96`
Core dependencies	`langchain`, `openai`, `chromadb`, `mem0ai` `embedchain/pyproject.toml:98-108`

Sources: embedchain/pyproject.toml:1-18, embedchain/pyproject.toml:95-108

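Architecture Overview

Embedchain follows a traditional RAG pipeline architecture with pluggable components.

Natural Language to Code Entity Space

The following diagram maps high-level RAG concepts to the specific classes and files that implement them in the Embedchain codebase.

Mem0 · Natural Language to Code Entity Space · Figure 1

Sources: embedchain/embedchain/config/llm/base.py:111-142, embedchain/embedchain/llm/openai.py:18-25, embedchain/embedchain/embedder/openai.py:12-14, embedchain/docs/api-reference/advanced/configuration.mdx:46-51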
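
The pluggable pipeline described above can be sketched with toy components. Everything here is an assumption made for illustration: bag_of_words_embed replaces a real embedding model, InMemoryStore replaces a vector database such as Chroma, and build_prompt stops before any LLM call. None of these names come from the Embedchain codebase.

```python
import math
from collections import Counter
from typing import Callable

Vector = dict[str, float]


def bag_of_words_embed(text: str) -> Vector:
    """Toy embedder: term counts over lowercase whitespace tokens."""
    return dict(Counter(text.lower().split()))


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(value * b.get(key, 0.0) for key, value in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class InMemoryStore:
    """Hypothetical vector store; the embedder is injected so it can be swapped."""

    def __init__(self, embed: Callable[[str], Vector]):
        self.embed = embed
        self.items: list[tuple[str, Vector]] = []

    def add(self, text: str) -> None:
        self.items.append((text, self.embed(text)))

    def search(self, query: str, k: int = 3) -> list[str]:
        query_vector = self.embed(query)
        ranked = sorted(self.items, key=lambda item: cosine(query_vector, item[1]), reverse=True)
        return [text for text, _ in ranked[:k]]


def build_prompt(query: str, store: InMemoryStore, k: int = 3) -> str:
    """Retrieve the top-k chunks and place them in front of the query."""
    context = "\n".join(store.search(query, k))
    return f"Context:\n{context}\n\nQuery: {query}"


if __name__ == "__main__":
    store = InMemoryStore(bag_of_words_embed)
    store.add("chroma is a vector database")
    store.add("openai provides language models")
    print(build_prompt("vector database", store, k=1))
```

Because the store only depends on a callable that maps text to a vector, replacing the embedder or the store does not require changes to the rest of the pipeline, which is the property the word "pluggable" refers to.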

Configuration System

Embedchain uses a layered configuration system that configures multiple components through YAML, JSON, or Python dictionaries (embedchain/docs/api-reference/advanced/configuration.mdx:10-12).
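
As a rough illustration of what a layered configuration can look like, the sketch below builds one as a Python dictionary and round-trips it through JSON. The section names and the provider/config nesting are assumptions modelled on the components named in this chapter, not a verified copy of Embedchain's schema; consult configuration.mdx for the authoritative keys.

```python
import json

# Assumed layout: one section per component, each naming a provider and its settings.
config = {
    "llm": {
        "provider": "openai",
        "config": {"model": "gpt-4o-mini", "temperature": 0, "max_tokens": 1000},
    },
    "vectordb": {"provider": "chroma", "config": {"collection_name": "demo"}},
    "embedder": {"provider": "openai", "config": {"model": "text-embedding-3-large"}},
    "chunker": {"chunk_size": 2000, "chunk_overlap": 100},
}


def selected_providers(cfg: dict) -> dict:
    """Return {component: provider} for every section that names a provider."""
    return {name: section["provider"] for name, section in cfg.items() if "provider" in section}


def to_json(cfg: dict) -> str:
    """Serialize the configuration; the same structure could be written as YAML."""
    return json.dumps(cfg, indent=2, sort_keys=True)


if __name__ == "__main__":
    print(selected_providers(config))
    print(to_json(config))
```

Keeping the configuration as plain nested data is what allows the same settings to be expressed interchangeably as YAML, JSON, or a dictionary.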

LLM Configuration

The BaseLlmConfig class (embedchain/embedchain/config/llm/base.py:111-142) manages LLM behavior:

Key configuration parameters:

Parameter	Type	Default	Description
`number_documents`	int	3	Number of context documents to retrieve `embedchain/embedchain/config/llm/base.py:118`
`temperature`	float	0	Sampling temperature (0-1) `embedchain/embedchain/config/llm/base.py:122`
`max_tokens`	int	1000	Maximum number of tokens to generate `embedchain/embedchain/config/llm/base.py:123`
`model`	str	None	Model identifier `embedchain/embedchain/config/llm/base.py:121`
`stream`	bool	False	Enable streaming responses `embedchain/embedchain/config/llm/base.py:125`
`token_usage`	bool	False	Track token usage and cost `embedchain/embedchain/config/llm/base.py:127`
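
The defaults in the table can be mirrored in a small dataclass to show how such a configuration object behaves. LlmSettings is a hypothetical stand-in, not the real BaseLlmConfig: the field names and defaults follow the table, while the validation rules are assumptions derived from the documented 0-1 temperature range.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LlmSettings:
    """Hypothetical mirror of the parameters listed in the table above."""

    number_documents: int = 3
    temperature: float = 0
    max_tokens: int = 1000
    model: Optional[str] = None
    stream: bool = False
    token_usage: bool = False

    def __post_init__(self) -> None:
        # Assumed checks; the real class may validate differently.
        if not 0 <= self.temperature <= 1:
            raise ValueError("temperature must be between 0 and 1")
        if self.number_documents < 1:
            raise ValueError("number_documents must be at least 1")
        if self.max_tokens < 1:
            raise ValueError("max_tokens must be at least 1")


if __name__ == "__main__":
    print(LlmSettings())
    print(LlmSettings(model="gpt-4o-mini", number_documents=5, stream=True))
```

Validating in `__post_init__` surfaces a bad value when the configuration is built rather than later, in the middle of a query.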

Prompt Templates

Embedchain includes several built-in prompt templates (embedchain/embedchain/config/llm/base.py:15-104):

DEFAULT_PROMPT - basic question answering with context (embedchain/embedchain/config/llm/base.py:15-29).
DEFAULT_PROMPT_WITH_HISTORY - includes conversation history (embedchain/embedchain/config/llm/base.py:31-52).
DEFAULT_PROMPT_WITH_MEM0_MEMORY - integrates Mem0 memories (embedchain/embedchain/config/llm/base.py:54-81).
DOCS_SITE_DEFAULT_PROMPT - designed for documentation sites (embedchain/embedchain/config/llm/base.py:83-99).

Sources: embedchain/embedchain/config/llm/base.py:111-277, embedchain/embedchain/config/llm/base.py:54-81
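
A prompt template of this kind is essentially a string with named slots that are filled in at query time. The sketch below uses Python's standard string.Template to show the mechanism. The template wording and the placeholder names $context, $history, and $query are assumptions for illustration; the actual text is in base.py at the lines cited above.

```python
from string import Template
from typing import Optional

# Illustrative wording only; not the text of DEFAULT_PROMPT.
QA_TEMPLATE = Template(
    "Use the context to answer the query.\n\n"
    "Context: $context\n\n"
    "Query: $query\n\n"
    "Answer:"
)

# Illustrative wording only; not the text of DEFAULT_PROMPT_WITH_HISTORY.
HISTORY_TEMPLATE = Template(
    "Use the context and the conversation so far to answer the query.\n\n"
    "Context: $context\n\n"
    "History:\n$history\n\n"
    "Query: $query\n\n"
    "Answer:"
)


def render_prompt(query: str, contexts: list[str], history: Optional[list[str]] = None) -> str:
    """Fill the matching template; the history variant is used only when history is given."""
    context = " | ".join(contexts)
    if history:
        return HISTORY_TEMPLATE.substitute(context=context, history="\n".join(history), query=query)
    return QA_TEMPLATE.substitute(context=context, query=query)


if __name__ == "__main__":
    print(render_prompt("What is Embedchain?", ["A RAG framework.", "Now in maintenance mode."]))
```

A memory-aware template would follow the same pattern with one more slot for retrieved memories.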

LLM Provider Integrations

Embedchain integrates with multiple LLM providers through LangChain.

Supported LLM providers:

OpenAI: the primary provider, supporting models such as gpt-4o-mini (embedchain/embedchain/llm/openai.py:18-25).
Azure OpenAI: enterprise integration; requires deployment_name (embedchain/embedchain/llm/azure_openai.py:12-24).
Google AI: supports Gemini models (embedchain/docs/components/llms.mdx:144-175).
Anthropic: integrates Claude models (embedchain/docs/components/llms.mdx:225-250).
Others: Cohere, Together, Ollama, GPT4All, AWS Bedrock, and more (embedchain/docs/components/llms.mdx:9-28).
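
One common way to support many providers behind a single configuration key is a factory registry. The sketch below shows that pattern under stated assumptions: it is not how Embedchain is known to dispatch providers, EchoLlm is a stand-in that never calls an API, and the provider keys "openai" and "azure_openai" are chosen for illustration. The only detail taken from the list above is that the Azure variant needs a deployment_name.

```python
from typing import Callable, Optional


class EchoLlm:
    """Stand-in client: returns a canned string instead of calling a provider."""

    def __init__(self, model: str):
        self.model = model

    def complete(self, prompt: str) -> str:
        return f"[{self.model}] {prompt}"


PROVIDERS: dict[str, Callable[..., EchoLlm]] = {}


def register(name: str):
    """Decorator that records a factory under a provider name."""
    def decorator(factory: Callable[..., EchoLlm]) -> Callable[..., EchoLlm]:
        PROVIDERS[name] = factory
        return factory
    return decorator


@register("openai")
def make_openai(model: str = "gpt-4o-mini", **_ignored) -> EchoLlm:
    return EchoLlm(model)


@register("azure_openai")
def make_azure(deployment_name: Optional[str] = None, model: str = "gpt-4o-mini", **_ignored) -> EchoLlm:
    if not deployment_name:
        raise ValueError("azure_openai requires deployment_name")
    return EchoLlm(f"{deployment_name}/{model}")


def create_llm(provider: str, **config) -> EchoLlm:
    """Look up the provider factory and build a client from its config."""
    try:
        factory = PROVIDERS[provider]
    except KeyError:
        raise ValueError(f"unknown provider: {provider}") from None
    return factory(**config)


if __name__ == "__main__":
    print(create_llm("openai").complete("hello"))
    print(create_llm("azure_openai", deployment_name="prod").complete("hello"))
```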

Model Pricing Database

Embedchain includes a pricing database (embedchain/embedchain/config/model_prices_and_context_window.json) that tracks:

Token limits and per-token costs for models, for example openai/gpt-4o (embedchain/embedchain/config/model_prices_and_context_window.json:9-15).
Output vector sizes for embedding models, for example 3072 for text-embedding-3-large (embedchain/embedchain/config/model_prices_and_context_window.json:156).

Sources: embedchain/embedchain/config/model_prices_and_context_window.json:1-220, embedchain/embedchain/llm/openai.py:28-46
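
A per-token pricing table turns token counts into a cost estimate with simple arithmetic. The sketch below assumes a JSON layout with the field names max_input_tokens, input_cost_per_token, and output_cost_per_token; those names, the model key example/model-a, and every number are placeholders invented for the example, not values from the real pricing file.

```python
import json

# Placeholder data: invented model name and prices, for illustration only.
PRICING_JSON = """
{
  "example/model-a": {
    "max_input_tokens": 8000,
    "max_output_tokens": 1000,
    "input_cost_per_token": 0.000002,
    "output_cost_per_token": 0.000004
  }
}
"""


def estimate_cost(pricing: dict, model: str, prompt_tokens: int, completion_tokens: int) -> float:
    """Return prompt_tokens * input price + completion_tokens * output price."""
    entry = pricing[model]
    if prompt_tokens > entry["max_input_tokens"]:
        raise ValueError("prompt exceeds the model's input token limit")
    return (
        prompt_tokens * entry["input_cost_per_token"]
        + completion_tokens * entry["output_cost_per_token"]
    )


if __name__ == "__main__":
    pricing = json.loads(PRICING_JSON)
    cost = estimate_cost(pricing, "example/model-a", prompt_tokens=1000, completion_tokens=500)
    print(f"estimated cost: {cost:.6f}")
```

With the placeholder prices, 1000 prompt tokens and 500 completion tokens cost 1000 × 0.000002 + 500 × 0.000004 = 0.004 in whatever currency the table uses.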

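Embedder Implementations

Embedchain provides embedder implementations for multiple providers.

Embedder Architecture

The following diagram shows the relationship between the OpenAI-based embedding configuration and its implementation.

Mem0 · Embedder Architecture · Figure 2

Sources: embedchain/embedchain/embedder/openai.py:12-44, embedchain/embedchain/config/llm/base.py:111-142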
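
The split between an embedder's configuration and its implementation can be sketched as a settings object passed into an embedder class. Both classes below are hypothetical: HashingEmbedder produces deterministic pseudo-embeddings by hashing tokens and makes no API call, so its vectors carry no semantic meaning. The 3072 dimension for text-embedding-3-large is the figure quoted earlier in this chapter.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class EmbedderSettings:
    """Hypothetical configuration: which model to use and how wide its vectors are."""

    model: str = "text-embedding-3-large"
    vector_dimension: int = 3072


class HashingEmbedder:
    """Stand-in implementation that reads its behaviour from the settings object."""

    def __init__(self, settings: EmbedderSettings):
        self.settings = settings

    def embed(self, text: str) -> list[float]:
        dim = self.settings.vector_dimension
        vector = [0.0] * dim
        for token in text.lower().split():
            digest = hashlib.sha256(token.encode("utf-8")).digest()
            index = int.from_bytes(digest[:4], "big") % dim  # stable bucket per token
            vector[index] += 1.0
        return vector


if __name__ == "__main__":
    embedder = HashingEmbedder(EmbedderSettings())
    vector = embedder.embed("hello world")
    print(len(vector), sum(vector))
```

The vector dimension matters beyond the embedder itself, because a vector store collection is typically created for one fixed dimension and cannot mix embeddings of different widths.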

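Relationship to Mem0

Embedchain has evolved into Mem0, a fundamental architectural shift from a RAG framework to a persistent memory layer.

Dependency Relationship

Embedchain now lists mem0ai as a core dependency (embedchain/pyproject.toml:108):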

mem0ai = "^0.1.54"

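This suggests that Embedchain is being maintained as a compatibility layer while users migrate to Mem0. The DEFAULT_PROMPT_WITH_MEM0_MEMORY template (embedchain/embedchain/config/llm/base.py:54-81) explicitly bridges the two systems, allowing Embedchain's query method to inject memories retrieved by Mem0 into the LLM context.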
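
The `^0.1.54` requirement quoted above uses caret notation. Assuming Poetry's caret semantics, which allow updates that leave the leftmost non-zero version component unchanged, the sketch below computes the accepted range for a three-part version. The helper names caret_bounds and satisfies are made up for this example, and the code handles only plain MAJOR.MINOR.PATCH strings.

```python
def caret_bounds(spec: str) -> tuple[tuple[int, int, int], tuple[int, int, int]]:
    """Return (inclusive lower, exclusive upper) for a caret spec such as '^0.1.54'."""
    if not spec.startswith("^"):
        raise ValueError("expected a caret requirement such as ^1.2.3")
    parts = [int(piece) for piece in spec[1:].split(".")]
    if len(parts) != 3:
        raise ValueError("this sketch only handles MAJOR.MINOR.PATCH")
    major, minor, patch = parts

    # Bump the leftmost non-zero component and zero out everything after it.
    if major > 0:
        upper = (major + 1, 0, 0)
    elif minor > 0:
        upper = (0, minor + 1, 0)
    else:
        upper = (0, 0, patch + 1)
    return (major, minor, patch), upper


def satisfies(version: str, spec: str) -> bool:
    """Check whether a MAJOR.MINOR.PATCH version falls inside the caret range."""
    lower, upper = caret_bounds(spec)
    candidate = tuple(int(piece) for piece in version.split("."))
    return lower <= candidate < upper


if __name__ == "__main__":
    print(caret_bounds("^0.1.54"))
    print(satisfies("0.1.60", "^0.1.54"), satisfies("0.2.0", "^0.1.54"))
```

Under that reading, the constraint accepts mem0ai releases from 0.1.54 up to, but not including, 0.2.0.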

Migration Path to Mem0

Mem0 offers several advantages over Embedchain, including intelligent memory extraction and graph-based relationship modeling.

Configuration mapping:

Embedchain entity	Mem0 equivalent
`BaseLlmConfig`	`LlmConfig`
`BaseEmbedderConfig`	`EmbedderConfig`
`App.add()`	`Memory.add()`
`App.query()`	`Memory.search()`

Sources: embedchain/pyproject.toml:108, embedchain/embedchain/config/llm/base.py:54-81

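Legacy Support

Although Embedchain is in maintenance mode, it continues to receive updates for current Python versions (3.9-3.13.2) (embedchain/pyproject.toml:96) and LangChain 0.3.x (embedchain/pyproject.toml:98).

The package excludes certain directories at build time to keep the distribution lean (embedchain/pyproject.toml:11-15):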

exclude = [
    "db",
    "configs",
    "notebooks"
]

Sources: embedchain/pyproject.toml:11-15, embedchain/pyproject.toml:95-144