Page: Dify · 3.3 Storage Backends and Vector Database Configuration · DeepWiki full-text translation

3.3 · Storage Backends and Vector Database Configuration

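Application orchestration and external knowledge integration · Covers this chapter's module relationships, source-code evidence, and key implementation points.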

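Project: Dify · Chapter: 3.3 · Status: full translation · Modules: retrieval, recall and indexing; system architecture; testing, release and operations; storage and persistence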
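Module tags
  • Retrieval, recall and indexing
  • System architecture
  • Testing, release and operations
  • Storage and persistence
  • Configuration governance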

Chapter text

Storage Backends and Vector Database Configuration


Relevant source files

The main source files referenced in this chapter:

  • api/.env.example
  • api/app.py
  • api/app_factory.py
  • api/configs/feature/__init__.py
  • api/configs/middleware/__init__.py
  • api/configs/observability/__init__.py
  • api/configs/observability/otel/otel_config.py
  • api/configs/packaging/__init__.py
  • api/controllers/console/datasets/datasets.py
  • api/core/plugin/backwards_invocation/model.py
  • api/core/rag/datasource/keyword/jieba/jieba.py
  • api/core/rag/datasource/keyword/jieba/jieba_keyword_table_handler.py
  • api/core/rag/datasource/vdb/vector_factory.py
  • api/core/rag/datasource/vdb/vector_type.py
  • api/core/rag/retrieval/router/multi_dataset_function_call_router.py
  • api/core/rag/retrieval/router/multi_dataset_react_route.py
  • api/core/rag/splitter/fixed_text_splitter.py
  • api/core/rag/splitter/text_splitter.py
  • api/extensions/ext_compress.py
  • api/extensions/ext_otel.py
  • api/extensions/ext_storage.py
  • api/extensions/otel/instrumentation.py
  • api/extensions/storage/storage_type.py
  • api/factories/variable_factory.py
  • api/providers/vdb/vdb-couchbase/src/dify_vdb_couchbase/couchbase_vector.py
  • api/providers/vdb/vdb-elasticsearch/src/dify_vdb_elasticsearch/elasticsearch_vector.py
  • api/providers/vdb/vdb-huawei-cloud/src/dify_vdb_huawei_cloud/huawei_cloud_vector.py
  • api/providers/vdb/vdb-lindorm/src/dify_vdb_lindorm/lindorm_vector.py
  • api/providers/vdb/vdb-milvus/src/dify_vdb_milvus/milvus_vector.py
  • api/providers/vdb/vdb-opensearch/src/dify_vdb_opensearch/opensearch_vector.py
  • api/providers/vdb/vdb-oracle/src/dify_vdb_oracle/oraclevector.py
  • api/providers/vdb/vdb-pgvector/src/dify_vdb_pgvector/pgvector.py
  • api/providers/vdb/vdb-relyt/src/dify_vdb_relyt/relyt_vector.py
  • api/providers/vdb/vdb-tablestore/src/dify_vdb_tablestore/tablestore_vector.py
  • api/providers/vdb/vdb-tidb-vector/src/dify_vdb_tidb_vector/tidb_vector.py
  • api/providers/vdb/vdb-upstash/src/dify_vdb_upstash/upstash_vector.py
  • api/providers/vdb/vdb-vastbase/src/dify_vdb_vastbase/vastbase_vector.py
  • api/pyproject.toml
  • api/tests/unit_tests/configs/test_dify_config.py
  • api/tests/unit_tests/core/rag/splitter/__init__.py
  • api/tests/unit_tests/core/rag/splitter/test_text_splitter.py
  • api/tests/unit_tests/core/workflow/graph_engine/test_table_runner.py
  • api/uv.lock
  • docker/.env.example
  • docker/README.md
  • docker/docker-compose-template.yaml
  • docker/docker-compose.middleware.yaml
  • docker/docker-compose.yaml
  • docker/envs/core-services/shared.env.example
  • docker/envs/infrastructure/nginx.env.example
  • docker/envs/security.env.example
  • docker/nginx/conf.d/default.conf.template
  • web/app/components/app/configuration/config-var/index.tsx
  • web/app/components/app/configuration/config-var/var-item.tsx
  • web/app/components/workflow/nodes/_base/components/variable/__tests__/output-var-list.spec.tsx
  • web/app/components/workflow/nodes/_base/components/variable/output-var-list.tsx
  • web/app/components/workflow/nodes/_base/components/variable/var-list.tsx
  • web/app/components/workflow/nodes/_base/hooks/use-output-var-list.ts
  • web/app/components/workflow/nodes/loop/components/loop-variables/item.tsx
  • web/app/components/workflow/nodes/start/components/var-item.tsx
  • web/app/components/workflow/nodes/start/components/var-list.tsx
  • web/app/components/workflow/nodes/variable-assigner/components/var-group-item.tsx
  • web/app/components/workflow/nodes/variable-assigner/components/var-list/index.tsx
  • web/app/components/workflow/panel/chat-variable-panel/components/variable-modal.tsx
  • web/app/components/workflow/panel/chat-variable-panel/type.ts
  • web/app/components/workflow/panel/env-panel/variable-modal.tsx
  • web/package.json
  • web/utils/var.ts

Purpose and Scope

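This document describes Dify's storage backend configuration (for file storage) and its vector database configuration (for knowledge-base embeddings). It covers the system architecture, the supported backends (23+ vector databases and 12+ file storage providers), how to configure them, and the factory pattern used to initialize them at runtime.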

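Dify uses a pluggable architecture for both file storage and vector search, so developers can switch providers by changing environment variables (docker/.env.example:151-207).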

Storage Backend Architecture

Overview

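Dify uses a pluggable storage backend system to hold user-uploaded files, documents, and generated assets. It supports multiple cloud providers and local storage behind a unified interface, with Apache OpenDAL as the primary abstraction layer (api/.env.example:111-115).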

Storage Backend Selection Flow

[Figure 1: Storage backend selection flow]

Sources: api/extensions/ext_storage.py:22-86, api/extensions/storage/storage_type.py:4-19, api/configs/middleware/__init__.py:70-77

Storage Configuration and Types

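The storage system uses Pydantic-based configuration models (under api/configs/middleware/storage/) to validate and parse environment variables. The Storage class in api/extensions/ext_storage.py is the entry point; it uses a factory pattern to instantiate the concrete provider.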

STORAGE_TYPE value | Provider | Implementation class | Config file
--- | --- | --- | ---
opendal | Apache OpenDAL | OpenDALStorage | opendal_storage_config.py
s3 | AWS S3 | AwsS3Storage | amazon_s3_storage_config.py
azure-blob | Azure | AzureBlobStorage | azure_blob_storage_config.py
aliyun-oss | Alibaba Cloud | AliyunOssStorage | aliyun_oss_storage_config.py
google-storage | Google Cloud | GoogleCloudStorage | google_cloud_storage_config.py
tencent-cos | Tencent Cloud | TencentCosStorage | tencent_cos_storage_config.py
huawei-obs | Huawei Cloud | HuaweiObsStorage | huawei_obs_storage_config.py
baidu-obs | Baidu Cloud | BaiduObsStorage | baidu_obs_storage_config.py
volcengine-tos | Volcengine | VolcengineTosStorage | volcengine_tos_storage_config.py
oci-storage | Oracle Cloud | OracleOCIStorage | oci_storage_config.py
supabase | Supabase | SupabaseStorage | supabase_storage_config.py
clickzetta-volume | ClickZetta | ClickZettaVolumeStorage | clickzetta_volume_storage_config.py
local | Local filesystem (deprecated) | OpenDALStorage(scheme='fs') | -

Sources: api/extensions/ext_storage.py:22-86, api/extensions/storage/storage_type.py:4-19, api/configs/middleware/__init__.py:53-67
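The table above amounts to a lookup from STORAGE_TYPE to a provider class. The following is a minimal sketch of that dispatch, not Dify's Storage class: the two provider classes are stubs, and `build_storage`, `_BUILDERS`, and the "fs" fallback for OPENDAL_SCHEME are hypothetical names and defaults chosen for illustration.

```python
from collections.abc import Callable, Mapping
from typing import Any


class OpenDALStorage:
    """Stub standing in for the real OpenDALStorage provider."""

    def __init__(self, scheme: str, **kwargs: Any) -> None:
        self.scheme = scheme
        self.options = kwargs


class AwsS3Storage:
    """Stub standing in for the real AwsS3Storage provider."""


# STORAGE_TYPE -> builder. A builder only runs when its type is selected,
# which is where a real implementation could import the provider SDK lazily.
_BUILDERS: dict[str, Callable[[Mapping[str, str]], object]] = {
    # Falling back to the "fs" scheme is an assumption of this sketch.
    "opendal": lambda env: OpenDALStorage(scheme=env.get("OPENDAL_SCHEME", "fs")),
    "s3": lambda env: AwsS3Storage(),
    # The deprecated "local" type is served by OpenDAL's filesystem scheme.
    "local": lambda env: OpenDALStorage(scheme="fs"),
}


def build_storage(env: Mapping[str, str]) -> object:
    """Pick and construct a storage provider from the STORAGE_TYPE variable."""
    storage_type = env.get("STORAGE_TYPE", "")
    try:
        builder = _BUILDERS[storage_type]
    except KeyError:
        raise ValueError(f"unsupported STORAGE_TYPE: {storage_type!r}") from None
    return builder(env)
```

Keeping builders in a dictionary means adding a provider is a one-line registration, and an unknown STORAGE_TYPE fails fast with a clear error instead of silently falling back.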

OpenDAL Integration

OpenDAL provides a unified interface to 40+ storage services. When STORAGE_TYPE=opendal, the concrete service is chosen with OPENDAL_SCHEME (api/.env.example:111-115).

OpenDAL Initialization Pattern

The OpenDALStorage class initializes an opendal.Operator with a retry layer and with dynamic keyword arguments extracted from environment variables.

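# api/extensions/storage/opendal_storage.py (simplified)
from opendal import Operator
from opendal.layers import RetryLayer

from extensions.storage.base_storage import BaseStorage

class OpenDALStorage(BaseStorage):
    def __init__(self, scheme: str, **kwargs):
        # Build an OpenDAL Operator for the scheme and wrap it with a retry layer
        self.op = Operator(scheme, **kwargs).layer(RetryLayer())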

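The system reads environment variables prefixed with OPENDAL_<SCHEME>_, strips the prefix, and lowercases the remainder to form the option keys passed to the OpenDAL operator (for example, OPENDAL_FS_ROOT becomes root).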
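The prefix-stripping rule can be sketched in a few lines. `opendal_kwargs_from_env` is a hypothetical helper, not Dify's function, and skipping empty values is an assumption of this sketch.

```python
import os
from collections.abc import Mapping


def opendal_kwargs_from_env(scheme: str, environ: Mapping[str, str] = os.environ) -> dict[str, str]:
    """Collect OPENDAL_<SCHEME>_* variables as lowercase OpenDAL option names."""
    prefix = f"OPENDAL_{scheme.upper()}_"
    return {
        key[len(prefix):].lower(): value
        for key, value in environ.items()
        # Only this scheme's variables; empty values are skipped (an assumption).
        if key.startswith(prefix) and value
    }
```

With OPENDAL_SCHEME=fs and OPENDAL_FS_ROOT=storage, this yields {"root": "storage"}, which could then be passed on as `OpenDALStorage("fs", **kwargs)`. Variables for other schemes, such as OPENDAL_S3_BUCKET, are ignored because their prefix does not match.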

Sources: api/pyproject.toml:191, api/configs/middleware/storage/opendal_storage_config.py, api/.env.example:113-115

Vector Database Architecture

Overview

Dify supports 23+ vector database implementations. Each implementation is registered as an entry point and instantiated through VectorFactory (api/core/rag/datasource/vdb/vector_factory.py).

Vector Database Initialization Flow

[Figure 2: Vector database initialization flow]

Sources: api/core/rag/datasource/vdb/vector_type.py:4-37, api/configs/middleware/__init__.py:86-101, api/pyproject.toml:203-241

Supported Implementations

Dify manages vector databases with a workspace-based plugin architecture: each provider is an independent package under providers/vdb/* (api/pyproject.toml:56-58).

Database | Package name | Config class
--- | --- | ---
Weaviate | dify-vdb-weaviate | WeaviateConfig
Milvus | dify-vdb-milvus | MilvusConfig
PGVector | dify-vdb-pgvector | PGVectorConfig
Qdrant | dify-vdb-qdrant | QdrantConfig
Elasticsearch | dify-vdb-elasticsearch | ElasticsearchConfig
TiDB Vector | dify-vdb-tidb-vector | TiDBVectorConfig
OceanBase | dify-vdb-oceanbase | OceanBaseVectorConfig
Chroma | dify-vdb-chroma | ChromaConfig
Oracle | dify-vdb-oracle | OracleConfig

Sources: api/pyproject.toml:62-91, api/configs/middleware/__init__.py:22-51

Configuration Pattern

The vector database is selected with the VECTOR_STORE environment variable (api/.env.example:205). Each database has its own configuration block:

  • Weaviate: WEAVIATE_ENDPOINT, WEAVIATE_API_KEY (api/.env.example:210-211)
  • Milvus: MILVUS_URI, MILVUS_TOKEN (docker/.env.example:186)
  • PGVector: PGVECTOR_HOST, PGVECTOR_PORT (docker/docker-compose.yaml:32)

Data Flow: From Documents to Vector Storage

The diagram below maps the high-level document ingestion flow onto the concrete code entities responsible for persistence.

[Figure 3: Data flow from documents to vector storage]

Sources: api/core/rag/datasource/vdb/vector_factory.py, api/controllers/console/datasets/datasets.py:25, api/models/dataset.py
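The ingestion path (split, embed, store, then query) can be illustrated end to end with a toy pipeline. Every name here (`split_text`, `toy_embed`, `InMemoryVectorStore`) is hypothetical. Real ingestion uses Dify's text splitters, the configured embedding model, and the backend selected by VECTOR_STORE. The character-frequency "embedding" exists only to keep the sketch self-contained.

```python
import math
from dataclasses import dataclass, field


def split_text(text: str, chunk_size: int, overlap: int) -> list[str]:
    """Fixed-size character splitter where consecutive chunks share `overlap` chars."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    if not text:
        return []
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, max(len(text) - overlap, 1), step)]


def toy_embed(text: str, dim: int = 8) -> list[float]:
    """Stand-in for an embedding model: an L2-normalised character histogram."""
    vec = [0.0] * dim
    for ch in text:
        vec[ord(ch) % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


@dataclass
class InMemoryVectorStore:
    """Stand-in for a vector database backend: stores (text, vector) rows."""

    rows: list[tuple[str, list[float]]] = field(default_factory=list)

    def add_texts(self, texts: list[str]) -> None:
        self.rows.extend((t, toy_embed(t)) for t in texts)

    def search(self, query: str, top_k: int = 1) -> list[str]:
        q = toy_embed(query)
        # Vectors are unit length, so the dot product equals cosine similarity.
        ranked = sorted(self.rows, key=lambda row: -sum(a * b for a, b in zip(q, row[1])))
        return [text for text, _ in ranked[:top_k]]
```

For example, `split_text("abcdefghij", chunk_size=4, overlap=1)` produces three overlapping chunks, and searching the store for one of them ranks that chunk first. Swapping the store class is the only change needed to target a different backend, which is the point of the factory indirection described above.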

Configuration Summary Table

Middleware | Config class | Default port | Key environment variables
--- | --- | --- | ---
PostgreSQL | DatabaseConfig | 5432 | DB_HOST, DB_USERNAME, DB_PASSWORD, DB_DATABASE
Redis | RedisConfig | 6379 | REDIS_HOST, REDIS_PORT, REDIS_PASSWORD
S3 | S3StorageConfig | 443 | S3_ENDPOINT, S3_BUCKET_NAME, S3_ACCESS_KEY
Milvus | MilvusConfig | 19530 | MILVUS_URI, MILVUS_TOKEN
Weaviate | WeaviateConfig | 8080 | WEAVIATE_ENDPOINT, WEAVIATE_API_KEY
Qdrant | QdrantConfig | 6333 | QDRANT_URL, QDRANT_API_KEY

Sources: api/configs/middleware/__init__.py:123-153, api/configs/middleware/cache/redis_config.py, api/.env.example:46-101, api/.env.example:117-124, api/.env.example:209-211
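The environment variables and default ports in the summary table are typically combined into connection strings. Here is a minimal sketch for PostgreSQL and Redis. `postgres_uri` and `redis_url` are hypothetical helpers, DB_PORT is an assumed variable that does not appear in the table, and the URI shapes follow the common postgresql:// and redis:// conventions rather than Dify's exact computed fields.

```python
from collections.abc import Mapping
from urllib.parse import quote_plus


def postgres_uri(env: Mapping[str, str]) -> str:
    """Assemble a PostgreSQL URI from DB_* variables; DB_PORT (assumed) defaults to 5432."""
    user = quote_plus(env["DB_USERNAME"])
    password = quote_plus(env["DB_PASSWORD"])  # escape characters such as '@' or ':'
    port = int(env.get("DB_PORT", "5432"))
    return f"postgresql://{user}:{password}@{env['DB_HOST']}:{port}/{env['DB_DATABASE']}"


def redis_url(env: Mapping[str, str]) -> str:
    """Assemble a Redis URL from REDIS_* variables; REDIS_PORT defaults to 6379."""
    password = env.get("REDIS_PASSWORD")
    auth = f":{quote_plus(password)}@" if password else ""
    return f"redis://{auth}{env['REDIS_HOST']}:{int(env.get('REDIS_PORT', '6379'))}"
```

Percent-encoding the credentials matters in practice: a password containing "@" would otherwise be parsed as the start of the host part.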