Onyx · 3.4 Supported Data Sources · DeepWiki (full-text translation)

3.4 · Supported Data Sources

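Enterprise connectors and unified search · This chapter focuses on module relationships, source-code evidence, and key implementation points.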

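Project: Onyx · Chapter: 3.4 · Status: full-text translation · Modules: document objects and metadata; authentication, permissions, and security; interface and service contracts; model invocation and provider adaptation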
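Module tags
  • Document objects and metadata
  • Authentication, permissions, and security
  • Interface and service contracts
  • Model invocation and provider adaptation
  • Agent runtime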

Supported Data Sources

Related Source Files

Main source files referenced in this chapter:

  • backend/alembic/versions/3fc5d75723b3_add_doc_metadata_field_in_document_model.py
  • backend/alembic/versions/47a07e1a38f1_fix_invalid_model_configurations_state.py
  • backend/alembic/versions/7a70b7664e37_add_model_configuration_table.py
  • backend/alembic/versions/9a0296d7421e_add_is_auto_mode_to_llm_provider.py
  • backend/ee/onyx/connectors/perm_sync_valid.py
  • backend/ee/onyx/external_permissions/confluence/constants.py
  • backend/ee/onyx/external_permissions/confluence/doc_sync.py
  • backend/ee/onyx/external_permissions/confluence/group_sync.py
  • backend/ee/onyx/external_permissions/confluence/space_access.py
  • backend/ee/onyx/external_permissions/github/utils.py
  • backend/ee/onyx/external_permissions/gmail/doc_sync.py
  • backend/ee/onyx/external_permissions/google_drive/doc_sync.py
  • backend/ee/onyx/external_permissions/google_drive/permission_retrieval.py
  • backend/ee/onyx/external_permissions/jira/doc_sync.py
  • backend/ee/onyx/external_permissions/salesforce/postprocessing.py
  • backend/ee/onyx/external_permissions/salesforce/utils.py
  • backend/ee/onyx/external_permissions/sharepoint/doc_sync.py
  • backend/ee/onyx/external_permissions/sharepoint/group_sync.py
  • backend/ee/onyx/external_permissions/sharepoint/permission_utils.py
  • backend/ee/onyx/external_permissions/slack/doc_sync.py
  • backend/ee/onyx/external_permissions/slack/group_sync.py
  • backend/ee/onyx/external_permissions/slack/utils.py
  • backend/ee/onyx/external_permissions/sync_params.py
  • backend/ee/onyx/external_permissions/teams/doc_sync.py
  • backend/ee/onyx/external_permissions/utils.py
  • backend/onyx/access/models.py
  • backend/onyx/background/indexing/checkpointing_utils.py
  • backend/onyx/connectors/airtable/airtable_connector.py
  • backend/onyx/connectors/axero/connector.py
  • backend/onyx/connectors/bookstack/client.py
  • backend/onyx/connectors/clickup/connector.py
  • backend/onyx/connectors/confluence/connector.py
  • backend/onyx/connectors/confluence/onyx_confluence.py
  • backend/onyx/connectors/confluence/utils.py
  • backend/onyx/connectors/connector_runner.py
  • backend/onyx/connectors/discord/__init__.py
  • backend/onyx/connectors/discord/connector.py
  • backend/onyx/connectors/discourse/connector.py
  • backend/onyx/connectors/document360/connector.py
  • backend/onyx/connectors/egnyte/connector.py
  • backend/onyx/connectors/fireflies/connector.py
  • backend/onyx/connectors/gitbook/__init__.py
  • backend/onyx/connectors/gitbook/connector.py
  • backend/onyx/connectors/google_drive/connector.py
  • backend/onyx/connectors/google_drive/doc_conversion.py
  • backend/onyx/connectors/google_drive/file_retrieval.py
  • backend/onyx/connectors/google_drive/models.py
  • backend/onyx/connectors/google_utils/resources.py
  • backend/onyx/connectors/highspot/__init__.py
  • backend/onyx/connectors/highspot/client.py
  • backend/onyx/connectors/highspot/connector.py
  • backend/onyx/connectors/highspot/utils.py
  • backend/onyx/connectors/hubspot/connector.py
  • backend/onyx/connectors/hubspot/rate_limit.py
  • backend/onyx/connectors/interfaces.py
  • backend/onyx/connectors/linear/connector.py
  • backend/onyx/connectors/mock_connector/connector.py
  • backend/onyx/connectors/productboard/connector.py
  • backend/onyx/connectors/salesforce/connector.py
  • backend/onyx/connectors/salesforce/doc_conversion.py
  • backend/onyx/connectors/salesforce/onyx_salesforce.py
  • backend/onyx/connectors/salesforce/salesforce_calls.py
  • backend/onyx/connectors/salesforce/sqlite_functions.py
  • backend/onyx/connectors/salesforce/utils.py
  • backend/onyx/connectors/sharepoint/connector.py
  • backend/onyx/connectors/sharepoint/connector_utils.py
  • backend/onyx/connectors/slack/connector.py
  • backend/onyx/connectors/slack/onyx_retry_handler.py
  • backend/onyx/connectors/slack/onyx_slack_web_client.py
  • backend/onyx/connectors/slack/utils.py
  • backend/onyx/connectors/teams/connector.py
  • backend/onyx/connectors/teams/models.py
  • backend/onyx/connectors/teams/utils.py
  • backend/onyx/connectors/zendesk/connector.py
  • backend/onyx/onyxbot/slack/icons.py
  • backend/onyx/server/documents/standard_oauth.py
  • backend/onyx/tools/tool_implementations/mcp/mcp_client.py
  • backend/onyx/utils/subclasses.py
  • backend/onyx/utils/threadpool_concurrency.py
  • backend/scripts/decrypt.py
  • backend/tests/daily/connectors/airtable/test_airtable_basic.py
  • backend/tests/daily/connectors/discord/test_discord_connector.py
  • backend/tests/daily/connectors/fireflies/test_fireflies_connector.py
  • backend/tests/daily/connectors/fireflies/test_fireflies_data.json
  • backend/tests/daily/connectors/gitbook/test_gitbook_connector.py
  • backend/tests/daily/connectors/google_drive/conftest.py
  • backend/tests/daily/connectors/google_drive/consts_and_utils.py
  • backend/tests/daily/connectors/google_drive/test_admin_oauth.py
  • backend/tests/daily/connectors/google_drive/test_drive_perm_sync.py
  • backend/tests/daily/connectors/google_drive/test_link_visibility_filter.py
  • backend/tests/daily/connectors/google_drive/test_map_test_ids.py
  • backend/tests/daily/connectors/google_drive/test_resolver.py
  • backend/tests/daily/connectors/google_drive/test_sections.py
  • backend/tests/daily/connectors/google_drive/test_service_acct.py
  • backend/tests/daily/connectors/google_drive/test_user_1_oauth.py
  • backend/tests/daily/connectors/highspot/test_highspot_connector.py
  • backend/tests/daily/connectors/highspot/test_highspot_data.json
  • backend/tests/daily/connectors/hubspot/test_hubspot_connector.py
  • backend/tests/daily/connectors/salesforce/test_salesforce_connector.py
  • backend/tests/daily/connectors/salesforce/test_salesforce_data.json
  • backend/tests/daily/connectors/sharepoint/test_sharepoint_connector.py
  • backend/tests/daily/connectors/slack/test_slack_connector.py
  • backend/tests/daily/connectors/slack/test_slack_perm_sync.py
  • backend/tests/daily/connectors/teams/test_teams_connector.py
  • backend/tests/daily/connectors/utils.py
  • backend/tests/daily/connectors/zendesk/test_zendesk_connector.py
  • backend/tests/daily/connectors/zendesk/test_zendesk_data.json
  • backend/tests/external_dependency_unit/connectors/confluence/conftest.py
  • backend/tests/integration/connector_job_tests/sharepoint/conftest.py
  • backend/tests/integration/connector_job_tests/slack/slack_api_utils.py
  • backend/tests/unit/ee/onyx/external_permissions/confluence/test_space_access.py
  • backend/tests/unit/ee/onyx/external_permissions/salesforce/test_postprocessing.py
  • backend/tests/unit/ee/onyx/external_permissions/sharepoint/test_permission_utils.py
  • backend/tests/unit/onyx/connectors/airtable/test_airtable_index_all.py
  • backend/tests/unit/onyx/connectors/confluence/test_confluence_checkpointing.py
  • backend/tests/unit/onyx/connectors/confluence/test_onyx_confluence.py
  • backend/tests/unit/onyx/connectors/discord/test_discord_validation.py
  • backend/tests/unit/onyx/connectors/google_drive/__init__.py
  • backend/tests/unit/onyx/connectors/google_drive/test_slim_retrieval.py
  • backend/tests/unit/onyx/connectors/google_utils/test_impersonation_guard.py
  • backend/tests/unit/onyx/connectors/hubspot/test_hubspot_inline_associations.py
  • backend/tests/unit/onyx/connectors/jira/test_jira_permission_sync.py
  • backend/tests/unit/onyx/connectors/linear/test_linear_load_credentials.py
  • backend/tests/unit/onyx/connectors/salesforce/test_salesforce_custom_config.py
  • backend/tests/unit/onyx/connectors/salesforce/test_salesforce_sqlite.py
  • backend/tests/unit/onyx/connectors/salesforce/test_yield_doc_batches.py
  • backend/tests/unit/onyx/connectors/sharepoint/test_delta_checkpointing.py
  • backend/tests/unit/onyx/connectors/sharepoint/test_drive_matching.py
  • backend/tests/unit/onyx/connectors/sharepoint/test_fetch_site_pages.py
  • backend/tests/unit/onyx/connectors/sharepoint/test_hierarchy_helpers.py
  • backend/tests/unit/onyx/connectors/sharepoint/test_rest_client_context_caching.py
  • backend/tests/unit/onyx/connectors/teams/test_collect_teams.py
  • backend/tests/unit/onyx/connectors/test_connector_factory.py
  • backend/tests/unit/onyx/connectors/utils.py
  • backend/tests/unit/onyx/connectors/zendesk/test_zendesk_checkpointing.py
  • backend/tests/unit/onyx/connectors/zendesk/test_zendesk_rate_limit.py
  • web/src/app/craft/components/ConnectDataBanner.tsx
  • web/src/app/craft/components/ConnectorBannersRow.tsx
  • web/src/app/craft/v1/configure/components/ComingSoonConnectors.tsx
  • web/src/lib/connectors/AutoSyncOptionFields.tsx

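Purpose and Scope

This document lists every data source Onyx can connect to for document indexing and retrieval. It covers the data source enumeration, source metadata, configuration requirements, authentication methods, and backend implementation details. For the connector framework and its lifecycle, see Connector Framework Overview. For credential handling, see Credential Management. For the admin UI used to configure these connectors, see Connector Management UI.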

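Data Source Enumeration

All supported data sources are defined in the ValidSources enum. This enum is the single source of truth for connector types across the whole system.

Source: web/src/lib/types.ts:466-526

The ValidSources enum contains more than 60 data sources, grouped into the following categories: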

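  • Knowledge bases and wikis (Confluence, Notion, BookStack, etc.)
  • Cloud storage (Google Drive, Dropbox, S3, etc.)
  • Ticketing and task management (Jira, Zendesk, Linear, etc.)
  • Messaging platforms (Slack, Teams, Gmail, etc.)
  • Code repositories (GitHub, GitLab, Bitbucket)
  • Sales platforms (Salesforce, HubSpot, Gong)
  • General-purpose sources (Web, File, Ingestion API)
  • Special sources (FederatedSlack, CraftFile, UserFile)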
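
Treating one enum as the single source of truth means every incoming source string is checked against it before anything else happens. The real ValidSources enum is TypeScript. The Python sketch below only illustrates the idea. SourceSketch and parse_source are hypothetical names, and the four members are an illustrative subset.

```python
from enum import Enum


class SourceSketch(str, Enum):
    # Hypothetical subset of source identifiers. The real ValidSources enum is
    # TypeScript (web/src/lib/types.ts) and has 60+ members.
    CONFLUENCE = "confluence"
    GOOGLE_DRIVE = "google_drive"
    JIRA = "jira"
    SLACK = "slack"


def parse_source(raw: str) -> SourceSketch:
    """Map an incoming string onto an enum member, rejecting anything unknown."""
    try:
        return SourceSketch(raw.strip().lower())
    except ValueError as exc:
        raise ValueError(f"unsupported data source: {raw!r}") from exc
```

An unknown string fails at the boundary and never reaches connector code.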
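
Data Source Registration Flow

The diagram below shows how a data source string from the UI is mapped to a backend connector implementation class.

Data source mapping logic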

Figure 1: Onyx · Data Source Registration Flow

Sources: web/src/lib/types.ts:466-559, web/src/lib/sources.ts:77-451, web/src/lib/connectors/connectors.tsx:145-148, backend/onyx/connectors/factory.py:91-101

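Data Source Categories

Data sources are organized into the categories defined by the SourceCategory enum. SOURCE_METADATA_MAP links each data source to its category, icon, display name, and documentation.

Category Breakdown

The system groups connectors by functional area, which simplifies setup for administrators.

Data source category mapping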

Figure 2: Onyx · Category Breakdown

Sources: web/src/lib/sources.ts:95-451, web/src/components/icons/icons.tsx:1-97
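
The metadata map and the category grouping can be pictured as a lookup table plus a bucketing step. The actual SOURCE_METADATA_MAP is TypeScript. In the Python sketch below, SourceMetadataSketch, METADATA_SKETCH and group_by_category are hypothetical names, and the category labels and icon names are placeholders.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class SourceMetadataSketch:
    # Hypothetical fields loosely mirroring a SOURCE_METADATA_MAP entry
    display_name: str
    category: str
    icon: str
    docs_url: Optional[str] = None


# Hypothetical entries; category labels and icon names are placeholders.
METADATA_SKETCH: Dict[str, SourceMetadataSketch] = {
    "confluence": SourceMetadataSketch("Confluence", "Wiki", "ConfluenceIcon"),
    "notion": SourceMetadataSketch("Notion", "Wiki", "NotionIcon"),
    "google_drive": SourceMetadataSketch("Google Drive", "Storage", "GoogleDriveIcon"),
    "slack": SourceMetadataSketch("Slack", "Messaging", "SlackIcon"),
}


def group_by_category(metadata: Dict[str, SourceMetadataSketch]) -> Dict[str, List[str]]:
    """Bucket source keys by category, ordered by display name inside each bucket."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for key, meta in sorted(metadata.items(), key=lambda kv: kv[1].display_name):
        groups[meta.category].append(key)
    return dict(groups)
```

An admin page can then render one section per category without hard-coding which connector belongs where.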

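Connector Configuration

Every configurable data source has an entry in connectorConfigs that defines the fields required to set up the connector. Configurations use a type-safe schema and are validated with Yup.

Configuration Schema

The ConnectionConfiguration interface defines how the admin form for each data source is generated.

Configuration object structure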

Figure 3: Onyx · Configuration Schema

Sources: web/src/lib/connectors/connectors.tsx:114-143, web/src/lib/connectors/connectors.tsx:17-112
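
A declarative configuration lets the same object drive both form rendering and validation. The real definitions are TypeScript validated with Yup. The Python sketch below is a stand-in for that pattern. FieldSketch, ConnectionConfigSketch and the values / advanced_values field names are assumptions, not the exact Onyx interface.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class FieldSketch:
    # Hypothetical descriptor for one input in the admin form
    name: str
    label: str
    kind: str = "text"  # e.g. "text", "checkbox", "list"
    optional: bool = False
    default: Any = None


@dataclass
class ConnectionConfigSketch:
    description: str
    values: List[FieldSketch] = field(default_factory=list)
    advanced_values: List[FieldSketch] = field(default_factory=list)

    def validate(self, form: Dict[str, Any]) -> Dict[str, str]:
        """Return {field_name: error} for required fields left empty (a stand-in for a Yup schema)."""
        errors: Dict[str, str] = {}
        for f in self.values + self.advanced_values:
            value = form.get(f.name, f.default)
            # False and 0 count as real answers; only None, "" and [] are "missing"
            if not f.optional and value in (None, "", []):
                errors[f.name] = f"{f.label} is required"
        return errors
```

Adding a field to the configuration object automatically adds it to both the rendered form and the required-field check.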

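Implementation Details: Google Drive

The Google Drive connector supports recursive folder traversal, permission sync, and many file types.

Data Flow: File Retrieval

The connector uses crawl_folders_for_files (backend/onyx/connectors/google_drive/file_retrieval.py:36) to traverse the folder hierarchy. Depending on the credential type, it then uses get_all_files_for_oauth (backend/onyx/connectors/google_drive/file_retrieval.py:38) or get_all_files_in_my_drive_and_shared (backend/onyx/connectors/google_drive/file_retrieval.py:40-41).

Google Drive traversal logic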

Figure 4: Onyx · Data Flow: File Retrieval

Sources: backend/onyx/connectors/google_drive/connector.py:36-49, backend/onyx/connectors/google_drive/doc_conversion.py:33-34, backend/onyx/connectors/google_drive/file_retrieval.py:106-128
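
Folder traversal can be pictured as a depth-first walk that yields files and descends into subfolders. The sketch below is an iterative depth-first walk over an in-memory tree, not the Onyx implementation. The children_of dict stands in for a Drive "list children" call, and crawl_folders_sketch is a hypothetical name. The folder MIME type is the standard Google Drive value. The visited set is an added safeguard for folders reachable by more than one path.

```python
from typing import Dict, Iterator, List, Set

FOLDER_MIME = "application/vnd.google-apps.folder"


def crawl_folders_sketch(children_of: Dict[str, List[dict]], root_id: str) -> Iterator[dict]:
    """Yield every non-folder item under root_id, depth-first.

    children_of maps a folder id to its direct children (a stand-in for an API call).
    """
    visited: Set[str] = set()
    stack = [root_id]
    while stack:
        folder_id = stack.pop()
        if folder_id in visited:  # skip folders already crawled via another path
            continue
        visited.add(folder_id)
        for item in children_of.get(folder_id, []):
            if item["mimeType"] == FOLDER_MIME:
                stack.append(item["id"])  # descend later
            else:
                yield item  # a file: hand it to conversion
```

Using an explicit stack instead of recursion avoids Python's recursion limit on very deep folder trees.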

Document Conversion

Files are converted into Onyx Document objects by convert_drive_item_to_document (backend/onyx/connectors/google_drive/doc_conversion.py:33). For Google Docs, the connector extracts sections with get_document_sections (backend/onyx/connectors/google_drive/doc_conversion.py:27). For binary files (PDF, DOCX, PPTX), it downloads the content with MediaIoBaseDownload (backend/onyx/connectors/google_drive/doc_conversion.py:10) and processes it with local extractors such as read_pdf_file (backend/onyx/connectors/google_drive/doc_conversion.py:43).
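
The branching described above boils down to a dispatch on MIME type. In the sketch below, conversion_path and the extractor labels are hypothetical. Falling back to "skip" for other types is an assumption made for illustration only. The MIME strings are the standard Google Docs and Office Open XML values.

```python
GOOGLE_DOC_MIME = "application/vnd.google-apps.document"

# Binary formats named in the text; the values are hypothetical extractor labels.
BINARY_EXTRACTORS = {
    "application/pdf": "pdf",
    "application/vnd.openxmlformats-officedocument.wordprocessingml.document": "docx",
    "application/vnd.openxmlformats-officedocument.presentationml.presentation": "pptx",
}


def conversion_path(mime_type: str) -> str:
    """Return the (hypothetical) conversion route for a Drive item."""
    if mime_type == GOOGLE_DOC_MIME:
        return "sections"  # native Google Doc: split into sections
    extractor = BINARY_EXTRACTORS.get(mime_type)
    if extractor is not None:
        return f"download+{extractor}"  # fetch bytes, then run a local extractor
    return "skip"  # assumption: other types are not converted in this sketch
```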

Implementation Details: Confluence

The Confluence connector supports both Cloud and Server/Data Center deployments. It uses OnyxConfluence (backend/onyx/connectors/confluence/onyx_confluence.py:110), a wrapper around the atlassian-python-api library.

Checkpointing

ConfluenceConnector implements CheckpointedConnector (backend/onyx/connectors/confluence/connector.py:121). It stores next_page_url in ConfluenceCheckpoint (backend/onyx/connectors/confluence/connector.py:108-109), so an interrupted indexing run can resume where it left off.
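
The checkpoint only needs enough state to issue the next request. The sketch below models that idea with a single resume URL. CheckpointSketch, index_one_page and the fetch callable are hypothetical, and the real ConfluenceCheckpoint may carry more fields.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

# (items on this page, URL of the next page or None when exhausted)
Page = Tuple[List[str], Optional[str]]


@dataclass(frozen=True)
class CheckpointSketch:
    # Hypothetical stand-in for ConfluenceCheckpoint: only the resume point
    next_page_url: Optional[str] = None
    has_more: bool = True


def index_one_page(
    fetch: Callable[[str], Page], checkpoint: CheckpointSketch, start_url: str
) -> Tuple[List[str], CheckpointSketch]:
    """Fetch exactly one page and return its items plus the checkpoint to persist."""
    url = checkpoint.next_page_url or start_url  # resume if we have a saved URL
    items, next_url = fetch(url)
    return items, CheckpointSketch(next_page_url=next_url, has_more=next_url is not None)
```

Because each call persists a fresh checkpoint, a crash between pages costs at most one page of rework.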

CQL Filtering

The connector builds complex CQL (Confluence Query Language) strings to filter by space, page ID, or label (backend/onyx/connectors/confluence/connector.py:170-181).

Sources: backend/onyx/connectors/confluence/connector.py:120-154, backend/onyx/connectors/confluence/onyx_confluence.py:110-157

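Implementation Details: SharePoint

The SharePoint connector uses the Microsoft Graph API and the office365-rest-python-client library. It supports document retrieval for indexing as well as permission sync.

Authentication

SharePoint supports several authentication methods, including client secrets and certificate-based authentication. The load_credentials method (backend/onyx/connectors/sharepoint/connector.py:221-255) initializes msal.ConfidentialClientApplication (backend/onyx/connectors/sharepoint/connector.py:21) and GraphClient (backend/onyx/connectors/sharepoint/connector.py:26).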
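
The two auth modes differ only in what is passed as client_credential to msal.ConfidentialClientApplication. That argument is either a secret string or a dict with private_key and thumbprint. The helper below is hypothetical and only assembles those arguments. It does not reproduce Onyx's load_credentials.

```python
from typing import Any, Dict, Optional

GRAPH_SCOPES = ["https://graph.microsoft.com/.default"]


def build_msal_kwargs(
    client_id: str,
    tenant_id: str,
    client_secret: Optional[str] = None,
    private_key_pem: Optional[str] = None,
    thumbprint: Optional[str] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: kwargs for msal.ConfidentialClientApplication.

    Exactly one mode is allowed: a client secret, or a certificate
    (PEM private key plus its thumbprint).
    """
    has_secret = client_secret is not None
    has_cert = private_key_pem is not None and thumbprint is not None
    if has_secret == has_cert:
        raise ValueError("provide either client_secret or private_key_pem + thumbprint")
    credential: Any = (
        client_secret if has_secret else {"private_key": private_key_pem, "thumbprint": thumbprint}
    )
    return {
        "client_id": client_id,
        "authority": f"https://login.microsoftonline.com/{tenant_id}",
        "client_credential": credential,
    }
```

With msal installed, the result would be used as msal.ConfidentialClientApplication(**kwargs).acquire_token_for_client(scopes=GRAPH_SCOPES).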

Document Processing

The connector walks SharePoint sites and drives, building items with DriveItemData.from_graph_json (backend/onyx/connectors/sharepoint/connector.py:174). It processes file content with extract_text_and_images (backend/onyx/connectors/sharepoint/connector.py:76) and maps permissions with get_sharepoint_external_access (backend/onyx/connectors/sharepoint/connector.py:74).

Sources: backend/onyx/connectors/sharepoint/connector.py:21-34, backend/onyx/connectors/sharepoint/connector.py:155-172, backend/onyx/connectors/sharepoint/connector.py:221-255

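Implementation Details: Slack

The Slack connector indexes messages and threads in public and private channels.

Message Retrieval

The connector talks to the Slack API through OnyxSlackWebClient (backend/onyx/connectors/slack/connector.py:63). The get_channel_messages function (backend/onyx/connectors/slack/connector.py:146) makes paginated calls to conversations_history, while get_thread (backend/onyx/connectors/slack/connector.py:176) retrieves the replies to a specific message.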
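
Slack's conversations.history is cursor-paginated. Each response carries response_metadata.next_cursor, and that value is empty on the last page. The generator below is a minimal sketch of that loop. The history argument is any callable with the keyword signature of slack_sdk's WebClient.conversations_history. iter_channel_messages and the page size of 200 are illustrative choices, not Onyx's code.

```python
from typing import Any, Callable, Dict, Iterator, Optional


def iter_channel_messages(
    history: Callable[..., Dict[str, Any]],
    channel_id: str,
    oldest: Optional[str] = None,
) -> Iterator[Dict[str, Any]]:
    """Yield messages page by page, following Slack's next_cursor."""
    cursor: Optional[str] = None
    while True:
        kwargs: Dict[str, Any] = {"channel": channel_id, "limit": 200}
        if oldest:
            kwargs["oldest"] = oldest  # only messages after this timestamp
        if cursor:
            kwargs["cursor"] = cursor
        resp = history(**kwargs)
        yield from resp.get("messages", [])
        # Slack signals the last page with an empty next_cursor
        cursor = (resp.get("response_metadata") or {}).get("next_cursor") or None
        if not cursor:
            break
```

With a real client, pass client.conversations_history as history. Slack's response object supports .get().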

Permission Sync

Slack permissions are synced through get_channel_access (backend/onyx/connectors/slack/connector.py:58), which maps Slack user IDs to external access records.

Sources: backend/onyx/connectors/slack/connector.py:146-174, backend/onyx/connectors/slack/connector.py:176-184
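
The mapping step turns channel member IDs into identities Onyx can match against its own users. In the sketch below, ExternalAccessSketch and channel_access_sketch are hypothetical names. The user-ID-to-email table is passed in rather than fetched. Dropping IDs without a known email, such as bots, is an assumption made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Iterable


@dataclass(frozen=True)
class ExternalAccessSketch:
    # Hypothetical mirror of an external-access record
    external_user_emails: FrozenSet[str]
    is_public: bool


def channel_access_sketch(
    member_ids: Iterable[str],
    email_by_user_id: Dict[str, str],
    is_public: bool = False,
) -> ExternalAccessSketch:
    """Translate Slack member IDs into emails; IDs with no known email are dropped."""
    emails = frozenset(
        email_by_user_id[uid] for uid in member_ids if uid in email_by_user_id
    )
    return ExternalAccessSketch(external_user_emails=emails, is_public=is_public)
```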

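Implementation Details: Salesforce

The Salesforce connector exports data into a local SQLite database and processes it there. It supports both full and incremental syncs.

Sync Strategy

The connector uses OnyxSalesforce (backend/onyx/connectors/salesforce/onyx_salesforce.py:30) for API interaction. During the initial sync, it bulk-exports object types to CSV with fetch_all_csvs_in_parallel (backend/onyx/connectors/salesforce/connector.py:31) and loads them into OnyxSalesforceSQLite (backend/onyx/connectors/salesforce/connector.py:32).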
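
The CSV-to-SQLite step can be pictured with the standard library alone. load_csv_into_sqlite is a hypothetical helper, not OnyxSalesforceSQLite. It creates a table from the CSV header and bulk-inserts the rows. Storing every column as TEXT is a simplification.

```python
import csv
import io
import sqlite3


def load_csv_into_sqlite(conn: sqlite3.Connection, table: str, csv_text: str) -> int:
    """Create `table` from the CSV header and bulk-insert its rows; return the row count."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    # Names are interpolated into SQL, so allow only plain identifiers
    if not table.isidentifier() or not all(col.isidentifier() for col in header):
        raise ValueError("table and column names must be plain identifiers")
    cols = ", ".join(f'"{c}" TEXT' for c in header)
    conn.execute(f'CREATE TABLE IF NOT EXISTS "{table}" ({cols})')
    placeholders = ", ".join("?" for _ in header)
    rows = list(reader)
    conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', rows)
    conn.commit()
    return len(rows)
```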

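Document Generation

Documents are built by joining each parent object with its child objects in the local database, for example an Account with its Opportunities (backend/onyx/connectors/salesforce/connector.py:172-180).

Sources: backend/onyx/connectors/salesforce/connector.py:163-182, backend/onyx/connectors/salesforce/doc_conversion.py:27-28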
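
Once the exports sit in SQLite, building one document per parent is a join plus a grouping pass. The sketch below uses the standard Salesforce field names Id, Name and AccountId. make_demo_db and account_documents are hypothetical, and the returned dict shape is illustrative only.

```python
import sqlite3
from typing import Dict


def make_demo_db() -> sqlite3.Connection:
    """Tiny in-memory stand-in for the exported Salesforce tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE Account (Id TEXT PRIMARY KEY, Name TEXT);
        CREATE TABLE Opportunity (Id TEXT PRIMARY KEY, AccountId TEXT, Name TEXT);
        INSERT INTO Account VALUES ('001', 'Acme'), ('002', 'Globex');
        INSERT INTO Opportunity VALUES ('006a', '001', 'Renewal'), ('006b', '001', 'Expansion');
        """
    )
    return conn


def account_documents(conn: sqlite3.Connection) -> Dict[str, dict]:
    """Join each Account with its Opportunities; one dict per parent object."""
    rows = conn.execute(
        """
        SELECT a.Id, a.Name, o.Name
        FROM Account AS a
        LEFT JOIN Opportunity AS o ON o.AccountId = a.Id
        ORDER BY a.Id, o.Name
        """
    ).fetchall()
    docs: Dict[str, dict] = {}
    for acc_id, acc_name, opp_name in rows:
        doc = docs.setdefault(acc_id, {"title": acc_name, "children": []})
        if opp_name is not None:  # LEFT JOIN yields NULL for parents without children
            doc["children"].append(opp_name)
    return docs
```

The LEFT JOIN keeps parents that have no children, so every Account still yields a document.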

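Backend Connector Registry

The backend uses a factory pattern to instantiate the correct connector class for a given DocumentSource.

Source: backend/onyx/connectors/factory.py:1-185

Connector Class Loading

The identify_connector_class function (backend/onyx/connectors/factory.py:91-101) looks up the class in CONNECTOR_CLASS_MAP, which is defined in registry.py. It uses _load_connector_class (backend/onyx/connectors/factory.py:36-54) to import the module dynamically and cache the resulting class object.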
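
Lazy loading keeps heavy connector dependencies out of process start-up. The registry stores only "module path + class name", and importlib resolves the class on first use. CLASS_MAP_SKETCH, load_class and identify_class_sketch are hypothetical. Stdlib classes stand in for connectors so the sketch runs anywhere. functools.lru_cache provides the caching.

```python
import importlib
from functools import lru_cache
from typing import Dict, Tuple

# Hypothetical registry: source key -> (module path, class name).
# Stdlib classes stand in for connector classes so the sketch runs anywhere.
CLASS_MAP_SKETCH: Dict[str, Tuple[str, str]] = {
    "ordered": ("collections", "OrderedDict"),
    "fraction": ("fractions", "Fraction"),
}


@lru_cache(maxsize=None)
def load_class(module_path: str, class_name: str) -> type:
    """Import the module on first use and cache the resolved class."""
    module = importlib.import_module(module_path)
    cls = getattr(module, class_name)
    if not isinstance(cls, type):
        raise TypeError(f"{module_path}.{class_name} is not a class")
    return cls


def identify_class_sketch(source: str) -> type:
    try:
        module_path, class_name = CLASS_MAP_SKETCH[source]
    except KeyError:
        raise ValueError(f"no connector registered for {source!r}") from None
    return load_class(module_path, class_name)
```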

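Input Type Validation

Before instantiating a connector, the factory checks that the connector class implements the interface required by its InputType (for example, LoadConnector for LOAD_STATE) (backend/onyx/connectors/factory.py:57-88).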
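
The check amounts to "input type → required base class → issubclass". The enum members and method names below mirror the ones named in this section. All names ending in Sketch, plus REQUIRED_SKETCH and validate_input_type, are hypothetical stand-ins, not the real interfaces.

```python
from abc import ABC, abstractmethod
from enum import Enum


class InputTypeSketch(str, Enum):
    LOAD_STATE = "load_state"
    POLL = "poll"


class LoadConnectorSketch(ABC):
    @abstractmethod
    def load_from_state(self): ...


class PollConnectorSketch(ABC):
    @abstractmethod
    def poll_source(self, start: float, end: float): ...


# Which interface each input type requires
REQUIRED_SKETCH = {
    InputTypeSketch.LOAD_STATE: LoadConnectorSketch,
    InputTypeSketch.POLL: PollConnectorSketch,
}


def validate_input_type(cls: type, input_type: InputTypeSketch) -> None:
    """Raise before instantiation if cls lacks the interface input_type needs."""
    required = REQUIRED_SKETCH[input_type]
    if not (isinstance(cls, type) and issubclass(cls, required)):
        raise TypeError(f"{getattr(cls, '__name__', cls)} does not support {input_type.value}")
```

Failing here, before the connector is constructed, produces a configuration error instead of a mid-run AttributeError.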