12.2 · LiteLLM Integration and Monkey Patches

Relevant source files

.github/workflows/nightly-external-dependency-unit-tests.yml
.pre-commit-config.yaml
backend/onyx/llm/litellm_singleton/monkey_patches.py
backend/pytest.ini
backend/requirements/README.md
backend/requirements/default.txt
backend/requirements/dev.txt
backend/requirements/ee.txt
backend/requirements/model_server.txt
backend/tests/external_dependency_unit/conftest.py

Original DeepWiki page: https://deepwiki.com/onyx-dot-app/onyx/12.2-litellm-integration-and-monkey-patches

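Purpose and Scope

This document describes how Onyx integrates with LiteLLM, a library that provides a unified interface to multiple large language model (LLM) providers. It also covers the monkey patches Onyx applies to work around provider-specific bugs in how LiteLLM handles streaming responses, reasoning content, and usage metrics.

For general LLM provider configuration and model selection, see Provider Configuration and Model Selection Hierarchy.

Sources: backend/onyx/llm/litellm_singleton/monkey_patches.py:1-58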

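LiteLLM's Role in Onyx

LiteLLM is Onyx's abstraction layer for talking to multiple LLM providers (OpenAI, Anthropic, Ollama, Azure, Google, and others) through a single interface. Rather than implementing a provider-specific client for each LLM service, Onyx relies on LiteLLM to translate requests into each provider's format and to normalize responses into a common schema.

Integration Architecture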

Figure 1. LiteLLM integration within the Onyx architecture.

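The integration uses LiteLLM v1.83.14 to manage the various providers. Onyx applies targeted monkey patches so that specific behaviors, such as reasoning streaming and native Azure streaming, work correctly.

Sources: backend/onyx/llm/litellm_singleton/monkey_patches.py:6-58, pyproject.toml:15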

Why Monkey Patches Are Needed

As of LiteLLM v1.83.14 (May 2026), the library has several behaviors that Onyx overrides to meet its functional and readability requirements:

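Ollama streaming reasoning (backend/onyx/llm/litellm_singleton/monkey_patches.py:8-16): LiteLLM's chunk parser uses an is not None check to detect the thinking field. Ollama, however, sends an empty string ("") to signal the transition from reasoning to content. The patch uses a truthiness check so that this transition is detected correctly.
Reasoning summary newlines (backend/onyx/llm/litellm_singleton/monkey_patches.py:18-22): LiteLLM passes reasoning summary text through without adding separators. Onyx inserts \n\n whenever summary_index changes, which improves readability in the UI.
OpenAI non-streaming joiner (backend/onyx/llm/litellm_singleton/monkey_patches.py:24-28): LiteLLM joins reasoning summary parts with " ".join(). Onyx overrides this to use "\n\n".join() so that section breaks are preserved.
Azure fake streaming (backend/onyx/llm/litellm_singleton/monkey_patches.py:30-38): For Azure models that are not in its internal database, LiteLLM defaults to "fake streaming" (buffering the entire response). The patch forces native streaming for all Azure Responses API calls to improve time to first token.
Usage format validation (backend/onyx/llm/litellm_singleton/monkey_patches.py:41-48): LiteLLM makes frequent use of model_construct(), which bypasses Pydantic validation. This allows mixed usage formats (input/output tokens alongside prompt/completion tokens), which leads to serialization warnings downstream.
Logging transformation warnings (backend/onyx/llm/litellm_singleton/monkey_patches.py:50-57): Upstream code mutates the usage object into a dict in place, which causes Pydantic type errors. The patch uses a deep copy to keep the original object intact.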
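
The last item above contrasts in-place mutation with a deep copy. The following is a minimal sketch of that difference only. It does not reproduce the Onyx or LiteLLM code: the Usage dataclass, the response dict, and both helper functions are hypothetical stand-ins.

```python
import copy
from dataclasses import asdict, dataclass


@dataclass
class Usage:
    """Hypothetical stand-in for a typed usage object."""
    input_tokens: int
    output_tokens: int


def to_log_payload_in_place(response: dict) -> dict:
    # Problematic pattern: the caller's response is mutated, so its
    # "usage" field silently changes from a Usage object to a dict.
    response["usage"] = asdict(response["usage"])
    return response


def to_log_payload_copied(response: dict) -> dict:
    # Safer pattern: transform a deep copy and leave the original intact.
    payload = copy.deepcopy(response)
    payload["usage"] = asdict(payload["usage"])
    return payload


response = {"id": "resp_1", "usage": Usage(input_tokens=12, output_tokens=30)}
payload = to_log_payload_copied(response)
```

After the copied variant runs, response["usage"] is still a Usage instance, while payload["usage"] is a plain dict that is safe to log.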

Sources: backend/onyx/llm/litellm_singleton/monkey_patches.py:1-58
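
The two reasoning summary items in the list above come down to where a paragraph break is placed. The sketch below illustrates both cases under simplifying assumptions: the function names are hypothetical, and streamed deltas are modeled as (summary_index, text) tuples rather than LiteLLM's actual event objects.

```python
from typing import Iterable, Iterator, List, Optional, Tuple


def join_summary_parts(parts: List[str]) -> str:
    # Non-streaming case: separate summary parts with a blank line
    # instead of a single space.
    return "\n\n".join(parts)


def stream_with_breaks(deltas: Iterable[Tuple[int, str]]) -> Iterator[str]:
    # Streaming case: emit "\n\n" whenever the summary index changes.
    last_index: Optional[int] = None
    for summary_index, text in deltas:
        if last_index is not None and summary_index != last_index:
            yield "\n\n"
        last_index = summary_index
        yield text


streamed = "".join(
    stream_with_breaks([(0, "Plan "), (0, "the query."), (1, "Check sources.")])
)
joined = join_summary_parts(["Plan the query.", "Check sources."])
```

Both paths produce the same text here, which is the intent: streaming and non-streaming responses should render identically in the UI.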

Monkey Patch System

Patch Application Flow

Figure 2. Monkey patch application flow.

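Before applying its modification, each patch function checks the __name__ attribute of the target function, which makes the patch idempotent.

Sources: backend/onyx/llm/litellm_singleton/monkey_patches.py:84-93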
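
The idempotency check described above can be sketched as follows. This is an illustration of the general pattern, not the Onyx source: VendorParser, apply_parse_patch, and _patched_parse are hypothetical names.

```python
class VendorParser:
    """Hypothetical stand-in for a third-party class being patched."""

    def parse(self, chunk: dict) -> str:
        return chunk["text"]


def apply_parse_patch() -> bool:
    """Patch VendorParser.parse once; return True if the patch was applied."""
    # Idempotency guard: if the current method already carries the
    # replacement's name, the patch has been applied before.
    if VendorParser.parse.__name__ == "_patched_parse":
        return False

    original = VendorParser.parse

    def _patched_parse(self, chunk: dict) -> str:
        # Delegate to the original, then adjust the result.
        return original(self, chunk).strip()

    VendorParser.parse = _patched_parse
    return True


first = apply_parse_patch()   # applies the patch
second = apply_parse_patch()  # detects the earlier patch and does nothing
```

Without the guard, a second call would wrap the already patched method again, so any side effects of the wrapper would run twice per call.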

Key Patch Implementations

Ollama Chunk Parser (`_patch_ollama_chunk_parser`)

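This patch fixes the detection of reasoning content in streamed chunks. Specifically, it tracks whether the stream is inside a <think> block or is receiving the native thinking field.

Logic: It converts the message object in each chunk into a delta object (backend/onyx/llm/litellm_singleton/monkey_patches.py:118-122).
Reasoning state: It uses self.started_reasoning_content and self.finished_reasoning_content to distinguish the thinking phase from the final output (backend/onyx/llm/litellm_singleton/monkey_patches.py:144-145).
Transition signal: It treats an empty thinking string as the end-of-reasoning signal (backend/onyx/llm/litellm_singleton/monkey_patches.py:142).

Sources: backend/onyx/llm/litellm_singleton/monkey_patches.py:84-155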
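
To make the transition signal concrete, here is a simplified sketch of the truthiness check. Only the two flag names are taken from the description above; the ReasoningTracker class, its classify method, and the (thinking, content) chunk shape are assumptions made for illustration, and <think> tag handling is left out.

```python
from typing import Optional, Tuple


class ReasoningTracker:
    """Hypothetical tracker; only the two flag names come from the text above."""

    def __init__(self) -> None:
        self.started_reasoning_content = False
        self.finished_reasoning_content = False

    def classify(self, thinking: Optional[str], content: str) -> Tuple[str, str]:
        # Truthiness check: None and "" both mean "no reasoning text here".
        # An `is not None` check would misread "" as more reasoning.
        if thinking:
            self.started_reasoning_content = True
            return ("reasoning", thinking)
        if self.started_reasoning_content and not self.finished_reasoning_content:
            # First chunk without reasoning text after reasoning began.
            self.finished_reasoning_content = True
        return ("content", content)


tracker = ReasoningTracker()
chunks = [("Let me think.", ""), ("", "Hello"), (None, " world")]
labels = [tracker.classify(thinking, content) for thinking, content in chunks]
```

The second chunk carries an empty thinking string, so it is classified as content and marks the end of the reasoning phase.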

Azure Native Streaming (`_patch_azure_responses_should_fake_stream`)

This patch is critical for Azure performance. LiteLLM's OpenAIResponsesAPIConfig, which the Azure configuration inherits from, defaults to fake streaming for custom model names (backend/onyx/llm/litellm_singleton/monkey_patches.py:35-38). The patch forces should_fake_stream to return False, so users see tokens as they are generated.

Sources: backend/onyx/llm/litellm_singleton/monkey_patches.py:30-38
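
The override can be pictured with the sketch below. The two config classes are hypothetical stand-ins, and the should_fake_stream signature shown is an assumption; LiteLLM's real method may take different parameters.

```python
class BaseResponsesConfig:
    """Hypothetical stand-in for the parent provider config."""

    KNOWN_STREAMING_MODELS = {"gpt-4o"}

    def should_fake_stream(self, model: str, stream: bool) -> bool:
        # Unknown model names fall back to buffering the whole response.
        return bool(stream) and model not in self.KNOWN_STREAMING_MODELS


class AzureResponsesConfig(BaseResponsesConfig):
    """Hypothetical stand-in for the Azure config that inherits the default."""


def patch_should_fake_stream() -> None:
    if AzureResponsesConfig.should_fake_stream.__name__ == "_never_fake_stream":
        return  # already patched

    def _never_fake_stream(self, model: str, stream: bool, **kwargs) -> bool:
        # Always request native streaming for Azure.
        return False

    # Assign on the subclass only, so the parent keeps its default behavior.
    AzureResponsesConfig.should_fake_stream = _never_fake_stream


before = AzureResponsesConfig().should_fake_stream("my-deployment", True)
patch_should_fake_stream()
after = AzureResponsesConfig().should_fake_stream("my-deployment", True)
```

Patching the subclass attribute rather than the parent keeps the change scoped to Azure, which matches the stated goal of affecting only Azure Responses API calls.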

Usage Format Validation (`_patch_responses_api_usage_format`)

To avoid Pydantic serialization warnings when model_dump() is called, this patch ensures that ResponsesAPIResponse objects carry usage data in the correct format (input_tokens/output_tokens) rather than the chat completions format (prompt_tokens/completion_tokens) (backend/onyx/llm/litellm_singleton/monkey_patches.py:41-47).

Sources: backend/onyx/llm/litellm_singleton/monkey_patches.py:41-48
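
The key mapping involved can be sketched with plain dictionaries. This is an assumption-laden illustration rather than the patch itself: normalize_usage is a hypothetical helper, and the real code operates on Pydantic models rather than dicts.

```python
from typing import Any, Dict


def normalize_usage(usage: Dict[str, Any]) -> Dict[str, Any]:
    """Map chat-completions style keys onto Responses API style keys."""
    if "input_tokens" in usage and "output_tokens" in usage:
        # Already in the expected format; return a copy unchanged.
        return dict(usage)
    input_tokens = usage.get("prompt_tokens", 0)
    output_tokens = usage.get("completion_tokens", 0)
    return {
        "input_tokens": input_tokens,
        "output_tokens": output_tokens,
        "total_tokens": usage.get("total_tokens", input_tokens + output_tokens),
    }


chat_style = {"prompt_tokens": 5, "completion_tokens": 7}
normalized = normalize_usage(chat_style)
```

Normalizing once at the boundary means every later serialization step sees a single, consistent set of field names.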

Dependency Management

The integration depends on specific versions of LiteLLM and its dependencies, which are managed with uv.

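| Package | Version | Role |
| --- | --- | --- |
| `litellm[google]` | `1.83.14` | Core LLM abstraction library (`pyproject.toml:15`) |
| `openai` | `2.14.0` | Used for OpenAI/Azure-compatible APIs (`pyproject.toml:16`) |
| `pydantic` | `2.11.7` | Data validation and settings management (`pyproject.toml:17`) |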

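Sources: pyproject.toml:15-17, backend/requirements/default.txt:3

Dependency Resolution

Because LiteLLM pins exact versions of its own dependencies, Onyx uses override-dependencies in pyproject.toml so that the uv resolver can choose compatible versions that meet Onyx's security and performance requirements, for example newer versions of aiohttp or httpx (pyproject.toml:196-216).

Sources: pyproject.toml:196-216, backend/requirements/README.md:7-16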