5.3 · Knowledge Graph Visualization

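Lightweight graph-enhanced retrieval · This chapter focuses on module relationships, source-code evidence, and implementation details.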

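Project: LightRAG · Chapter: 5.3 · Status: full translation · Modules: Graph & Relations, UI & Interaction, System Architecture, Testing, Release & Operations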


Source Code References

lightrag_webui/src/components/graph/EditablePropertyRow.tsx
lightrag_webui/src/components/graph/GraphControl.tsx
lightrag_webui/src/components/graph/GraphLabels.tsx
lightrag_webui/src/components/graph/GraphSearch.tsx
lightrag_webui/src/components/graph/PropertiesView.tsx
lightrag_webui/src/components/graph/PropertyRowComponents.tsx
lightrag_webui/src/components/graph/ZoomControl.tsx
lightrag_webui/src/components/ui/AsyncSearch.tsx
lightrag_webui/src/components/ui/AsyncSelect.tsx
lightrag_webui/src/features/GraphViewer.tsx

Module Tags

Graph & Relations
UI & Interaction
System Architecture
Testing, Release & Operations
Retrieval, Recall & Indexing

Chapter Content

Sidecar Format and Multimodal Processing

Original DeepWiki page: https://deepwiki.com/HKUDS/LightRAG/5.3-sidecar-format-and-multimodal-processing

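Purpose and Scope

The Sidecar format serves as LightRAG's common internal representation (IR) for multimodal documents docs/LightRAGSidecarFormat-zh.md:3-5. Its main goals are:

Object separation: stores "body text + multimodal objects + metadata" in a structured form docs/LightRAGSidecarFormat-zh.md:3-3.
VLM integration: gives vision-language models (VLMs) a clean interface for analyzing images together with their surrounding text context lightrag/sidecar/writer.py:11-12.
Traceability: maintains stable IDs (blockid) that link vector chunks back to specific paragraphs or objects in the original document docs/LightRAGSidecarFormat-zh.md:125-125.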

`__parsed__` Directory Layout

When a document is processed, a directory named <filename>.parsed/ is created inside the workspace's __parsed__ folder docs/LightRAGSidecarFormat-zh.md:23-23.

Directory Structure

__parsed__/<filename>.parsed/
├── <filename>.blocks.jsonl      # Text blocks and document metadata
├── <filename>.drawings.json     # Image/figure sidecar (ID -> entry)
├── <filename>.tables.json       # Table sidecar (ID -> entry)
├── <filename>.equations.json    # Equation sidecar (ID -> entry)
└── <filename>.blocks.assets/    # Original image files (png, wmf, etc.)

Sources: docs/LightRAGSidecarFormat-zh.md:22-36
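As a quick illustration of the layout above, the sketch below derives every Sidecar path for one source file. `sidecar_paths` is a hypothetical helper, not a LightRAG function, and it assumes `<filename>` is used verbatim, extension included.

```python
from pathlib import Path


def sidecar_paths(workspace: str, filename: str) -> dict[str, Path]:
    """Return the expected Sidecar paths for one source document.

    Hypothetical helper: it mirrors the documented directory tree and
    assumes <filename> is used as-is (including its extension).
    """
    base = Path(workspace) / "__parsed__" / f"{filename}.parsed"
    return {
        "dir": base,
        "blocks": base / f"{filename}.blocks.jsonl",
        "drawings": base / f"{filename}.drawings.json",
        "tables": base / f"{filename}.tables.json",
        "equations": base / f"{filename}.equations.json",
        "assets": base / f"{filename}.blocks.assets",
    }


if __name__ == "__main__":
    for name, path in sidecar_paths("/data/ws", "report.pdf").items():
        print(f"{name:10} {path}")
```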

Sidecar File Details

File	Format	Contents
`blocks.jsonl`	JSONL	The first line is `type="meta"`; every following line is a `type="content"` block `docs/LightRAGSidecarFormat-zh.md:40-40`.
`drawings.json`	JSON	Dictionary of image objects, keyed by `im-<hash>-<seq>` `docs/LightRAGSidecarFormat-zh.md:150-152`.
`tables.json`	JSON	Dictionary of table objects, keyed by `tb-<hash>-<seq>` `docs/LightRAGSidecarFormat-zh.md:11-11`.
`equations.json`	JSON	Dictionary of complex LaTeX equations, keyed by `eq-<hash>-<seq>` `docs/LightRAGSidecarFormat-zh.md:12-12`.

Sources: docs/LightRAGSidecarFormat-zh.md:7-13, lightrag/sidecar/writer.py:9-12
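A minimal reader for two conventions in the table, the meta-first line order of blocks.jsonl and the ID prefixes, might look like the following. `read_blocks` and `sidecar_file_for` are hypothetical names, and any block field other than `type` (such as `text` in the usage lines) is an assumption made for illustration.

```python
import json

# Object ID prefix -> Sidecar file that stores its entry (from the table above).
SIDECAR_BY_PREFIX = {
    "im": "drawings.json",
    "tb": "tables.json",
    "eq": "equations.json",
}


def read_blocks(jsonl_text: str) -> tuple[dict, list[dict]]:
    """Split blocks.jsonl text into (meta, content_blocks).

    Enforces that the first non-empty line is type="meta" and that every
    following line is type="content".
    """
    lines = [json.loads(line) for line in jsonl_text.splitlines() if line.strip()]
    if not lines or lines[0].get("type") != "meta":
        raise ValueError("first line of blocks.jsonl must be type='meta'")
    content = lines[1:]
    for block in content:
        if block.get("type") != "content":
            raise ValueError(f"unexpected block type: {block.get('type')!r}")
    return lines[0], content


def sidecar_file_for(object_id: str) -> str:
    """Map an ID such as 'tb-<hash>-<seq>' to the Sidecar file holding it."""
    prefix = object_id.split("-", 1)[0]
    try:
        return SIDECAR_BY_PREFIX[prefix]
    except KeyError:
        raise ValueError(f"unknown object ID prefix: {object_id!r}") from None


if __name__ == "__main__":
    # "text" is an assumed field name used only for this example.
    sample = '{"type": "meta"}\n{"type": "content", "text": "Intro"}\n'
    meta, blocks = read_blocks(sample)
    print(meta, len(blocks), sidecar_file_for("eq-3f9a-1"))
```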

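Data Flow: From Parser to Sidecar

The conversion from raw files (PDF, DOCX) to the Sidecar format is managed by IRDoc (the intermediate representation) and the write_sidecar function.

Entity Relationships: IR to Sidecar

The following diagram illustrates how internal code entities map to physical Sidecar files.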

LightRAG · Figure 1: Code Entity to Sidecar Storage Mapping

Sources: lightrag/sidecar/ir.py:1-28, lightrag/sidecar/writer.py:59-101, lightrag/sidecar/writer.py:133-133
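To make the IR-to-Sidecar mapping concrete, here is a heavily simplified sketch. The `IRBlock` / `IRDoc` fields, the record fields (`doc`, `heading`, `text`), the `{"rows": ...}` payload shape, and `write_sidecar_sketch` are all assumptions and do not reproduce the real definitions in lightrag/sidecar/ir.py or writer.py. Only tables are handled, and the `h0` hash segment in the ID is a placeholder.

```python
import json
import tempfile
from dataclasses import dataclass, field
from pathlib import Path
from typing import Optional


@dataclass
class IRBlock:
    """Simplified stand-in for IRBlock; these fields are assumptions."""
    text: str
    heading: str = ""
    table: Optional[list] = None  # table rows, if this block is a table


@dataclass
class IRDoc:
    """Simplified stand-in for IRDoc: a document name plus ordered blocks."""
    name: str
    blocks: list = field(default_factory=list)


def write_sidecar_sketch(doc: IRDoc, parsed_root: Path) -> Path:
    """Write <name>.blocks.jsonl (meta line first) and <name>.tables.json.

    Each table block becomes a <table id=...> placeholder in the text stream,
    and its rows are stored in tables.json under the same ID. The "h0" hash
    segment is a placeholder; the real hashing scheme is not shown here.
    """
    out_dir = parsed_root / f"{doc.name}.parsed"
    out_dir.mkdir(parents=True, exist_ok=True)
    tables = {}
    records = [{"type": "meta", "doc": doc.name}]
    for block in doc.blocks:
        text = block.text
        if block.table is not None:
            table_id = f"tb-h0-{len(tables) + 1}"
            tables[table_id] = {"rows": block.table}
            text = f'<table id="{table_id}" format="json">{json.dumps(block.table)}</table>'
        records.append({"type": "content", "heading": block.heading, "text": text})
    (out_dir / f"{doc.name}.blocks.jsonl").write_text(
        "\n".join(json.dumps(r, ensure_ascii=False) for r in records) + "\n",
        encoding="utf-8",
    )
    (out_dir / f"{doc.name}.tables.json").write_text(
        json.dumps(tables, ensure_ascii=False, indent=2), encoding="utf-8"
    )
    return out_dir


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        doc = IRDoc("a.docx", [IRBlock("Intro", heading="1"), IRBlock("", table=[["x", 1]])])
        out = write_sidecar_sketch(doc, Path(tmp))
        print((out / "a.docx.blocks.jsonl").read_text(encoding="utf-8"))
```

Keeping the table payload out of the text stream, with only an ID-bearing placeholder left behind, is what lets the analysis stage later look up and enrich each object independently.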

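Multimodal Analysis Pipeline

After the Sidecar has been written, LightRAG runs the analyze_multimodal step. This stage generates descriptive summaries using a large language model (LLM) for tables and equations and a vision-language model (VLM) for images.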

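Surrounding Context Assembly

To improve analysis accuracy, LightRAG extracts "surrounding context" for each multimodal object lightrag/multimodal_context.py:5-11.

Leading context: the text before the object lightrag/multimodal_context.py:16-16.
Trailing context: the text after the object lightrag/multimodal_context.py:16-16.
Tag atomicity: during context extraction, other multimodal tags (such as <table>) are treated as atomic units, so they are never truncated lightrag/multimodal_context.py:27-28.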
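The atomicity rule can be sketched as follows: the text is split into plain-text runs and whole placeholder tags, then context is collected outward from the target object. Text may be cut, but each tag is kept or dropped as a unit. This is an illustration only. `surrounding_context` is a hypothetical name, and it budgets in characters, whereas build_surrounding works with token limits and paragraph boundaries.

```python
import re

# Multimodal placeholder tags; each match is treated as one atomic unit.
TAG_RE = re.compile(r"<(table|equation)\b[^>]*>.*?</\1>|<drawing\b[^>]*/>", re.S)


def split_segments(text: str) -> list[tuple[bool, str]]:
    """Split text into (is_tag, segment) pairs, keeping tags whole."""
    segments, pos = [], 0
    for m in TAG_RE.finditer(text):
        if m.start() > pos:
            segments.append((False, text[pos:m.start()]))
        segments.append((True, m.group(0)))
        pos = m.end()
    if pos < len(text):
        segments.append((False, text[pos:]))
    return segments


def _take(segments: list[tuple[bool, str]], budget: int, from_end: bool) -> str:
    """Collect up to `budget` chars; text can be cut, tags are all-or-nothing."""
    picked, used = [], 0
    for is_tag, seg in (reversed(segments) if from_end else segments):
        room = budget - used
        if room <= 0:
            break
        if is_tag:
            if len(seg) > room:
                break  # never truncate a tag: stop collecting instead
            part = seg
        else:
            part = seg[-room:] if from_end else seg[:room]
        picked.append(part)
        used += len(part)
    return "".join(reversed(picked) if from_end else picked)


def surrounding_context(text: str, object_id: str, budget: int = 200) -> tuple[str, str]:
    """Return (leading, trailing) context around the tag with id=object_id."""
    segments = split_segments(text)
    for i, (is_tag, seg) in enumerate(segments):
        if is_tag and f'id="{object_id}"' in seg:
            return (_take(segments[:i], budget, from_end=True),
                    _take(segments[i + 1:], budget, from_end=False))
    raise KeyError(object_id)
```

With a small budget, the leading context stops just before a neighbouring `<table>` rather than emitting half of it, which is the behaviour the atomicity rule describes.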

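Analysis Logic

VLM processing: images are sent as raw image bytes, together with their surrounding text, to the LLM assigned the VLM role tests/test_pipeline_analyze_multimodal.py:5-7.
Table/equation processing: these objects are sent as text (JSON/HTML for tables, LaTeX for equations) to the LLM assigned the EXTRACT role tests/test_pipeline_analyze_multimodal.py:5-7.
Result write-back: the LLM output is written back into the llm_analyze_result field of the Sidecar entry docs/LightRAGSidecarFormat-zh.md:172-179.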

LightRAG · Figure 2: Multimodal Analysis Flow

Sources: lightrag/multimodal_context.py:15-20, tests/test_pipeline_analyze_multimodal.py:80-104, lightrag/prompt_multimodal.py:21-25
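The routing described under Analysis Logic can be sketched as below. The role names `VLM` and `EXTRACT` and the `llm_analyze_result` field come from the description above. Everything else is an assumption made for the example: the `AnalysisRequest` type, the prompt layout, base64-encoding the image, and storing the result as a plain string.

```python
import base64
from dataclasses import dataclass
from typing import Optional


@dataclass
class AnalysisRequest:
    """Hypothetical request shape; only the role names come from the docs."""
    role: str                        # "VLM" for images, "EXTRACT" for tables/equations
    prompt: str                      # object text (if any) plus surrounding context
    image_b64: Optional[str] = None  # assumption: image bytes sent base64-encoded


def build_request(kind: str, payload, leading: str, trailing: str) -> AnalysisRequest:
    """Route one multimodal object to the appropriate LLM role."""
    context = f"Context before:\n{leading}\n\nContext after:\n{trailing}"
    if kind == "drawing":
        # Images: raw bytes plus the surrounding text go to the VLM role.
        return AnalysisRequest("VLM", context, base64.b64encode(payload).decode("ascii"))
    if kind in ("table", "equation"):
        # Tables (JSON/HTML) and equations (LaTeX) are sent as text to EXTRACT.
        return AnalysisRequest("EXTRACT", f"{payload}\n\n{context}")
    raise ValueError(f"unsupported object kind: {kind!r}")


def write_back(sidecar: dict, object_id: str, llm_output: str) -> None:
    """Store the model output on the Sidecar entry (plain string: an assumption)."""
    sidecar[object_id]["llm_analyze_result"] = llm_output
```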

Implementation Details

Placeholder Tags

Inside blocks.jsonl, multimodal objects are represented by XML-style placeholder tags docs/LightRAGSidecarFormat-zh.md:130-132.

Tables: <table id="tb-..." format="json">...</table> docs/LightRAGSidecarFormat-zh.md:134-134
Drawings: <drawing id="im-..." format="png" path="..." /> docs/LightRAGSidecarFormat-zh.md:135-135
Equations: <equation id="eq-..." format="latex">...</equation> docs/LightRAGSidecarFormat-zh.md:136-136
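A small parser for the three placeholder forms above could look like this. `find_placeholders` is a hypothetical helper. It assumes attributes are double-quoted and tags are not nested, and it adds an illustrative check that each ID prefix matches its tag kind.

```python
import re

# table/equation carry a body and a closing tag; drawing is self-closing.
TAG_RE = re.compile(r"<(table|equation)\b([^>]*)>(.*?)</\1>|<(drawing)\b([^>]*)/>", re.S)
ATTR_RE = re.compile(r'(\w+)="([^"]*)"')  # assumes double-quoted attributes
ID_PREFIX = {"table": "tb-", "drawing": "im-", "equation": "eq-"}


def find_placeholders(content: str) -> list[dict]:
    """Return kind, attributes, and inline body of every placeholder, in order."""
    found = []
    for m in TAG_RE.finditer(content):
        is_paired = m.group(1) is not None
        kind = m.group(1) if is_paired else m.group(4)
        attrs = dict(ATTR_RE.findall(m.group(2) if is_paired else m.group(5)))
        found.append({
            "kind": kind,
            "attrs": attrs,
            "body": m.group(3) if is_paired else None,
            # sanity check: the ID prefix should match the tag kind
            "id_ok": attrs.get("id", "").startswith(ID_PREFIX[kind]),
        })
    return found
```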

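Key Functions and Classes

write_sidecar: the entry point that generates the .parsed/ directory. It handles ID assignment (tb-, im-, eq-) and block ID computation lightrag/sidecar/writer.py:59-67.
IRDoc / IRBlock: dataclasses that parser adapters use to describe document structure before it is serialized to disk lightrag/sidecar/ir.py:32-176.
build_surrounding: the logic that extracts text around a tag while respecting token limits and paragraph boundaries lightrag/multimodal_context.py:104-112.
prompt_multimodal.py: contains the image_analysis, table_analysis, and equation_analysis system prompts lightrag/prompt_multimodal.py:52-136.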
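write_sidecar assigns IDs with the tb-, im-, and eq- prefixes in the `<prefix>-<hash>-<seq>` shape listed in the Sidecar file table. The allocator below sketches that shape only. The class is hypothetical, and two details are assumptions: that the hash is the first 8 hex characters of md5(doc_id), and that sequence numbers start at 1 and are counted separately per prefix.

```python
import hashlib
from collections import defaultdict


class ObjectIdAllocator:
    """Hands out IDs of the form <prefix>-<hash>-<seq> per object kind.

    Assumptions: <hash> is the first 8 hex chars of md5(doc_id) and <seq>
    counts from 1 separately for each prefix. Only the prefixes and the
    overall shape come from the Sidecar format description.
    """

    PREFIX = {"table": "tb", "drawing": "im", "equation": "eq"}

    def __init__(self, doc_id: str) -> None:
        self._hash = hashlib.md5(doc_id.encode("utf-8")).hexdigest()[:8]
        self._counters = defaultdict(int)

    def next_id(self, kind: str) -> str:
        prefix = self.PREFIX[kind]  # raises KeyError for unknown kinds
        self._counters[prefix] += 1
        return f"{prefix}-{self._hash}-{self._counters[prefix]}"
```

Deriving the hash from the document rather than from run-specific state means that re-running the allocator over the same document in the same order reproduces the same IDs.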

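Stability and Caching

LightRAG ensures that re-parsing the same document yields stable blockid values and text content, even if the parse time differs between runs. It achieves this by:

Stripping the meta line from blocks.jsonl before merging the text for chunking tests/test_parse_native_lightrag_e2e.py:8-12.
Computing blockid from content and position: md5(doc_id + ":" + block_index + ":" + heading + ":" + content) docs/LightRAGSidecarFormat-zh.md:125-125.
Storing the LLM analysis cache IDs in the Sidecar's llm_cache_list so they can be cleaned up when the document is deleted tests/test_pipeline_analyze_multimodal.py:14-15.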
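The first two mechanisms can be sketched directly. `compute_blockid` follows the md5 formula given above; the hex output and UTF-8 encoding of the key are assumptions. `merged_text` is a hypothetical helper, and its `text` field name and blank-line separator are also assumptions.

```python
import hashlib
import json


def compute_blockid(doc_id: str, block_index: int, heading: str, content: str) -> str:
    """md5(doc_id + ":" + block_index + ":" + heading + ":" + content) as hex.

    The formula is from the Sidecar docs; UTF-8 encoding and hex output are
    assumptions of this sketch.
    """
    key = f"{doc_id}:{block_index}:{heading}:{content}"
    return hashlib.md5(key.encode("utf-8")).hexdigest()


def merged_text(jsonl_text: str) -> str:
    """Join content blocks for chunking, skipping the meta line.

    The meta line can carry volatile values such as the parse time, so
    leaving it out keeps the merged text identical across re-parses.
    """
    parts = []
    for line in jsonl_text.splitlines():
        if not line.strip():
            continue
        block = json.loads(line)
        if block.get("type") == "meta":
            continue
        parts.append(block.get("text", ""))  # "text" field name is an assumption
    return "\n\n".join(parts)
```

Because neither function reads the parse time, two parses of the same document that differ only in their meta line produce the same merged text and the same blockid values.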

Sources: lightrag/sidecar/writer.py:1-20, lightrag/multimodal_context.py:1-62, lightrag/prompt_multimodal.py:1-29, tests/test_pipeline_analyze_multimodal.py:1-21
