Page: Argilla · 7 API Reference · DeepWiki (full-text translation)

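7 · API Reference

Human review and feedback data · This chapter focuses on module relationships, source references, and implementation notes.

Project: Argilla · Chapter: 7 · Status: full-text translation · Modules: interfaces and service contracts; UI and interaction; evaluation, feedback, and human review; document objects and metadata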
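Module tags
  • Interfaces and service contracts
  • UI and interaction
  • Evaluation, feedback, and human review
  • Document objects and metadata
  • Retrieval, recall, and indexing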

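Chapter body

API Reference

API reference documentation

Relevant source files

The following files were used as context when generating this wiki page: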

  • argilla-frontend/CHANGELOG.md
  • argilla-frontend/components/features/annotation/container/questions/form/span/EntityLabelSelection.component.vue
  • argilla-frontend/components/features/annotation/settings/Validation.vue
  • argilla-frontend/components/features/dataset-creation/configuration/DatasetConfigurationForm.vue
  • argilla-frontend/components/features/dataset-creation/configuration/questions/DatasetConfigurationFieldSelector.vue
  • argilla-frontend/components/features/dataset-creation/configuration/questions/DatasetConfigurationLabels.vue
  • argilla-frontend/components/features/dataset-creation/configuration/questions/DatasetConfigurationQuestion.vue
  • argilla-frontend/components/features/dataset-creation/configuration/questions/DatasetConfigurationRating.vue
  • argilla-frontend/components/features/dataset-creation/configuration/questions/DatasetConfigurationSpan.vue
  • argilla-frontend/package.json
  • argilla-frontend/translation/de.js
  • argilla-frontend/translation/en.js
  • argilla-frontend/translation/es.js
  • argilla-frontend/v1/domain/entities/hub/DatasetCreation.test.ts
  • argilla-frontend/v1/domain/entities/hub/QuestionCreation.ts
  • argilla-frontend/v1/domain/entities/hub/Subset.ts
  • argilla-server/CHANGELOG.md
  • argilla-server/src/argilla_server/_version.py
  • argilla-server/src/argilla_server/alembic/versions/580a6553186f_add_datasets_users_table.py
  • argilla-server/src/argilla_server/api/handlers/v1/datasets/datasets.py
  • argilla-server/src/argilla_server/api/schemas/v1/datasets.py
  • argilla-server/src/argilla_server/bulk/records_bulk.py
  • argilla-server/src/argilla_server/contexts/datasets.py
  • argilla-server/src/argilla_server/database.py
  • argilla-server/src/argilla_server/models/database.py
  • argilla-server/tests/factories.py
  • argilla-server/tests/unit/api/handlers/v1/datasets/records/records_bulk/test_create_dataset_records_bulk.py
  • argilla-server/tests/unit/api/handlers/v1/datasets/records/records_bulk/test_dataset_records_bulk_with_responses.py
  • argilla-server/tests/unit/api/handlers/v1/datasets/test_get_dataset_progress.py
  • argilla-server/tests/unit/api/handlers/v1/responses/test_create_current_user_responses_bulk.py
  • argilla-server/tests/unit/api/handlers/v1/test_datasets.py
  • argilla-server/tests/unit/api/handlers/v1/test_records.py
  • argilla-server/tests/unit/database/models/test_dataset_user_model.py
  • argilla-server/tests/unit/test_database.py
  • argilla-v1/src/argilla_v1/_version.py
  • argilla/CHANGELOG.md
  • argilla/src/argilla/__init__.py
  • argilla/src/argilla/_version.py
  • docs/_source/.readthedocs.yaml
  • docs/_source/_static/images/og-doc.png
  • docs/_source/_templates/page.html
  • docs/_source/conf.py
  • docs/_source/getting_started/quickstart.md
  • docs/_source/reference/python/python_client.rst
  • docs/_source/reference/python/python_training.rst
  • docs/_source/requirements.txt

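This reference documents Argilla's application programming interfaces (APIs), covering the Python SDK and the REST API endpoints. The Python SDK provides a high-level, Pythonic interface for interacting with Argilla, while the REST API offers more direct access for custom integrations.

For information on deploying Argilla, see Deployment and Configuration. For usage guidance, see the User Guide.

Python SDK API

The Python SDK is the recommended way to interact with Argilla programmatically. It wraps REST API calls in a convenient interface.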

Client initialization
import argilla as rg

# Argilla 2.x SDK: create a client object and pass it to the resources you build
client = rg.Argilla(api_url="http://localhost:6900", api_key="your-api-key")

# Legacy 1.x SDK (published as argilla_v1): init() sets a global client instead
import argilla_v1 as rg_v1
rg_v1.init(api_url="http://localhost:6900", api_key="your-api-key")

Sources: argilla/src/argilla/__init__.py, argilla-server/src/argilla_server/database.py

Dataset management
# Describe the dataset: the fields shown to annotators and the questions they answer
settings = rg.Settings(
    fields=[
        rg.TextField(name="text", title="Text content")
    ],
    questions=[
        rg.LabelQuestion(
            name="sentiment",
            title="Sentiment",
            labels=["positive", "negative", "neutral"]
        )
    ]
)
dataset = rg.Dataset(
    name="sentiment-analysis",
    workspace="default",
    settings=settings,
    client=client
)

# Create the dataset on the Argilla server
dataset.create()

# Load an existing dataset
dataset = client.datasets(name="sentiment-analysis", workspace="default")

# List all datasets
for existing in client.datasets:
    print(existing.name)

# Delete the dataset
dataset.delete()

Sources: argilla/src/argilla/__init__.py, argilla-server/src/argilla_server/contexts/datasets.py

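Field types

Argilla supports several field types for displaying different kinds of data:

# Text field
text_field = rg.TextField(name="text", title="Text field")

# Image field
image_field = rg.ImageField(name="image", title="Image field")

# Chat field (for conversational data)
chat_field = rg.ChatField(name="chat", title="Chat field")

# Custom field, rendered from a template (the template below is only an illustration)
custom_field = rg.CustomField(
    name="custom",
    title="Custom field",
    template="<div>{{record.fields.custom.key}}</div>"
)

Sources: argilla-frontend/translation/en.js:2-9, argilla-frontend/v1/domain/entities/hub/FieldCreation.ts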

Question types

Questions define what annotators are asked to provide:

# Label question (single choice)
label_question = rg.LabelQuestion(
    name="category",
    title="Category",
    labels=["news", "sports", "entertainment"]
)

# Multi-label question
multi_label_question = rg.MultiLabelQuestion(
    name="topics",
    title="Topics",
    labels=["politics", "economy", "technology"]
)

# Rating question
rating_question = rg.RatingQuestion(
    name="quality",
    title="Quality rating",
    values=[0, 1, 2, 3, 4, 5]
)

# Ranking question
ranking_question = rg.RankingQuestion(
    name="preference",
    title="Preference ranking",
    values=["option_a", "option_b", "option_c"]
)

# Span question (for entity annotation on the "text" field)
span_question = rg.SpanQuestion(
    name="entities",
    title="Entity annotation",
    field="text",
    labels=["person", "organization", "location"]
)

# Text question (free text)
text_question = rg.TextQuestion(
    name="comment",
    title="Comment"
)

Sources: argilla-frontend/translation/en.js:2-9, argilla-frontend/v1/domain/entities/hub/QuestionCreation.ts:19-26
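Each question type constrains the values an annotator may submit. The sketch below illustrates those constraints in plain Python. It is not Argilla's validation code: `validate_answer` and its `kind` strings are hypothetical names, span questions are left out, and the ranking rule (every option exactly once) is a simplifying assumption.

```python
from typing import Any, Sequence


def validate_answer(kind: str, options: Sequence[Any], value: Any) -> bool:
    """Return True when `value` is an acceptable answer for a question of `kind`."""
    if kind in ("label", "rating"):
        # Single choice: the value must be one of the configured options.
        return value in options
    if kind == "multi_label":
        # Multiple choice: a non-empty list of distinct configured options.
        return (
            isinstance(value, list)
            and len(value) > 0
            and len(set(value)) == len(value)
            and all(item in options for item in value)
        )
    if kind == "ranking":
        # Ranking (simplified): every option must appear exactly once.
        return isinstance(value, list) and sorted(value) == sorted(options)
    if kind == "text":
        # Free text: any non-blank string.
        return isinstance(value, str) and value.strip() != ""
    raise ValueError(f"unknown question kind: {kind}")
```

For example, `validate_answer("label", ["news", "sports"], "weather")` is rejected because "weather" is not a configured option.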

Record operations

Records are the individual data points that users annotate:

# Create a record
record = rg.Record(
    fields={"text": "This is a sample text."},
    metadata={"source": "news", "length": 22}
)

# Add the record to the dataset
dataset.records.log([record])

# Add several records at once
records = [
    rg.Record(fields={"text": "Example 1"}),
    rg.Record(fields={"text": "Example 2"})
]
dataset.records.log(records)

# Search records by text
results = dataset.records(query=rg.Query(query="Example")).to_list()

# Filter records by metadata
filtered = dataset.records(
    query=rg.Query(filter=rg.Filter(("metadata.source", "==", "news")))
).to_list()

# Limit the number of records returned
limited = dataset.records(limit=100).to_list()

Sources: argilla-server/src/argilla_server/contexts/datasets.py:330-339, argilla-server/src/argilla_server/bulk/records_bulk.py:46-89

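Responses and suggestions

Responses are annotations submitted by users, while suggestions are pre-filled annotations (usually produced by a model):

# Attach a suggestion and a response when building a record
record = rg.Record(
    fields={"text": "I really like this product!"},
    suggestions=[
        rg.Suggestion(question_name="sentiment", value="positive", score=0.95, agent="gpt-4")
    ],
    responses=[
        rg.Response(question_name="sentiment", value="negative", user_id=client.me.id)
    ]
)
dataset.records.log([record])

# Read the responses attached to a record
responses = record.responses

# Filter records by response status
submitted = dataset.records(
    query=rg.Query(filter=rg.Filter(("response.status", "==", "submitted")))
).to_list()

Sources: argilla-server/src/argilla_server/contexts/datasets.py:480-540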
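Because suggestions usually come from a model and responses from people, one common use of the pair is to measure how often annotators confirm the model. The following is a minimal sketch of that comparison, not part of the Argilla SDK; `agreement_rate` is a hypothetical helper that takes (suggested value, submitted value) pairs for a single question.

```python
from typing import Any, Iterable, Optional, Tuple


def agreement_rate(pairs: Iterable[Tuple[Optional[Any], Optional[Any]]]) -> Optional[float]:
    """Share of records whose submitted response equals the suggestion.

    Each pair is (suggested_value, submitted_value). Pairs missing either
    side are skipped because there is nothing to compare.
    """
    compared = 0
    matches = 0
    for suggested, submitted in pairs:
        if suggested is None or submitted is None:
            continue
        compared += 1
        if suggested == submitted:
            matches += 1
    # None signals that no record had both a suggestion and a response.
    return matches / compared if compared else None
```

A low rate on a given question can indicate either a weak model or unclear annotation guidelines, so it is worth inspecting the disagreeing records before drawing conclusions.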

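Vector operations

Vector embeddings enable similarity search:

# Declare a vector field in the dataset settings
settings = rg.Settings(
    fields=[rg.TextField(name="text")],
    questions=[rg.TextQuestion(name="comment")],
    vectors=[rg.VectorField(name="embeddings", dimensions=768)]
)

# Attach a vector to a record
record = rg.Record(
    fields={"text": "Example"},
    vectors={"embeddings": [0.1] * 768}  # length must match the declared dimensions
)

# Find similar records
similar = dataset.records(
    query=rg.Query(similar=rg.Similar(name="embeddings", value=[0.1] * 768)),
    limit=10
).to_list()

Sources: argilla-server/src/argilla_server/contexts/datasets.py:301-326, argilla-server/src/argilla_server/bulk/records_bulk.py:136-155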
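Conceptually, similarity search ranks stored vectors by how close they are to a query vector. The sketch below shows that idea with cosine similarity in plain Python. It is an illustration only: the actual ranking is performed by the server's search backend, whose metric and implementation are not shown in this page, and `cosine_similarity` and `most_similar` are hypothetical names.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # a zero vector has no direction, so treat it as dissimilar
    return dot / (norm_a * norm_b)


def most_similar(
    query: Sequence[float],
    vectors: Dict[str, Sequence[float]],
    limit: int = 10,
) -> List[Tuple[str, float]]:
    """Return up to `limit` (record_id, score) pairs, best match first."""
    scored = [(record_id, cosine_similarity(query, vec)) for record_id, vec in vectors.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:limit]
```

The dimension check mirrors the requirement that every vector logged for a vector field has the number of dimensions declared for that field.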

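Metadata properties

Metadata properties store additional information about records that can be used for filtering:

# Define metadata properties
term_metadata = rg.TermsMetadataProperty(
    name="source",
    title="Source",
    visible_for_annotators=True
)

int_metadata = rg.IntegerMetadataProperty(
    name="length",
    title="Text length",
    visible_for_annotators=False
)

float_metadata = rg.FloatMetadataProperty(
    name="score",
    title="Confidence score",
    visible_for_annotators=True
)

# Register the metadata properties in the dataset settings
settings = rg.Settings(
    fields=[rg.TextField(name="text")],
    questions=[rg.TextQuestion(name="comment")],
    metadata=[term_metadata, int_metadata, float_metadata]
)

Sources: argilla-server/src/argilla_server/contexts/datasets.py:246-270, argilla-server/src/argilla_server/contexts/datasets.py:273-282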
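The property type determines which kind of filter makes sense: terms properties match against a set of allowed values, while integer and float properties match against a range. A minimal sketch of both checks follows, assuming record metadata is a plain dictionary. `matches_terms` and `matches_range` are hypothetical helpers, not SDK functions.

```python
from typing import Any, Dict, Iterable, Optional


def matches_terms(metadata: Dict[str, Any], name: str, allowed: Iterable[str]) -> bool:
    """Terms filter: the property must be present and equal to one allowed value."""
    value = metadata.get(name)
    return value is not None and value in list(allowed)


def matches_range(
    metadata: Dict[str, Any],
    name: str,
    ge: Optional[float] = None,
    le: Optional[float] = None,
) -> bool:
    """Numeric filter: the property must be a number within [ge, le]."""
    value = metadata.get(name)
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        return False
    if ge is not None and value < ge:
        return False
    if le is not None and value > le:
        return False
    return True
```

Records that lack the property never match, which is why a missing `length` value is excluded from a range filter rather than treated as zero.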

Hugging Face integration

Argilla integrates with the Hugging Face Hub to import and export datasets:

# Import from the Hugging Face Hub
dataset = rg.Dataset.from_hub(repo_id="stanfordnlp/imdb")

# Export to the Hugging Face Hub
dataset.to_hub(
    repo_id="username/dataset-name",
    private=True,
    token="your-huggingface-token"
)

Sources: argilla-frontend/translation/en.js:332-345, argilla-server/src/argilla_server/api/handlers/v1/datasets/datasets.py:444-467

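REST API endpoints

The REST API provides direct programmatic access to Argilla's functionality.

Authentication endpoints
POST /api/v1/token                # Obtain an authentication token

Sources: argilla-server/src/argilla_server/database.py:83-97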
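For custom integrations, each endpoint in this reference can be called with any HTTP client. The sketch below builds (but does not send) an authenticated request using only the standard library. Two things are assumptions rather than facts taken from this page: that the API key is passed in an `X-Argilla-Api-Key` header, and the helper name `build_request`. Check both against your server version.

```python
import json
import urllib.request
from typing import Any, Optional


def build_request(
    base_url: str,
    api_key: str,
    method: str,
    path: str,
    body: Optional[Any] = None,
) -> urllib.request.Request:
    """Prepare a JSON request for an Argilla REST endpoint without sending it."""
    url = base_url.rstrip("/") + path
    data = json.dumps(body).encode("utf-8") if body is not None else None
    headers = {
        "X-Argilla-Api-Key": api_key,  # assumed header name for API-key auth
        "Accept": "application/json",
    }
    if data is not None:
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(url, data=data, headers=headers, method=method)
```

The returned object can be passed to `urllib.request.urlopen` to perform the call, for instance `build_request("http://localhost:6900", "your-api-key", "GET", "/api/v1/me")` to read the current user.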

Dataset endpoints
GET    /api/v1/me/datasets                     # List the current user's datasets
POST   /api/v1/datasets                        # Create a dataset
GET    /api/v1/datasets/{dataset_id}           # Get a specific dataset
PATCH  /api/v1/datasets/{dataset_id}           # Update a dataset
DELETE /api/v1/datasets/{dataset_id}           # Delete a dataset
POST   /api/v1/datasets/{dataset_id}/publish   # Publish a dataset
GET    /api/v1/datasets/{dataset_id}/progress  # Get dataset progress

Sources: argilla-server/src/argilla_server/api/handlers/v1/datasets/datasets.py:75-96, argilla-server/src/argilla_server/api/handlers/v1/datasets/datasets.py:136-179

Field endpoints
GET    /api/v1/datasets/{dataset_id}/fields      # List dataset fields
POST   /api/v1/datasets/{dataset_id}/fields      # Create a field
PATCH  /api/v1/fields/{field_id}                 # Update a field
DELETE /api/v1/fields/{field_id}                 # Delete a field

Sources: argilla-server/src/argilla_server/api/handlers/v1/datasets/datasets.py:99-107

Question endpoints
GET    /api/v1/datasets/{dataset_id}/questions      # List dataset questions
POST   /api/v1/datasets/{dataset_id}/questions      # Create a question
PATCH  /api/v1/questions/{question_id}              # Update a question
DELETE /api/v1/questions/{question_id}              # Delete a question

Record endpoints
POST   /api/v1/datasets/{dataset_id}/records/bulk    # Create records in bulk
PUT    /api/v1/datasets/{dataset_id}/records/bulk    # Update records in bulk
GET    /api/v1/datasets/{dataset_id}/records         # List the records in a dataset
GET    /api/v1/records/{record_id}                   # Get a specific record
PATCH  /api/v1/records/{record_id}                   # Update a record
DELETE /api/v1/records/{record_id}                   # Delete a record
POST   /api/v1/datasets/{dataset_id}/records/search  # Search records

Sources: argilla-server/src/argilla_server/bulk/records_bulk.py:46-89, argilla-server/src/argilla_server/contexts/datasets.py:330-339
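Bulk endpoints take many records per request, so a client uploading a large dataset typically splits it into batches. The sketch below shows one way to do that. It assumes the request body wraps the records in an `items` list and uses a hypothetical batch size of 500; neither detail is confirmed by this page, and `bulk_payloads` is an illustrative name.

```python
from typing import Any, Dict, Iterator, List


def bulk_payloads(
    records: List[Dict[str, Any]],
    chunk_size: int = 500,  # hypothetical limit; check the server's actual maximum
) -> Iterator[Dict[str, Any]]:
    """Yield one request body per batch of at most `chunk_size` records."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    for start in range(0, len(records), chunk_size):
        # Assumed body shape: {"items": [record, record, ...]}
        yield {"items": records[start:start + chunk_size]}
```

Each yielded body would be sent as one `POST /api/v1/datasets/{dataset_id}/records/bulk` request, so a failure affects a single batch rather than the whole upload.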

Response endpoints
POST   /api/v1/me/responses/bulk        # Submit responses in bulk
GET    /api/v1/responses/{response_id}  # Get a specific response
PATCH  /api/v1/responses/{response_id}  # Update a response
DELETE /api/v1/responses/{response_id}  # Delete a response

Sources: argilla-server/src/argilla_server/contexts/datasets.py:480-540

Metadata property endpoints
GET    /api/v1/datasets/{dataset_id}/metadata-properties       # List metadata properties
POST   /api/v1/datasets/{dataset_id}/metadata-properties       # Create a metadata property
PATCH  /api/v1/metadata-properties/{metadata_property_id}      # Update a metadata property
DELETE /api/v1/metadata-properties/{metadata_property_id}      # Delete a metadata property
GET    /api/v1/metadata-properties/{metadata_property_id}/metrics  # Get property metrics

Sources: argilla-server/src/argilla_server/contexts/datasets.py:246-270, argilla-server/src/argilla_server/contexts/datasets.py:273-282

Vector settings endpoints
GET    /api/v1/datasets/{dataset_id}/vectors-settings       # List vector settings
POST   /api/v1/datasets/{dataset_id}/vectors-settings       # Create vector settings
PATCH  /api/v1/vectors-settings/{vector_settings_id}        # Update vector settings
DELETE /api/v1/vectors-settings/{vector_settings_id}        # Delete vector settings

Sources: argilla-server/src/argilla_server/contexts/datasets.py:301-326

User and workspace endpoints
GET    /api/v1/me                                    # Get the current user
GET    /api/v1/users                                 # List all users
POST   /api/v1/users                                 # Create a user
DELETE /api/v1/users/{user_id}                       # Delete a user
GET    /api/v1/workspaces                            # List all workspaces
POST   /api/v1/workspaces                            # Create a workspace
GET    /api/v1/workspaces/{workspace_id}/users       # List the users in a workspace
POST   /api/v1/workspaces/{workspace_id}/users       # Add a user to a workspace
DELETE /api/v1/workspaces/{workspace_id}/users/{user_id}  # Remove a user from a workspace

API architecture

Argilla API components
Argilla · Argilla API components · Figure 1

Sources: argilla-server/src/argilla_server/contexts/datasets.py, argilla-server/src/argilla_server/database.py, argilla-server/src/argilla_server/models/database.py

API workflow for record creation and annotation
Argilla · API workflow for record creation and annotation · Figure 2

Sources: argilla-server/src/argilla_server/contexts/datasets.py:480-540, argilla-server/src/argilla_server/bulk/records_bulk.py:46-89
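Since the workflow diagram is not reproduced in this text, the sketch below lists one plausible order of REST calls for creating a dataset, loading records, and annotating them. The paths are those listed in the endpoint sections of this page; the ordering and the function name `annotation_workflow` are assumptions, not a statement of how the SDK sequences its calls.

```python
from typing import List, Tuple


def annotation_workflow(dataset_id: str) -> List[Tuple[str, str]]:
    """Return (method, path) pairs in the order a client might call them.

    `dataset_id` stands for the id returned by the first call, which creates
    the dataset in draft status.
    """
    base = f"/api/v1/datasets/{dataset_id}"
    return [
        ("POST", "/api/v1/datasets"),          # 1. create the dataset (draft)
        ("POST", f"{base}/fields"),            # 2. define what annotators see
        ("POST", f"{base}/questions"),         # 3. define what annotators answer
        ("POST", f"{base}/publish"),           # 4. publish so it becomes ready
        ("POST", f"{base}/records/bulk"),      # 5. upload records in batches
        ("POST", "/api/v1/me/responses/bulk"), # 6. submit annotator responses
        ("GET", f"{base}/progress"),           # 7. check annotation progress
    ]
```

Steps 2 and 3 are repeated once per field and per question, and step 5 once per batch of records.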

Data model

Argilla's core data model is built around datasets, records, responses, and suggestions:

Argilla · Data model · Figure 3

Sources: argilla-server/src/argilla_server/models/database.py:73-104, argilla-server/src/argilla_server/models/database.py:220-277, argilla-server/src/argilla_server/models/database.py:280-320
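As a reading aid for the relationships named above, here is a simplified sketch of the four entities as Python dataclasses. It is not the server's database schema: the attribute names are assumptions chosen to match the SDK examples in this page, and many columns (ids, timestamps, workspace links) are omitted.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class Suggestion:
    """A pre-filled answer for one question, usually produced by a model."""
    question_name: str
    value: Any
    score: Optional[float] = None
    agent: Optional[str] = None


@dataclass
class Response:
    """An answer given by a specific user for one question."""
    question_name: str
    value: Any
    user_id: str
    status: str = "submitted"


@dataclass
class Record:
    """One data point to annotate, with its suggestions and responses."""
    fields: Dict[str, Any]
    metadata: Dict[str, Any] = field(default_factory=dict)
    suggestions: List[Suggestion] = field(default_factory=list)
    responses: List[Response] = field(default_factory=list)


@dataclass
class Dataset:
    """A named collection of records; starts as a draft until published."""
    name: str
    status: str = "draft"
    records: List[Record] = field(default_factory=list)
```

The key point is the ownership chain: a dataset holds records, and each record holds any number of suggestions and per-user responses.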

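Response and dataset statuses

Argilla uses distinct statuses to track the progress of responses and datasets:

Response statuses
Status      Description
pending     No response has been provided yet
draft       The response is saved as a draft but not submitted
submitted   The response has been submitted
discarded   The record has been marked as discarded

Dataset statuses
Status      Description
draft       The dataset is in draft mode and can still be modified
ready       The dataset is published and available for annotation

Sources: argilla-frontend/translation/en.js:74-81, argilla-server/src/argilla_server/api/schemas/v1/datasets.py:15-46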
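Client code that handles these statuses is less error-prone when the allowed values are modelled explicitly. The sketch below does so with standard-library enums built from the two tables above. The class and function names are illustrative and are not claimed to match the SDK's or the server's own types.

```python
from collections import Counter
from enum import Enum
from typing import Dict, Iterable


class ResponseStatus(str, Enum):
    PENDING = "pending"
    DRAFT = "draft"
    SUBMITTED = "submitted"
    DISCARDED = "discarded"


class DatasetStatus(str, Enum):
    DRAFT = "draft"
    READY = "ready"


def accepts_annotations(status: DatasetStatus) -> bool:
    """Only a published (ready) dataset is available for annotation."""
    return status is DatasetStatus.READY


def count_by_status(statuses: Iterable[str]) -> Dict[ResponseStatus, int]:
    """Tally raw status strings; an unknown string raises ValueError."""
    counts = Counter(ResponseStatus(raw) for raw in statuses)
    return {status: counts.get(status, 0) for status in ResponseStatus}
```

Converting raw strings through the enum means a typo such as "submited" fails immediately instead of silently producing an empty result.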

Python SDK response filters

To filter by response status in the Python SDK, pass the status to a query filter:

import argilla as rg

# Records that have submitted responses
submitted = dataset.records(
    query=rg.Query(filter=rg.Filter(("response.status", "==", "submitted")))
)

# Other accepted values
# "pending"
# "draft"
# "discarded"

Sources: argilla-server/tests/unit/api/handlers/v1/test_datasets.py:31-42

Common API workflows

Create and configure a dataset
import argilla as rg

# Initialize the Argilla client
client = rg.Argilla(api_url="http://localhost:6900", api_key="your-api-key")

# Define the fields, questions, and annotation guidelines
settings = rg.Settings(
    guidelines="Label the sentiment of each text as positive, negative, or neutral.",
    fields=[
        rg.TextField(name="text", title="Text content")
    ],
    questions=[
        rg.LabelQuestion(
            name="sentiment",
            title="Sentiment",
            labels=["positive", "negative", "neutral"]
        )
    ]
)

# Create the dataset on the Argilla server
dataset = rg.Dataset(name="sentiment-analysis", settings=settings, client=client)
dataset.create()

Log records and add suggestions
# Create records that carry model suggestions
records = [
    rg.Record(
        fields={"text": "I really love this product!"},
        suggestions=[
            rg.Suggestion(
                question_name="sentiment",
                value="positive",
                score=0.95,
                agent="model-v1"
            )
        ]
    ),
    rg.Record(
        fields={"text": "This does not work very well."},
        suggestions=[
            rg.Suggestion(
                question_name="sentiment",
                value="negative",
                score=0.87,
                agent="model-v1"
            )
        ]
    )
]

# Add the records to the dataset
dataset.records.log(records)

Search and filter records
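# Search by text content
results = dataset.records(query=rg.Query(query="product")).to_list()

# Filter by metadata
filtered = dataset.records(
    query=rg.Query(filter=rg.Filter(("metadata.source", "==", "web")))
).to_list()

# Find similar records (requires a vector field named "embeddings")
similar = dataset.records(
    query=rg.Query(similar=rg.Similar(name="embeddings", value=[0.1] * 768)),
    limit=5
).to_list()

Submit and retrieve annotations
# Submit an annotation for a record
record = next(iter(dataset.records(limit=1)))
record.responses.add(
    rg.Response(question_name="sentiment", value="positive", user_id=client.me.id)
)
dataset.records.log([record])

# Retrieve all records together with their responses
all_records = dataset.records(with_responses=True).to_list()

# Filter records by response status
submitted = dataset.records(
    query=rg.Query(filter=rg.Filter(("response.status", "==", "submitted")))
).to_list()

Sources: argilla-server/src/argilla_server/contexts/datasets.py:480-540, argilla-server/src/argilla_server/bulk/records_bulk.py:46-89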