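6.2 · Kubernetes Deployment

Human review and feedback data · Focuses on this chapter's module relationships, source-code basis, and implementation points.

Project: Argilla · Chapter: 6.2 · Status: full translation · Modules: configuration governance; testing, release, and operations; storage and persistence; UI and interaction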
Source references
  • .dockerignore
  • .github/workflows/argilla-frontend.build-push-dev-frontend-docker.yml
  • .github/workflows/argilla-frontend.deploy-environment.yml
  • .github/workflows/argilla-frontend.yml
  • .github/workflows/argilla-server.yml
  • .github/workflows/argilla-v1.yml
  • .github/workflows/argilla.docs.yml
  • .github/workflows/argilla.yml
  • .github/workflows/close-inactive-issues-bot.yml
Module tags
  • Configuration governance
  • Testing, release, and operations
  • Storage and persistence
  • UI and interaction
  • System architecture

Chapter body

Kubernetes Deployment


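This document provides technical guidance for deploying Argilla in a Kubernetes environment. It covers the system architecture components, the configuration requirements, and deployment strategies for highly available operation at scale. For containerized deployment without Kubernetes orchestration, see Docker Deployment.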

System Components

Deploying Argilla on Kubernetes requires several core components that work together to provide the full annotation platform:

Argilla · System Components · Figure 1

Sources: .github/workflows/argilla-server.yml:32-65

Argilla Kubernetes Architecture

A typical Argilla deployment on Kubernetes consists of the following resources:

Argilla · Argilla Kubernetes Architecture · Figure 2

Sources: .github/workflows/argilla-server.yml, .github/workflows/argilla-frontend.yml

Docker Images

Argilla provides the following Docker images, which can be referenced in Kubernetes manifests:

Component | Docker image | Registry
Argilla server | argilladev/argilla-server:[tag] | Docker Hub
Argilla frontend | argilla/argilla-frontend:[tag] | Docker Hub
Argilla quickstart | argilladev/argilla-hf-spaces:[tag] | Docker Hub

These images are built and published by the CI/CD pipelines defined in the GitHub workflow files.

Sources: .github/workflows/argilla-server.yml:127-141, .github/workflows/argilla-frontend.yml:62-73

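Configuration Options

Environment Variables

Configure the Argilla components with the following environment variables:

Variable | Description | Default
ARGILLA_DATABASE_URL | Database connection URL | postgresql://postgres:postgres@localhost:5432/argilla
ARGILLA_ELASTICSEARCH | Elasticsearch URL | http://localhost:9200
ARGILLA_SEARCH_ENGINE | Search engine to use (elasticsearch or opensearch) | elasticsearch
ARGILLA_REDIS_URL | Redis URL | redis://redis:6379
ARGILLA_API_KEY | API key used for authentication | argilla.apikey
USERNAME | Default admin username | argilla
PASSWORD | Default admin password | 12345678
ARGILLA_ENABLE_TELEMETRY | Enable or disable telemetry | 0 (disabled in CI)

These environment variables should be supplied to the argilla-server Deployment through ConfigMaps or Secrets.

Sources: .github/workflows/argilla-server.yml:87-91, .github/workflows/argilla.yml:31-35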

Kubernetes ConfigMaps and Secrets

Store the configuration in Kubernetes resources:

# ConfigMap for Argilla configuration
apiVersion: v1
kind: ConfigMap
metadata:
  name: argilla-config
data:
  ARGILLA_SEARCH_ENGINE: "elasticsearch"
  ARGILLA_ELASTICSEARCH: "http://elasticsearch-service:9200"
  ARGILLA_DATABASE_URL: "postgresql://postgres:postgres@postgresql-service:5432/argilla"
  ARGILLA_REDIS_URL: "redis://redis-service:6379"

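---
# Secret for sensitive values
apiVersion: v1
kind: Secret
metadata:
  name: argilla-secrets
type: Opaque
data:
  ARGILLA_API_KEY: YXJnaWxsYS5hcGlrZXk=  # base64 of "argilla.apikey"
  PASSWORD: MTIzNDU2Nzg=                 # base64 of "12345678"
  USERNAME: YXJnaWxsYQ==                 # base64 of "argilla"
  POSTGRES_PASSWORD: cG9zdGdyZXM=        # base64 of "postgres"; referenced by the PostgreSQL StatefulSet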

Sources: .github/workflows/argilla.yml:31-35, .github/workflows/argilla-frontend.deploy-environment.yml:58-66
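
Instead of referencing each key one at a time, a container spec can load every key of a ConfigMap or Secret at once with envFrom. The fragment below is a minimal sketch, assuming the argilla-config and argilla-secrets resources shown above. It is not a complete manifest and belongs under the pod template of a Deployment.

```yaml
# Fragment of a pod template spec (not a complete manifest).
# Assumes the argilla-config ConfigMap and argilla-secrets Secret defined above.
containers:
- name: argilla-server
  image: argilladev/argilla-server:latest
  envFrom:
  - configMapRef:
      name: argilla-config    # every key becomes an environment variable
  - secretRef:
      name: argilla-secrets   # values are base64-decoded before injection
```

With this approach every key in both resources reaches the container, including USERNAME and PASSWORD, so per-key references remain the better choice when only a subset should be exposed.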

Deployment Process

Argilla Server Deployment
apiVersion: apps/v1
kind: Deployment
metadata:
  name: argilla-server
spec:
  replicas: 1
  selector:
    matchLabels:
      app: argilla-server
  template:
    metadata:
      labels:
        app: argilla-server
    spec:
      containers:
      - name: argilla-server
        image: argilladev/argilla-server:latest
        ports:
        - containerPort: 6900
        env:
        - name: ARGILLA_DATABASE_URL
          valueFrom:
            configMapKeyRef:
              name: argilla-config
              key: ARGILLA_DATABASE_URL
        - name: ARGILLA_ELASTICSEARCH
          valueFrom:
            configMapKeyRef:
              name: argilla-config
              key: ARGILLA_ELASTICSEARCH
        - name: ARGILLA_SEARCH_ENGINE
          valueFrom:
            configMapKeyRef:
              name: argilla-config
              key: ARGILLA_SEARCH_ENGINE
        - name: ARGILLA_REDIS_URL
          valueFrom:
            configMapKeyRef:
              name: argilla-config
              key: ARGILLA_REDIS_URL
        - name: ARGILLA_API_KEY
          valueFrom:
            secretKeyRef:
              name: argilla-secrets
              key: ARGILLA_API_KEY
        resources:
          requests:
            cpu: "500m"
            memory: "1Gi"
          limits:
            cpu: "2000m"
            memory: "4Gi"
        livenessProbe:
          httpGet:
            path: /api/_status
            port: 6900
          initialDelaySeconds: 30
          periodSeconds: 10
        readinessProbe:
          httpGet:
            path: /api/_status
            port: 6900
          initialDelaySeconds: 5
          periodSeconds: 10

Sources: .github/workflows/argilla.yml:57-58
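
The Deployment above does not by itself give other workloads a stable address for the server. A Service along the following lines would expose the pods inside the cluster. This is a sketch: the Service name argilla-server and the ClusterIP type are assumptions, not taken from the Argilla repository.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: argilla-server        # assumed name
spec:
  type: ClusterIP
  selector:
    app: argilla-server       # matches the pod label in the Deployment
  ports:
  - name: http
    port: 6900
    targetPort: 6900          # containerPort of the server
```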

Argilla Frontend Deployment
apiVersion: apps/v1
kind: Deployment
metadata:
  name: argilla-frontend
spec:
  replicas: 1
  selector:
    matchLabels:
      app: argilla-frontend
  template:
    metadata:
      labels:
        app: argilla-frontend
    spec:
      containers:
      - name: argilla-frontend
        image: argilla/argilla-frontend:latest
        ports:
        - containerPort: 3000
        env:
        - name: API_BASE_URL
          value: "/api/"
        resources:
          requests:
            cpu: "200m"
            memory: "512Mi"
          limits:
            cpu: "1000m"
            memory: "1Gi"

Sources: .github/workflows/argilla-frontend.deploy-environment.yml:66
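
The frontend sets API_BASE_URL to the relative path "/api/", which suggests that the UI and the API are meant to be served from the same origin. One way to arrange that is an Ingress that routes /api to the server and everything else to the frontend. The sketch below assumes an ingress controller is installed, that a Service named argilla-server exists on port 6900, and that the hostname argilla.example.com is a placeholder. All resource names here are hypothetical.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: argilla-frontend      # assumed name
spec:
  selector:
    app: argilla-frontend     # matches the pod label in the Deployment
  ports:
  - port: 3000
    targetPort: 3000
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: argilla               # assumed name
spec:
  rules:
  - host: argilla.example.com # hypothetical hostname
    http:
      paths:
      - path: /api
        pathType: Prefix
        backend:
          service:
            name: argilla-server    # assumes a Service with this name exists
            port:
              number: 6900
      - path: /
        pathType: Prefix
        backend:
          service:
            name: argilla-frontend
            port:
              number: 3000
```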

Database Deployment

For production, use a managed database service, or deploy PostgreSQL with appropriate replication and backup configuration:

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: postgresql
spec:
  serviceName: postgresql
  replicas: 1
  selector:
    matchLabels:
      app: postgresql
  template:
    metadata:
      labels:
        app: postgresql
    spec:
      containers:
      - name: postgresql
        image: postgres:14
        ports:
        - containerPort: 5432
        env:
        - name: POSTGRES_DB
          value: "argilla"
        - name: POSTGRES_USER
          value: "postgres"
        - name: POSTGRES_PASSWORD
          valueFrom:
            secretKeyRef:
              name: argilla-secrets
              key: POSTGRES_PASSWORD
        volumeMounts:
        - name: postgresql-data
          mountPath: /var/lib/postgresql/data
  volumeClaimTemplates:
  - metadata:
      name: postgresql-data
    spec:
      accessModes: [ "ReadWriteOnce" ]
      resources:
        requests:
          storage: 10Gi

Sources: .github/workflows/argilla-server.yml:42-55
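
The ARGILLA_DATABASE_URL value in the ConfigMap uses the hostname postgresql-service, so a Service with that name has to exist for the connection to resolve. A minimal sketch, assuming the app: postgresql pod label from the StatefulSet above:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: postgresql-service    # hostname used in ARGILLA_DATABASE_URL
spec:
  selector:
    app: postgresql           # matches the StatefulSet pod label
  ports:
  - port: 5432
    targetPort: 5432
```

The StatefulSet's serviceName: postgresql refers to a separate governing Service, normally a headless one (clusterIP: None), which is only needed for stable per-pod DNS names. When a managed database is used instead, a Service of type ExternalName with the same name can keep the in-cluster hostname unchanged.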

Elasticsearch Deployment
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: elasticsearch
spec:
  serviceName: elasticsearch
  replicas: 1
  selector:
    matchLabels:
      app: elasticsearch
  template:
    metadata:
      labels:
        app: elasticsearch
    spec:
      containers:
      - name: elasticsearch
        image: docker.elastic.co/elasticsearch/elasticsearch:8.8.2
        ports:
        - containerPort: 9200
        - containerPort: 9300
        env:
        - name: discovery.type
          value: "single-node"
        - name: xpack.security.enabled
          value: "false"
        resources:
          requests:
            cpu: "1000m"
            memory: "2Gi"
          limits:
            cpu: "2000m"
            memory: "4Gi"
        volumeMounts:
        - name: elasticsearch-data
          mountPath: /usr/share/elasticsearch/data
  volumeClaimTemplates:
  - metadata:
      name: elasticsearch-data
    spec:
      accessModes: [ "ReadWriteOnce" ]
      resources:
        requests:
          storage: 20Gi

Sources: .github/workflows/argilla-server.yml:34-41

Redis Deployment
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: redis
spec:
  serviceName: redis
  replicas: 1
  selector:
    matchLabels:
      app: redis
  template:
    metadata:
      labels:
        app: redis
    spec:
      containers:
      - name: redis
        image: redis:latest
        ports:
        - containerPort: 6379
        resources:
          requests:
            cpu: "100m"
            memory: "256Mi"
          limits:
            cpu: "500m"
            memory: "1Gi"
        volumeMounts:
        - name: redis-data
          mountPath: /data
  volumeClaimTemplates:
  - metadata:
      name: redis-data
    spec:
      accessModes: [ "ReadWriteOnce" ]
      resources:
        requests:
          storage: 5Gi

Sources: .github/workflows/argilla-server.yml:57-65
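
The ConfigMap shown earlier points the server at elasticsearch-service and redis-service, neither of which is created by the StatefulSets themselves. The following sketch defines both as plain ClusterIP Services. The selectors assume the app labels used in the manifests above.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: elasticsearch-service # hostname used in ARGILLA_ELASTICSEARCH
spec:
  selector:
    app: elasticsearch
  ports:
  - name: http
    port: 9200                # REST API; 9300 (transport) is left unexposed here
    targetPort: 9200
---
apiVersion: v1
kind: Service
metadata:
  name: redis-service         # hostname used in ARGILLA_REDIS_URL
spec:
  selector:
    app: redis
  ports:
  - port: 6379
    targetPort: 6379
```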

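Resource Requirements

Based on the Cloud Run configuration used in the GitHub workflows, the recommended resource allocations for the Argilla components are:

Component | CPU request | Memory request | Notes
argilla-server | 2000m (2 CPU cores) | 4096Mi (4 GiB) | No CPU limit
argilla-frontend | 2000m (2 CPU cores) | 4096Mi (4 GiB) | Suitable for development environments
Minimum instances | 1 | - | Ensures availability
Maximum instances | Variable | - | Based on workload

Sources: .github/workflows/argilla-frontend.deploy-environment.yml:57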
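
Expressed as a Kubernetes resources block, the argilla-server row of the table could look like the fragment below. Reading "no CPU limit" as leaving limits.cpu unset is an interpretation, and the memory limit is an added assumption rather than something the table specifies.

```yaml
# Fragment of the argilla-server container spec.
resources:
  requests:
    cpu: "2000m"       # 2 CPU cores, per the table
    memory: "4096Mi"   # 4 GiB, per the table
  limits:
    memory: "4096Mi"   # assumption: cap memory at the request; limits.cpu is omitted
```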

Scaling and High Availability

For production deployments, configure horizontal autoscaling:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: argilla-server-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: argilla-server
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70

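The following measures are recommended:

  • Multi-availability-zone or multi-region deployment for geographic redundancy
  • Database replication for data durability
  • An Elasticsearch/OpenSearch cluster for better search performance and resilience
  • Redis replication or clustering for cache durability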
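
For the first item, two standard Kubernetes mechanisms help keep server pods spread out and available. The sketch below pairs a PodDisruptionBudget with a topology spread constraint. The resource name, the minAvailable value, and the choice of ScheduleAnyway are illustrative assumptions, and the second YAML document is a fragment to merge into the argilla-server Deployment, not a standalone manifest.

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: argilla-server-pdb    # assumed name
spec:
  minAvailable: 1             # keep at least one server pod during voluntary disruptions
  selector:
    matchLabels:
      app: argilla-server
---
# Fragment: place under spec.template.spec of the argilla-server Deployment.
topologySpreadConstraints:
- maxSkew: 1
  topologyKey: topology.kubernetes.io/zone   # spread pods across availability zones
  whenUnsatisfiable: ScheduleAnyway          # prefer spreading, but do not block scheduling
  labelSelector:
    matchLabels:
      app: argilla-server
```

Both only have an effect once the Deployment runs more than one replica, which the HorizontalPodAutoscaler above ensures through minReplicas: 2.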

Monitoring and Health Checks

Monitor the Argilla deployment through:

  1. Liveness and readiness probes for each component:
   livenessProbe:
     httpGet:
       path: /api/_status
       port: 6900
     initialDelaySeconds: 30
     periodSeconds: 10
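  2. Prometheus metrics collection
  3. Logging with Elasticsearch/Kibana or another log aggregation tool

The argilla-server component exposes a /api/_status health-check endpoint that can be used to verify that it is running.

Sources: .github/workflows/argilla.yml:57-58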
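
Only the server probe is shown above, while the list calls for probes on every component. The fragments below sketch readiness probes for the backing services. They assume the stock postgres and redis images (which ship pg_isready and redis-cli) and an Elasticsearch node with security disabled, as in the StatefulSet in this chapter. The timing values are illustrative.

```yaml
# Each document is a fragment for the named container spec, not a full manifest.

# postgresql container
readinessProbe:
  exec:
    command: ["pg_isready", "-U", "postgres", "-d", "argilla"]
  initialDelaySeconds: 10
  periodSeconds: 10
---
# elasticsearch container
readinessProbe:
  httpGet:
    path: /_cluster/health    # unauthenticated because xpack.security.enabled is "false"
    port: 9200
  initialDelaySeconds: 30
  periodSeconds: 10
---
# redis container
readinessProbe:
  exec:
    command: ["redis-cli", "ping"]
  initialDelaySeconds: 5
  periodSeconds: 10
```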

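Summary

Deploying Argilla on Kubernetes provides a scalable, resilient platform for data annotation and model improvement. The deployment involves configuring several components and giving each one appropriate resources, storage, and networking so that the platform performs reliably.

For Docker-specific deployment instructions, see Docker Deployment. For Hugging Face Spaces integration, see Hugging Face Spaces Integration. For detailed server configuration options, see Server Configuration.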