Batch Lifecycle

This section defines the typed Batch resource: its state machine, the per-row error report, the replay framework, and the four source kinds.

The Batch resource

type Batch = {
  id: string
  surface: "partner" | "ipaas" | "batch"
  operation: BatchOperationKind
  status: BatchStatus
  source: BatchSource
  created_at: ISO8601
  started_at: ISO8601 | null
  completed_at: ISO8601 | null
  progress: {
    rows_total: number | null         // null when total isn't yet known
    rows_processed: number
    rows_succeeded: number
    rows_failed: number
  }
  errors_url: string | null            // populated when status is partially_failed or failed
  result_url: string | null            // populated when status is succeeded or partially_failed
  idempotency_key: string              // echoed back
}

type BatchStatus =
  | "accepted"           // queued; not yet running
  | "running"
  | "succeeded"          // all rows processed; zero failed
  | "partially_failed"   // some rows processed, some failed
  | "failed"             // unrecoverable error
  | "cancelled"          // operator-cancelled via DELETE
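
The status and URL comments above can be read as a few small predicates. The sketch below is illustrative only: the helper names (isTerminal, populatedUrls, statusFromCounts) are hypothetical, and treating every status other than "accepted" and "running" as terminal is an assumption, since the spec lists the statuses but not the transitions.

```typescript
type BatchStatus =
  | "accepted"
  | "running"
  | "succeeded"
  | "partially_failed"
  | "failed"
  | "cancelled"

// Assumption: every status other than "accepted" and "running" is terminal.
const TERMINAL_STATUSES: ReadonlySet<BatchStatus> = new Set<BatchStatus>([
  "succeeded",
  "partially_failed",
  "failed",
  "cancelled",
])

function isTerminal(status: BatchStatus): boolean {
  return TERMINAL_STATUSES.has(status)
}

// Which report URLs the field comments say are populated for a given status.
function populatedUrls(status: BatchStatus): { errors_url: boolean; result_url: boolean } {
  return {
    errors_url: status === "partially_failed" || status === "failed",
    result_url: status === "succeeded" || status === "partially_failed",
  }
}

// Completion status implied by the failed-row count once every row is processed.
// "failed" and "cancelled" are not derived from counts, so they are never returned here.
function statusFromCounts(rowsFailed: number): "succeeded" | "partially_failed" {
  return rowsFailed === 0 ? "succeeded" : "partially_failed"
}
```

A client can use isTerminal to decide when to stop polling and populatedUrls to decide which follow-up fetches are worth making.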

Source kinds

Four kinds, picked by load profile and delivery mechanism:

type BatchSource =
  | { kind: "inline"; rows: unknown[] }                                  // ≤ 1000 rows
  | { kind: "signed_url"; url: string; format: "csv" | "jsonl" }         // pre-signed URL
  | { kind: "s3"; bucket: string; key: string; format: "csv" | "jsonl"; assume_role_arn?: string }
  | { kind: "sftp"; credentials_id: string; path: string; format: "csv" | "jsonl" }

When to use each

  • inline — small one-shot uploads, ≤1000 rows. Lowest latency.
  • signed_url — partner generates a temporary download URL; gondor fetches at job start. Good for ad-hoc uploads from any storage.
  • s3 — long-lived gondor-owned bucket or partner-owned bucket via assume_role_arn. 50GB cap.
  • sftp — partner uploads to gondor-managed SFTP endpoint with PGP. The credentials_id references a per-tenant SftpCredentials resource — see POST /v2/partner/sftp-credentials.
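
A client-side pre-flight check over the four source kinds might look like the sketch below. Only the 1000-row inline limit comes from this section; the checkSource helper, its messages, and the non-empty field checks are assumptions for illustration, and server-side validation remains authoritative.

```typescript
type Format = "csv" | "jsonl"

type BatchSource =
  | { kind: "inline"; rows: unknown[] }
  | { kind: "signed_url"; url: string; format: Format }
  | { kind: "s3"; bucket: string; key: string; format: Format; assume_role_arn?: string }
  | { kind: "sftp"; credentials_id: string; path: string; format: Format }

const INLINE_ROW_LIMIT = 1000

// Returns a list of problems; an empty list means the source looks submittable.
function checkSource(source: BatchSource): string[] {
  const problems: string[] = []
  switch (source.kind) {
    case "inline":
      if (source.rows.length > INLINE_ROW_LIMIT) {
        problems.push(`inline is limited to ${INLINE_ROW_LIMIT} rows; got ${source.rows.length}`)
      }
      break
    case "signed_url":
      // Assumption: an empty URL is always a caller mistake.
      if (source.url === "") problems.push("signed_url requires a url")
      break
    case "s3":
      if (source.bucket === "" || source.key === "") problems.push("s3 requires bucket and key")
      break
    case "sftp":
      if (source.credentials_id === "") problems.push("sftp requires credentials_id")
      break
    default: {
      // Compile-time exhaustiveness check: adding a fifth kind fails to type-check here.
      const unreachable: never = source
      return unreachable
    }
  }
  return problems
}
```

The discriminated union lets the compiler narrow each branch to the fields that kind actually carries, so a caller that exceeds the inline limit can switch to signed_url or s3 before submitting.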

Lifecycle endpoints

POST /v2/batch/operations
Idempotency-Key: <uuid>
Body: { operation: BatchOperationKind, source: BatchSource, params: ... }
→ 201 Batch (status: "accepted")

GET /v2/batch/operations/{id}
→ Batch

GET /v2/batch/operations/{id}/errors
→ BatchErrorReport     (paginated)

GET /v2/batch/operations/{id}/result
→ result data          (per operation)

DELETE /v2/batch/operations/{id}
X-Operator-Override: destructive
→ 204                  (cancels if running)

POST /v2/batch/operations/{id}/replay
Idempotency-Key: <new-uuid>
Body: { only: "failed_rows" | "all_rows" }
→ 201 Batch            (new batch with same op + source filtered to the replay target)
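
A typical client flow is to create the batch, then poll the GET endpoint until the status stops changing. The following is a minimal sketch under stated assumptions: the Http transport type, the submitAndWait helper, the polling interval, and the poll limit are all hypothetical, and only the paths and the Idempotency-Key header come from the endpoint list above.

```typescript
type BatchStatus =
  | "accepted" | "running" | "succeeded"
  | "partially_failed" | "failed" | "cancelled"

// Subset of the Batch resource that the polling loop needs.
type BatchView = { id: string; status: BatchStatus }

// Hypothetical transport: sends a request and resolves with the parsed JSON body.
type Http = (
  method: "GET" | "POST",
  path: string,
  headers: Record<string, string>,
  body?: unknown,
) => Promise<unknown>

type PollOptions = {
  intervalMs?: number
  maxPolls?: number
  sleep?: (ms: number) => Promise<void>
}

// Assumption: these statuses never change again once reached.
const TERMINAL = new Set<BatchStatus>(["succeeded", "partially_failed", "failed", "cancelled"])

async function submitAndWait(
  http: Http,
  request: { operation: string; source: unknown; params?: unknown },
  idempotencyKey: string,
  opts: PollOptions = {},
): Promise<BatchView> {
  const sleep = opts.sleep ?? ((ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms)))
  const intervalMs = opts.intervalMs ?? 2000
  const maxPolls = opts.maxPolls ?? 300

  // Create the batch; the response is expected to carry status "accepted".
  let batch = (await http(
    "POST",
    "/v2/batch/operations",
    { "Idempotency-Key": idempotencyKey },
    request,
  )) as BatchView

  // Poll until a terminal status is seen or the poll budget runs out.
  for (let i = 0; i < maxPolls && !TERMINAL.has(batch.status); i++) {
    await sleep(intervalMs)
    batch = (await http("GET", `/v2/batch/operations/${batch.id}`, {})) as BatchView
  }
  // If the budget ran out, this is the last observed (non-terminal) state.
  return batch
}
```

Injecting the transport and the sleep function keeps the loop testable without a network; a production client would likely add backoff and error handling.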

Per-row error report

type BatchErrorReport = {
  batch_id: string
  total_failed: number
  errors: BatchErrorRow[]              // paginated; sampling rules apply when total_failed > 10000
  sampled: boolean
  next_cursor: string | null
}

type BatchErrorRow = {
  row_number: number                   // 1-indexed
  source_identifier: string | null     // e.g., the row's unique_id from the CSV
  errors: Error[]                      // standard Error envelope per row
}

The error report is paginated. For batches with more than 10,000 failed rows, the response is sampled (sampled: true); fetch the full report from errors_url, which returns a JSONL stream.
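
Walking the paginated report means following next_cursor until it is null. The sketch below assumes a hypothetical FetchErrorPage function that wraps GET /v2/batch/operations/{id}/errors; how the cursor is passed to the API is not specified here, so that detail is left to the fetcher. The ErrorEnvelope shape is also a placeholder for the standard Error envelope.

```typescript
// Placeholder for the standard Error envelope; the real shape is defined elsewhere.
type ErrorEnvelope = { code: string; message: string }

type BatchErrorRow = {
  row_number: number
  source_identifier: string | null
  errors: ErrorEnvelope[]
}

type BatchErrorReport = {
  batch_id: string
  total_failed: number
  errors: BatchErrorRow[]
  sampled: boolean
  next_cursor: string | null
}

// Hypothetical page fetcher; a null cursor requests the first page.
type FetchErrorPage = (batchId: string, cursor: string | null) => Promise<BatchErrorReport>

type CollectedErrors = { rows: BatchErrorRow[]; sampled: boolean; totalFailed: number }

async function collectErrorRows(fetchPage: FetchErrorPage, batchId: string): Promise<CollectedErrors> {
  const rows: BatchErrorRow[] = []
  let cursor: string | null = null
  let sampled = false
  let totalFailed = 0
  do {
    const page: BatchErrorReport = await fetchPage(batchId, cursor)
    rows.push(...page.errors)
    sampled = sampled || page.sampled
    totalFailed = page.total_failed
    cursor = page.next_cursor
  } while (cursor !== null)
  return { rows, sampled, totalFailed }
}
```

When the returned sampled flag is true, rows.length will be smaller than totalFailed, which is the signal to download the full JSONL stream from errors_url instead.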

Idempotency on batches

Same Idempotency-Key + same operation + same source hash → returns the existing batch (not a new one).

Same Idempotency-Key + DIFFERENT source → 409 idempotency_conflict.
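
The two rules can be sketched as an in-memory lookup. This is an illustration rather than the service's implementation: IdempotencyStore and canonicalize are hypothetical names, the canonical string stands in for a real source hash, and treating a different operation under the same key as a conflict is an assumption, since only the different-source case is specified.

```typescript
type StoredBatch = { batchId: string; operation: string; sourceHash: string }

type Resolution =
  | { outcome: "created"; batchId: string }
  | { outcome: "existing"; batchId: string }
  | { outcome: "conflict"; status: 409; code: "idempotency_conflict" }

// Deterministic serialization: object keys are sorted so that logically equal
// sources produce the same string regardless of property order.
function canonicalize(value: unknown): string {
  if (Array.isArray(value)) return `[${value.map(canonicalize).join(",")}]`
  if (value !== null && typeof value === "object") {
    const obj = value as Record<string, unknown>
    const entries = Object.keys(obj)
      .sort()
      .map((k) => `${JSON.stringify(k)}:${canonicalize(obj[k])}`)
    return `{${entries.join(",")}}`
  }
  return String(JSON.stringify(value))
}

class IdempotencyStore {
  private readonly byKey = new Map<string, StoredBatch>()
  private nextId = 1

  resolve(key: string, operation: string, source: unknown): Resolution {
    // Stand-in for a real hash; a service would more likely store a digest.
    const sourceHash = canonicalize(source)
    const prior = this.byKey.get(key)
    if (prior === undefined) {
      const batchId = `batch_${this.nextId++}`
      this.byKey.set(key, { batchId, operation, sourceHash })
      return { outcome: "created", batchId }
    }
    if (prior.operation === operation && prior.sourceHash === sourceHash) {
      return { outcome: "existing", batchId: prior.batchId }
    }
    // Assumption: a changed operation is rejected the same way as a changed source.
    return { outcome: "conflict", status: 409, code: "idempotency_conflict" }
  }
}
```

Hashing a canonical form rather than the raw request body means a retry that reorders JSON properties still resolves to the existing batch.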

Per-row idempotency (within a batch)

Some Op handlers, notably messages.send, accept per-row idempotency keys inside the batch payload. Each key is scoped to the operation plus the row's natural identifier, so a replay of the same row within the same batch returns the cached per-row result. This is distinct from the batch-level Idempotency-Key header, which scopes the batch as a whole.
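
One way to picture the per-row scope is a cache owned by a single batch and keyed by operation plus natural identifier. The sketch below is hypothetical throughout (RowResultCache, RowResult, and the choice to cache every result, including failures, are assumptions) and is not a description of how any Op handler is actually built.

```typescript
type RowResult = { ok: boolean; detail: string }

// One instance per batch, so cached results never leak across batches.
class RowResultCache {
  private readonly results = new Map<string, RowResult>()

  // The key combines the operation with the row's natural identifier.
  // JSON-encoding the pair avoids collisions between ("a:b", "c") and ("a", "b:c").
  private static keyFor(operation: string, naturalId: string): string {
    return JSON.stringify([operation, naturalId])
  }

  // Runs the handler at most once per (operation, naturalId) within this batch.
  run(operation: string, naturalId: string, handler: () => RowResult): RowResult {
    const key = RowResultCache.keyFor(operation, naturalId)
    const cached = this.results.get(key)
    if (cached !== undefined) return cached
    const result = handler()
    this.results.set(key, result)
    return result
  }
}
```

With this shape, a row that appears twice in the same batch triggers the side effect once, while the same row in a different batch gets a fresh cache and is governed only by that batch's own Idempotency-Key.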

Replay

Two replay modes:

  • failed_rows — only the rows that failed in the original. The new batch carries the same Op handler and source kind, filtered to the failure set.
  • all_rows — rerun the entire source. Useful when the failure was upstream of the row data (a misconfigured Op param, a flaky downstream service).

Replays inherit the original's lineage and are linked to it by the parent batch_id in the replay's params.
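
The two modes differ only in which rows the new batch covers. As a rough sketch, assuming rows are matched by the 1-indexed row_number from the error report: rowsForReplay and replayRequest are hypothetical helpers, and the real filtering happens server-side when the replay endpoint is called.

```typescript
type ReplayMode = "failed_rows" | "all_rows"

// Only the field of BatchErrorRow that the filter needs.
type FailedRow = { row_number: number }

// Selects the rows a replay would cover. row_number is 1-indexed, array indexes are 0-indexed.
function rowsForReplay<T>(rows: T[], failures: FailedRow[], only: ReplayMode): T[] {
  if (only === "all_rows") return rows.slice()
  const failed = new Set(failures.map((f) => f.row_number))
  return rows.filter((_, index) => failed.has(index + 1))
}

// Describes the replay call; the caller supplies a fresh Idempotency-Key.
function replayRequest(batchId: string, only: ReplayMode, newIdempotencyKey: string) {
  return {
    method: "POST" as const,
    path: `/v2/batch/operations/${batchId}/replay`,
    headers: { "Idempotency-Key": newIdempotencyKey },
    body: { only },
  }
}
```

Choosing failed_rows keeps the replay small when individual rows were bad; all_rows is the better fit when every row was affected by the same upstream problem.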