
Usage Overview

Track AI token consumption and cost estimates across your workspace with aggregate usage stats and per-document breakdowns via the Usage API.

The Usage API provides visibility into AI token consumption and cost estimates across your workspace. Every AI operation (extraction, classification, triage, matching, resolution) is metered and logged with input and output token counts, the model used, and an estimated USD cost. Use these endpoints to monitor spending, optimize pipeline configuration, and audit AI operations.

Two endpoints cover aggregate and document-level views. The aggregate endpoint returns workspace-wide totals with a breakdown by operation type and model over a configurable date range. The per-document endpoint drills into a specific document to show every AI call made during its processing lifecycle, including cache token utilization and per-operation cost estimates.

Cost estimates are computed from the model-specific token pricing at the time of the API call. Input tokens, output tokens, and cache read tokens are each priced at their respective rates. Cache read tokens represent prompt cache hits where previously cached input was reused at a significantly lower rate than fresh input tokens — high cache utilization indicates efficient prompt reuse across similar documents.
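As a rough illustration of the pricing rule above, the sketch below prices each token class at its own rate and sums the results. The `Rates` class, the per-million-token unit, and the rate values are hypothetical placeholders, not the platform's actual pricing; the sketch also assumes that `input_tokens` does not already include cache read tokens, which the documentation does not state.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rates:
    """USD per one million tokens. Hypothetical values, not real pricing."""
    input_per_mtok: float
    output_per_mtok: float
    cache_read_per_mtok: float


def estimate_cost_usd(input_tokens: int, output_tokens: int,
                      cache_read_tokens: int, rates: Rates) -> float:
    """Price each token class at its own rate and sum the results.

    Assumption: input_tokens excludes cache_read_tokens, so the two
    classes are billed separately and never double counted.
    """
    total = (
        input_tokens * rates.input_per_mtok
        + output_tokens * rates.output_per_mtok
        + cache_read_tokens * rates.cache_read_per_mtok
    )
    return total / 1_000_000


# Placeholder rates where a cache read costs a tenth of fresh input.
example_rates = Rates(input_per_mtok=3.0, output_per_mtok=15.0,
                      cache_read_per_mtok=0.3)
```

With rates shaped like these, moving tokens from fresh input to cache reads lowers the estimate, which is why high cache utilization shows up as lower per-document cost.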

Usage data is available immediately after each AI operation completes. There is no delay or batching — the token counts and cost estimates are recorded synchronously as part of the operation lifecycle. Historical data is retained indefinitely and can be queried over any date range.

Usage tracking is automatic and requires no configuration. Every AI operation performed by the platform — extraction, classification, triage, resolution, matching — is metered and available through these endpoints.
GET /v1/usage

Aggregate Response

Response fields

period.from (string): ISO 8601 start of the reporting period.
period.to (string): ISO 8601 end of the reporting period.
totals.input_tokens (integer): Total input tokens consumed across all operations.
totals.output_tokens (integer): Total output tokens produced across all operations.
totals.calls (integer): Total number of AI operations performed.
breakdown (array): Per-operation-type and per-model usage breakdown.
breakdown[].operation_type (string): Operation category (e.g. extraction, classification, matching).
breakdown[].model (string): AI model used for this operation group.
breakdown[].input_tokens (integer): Input tokens consumed by this group.
breakdown[].output_tokens (integer): Output tokens produced by this group.
breakdown[].calls (integer): Number of calls in this group.
links.self (string): Self-link to this endpoint.

Response — Aggregate usage

{
  "period": { "from": "2026-04-14T00:00:00.000Z", "to": "2026-05-14T00:00:00.000Z" },
  "totals": { "input_tokens": 8940120, "output_tokens": 1274580, "calls": 2764 },
  "breakdown": [
    {
      "operation_type": "extraction",
      "model": "claude-sonnet-4-20250514",
      "input_tokens": 7896420,
      "output_tokens": 1127040,
      "calls": 1842
    },
    {
      "operation_type": "classification",
      "model": "claude-haiku-3-5",
      "input_tokens": 1043700,
      "output_tokens": 147540,
      "calls": 922
    }
  ],
  "links": { "self": "/v1/usage" }
}
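One way to use the breakdown is to see which operation type dominates token consumption. The sketch below takes an already-parsed aggregate response (a Python dict with the fields listed above) and returns each operation type's share of total tokens. The function name is illustrative and not part of any official client.

```python
def token_share_by_operation(usage: dict) -> dict[str, float]:
    """Return each operation type's fraction of all tokens (input + output).

    Groups that share an operation_type but differ by model are summed.
    """
    totals = usage["totals"]
    all_tokens = totals["input_tokens"] + totals["output_tokens"]
    shares: dict[str, float] = {}
    if all_tokens == 0:
        return shares  # no usage in the period, nothing to apportion
    for group in usage["breakdown"]:
        group_tokens = group["input_tokens"] + group["output_tokens"]
        key = group["operation_type"]
        shares[key] = shares.get(key, 0.0) + group_tokens / all_tokens
    return shares
```

Applied to a response shaped like the sample above, this yields one fraction per operation type; the fractions sum to 1 as long as the breakdown covers every metered call.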

Per-Document Usage

The per-document endpoint returns every AI operation performed on a specific document along with token counts, cache utilization, and cost estimates. This is useful for understanding why a particular document was expensive to process — for example, a large multi-page PDF that required multiple extraction chunks, or a document that triggered classification retries.

GET /v1/usage/documents/:id

Document Response

Response fields

document_id (string): UUID of the document.
totals.input_tokens (integer): Total input tokens across all operations for this document.
totals.output_tokens (integer): Total output tokens across all operations for this document.
totals.cost_estimate_usd (number): Total estimated cost in USD for all operations on this document.
totals.calls (integer): Total number of AI operations performed on this document.
entries (array): Individual AI operation log entries.
entries[].id (string): Log entry UUID.
entries[].operation_type (string): Operation category (e.g. extraction, classification).
entries[].model (string): AI model used for this operation.
entries[].input_tokens (integer): Input tokens consumed.
entries[].output_tokens (integer): Output tokens produced.
entries[].cache_read_tokens (integer): Prompt cache read tokens (reused cached input).
entries[].cost_estimate_usd (number): Estimated cost in USD for this operation.
entries[].created_at (string): ISO 8601 timestamp of the operation.
links.self (string): Self-link to this document usage endpoint.
links.document (string): Link to the document resource.

Response — Per-document usage

{
  "document_id": "d4e5f6a7-b8c9-0123-d456-e7f8a9b0c1d2",
  "totals": {
    "input_tokens": 12480,
    "output_tokens": 1890,
    "cost_estimate_usd": 0.058,
    "calls": 2
  },
  "entries": [
    {
      "id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
      "operation_type": "extraction",
      "model": "claude-sonnet-4-20250514",
      "input_tokens": 8640,
      "output_tokens": 1245,
      "cache_read_tokens": 2048,
      "cost_estimate_usd": 0.041,
      "created_at": "2026-05-13T14:22:10.000Z"
    },
    {
      "id": "b2c3d4e5-f6a7-8901-bcde-f12345678901",
      "operation_type": "classification",
      "model": "claude-haiku-3-5",
      "input_tokens": 3840,
      "output_tokens": 645,
      "cache_read_tokens": 0,
      "cost_estimate_usd": 0.017,
      "created_at": "2026-05-13T14:21:45.000Z"
    }
  ],
  "links": {
    "self": "/v1/usage/documents/d4e5f6a7-b8c9-0123-d456-e7f8a9b0c1d2",
    "document": "/v1/documents/d4e5f6a7-b8c9-0123-d456-e7f8a9b0c1d2"
  }
}
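When auditing a single document, two quick checks are often useful: finding the costliest operation, and confirming that the reported totals match the listed entries. The sketch below does both on an already-parsed per-document response. It assumes `entries` contains every operation for the document (no pagination or truncation), which the documentation does not state explicitly; the function names are illustrative.

```python
def most_expensive_entry(doc_usage: dict) -> dict:
    """Return the entry with the highest estimated cost."""
    return max(doc_usage["entries"], key=lambda e: e["cost_estimate_usd"])


def reconcile_totals(doc_usage: dict) -> dict:
    """Compare reported totals with sums over entries.

    Returns {field: (reported, computed)} for every field that differs.
    Assumption: entries is the complete list of operations.
    """
    entries = doc_usage["entries"]
    computed = {
        "input_tokens": sum(e["input_tokens"] for e in entries),
        "output_tokens": sum(e["output_tokens"] for e in entries),
        # Round to absorb floating-point noise from summing small USD values.
        "cost_estimate_usd": round(
            sum(e["cost_estimate_usd"] for e in entries), 6),
        "calls": len(entries),
    }
    reported = doc_usage["totals"]
    return {
        field: (reported[field], value)
        for field, value in computed.items()
        if abs(reported[field] - value) > 1e-6
    }
```

An empty result from `reconcile_totals` means the totals and entries agree; a non-empty one points at the fields worth investigating.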

Example Request

cURL — Aggregate usage for the last 7 days

curl -X GET "https://api.talonic.ai/v1/usage?from=2026-05-07T00:00:00Z&to=2026-05-14T00:00:00Z" \
  -H "Authorization: Bearer tlnc_your_api_key"
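The same request can be prepared from Python using only the standard library. This is a minimal sketch: the `from` and `to` query parameters, the timestamp format, and the bearer header are taken from the cURL example above, while the helper name and the choice to align the window to midnight UTC are assumptions made for illustration. The function only builds the request; sending it (for example with `urllib.request.urlopen`) is left to the caller.

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import urlencode
from urllib.request import Request

BASE_URL = "https://api.talonic.ai/v1/usage"


def build_usage_request(api_key: str, now: datetime, days: int = 7) -> Request:
    """Build a GET request for the `days` full days ending at midnight UTC.

    `now` should be timezone-aware; a naive datetime is interpreted as
    local time by astimezone().
    """
    end = now.astimezone(timezone.utc).replace(
        hour=0, minute=0, second=0, microsecond=0)
    start = end - timedelta(days=days)
    fmt = "%Y-%m-%dT%H:%M:%SZ"
    # safe=":" keeps the colons readable, matching the cURL example.
    query = urlencode(
        {"from": start.strftime(fmt), "to": end.strftime(fmt)}, safe=":")
    return Request(
        f"{BASE_URL}?{query}",
        headers={"Authorization": f"Bearer {api_key}"},
        method="GET",
    )
```

Passing `now` explicitly rather than reading the clock inside the function keeps the date window reproducible in scripts and tests.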

Errors

Error responses

401 unauthorized: Missing or invalid API key.
404 not_found: Document not found (per-document endpoint only).
429 rate_limited: Too many requests. Retry after the period indicated in the Retry-After header.
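For 429 responses, a client needs to turn the Retry-After header into a wait time. The documentation does not say which form the header takes, so the sketch below accepts both forms permitted by HTTP: a number of seconds or an HTTP date. The function name and the one-second fallback are assumptions for illustration.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def retry_after_seconds(header_value: Optional[str],
                        now: Optional[datetime] = None,
                        default: float = 1.0) -> float:
    """Convert a Retry-After header value into seconds to wait.

    Falls back to `default` when the header is missing or unparseable.
    """
    if header_value is None:
        return default
    value = header_value.strip()
    if value.isdigit():
        return float(value)  # delta-seconds form, e.g. "30"
    try:
        when = parsedate_to_datetime(value)  # HTTP-date form
    except (TypeError, ValueError):
        return default
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)
    current = now or datetime.now(timezone.utc)
    # A date in the past means the caller can retry immediately.
    return max(0.0, (when - current).total_seconds())
```

A caller would typically sleep for the returned duration before retrying, and cap the number of attempts so a persistent rate limit does not loop forever.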