talonic_filter

Filter documents by extracted field values using composable conditions (eq, gt, between, contains, etc.).

Accepts canonical field names (e.g. vendor.name, policy.0_coverage_type) which the Talonic API resolves to IDs server-side, or UUIDs directly.

`talonic_filter` is the precision counterpart to `talonic_search`. While search uses fuzzy matching for discovery, filter applies exact conditions against extracted field values. This makes it ideal for queries like 'show me all invoices from vendor X over 1000 EUR' or 'contracts expiring before 2026-12-31'.

Multiple conditions are AND-ed together, allowing you to build precise queries that narrow results progressively. The optional search parameter lets you combine free-text search with structured filters in a single call, and sort controls the result ordering.

| Parameter | Type | Description |
| --- | --- | --- |
| conditions * | array | Filter conditions, AND-ed together. Each has `field` or `field_id`, `operator`, and `value`. |
| search | string | Optional free-text search applied alongside filters. |
| sort | object | Sort by a field: `{ field, direction: "asc" \| "desc" }`. |
| page | integer | Page number for pagination. |
| limit | integer | Results per page. Default: 50. |

Only fields where the search response shows filterable: true can be used with talonic_filter; entries with filterable: false have no extracted data yet. The is_not_empty operator checks materialized values, which are updated within seconds of extraction completing.
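
The parameters above can be assembled programmatically before the tool is called. The following is a minimal sketch, not part of the Talonic API: `build_filter_input` is a hypothetical helper name, and rejecting an empty `conditions` list is an assumption rather than documented behaviour.

```python
from typing import Any, Optional


def build_filter_input(
    conditions: list[dict[str, Any]],
    search: Optional[str] = None,
    sort_field: Optional[str] = None,
    direction: str = "asc",
    page: Optional[int] = None,
    limit: Optional[int] = None,
) -> dict[str, Any]:
    """Assemble a talonic_filter tool input (hypothetical helper)."""
    # Assumption: an empty conditions list is treated as a caller error.
    if not conditions:
        raise ValueError("conditions is required and must not be empty")
    for cond in conditions:
        # Each condition targets a field by canonical name or by ID.
        if "field" not in cond and "field_id" not in cond:
            raise ValueError(f"condition needs field or field_id: {cond}")
        if "operator" not in cond:
            raise ValueError(f"condition needs an operator: {cond}")
    if direction not in ("asc", "desc"):
        raise ValueError("direction must be 'asc' or 'desc'")

    payload: dict[str, Any] = {"conditions": conditions}
    # Optional parameters are only included when the caller sets them.
    if search is not None:
        payload["search"] = search
    if sort_field is not None:
        payload["sort"] = {"field": sort_field, "direction": direction}
    if page is not None:
        payload["page"] = page
    if limit is not None:
        payload["limit"] = limit
    return payload


payload = build_filter_input(
    [
        {"field": "vendor.name", "operator": "eq", "value": "Meridian Energy AG"},
        {"field": "invoice.total_eur", "operator": "gt", "value": 1000},
    ],
    sort_field="invoice.total_eur",
    direction="desc",
    limit=20,
)
```

Leaving unset optional parameters out of the payload lets the server-side defaults (such as the page size of 50) apply.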

Example

Tool input
{
  "conditions": [
    { "field": "vendor.name", "operator": "eq", "value": "Meridian Energy AG" },
    { "field": "invoice.total_eur", "operator": "gt", "value": 1000 }
  ],
  "sort": { "field": "invoice.total_eur", "direction": "desc" },
  "limit": 20
}
Tool response
{
  "data": [
    {
      "document_id": "doc_8f3a...",
      "filename": "invoice-2026-001.pdf",
      "fields": {
        "vendor.name": "Meridian Energy AG",
        "invoice.total_eur": 1500.00,
        "invoice.due_date": "2026-06-15"
      }
    },
    {
      "document_id": "doc_c2d9...",
      "filename": "invoice-2026-003.pdf",
      "fields": {
        "vendor.name": "Meridian Energy AG",
        "invoice.total_eur": 1200.00,
        "invoice.due_date": "2026-04-30"
      }
    }
  ],
  "total": 2,
  "page": 1
}

Available operators

The following operators are available for filter conditions: eq (equals), neq (not equals), gt (greater than), gte (greater than or equal), lt (less than), lte (less than or equal), between (range, requires a two-element array as value), contains (substring match), starts_with, ends_with, is_empty (checks for null or empty values), and is_not_empty (checks for materialized values). Numeric operators (gt, gte, lt, lte, between) only resolve correctly when the schema field is typed as number. The next section explains how to handle that constraint.
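
As an illustration of the operator rules above, a condition can be checked locally before it is sent. This is a sketch only: `check_condition` and the two constant names are hypothetical, and the check covers just the operator name and the two-element requirement for `between`.

```python
NUMERIC_OPERATORS = {"gt", "gte", "lt", "lte", "between"}
ALL_OPERATORS = NUMERIC_OPERATORS | {
    "eq", "neq", "contains", "starts_with", "ends_with",
    "is_empty", "is_not_empty",
}


def check_condition(cond: dict) -> list[str]:
    """Return the problems found in one condition (empty list if none)."""
    problems = []
    op = cond.get("operator")
    if op not in ALL_OPERATORS:
        problems.append(f"unknown operator: {op!r}")
    if op == "between":
        # between needs a two-element array as its value.
        value = cond.get("value")
        if not (isinstance(value, (list, tuple)) and len(value) == 2):
            problems.append("between requires a two-element array as value")
    return problems
```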

Schema typing (preventive + reactive)

A numeric operator on a string-typed field that happens to hold numeric content (e.g. "€1,500.00") typically returns zero matches without raising an error: the comparison falls back to lexicographic ordering and almost never produces the result the user expects. There are two ways to handle this; pick the right one before constructing the call.

Preventive — gate on `dataType`

Call talonic_search first and read dataType on the field entry. If dataType !== "number", do not issue a numeric operator on that field. Pick a string-friendly operator (eq, contains) or warn the user that the field needs a data_type change in the schema definition before the query can succeed. This avoids the silent-zero-matches outcome entirely.
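
The preventive check might look like the sketch below. The field-entry shape (`canonicalName`, `dataType`, `filterable` keys on one dict) is assumed from the names used on this page, and `gate_numeric_operator` is a hypothetical helper, not an API function.

```python
NUMERIC_OPERATORS = {"gt", "gte", "lt", "lte", "between"}


def gate_numeric_operator(field_entry: dict, operator: str) -> tuple[bool, str]:
    """Decide whether `operator` is safe on a field entry from talonic_search."""
    if not field_entry.get("filterable", False):
        return False, "field is not filterable yet"
    if operator in NUMERIC_OPERATORS and field_entry.get("dataType") != "number":
        # Numeric comparison on a non-number field: do not issue the call.
        return False, (
            f"field is typed as {field_entry.get('dataType')!r}; use eq or contains, "
            "or change its data_type to number in the schema definition"
        )
    return True, "ok"


# Assumed shape of one entry in the search response's field list.
entry = {"canonicalName": "invoice.total_eur", "dataType": "string", "filterable": True}
allowed, reason = gate_numeric_operator(entry, "gt")
```

Returning the reason alongside the verdict gives the agent text it can pass to the user when the operator is rejected.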

Reactive — handle `warnings[]`

When a numeric operator is applied to a string-typed field anyway, the API attaches a warnings[] array to the filter response. Each entry has code, message, field/field_id, and a suggestion. The MCP tool surfaces this in structuredContent — agents should relay the message (and suggestion, when present) to the user rather than silently retrying.

Response with a warning
{
  "data": [],
  "total": 0,
  "warnings": [
    {
      "code": "numeric_operator_on_string_field",
      "message": "Operator `gt` was applied to field `invoice_total` typed as string. Numeric comparisons against string-typed fields use lexicographic ordering and may return zero matches.",
      "field": "invoice_total",
      "field_id": "fld_inv_total",
      "suggestion": "Change the field's data_type to `number` in the schema definition."
    }
  ]
}
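
A response like the one above can be turned into user-facing text along these lines. This is a sketch: `warning_messages` is a hypothetical helper, and it assumes the response has already been parsed into a dict.

```python
def warning_messages(response: dict) -> list[str]:
    """Turn warnings[] entries into lines that can be relayed to the user."""
    lines = []
    for warning in response.get("warnings", []):
        # Prefer the canonical name, fall back to the ID.
        target = warning.get("field") or warning.get("field_id") or "unknown field"
        line = f"[{warning.get('code', 'warning')}] {target}: {warning.get('message', '')}"
        if warning.get("suggestion"):
            line += f" Suggestion: {warning['suggestion']}"
        lines.append(line)
    return lines


response = {
    "data": [],
    "total": 0,
    "warnings": [
        {
            "code": "numeric_operator_on_string_field",
            "message": "Operator `gt` was applied to field `invoice_total` typed as string.",
            "field": "invoice_total",
            "field_id": "fld_inv_total",
            "suggestion": "Change the field's data_type to `number` in the schema definition.",
        }
    ],
}
for line in warning_messages(response):
    print(line)
```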

Example: combined search and filter

Combining free-text search with field-value filters
{
  "conditions": [
    { "field": "invoice.due_date", "operator": "lte", "value": "2026-06-30" },
    { "field": "invoice.total_eur", "operator": "gte", "value": 500 }
  ],
  "search": "consulting services",
  "sort": { "field": "invoice.due_date", "direction": "asc" },
  "limit": 10
}

Pagination is supported via the page and limit parameters. The default page size is 50 results. For large workspaces with many matching documents, iterate through pages by incrementing page from 1. The response includes a total field showing the total number of matches, so the agent can determine how many pages remain and communicate this to the user.
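
The page arithmetic described above can be sketched as follows. How the tool is actually invoked depends on the MCP client, so `call_filter` is a hypothetical callable supplied by the caller; `pages_remaining` and `iter_all_matches` are illustrative names as well.

```python
import math
from typing import Callable, Iterator


def pages_remaining(total: int, page: int, limit: int = 50) -> int:
    """Number of pages after `page`, given the response's total match count."""
    if limit < 1:
        raise ValueError("limit must be at least 1")
    total_pages = math.ceil(total / limit)
    return max(total_pages - page, 0)


def iter_all_matches(
    call_filter: Callable[[dict], dict], tool_input: dict, limit: int = 50
) -> Iterator[dict]:
    """Yield every matching document by incrementing page from 1."""
    page = 1
    while True:
        # call_filter is a stand-in for however the client invokes talonic_filter.
        response = call_filter({**tool_input, "page": page, "limit": limit})
        yield from response.get("data", [])
        if pages_remaining(response.get("total", 0), page, limit) == 0:
            break
        page += 1
```

With the default page size, a response reporting 120 matches on page 1 leaves two more pages to fetch.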

When building filters dynamically from user requests, agents should validate field names against the workspace's field registry first. The most reliable way is to call talonic_search to discover available canonical field names and check the filterable flag. Attempting to filter on a field that does not exist or is not yet filterable returns a VALIDATION_ERROR with a descriptive message. The agent should catch this error and suggest alternative field names rather than failing silently.
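
One way to validate field names up front and propose alternatives is sketched below. The registry shape (a list of entries with `canonicalName` and `filterable`) is assumed from the names used on this page, `check_field_names` is a hypothetical helper, and the suggestions come from Python's standard `difflib`, not from the Talonic API.

```python
import difflib


def check_field_names(conditions: list[dict], registry: list[dict]) -> dict[str, str]:
    """Map each problematic field name to a short explanation."""
    known = {entry["canonicalName"] for entry in registry}
    filterable = {entry["canonicalName"] for entry in registry if entry.get("filterable")}
    problems = {}
    for cond in conditions:
        name = cond.get("field")
        # Conditions that use field_id are not checked in this sketch.
        if name is None or name in filterable:
            continue
        if name in known:
            problems[name] = "not filterable yet"
        else:
            close = difflib.get_close_matches(name, sorted(filterable), n=3)
            hint = f"; did you mean {', '.join(close)}?" if close else ""
            problems[name] = "unknown field" + hint
    return problems


registry = [
    {"canonicalName": "vendor.name", "filterable": True},
    {"canonicalName": "invoice.total_eur", "filterable": True},
    {"canonicalName": "invoice.due_date", "filterable": False},
]
problems = check_field_names(
    [{"field": "invoice.total", "operator": "gt", "value": 1000}], registry
)
```

A local check like this complements, rather than replaces, handling the VALIDATION_ERROR the API returns.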

Frequently asked questions

How do I filter documents by field value?
Use talonic_filter with canonical field names and operators like eq, gt, between, contains. The API resolves field names to IDs server-side.
How do I find the correct canonical field names for filtering?
Call talonic_search first. Canonical field names appear as canonicalName on each entry of the fields array. You can also inspect previously extracted documents to see their field structure.
Can I combine free-text search with field filters?
Yes. Pass the search parameter alongside conditions to combine fuzzy text matching with structured field-value filters in a single call.
What operators are available for talonic_filter?
Supported operators are eq, neq, gt, gte, lt, lte, between, contains, starts_with, ends_with, is_empty, and is_not_empty. Numeric operators (gt, gte, lt, lte, between) require the schema field to be typed as number. String operators work on all field types.
Why does my filter return zero results even though matching documents exist?
The most common cause is using a numeric operator (gt, lt, between) on a field that is typed as string in the schema. Currency symbols, commas, or locale formatting in the value also cause mismatches. Ensure the schema defines numeric fields as type number and that values are stored without formatting.