talonic_filter
Filter documents by extracted field values using composable conditions (eq, gt, between, contains, etc.).
Accepts canonical field names (e.g. vendor.name, policy.0_coverage_type) which the Talonic API resolves to IDs server-side, or UUIDs directly.
`talonic_filter` is the precision counterpart to `talonic_search`. While search uses fuzzy matching for discovery, filter applies exact conditions against extracted field values. This makes it ideal for queries like 'show me all invoices from vendor X over 1000 EUR' or 'contracts expiring before 2026-12-31'.
Multiple conditions are AND-ed together, allowing you to build precise queries that narrow results progressively. The optional search parameter lets you combine free-text search with structured filters in a single call, and sort controls the result ordering.
| Parameter | Type | Description |
|---|---|---|
| conditions (required) | array | Filter conditions, AND-ed together. Each has `field` or `field_id`, `operator`, and `value`. |
| search | string | Optional free-text search applied alongside filters. |
| sort | object | Sort by a field: `{ field, direction: "asc" \| "desc" }`. |
| page | integer | Page number for pagination. |
| limit | integer | Results per page. Default: 50. |
Entries with filterable: true can be used with talonic_filter; entries with filterable: false have no extracted data yet. The is_not_empty operator checks materialized values, which are updated within seconds of extraction completing.
Example
{
"conditions": [
{ "field": "vendor.name", "operator": "eq", "value": "Meridian Energy AG" },
{ "field": "invoice.total_eur", "operator": "gt", "value": 1000 }
],
"sort": { "field": "invoice.total_eur", "direction": "desc" },
"limit": 20
}

{
"data": [
{
"document_id": "doc_8f3a...",
"filename": "invoice-2026-001.pdf",
"fields": {
"vendor.name": "Meridian Energy AG",
"invoice.total_eur": 1500.00,
"invoice.due_date": "2026-06-15"
}
},
{
"document_id": "doc_c2d9...",
"filename": "invoice-2026-003.pdf",
"fields": {
"vendor.name": "Meridian Energy AG",
"invoice.total_eur": 1200.00,
"invoice.due_date": "2026-04-30"
}
}
],
"total": 2,
"page": 1
}
Available operators
The following operators are available for filter conditions:

| Operator | Description |
|---|---|
| eq | Equals |
| neq | Not equals |
| gt | Greater than |
| gte | Greater than or equal |
| lt | Less than |
| lte | Less than or equal |
| between | Range; requires a two-element array as value |
| contains | Substring match |
| starts_with | Prefix match |
| ends_with | Suffix match |
| is_empty | Checks for null or empty values |
| is_not_empty | Checks for materialized values |

Numeric operators (gt, gte, lt, lte, between) only resolve correctly when the schema field is typed as number. The next section explains how to handle that constraint.
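As a rough illustration of the operator rules above, the following Python sketch builds condition objects and rejects shapes the list rules out before any call is made. The helper name `make_condition` is hypothetical, and omitting `value` for `is_empty` and `is_not_empty` is an assumption; the section does not say whether the API expects a value for those operators.

```python
from typing import Any, Dict

# Operator names as listed in this section.
VALUE_OPERATORS = {
    "eq", "neq", "gt", "gte", "lt", "lte",
    "contains", "starts_with", "ends_with",
}
NO_VALUE_OPERATORS = {"is_empty", "is_not_empty"}


def make_condition(field: str, operator: str, value: Any = None) -> Dict[str, Any]:
    """Build one filter condition (hypothetical helper, not part of the API)."""
    if operator == "between":
        # The section states that between requires a two-element array.
        if not isinstance(value, (list, tuple)) or len(value) != 2:
            raise ValueError("between requires a two-element array as value")
        return {"field": field, "operator": operator, "value": list(value)}
    if operator in NO_VALUE_OPERATORS:
        # Assumption: these operators take no value, so the key is omitted.
        return {"field": field, "operator": operator}
    if operator in VALUE_OPERATORS:
        return {"field": field, "operator": operator, "value": value}
    raise ValueError(f"unknown operator: {operator}")


# Conditions in the list are AND-ed together by the API.
payload = {
    "conditions": [
        make_condition("vendor.name", "eq", "Meridian Energy AG"),
        make_condition("invoice.total_eur", "between", [1000, 2000]),
    ],
    "limit": 20,
}
```

Validating locally keeps malformed conditions from reaching the API, but it does not replace the server-side checks.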
Schema typing (preventive + reactive)
A numeric operator on a string-typed field that happens to hold numeric content (e.g. "€1,500.00") silently returns zero matches — the comparison falls back to lexicographic ordering and almost never produces the result the user expects. There are two ways to handle this; pick the right one before constructing the call.
Preventive — gate on `dataType`
Call talonic_search first and read dataType on the field entry. If dataType !== "number", do not issue a numeric operator on that field. Pick a string-friendly operator (eq, contains) or warn the user that the field needs a data_type change in the schema definition before the query can succeed. This avoids the silent-zero-matches outcome entirely.
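The preventive check can be sketched in Python as below. This is only an illustration: it assumes the field entry returned by talonic_search has been parsed into a dict that exposes `dataType`, and the `name` key and the helper `check_numeric_operator` are hypothetical.

```python
from typing import Any, Dict, Optional

NUMERIC_OPERATORS = {"gt", "gte", "lt", "lte", "between"}


def check_numeric_operator(field_entry: Dict[str, Any], operator: str) -> Optional[str]:
    """Return None when the operator is safe, otherwise a message for the user.

    field_entry is assumed to be a parsed field entry with a dataType key;
    the name key is a hypothetical label used only in the message.
    """
    if operator not in NUMERIC_OPERATORS:
        return None
    data_type = field_entry.get("dataType")
    if data_type == "number":
        return None
    name = field_entry.get("name", "<unknown field>")
    return (
        f"Field {name} is typed as {data_type!r}, so operator {operator!r} "
        "would not compare numerically. Use eq or contains, or change the "
        "field's data_type to number in the schema definition."
    )
```

An agent could call this before building the payload and show the returned message instead of issuing the query.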
Reactive — handle `warnings[]`
When a numeric operator is applied to a string-typed field anyway, the API attaches a warnings[] array to the filter response. Each entry has code, message, field/field_id, and a suggestion. The MCP tool surfaces this in structuredContent — agents should relay the message (and suggestion, when present) to the user rather than silently retrying.
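One possible way to relay these warnings is sketched below in Python. It assumes the filter response has already been parsed into a dict with the `warnings` shape described above; the helper name `format_warnings` is hypothetical.

```python
from typing import Any, Dict, List


def format_warnings(response: Dict[str, Any]) -> List[str]:
    """Turn warnings[] entries into user-facing lines (hypothetical helper)."""
    lines = []
    for warning in response.get("warnings", []):
        # Prefer the human-readable message; fall back to the code.
        text = warning.get("message") or warning.get("code", "unknown warning")
        suggestion = warning.get("suggestion")
        if suggestion:
            text = f"{text} Suggestion: {suggestion}"
        lines.append(text)
    return lines
```

Returning plain strings leaves the decision of how to present them to the agent, which should show them to the user rather than retry silently.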
{
"data": [],
"total": 0,
"warnings": [
{
"code": "numeric_operator_on_string_field",
"message": "Operator `gt` was applied to field `invoice_total` typed as string. Numeric comparisons against string-typed fields use lexicographic ordering and may return zero matches.",
"field": "invoice_total",
"field_id": "fld_inv_total",
"suggestion": "Change the field's data_type to `number` in the schema definition."
}
]
}
Example: combined search and filter
{
"conditions": [
{ "field": "invoice.due_date", "operator": "lte", "value": "2026-06-30" },
{ "field": "invoice.total_eur", "operator": "gte", "value": 500 }
],
"search": "consulting services",
"sort": { "field": "invoice.due_date", "direction": "asc" },
"limit": 10
}
Pagination is supported via the page and limit parameters. The default page size is 50 results. For large workspaces with many matching documents, iterate through pages by incrementing page from 1. The response includes a total field showing the total number of matches, so the agent can determine how many pages remain and communicate this to the user.
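The page arithmetic described above can be sketched in Python as follows. The `call_filter` argument is a hypothetical stand-in for whatever function actually invokes talonic_filter and returns the parsed response; only the `data`, `total`, `page`, and `limit` names come from this section.

```python
import math
from typing import Any, Callable, Dict, Iterator


def page_count(total: int, limit: int = 50) -> int:
    """Number of pages needed to cover `total` matches at `limit` per page."""
    return math.ceil(total / limit) if total > 0 else 0


def iter_documents(
    call_filter: Callable[[Dict[str, Any]], Dict[str, Any]],
    payload: Dict[str, Any],
    limit: int = 50,
) -> Iterator[Dict[str, Any]]:
    """Yield every matching document by incrementing page from 1.

    call_filter is a hypothetical callable that sends the payload to
    talonic_filter and returns the parsed response as a dict.
    """
    page = 1
    while True:
        response = call_filter({**payload, "page": page, "limit": limit})
        yield from response.get("data", [])
        # Stop once the current page is the last one implied by total.
        if page >= page_count(response.get("total", 0), limit):
            break
        page += 1
```

With `total` at 120 and the default limit of 50, `page_count` gives 3, so an agent on page 1 can tell the user that two more pages remain.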
When building filters dynamically from user requests, agents should validate field names against the workspace's field registry first. The most reliable way is to call talonic_search to discover available canonical field names and check the filterable flag. Attempting to filter on a field that does not exist or is not yet filterable returns a VALIDATION_ERROR with a descriptive message. The agent should catch this error and suggest alternative field names rather than failing silently.
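A minimal Python sketch of the "suggest alternative field names" step is shown below. It assumes the agent has already collected the canonical names of filterable fields from talonic_search into a list; the helper `suggest_fields` is hypothetical, and the matching relies on the standard-library `difflib.get_close_matches`.

```python
import difflib
from typing import Iterable, List


def suggest_fields(requested: str, filterable_names: Iterable[str], n: int = 3) -> List[str]:
    """Return up to n known field names that resemble the requested one.

    filterable_names is assumed to hold canonical names whose filterable
    flag was true in the talonic_search results.
    """
    return difflib.get_close_matches(requested, list(filterable_names), n=n, cutoff=0.6)


known = ["invoice.total_eur", "invoice.due_date", "vendor.name"]
suggestions = suggest_fields("invoice.total", known)
```

The same helper can be used after a VALIDATION_ERROR to propose replacements for the rejected field name.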