Bypass Strategies

Bypass strategies determine how a schema field is populated when it should not go through LLM extraction. Each strategy provides a deterministic value without consuming AI credits. This is useful for fields whose values are known ahead of time, can be derived from other fields, or should be looked up from reference data rather than extracted from the document text.

Strategy types

Parameter	Type	Description
none	strategy	Field is always left blank. Use for fields you want to skip entirely.
constant	strategy	Field is set to a literal value you specify (e.g. "USD", "pending").
generator	strategy	Deterministic value generators: deterministic-id (hash-based ID) or context-fallback (derive from other fields).
reference	strategy	Look up a value from a reference table using a key_expression to match against uploaded reference data.

For example, set a constant of "USD" for a currency field that is always the same, or use a generator to produce a deterministic ID for each row. Because fields with bypass strategies skip the AI extraction phase entirely, they reduce both processing time and credit usage.

none — Use when a field should always be blank. Useful for placeholder columns in your output that will be populated by a downstream system.
constant — Use when the value never varies across documents (e.g., currency "USD", data source "talonic", processing status "pending").
generator (deterministic-id) — Use when you need a unique, reproducible identifier for each row. Produces a hash-based ID from entity attributes.
generator (context-fallback) — Use when the value can be derived from other fields in the schema without reading the document.
reference — Use when the value should be looked up from a reference table using a key_expression that references another schema field (e.g., map supplier name to ERP vendor code).
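The three simplest cases above can be sketched as a small dispatch function. This is an illustration only, not the product's implementation: the `value` key for constants and the `source_fields` list for context-fallback are hypothetical names, since the source does not show the configuration for those strategies.

```python
from typing import Any, Optional


def resolve_simple_bypass(config: dict, row: dict) -> Optional[Any]:
    """Resolve a none, constant, or context-fallback bypass without any LLM call."""
    strategy = config["strategy"]
    if strategy == "none":
        # The field is always left blank.
        return None
    if strategy == "constant":
        # "value" is a hypothetical key holding the literal to emit.
        return config["value"]
    if strategy == "generator" and config.get("generator_type") == "context-fallback":
        # "source_fields" is a hypothetical list of other schema fields to try in order.
        for name in config["generator_config"]["source_fields"]:
            value = row.get(name)
            if value not in (None, ""):
                return value
        return None
    raise ValueError(f"strategy not covered by this sketch: {strategy!r}")
```

Here `row` stands for the values already known for the current row; the function returns the first non-empty source field for context-fallback and `None` when nothing is available.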

The reference bypass strategy is particularly powerful for enrichment fields. Define a key_expression that references another field in the schema (e.g., the supplier name), and the system will automatically look up the corresponding code from your reference table without any AI involvement. This is ideal for mapping extracted entity names to internal system identifiers, ERP codes, or classification labels.

For best results, audit your schema for fields that never vary across documents — these are prime candidates for the constant strategy. Fields like currency, data source, or processing batch can be set once and never require AI extraction. This reduces per-document processing cost and improves job completion time, especially on large runs with hundreds of documents.
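One way to run such an audit is to scan rows from a previous job and list the fields that hold a single value throughout. The sketch below assumes the rows have been exported as a list of dictionaries; that export format and the function name are assumptions, not part of the product.

```python
from typing import Any


def constant_candidates(rows: list[dict[str, Any]]) -> dict[str, Any]:
    """Return fields that hold exactly one non-empty value across all rows."""
    if not rows:
        return {}
    candidates: dict[str, Any] = {}
    for field, first_value in rows[0].items():
        # repr() lets unhashable values (lists, dicts) be compared in a set.
        distinct = {repr(row.get(field)) for row in rows}
        if len(distinct) == 1 and first_value not in (None, ""):
            candidates[field] = first_value
    return candidates


rows = [
    {"currency": "USD", "data_source": "talonic", "total": 120.0},
    {"currency": "USD", "data_source": "talonic", "total": 87.5},
]
print(constant_candidates(rows))
```

A field that appears here is only a candidate: a small sample can make a varying field look constant, so check the result against a larger run before switching the field to the constant strategy.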

Example generator bypass configuration (deterministic ID)

{
  "strategy": "generator",
  "generator_type": "deterministic-id",
  "generator_config": {
    "prefix": "INV"
  }
}

// Each row receives a unique, reproducible ID like "INV-a7b3c9d2"
// based on a hash of the document and entity attributes.
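To make the idea of a reproducible hash-based ID concrete, here is a minimal sketch. The source does not say which hash function, which inputs, or which truncation length the generator uses; SHA-256, the `document_id` argument, and the 8-character suffix are assumptions chosen to match the "INV-a7b3c9d2" shape shown above.

```python
import hashlib


def deterministic_id(prefix: str, document_id: str, attributes: dict[str, str]) -> str:
    """Build a reproducible ID from a document identifier and entity attributes."""
    # Sort keys so the same attributes always produce the same canonical string.
    parts = [f"{key}={attributes[key]}" for key in sorted(attributes)]
    canonical = "|".join([document_id, *parts])
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    # Keep the first 8 hex characters (an assumed length).
    return f"{prefix}-{digest[:8]}"
```

Because the input is canonicalized before hashing, re-running a job on the same document and attributes yields the same ID, which is what makes the value safe to use as a join key downstream.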

Example reference bypass configuration (lookup from reference table)

{
  "strategy": "reference",
  "key_expression": "vendor_name",
  "reference_table_id": "ref_vendor_codes"
}

// The field value is resolved by looking up the value of the vendor_name field
// in the ref_vendor_codes reference table; no LLM call is needed.
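The lookup described above can be sketched in a few lines. This sketch treats `key_expression` as a plain field name and uses exact matching against an in-memory dictionary; the real expression syntax and matching rules (case, whitespace, fuzzy matching) are not described in the source, so both are assumptions.

```python
from typing import Optional


def resolve_reference(
    config: dict,
    row: dict[str, str],
    reference_tables: dict[str, dict[str, str]],
) -> Optional[str]:
    """Look up the row's key field in the configured reference table."""
    key = row.get(config["key_expression"])
    if key is None:
        return None
    table = reference_tables.get(config["reference_table_id"], {})
    # Exact-match lookup; returns None when the key is not in the table.
    return table.get(key)


config = {
    "strategy": "reference",
    "key_expression": "vendor_name",
    "reference_table_id": "ref_vendor_codes",
}
# Hypothetical table contents for illustration.
tables = {"ref_vendor_codes": {"Acme GmbH": "V-1001"}}
print(resolve_reference(config, {"vendor_name": "Acme GmbH"}, tables))
```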

Bypass strategies are evaluated in Phase 1 of the pipeline, before any AI calls are made. This means bypass fields are resolved in milliseconds at zero credit cost, and their values are immediately available as context for Phase 2 AI extraction of other fields. For schemas with many static or derivable fields, bypass strategies can reduce the number of fields sent to the LLM by 30-50%, which translates directly to faster job completion and lower per-document cost. Audit your schema periodically for new bypass candidates as your understanding of the data matures.
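To estimate how much of a schema already avoids the LLM, you can split its fields into bypass and AI-extracted groups. The schema shape below, with an optional `bypass` key per field, is a hypothetical representation used only for this sketch.

```python
def split_schema(schema: dict[str, dict]) -> tuple[list[str], list[str]]:
    """Split field names into (bypass, llm) lists based on a "bypass" key."""
    bypass_fields = [name for name, spec in schema.items() if spec.get("bypass")]
    llm_fields = [name for name, spec in schema.items() if not spec.get("bypass")]
    return bypass_fields, llm_fields


schema = {
    "currency": {"bypass": {"strategy": "constant", "value": "USD"}},
    "invoice_id": {"bypass": {"strategy": "generator", "generator_type": "deterministic-id"}},
    "vendor_name": {},
    "total_amount": {},
}
bypass_fields, llm_fields = split_schema(schema)
share_bypassed = len(bypass_fields) / len(schema)
print(f"{share_bypassed:.0%} of fields resolved without the LLM")
```

In this toy schema two of the four fields are bypassed, so only `vendor_name` and `total_amount` would be sent for AI extraction.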

When a generator strategy fails to produce a value, the field falls through to LLM extraction as a safety net, so a bypass misconfiguration does not by itself leave the field empty. Strategy values are normalized via generator mappings in Phase 4 of the pipeline.
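The fall-through behavior can be pictured as a wrapper that tries the generator first and calls the LLM only when no value comes back. Both callables here are hypothetical stand-ins, and treating an exception the same as an empty result is an assumption of this sketch.

```python
from typing import Any, Callable, Optional


def resolve_with_fallback(
    generate: Callable[[], Optional[Any]],
    extract_with_llm: Callable[[], Any],
) -> tuple[Any, str]:
    """Return (value, source), where source is "bypass" or "llm"."""
    try:
        value = generate()
    except Exception:
        # A failing generator is treated like one that produced nothing.
        value = None
    if value not in (None, ""):
        return value, "bypass"
    # Safety net: only now is the LLM extraction invoked.
    return extract_with_llm(), "llm"
```

Returning the source alongside the value makes it easy to count how often the fallback fires, which is a useful signal that a generator is misconfigured.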

Frequently asked questions

What are bypass strategies?

Bypass strategies populate schema fields without LLM extraction. Options: none (blank), constant (fixed value), generator (deterministic-id or context-fallback), and reference (lookup from a reference table).

What happens when a generator bypass fails?

When a generator strategy fails to produce a value, the field falls through to LLM extraction as a safety net, ensuring the cell is still filled.

Do bypass strategies reduce extraction costs?

Yes. Fields with bypass strategies skip the AI extraction phase entirely, which reduces both processing time and credit usage. Use constant or reference strategies for fields that do not require document reading.

What is the difference between a reference table on a field and a reference bypass strategy?

A reference table on a field normalizes AI-extracted values to canonical codes after extraction (Phases 1 and 3). A reference bypass strategy skips AI extraction entirely and resolves the value by looking up another field in a reference table during Phase 1. Use reference tables when the AI needs to read the document first; use reference bypass when the value can be derived from an already-extracted field without reading the document.

