
Entity Graph

Get the tenant entity relationship graph of distinct extracted values, the documents they occur in, and the field names they attach to. Force a deterministic rebuild on demand.

The entity graph is a deterministic view of the distinct values extracted across your documents and the structure that connects them. An entity is a distinct extracted value (trimmed, lower-cased, whitespace-collapsed). Each entity links to the documents it occurs in and the field names it was extracted from. No LLM and no named-entity recognition are involved: entities are extracted values, typed coarsely from the occurrence data type.
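The normalization described above can be sketched as follows. This is a minimal illustration, not the service's implementation: the function name `normalize_entity_key` is hypothetical, and the exact order of operations and the definition of whitespace are assumptions.

```python
import re


def normalize_entity_key(raw: str) -> str:
    """Trim, collapse whitespace runs to a single space, and lower-case.

    Assumption: "whitespace" means anything matched by the regex class \\s.
    """
    trimmed = raw.strip()
    collapsed = re.sub(r"\s+", " ", trimmed)
    return collapsed.lower()
```

Under these assumptions, "Acme Corp" and "  ACME   corp " produce the same key and therefore count as one entity.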

This is a different lens than link keys and cases. Where link keys are the curated fields used to discover case membership, the entity graph is the raw value-to-document fabric underneath. It powers value-centric exploration: pick a vendor name or a reference number and see every document it appears in, regardless of whether those documents form a case.

The graph has two edge kinds, mirroring its tripartite shape. An occurs_in edge connects an entity to a document it appears in. An attached_to edge connects an entity to a field:<name> node representing the field it was extracted from. Entity-to-entity co-occurrence is derived by the consumer, not stored, which keeps the persisted graph compact.
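Because co-occurrence is not stored, a consumer has to derive it from the occurs_in edges. One possible approach, sketched here with a hypothetical helper name `cooccurrence`, groups entities by document and counts each unordered pair once per shared document. It assumes the edge objects have the source, target, and kind keys documented for this endpoint.

```python
from collections import defaultdict
from itertools import combinations


def cooccurrence(edges):
    """Return {(entity_a, entity_b): shared_document_count} from graph edges."""
    entities_by_doc = defaultdict(set)
    for edge in edges:
        # attached_to edges point at field nodes and are ignored here.
        if edge["kind"] == "occurs_in":
            entities_by_doc[edge["target"]].add(edge["source"])

    pair_counts = defaultdict(int)
    for entities in entities_by_doc.values():
        # Sorting gives each unordered pair one canonical key.
        for a, b in combinations(sorted(entities), 2):
            pair_counts[(a, b)] += 1
    return dict(pair_counts)
```

The pair count grows quadratically with the number of entities per document, which is consistent with the choice to keep these edges out of the persisted graph.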

The graph is cached per workspace and rebuilt lazily. When new documents resolve, the snapshot is flagged stale and the next GET /v1/linking/entity-graph rebuilds it before serving. This avoids an expensive rebuild during an ingestion burst while keeping reads fresh. To force an immediate rebuild that bypasses the stale flag, call POST /v1/linking/entity-graph/recompute.
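The stale-flag behavior can be modeled in a few lines. The class below is only an in-memory illustration of the pattern described above; `LazySnapshot` and its method names are hypothetical and say nothing about how the service stores or builds its snapshot.

```python
class LazySnapshot:
    """Cache a built value and rebuild it only when flagged stale."""

    def __init__(self, build):
        self._build = build      # callable that computes a fresh snapshot
        self._snapshot = None
        self._stale = True       # nothing has been built yet

    def mark_stale(self):
        """Corresponds to new documents resolving: cheap, no rebuild."""
        self._stale = True

    def read(self):
        """Corresponds to the GET endpoint: rebuild only if stale."""
        if self._stale:
            self._snapshot = self._build()
            self._stale = False
        return self._snapshot

    def recompute(self):
        """Corresponds to the recompute endpoint: always rebuild."""
        self._snapshot = self._build()
        self._stale = False
        return self._snapshot
```

In this model, any number of `mark_stale` calls during an ingestion burst costs a single rebuild on the next `read`.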

Document plumbing values (document type, label, page, language), pure numbers, empty strings, and over-long text (more than 120 characters) are excluded from the entity set. Entity type is a coarse mapping from the occurrence data type (e.g. org becomes ORG); unmapped values are typed OTHER.
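As a rough sketch of these rules, the filter and type mapping might look like the code below. Everything beyond the 120-character limit and the org-to-ORG mapping is an assumption made for illustration: the plumbing field names, the definition of a "pure number", the person entry in the type map, and the function names are all hypothetical.

```python
import re

# Hypothetical field names for the document plumbing values; real names may differ.
PLUMBING_FIELDS = {"document_type", "label", "page", "language"}
MAX_VALUE_LENGTH = 120
# Assumed definition of "pure number": optional sign, digits, and , or . separators.
PURE_NUMBER = re.compile(r"[+-]?\d+(?:[.,]\d+)*")
# Only org -> ORG is stated in the text; the person entry is an assumed example.
TYPE_MAP = {"org": "ORG", "person": "PERSON"}


def is_entity_candidate(field_name: str, value: str) -> bool:
    """Return True if the value would be kept in the entity set."""
    text = value.strip()
    if field_name in PLUMBING_FIELDS:
        return False
    if not text or len(text) > MAX_VALUE_LENGTH:
        return False
    if PURE_NUMBER.fullmatch(text):
        return False
    return True


def coarse_type(data_type: str) -> str:
    """Map an occurrence data type to a coarse entity type, defaulting to OTHER."""
    return TYPE_MAP.get(data_type, "OTHER")
```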
GET /v1/linking/entity-graph

Response

Response fields

entities (array): Distinct extracted-value entities.
entities[].id (string): Normalized value key (trimmed, lower-cased, whitespace-collapsed).
entities[].value (string): Display value of the entity.
entities[].type (string): Coarse type derived from the occurrence data type (e.g. ORG, PERSON, OTHER).
entities[].doc_count (integer): Number of distinct documents the entity occurs in.
entities[].occurrences (integer): Total number of occurrences across all documents.
entities[].field_names (string[]): Field names the value was extracted from.
documents (array): Documents referenced by the graph.
documents[].id (string): Document UUID.
documents[].filename (string): Document filename.
documents[].doc_type (string | null): Inferred document type.
documents[].date (string | null): Document date (YYYY-MM-DD), or null.
documents[].mtime (integer): Document timestamp in epoch milliseconds (0 if unknown).
field_names (array): Field nodes referenced by attached_to edges.
field_names[].id (string): Field node ID, prefixed with "field:".
field_names[].name (string): Field name.
field_names[].value_count (integer): Number of occurrences attributed to this field.
edges (array): All graph edges.
edges[].source (string): Entity ID (the normalized value key).
edges[].target (string): Document UUID (occurs_in) or field node ID (attached_to).
edges[].kind (string): Edge kind: occurs_in or attached_to.
stats.n_entities (integer): Total number of entities.
stats.n_documents (integer): Total number of documents.
stats.n_edges (integer): Total number of edges.
stats.source (string): Data source for the graph (field_occurrences).
stats.ner (boolean): Whether NER typing was used. Always false.

Response

{
  "entities": [
    {
      "id": "acme corp",
      "value": "Acme Corp",
      "type": "ORG",
      "doc_count": 3,
      "occurrences": 4,
      "field_names": ["vendor_name", "supplier"]
    }
  ],
  "documents": [
    {
      "id": "doc_uuid_1",
      "filename": "invoice_oct.pdf",
      "doc_type": "Invoice",
      "date": "2024-10-01",
      "mtime": 1727740800000
    }
  ],
  "field_names": [
    { "id": "field:vendor_name", "name": "vendor_name", "value_count": 3 }
  ],
  "edges": [
    { "source": "acme corp", "target": "doc_uuid_1", "kind": "occurs_in" },
    { "source": "acme corp", "target": "field:vendor_name", "kind": "attached_to" }
  ],
  "stats": {
    "n_entities": 1,
    "n_documents": 1,
    "n_edges": 2,
    "built_ms": 84,
    "source": "field_occurrences",
    "ner": false
  }
}
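A response shaped like the one above supports value-centric lookup on the client side. The sketch below, with a hypothetical helper name `documents_for_value`, takes an already-parsed response and a display value, normalizes the value the way entity IDs are described (assuming that description is exact), and follows occurs_in edges to the matching document records.

```python
def documents_for_value(graph: dict, value: str) -> list:
    """Return the document objects in which the given value occurs."""
    # split() + join trims and collapses whitespace; lower() matches the key form.
    key = " ".join(value.split()).lower()
    docs_by_id = {doc["id"]: doc for doc in graph["documents"]}
    return [
        docs_by_id[edge["target"]]
        for edge in graph["edges"]
        if edge["kind"] == "occurs_in"
        and edge["source"] == key
        and edge["target"] in docs_by_id
    ]
```

For the sample response, looking up "Acme Corp" would return the single record for invoice_oct.pdf.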

Recompute Entity Graph

Force a full rebuild of the entity graph, bypassing the stale flag. Unlike the read endpoint, which rebuilds only when the snapshot is flagged stale, this endpoint always rebuilds from the current field_occurrences and persists the new snapshot. The response contains only the stats block: use it to confirm graph size after a large ingestion, then read the full graph from the GET endpoint.

POST /v1/linking/entity-graph/recompute

Response fields

ok (boolean): Always true on success.
stats (object): Statistics for the freshly built graph (n_entities, n_documents, n_edges, built_ms, source, ner).

Response

{
  "ok": true,
  "stats": {
    "n_entities": 412,
    "n_documents": 87,
    "n_edges": 1903,
    "built_ms": 612,
    "source": "field_occurrences",
    "ner": false
  }
}

Errors

Error responses

400 bad_request: A specific customer must be selected; the "all" master view is not allowed for the entity graph.
401 unauthorized: Missing or invalid API key.
429 rate_limited: Too many requests. Retry after the period indicated in the Retry-After header.
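A client handling 429 responses needs to turn the Retry-After header into a wait time. The format this API uses for the header is not specified here, so the sketch below accepts both forms that HTTP allows (a number of seconds or an HTTP date). The function name `retry_delay_seconds` and the one-second fallback are assumptions.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def retry_delay_seconds(retry_after, now=None, default=1.0):
    """Convert a Retry-After header value into a non-negative delay in seconds."""
    if retry_after is None:
        return default
    text = retry_after.strip()
    if text.isdigit():
        # Delay-seconds form, e.g. "30".
        return float(text)
    try:
        # HTTP-date form, e.g. "Wed, 21 Oct 2015 07:28:00 GMT".
        when = parsedate_to_datetime(text)
    except (TypeError, ValueError):
        return default
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    return max(0.0, (when - now).total_seconds())
```

The caller would sleep for the returned number of seconds before retrying the request.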