Source Connectors

Beyond manual upload and API ingestion, Talonic connects to external systems to automatically ingest documents. Each connector authenticates via OAuth or credentials and syncs documents into a source. Connectors turn Talonic into a continuous ingestion pipeline — once configured, new files arriving in a connected folder or inbox are available for processing without manual intervention.

Available connectors

| Connector | Auth | Description |
| --- | --- | --- |
| Google Drive | OAuth | Sync files from Google Drive folders. Shared drives supported. |
| Gmail | OAuth | Ingest email attachments. Search bar with Gmail query passthrough. |
| SharePoint | OAuth | Sync documents from SharePoint sites and document libraries. |
| OneDrive | OAuth | Sync files from OneDrive personal or business accounts. |
| Outlook | OAuth | Ingest email attachments with date range and body inclusion settings. |
| Teams | OAuth | Ingest meeting transcripts and channel attachments. Requires tenant-admin consent for privileged scopes. |
| Notion | OAuth | Sync pages and databases. Users choose which pages to share during OAuth consent. |
| SQL | Credentials | Connect to PostgreSQL, MySQL, or MSSQL databases. Browse tables, run saved queries. |
| Amazon S3 | Credentials | Sync from S3 buckets (and S3-compatible storage like MinIO, R2). |
| Azure Blob | Credentials | Sync from Azure Blob Storage containers. |

The Google connectors share one OAuth client, and the Microsoft connectors share another. OAuth tokens are encrypted at rest using AES-256-GCM. Each source card includes a Batch Processing toggle that defers extraction at 50% cost.

OAuth-based connectors (Google Drive, Gmail, SharePoint, OneDrive, Outlook, Teams, Notion) use a consent-based flow where you authorize Talonic to access specific resources. Among the Microsoft connectors, Teams requires extended scopes that need tenant-admin consent. If a connector's OAuth credentials are revoked or expire, the source enters a disconnected state. Reconnecting via the source settings page automatically refreshes the credentials without losing your existing documents.

Credential-based connectors (SQL, Amazon S3, Azure Blob) authenticate with access keys or connection strings rather than OAuth. SQL connections support PostgreSQL, MySQL, and MSSQL, with a built-in read-only safety layer that prevents accidental writes. S3-compatible storage like MinIO and Cloudflare R2 also works through the S3 connector. All credentials are encrypted at rest before being stored.

Setting Up a Connector

  1. Navigate to Sources and click New Source.
  2. Select the connector type from the dropdown (e.g., Google Drive, SharePoint, S3).
  3. For OAuth connectors, complete the authorization flow — you will be redirected to the provider to grant access.
  4. For credential-based connectors, enter the required credentials (access key, connection string, or API key).
  5. Browse the connected system to select specific folders, mailboxes, buckets, or tables to import.
  6. Optionally enable Batch Processing to defer extraction at 50% cost.

Email connectors (Gmail and Outlook) ingest the attachments on messages rather than the messages themselves. Gmail supports query passthrough, so you can use standard Gmail search syntax to filter which messages are scanned for attachments. Outlook supports date range filtering and can optionally include email bodies as documents as well. Microsoft Teams ingests meeting transcripts and channel attachments, with configurable surface filters for channels, chats, and meetings.
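As an illustration of Gmail query passthrough, the sketch below composes a search string from standard Gmail operators (`has:attachment`, `from:`, `filename:`, `after:`, `before:`). The helper name `build_gmail_query` and its parameters are hypothetical and not part of Talonic; the resulting string is simply what you would type into the connector's search bar.

```python
from datetime import date
from typing import Optional


def build_gmail_query(
    sender: Optional[str] = None,
    filename_ext: Optional[str] = None,
    after: Optional[date] = None,
    before: Optional[date] = None,
) -> str:
    """Compose a Gmail search string that only matches messages with attachments."""
    parts = ["has:attachment"]
    if sender:
        parts.append(f"from:{sender}")
    if filename_ext:
        # Gmail's filename: operator matches on attachment name or extension.
        parts.append(f"filename:{filename_ext}")
    if after:
        parts.append(f"after:{after.strftime('%Y/%m/%d')}")
    if before:
        parts.append(f"before:{before.strftime('%Y/%m/%d')}")
    return " ".join(parts)


query = build_gmail_query(
    sender="billing@example.com",  # placeholder address
    filename_ext="pdf",
    after=date(2024, 1, 1),
)
print(query)  # has:attachment from:billing@example.com filename:pdf after:2024/01/01
```

Keeping `has:attachment` as the first term means messages without attachments are filtered out before the connector scans them.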

OAuth connectors are feature-gated on their OAuth client ID and secret: until those credentials are configured, the connector's entry in the dropdown is disabled. Microsoft Teams also requires tenant-admin consent for privileged scopes such as ChannelMessage.Read.All.

Managing Sources via API

Sources can be created, listed, and managed through the REST API. This is useful for automated provisioning scenarios where you need to set up sources programmatically — for example, creating a new source for each client or department as part of an onboarding workflow. The API also lets you trigger imports from connected sources, which is how you can build scheduled sync pipelines using external orchestration tools like cron jobs or workflow engines.

List all sources

```shell
curl https://api.talonic.com/v1/sources \
  -H "Authorization: Bearer $TALONIC_API_KEY"
```
Response

```json
{
  "data": [
    {
      "id": "src_abc123",
      "name": "Invoices - Google Drive",
      "type": "google_drive",
      "documents_count": 234,
      "status": "connected",
      "batch_processing": false
    },
    {
      "id": "src_def456",
      "name": "Contracts - Manual Upload",
      "type": "upload",
      "documents_count": 47,
      "status": "active",
      "batch_processing": true
    }
  ],
  "meta": { "total": 5 }
}
```
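The same request can be issued from a script, for example as a health check in a scheduled job. The sketch below relies only on what the example above shows (the endpoint, bearer authentication, and the `data`, `name`, `status`, and `documents_count` fields). The helper names, the third sample source, and the literal status string `"disconnected"` are assumptions made for illustration.

```python
import json
import urllib.request

API_URL = "https://api.talonic.com/v1/sources"


def fetch_sources(api_key: str) -> dict:
    """GET /v1/sources with a bearer token and return the decoded JSON body."""
    request = urllib.request.Request(
        API_URL, headers={"Authorization": f"Bearer {api_key}"}
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


def sources_needing_attention(payload: dict) -> list:
    """Return names of sources whose status is neither connected nor active."""
    healthy = {"connected", "active"}
    return [s["name"] for s in payload["data"] if s["status"] not in healthy]


def total_documents(payload: dict) -> int:
    """Sum documents_count over the sources present in this response."""
    return sum(s["documents_count"] for s in payload["data"])


# Sample payload shaped like the response above. The Outlook entry and its
# "disconnected" status value are hypothetical.
SAMPLE = {
    "data": [
        {"name": "Invoices - Google Drive", "status": "connected", "documents_count": 234},
        {"name": "Contracts - Manual Upload", "status": "active", "documents_count": 47},
        {"name": "Mail - Outlook", "status": "disconnected", "documents_count": 12},
    ],
    "meta": {"total": 3},
}

print(sources_needing_attention(SAMPLE))  # ['Mail - Outlook']
print(total_documents(SAMPLE))            # 293
```

Against the live API, you would pass the result of `fetch_sources(os.environ["TALONIC_API_KEY"])` to the two helpers instead of `SAMPLE`.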

For S3 and Azure Blob connectors, you can also use the API to create connections with encrypted credentials. The platform encrypts all credentials at rest using AES-256-GCM, so sensitive access keys and connection strings are never stored in plaintext. When listing sources through the API, credential values are redacted — only the connection status and metadata are returned to prevent accidental exposure of secrets in logs or dashboards.
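The same redaction idea is useful on the client side if your provisioning scripts log the configuration they send. The following is a minimal sketch, not Talonic's implementation: the field names in `SECRET_FIELDS` and the shape of the example payload are hypothetical.

```python
# Hypothetical names of fields that may carry secrets in a connection payload.
SECRET_FIELDS = {"access_key", "secret_key", "connection_string", "password"}


def redact(value):
    """Return a copy of value with secret-looking fields masked at any depth."""
    if isinstance(value, dict):
        return {
            key: "[REDACTED]" if key in SECRET_FIELDS else redact(item)
            for key, item in value.items()
        }
    if isinstance(value, list):
        return [redact(item) for item in value]
    return value


# Illustrative payload only; not a documented request body.
payload = {
    "name": "Archive - S3",
    "type": "s3",
    "credentials": {"access_key": "AKIAEXAMPLE", "secret_key": "not-a-real-secret"},
}

print(redact(payload))
```

Because `redact` builds new containers instead of mutating its input, the original payload can still be sent to the API after the masked copy has been logged.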

Frequently asked questions

What external sources can Talonic connect to?
Google Drive, Gmail, SharePoint, OneDrive, Outlook, Teams, Notion, SQL databases (MSSQL/PostgreSQL/MySQL), Amazon S3 (and S3-compatible storage like MinIO and Cloudflare R2), and Azure Blob Storage. Each connector authenticates via OAuth or credentials.
How are OAuth tokens stored?
OAuth access and refresh tokens are encrypted at rest using AES-256-GCM. The encryption key is taken from SOURCE_ENCRYPTION_KEY, falling back to JWT_SECRET if it is not set. Tokens are decrypted only when making API calls to the connected service.
What happens if a connector loses its credentials or authorization?
If OAuth credentials are revoked or expire, the source enters a disconnected state. Reconnecting via the source settings page automatically refreshes the credentials without losing your existing documents or configuration. No documents are deleted during disconnection.
Does the SQL connector support write operations?
No. SQL connections have a built-in read-only safety layer. A two-layer defense ensures no writes: an AST parser rejects anything that is not a single SELECT statement, and per-transaction read-only mode is enforced at the database level. MSSQL connections additionally reject accounts with elevated privileges.
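
The two-layer idea can be sketched with Python's standard library. This is an illustration under stated assumptions, not Talonic's code: layer 1 here is a simplified text check standing in for the AST parser, and layer 2 uses SQLite's `PRAGMA query_only` standing in for the per-transaction read-only mode of PostgreSQL, MySQL, or MSSQL. The names `ensure_single_select` and `run_read_only` are hypothetical.

```python
import re
import sqlite3


class RejectedQuery(ValueError):
    """Raised when a query is not a single SELECT statement."""


def ensure_single_select(sql: str) -> str:
    """Layer 1 (simplified): allow exactly one statement, and only SELECT."""
    statement = sql.strip().rstrip(";").strip()
    if ";" in statement:
        # Conservative: also rejects semicolons inside string literals.
        raise RejectedQuery("only a single statement is allowed")
    if not re.match(r"select\b", statement, re.IGNORECASE):
        raise RejectedQuery("only SELECT statements are allowed")
    return statement


def run_read_only(conn: sqlite3.Connection, sql: str) -> list:
    """Layer 2: put the connection in read-only mode before executing."""
    statement = ensure_single_select(sql)
    conn.execute("PRAGMA query_only = ON")
    return conn.execute(statement).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE invoices (id INTEGER, total REAL)")
conn.execute("INSERT INTO invoices VALUES (1, 99.5)")
conn.commit()

print(run_read_only(conn, "SELECT id, total FROM invoices;"))  # [(1, 99.5)]

try:
    # Bypasses layer 1 on purpose to show the database itself refusing writes.
    conn.execute("DELETE FROM invoices")
except sqlite3.OperationalError as exc:
    print("write blocked:", exc)
```

The second layer matters because a text or parser check can miss statements that begin with SELECT but still write, such as `SELECT ... INTO`. Enforcing read-only mode in the database catches those cases regardless of how the query was classified.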