Version: 2.3

Databricks SQL Integration Guide

Connect to Databricks SQL warehouses to query and execute statements against your lakehouse data. This guide covers connection setup, authentication, function authoring, and pipeline integration.

Overview

The Databricks SQL connector enables you to interact with Databricks SQL warehouses through the SQL Statement Execution API. It supports:

  • Querying data from tables, views, or custom SQL using SELECT statements
  • Executing INSERT, UPDATE, DELETE, CREATE, ALTER, DROP, and MERGE statements
  • Parameterized queries with ((parameterName)) syntax for dynamic, reusable operations
  • Per-function catalog and schema overrides for multi-catalog environments
  • Two authentication methods: Personal Access Token (PAT) and OAuth machine-to-machine (M2M)
  • Connection pooling for high-concurrency workloads
Cloud-Native

Databricks SQL connects via your workspace URL. No host or port configuration is needed — connections are established through the Databricks SQL Statement Execution API.
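Under the hood, each query or statement is submitted to the Statement Execution API endpoint (POST /api/2.0/sql/statements) on your workspace URL. A minimal sketch of how such a request could be assembled; the workspace URL and warehouse ID are the example values from this guide, and the helper itself is illustrative, not the connector's actual code:

```python
# Illustrative sketch: assemble the endpoint URL and JSON payload for one
# statement sent to the Databricks SQL Statement Execution API.

def build_statement_request(workspace_url, warehouse_id, statement,
                            catalog=None, schema=None, timeout_sec=120):
    """Return (url, payload) for POST /api/2.0/sql/statements."""
    url = f"{workspace_url}/api/2.0/sql/statements"
    payload = {
        "warehouse_id": warehouse_id,
        "statement": statement,
        # The API caps synchronous waits, so longer timeouts are handled
        # by polling the statement status after the initial wait.
        "wait_timeout": f"{min(timeout_sec, 50)}s",
    }
    if catalog:
        payload["catalog"] = catalog
    if schema:
        payload["schema"] = schema
    return url, payload

url, payload = build_statement_request(
    "https://myworkspace.cloud.databricks.com",
    "abc123",
    "SELECT 1",
    catalog="main",
    schema="default",
)
# The request is then POSTed with an Authorization: Bearer <token> header.
```

No host or port ever appears in this flow; the workspace URL and warehouse ID fully identify the target.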

Connection Configuration

Creating a Databricks SQL Connection

Navigate to Connections → New Connection → Databricks SQL and configure the following:

Databricks SQL Connection Creation Fields

1. Profile Information

Field | Default | Description
Profile Name | - | A descriptive name for this connection profile (required, max 100 characters)
Description | - | Optional description for this Databricks SQL connection
2. Connection Settings

Field | Default | Description
Workspace URL | - | Your Databricks workspace URL (required). Must start with https:// (e.g., https://myworkspace.cloud.databricks.com)
HTTP Path | - | Optional HTTP path for the SQL warehouse endpoint (e.g., /sql/1.0/warehouses/abc123)
Connect Timeout (sec) | 30 | Maximum time to wait for connection establishment (1–600 seconds)
Query Timeout (sec) | 120 | Default timeout for query and statement execution (1–3600 seconds)
3. Authentication

Databricks SQL supports two authentication methods:

Field | Default | Description
Auth Type | Personal Access Token | Authentication method: Personal Access Token or OAuth M2M

Personal Access Token (PAT)

Field | Default | Description
Access Token | - | Databricks personal access token (required when using PAT auth)

OAuth Machine-to-Machine (M2M)

Field | Default | Description
Client ID | - | OAuth application client ID (required when using OAuth M2M)
Client Secret | - | OAuth application client secret (required when using OAuth M2M)

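With OAuth M2M, the client credentials are first exchanged for a short-lived access token, which is then used as the Bearer token on API calls. A hedged sketch of that exchange, assuming the workspace-level token endpoint and the client_credentials grant Databricks documents for M2M (the workspace URL is the example value from this guide):

```python
# Illustrative sketch of the OAuth M2M token exchange. The request is
# POSTed with HTTP Basic auth using the client ID and client secret.

def build_token_request(workspace_url):
    """Return (url, form_body) for the client_credentials grant."""
    url = f"{workspace_url}/oidc/v1/token"
    data = {"grant_type": "client_credentials", "scope": "all-apis"}
    return url, data

url, data = build_token_request("https://myworkspace.cloud.databricks.com")
# The JSON response contains an access_token and an expires_in value;
# the connector refreshes the token before it expires.
```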
Generating a Personal Access Token

In your Databricks workspace, go to User Settings → Developer → Access Tokens → Generate New Token. Copy the token value immediately; it cannot be viewed again after creation.

4. Catalog & Warehouse

Field | Default | Description
Catalog | - | Unity Catalog name (optional). Sets the default catalog for all queries
Schema | default | Default schema name within the catalog
Warehouse ID | - | SQL warehouse ID (required). Found in the warehouse details page under Connection Details

Finding Your Warehouse ID

In the Databricks workspace, navigate to SQL Warehouses, select your warehouse, and find the ID in the Connection Details tab. The warehouse ID is also the last segment of the HTTP path.
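Because the warehouse ID is the last segment of the HTTP path, it can be derived mechanically. A tiny sketch of that extraction (the helper name is illustrative):

```python
def warehouse_id_from_http_path(http_path: str) -> str:
    """Return the last path segment, e.g. '/sql/1.0/warehouses/abc123' -> 'abc123'."""
    return http_path.rstrip("/").rsplit("/", 1)[-1]

warehouse_id_from_http_path("/sql/1.0/warehouses/abc123")  # -> 'abc123'
```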

5. Connection Pool Settings

Field | Default | Description
Max Open Connections | 25 | Maximum number of simultaneous database connections (1–100)
Max Idle Connections | 5 | Idle connections kept ready for reuse (0–50; 0 = close idle connections immediately)
Connection Max Lifetime (sec) | 300 | Maximum age of a connection before it is recycled (0–86400 seconds)
Connection Max Idle Time (sec) | 60 | How long an idle connection may remain unused before being closed (0–86400 seconds)
6. Retry Configuration

Field | Default | Description
Retries | 3 | Number of retry attempts after a failed connection (0–10; 0 = no retry)
Retry Delay (ms) | 500 | Delay between retry attempts in milliseconds (0–3600000 ms)
Retry Backoff Multiplier | 2 | Exponential factor for retry delay growth (1–10)

Example Retry Behavior

  • With Retry Delay = 500ms and Retry Backoff Multiplier = 2:
    • 1st retry: wait 500ms
    • 2nd retry: wait 1000ms
    • 3rd retry: wait 2000ms
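The schedule above generalizes to delay = base × multiplier^n for the nth retry. A tiny sketch of that computation (the function is illustrative, not the product's actual code):

```python
def retry_delays(base_ms: int, multiplier: int, retries: int) -> list:
    """Delay in ms before each retry attempt, growing exponentially."""
    return [base_ms * multiplier ** n for n in range(retries)]

retry_delays(500, 2, 3)  # -> [500, 1000, 2000]
```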
7. Connection Labels

Field | Default | Description
Labels | - | Key-value pairs to categorize and organize this connection (max 10 labels)

Example Labels

  • env: production – Environment
  • team: analytics – Responsible team
  • warehouse: starter – Warehouse tier
Notes
  • Required Fields: Workspace URL and Warehouse ID are always required. Authentication credentials depend on the selected auth type.
  • Connection Pooling: Manages concurrency with Max Open Connections, reduces latency with Max Idle Connections, mitigates stale connections via Connection Max Lifetime, and frees resources using Connection Max Idle Time.
  • Retry Logic: Implements exponential backoff: the first retry waits the base delay, and each subsequent retry multiplies the previous delay by the backoff factor.
  • Catalog & Schema: When set at the connection level, these become the default for all functions. Individual functions can override these values.

Function Builder

Creating Databricks SQL Functions

Once you have a connection established, you can create reusable functions:

  1. Navigate to Functions → New Function
  2. Select the function type (Query or Execute)
  3. Choose your Databricks SQL connection
  4. Configure the function parameters

Query Function

Purpose: Execute SQL SELECT statements against your Databricks SQL warehouse. Run queries to read data from tables, views, or any valid SQL expression.

Configuration Fields

Field | Type | Required | Default | Description
SQL Query | String | Yes | - | SQL SELECT statement to execute. Supports parameterized queries with ((parameterName)) syntax.
Catalog | String | No | - | Override the connection-level catalog for this function
Schema | String | No | - | Override the connection-level schema for this function
Timeout (seconds) | Number | No | 120 | Per-execution timeout in seconds (1–3600)

Use Cases:

  • SELECT data from Delta Lake tables for analytics dashboards
  • SELECT with JOINs across multiple catalogs and schemas
  • SELECT aggregated metrics for reporting and monitoring
  • SELECT time-series data for trend analysis

Using Parameters

The ((parameterName)) syntax creates dynamic, reusable queries. Parameters are automatically detected and can be configured with:

Configuration | Description | Example
Type | Data type validation | string, number, boolean, date, array
Required | Make the parameter mandatory or optional | Required / Optional
Default Value | Fallback value if not provided | NOW(), 0, active
Description | Help text for users | "Start date for the report"
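Automatic detection amounts to scanning the SQL text for ((name)) placeholders. A minimal sketch of that scan (the regex and helper are illustrative, not the product's actual implementation):

```python
import re

# Matches ((parameterName)) placeholders; names are word characters.
PARAM_RE = re.compile(r"\(\((\w+)\)\)")

def find_params(sql: str) -> list:
    """Placeholder names in order of first appearance, without duplicates."""
    seen = []
    for name in PARAM_RE.findall(sql):
        if name not in seen:
            seen.append(name)
    return seen

find_params("SELECT * FROM orders WHERE status = ((status)) AND ts >= ((startDate))")
# -> ['status', 'startDate']
```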

Execute Function

Purpose: Execute DML/DDL statements against your Databricks SQL warehouse. Use this for data modifications (INSERT, UPDATE, DELETE, MERGE) and schema changes (CREATE, ALTER, DROP).

Configuration Fields

Field | Type | Required | Default | Description
SQL Statement | String | Yes | - | DML/DDL SQL statement to execute. Supports parameterized queries with ((parameterName)) syntax.
Catalog | String | No | - | Override the connection-level catalog for this function
Schema | String | No | - | Override the connection-level schema for this function
Timeout (seconds) | Number | No | 120 | Per-execution timeout in seconds (1–3600)

Use Cases:

  • INSERT sensor readings or event data into Delta Lake tables
  • UPDATE records based on business logic
  • DELETE or MERGE data for incremental pipeline loads
  • CREATE or ALTER tables and views for schema management
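When an execute function runs, its bound parameter values are ultimately passed to the Statement Execution API as a list of name/value/type entries. A hedged sketch of that mapping, assuming Databricks SQL type names and string-serialized values (the helper and sample values are illustrative):

```python
# Illustrative sketch: map Python values to the name/value/type parameter
# list shape accepted by the Statement Execution API.

def to_api_parameters(params: dict) -> list:
    """Serialize each value as a string and tag it with a SQL type name."""
    type_map = {bool: "BOOLEAN", int: "INT", float: "DOUBLE", str: "STRING"}
    out = []
    for name, value in params.items():
        out.append({
            "name": name,
            # Booleans serialize as lowercase true/false; everything else
            # uses its plain string form.
            "value": str(value).lower() if isinstance(value, bool) else str(value),
            "type": type_map[type(value)],
        })
    return out

to_api_parameters({"deviceId": "sensor-7", "reading": 21.5})
# -> [{'name': 'deviceId', 'value': 'sensor-7', 'type': 'STRING'},
#     {'name': 'reading', 'value': '21.5', 'type': 'DOUBLE'}]
```

Typed parameters keep values out of the SQL text itself, which is why the Use Cases above (INSERT of sensor readings, incremental MERGE loads) can safely accept upstream pipeline data.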

Pipeline Integration

Use the Databricks SQL functions you create here as nodes inside the Pipeline Designer to interact with your lakehouse alongside the rest of your operations stack. Drag a query or execute node onto the canvas, bind its parameters to upstream outputs or constants, and configure connection-level options without leaving the designer.

For end-to-end orchestration ideas, such as combining Databricks queries with MQTT, REST, or analytics steps, explore the Connector Nodes page to see how SQL nodes complement other automation patterns.

Databricks SQL Query Node in Pipeline Designer

Databricks SQL query node in the pipeline designer with connection, function, and parameter configuration

Databricks SQL Execute Node in Pipeline Designer

Databricks SQL execute node for running DML/DDL statements in pipelines