Version: 2.7-dev

ClickHouse Integration Guide

Connect MaestroHub pipelines to ClickHouse for high-throughput columnar analytics, time-series rollups, and bulk ingestion. This guide covers connection setup, TLS / mutual TLS configuration, and the three function types (Query, Execute, Write) tailored for ClickHouse workloads.

Overview

The ClickHouse connector brings columnar OLAP capabilities to your MaestroHub pipelines:

Read data with SELECT queries — aggregations, joins, window functions, top-K with LIMIT BY
Execute DML/DDL — ALTER TABLE, OPTIMIZE … FINAL, INSERT … SELECT, TRUNCATE
Bulk-write pipeline rows — batched VALUES inserts with optional MergeTree auto-create
Native column types — Map(K, V), Array(T), DateTime64, UUID, IPv4 / IPv6 bound directly without JSON round-tripping
Secure connections — system trust store by default, custom CA bundle, or mutual TLS with client cert + key
Wire compression — LZ4 (fast) or ZSTD (smaller) over the native TCP protocol

Native TCP Protocol

ClickHouse exposes a native TCP protocol on port 9000 (plaintext) or 9440 (TLS). MaestroHub connects over the native protocol for full type fidelity and pipelined batch writes — the HTTP interface on port 8123 is not used.

Connection Configuration

Creating a ClickHouse Connection

Navigate to Connections → New Connection → ClickHouse and configure the following:

ClickHouse Connection Creation Fields

1. Profile Information

Field	Default	Description
Profile Name	-	A descriptive name for this connection profile (required, max 100 characters)
Description	-	Optional description for this ClickHouse connection

2. Database Configuration

Field	Default	Description
Host	localhost	ClickHouse server hostname or IP address (required)
Port	9000	ClickHouse native TCP port — `9000` plaintext, `9440` TLS (1-65535, required)
Database	default	Database name to connect to (required)
Connect Timeout (sec)	30	Maximum time to wait for connection establishment (1-300 seconds)

Default Port

ClickHouse uses port 9000 for the native protocol, not the HTTP port 8123. When you enable TLS, switch to 9440 (the standard ClickHouse TLS port) — the form shows a reminder banner.

3. Basic Authentication

Field	Default	Description
Username	default	ClickHouse username (optional)
Password	-	ClickHouse user password (optional)

Anonymous Authentication

Stock ClickHouse ships with a default user that has no password. Both Username and Password are optional in MaestroHub — leave both empty for an anonymous connection to the default user, or fill them to authenticate as another role. Most managed ClickHouse offerings (ClickHouse Cloud, Altinity) require credentials.

4. TLS Settings

4a. Enable TLS

Field	Default	Description
Enable TLS	false	Use encrypted native protocol (typically on port 9440)

4b. Skip TLS Verification

(Only displayed when TLS is enabled)

Field	Default	Description
Skip TLS Verification	false	Do not verify the server certificate. Development only — disables both hostname and CA validation

4c. Certificate Configuration

(Only displayed when TLS is enabled)

Field	Default	Description
Server CA Certificate	-	PEM-encoded CA bundle that signed the ClickHouse server certificate. Required for self-signed or internal CAs not in the system trust store; leave empty for public CAs (Let's Encrypt, etc.)
Client Certificate	-	PEM-encoded client certificate for mutual TLS (mTLS) — fill both Client Certificate and Private Key together
Client Private Key	-	PEM-encoded private key matching the client certificate. Stored encrypted at rest

Cert Upload

Each certificate field has an Upload button that reads a .pem file from disk and validates the format inline. The Private Key field is masked by default — toggle the eye icon to reveal during paste, then re-hide.

TLS Behavior Matrix

Configuration	Behavior
TLS off	Plain TCP — no encryption
TLS on, Skip Verify off, no CA	Encryption + system trust store + hostname validation (recommended for public-CA servers)
TLS on, Skip Verify off, custom CA	Encryption + custom CA validation + hostname check (self-signed / internal CA)
TLS on, Skip Verify on	Encryption only — no server validation. Disables the Server CA field
TLS on, Client Cert + Key set	Adds mutual TLS authentication on top of any of the above

5. Wire Protocol

Field	Default	Description
Compression	lz4	Wire compression for the native protocol (`none` / `lz4` / `zstd`). LZ4 is the speed/size sweet spot; ZSTD trades CPU for smaller payloads
Read Timeout (sec)	60	Per-query read timeout in seconds (1-3600)
Dial Timeout (sec)	10	TCP dial timeout in seconds (1-300)

6. Connection Pool Settings

Field	Default	Description
Max Open Connections	10	Maximum number of simultaneous database connections (1-1000)
Max Idle Connections	5	Idle connections kept ready for reuse (0-100, 0 = let the driver use its built-in default)
Connection Max Lifetime (sec)	900	Maximum age of a connection before it is recycled (0-86400 seconds, 0 = use driver default)
Connection Max Idle Time (sec)	300	How long an idle connection may remain unused before being closed (0-86400 seconds, 0 = use driver default)

Connection pool settings help optimize ClickHouse performance by balancing concurrency, resource usage, and latency across workloads.

7. Retry Configuration

Field	Default	Description
Retries	3	Number of connection retry attempts (0-10, 0 = no retry)
Retry Delay (ms)	100	Initial delay between retry attempts in milliseconds (0-60000 ms)
Retry Backoff Multiplier	2	Exponential factor for retry delay growth (1-10, e.g., 2 means delay doubles each retry)

Retry Behavior Example

1st retry: wait 100ms
2nd retry: wait 200ms (100 × 2)
3rd retry: wait 400ms (200 × 2)

8. Connection Labels

Field	Default	Description
Labels	-	Key-value pairs to categorize and organize this ClickHouse connection (max 10 labels)

Example Labels

env: prod — Environment
team: analytics — Responsible team
cluster: warehouse-eu — Cluster identity

Notes

Required Fields: All fields described as "required" must be filled.
Optional Auth: Stock ClickHouse allows the default user to connect without a password. Make Username/Password mandatory in your own deployment via users.xml.
TLS / mTLS: A custom CA bundle is only consulted when Skip TLS Verification is off. Client certificates are still sent to the server even with Skip Verify on, but the server is no longer authenticated to you — only enable in dev/test environments.
Connection Pooling: Manages concurrency with Max Open Connections, reduces latency with Max Idle Connections, mitigates stale connections via Connection Max Lifetime, and frees resources using Connection Max Idle Time.
Retry Logic: Implements exponential backoff — initial connection attempts use the base delay and each retry multiplies the delay by the backoff factor; ideal for transient network issues or rolling restarts of a ClickHouse cluster.
Timeout Values: Connect Timeout applies to initial connection establishment; Read Timeout caps individual query duration. All timeout values are in seconds unless noted (Retry Delay uses milliseconds).

Function Builder

Creating ClickHouse Functions

Once you have a connection established, you can create reusable functions for different operation types:

Navigate to Functions → New Function
Select one of the ClickHouse function types: Query, Execute, or Write
Choose your ClickHouse connection
Configure the function parameters

ClickHouse function creation interface showing the three function types: Query, Execute, and Write

Query Function

Purpose: Execute SELECT queries against ClickHouse and return rows as structured data.

Configuration Fields

Field	Type	Required	Default	Description
SQL Query	String	Yes	-	SQL SELECT statement to execute. Supports parameterized queries with `((param))` syntax.
Timeout (seconds)	Number	No	30	Per-execution timeout in seconds (1-3600).

Use Cases:

Aggregate metrics with GROUP BY over wide MergeTree tables
Join fact tables with dictionaries for enrichment
Window functions over ordered event streams
Top-K analytics using LIMIT BY
Time-series rollups with toStartOfInterval() / toStartOfHour()

Execute Function

Purpose: Execute DML/DDL statements (INSERT, ALTER, CREATE, OPTIMIZE, TRUNCATE, etc.) on ClickHouse. Use for ad-hoc writes, schema migrations, and table maintenance.

Configuration Fields

Field	Type	Required	Default	Description
SQL Statement	String	Yes	-	DML/DDL SQL statement to execute. Supports parameterized queries with `((param))` syntax.
Timeout (seconds)	Number	No	30	Per-execution timeout in seconds (1-3600).

Use Cases:

INSERT INTO … SELECT to materialize an aggregation into a new table
ALTER TABLE … ADD COLUMN for schema evolution
OPTIMIZE TABLE … FINAL to merge MergeTree parts on demand
TRUNCATE TABLE for fast purges
CREATE MATERIALIZED VIEW to maintain a continuously-updated rollup

Write Function

Purpose: Bulk-insert pipeline data into a ClickHouse table. This is not a SQL editor — it loads structured rows from your pipeline into the target table.

Load data into ClickHouse with two paths: an existing table (auto-detected schema mapping) or an auto-created MergeTree table with operator-specified ORDER BY. Supports batching for efficient bulk loading.

Configuration Fields

Field	Type	Required	Default	Description
Table Name	String	Yes	-	Target table to insert data into.
Data	Any	Yes	-	Row payload — array of objects, single object, or `((data))` template pulling from upstream pipeline output.
Schema Hints	Object	No	-	Type hints for table creation (`{ "field": "Type" }`). Only consulted when auto-creating the table.
Create Table If Not Exists	Boolean	No	false	Auto-create the table if it does not exist (MergeTree engine). When disabled, missing tables produce a clear error instead of inserting.
ORDER BY (for auto-create)	String	No	-	Comma-separated columns for the MergeTree `ORDER BY` clause when auto-creating the table. Leave empty for `tuple()` (no sort key — allowed, but disables ClickHouse's primary-index optimisation).
Batch Size	Number	No	1000	Number of rows per INSERT batch (1-100000).
Timeout (seconds)	Number	No	60	Per-execution timeout in seconds (1-3600).

Use Cases:

Stream pipeline events into an auto-created MergeTree table
Bulk-load telemetry from MQTT / OPC UA / Modbus collectors with batched VALUES
Append rows to a Distributed table fronting a cluster
Land enrichment results from upstream transform nodes into an analytics table

How Write Works Step by Step

1. Batching

Each batch generates a single INSERT INTO `table` (`col1`, `col2`) VALUES (?, ?), (?, ?) statement with multiple value tuples. If your pipeline sends 2,500 rows and Batch Size is 1,000, exactly 3 INSERT statements are executed (1000 + 1000 + 500).

2. Column Matching

Write uses case-insensitive matching between incoming data field names and table column names. For example, a data field named sensorId will match a table column SensorId or sensorid. The original column casing from the database is preserved in the generated INSERT statement.

3. Native Typed Binding for Map / Array / DateTime

ClickHouse's native protocol expects typed Go values for composite columns — not JSON strings. The Write function converts pipeline data accordingly:

Pipeline Value	Bound As	Target Column Type
`{"unit": "celsius", "loc": "line1"}`	`map[string]string`	`Map(String, String)`
`["temperature", "critical"]`	`[]string`	`Array(String)`
`"2026-05-18T12:30:00Z"`	`time.Time`	`DateTime` / `DateTime64`
Scalars (number, bool, string, UUID-shaped string)	Native Go	`Int` / `Float` / `Bool` / `String` / `UUID`
Nested array of objects (`[{...}, {...}]`)	JSON string	`String` / `JSON` (fallback)

Map / Array element types

The Write function emits map[string]string and []string regardless of input element type — non-string values are %v-stringified to fit. For numeric-element columns like Array(Int64) or Map(String, Float64), send already-typed elements from upstream, or define those columns as String and parse on read.

4. Table Exists — Normal Flow

The function queries system.tables to check if the target table exists in the current database
If the table exists, it fetches all column metadata (name, type) from system.columns
It matches incoming data fields to table columns (case-insensitive)
Fields in data that don't match any column are silently skipped — they are tracked and returned in the response metadata as skippedFields
Batch INSERT statements are generated and executed using only the matched columns

5. Table Does Not Exist

Create Table If Not Exists	Behavior
Disabled (default)	Returns an error immediately: "table 'X' does not exist. Enable 'Create table if not exists' or create it manually". No insert is attempted.
Enabled, no schema provided	Types are inferred from the first row of data using the rules below. A `CREATE TABLE … ENGINE = MergeTree ORDER BY …` statement is executed, then all rows are inserted.
Enabled, with schema	The Schema Hints object is used for column types. The MergeTree `ORDER BY` comes from the dedicated field, defaulting to `tuple()`.

Data Type Inference — When no schema is provided, Write infers Nullable column types from the first row:

Data Value	Inferred Type	Notes
`true` / `false`	`Nullable(Bool)`	JSON boolean values
`42`, `-7`, `1000`	`Nullable(Int64)`	Whole numbers (no fractional part)
`3.14`, `0.5`	`Nullable(Float64)`	Numbers with fractional part
`"2024-01-15T10:30:00Z"`	`Nullable(DateTime64(3))`	Strings matching ISO 8601 / RFC 3339
`"hello"`, `"sensor-01"`	`Nullable(String)`	All other strings
`null`	`Nullable(String)`	Null values default to nullable string
`{"k": "v"}`, `[1, 2]`	`Nullable(String)`	Complex objects/arrays land as nullable string — define explicit `Map(String, String)` / `Array(String)` columns in Schema Hints for native typed storage

Sort-key Columns Are Non-Nullable

ClickHouse MergeTree rejects Nullable columns in ORDER BY (without the allow_nullable_key=1 setting). When you specify ORDER BY for auto-create, the Write function automatically strips the Nullable(…) wrapper from those columns. Make sure the rows you send always populate those fields — missing values for sort-key columns will fail.

6. Engine Clause

Auto-created tables always use ENGINE = MergeTree with the operator-supplied ORDER BY (or tuple() if unspecified). To use ReplacingMergeTree, SummingMergeTree, a Distributed table, or any other engine, create the table manually via the Execute function first and leave Create Table If Not Exists disabled.

7. Partial Failure

Each batch executes independently — there is no transaction wrapping all batches. If batch 3 of 5 fails, batches 1 and 2 have already been committed. The error response still includes rowsInserted showing how many rows succeeded before the failure.

Response Metadata

On success, the response includes:

Field	Description
`rowsInserted`	Total number of rows successfully inserted
`matchedColumns`	Column names that had matching data fields
`skippedFields`	Data fields that didn't match any table column (if any)
`tableCreated`	`true` if the table was created during this call
`batchSize`	Batch size used
`totalRows`	Total number of input rows

Using Parameters

MaestroHub detects the ((parameterName)) syntax and exposes each parameter for validation and runtime binding for Query and Execute functions.

Configuration	Description	Example
Type	Enforce data types for incoming values	`string`, `number`, `boolean`, `date`, `array`
Required	Mark parameters as mandatory or optional	Required / Optional
Default Value	Fallback value applied when callers omit a parameter	`now()`, `0`, `'active'`
Description	Add guidance for downstream users	"Start date for the report"

Pipeline Integration

Use the ClickHouse functions you create here as nodes inside the Pipeline Designer to integrate your ClickHouse warehouse with the rest of your operations stack. Drag the appropriate node type onto the canvas, bind its parameters to upstream outputs or constants, and configure connection-level options without leaving the designer.

Each function type maps to a dedicated pipeline node — Query, Execute, and Write — so you can clearly separate read, write, and bulk-loading operations in your flows.

For node-level configuration details — input shape, output shape, and per-op behavior — see the ClickHouse Nodes reference.

Common Use Cases

Query: Hourly Sensor Rollups

Scenario: Generate per-sensor hourly averages over the last day for a real-time dashboard.

SELECT
    sensor_id,
    toStartOfHour(timestamp) AS hour,
    round(avg(temperature), 2) AS avg_temp,
    round(avg(humidity), 2)    AS avg_humidity,
    count() AS sample_count
FROM sensors
WHERE timestamp > now() - INTERVAL 24 HOUR
  AND sensor_id IN ((sensorIds))
GROUP BY sensor_id, hour
ORDER BY sensor_id, hour;

Pipeline Integration: Feed the result into a dashboard widget, an alert node, or a downstream transform that joins with metadata.

Query: Latest Reading per Device with Join

Scenario: Combine the latest sensor reading with static device metadata.

SELECT
    s.sensor_id,
    m.model,
    m.site,
    argMax(s.temperature, s.timestamp) AS latest_temp,
    max(s.timestamp)                   AS latest_ts
FROM sensors s
INNER JOIN sensor_metadata m USING (sensor_id)
GROUP BY s.sensor_id, m.model, m.site;

Pipeline Integration: Use the result to populate a device-status table or feed a state-change detector.

Execute: Materialise a Daily Rollup

Scenario: Persist an aggregated daily table from raw events, run once per day.

INSERT INTO events_daily
SELECT
    toDate(timestamp) AS day,
    event_type,
    country,
    count()           AS events,
    uniq(session_id)  AS sessions
FROM events
WHERE timestamp >= ((startDate)) AND timestamp < ((endDate))
GROUP BY day, event_type, country;

Pipeline Integration: Schedule with a Cron Trigger node; the result table powers BI dashboards without rescanning raw events.

Execute: Maintenance — Force Merge

Scenario: Trigger a background merge before a heavy reporting window for tables with frequent inserts.

OPTIMIZE TABLE metrics_daily FINAL;

Pipeline Integration: Run via a manual trigger or schedule it ahead of a BI refresh job.

Write: Bulk Loading Telemetry

Scenario: Land high-throughput sensor data from MQTT/OPC UA collectors into a MergeTree table.

Configure a Write function with:

Table Name: sensors
Create Table If Not Exists: enabled
ORDER BY: sensor_id, timestamp
Batch Size: 1000

Connect it after data-collection nodes (MQTT, OPC UA, Modbus, etc.) in your pipeline. The Write function automatically maps incoming fields (sensor_id, timestamp, temperature, humidity, tags, metadata) to columns. Native Map(String, String) and Array(String) columns are bound directly without JSON encoding — a metadata field of {"unit": "celsius"} lands as a queryable Map, not a JSON-string blob.

Overview​

Connection Configuration​

Creating a ClickHouse Connection​

ClickHouse Connection Creation Fields​

1. Profile Information​

2. Database Configuration​

3. Basic Authentication​

4. TLS Settings​

4a. Enable TLS​

4b. Skip TLS Verification​

4c. Certificate Configuration​

5. Wire Protocol​

6. Connection Pool Settings​

7. Retry Configuration​

8. Connection Labels​

Function Builder​

Creating ClickHouse Functions​

Query Function​

Execute Function​

Write Function​

How Write Works Step by Step​

Response Metadata​

Using Parameters​

Pipeline Integration​

Common Use Cases​

Query: Hourly Sensor Rollups​

Query: Latest Reading per Device with Join​

Execute: Materialise a Daily Rollup​

Execute: Maintenance — Force Merge​

Write: Bulk Loading Telemetry​

Overview

Connection Configuration

Creating a ClickHouse Connection

ClickHouse Connection Creation Fields

1. Profile Information

2. Database Configuration

3. Basic Authentication

4. TLS Settings

4a. Enable TLS

4b. Skip TLS Verification

4c. Certificate Configuration

5. Wire Protocol

6. Connection Pool Settings

7. Retry Configuration

8. Connection Labels

Function Builder

Creating ClickHouse Functions

Query Function

Execute Function

Write Function

How Write Works Step by Step

Response Metadata

Using Parameters

Pipeline Integration

Common Use Cases

Query: Hourly Sensor Rollups

Query: Latest Reading per Device with Join

Execute: Materialise a Daily Rollup

Execute: Maintenance — Force Merge

Write: Bulk Loading Telemetry