Skip to main content
Version: 2.7-dev

ClickHouse ClickHouse Integration Guide

Connect MaestroHub pipelines to ClickHouse for high-throughput columnar analytics, time-series rollups, and bulk ingestion. This guide covers connection setup, TLS / mutual TLS configuration, and the three function types (Query, Execute, Write) tailored for ClickHouse workloads.

Overview

The ClickHouse connector brings columnar OLAP capabilities to your MaestroHub pipelines:

  • Read data with SELECT queries — aggregations, joins, window functions, top-K with LIMIT BY
  • Execute DML/DDLALTER TABLE, OPTIMIZE … FINAL, INSERT … SELECT, TRUNCATE
  • Bulk-write pipeline rows — batched VALUES inserts with optional MergeTree auto-create
  • Native column typesMap(K, V), Array(T), DateTime64, UUID, IPv4 / IPv6 bound directly without JSON round-tripping
  • Secure connections — system trust store by default, custom CA bundle, or mutual TLS with client cert + key
  • Wire compression — LZ4 (fast) or ZSTD (smaller) over the native TCP protocol
Native TCP Protocol

ClickHouse exposes a native TCP protocol on port 9000 (plaintext) or 9440 (TLS). MaestroHub connects over the native protocol for full type fidelity and pipelined batch writes — the HTTP interface on port 8123 is not used.

Connection Configuration

Creating a ClickHouse Connection

Navigate to ConnectionsNew ConnectionClickHouse and configure the following:

ClickHouse Connection Creation Fields

1. Profile Information
FieldDefaultDescription
Profile Name-A descriptive name for this connection profile (required, max 100 characters)
Description-Optional description for this ClickHouse connection
2. Database Configuration
FieldDefaultDescription
HostlocalhostClickHouse server hostname or IP address (required)
Port9000ClickHouse native TCP port — 9000 plaintext, 9440 TLS (1-65535, required)
DatabasedefaultDatabase name to connect to (required)
Connect Timeout (sec)30Maximum time to wait for connection establishment (1-300 seconds)
Default Port

ClickHouse uses port 9000 for the native protocol, not the HTTP port 8123. When you enable TLS, switch to 9440 (the standard ClickHouse TLS port) — the form shows a reminder banner.

3. Basic Authentication
FieldDefaultDescription
UsernamedefaultClickHouse username (optional)
Password-ClickHouse user password (optional)
Anonymous Authentication

Stock ClickHouse ships with a default user that has no password. Both Username and Password are optional in MaestroHub — leave both empty for an anonymous connection to the default user, or fill them to authenticate as another role. Most managed ClickHouse offerings (ClickHouse Cloud, Altinity) require credentials.

4. TLS Settings
4a. Enable TLS
FieldDefaultDescription
Enable TLSfalseUse encrypted native protocol (typically on port 9440)
4b. Skip TLS Verification

(Only displayed when TLS is enabled)

FieldDefaultDescription
Skip TLS VerificationfalseDo not verify the server certificate. Development only — disables both hostname and CA validation
4c. Certificate Configuration

(Only displayed when TLS is enabled)

FieldDefaultDescription
Server CA Certificate-PEM-encoded CA bundle that signed the ClickHouse server certificate. Required for self-signed or internal CAs not in the system trust store; leave empty for public CAs (Let's Encrypt, etc.)
Client Certificate-PEM-encoded client certificate for mutual TLS (mTLS) — fill both Client Certificate and Private Key together
Client Private Key-PEM-encoded private key matching the client certificate. Stored encrypted at rest
Cert Upload

Each certificate field has an Upload button that reads a .pem file from disk and validates the format inline. The Private Key field is masked by default — toggle the eye icon to reveal during paste, then re-hide.

TLS Behavior Matrix
ConfigurationBehavior
TLS offPlain TCP — no encryption
TLS on, Skip Verify off, no CAEncryption + system trust store + hostname validation (recommended for public-CA servers)
TLS on, Skip Verify off, custom CAEncryption + custom CA validation + hostname check (self-signed / internal CA)
TLS on, Skip Verify onEncryption only — no server validation. Disables the Server CA field
TLS on, Client Cert + Key setAdds mutual TLS authentication on top of any of the above
5. Wire Protocol
FieldDefaultDescription
Compressionlz4Wire compression for the native protocol (none / lz4 / zstd). LZ4 is the speed/size sweet spot; ZSTD trades CPU for smaller payloads
Read Timeout (sec)60Per-query read timeout in seconds (1-3600)
Dial Timeout (sec)10TCP dial timeout in seconds (1-300)
6. Connection Pool Settings
FieldDefaultDescription
Max Open Connections10Maximum number of simultaneous database connections (1-1000)
Max Idle Connections5Idle connections kept ready for reuse (0-100, 0 = let the driver use its built-in default)
Connection Max Lifetime (sec)900Maximum age of a connection before it is recycled (0-86400 seconds, 0 = use driver default)
Connection Max Idle Time (sec)300How long an idle connection may remain unused before being closed (0-86400 seconds, 0 = use driver default)

Connection pool settings help optimize ClickHouse performance by balancing concurrency, resource usage, and latency across workloads.

7. Retry Configuration
FieldDefaultDescription
Retries3Number of connection retry attempts (0-10, 0 = no retry)
Retry Delay (ms)100Initial delay between retry attempts in milliseconds (0-60000 ms)
Retry Backoff Multiplier2Exponential factor for retry delay growth (1-10, e.g., 2 means delay doubles each retry)

Retry Behavior Example

  • 1st retry: wait 100ms
  • 2nd retry: wait 200ms (100 × 2)
  • 3rd retry: wait 400ms (200 × 2)
8. Connection Labels
FieldDefaultDescription
Labels-Key-value pairs to categorize and organize this ClickHouse connection (max 10 labels)

Example Labels

  • env: prod — Environment
  • team: analytics — Responsible team
  • cluster: warehouse-eu — Cluster identity
Notes
  • Required Fields: All fields described as "required" must be filled.
  • Optional Auth: Stock ClickHouse allows the default user to connect without a password. Make Username/Password mandatory in your own deployment via users.xml.
  • TLS / mTLS: A custom CA bundle is only consulted when Skip TLS Verification is off. Client certificates are still sent to the server even with Skip Verify on, but the server is no longer authenticated to you — only enable in dev/test environments.
  • Connection Pooling: Manages concurrency with Max Open Connections, reduces latency with Max Idle Connections, mitigates stale connections via Connection Max Lifetime, and frees resources using Connection Max Idle Time.
  • Retry Logic: Implements exponential backoff — initial connection attempts use the base delay and each retry multiplies the delay by the backoff factor; ideal for transient network issues or rolling restarts of a ClickHouse cluster.
  • Timeout Values: Connect Timeout applies to initial connection establishment; Read Timeout caps individual query duration. All timeout values are in seconds unless noted (Retry Delay uses milliseconds).

Function Builder

Creating ClickHouse Functions

Once you have a connection established, you can create reusable functions for different operation types:

  1. Navigate to FunctionsNew Function
  2. Select one of the ClickHouse function types: Query, Execute, or Write
  3. Choose your ClickHouse connection
  4. Configure the function parameters
ClickHouse Function Creation

ClickHouse function creation interface showing the three function types: Query, Execute, and Write

Query Function

Purpose: Execute SELECT queries against ClickHouse and return rows as structured data.

Configuration Fields

FieldTypeRequiredDefaultDescription
SQL QueryStringYes-SQL SELECT statement to execute. Supports parameterized queries with ((param)) syntax.
Timeout (seconds)NumberNo30Per-execution timeout in seconds (1-3600).

Use Cases:

  • Aggregate metrics with GROUP BY over wide MergeTree tables
  • Join fact tables with dictionaries for enrichment
  • Window functions over ordered event streams
  • Top-K analytics using LIMIT BY
  • Time-series rollups with toStartOfInterval() / toStartOfHour()

Execute Function

Purpose: Execute DML/DDL statements (INSERT, ALTER, CREATE, OPTIMIZE, TRUNCATE, etc.) on ClickHouse. Use for ad-hoc writes, schema migrations, and table maintenance.

Configuration Fields

FieldTypeRequiredDefaultDescription
SQL StatementStringYes-DML/DDL SQL statement to execute. Supports parameterized queries with ((param)) syntax.
Timeout (seconds)NumberNo30Per-execution timeout in seconds (1-3600).

Use Cases:

  • INSERT INTO … SELECT to materialize an aggregation into a new table
  • ALTER TABLE … ADD COLUMN for schema evolution
  • OPTIMIZE TABLE … FINAL to merge MergeTree parts on demand
  • TRUNCATE TABLE for fast purges
  • CREATE MATERIALIZED VIEW to maintain a continuously-updated rollup

Write Function

Purpose: Bulk-insert pipeline data into a ClickHouse table. This is not a SQL editor — it loads structured rows from your pipeline into the target table.

Load data into ClickHouse with two paths: an existing table (auto-detected schema mapping) or an auto-created MergeTree table with operator-specified ORDER BY. Supports batching for efficient bulk loading.

Configuration Fields

FieldTypeRequiredDefaultDescription
Table NameStringYes-Target table to insert data into.
DataAnyYes-Row payload — array of objects, single object, or ((data)) template pulling from upstream pipeline output.
Schema HintsObjectNo-Type hints for table creation ({ "field": "Type" }). Only consulted when auto-creating the table.
Create Table If Not ExistsBooleanNofalseAuto-create the table if it does not exist (MergeTree engine). When disabled, missing tables produce a clear error instead of inserting.
ORDER BY (for auto-create)StringNo-Comma-separated columns for the MergeTree ORDER BY clause when auto-creating the table. Leave empty for tuple() (no sort key — allowed, but disables ClickHouse's primary-index optimisation).
Batch SizeNumberNo1000Number of rows per INSERT batch (1-100000).
Timeout (seconds)NumberNo60Per-execution timeout in seconds (1-3600).

Use Cases:

  • Stream pipeline events into an auto-created MergeTree table
  • Bulk-load telemetry from MQTT / OPC UA / Modbus collectors with batched VALUES
  • Append rows to a Distributed table fronting a cluster
  • Land enrichment results from upstream transform nodes into an analytics table

How Write Works Step by Step

1. Batching

Each batch generates a single INSERT INTO `table` (`col1`, `col2`) VALUES (?, ?), (?, ?) statement with multiple value tuples. If your pipeline sends 2,500 rows and Batch Size is 1,000, exactly 3 INSERT statements are executed (1000 + 1000 + 500).

2. Column Matching

Write uses case-insensitive matching between incoming data field names and table column names. For example, a data field named sensorId will match a table column SensorId or sensorid. The original column casing from the database is preserved in the generated INSERT statement.

3. Native Typed Binding for Map / Array / DateTime

ClickHouse's native protocol expects typed Go values for composite columns — not JSON strings. The Write function converts pipeline data accordingly:

Pipeline ValueBound AsTarget Column Type
{"unit": "celsius", "loc": "line1"}map[string]stringMap(String, String)
["temperature", "critical"][]stringArray(String)
"2026-05-18T12:30:00Z"time.TimeDateTime / DateTime64
Scalars (number, bool, string, UUID-shaped string)Native GoInt* / Float* / Bool / String / UUID
Nested array of objects ([{...}, {...}])JSON stringString / JSON (fallback)
Map / Array element types

The Write function emits map[string]string and []string regardless of input element type — non-string values are %v-stringified to fit. For numeric-element columns like Array(Int64) or Map(String, Float64), send already-typed elements from upstream, or define those columns as String and parse on read.

4. Table Exists — Normal Flow

  1. The function queries system.tables to check if the target table exists in the current database
  2. If the table exists, it fetches all column metadata (name, type) from system.columns
  3. It matches incoming data fields to table columns (case-insensitive)
  4. Fields in data that don't match any column are silently skipped — they are tracked and returned in the response metadata as skippedFields
  5. Batch INSERT statements are generated and executed using only the matched columns

5. Table Does Not Exist

Create Table If Not ExistsBehavior
Disabled (default)Returns an error immediately: "table 'X' does not exist. Enable 'Create table if not exists' or create it manually". No insert is attempted.
Enabled, no schema providedTypes are inferred from the first row of data using the rules below. A CREATE TABLE … ENGINE = MergeTree ORDER BY … statement is executed, then all rows are inserted.
Enabled, with schemaThe Schema Hints object is used for column types. The MergeTree ORDER BY comes from the dedicated field, defaulting to tuple().

Data Type Inference — When no schema is provided, Write infers Nullable column types from the first row:

Data ValueInferred TypeNotes
true / falseNullable(Bool)JSON boolean values
42, -7, 1000Nullable(Int64)Whole numbers (no fractional part)
3.14, 0.5Nullable(Float64)Numbers with fractional part
"2024-01-15T10:30:00Z"Nullable(DateTime64(3))Strings matching ISO 8601 / RFC 3339
"hello", "sensor-01"Nullable(String)All other strings
nullNullable(String)Null values default to nullable string
{"k": "v"}, [1, 2]Nullable(String)Complex objects/arrays land as nullable string — define explicit Map(String, String) / Array(String) columns in Schema Hints for native typed storage
Sort-key Columns Are Non-Nullable

ClickHouse MergeTree rejects Nullable columns in ORDER BY (without the allow_nullable_key=1 setting). When you specify ORDER BY for auto-create, the Write function automatically strips the Nullable(…) wrapper from those columns. Make sure the rows you send always populate those fields — missing values for sort-key columns will fail.

6. Engine Clause

Auto-created tables always use ENGINE = MergeTree with the operator-supplied ORDER BY (or tuple() if unspecified). To use ReplacingMergeTree, SummingMergeTree, a Distributed table, or any other engine, create the table manually via the Execute function first and leave Create Table If Not Exists disabled.

7. Partial Failure

Each batch executes independently — there is no transaction wrapping all batches. If batch 3 of 5 fails, batches 1 and 2 have already been committed. The error response still includes rowsInserted showing how many rows succeeded before the failure.

Response Metadata

On success, the response includes:

FieldDescription
rowsInsertedTotal number of rows successfully inserted
matchedColumnsColumn names that had matching data fields
skippedFieldsData fields that didn't match any table column (if any)
tableCreatedtrue if the table was created during this call
batchSizeBatch size used
totalRowsTotal number of input rows

Using Parameters

MaestroHub detects the ((parameterName)) syntax and exposes each parameter for validation and runtime binding for Query and Execute functions.

ConfigurationDescriptionExample
TypeEnforce data types for incoming valuesstring, number, boolean, date, array
RequiredMark parameters as mandatory or optionalRequired / Optional
Default ValueFallback value applied when callers omit a parameternow(), 0, 'active'
DescriptionAdd guidance for downstream users"Start date for the report"

Pipeline Integration

Use the ClickHouse functions you create here as nodes inside the Pipeline Designer to integrate your ClickHouse warehouse with the rest of your operations stack. Drag the appropriate node type onto the canvas, bind its parameters to upstream outputs or constants, and configure connection-level options without leaving the designer.

Each function type maps to a dedicated pipeline node — Query, Execute, and Write — so you can clearly separate read, write, and bulk-loading operations in your flows.

For node-level configuration details — input shape, output shape, and per-op behavior — see the ClickHouse Nodes reference.

Common Use Cases

Query: Hourly Sensor Rollups

Scenario: Generate per-sensor hourly averages over the last day for a real-time dashboard.

SELECT
sensor_id,
toStartOfHour(timestamp) AS hour,
round(avg(temperature), 2) AS avg_temp,
round(avg(humidity), 2) AS avg_humidity,
count() AS sample_count
FROM sensors
WHERE timestamp > now() - INTERVAL 24 HOUR
AND sensor_id IN ((sensorIds))
GROUP BY sensor_id, hour
ORDER BY sensor_id, hour;

Pipeline Integration: Feed the result into a dashboard widget, an alert node, or a downstream transform that joins with metadata.


Query: Latest Reading per Device with Join

Scenario: Combine the latest sensor reading with static device metadata.

SELECT
s.sensor_id,
m.model,
m.site,
argMax(s.temperature, s.timestamp) AS latest_temp,
max(s.timestamp) AS latest_ts
FROM sensors s
INNER JOIN sensor_metadata m USING (sensor_id)
GROUP BY s.sensor_id, m.model, m.site;

Pipeline Integration: Use the result to populate a device-status table or feed a state-change detector.


Execute: Materialise a Daily Rollup

Scenario: Persist an aggregated daily table from raw events, run once per day.

INSERT INTO events_daily
SELECT
toDate(timestamp) AS day,
event_type,
country,
count() AS events,
uniq(session_id) AS sessions
FROM events
WHERE timestamp >= ((startDate)) AND timestamp < ((endDate))
GROUP BY day, event_type, country;

Pipeline Integration: Schedule with a Cron Trigger node; the result table powers BI dashboards without rescanning raw events.


Execute: Maintenance — Force Merge

Scenario: Trigger a background merge before a heavy reporting window for tables with frequent inserts.

OPTIMIZE TABLE metrics_daily FINAL;

Pipeline Integration: Run via a manual trigger or schedule it ahead of a BI refresh job.


Write: Bulk Loading Telemetry

Scenario: Land high-throughput sensor data from MQTT/OPC UA collectors into a MergeTree table.

Configure a Write function with:

  • Table Name: sensors
  • Create Table If Not Exists: enabled
  • ORDER BY: sensor_id, timestamp
  • Batch Size: 1000

Connect it after data-collection nodes (MQTT, OPC UA, Modbus, etc.) in your pipeline. The Write function automatically maps incoming fields (sensor_id, timestamp, temperature, humidity, tags, metadata) to columns. Native Map(String, String) and Array(String) columns are bound directly without JSON encoding — a metadata field of {"unit": "celsius"} lands as a queryable Map, not a JSON-string blob.