Skip to main content
Version: 2.0

Troubleshooting Guide

This guide walks you through the monitoring and diagnostic tools available in MaestroHub. Whether you need to verify a new connection, investigate a pipeline failure, or track system health over time, the sections below explain where to look and what each page provides.

Overview Dashboard

Navigation: Home

The Overview Dashboard is the first screen you see after logging in. It provides a real-time snapshot of three core subsystems in a single view: Connection Health, Pipeline Metrics, and UNS Status.

Overview dashboard showing Connection Health, Pipeline Metrics, and UNS Status sections

The Overview Dashboard with Connection Health, Pipeline Metrics, and UNS Status at a glance.

Connection Health

MetricDescription
ConnectedNumber of active connections out of total (e.g. 4/10).
Current IssuesConnections that are not in a healthy state and need attention.
Failures (24h)Total failure events in the last 24 hours, with state change count.
ActivityTotal state changes across all connections in the last 24 hours.

An Issues Detected badge appears in the header when one or more connections require attention. Click the refresh button to update the data manually.

Pipeline Metrics

MetricDescription
Active PipelinesNumber of pipelines with recent execution activity.
Total ExecutionsCumulative execution count with throughput (runs/sec).
Success RatePercentage of successful executions, with success and failure counts.
Current LoadNumber of executions currently in progress.

A time indicator shows how long the system has been collecting metrics (e.g. "Since: 3m 14s ago") and an All-time toggle lets you view lifetime statistics.

UNS Status

MetricDescription
BrokerHealth status and type of the MQTT broker (e.g. external).
StorageHealth status and backend (e.g. timescale).
TopicsTotal topic count with active topic count.
UptimeTime elapsed since the last UNS service restart.

A Healthy badge in the header confirms that the UNS subsystem is operating normally.


System Logs

Navigation: System Management → Logs

Overview

The System Logs page gives you centralized access to log records from all MaestroHub backend services. Use it to verify operations, diagnose errors, trace execution flows, and monitor service health.

System Logs interface with search, level filters, and log results

System Logs viewer with time range, search, level filter chips, and paginated results.

Time Range

Every log query requires a time range. Presets are available for quick selection, or you can define a custom window.

PresetWhen to Use
Last 15 minutesQuick check for recent activity or to verify an operation you just performed.
Last hourStandard troubleshooting starting point when you are not sure when the issue occurred.
Last 6 hoursInvestigating intermittent or recurring problems that do not appear in shorter windows.
Last 24 hoursFull-day analysis for pattern recognition or investigating issues reported hours ago.
Custom rangeSpecify exact start and end date/time for precise investigation.
Time range presets and custom range selector

Time range presets and custom range picker for filtering logs.

The custom range picker includes a calendar for date selection, hour/minute/second fields, quick-set buttons (00:00:00, 12:00:00, 23:59:59), and timezone display with UTC offset.

The search bar searches across all log message content. Type a keyword or phrase and press Enter. Search is case-insensitive and works together with level filters and the selected time range — all criteria apply simultaneously. Clear the search with the X button to reset results.

Level Filter Chips

Below the search bar, clickable level chips filter logs by severity. Multiple levels can be selected at the same time. When no level is selected, all levels are shown.

Log level filter chips

Level chips: PANIC, FATAL, ERROR, WARN, INFO, DEBUG, and TRACE.

LevelDescriptionWhen to Focus
PANICSystem-wide failure that may cause the service to become unresponsive.Immediate investigation. Rare but critical.
FATALCritical failure that stops a service component.Immediate investigation. Check whether the affected function is required for your workflow.
ERROROperation failures — failed connections, execution errors, timeouts.Primary troubleshooting level. Start here when diagnosing issues.
WARNPotential problems — retry attempts, deprecation warnings, approaching limits.Monitor regularly. Warnings often precede errors.
INFONormal operations — successful connections, completed executions, status updates.Use to confirm expected behaviour.
DEBUGDetailed internal diagnostics — request/response payloads, intermediate steps.Deep troubleshooting when ERROR and WARN lack context.
TRACEStep-by-step execution traces at the lowest level.Specific deep investigations only. Very high volume.

Log Table

ColumnDescription
TimestampWhen the event occurred, in MMM DD, HH:mm:ss.SSS format. Sortable.
LevelSeverity badge with color coding. Sortable.
MessagePrimary log content describing what happened.

Below the message, log entries may display additional context tags:

TagPurpose
connection_id / connection_nameIdentify the connection involved.
pipeline_id / pipeline_nameIdentify the pipeline involved.
execution_idTrace a single pipeline execution end-to-end.
node_id / node_nameIdentify the specific node that produced the log.
durationHow long the operation took.
errorDetailed error description or stack trace.

Pagination

SettingValue
Page size options50, 100 (default), 250, 500 records per page
Maximum result window10,000 records — when reached, an orange "10K limit reached" badge appears and further pagination is disabled
Data freshnessLogs are considered fresh for 10 seconds; after that, interacting with the page triggers a background refresh

If you hit the 10,000 record limit, narrow the time range, add level filters, or use specific search terms to reduce the result set.


Pipeline Troubleshooting

MaestroHub provides several tools for monitoring and debugging pipelines directly from the Pipeline Designer canvas: an Execution History sidebar, an Execution Replay mode, a Pipeline Metrics panel, and detailed execution dialogs.

Execution History Sidebar

Navigation: Pipeline Designer → History button (toolbar)

Click History in the pipeline toolbar to open a sidebar on the right showing all executions for the current pipeline. The sidebar auto-refreshes every 10 seconds with a visible countdown timer.

Pipeline execution history sidebar with completed runs

Execution History sidebar showing recent pipeline runs with status, trigger type, duration, and node counts.

Filters: Use the status tabs (All, Running, Completed, Failed) and the time range dropdown (default: last 24 hours) to narrow the list.

Each execution row shows:

  • Status indicator — colored stripe on the left (green = Completed, red = Failed, blue = Running, yellow = Partial, gray = Queued, orange = Cancelled)
  • Time — when the execution started, with full date in tooltip
  • Trigger type — Manual, Webhook, or Schedule (shown as icon)
  • Duration — total execution time; "Running..." animation for active executions
  • Node statisticsX/Y nodes completed, with failed count in red if any
  • Error preview — first 60 characters of the error message for failed executions

Click an execution to select it, then use View on Canvas to enter Execution Replay mode. The sidebar loads 30 executions by default; click Load More to fetch additional entries.

Execution Replay

Select an execution from the sidebar and click View on Canvas to enter Execution Replay mode. This overlays the historical execution results onto the pipeline canvas so you can visually inspect what happened.

Execution Replay mode with node inspector and replay panel

Execution Replay showing node input/output data on the left and the node list on the right.

Replay header shows the Execution ID, status badge, duration, start time, pipeline version, snapshot mode indicator, and node statistics (success/failure counts). Click Exit Replay to return to normal editing mode.

Replay Node List on the right lists all nodes in the execution with their status and duration. Sort by Order (default), Fastest, or Slowest. Click a node to open its inspector panel.

Node Inspector shows:

  • Overview — status, duration, and timing
  • Input Data — the data the node received, including variables used and source nodes
  • Output Data — the data the node produced

If the execution ran on an older pipeline version, a yellow warning banner indicates how many nodes were added or removed since then.

Shareable URL — when you enter Replay mode, the URL updates with ?execution={id}. Share this URL and the recipient will open the same execution in Replay mode.

Pipeline Metrics Panel

Navigation: Pipeline Designer → Metrics button (toolbar)

Click Metrics in the toolbar to open a metrics panel showing real-time performance statistics for the current pipeline. The panel auto-refreshes every 5 seconds.

Pipeline metrics panel showing success rate, run counts, and percentile timings

Pipeline Metrics panel with success rate, run counts, average time, and P50/P95/P99 latencies.

MetricDescription
Success RatePercentage of successful executions (green ≥95%, yellow ≥80%, red below 80%).
Success / FailedCounts of successful and failed executions.
Total RunsTotal number of executions since the system started.
Avg TimeAverage execution duration.
P50Median execution time (50th percentile).
P9595th percentile execution time.
P9999th percentile execution time.

All Executions View

Navigation: Orchestrate → Execution History

The All Executions page shows a global table of all pipeline executions across the entire system. Use this when you need to search across pipelines or do not know which pipeline produced a specific execution.

All Executions table with status, pipeline, timing, and trigger columns

Global Execution History table with filters, search, and sortable columns.

ColumnDescription
Execution IDUnique UUID with a copy button. Use this to search in System Logs.
PipelineName of the pipeline that ran. Click to filter by that pipeline.
StatusColor-coded badge — Completed, Failed, Running, Pending, Cancelled, Paused, or Partial.
Started / EndedTimestamps for execution start and end.
DurationTotal time from start to finish. Sort to find the slowest runs.
Triggered ByHow the execution was started — Schedule, Manual, Webhook, or API.

Actions: View on Canvas opens the execution in Replay mode on the Pipeline Designer. Rerun launches the same pipeline configuration again. Cancel stops a running execution.

Filters: Status dropdown, Pipeline dropdown, and Execution ID search. Click column headers to sort.

Execution History Retention

Detailed logs for executions older than 2 hours are removed automatically. You can still see high-level information (status, duration, metadata) for older executions, but node-level input/output data will no longer be available.

Pipeline Advanced Settings

The Edit Pipeline → Advanced tab contains settings that affect troubleshooting and execution behaviour.

Edit Pipeline Advanced tab with Execution Mode, Priority, and Execution History Detail

Advanced pipeline settings: Execution Mode, Priority, and Execution History Detail level.

Execution ModeParallel (default, recommended): independent nodes run concurrently. Sequential: nodes run one after another in order.

PriorityHigh: critical pipelines execute first. Normal (default): standard priority. Low: background pipelines that can wait.

Execution History Detail — controls how much data is captured during execution:

LevelWhat is RecordedUse Case
MinimalExecution status only (started/completed/failed).Production pipelines where you only need to know if it ran. Fastest performance, least storage.
Standard (default)Status + per-node results (success/fail, duration).Recommended for debugging. Shows which node failed without storing full payloads.
FullEverything including node input/output data.Complete debug capability. Uses more storage. Recommended for development and testing.
note

Changing this setting only affects new executions. Previous executions retain whatever detail level was active when they ran.

Error Handling tab — configure default timeout (1–600s, default 30s), retry on timeout, retry on failure, and error behaviour (Stop Pipeline or Continue Execution). When retry is enabled, additional settings control max attempts (1–10), initial delay, max delay, backoff multiplier, and jitter factor.


Connection Status

Navigation: Connect → Connections

Overview

The Status column in the connections table shows real-time connectivity and health information for every configured connection. The system continuously monitors connections through a runtime supervisor that tracks state transitions, manages automatic reconnection, and provides circuit-breaker protection. Status data refreshes every 10 seconds.

Connection status column showing Connected and Failed states with error messages

Connection status column with mixed Connected and Failed states, including inline error messages and action buttons.

Connection States

StateIndicatorDescription
ConnectedGreen dotActive connection established and operational. Shows "Connected X ago" with elapsed time.
IdleGray dotConnection registered but not started yet. Initial state after creating a new connection.
ConnectingYellow pulsing dotInitial connection attempt in progress. If it persists beyond 30 seconds, investigate network or configuration issues.
ReconnectingBlue pulsing dotConnection lost, automatic reconnection in progress. Shows retry count (e.g. "Retry #3") and the triggering error message. Uses exponential backoff.
UnhealthyOrange dotConnected but failing health checks. Pipelines may still execute with degraded performance.
SuspendedAmber dotAutomatic reconnection paused — operator intervention required. A Resume button appears inline.
FailedRed dotAll reconnection attempts exhausted. Shows the final error message. A Connect button appears inline.
DrainingPurple dotGraceful shutdown in progress. Active operations are completing before the connection closes.
DisconnectedGray dotNo active connection. Connection pool is empty.

Health Status

HealthMeaning
HealthyFully operational, passing all health checks.
DegradedOperational but experiencing performance or intermittent issues.
UnhealthyNot operational or has critical issues.
UnknownCannot be determined — connection is in a transitional state (Idle, Connecting, Reconnecting).

When a connection is in a problematic state, the error message appears directly below the status indicator. Common patterns: connection refused (service not running), timeout (network latency or firewall), authentication failure (wrong credentials), TLS/SSL errors (certificate issues), DNS resolution failure (bad hostname).

Circuit Breaker

Each connection is protected by a circuit breaker that prevents cascading failures.

StateMeaning
ClosedNormal operation. Requests pass through. Failures are counted in the background.
OpenToo many consecutive failures. Requests are blocked. An "On Hold" badge appears. Shows failure count, opened time, and next retry time.
Half-OpenAfter cooldown, a limited number of test requests are allowed. Success returns to Closed; failure returns to Open.

Use Reset Circuit Breaker from the connection's dropdown menu to force it back to Closed immediately after fixing the underlying issue.

Connection Actions

ActionAvailable WhenWhat It Does
ConnectFailed, Idle, DisconnectedStarts a fresh connection attempt.
Resume ReconnectionSuspendedResumes paused automatic reconnection.
Force ReconnectFailed, Suspended, Idle, DisconnectedFull stop and restart of the connection. Use after configuration changes.
Reset Circuit BreakerCircuit breaker OpenResets to Closed, allowing requests immediately.
Test ConnectionAll statesOne-time connectivity test without affecting the running connection.
View Status HistoryAll statesOpens the Connection Health page filtered to this connection.

Connection Health Dashboard

Navigation: Connect → Connection Health

The Connection Health dashboard provides a historical view of connection state changes across the entire deployment. While the connections table shows current status, this dashboard reveals patterns over time.

Connection Health dashboard with statistics, state changes, and failure rankings

Connection Health dashboard showing current issues, connectivity ratio, failures, state changes, and connections with failures.

Time range options: Last 10 minutes, 1 hour, 6 hours, 24 hours (default), 3 days, or 7 days (maximum).

Auto-refresh: Summary statistics refresh every 30 seconds, state changes every 15 seconds, runtime status every 10 seconds. Toggle off when analysing a specific point in time.

Statistics cards:

MetricDescription
Current IssuesConnections not in Connected or Idle state right now.
ConnectedRatio of connected to total connections with percentage.
FailuresTotal failure events in the selected time period.
State ChangesTotal state transitions. A high number relative to connection count indicates instability.

Recent State Changes (left panel) shows the most recent 50 state transitions with connection name, connector logo, state transition badges (e.g. Connected → Failed), relative timestamp, and error message when applicable.

Connections with Failures (right panel) ranks connections by failure count for the selected period. Columns: connection name with live status dot, failure count, uptime percentage (green ≥99%, yellow 95–98.9%, red below 95%), and total state changes.

Per-Connection Health Tab

Navigation: Connect → Select Connection → Edit → Health tab

Each connection has a dedicated Health tab with detailed historical statistics and a visual timeline.

Per-connection Health tab showing uptime, failures, status timeline, and state changes

Health tab for an MQTT connection with uptime, failures, recovery stats, status timeline, and state change history.

Statistics: Uptime percentage, failure count, average recovery time, and current live status — all for the selected time range.

Status Timeline: A color-coded bar spanning the selected period shows exactly when the connection was in each state (green = Connected, yellow = Connecting, orange = Reconnecting, gray = Disconnected/Idle, red = Failed).

State Change Events: Scrollable list of up to 50 events showing state transitions with timestamps, error messages, and duration since the previous change.

Connection Limits

LimitValue
Maximum history range7 days
Recent state changes display50 events
Aggregate events (system dashboard)500 events
Top unstable connections50 connections
Connection profiles in aggregate view100 profiles
History states tracked6 of 9 (Connected, Connecting, Reconnecting, Disconnected, Failed, Idle)
note

The live status column supports all 9 runtime states, but historical tracking records only 6. States like Suspended appear as Disconnected in the timeline. Use live status for current state and history for trend analysis.