Topic Search
MaestroHub ships a lexical topic search subsystem that makes the UNS queryable by natural-language phrases, industrial abbreviations, and cryptic tag identifiers. The search layer is purely additive: when disabled (the default), no index table rows exist and no new HTTP or MCP surface is exposed.
Search is the discovery layer that pairs with the existing data tools — use search to locate the right topic, then fetch values via MCP query_topic_data or the UNS HTTP endpoints.
What it covers
- Hybrid retrieval. Every query runs through both a lexical index (Postgres tsvector / SQLite FTS5 unicode61) and a fuzzy trigram index (pg_trgm / FTS5 trigram), merged with Reciprocal Rank Fusion so both industrial tags (TT_101) and natural-language phrases ("mixer temperature on line 2") resolve well.
- ISA-95 aware document builder. Each topic is serialized to a stable content string that includes the full slash-path, split segments, optional schema description, unit, aliases, and field-level schemas. This preserves the hierarchy cues a tokenizer needs. A leading version segment that matches <ns>v<major>.<minor> (for example mHv1.0, spBv1.0) is captured in a Version: field so the downstream ISA-95 labels (Enterprise/Site/Area/...) align with the real hierarchy rather than being offset by one. The version is still present in the Path: and Segments: fields so exact matches and pathGlob filters continue to work against the raw topic name.
- Query-side abbreviation expansion. Common industrial short forms (temp → temperature, press → pressure, rpm → speed rotation, etc.) are expanded at query time only; the index itself is identity-safe, so editing the synonym list does not require a reindex.
- Event-driven indexing. A bounded worker pool subscribes to uns.topic.created and uns.topic.deleted events, debounced per topic so rapid updates collapse to a single op. The document is rebuilt by re-reading the topic via the canonical repository, never from the event payload.
- Backfill and rebuild. An auto-backfill runs on first boot after search is enabled; operators can trigger a full rebuild at any time via POST /api/v1/uns/search/rebuild.
- Multi-tenant safety. Every query is gated by org_id at the SQL layer, never through prompt or request-body fields.
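The version-segment handling in the document builder can be illustrated with a small Go sketch. This is not the code in doc_builder.go: the regular expression, the function names, and the generic LevelN labels for levels below Area are assumptions made for illustration, and only the Path:, Segments: and Version: field names come from the description above.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// versionSegment matches <ns>v<major>.<minor>, e.g. mHv1.0 or spBv1.0.
// The exact pattern used by MaestroHub is an assumption.
var versionSegment = regexp.MustCompile(`^[A-Za-z]+v\d+\.\d+$`)

// isa95Labels lists only the levels named in the text; deeper levels get
// a generic LevelN label in this sketch.
var isa95Labels = []string{"Enterprise", "Site", "Area"}

// labelSegments splits a topic path and returns the leading version
// segment (if any) plus a {label, value} pair for each remaining segment.
func labelSegments(path string) (version string, labeled [][2]string) {
	segments := strings.Split(path, "/")
	if versionSegment.MatchString(segments[0]) {
		version, segments = segments[0], segments[1:]
	}
	for i, seg := range segments {
		label := fmt.Sprintf("Level%d", i+1)
		if i < len(isa95Labels) {
			label = isa95Labels[i]
		}
		labeled = append(labeled, [2]string{label, seg})
	}
	return version, labeled
}

// buildDocument renders a content string. Path and Segments keep the raw
// topic name, including the version, so exact matches still work.
func buildDocument(path string) string {
	version, labeled := labelSegments(path)
	var b strings.Builder
	fmt.Fprintf(&b, "Path: %s\n", path)
	fmt.Fprintf(&b, "Segments: %s\n", strings.Join(strings.Split(path, "/"), " "))
	if version != "" {
		fmt.Fprintf(&b, "Version: %s\n", version)
	}
	for _, kv := range labeled {
		fmt.Fprintf(&b, "%s: %s\n", kv[0], kv[1])
	}
	return b.String()
}

func main() {
	fmt.Print(buildDocument("mHv1.0/Acme/Berlin/Packaging/Line2"))
}
```

Without the version check, mHv1.0 would be labeled Enterprise and every real level would shift down by one, which is the offset the builder avoids.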
Enabling the feature
Search is controlled by a single block under modules.uns.defaults.search in config.yaml. Nothing changes in the deployed artifact until enabled: true is set.
modules:
  uns:
    defaults:
      search:
        enabled: false            # opt-in; ships disabled
        workers: 2                # async indexer worker count
        debounceMs: 250           # per-topic debounce window
        backfillRatePerSec: 500   # caps backfill throughput
        abbreviations:            # optional; merged over built-in defaults
          bldg: [building]
          temp: [temperature]     # an entry for an existing key replaces the built-in default
          tag: []                 # empty list disables a default
On first boot after enabled: true, the module checks whether topic_search_documents has any rows for the org. When empty, an auto-backfill runs at the configured rate and logs progress. Backfill is idempotent (UPSERT-based) and interruption-safe — restarting mid-run resumes.
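The merge semantics of the abbreviations block (configured entries overlay the built-in defaults, and an empty list disables a default) can be sketched in Go as follows. The function names are hypothetical, the built-in map contains only the three examples mentioned in this page, and keeping the original token alongside its expansions is an assumption about how query-time expansion behaves.

```go
package main

import (
	"fmt"
	"strings"
)

// builtinAbbreviations holds only the defaults named in the text.
// The real built-in list is longer and is not reproduced here.
var builtinAbbreviations = map[string][]string{
	"temp":  {"temperature"},
	"press": {"pressure"},
	"rpm":   {"speed", "rotation"},
}

// mergeAbbreviations overlays configured entries on the defaults without
// mutating them. An empty list removes the default for that key.
func mergeAbbreviations(defaults, overrides map[string][]string) map[string][]string {
	merged := make(map[string][]string, len(defaults)+len(overrides))
	for k, v := range defaults {
		merged[k] = v
	}
	for k, v := range overrides {
		if len(v) == 0 {
			delete(merged, k)
			continue
		}
		merged[k] = v
	}
	return merged
}

// expandQuery lowercases and splits the query, keeping every original
// token and appending its expansions (assumed behavior).
func expandQuery(query string, abbr map[string][]string) []string {
	var out []string
	for _, tok := range strings.Fields(strings.ToLower(query)) {
		out = append(out, tok)
		out = append(out, abbr[tok]...)
	}
	return out
}

func main() {
	abbr := mergeAbbreviations(builtinAbbreviations, map[string][]string{
		"bldg": {"building"}, // new entry
		"rpm":  {},           // empty list disables the default
	})
	fmt.Println(expandQuery("bldg 4 mixer temp rpm", abbr))
}
```

Because expansion happens only on the query side, changing this map affects the next query immediately and leaves the indexed documents untouched.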
MCP tools
Two tools are registered when search is enabled:
- uns_semantic_find_topics: hybrid-search discovery entrypoint.
  {
    "query": "mixer temperature on line 2",
    "path_glob": "%/Berlin/%/Line2/%",
    "limit": 10
  }
  Returns ranked {topic_id, topic_path, score} candidates. Use these IDs as input to uns_describe_topic or query_topic_data.
- uns_describe_topic: rich metadata for a single topic.
  { "topic_id": "t-berlin-line2-mixer-temp" }
  Returns path, ISA-95 segments, schema, retain flag, parent, and immediate children.
Both tools honor the uns_topic:read scope and resolve org_id from the authenticated MCP context.
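For readers wiring these tools into a custom client, the following Go sketch builds the JSON-RPC 2.0 tools/call envelope that MCP uses for tool invocation. The Go type and function names are hypothetical; the tool names and argument fields are the ones shown above. No org_id appears in the arguments because it is resolved from the authenticated context.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// findTopicsArgs mirrors the argument object of uns_semantic_find_topics.
type findTopicsArgs struct {
	Query    string `json:"query"`
	PathGlob string `json:"path_glob,omitempty"`
	Limit    int    `json:"limit,omitempty"`
}

// toolCall is the JSON-RPC 2.0 envelope for the MCP "tools/call" method.
type toolCall struct {
	JSONRPC string         `json:"jsonrpc"`
	ID      int            `json:"id"`
	Method  string         `json:"method"`
	Params  toolCallParams `json:"params"`
}

type toolCallParams struct {
	Name      string `json:"name"`
	Arguments any    `json:"arguments"`
}

// newToolCall serializes one tool invocation request.
func newToolCall(id int, name string, args any) ([]byte, error) {
	return json.Marshal(toolCall{
		JSONRPC: "2.0",
		ID:      id,
		Method:  "tools/call",
		Params:  toolCallParams{Name: name, Arguments: args},
	})
}

func main() {
	// Step 1: discover candidate topics.
	find, err := newToolCall(1, "uns_semantic_find_topics", findTopicsArgs{
		Query:    "mixer temperature on line 2",
		PathGlob: "%/Berlin/%/Line2/%",
		Limit:    10,
	})
	if err != nil {
		panic(err)
	}
	fmt.Println(string(find))

	// Step 2: describe one of the returned topic IDs.
	describe, err := newToolCall(2, "uns_describe_topic",
		map[string]string{"topic_id": "t-berlin-line2-mixer-temp"})
	if err != nil {
		panic(err)
	}
	fmt.Println(string(describe))
}
```

How the request is transported (stdio, HTTP, or otherwise) depends on the MCP client and is outside the scope of this sketch.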
Admin HTTP endpoint
Operators can force a full reindex (truncate + rebuild) for a single org:
POST /api/v1/uns/search/rebuild
Required permission: uns_topic:operate. Returns {org_id, written, elapsed, timestamp}. Expect ~200 seconds for 100k topics at the default rate; tune backfillRatePerSec if your environment supports more.
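A scripted rebuild could look roughly like the Go sketch below. The bearer-token header is an assumption about how the API authenticates, the response is decoded into a generic map because the field types are not documented here, and the stand-in server with its sample values exists only so the example runs without a MaestroHub instance.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// triggerRebuild POSTs to the rebuild endpoint and decodes the summary.
func triggerRebuild(client *http.Client, baseURL, token string) (map[string]any, error) {
	req, err := http.NewRequest(http.MethodPost, baseURL+"/api/v1/uns/search/rebuild", nil)
	if err != nil {
		return nil, err
	}
	req.Header.Set("Authorization", "Bearer "+token) // assumed auth scheme
	resp, err := client.Do(req)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	if resp.StatusCode < 200 || resp.StatusCode >= 300 {
		return nil, fmt.Errorf("rebuild failed: %s", resp.Status)
	}
	var summary map[string]any
	if err := json.NewDecoder(resp.Body).Decode(&summary); err != nil {
		return nil, err
	}
	return summary, nil
}

// newFakeServer returns a local stand-in for the admin endpoint.
// The response values are made up for the demo.
func newFakeServer() *httptest.Server {
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.Method != http.MethodPost || r.URL.Path != "/api/v1/uns/search/rebuild" {
			http.NotFound(w, r)
			return
		}
		w.Header().Set("Content-Type", "application/json")
		fmt.Fprint(w, `{"org_id":"org-demo","written":100000,"elapsed":"3m20s","timestamp":"2024-01-01T00:00:00Z"}`)
	}))
}

func main() {
	srv := newFakeServer()
	defer srv.Close()

	summary, err := triggerRebuild(srv.Client(), srv.URL, "demo-token")
	if err != nil {
		panic(err)
	}
	fmt.Println("written:", summary["written"], "elapsed:", summary["elapsed"])
}
```

Against a real deployment, replace srv.URL with the MaestroHub base URL and give the HTTP client a timeout comfortably above the expected rebuild time, since the call may run for minutes on large topic counts.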
Observability
All search metrics carry the prefix uns_search_ and are exported through the same OpenTelemetry pipeline as the rest of the UNS module:
| Metric | Type | Description |
|---|---|---|
| uns_search_index_events_total{kind} | counter | Successful indexing events (upsert, delete) |
| uns_search_query_duration_seconds | histogram | Query end-to-end latency |
| uns_search_query_results_total | histogram | Hits returned per query |
| uns_search_backfill_written_total | counter | Cumulative topics written by backfill |
| uns_search_index_queue_depth | gauge | Current async-indexing backlog |
Operational runbook
- Search returns stale results after a rename. The ConfigChanged event intentionally does not carry org_id in Phase 1, so schema-only edits are reconciled by the next create/delete cycle or by a triggered rebuild. Run the admin rebuild endpoint to force consistency.
- Backfill is slow on a large deployment. The default backfillRatePerSec: 500 means 200 seconds per 100k topics. Raise it (for example, to 2000) when the underlying database has headroom; log lines show progress as rows are flushed.
- No hits for a cryptic tag. Confirm the tag appears in the indexed content: topics are indexed from their path, split segments, schema description, unit, aliases, and field-level schemas. If your tag lives in a separate identifier field, extend the document builder (application/search/doc_builder.go) and rebuild.
- Managed Postgres refuses CREATE EXTENSION pg_trgm. Run CREATE EXTENSION pg_trgm; as a superuser once before the first deployment. The migration is idempotent.
- Killing the feature. Flip enabled: false and restart. The tables remain in place (cheap, avoids destructive rollback); nothing else changes.
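The backfill timing quoted in the runbook is plain arithmetic: topic count divided by backfillRatePerSec. A small Go helper (hypothetical, not part of MaestroHub) makes it easy to estimate the floor for other rates. It ignores database latency, so real runs can only be slower.

```go
package main

import (
	"fmt"
	"time"
)

// estimateBackfill returns the minimum wall-clock time needed to write
// the given number of topics at a fixed rate cap.
func estimateBackfill(topics, ratePerSec int) time.Duration {
	if ratePerSec <= 0 {
		return 0 // no meaningful estimate without a positive rate
	}
	seconds := float64(topics) / float64(ratePerSec)
	return time.Duration(seconds * float64(time.Second))
}

func main() {
	for _, rate := range []int{500, 1000, 2000} {
		fmt.Printf("100k topics at %d/s: %s\n", rate, estimateBackfill(100_000, rate))
	}
}
```

At the default rate of 500 this gives 200 seconds for 100k topics, matching the figure above; at 2000 the floor drops to 50 seconds.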
Running integration tests locally
The Postgres adapter and scale benchmarks use testcontainers-go to spin up a throwaway postgres:15-alpine container per test. On Windows + Docker Desktop, testcontainers-go can misdetect the socket and panic with "rootless Docker is not supported on Windows". The fix is a single env var pointing at Docker Desktop's named pipe:
$env:DOCKER_HOST = "npipe:////./pipe/docker_engine"
go test -tags=integration ./apps/backend/modules/uns/...
Linux and macOS developers don't need to set anything. CI already runs with a native Docker socket.
Tuning notes (Phase 1)
Phase 1 targets MVP lexical discovery. The following levers are intentionally conservative:
- Top-5 hit rate target is 75% — internal eval runs typically measure 95%+ on well-curated ISA-95 topics.
- Query p95 budgets (40 ms Postgres, 60 ms SQLite at 100k topics) assume the default worker count and no concurrent heavy writes.
- Vector embeddings and OEE-style aggregation tools are Phase 2 work; the search table schema and RRF helper are already shaped to accept a dense ranking alongside lexical + trigram.
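To make the last point concrete, here is a minimal Go sketch of Reciprocal Rank Fusion over an arbitrary number of ranked lists. It is not MaestroHub's RRF helper: the function name, the constant k = 60 (a common choice for RRF), and the sample topic IDs are assumptions. The variadic signature shows why a dense ranking can be added later without changing how the lexical and trigram lists are fused.

```go
package main

import (
	"fmt"
	"sort"
)

// rrfK is the RRF smoothing constant. 60 is a widely used default; the
// value MaestroHub uses is not documented here.
const rrfK = 60.0

// Scored pairs a topic ID with its fused score.
type Scored struct {
	TopicID string
	Score   float64
}

// fuseRRF merges ranked ID lists. Each list contributes 1/(k + rank)
// for every item it contains, with rank starting at 1.
func fuseRRF(rankings ...[]string) []Scored {
	scores := make(map[string]float64)
	for _, ranking := range rankings {
		for i, id := range ranking {
			scores[id] += 1.0 / (rrfK + float64(i+1))
		}
	}
	out := make([]Scored, 0, len(scores))
	for id, s := range scores {
		out = append(out, Scored{TopicID: id, Score: s})
	}
	// Highest score first; ties are broken by ID for stable output.
	sort.Slice(out, func(a, b int) bool {
		if out[a].Score != out[b].Score {
			return out[a].Score > out[b].Score
		}
		return out[a].TopicID < out[b].TopicID
	})
	return out
}

func main() {
	lexical := []string{"t-mixer-temp", "t-mixer-speed", "t-oven-temp"}
	trigram := []string{"t-tt-101", "t-mixer-temp"}

	fmt.Println("lexical + trigram:")
	for _, s := range fuseRRF(lexical, trigram) {
		fmt.Printf("  %-14s %.5f\n", s.TopicID, s.Score)
	}

	// A future dense ranking would simply be one more argument.
	dense := []string{"t-oven-temp", "t-mixer-temp"}
	fmt.Println("lexical + trigram + dense:")
	for _, s := range fuseRRF(lexical, trigram, dense) {
		fmt.Printf("  %-14s %.5f\n", s.TopicID, s.Score)
	}
}
```

RRF uses only rank positions, never raw scores, so rankings produced by very different scorers can be combined without normalizing them first.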