Version: 2.4-dev

Topic Search

MaestroHub ships a lexical topic search subsystem that makes the UNS queryable by natural-language phrases, industrial abbreviations, and cryptic tag identifiers. The search layer is purely additive: while it is disabled (the default), no topics are indexed and no new HTTP or MCP surface is exposed.

Search is the discovery layer that pairs with the existing data tools — use search to locate the right topic, then fetch values via MCP query_topic_data or the UNS HTTP endpoints.

What it covers

  • Hybrid retrieval. Every query runs through both a lexical index (Postgres tsvector / SQLite FTS5 unicode61) and a fuzzy trigram index (pg_trgm / FTS5 trigram), merged with Reciprocal Rank Fusion so both industrial tags (TT_101) and natural-language phrases ("mixer temperature on line 2") resolve well.
  • ISA-95 aware document builder. Each topic is serialized to a stable content string that includes the full slash-path, split segments, optional schema description, unit, aliases, and field-level schemas. This preserves the hierarchy cues a tokenizer needs. A leading version segment that matches <ns>v<major>.<minor> (for example mHv1.0, spBv1.0) is captured as Version: so the downstream ISA-95 labels (Enterprise/Site/Area/...) align with the real hierarchy rather than being offset by one. The version is still present in Path: and Segments: so exact matches and pathGlob filters continue to work against the raw topic name.
  • Query-side abbreviation expansion. Common industrial short forms (temp → temperature, press → pressure, rpm → speed rotation, etc.) are expanded at query time only — the index itself is identity-safe, so editing the synonym list does not require a reindex.
  • Event-driven indexing. A bounded worker pool subscribes to uns.topic.created and uns.topic.deleted events, debounced per topic so rapid updates collapse to a single op. The document is rebuilt by re-reading the topic via the canonical repository, never from the event payload.
  • Backfill and rebuild. An auto-backfill runs on first boot after search is enabled; operators can trigger a full rebuild at any time via POST /api/v1/uns/search/rebuild.
  • Multi-tenant safety. Every query is gated by org_id at the SQL layer — never through prompt or request-body fields.

Enabling the feature

Search is controlled by a single block under modules.uns.defaults.search in config.yaml. Nothing changes in the deployed artifact until enabled: true is set.

modules:
  uns:
    defaults:
      search:
        enabled: false          # opt-in; ships disabled
        workers: 2              # async indexer worker count
        debounceMs: 250         # per-topic debounce window
        backfillRatePerSec: 500 # caps backfill throughput
        abbreviations:          # optional; merged over built-in defaults
          bldg: [building]
          temp: [temperature]   # same key as a default: replaces the built-in entry
          tag: []               # empty list disables a default

On first boot after enabled: true is set, the module checks whether topic_search_documents has any rows for the org. If it is empty, an auto-backfill runs at the configured rate and logs progress. Backfill is idempotent (UPSERT-based) and interruption-safe: restarting the process mid-run resumes the backfill.

MCP tools

Two tools are registered when search is enabled:

  • uns_semantic_find_topics — hybrid-search discovery entrypoint.

    {
      "query": "mixer temperature on line 2",
      "path_glob": "%/Berlin/%/Line2/%",
      "limit": 10
    }

    Returns ranked {topic_id, topic_path, score} candidates. Use these IDs as input to uns_describe_topic or query_topic_data.

  • uns_describe_topic — rich metadata for a single topic.

    { "topic_id": "t-berlin-line2-mixer-temp" }

    Returns path, ISA-95 segments, schema, retain flag, parent, and immediate children.

Both tools honor the uns_topic:read scope and resolve org_id from the authenticated MCP context.
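The % characters in path_glob read as SQL LIKE wildcards. Assuming LIKE semantics (% matches any run of characters, _ matches exactly one, no escape character), which this page does not spell out, a hypothetical helper such as likeToRegexp can preview which topic paths a glob admits before you send it:

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// likeToRegexp translates a LIKE-style pattern into an anchored regexp.
// Assumes % and _ are the only wildcards and that there is no escape character;
// every other rune is quoted so it matches literally.
func likeToRegexp(pattern string) *regexp.Regexp {
	var b strings.Builder
	b.WriteString("^")
	for _, r := range pattern {
		switch r {
		case '%':
			b.WriteString(".*")
		case '_':
			b.WriteString(".")
		default:
			b.WriteString(regexp.QuoteMeta(string(r)))
		}
	}
	b.WriteString("$")
	return regexp.MustCompile(b.String())
}

func main() {
	glob := likeToRegexp("%/Berlin/%/Line2/%")
	for _, p := range []string{
		"spBv1.0/Acme/Berlin/Mixing/Line2/TT_101",
		"spBv1.0/Acme/Munich/Mixing/Line2/TT_101",
	} {
		fmt.Println(p, glob.MatchString(p))
	}
}
```

Under that assumption, a literal underscore in a tag such as TT_101 also acts as a one-character wildcard when it appears in a glob.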

Admin HTTP endpoint

Operators can force a full reindex (truncate + rebuild) for a single org:

POST /api/v1/uns/search/rebuild

Required permission: uns_topic:operate. Returns {org_id, written, elapsed, timestamp}. Expect ~200 seconds for 100k topics at the default rate; tune backfillRatePerSec if your environment supports more.

Observability

All search metrics carry the prefix uns_search_ and are exported through the same OpenTelemetry pipeline as the rest of the UNS module:

| Metric | Type | Description |
|---|---|---|
| uns_search_index_events_total{kind} | counter | Successful indexing events (upsert, delete) |
| uns_search_query_duration_seconds | histogram | Query end-to-end latency |
| uns_search_query_results_total | histogram | Hits returned per query |
| uns_search_backfill_written_total | counter | Cumulative topics written by backfill |
| uns_search_index_queue_depth | gauge | Current async-indexing backlog |

Operational runbook

  • Search returns stale results after a rename. The ConfigChanged event intentionally does not carry org_id in Phase 1, so schema-only edits are reconciled by the next create/delete cycle or by a triggered rebuild. Run the admin rebuild endpoint to force consistency.
  • Backfill is slow on a large deployment. The default backfillRatePerSec: 500 means 200 seconds per 100k topics. Raise it (for example, 2000) when the underlying database has headroom; log lines show progress as rows are flushed.
  • No hits for a cryptic tag. Confirm the tag appears in the indexed content: only the fields the document builder serializes (path, segments, schema description, unit, aliases, and field-level schemas) are indexed. If your tag lives in a separate identifier field, extend the document builder (application/search/doc_builder.go) and rebuild.
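  • Managed Postgres refuses CREATE EXTENSION pg_trgm. Have a superuser (or your provider's equivalent administrative role) run CREATE EXTENSION pg_trgm; once before the first deployment. The migration is idempotent, so it proceeds normally once the extension already exists.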
  • Killing the feature. Flip enabled: false and restart. The tables remain in place (cheap, avoids destructive rollback); nothing else changes.

Running integration tests locally

The Postgres adapter tests and scale benchmarks use testcontainers-go to spin up a throwaway postgres:15-alpine container per test. On Windows with Docker Desktop, testcontainers-go can misdetect the Docker socket and panic with "rootless Docker is not supported on Windows". The fix is a single environment variable pointing at Docker Desktop's named pipe (PowerShell):

$env:DOCKER_HOST = "npipe:////./pipe/docker_engine"
go test -tags=integration ./apps/backend/modules/uns/...

Linux and macOS developers don't need to set anything. CI already runs with a native Docker socket.

Tuning notes (Phase 1)

Phase 1 targets MVP lexical discovery. The following levers are intentionally conservative:

  • Top-5 hit rate target is 75% — internal eval runs typically measure 95%+ on well-curated ISA-95 topics.
  • Query p95 budgets (40 ms Postgres, 60 ms SQLite at 100k topics) assume the default worker count and no concurrent heavy writes.
  • Vector embeddings and OEE-style aggregation tools are Phase 2 work; the search table schema and RRF helper are already shaped to accept a dense ranking alongside lexical + trigram.