Version: 2.7-dev

Topic Search

MaestroHub ships a lexical topic search subsystem that makes the UNS queryable by natural-language phrases, industrial abbreviations, and cryptic tag identifiers. The search layer is purely additive: when disabled (the default), no index table rows exist and no new HTTP or MCP surface is exposed.

Search is the discovery layer that pairs with the existing data tools — use search to locate the right topic, then fetch values via MCP query_topic_data or the UNS HTTP endpoints.

What it covers

Hybrid retrieval. Every query runs through both a lexical index (Postgres tsvector / SQLite FTS5 unicode61) and a fuzzy trigram index (pg_trgm / FTS5 trigram), merged with Reciprocal Rank Fusion so both industrial tags (TT_101) and natural-language phrases ("mixer temperature on line 2") resolve well.
ISA-95 aware document builder. Each topic is serialized to a stable content string that includes the full slash-path, split segments, optional schema description, unit, aliases, and field-level schemas. This preserves the hierarchy cues a tokenizer needs. A leading version segment that matches <ns>v<major>.<minor> (for example mHv1.0, spBv1.0) is captured as Version: so the downstream ISA-95 labels (Enterprise/Site/Area/...) align with the real hierarchy rather than being offset by one. The version is still present in Path: and Segments: so exact matches and pathGlob filters continue to work against the raw topic name.
Query-side abbreviation expansion. Common industrial short forms (temp → temperature, press → pressure, rpm → speed rotation, etc.) are expanded at query time only — the index itself is identity-safe, so editing the synonym list does not require a reindex.
Event-driven indexing. A bounded worker pool subscribes to uns.topic.created and uns.topic.deleted events, debounced per topic so rapid updates collapse to a single op. The document is rebuilt by re-reading the topic via the canonical repository, never from the event payload.
Backfill and rebuild. An auto-backfill runs on first boot after search is enabled; operators can trigger a full rebuild at any time via POST /api/v1/uns/search/rebuild.
Multi-tenant safety. Every query is gated by org_id at the SQL layer — never through prompt or request-body fields.
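
To make the fusion step concrete, here is a minimal sketch of Reciprocal Rank Fusion over two ranked ID lists. It is not the MaestroHub implementation: the names `fuseRRF` and `Scored` are hypothetical, and the constant k = 60 is the value commonly used in the RRF literature, assumed here because this page does not state the one MaestroHub uses.

```go
package main

import (
	"fmt"
	"sort"
)

// rrfK is an assumed smoothing constant (60 is the common default in the
// RRF literature); the value used by MaestroHub is not documented here.
const rrfK = 60.0

// Scored pairs a topic ID with its fused score (hypothetical type).
type Scored struct {
	TopicID string
	Score   float64
}

// fuseRRF merges any number of ranked ID lists, each ordered best first.
// Every list contributes 1/(k + rank) per ID, with rank starting at 1.
func fuseRRF(k float64, rankings ...[]string) []Scored {
	scores := make(map[string]float64)
	for _, ranking := range rankings {
		for i, id := range ranking {
			scores[id] += 1.0 / (k + float64(i+1))
		}
	}
	fused := make([]Scored, 0, len(scores))
	for id, s := range scores {
		fused = append(fused, Scored{TopicID: id, Score: s})
	}
	// Sort by score descending; break ties by ID so output is deterministic.
	sort.Slice(fused, func(a, b int) bool {
		if fused[a].Score != fused[b].Score {
			return fused[a].Score > fused[b].Score
		}
		return fused[a].TopicID < fused[b].TopicID
	})
	return fused
}

func main() {
	// Made-up result lists standing in for the lexical and trigram indexes.
	lexical := []string{"t-mixer-temp", "t-mixer-speed", "t-oven-temp"}
	trigram := []string{"t-tt-101", "t-mixer-temp"}
	for _, s := range fuseRRF(rrfK, lexical, trigram) {
		fmt.Printf("%-14s %.5f\n", s.TopicID, s.Score)
	}
}
```

Because the function is variadic, a third ranking (for example a dense ranking) could be passed in without changing the merge logic. A topic that appears in both lists outranks one that tops only a single list, which is the behavior that lets tag-style and phrase-style queries share one result set.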

Enabling the feature

Search is controlled by a single block under modules.uns.defaults.search in config.yaml. Nothing changes in the deployed artifact until enabled: true is set.

modules:
  uns:
    defaults:
      search:
        enabled: false              # opt-in; ships disabled
        workers: 2                  # async indexer worker count
        debounceMs: 250             # per-topic debounce window
        backfillRatePerSec: 500     # caps backfill throughput
        abbreviations:              # optional; merged over built-in defaults
          bldg: [building]
          temp: [temperature]       # replaces the built-in expansion for temp
          tag: []                   # empty list disables a default

On first boot after enabled: true, the module checks whether topic_search_documents has any rows for the org. When empty, an auto-backfill runs at the configured rate and logs progress. Backfill is idempotent (UPSERT-based) and interruption-safe — restarting mid-run resumes.

MCP tools

Two tools are registered when search is enabled:

uns_semantic_find_topics — hybrid-search discovery entrypoint.
```
{
  "query": "mixer temperature on line 2",
  "path_glob": "%/Berlin/%/Line2/%",
  "limit": 10
}
```
Returns ranked {topic_id, topic_path, score} candidates. Use these IDs as input to uns_describe_topic or query_topic_data.
uns_describe_topic — rich metadata for a single topic.
```
{ "topic_id": "t-berlin-line2-mixer-temp" }
```
Returns path, ISA-95 segments, schema, retain flag, parent, and immediate children.

Both tools honor the uns_topic:read scope and resolve org_id from the authenticated MCP context.

Admin HTTP endpoint

Operators can force a full reindex (truncate + rebuild) for a single org:

POST /api/v1/uns/search/rebuild

Required permission: uns_topic:operate. Returns {org_id, written, elapsed, timestamp}. At the default backfillRatePerSec of 500, expect roughly 200 seconds for 100k topics (100,000 / 500); raise backfillRatePerSec if your environment supports more.

Observability

All search metrics carry the prefix uns_search_ and are exported through the same OpenTelemetry pipeline as the rest of the UNS module:

| Metric | Type | Description |
| --- | --- | --- |
| `uns_search_index_events_total{kind}` | counter | Successful indexing events (`upsert`, `delete`) |
| `uns_search_query_duration_seconds` | histogram | Query end-to-end latency |
| `uns_search_query_results_total` | histogram | Hits returned per query |
| `uns_search_backfill_written_total` | counter | Cumulative topics written by backfill |
| `uns_search_index_queue_depth` | gauge | Current async-indexing backlog |

Operational runbook

Search returns stale results after a rename. The ConfigChanged event intentionally does not carry org_id in Phase 1, so schema-only edits are not indexed when they happen; they are reconciled by the next create/delete cycle for the topic. To force consistency sooner, run the admin rebuild endpoint (POST /api/v1/uns/search/rebuild).
Backfill is slow on a large deployment. The default backfillRatePerSec: 500 means 200 seconds per 100k topics. Raise it (for example, 2000) when the underlying database has headroom; log lines show progress as rows are flushed.
No hits for a cryptic tag. Confirm the tag appears in the indexed content: the document builder serializes the full path, split segments, schema description, unit, aliases, and field-level schemas, and nothing else. If your tag lives in a separate identifier field, extend the document builder (application/search/doc_builder.go) and rebuild.
Managed Postgres refuses CREATE EXTENSION pg_trgm. Run CREATE EXTENSION pg_trgm; once before the first deployment, as a superuser or as whatever privileged role your provider offers in its place. The migration is idempotent, so it is safe to run after the extension already exists.
Killing the feature. Flip enabled: false and restart. The tables remain in place (cheap, avoids destructive rollback); nothing else changes.

Running integration tests locally

The Postgres adapter and scale benchmarks use testcontainers-go to spin up a throwaway postgres:15-alpine container per test. On Windows + Docker Desktop, testcontainers-go can misdetect the socket and panic with "rootless Docker is not supported on Windows". The fix is a single env var pointing at Docker Desktop's named pipe:

$env:DOCKER_HOST = "npipe:////./pipe/docker_engine"
go test -tags=integration ./apps/backend/modules/uns/...

Linux and macOS developers don't need to set anything. CI already runs with a native Docker socket.

Tuning notes (Phase 1)

Phase 1 targets MVP lexical discovery. The following levers are intentionally conservative:

Top-5 hit rate target is 75% — internal eval runs typically measure 95%+ on well-curated ISA-95 topics.
Query p95 budgets (40 ms Postgres, 60 ms SQLite at 100k topics) assume the default worker count and no concurrent heavy writes.
Vector embeddings and OEE-style aggregation tools are Phase 2 work; the search table schema and RRF helper are already shaped to accept a dense ranking alongside lexical + trigram.
