openspec change · fullsync-compact-response

Compact tool responses for the full-sync MCP path.

Three full-sync tools currently flood the MCP client's context window with internal queue bookkeeping. This change surgically replaces a single field, so responses shrink by roughly two orders of magnitude. Nothing on disk moves.

branch · feature/fullsync-token-reduction
base · c92f79a
capability · fullsync-response (new)
impact · tool response shape only

01 · the problem

Six unbounded arrays ride along on every call.

Each of roxtra_index_full_sync, roxtra_index_full_sync_status, and roxtra_index_manage (action status) currently embeds the entire in-memory progress object. That object carries six internal arrays that grow with the size of the document tree being indexed. On a mid-run roXtra instance, the folder queue alone holds thousands of entries, and the discovered-file ID array is an order of magnitude larger.

  • per-call payload: 100–500 KB of JSON, today
  • target after change: < 2 KB of JSON, fixed shape
  • arrays removed: 6 (queue + ID arrays)
  • summary keys: 16 (whitelist, no leaks)
  • size comparison: before ~500 KB, after < 2 KB

The six arrays leaving the tool response

Each grows with tree size. None is actionable for the language model. All remain in the persisted sync_progress.json on disk — only the wire shape changes.

  • folderQueue
  • visitedFolderIds
  • discoveredFileIds
  • processedContentFileIds
  • failedFileIds
  • unsupportedFileIds

02 · the shape change

One helper. Two return sites. A renamed field.

Add a pure read-transform createFullSyncSummary(progress) alongside the existing createFullSyncStatus, then swap the progress field for progressSummary at the two return sites. The rename is deliberate — readers can no longer assume the same shape as the persisted progress object, and the change becomes visible at every call site that still depended on the old shape (none in src/).

1 · helper

A pure projection. Reads top-level metadata and the flat counters sub-object. Lives next to createFullSyncStatus.

2 · field rename

Every response site swaps the key from progress to progressSummary. The new name flags the new shape.

3 · counters flatten

Nine counter fields surface at the summary's top level, not nested under .counters. One less level for the model to navigate.

03 · whitelist

Exactly sixteen keys. No spread, no surprises.

The summary keys are fixed at construction. The spec's first scenario asserts equality — extras would fail the gate. Counters are reported bit-for-bit from the persisted object; missing counter keys coalesce to 0.

Top-level metadata · 7 keys

verbatim from progress
  • status: enum
  • runId: string
  • startedAt: iso-8601
  • updatedAt: iso-8601
  • completedAt: iso-8601 | null
  • rootFolderId: string
  • lastError: string | null

Counters · 9 keys

from progress.counters · flattened
  • foldersVisited: int
  • foldersPending: int
  • documentsDiscovered: int
  • documentsIndexed: int
  • contentSynced: int
  • contentFailed: int
  • contentUnsupported: int
  • chromaUpsertedMetadataRecords: int
  • chromaUpsertedChunks: int

Peer fields, unchanged

The response continues to expose documentCount, indexDir, progressPath, and nextRecommendedCall at the same nesting level, with the same names and semantics. No new keys; no removed keys.

Null-progress code path

createFullSyncSummary(null) returns null — preserving the existing status-on-empty-index response shape for roxtra_index_manage action status against a fresh index.
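
The whitelist, the flattening, the zero-coalescing, and the null path can be sketched together as follows. This is a minimal sketch, not the repository's implementation: the type declarations are assumptions built from the field names listed above, and `status` is typed as a plain string because the enum values are not listed here.

```typescript
// Assumed types: field names come from the spec; the repository's actual
// declarations may differ.
interface FullSyncCounters {
  foldersVisited: number;
  foldersPending: number;
  documentsDiscovered: number;
  documentsIndexed: number;
  contentSynced: number;
  contentFailed: number;
  contentUnsupported: number;
  chromaUpsertedMetadataRecords: number;
  chromaUpsertedChunks: number;
}

interface FullSyncProgress {
  status: string;
  runId: string;
  startedAt: string;
  updatedAt: string;
  completedAt: string | null;
  rootFolderId: string;
  lastError: string | null;
  counters?: Partial<FullSyncCounters>;
  // The six internal arrays stay on the in-memory and persisted object only.
  folderQueue: string[];
  visitedFolderIds: string[];
  discoveredFileIds: string[];
  processedContentFileIds: string[];
  failedFileIds: string[];
  unsupportedFileIds: string[];
}

type FullSyncSummary = Pick<
  FullSyncProgress,
  "status" | "runId" | "startedAt" | "updatedAt" | "completedAt" | "rootFolderId" | "lastError"
> &
  FullSyncCounters;

// Pure projection: every key is assigned explicitly (no spread), so an
// unexpected field on `progress` cannot leak into the summary.
function createFullSyncSummary(progress: FullSyncProgress | null): FullSyncSummary | null {
  if (progress === null) return null;
  const counters: Partial<FullSyncCounters> = progress.counters ?? {};
  return {
    status: progress.status,
    runId: progress.runId,
    startedAt: progress.startedAt,
    updatedAt: progress.updatedAt,
    completedAt: progress.completedAt,
    rootFolderId: progress.rootFolderId,
    lastError: progress.lastError,
    // Counters are copied as-is; a missing key coalesces to 0.
    foldersVisited: counters.foldersVisited ?? 0,
    foldersPending: counters.foldersPending ?? 0,
    documentsDiscovered: counters.documentsDiscovered ?? 0,
    documentsIndexed: counters.documentsIndexed ?? 0,
    contentSynced: counters.contentSynced ?? 0,
    contentFailed: counters.contentFailed ?? 0,
    contentUnsupported: counters.contentUnsupported ?? 0,
    chromaUpsertedMetadataRecords: counters.chromaUpsertedMetadataRecords ?? 0,
    chromaUpsertedChunks: counters.chromaUpsertedChunks ?? 0,
  };
}
```

Writing each key out by hand is verbose, but it makes the sixteen-key contract reviewable at a glance and keeps the output shape independent of whatever the progress object gains later.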

04 · before / after

What the MCP client actually receives.

Illustrative roxtra_index_full_sync_status response, mid-run, against a small folder tree. Real payloads can balloon by two orders of magnitude as the queue and ID arrays grow.

before · today · ~180 KB
{
  "progress": {
    "status": "running",
    "runId": "7f3...",
    "startedAt": "2026-05-21T08:14:02Z",
    "updatedAt": "2026-05-21T08:21:18Z",
    "completedAt": null,
    "rootFolderId": "42",
    "lastError": null,
    "counters": { /* 9 keys */ },
    "folderQueue": ["f-12", "f-13", /* 3,200 more */],
    "visitedFolderIds": ["f-1", /* 4,800 more */],
    "discoveredFileIds": ["d-9", /* 41,000 more */],
    "processedContentFileIds": [/* ... */],
    "failedFileIds": [/* ... */],
    "unsupportedFileIds": [/* ... */]
  },
  "documentCount": 3211,
  "indexDir": "/.../indices/roxtra-default",
  "progressPath": "/.../sync_progress.json",
  "nextRecommendedCall": { /* resume args */ }
}

after · this change · < 2 KB
{
  "progressSummary": {
    "status": "running",
    "runId": "7f3...",
    "startedAt": "2026-05-21T08:14:02Z",
    "updatedAt": "2026-05-21T08:21:18Z",
    "completedAt": null,
    "rootFolderId": "42",
    "lastError": null,
    "foldersVisited": 4800,
    "foldersPending": 3200,
    "documentsDiscovered": 41210,
    "documentsIndexed": 12044,
    "contentSynced": 11891,
    "contentFailed": 12,
    "contentUnsupported": 141,
    "chromaUpsertedMetadataRecords": 12044,
    "chromaUpsertedChunks": 38211
  },
  "documentCount": 3211,
  "indexDir": "/.../indices/roxtra-default",
  "progressPath": "/.../sync_progress.json",
  "nextRecommendedCall": { /* resume args */ }
}

Counter values are illustrative. Field names and shape are exactly per the spec.
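
To make the size difference concrete, the following sketch serializes a synthetic "before" and "after" response and compares their lengths. The helper names (`jsonChars`, `compareResponseSizes`, `makeIds`) are hypothetical, the ID arrays are generated rather than taken from a real run, and the summary is abridged to a few keys, so the printed numbers only indicate the rough magnitude.

```typescript
// Character count of the serialized value. For ASCII-only payloads this
// equals the UTF-8 byte count.
function jsonChars(value: unknown): number {
  return JSON.stringify(value).length;
}

function compareResponseSizes(before: unknown, after: unknown) {
  const beforeChars = jsonChars(before);
  const afterChars = jsonChars(after);
  return { beforeChars, afterChars, ratio: beforeChars / afterChars };
}

// Synthetic IDs; the counts echo the illustrative example, not a measurement.
function makeIds(prefix: string, count: number): string[] {
  return Array.from({ length: count }, (_, i) => `${prefix}-${i}`);
}

const beforeResponse = {
  progress: {
    status: "running",
    counters: { foldersVisited: 4800, foldersPending: 3200 },
    folderQueue: makeIds("f", 3200),
    visitedFolderIds: makeIds("f", 4800),
    discoveredFileIds: makeIds("d", 41000),
  },
};

// Abridged summary: only a few of the sixteen keys are shown.
const afterResponse = {
  progressSummary: {
    status: "running",
    foldersVisited: 4800,
    foldersPending: 3200,
    documentsDiscovered: 41000,
  },
};

const sizes = compareResponseSizes(beforeResponse, afterResponse);
console.log(
  `before=${sizes.beforeChars} chars, after=${sizes.afterChars} chars, ratio=${sizes.ratio.toFixed(0)}x`,
);
```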

05 · affected files

Surgical scope. No broad refactor.

One helper added; two return sites edited. Downstream wrappers pick up the new shape transitively. The persistence layer and resume code paths are untouched by name.

src/server/index_store/IndexStoreCore.ts
Add createFullSyncSummary(progress) helper. Swap progress → progressSummary on createFullSyncStatus's return (lines 157–177).
helper + modified
src/server/index_store/FullSyncOperations.ts
Swap progress → progressSummary on runFullSyncOperation's return (lines 298–321).
modified
src/server/index_store/IndexManageFlow.ts
Picks up the new shape transitively via createFullSyncStatus (line 100). No edits needed.
transitive
src/server/index_store/IndexDebugStatusOperations.ts
Same — picks up via createFullSyncStatus. No edits.
transitive
src/server/RoxtraMcpServer.ts
Wrapper at lines 982–987 delegates to createFullSyncStatusFromStore. No edits.
untouched
scripts/dashboard-poc/dashboard.ts
Reads raw arrays from sync_progress.json via file watcher (lines 53–58, 265–270). Not a tool-response consumer.
unaffected
06 · what stays put

The persistence layer and resume loop are load-bearing — and they don't move.

The persisted file is the deliberately complete source of truth for the resume loop. Reviewers should confirm that nothing in the code path below changes shape.

in-memory
FullSyncProgress
processFullSyncFolderBatch · processFullSyncContentBatch

Continue to read and mutate the in-memory progress object, including all six internal arrays. Untouched.

disk write
writeFullSyncProgress(progress)

Receives the full in-memory progress, not the summary. sync_progress.json schema is identical to the pre-change shape.

resume read
readFullSyncProgressIfExists · prepareFullSyncProgress

Return the full persisted shape. The resume code path consumes the persisted shape, not the summary. Users with an in-progress sync continue without resetting their index.

counters refresh
refreshFullSyncCounters

Unchanged. Counter values exposed on progressSummary are reported bit-for-bit; the helper performs no recomputation, rounding, or clipping.

batch sizing
folderBatchSize · metadataBatchSize · contentBatchSize · maxRuntimeMs · maxContentChars · contentChunkSize

No change to defaults or semantics. nextRecommendedCall.arguments keys and values are reconstructed from the same inputs as before.

network · auth
traversal · Chroma writes · authentication

Untouched. The single behavioural change is the shape of the tool response object returned to MCP clients.

07 · decision

No includeRawProgress input flag. Here's why.

rejected · documented in spec requirement 5

Raw arrays remain available out-of-band via progressPath.

A repo-wide grep finds no consumer that reads any raw array from the tool response. The only raw-array consumer is the dashboard POC, which reads sync_progress.json directly via a file watcher and is completely independent of MCP tool calls. The persisted file is the documented debug-only escape hatch, and its absolute path is exposed as progressPath on every response.

  • No new input schema fields on any of the three tools.
  • No way for a single call to opt back into the raw shape.
  • If a future automation legitimately needs raw arrays inline, the flag can be added then — not pre-emptively.
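
For a debugging session that does need the raw arrays, reading them from the file behind progressPath might look like the sketch below. It assumes the six arrays sit at the top level of sync_progress.json, as in the "before" example; `readRawArrayLengths` and its injectable `readText` parameter are hypothetical names introduced for illustration.

```typescript
import { readFileSync } from "node:fs";

// The six arrays that remain in the persisted file.
const RAW_ARRAY_KEYS = [
  "folderQueue",
  "visitedFolderIds",
  "discoveredFileIds",
  "processedContentFileIds",
  "failedFileIds",
  "unsupportedFileIds",
] as const;

type RawArrayKey = (typeof RAW_ARRAY_KEYS)[number];

// Reads the persisted progress file and reports the length of each raw array.
// `readText` defaults to a synchronous disk read and can be swapped out so the
// function can be exercised without touching the file system.
function readRawArrayLengths(
  progressPath: string,
  readText: (path: string) => string = (path) => readFileSync(path, "utf8"),
): Record<RawArrayKey, number> {
  const parsed: unknown = JSON.parse(readText(progressPath));
  const record = (typeof parsed === "object" && parsed !== null ? parsed : {}) as Record<
    string,
    unknown
  >;
  const lengths = {} as Record<RawArrayKey, number>;
  for (const key of RAW_ARRAY_KEYS) {
    const value = record[key];
    // Anything that is absent or not an array is reported as length 0.
    lengths[key] = Array.isArray(value) ? value.length : 0;
  }
  return lengths;
}
```

A caller would pass the progressPath string taken from any of the three tool responses; nothing in the tool input schema changes.
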
08 · out of scope

Three follow-ups intentionally deferred.

Each would be welcome — none belongs in this change. Listing them here so reviewers don't ask "why not also …" and so future work can pick them up cleanly.

CLI status pretty-printer

A roxtra-mcp sync status command that pretty-prints sync_progress.json from disk — high-fidelity progress for humans without re-bloating tool responses.

Derived phase / progress ratio

A computed phase field (discovery | drain | content | complete) or a metadataPercent ratio. Useful UX signal — but introduces interpretation rules best validated against live runs first.

Debug includeRawProgress

A response-shape opt-out for a single call. Add later if a future automation truly needs raw arrays inline. Today there is no consumer.

09 · validation

What gets checked before this lands.

Unit checks · helper

  • Counters round-trip 1:1 from a fixture progress.counters.
  • Output omits every ID array across idle / running / paused / complete / failed fixtures.
  • Output omits folderQueue even when input contains thousands of entries.
  • createFullSyncSummary(null) returns null.
  • Summary keys are exactly the sixteen documented; no spread leaks.
  • A counter coalesces to 0 when it is missing from the persisted counters.
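
The unit checks above could be expressed as a single contract function that takes the helper as a parameter, sketched below. Everything here is an assumption for illustration: `checkSummaryContract` and `makeFixture` are hypothetical names, the five status strings are taken from the fixture list above and may not match the real enum values, and the fixture data is invented.

```typescript
type Progress = Record<string, unknown>;
type Summarize = (progress: Progress | null) => Progress | null;

// Assumed status strings, taken from the fixture names in the checklist.
const FIXTURE_STATUSES = ["idle", "running", "paused", "complete", "failed"];

// Hypothetical fixture: two counters present, seven deliberately missing,
// and a folder queue with thousands of entries.
function makeFixture(status: string): Progress {
  return {
    status,
    runId: "run-1",
    startedAt: "2026-05-21T08:14:02Z",
    updatedAt: "2026-05-21T08:21:18Z",
    completedAt: status === "complete" ? "2026-05-21T09:00:00Z" : null,
    rootFolderId: "42",
    lastError: status === "failed" ? "example failure" : null,
    counters: { foldersVisited: 3, documentsIndexed: 7 },
    folderQueue: Array.from({ length: 5000 }, (_, i) => `f-${i}`),
    visitedFolderIds: ["f-1"],
    discoveredFileIds: ["d-1"],
    processedContentFileIds: ["d-1"],
    failedFileIds: [],
    unsupportedFileIds: [],
  };
}

// Returns a list of human-readable failures; an empty list means the
// implementation under test satisfies every check.
function checkSummaryContract(summarize: Summarize): string[] {
  const failures: string[] = [];
  if (summarize(null) !== null) failures.push("null input must yield null");
  for (const status of FIXTURE_STATUSES) {
    const summary = summarize(makeFixture(status));
    if (summary === null) {
      failures.push(`${status}: summary must not be null`);
      continue;
    }
    const keys = Object.keys(summary);
    if (keys.length !== 16) failures.push(`${status}: expected 16 keys, got ${keys.length}`);
    const leaked = keys.filter((k) => k === "folderQueue" || k === "counters" || /Ids$/.test(k));
    if (leaked.length > 0) failures.push(`${status}: leaked keys ${leaked.join(", ")}`);
    if (summary.foldersVisited !== 3 || summary.documentsIndexed !== 7) {
      failures.push(`${status}: counters must round-trip 1:1`);
    }
    if (summary.foldersPending !== 0) failures.push(`${status}: missing counter must be 0`);
  }
  return failures;
}
```

In a real test suite, the function passed in would be createFullSyncSummary itself, and the test would assert that the returned list is empty.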

Integration · gates · live

  • pnpm typecheck clean.
  • pnpm gate:parity (typecheck + build + validateTools + validateToolsDist) clean.
  • Bounded roxtra_index_full_sync against the live verification instance — end-to-end.
  • roxtra_index_full_sync_status response contains progressSummary; lacks progress, folderQueue, *FileIds.
  • nextRecommendedCall non-null on incomplete responses.
  • On-disk sync_progress.json still contains the full raw arrays.
  • Dashboard POC continues to render correctly off the unchanged file.
  • Before/after response sizes recorded — target ≥ 1 order of magnitude reduction.
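
The response-shape checks in the list above can be automated with a small validator run against a captured tool response. This is a sketch under assumptions: `collectKeys` and `findResponseViolations` are hypothetical names, and "incomplete" is interpreted here as any status other than the string "complete", which may not match the real enum.

```typescript
type Json = Record<string, unknown>;

// Collects every key name that appears anywhere in a JSON-like value.
function collectKeys(value: unknown, found: string[] = []): string[] {
  if (Array.isArray(value)) {
    for (const item of value) collectKeys(item, found);
  } else if (typeof value === "object" && value !== null) {
    for (const [key, child] of Object.entries(value)) {
      found.push(key);
      collectKeys(child, found);
    }
  }
  return found;
}

// Returns a list of violations; an empty list means the response has the
// compact shape.
function findResponseViolations(response: Json): string[] {
  const violations: string[] = [];
  if (!("progressSummary" in response)) violations.push("missing progressSummary");
  if ("progress" in response) violations.push("legacy progress field still present");
  for (const key of collectKeys(response)) {
    if (key === "folderQueue" || key === "visitedFolderIds" || key.endsWith("FileIds")) {
      violations.push(`raw array leaked: ${key}`);
    }
  }
  const summary = response.progressSummary as Json | null | undefined;
  // Assumption: a run is incomplete unless its status is exactly "complete".
  const incomplete = summary != null && summary.status !== "complete";
  if (incomplete && response.nextRecommendedCall == null) {
    violations.push("nextRecommendedCall must be non-null on incomplete responses");
  }
  return violations;
}
```

The validator walks the whole response rather than only the top level, so a raw array that reappears under a nested key would still be reported.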