Add fast memory search mode #188

Open
Amhitox wants to merge 1 commit into XortexAI:main from Amhitox:bounty/xmem-163-search-fastpath

@Amhitox Amhitox commented May 20, 2026

This PR adds a fast search path so that memory lookups no longer have to run the full agentic retrieval flow on every request. By default the endpoint returns raw search results; callers can opt into answer synthesis with answer=true when they still want an LLM-generated response.
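To make the raw-by-default behavior concrete, here is a minimal sketch of the dispatch. It is not the code in this PR: `SearchRequest`, `search`, the `retrieve` and `synthesize` callables, the response keys, and the `top_k` default of 5 are all hypothetical names and values chosen for illustration. Only `query`, `top_k`, and `answer` come from the description and diff.

```python
from dataclasses import dataclass


@dataclass
class SearchRequest:
    # Hypothetical request shape; the real schema lives in src/api/schemas.py.
    query: str
    top_k: int = 5        # assumed default, not taken from the PR
    answer: bool = False  # raw search is the default path


def search(req, retrieve, synthesize):
    """Return ranked hits; call the LLM synthesizer only when answer=True."""
    hits = retrieve(req.query, req.top_k)
    response = {"results": hits, "answer": None}
    if req.answer:
        # Opt-in path: pay the synthesis latency only when requested.
        response["answer"] = synthesize(req.query, hits)
    return response
```

With this shape, a caller that omits `answer` never triggers the synthesizer, which is the property the fast path depends on.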

What changed:

  • expanded /v1/memory/search to return ranked hits from profile, temporal, summary, snippet, and code memory
  • kept raw search as the default path, with optional answer generation behind answer=true
  • added per-domain timing data and p50/p95/p99 latency snapshots for raw vs answer mode
  • added short-lived caching for profile catalogs and retrieval plans to avoid repeated planning work
  • covered the raw and answer search behavior with focused API tests
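For readers unfamiliar with latency snapshots, the following is a rough sketch of how p50/p95/p99 values could be tracked separately for raw and answer mode. `LatencyTracker` and its methods are hypothetical, and the nearest-rank percentile used here is an assumption; the PR may compute percentiles differently.

```python
from collections import defaultdict


class LatencyTracker:
    """Keeps latency samples per mode and reports nearest-rank percentiles."""

    def __init__(self):
        self._samples = defaultdict(list)

    def record(self, mode, elapsed_ms):
        self._samples[mode].append(elapsed_ms)

    def snapshot(self, mode):
        samples = sorted(self._samples.get(mode, []))
        if not samples:
            return {}

        def pct(p):
            # Nearest-rank: rank = ceil(p * n / 100), in integer arithmetic.
            rank = max(1, (p * len(samples) + 99) // 100)
            return samples[rank - 1]

        return {"p50": pct(50), "p95": pct(95), "p99": pct(99)}
```

A production version would likely bound the sample list (for example with a fixed-size window) so memory does not grow with traffic.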

Tested with:

.venv/bin/python -m pytest tests/api/test_dependencies_and_routes.py
.venv/bin/python -m py_compile src/api/schemas.py src/api/routes/memory.py src/pipelines/retrieval.py tests/api/test_dependencies_and_routes.py

Fixes #163

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request enhances the /search endpoint by adding optional LLM-based answer synthesis, expanding search domains to include code and snippets, and implementing detailed latency tracking. It also introduces TTL-based caching for retrieval plans and profile catalogs to improve efficiency. Review feedback identifies an opportunity to further optimize search performance by executing domain queries concurrently and suggests removing an unnecessary side effect in the profile search logic.
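As an illustration of the TTL-based caching mentioned above, here is a small sketch of a short-lived cache. `TTLCache`, `get_or_compute`, and the injectable `clock` parameter are hypothetical names for this example and are not taken from the PR.

```python
import time


class TTLCache:
    """Caches computed values for a fixed number of seconds per key."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._entries = {}   # key -> (expires_at, value)

    def get_or_compute(self, key, compute):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now < entry[0]:
            return entry[1]  # still fresh: skip the expensive call
        value = compute()
        self._entries[key] = (now + self._ttl, value)
        return value
```

Keyed by something like a user ID or a normalized query, a cache of this kind lets repeated requests inside the TTL window reuse a profile catalog or retrieval plan instead of rebuilding it.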

Comment thread src/api/routes/memory.py
Comment on lines +720 to +731
if "profile" in domains:
    all_results.extend(await _timed_sync("profile", timings, _search_profile, pipeline, user_id))
if "temporal" in domains:
    all_results.extend(await _timed_sync("temporal", timings, _search_temporal, pipeline, req.query, user_id, req.top_k))
if "summary" in domains:
    all_results.extend(await _timed_async("summary", timings, _search_summary, pipeline, req.query, user_id, req.top_k))
if "snippet" in domains:
    all_results.extend(await _timed_async("snippet", timings, _search_snippet, pipeline, req.query, user_id, req.top_k))
if "code" in domains:
    if not req.org_id or not req.repo:
        return _error(request, "org_id and repo are required when domains includes 'code'.", 400, 0)
    all_results.extend(await _timed_async("code", timings, _search_code, req, user_id))

high

The domain searches are currently executed sequentially. To improve performance for this 'fast search' endpoint, these searches should be run concurrently using asyncio.gather.

Suggested change
- if "profile" in domains:
-     all_results.extend(await _timed_sync("profile", timings, _search_profile, pipeline, user_id))
- if "temporal" in domains:
-     all_results.extend(await _timed_sync("temporal", timings, _search_temporal, pipeline, req.query, user_id, req.top_k))
- if "summary" in domains:
-     all_results.extend(await _timed_async("summary", timings, _search_summary, pipeline, req.query, user_id, req.top_k))
- if "snippet" in domains:
-     all_results.extend(await _timed_async("snippet", timings, _search_snippet, pipeline, req.query, user_id, req.top_k))
- if "code" in domains:
-     if not req.org_id or not req.repo:
-         return _error(request, "org_id and repo are required when domains includes 'code'.", 400, 0)
-     all_results.extend(await _timed_async("code", timings, _search_code, req, user_id))
+ tasks = []
+ if "profile" in domains:
+     tasks.append(_timed_sync("profile", timings, _search_profile, pipeline, user_id))
+ if "temporal" in domains:
+     tasks.append(_timed_sync("temporal", timings, _search_temporal, pipeline, req.query, user_id, req.top_k))
+ if "summary" in domains:
+     tasks.append(_timed_async("summary", timings, _search_summary, pipeline, req.query, user_id, req.top_k))
+ if "snippet" in domains:
+     tasks.append(_timed_async("snippet", timings, _search_snippet, pipeline, req.query, user_id, req.top_k))
+ if "code" in domains:
+     if not req.org_id or not req.repo:
+         return _error(request, "org_id and repo are required when domains includes 'code'.", 400, 0)
+     tasks.append(_timed_async("code", timings, _search_code, req, user_id))
+ results_from_domains = await asyncio.gather(*tasks)
+ for domain_results in results_from_domains:
+     all_results.extend(domain_results)
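One detail worth checking when applying a change like this: if the handler returns early after some coroutines have already been created, those coroutines are never awaited and Python emits a "coroutine was never awaited" warning. The sketch below validates first and only then builds the task list. It also runs blocking functions through `asyncio.to_thread` (Python 3.9+) so they can overlap with the async ones; whether the PR's `_timed_sync` already does this is not visible here. The names `timed`, `timed_in_thread`, `search_domains`, and the `searchers` mapping are hypothetical stand-ins, not the PR's helpers.

```python
import asyncio
import time


async def timed(name, timings, func, *args):
    """Await an async callable and record its elapsed milliseconds."""
    start = time.perf_counter()
    try:
        return await func(*args)
    finally:
        timings[name] = (time.perf_counter() - start) * 1000


async def timed_in_thread(name, timings, func, *args):
    """Run a blocking callable in a worker thread so it does not stall the event loop."""
    return await timed(name, timings, asyncio.to_thread, func, *args)


async def search_domains(searchers, domains, timings):
    """searchers maps a domain name to (is_async, func, args)."""
    # Validate before creating any coroutine, so an early exit leaves nothing un-awaited.
    unknown = [d for d in domains if d not in searchers]
    if unknown:
        raise ValueError("unsupported domains: %s" % ", ".join(unknown))

    tasks = []
    for name in domains:
        is_async, func, args = searchers[name]
        wrapper = timed if is_async else timed_in_thread
        tasks.append(wrapper(name, timings, func, *args))

    results = []
    # gather returns results in the order the awaitables were passed in.
    for domain_results in await asyncio.gather(*tasks):
        results.extend(domain_results)
    return results
```

Because `asyncio.gather` preserves input order, the merged list stays deterministic even though the searches finish at different times; any ranking across domains would still happen afterwards.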

Comment thread src/api/routes/memory.py
)
return [SourceRecord(domain="profile", content=r.content, score=r.score, metadata=r.metadata) for r in raw]
_, raw = pipeline._fetch_profile_catalog(user_id)
pipeline._cached_profile_records = raw

medium

This line appears to be unnecessary. The _cached_profile_records attribute is not used within this file's logic, and setting it here creates a potentially confusing side effect on the pipeline object. Please consider removing it.

Development

Successfully merging this pull request may close these issues.

Add low-latency raw search path separate from agentic answer synthesis
