Commit 0276869

tangg555, fridayL, and glin93 authored
Feat/redis scheduler: task broker + orchestrator + new scheduler monitor (#571)
* debug an error function name
* feat: Add DynamicCache compatibility for different transformers versions
  - Fix build_kv_cache method in hf.py to handle both old and new DynamicCache structures
  - Support new 'layers' attribute with key_cache/value_cache or keys/values
  - Maintain backward compatibility with direct key_cache/value_cache attributes
  - Add comprehensive error handling and logging for unsupported structures
  - Update move_dynamic_cache_htod function in kv.py for cross-version compatibility
  - Handle layers-based structure in newer transformers versions
  - Support alternative attribute names (keys/values vs key_cache/value_cache)
  - Preserve original functionality for older transformers versions
  - Add comprehensive tests for DynamicCache compatibility
  - Test activation memory update with mock DynamicCache layers
  - Verify layers attribute access across different transformers versions
  - Fix scheduler logger mock to include memory_manager attribute

  This resolves AttributeError issues when using different versions of the transformers library and ensures robust handling of DynamicCache objects.
* debug
* feat: implement APIAnalyzerForScheduler for memory operations
  - Add APIAnalyzerForScheduler class with search/add operations
  - Support requests and http.client with connection reuse
  - Include comprehensive error handling and dynamic configuration
  - Add English test suite with real-world conversation scenarios
* feat: Add search_ws API endpoint and enhance API analyzer functionality
  - Add search_ws endpoint in server_router.py for scheduler-enabled search
  - Fix missing imports: time module, SearchRequest class, and get_mos_product_instance function
  - Implement search_ws method in api_analyzer.py with HTTP client support
  - Add _search_ws_with_requests and _search_ws_with_http_client private methods
  - Include search_ws usage example in demonstration code
  - Enhance scheduler and dispatcher capabilities for improved memory management
  - Expand test coverage to ensure functionality stability

  This update primarily strengthens the memory scheduling system's search capabilities, providing users with more flexible API interface options.
* fix: resolve test failures and warnings in test suite
  - Fix Pydantic serialization warning in test_memos_chen_tang_hello_world
    * Add warnings filter to suppress UserWarning from Pydantic serialization
  - Fix KeyError: 'past_key_values' in test_build_kv_cache_and_generation
    * Update mock configuration to properly return forward_output with past_key_values
    * Add DynamicCache version compatibility handling in test mocks
    * Support both old and new transformers versions with layers/key_cache attributes
    * Improve assertion logic to check all model calls for required parameters
  - Update base_scheduler.py to use centralized DEFAULT_MAX_INTERNAL_MESSAGE_QUEUE_SIZE constant
    * Add import for DEFAULT_MAX_INTERNAL_MESSAGE_QUEUE_SIZE from general_schemas
    * Replace hardcoded value 100 with configurable constant (1000)

  All tests now pass successfully with proper version compatibility handling.
* feat: add a test_robustness execution to test thread pool execution
* feat: optimize scheduler configuration and API search functionality
  - Add DEFAULT_TOP_K and DEFAULT_CONTEXT_WINDOW_SIZE global constants in general_schemas.py
  - Update base_scheduler.py to use global default values instead of hardcoded numbers
  - Fix SchedulerConfigFactory initialization issue by using keyword argument expansion
  - Resolve UnboundLocalError variable conflict in search_memories_ws function
  - Fix indentation and parameter issues in OptimizedScheduler search_for_api method
  - Improve code standardization and maintainability
* feat: Add Redis auto-initialization with fallback strategies
  - Add auto_initialize_redis() with config/env/local fallback
  - Move Redis logic from dispatcher_monitor to redis_service
  - Update base_scheduler to use auto initialization
  - Add proper resource cleanup and error handling
* feat: add database connection management to ORM module
  - Add MySQL engine loading from environment variables in BaseDBManager
  - Add Redis connection loading from environment variables in BaseDBManager
  - Enhance database configuration validation and error handling
  - Complete database adapter infrastructure for ORM module
  - Provide unified database connection management interface

  This update provides comprehensive database connection management capabilities for the mem_scheduler module, supporting dynamic MySQL and Redis configuration loading from environment variables, establishing reliable data persistence foundation for scheduling services and API services.
* remove part of test
* feat: add Redis-based ORM with multiprocess synchronization
  - Add RedisDBManager and RedisLockableORM classes
  - Implement atomic locking mechanism for concurrent access
  - Add merge functionality for different object types
  - Include comprehensive test suite and examples
  - Fix Redis key type conflicts in lock operations
* fix: resolve scheduler module import and Redis integration issues
* revise naive memcube creation in server router
* remove long-time tests in test_scheduler
* remove redis test which needs .env
* refactor all codes about mixture search with scheduler
* fix: resolve Redis API synchronization issues and implement search API with reranker
  - Fix running_entries to running_task_ids migration across codebase
  - Update sync_search_data method to properly handle TaskRunningStatus
  - Correct variable naming and logic in API synchronization flow
  - Implement search API endpoint with reranker functionality
  - Update test files to reflect new running_task_ids convention
  - Ensure proper Redis state management for concurrent tasks
* remove a test for api module
* revise to pass the test suite
* address some bugs to make mix_search normally running
* modify codes according to evaluation logs
* feat: Optimize mixture search and enhance API client
* feat: Add conversation_turn tracking for session-based memory search
  - Add conversation_turn field to APIMemoryHistoryEntryItem schema with default value 0
  - Implement session counter in OptimizedScheduler to track turn count per session_id
  - Update sync_search_data method to accept and store conversation_turn parameter
  - Maintain session history with LRU eviction (max 5 sessions)
  - Rename conversation_id to session_id for consistency with request object
  - Enable direct access to session_id from search requests

  This feature allows tracking conversation turns within the same session, providing better context for memory retrieval and search history management.
* address time bug in monitor
* revise simple tree
* add mode to evaluation client; rewrite print to logger.info in db files
* feat: 1. add redis queue for scheduler 2. finish the code related to mix search and fine search
* debug the working memory code
* addressed a range of bugs to make scheduler running correctly
* remove test_dispatch_parallel test
* print change to logger.info
* adjusted the core code related to fine and mixture apis
* feat: create task queue to wrap local queue and redis queue. queue now split FIFO to multi queue from different users. addressed a range of bugs
* fix bugs: debug bugs about internet trigger
* debug get searcher mode
* feat: add manual internet
* Fix: fix code format
* feat: add strategy for fine search
* debug redis queue
* debug redis queue
* fix bugs: completely addressed bugs about redis queue
* refactor: add searcher to handler_init; remove info log from task_queue
* refactor: modify analyzer
* refactor: revise locomo_eval to make it support llm other than gpt-4o-mini
* feat: develop advanced searcher with deep search
* feat: finish a complete version of deep search
* refactor: refactor deep search feature, now only allowing one-round deep search
* feat: implement the feature of get_tasks_status, but completed tasks are not recorded yet; waiting to be developed
* debugging merged code; searching memories have bugs
* change logging level
* debug api evaluation
* fix bugs: change top to top_k
* change log
* refactor: rewrite deep search to make it work better
* change num_users
* feat: developed and test task broker and orchestrator
* Fix: Include task_id in ScheduleMessageItem serialization
* Fix(Scheduler): Correct event log creation and task_id serialization
* Feat(Scheduler): Add conditional detailed logging for KB updates
  Fix(Scheduler): Correct create_event_log indentation
* Fix(Scheduler): Correct create_event_log call sites

  Reverts previous incorrect fix to scheduler_logger.py and correctly fixes the TypeError at the
  call sites in general_scheduler.py by removing the invalid 'log_content' kwarg and adding the missing memory_type kwargs.
* Fix(Scheduler): Deserialize task_id in ScheduleMessageItem.from_dict

  This completes the fix for the task_id loss. The 'to_dict' method was previously fixed to serialize the task_id, but the corresponding 'from_dict' method was not updated to deserialize it, causing the value to be lost when messages were read from the queue.
* Refactor(Config): Centralize RabbitMQ config override logic

  Moves all environment variable override logic into initialize_rabbitmq for a single source of truth. This ensures Nacos-provided environment variables for all RabbitMQ settings are respected over file configurations. Also removes now-redundant logging from the publish method.
* Revert "Refactor(Config): Centralize RabbitMQ config override logic"

  This reverts commit b8cc42a.
* Fix(Redis): Convert None task_id to empty string during serialization

  Resolves DataError in Redis Streams when task_id is None by ensuring it's serialized as an empty string instead of None, which Redis does not support. Applies to ScheduleMessageItem.to_dict method.
* Feat(Log): Add diagnostic log to /product/add endpoint

  Adds an INFO level diagnostic log message at the beginning of the create_memory function to help verify code deployment.
* Feat(Log): Add comprehensive diagnostic logs for /product/add flow

  Introduces detailed INFO level diagnostic logs across the entire call chain for the /product/add API endpoint. These logs include relevant context, such as full request bodies, message items before scheduler submission, and messages before RabbitMQ publication, to aid in debugging deployment discrepancies and tracing data flow, especially concerning task_id propagation.
  Logs added/enhanced in:
  - src/memos/api/routers/product_router.py
  - src/memos/api/handlers/add_handler.py
  - src/memos/multi_mem_cube/single_cube.py
  - src/memos/mem_os/core.py
  - src/memos/mem_scheduler/general_scheduler.py
  - src/memos/mem_scheduler/base_scheduler.py
  - src/memos/mem_scheduler/webservice_modules/rabbitmq_service.py
* Feat(Log): Add comprehensive diagnostic logs for /product/add flow and apply ruff formatting

  Introduces detailed INFO level diagnostic logs across the entire call chain for the /product/add API endpoint. These logs include relevant context, such as full request bodies, message items before scheduler submission, and messages before RabbitMQ publication, to aid in debugging deployment discrepancies and tracing data flow, especially concerning task_id propagation. Also applies automatic code formatting using ruff format to all modified files.

  Logs added/enhanced in:
  - src/memos/api/routers/product_router.py
  - src/memos/api/handlers/add_handler.py
  - src/memos/multi_mem_cube/single_cube.py
  - src/memos/mem_os/core.py
  - src/memos/mem_scheduler/general_scheduler.py
  - src/memos/mem_scheduler/base_scheduler.py
  - src/memos/mem_scheduler/webservice_modules/rabbitmq_service.py
* Fix(rabbitmq): Use env vars for KB updates and improve logging
* Fix(rabbitmq): Explicitly use MEMSCHEDULER_RABBITMQ_EXCHANGE_NAME and empty routing key for KB updates
* Fix(add_handler): Update diagnostic log timestamp
* Fix(add_handler): Update diagnostic log timestamp again (auto-updated)
* Update default scheduler redis stream prefix
* Update diagnostic timestamp in add handler
* Allow optional log_content in scheduler event log
* feat: new examples to test scheduler
* feat: fair scheduler and refactor of search function
* fix bugs: address bugs caused by outdated test code
* feat: add task_schedule_monitor
* fix: handle nil mem_cube in scheduler message consumers
* fix bugs: response messaged changed in memos code
* refactor: revise task queue to allow it dealing with
  pending tasks when no task remaining
* refactor: revise mixture search and scheduler logger
* Fix scheduler task tracking
* fix bugs: address ai review issues
* fix bugs: address rabbitmq initialization failed when doing pytest
* fix(scheduler): Correct dispatcher task and future tracking

---------

Co-authored-by: fridayL <[email protected]>
Co-authored-by: [email protected] <>
Co-authored-by: Zehao Lin <[email protected]>
1 parent c3c8403 commit 0276869

39 files changed, +1041 -576 lines changed

dump.rdb

3.45 KB
Binary file not shown.

evaluation/scripts/locomo/locomo_eval.py

Lines changed: 1 addition & 1 deletion
@@ -311,7 +311,7 @@ async def main(frame, version="default", options=None, num_runs=1, max_workers=4
     with open(response_path) as file:
         locomo_responses = json.load(file)

-    num_users = 2
+    num_users = 10
     all_grades = {}

     total_responses_count = sum(

evaluation/scripts/utils/client.py

Lines changed: 3 additions & 1 deletion
@@ -189,7 +189,9 @@ def search(self, query, user_id, top_k):
         )
         response = requests.request("POST", url, data=payload, headers=self.headers)
         assert response.status_code == 200, response.text
-        assert json.loads(response.text)["message"] == "Memory searched successfully", response.text
+        assert json.loads(response.text)["message"] == "Search completed successfully", (
+            response.text
+        )
         return json.loads(response.text)["data"]

examples/data/config/mem_scheduler/general_scheduler_config.yaml

Lines changed: 1 addition & 1 deletion
@@ -4,7 +4,7 @@ config:
   act_mem_update_interval: 30
   context_window_size: 10
   thread_pool_max_workers: 5
-  consume_interval_seconds: 1
+  consume_interval_seconds: 0.01
   working_mem_monitor_capacity: 20
   activation_mem_monitor_capacity: 5
   enable_parallel_dispatch: true

examples/data/config/mem_scheduler/memos_config_w_optimized_scheduler.yaml

Lines changed: 1 addition & 1 deletion
@@ -38,7 +38,7 @@ mem_scheduler:
     act_mem_update_interval: 30
     context_window_size: 10
     thread_pool_max_workers: 10
-    consume_interval_seconds: 1
+    consume_interval_seconds: 0.01
     working_mem_monitor_capacity: 20
     activation_mem_monitor_capacity: 5
     enable_parallel_dispatch: true

examples/data/config/mem_scheduler/memos_config_w_scheduler.yaml

Lines changed: 1 addition & 1 deletion
@@ -38,7 +38,7 @@ mem_scheduler:
     act_mem_update_interval: 30
     context_window_size: 10
     thread_pool_max_workers: 10
-    consume_interval_seconds: 1
+    consume_interval_seconds: 0.01
     working_mem_monitor_capacity: 20
     activation_mem_monitor_capacity: 5
     enable_parallel_dispatch: true

examples/mem_scheduler/api_w_scheduler.py

Lines changed: 0 additions & 1 deletion
@@ -17,7 +17,6 @@
 print(f"Queue maxsize: {getattr(mem_scheduler.memos_message_queue, 'maxsize', 'N/A')}")
 print("=====================================\n")

-mem_scheduler.memos_message_queue.debug_mode_on()
 queue = mem_scheduler.memos_message_queue
 queue.clear()

Lines changed: 87 additions & 0 deletions
@@ -0,0 +1,87 @@
+import sys
+
+from collections import defaultdict
+from pathlib import Path
+
+from memos.api.routers.server_router import mem_scheduler
+from memos.mem_scheduler.schemas.message_schemas import ScheduleMessageItem
+
+
+FILE_PATH = Path(__file__).absolute()
+BASE_DIR = FILE_PATH.parent.parent.parent
+sys.path.insert(0, str(BASE_DIR))
+
+
+def make_message(user_id: str, mem_cube_id: str, label: str, idx: int | str) -> ScheduleMessageItem:
+    return ScheduleMessageItem(
+        item_id=f"{user_id}:{mem_cube_id}:{label}:{idx}",
+        user_id=user_id,
+        mem_cube_id=mem_cube_id,
+        label=label,
+        content=f"msg-{idx} for {user_id}/{mem_cube_id}/{label}",
+    )
+
+
+def seed_messages_for_test_fairness(queue, combos, per_stream):
+    # send overwhelm message by one user
+    (u, c, label) = combos[0]
+    task_target = 100
+    print(f"{u}:{c}:{label} submit {task_target} messages")
+    for i in range(task_target):
+        msg = make_message(u, c, label, f"overwhelm_{i}")
+        queue.submit_messages(msg)
+
+    for u, c, label in combos:
+        print(f"{u}:{c}:{label} submit {per_stream} messages")
+        for i in range(per_stream):
+            msg = make_message(u, c, label, i)
+            queue.submit_messages(msg)
+    print("======= seed_messages Done ===========")
+
+
+def count_by_stream(messages):
+    counts = defaultdict(int)
+    for m in messages:
+        key = f"{m.user_id}:{m.mem_cube_id}:{m.label}"
+        counts[key] += 1
+    return counts
+
+
+def run_fair_redis_schedule(batch_size: int = 3):
+    print("=== Redis Fairness Demo ===")
+    print(f"use_redis_queue: {mem_scheduler.use_redis_queue}")
+    mem_scheduler.consume_batch = batch_size
+    queue = mem_scheduler.memos_message_queue
+
+    # Isolate and clear queue
+    queue.clear()
+
+    # Define multiple streams: (user_id, mem_cube_id, task_label)
+    combos = [
+        ("u1", "u1", "labelX"),
+        ("u1", "u1", "labelY"),
+        ("u2", "u2", "labelX"),
+        ("u2", "u2", "labelY"),
+    ]
+    per_stream = 5
+
+    # Seed messages evenly across streams
+    seed_messages_for_test_fairness(queue, combos, per_stream)
+
+    # Compute target batch size (fair split across streams)
+    print(f"Request batch_size={batch_size} for {len(combos)} streams")
+
+    for _ in range(len(combos)):
+        # Fetch one brokered pack
+        msgs = queue.get_messages(batch_size=batch_size)
+        print(f"Fetched {len(msgs)} messages in first pack")
+
+        # Check fairness: counts per stream
+        counts = count_by_stream(msgs)
+        for k in sorted(counts):
+            print(f"{k}: {counts[k]}")
+
+
+if __name__ == "__main__":
+    # task 1 fair redis schedule
+    run_fair_redis_schedule()
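The demo above floods one stream with 100 messages and then checks how each fetched pack is split across streams. The broker's actual policy is not visible in this diff, so the following is only a generic round-robin sketch of what a fair split could look like. fair_batch is a hypothetical helper, and the stream keys reuse the demo's user_id:mem_cube_id:label format.

```python
from collections import defaultdict, deque


def fair_batch(streams: dict[str, deque], batch_size: int) -> list:
    """Take up to batch_size items, at most one per stream per round."""
    batch = []
    while len(batch) < batch_size:
        progressed = False
        for key in sorted(streams):
            if streams[key] and len(batch) < batch_size:
                batch.append(streams[key].popleft())
                progressed = True
        if not progressed:
            # Every stream is empty; stop instead of looping forever.
            break
    return batch


keys = ("u1:u1:labelX", "u1:u1:labelY", "u2:u2:labelX", "u2:u2:labelY")
streams = defaultdict(deque)

# One stream is flooded first, as in the demo.
streams[keys[0]].extend(f"overwhelm_{i}" for i in range(100))
# Then every stream receives five regular messages.
for key in keys:
    streams[key].extend(f"{key}#{i}" for i in range(5))

first_pack = fair_batch(streams, batch_size=4)
for item in first_pack:
    print(item)
```

With a batch size equal to the number of streams, each stream contributes exactly one message to the pack, so the flooded stream cannot starve the others.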
Lines changed: 85 additions & 0 deletions
@@ -0,0 +1,85 @@
+from pathlib import Path
+from time import sleep
+
+# Note: we skip API handler status/wait utilities in this demo
+from memos.api.routers.server_router import mem_scheduler
+from memos.mem_scheduler.schemas.message_schemas import ScheduleMessageItem
+
+
+# Debug: Print scheduler configuration
+print("=== Scheduler Configuration Debug ===")
+print(f"Scheduler type: {type(mem_scheduler).__name__}")
+print(f"Config: {mem_scheduler.config}")
+print(f"use_redis_queue: {mem_scheduler.use_redis_queue}")
+print(f"Queue type: {type(mem_scheduler.memos_message_queue).__name__}")
+print(f"Queue maxsize: {getattr(mem_scheduler.memos_message_queue, 'maxsize', 'N/A')}")
+print("=====================================\n")
+
+queue = mem_scheduler.memos_message_queue
+
+
+# Define a handler function
+def my_test_handler(messages: list[ScheduleMessageItem]):
+    print(f"My test handler received {len(messages)} messages: {[one.item_id for one in messages]}")
+    for msg in messages:
+        # Create a file named by task_id (use item_id as numeric id 0..99)
+        task_id = str(msg.item_id)
+        file_path = tmp_dir / f"{task_id}.txt"
+        try:
+            print(f"writing {file_path}...")
+            file_path.write_text(f"Task {task_id} processed.\n")
+        except Exception as e:
+            print(f"Failed to write {file_path}: {e}")
+
+
+def submit_tasks():
+    mem_scheduler.memos_message_queue.clear()
+
+    # Create 100 messages (task_id 0..99)
+    users = ["user_A", "user_B"]
+    messages_to_send = [
+        ScheduleMessageItem(
+            item_id=str(i),
+            user_id=users[i % 2],
+            mem_cube_id="test_mem_cube",
+            label=TEST_HANDLER_LABEL,
+            content=f"Create file for task {i}",
+        )
+        for i in range(100)
+    ]
+    # Submit messages in batch and print completion
+    print(f"Submitting {len(messages_to_send)} messages to the scheduler...")
+    mem_scheduler.memos_message_queue.submit_messages(messages_to_send)
+    print(f"Task submission done! tasks in queue: {mem_scheduler.get_tasks_status()}")
+
+
+# Register the handler
+TEST_HANDLER_LABEL = "test_handler"
+mem_scheduler.register_handlers({TEST_HANDLER_LABEL: my_test_handler})
+
+
+tmp_dir = Path("./tmp")
+tmp_dir.mkdir(exist_ok=True)
+
+# Test stop-and-restart: if tmp already has >1 files, skip submission and print info
+existing_count = len(list(Path("tmp").glob("*.txt"))) if Path("tmp").exists() else 0
+if existing_count > 1:
+    print(f"Skip submission: found {existing_count} files in tmp (>1), continue processing")
+else:
+    submit_tasks()
+
+# 6. Wait until tmp has 100 files or timeout
+poll_interval = 0.01
+expected = 100
+tmp_dir = Path("tmp")
+while mem_scheduler.get_tasks_status()["remaining"] != 0:
+    count = len(list(tmp_dir.glob("*.txt"))) if tmp_dir.exists() else 0
+    tasks_status = mem_scheduler.get_tasks_status()
+    mem_scheduler.print_tasks_status(tasks_status=tasks_status)
+    print(f"[Monitor] Files in tmp: {count}/{expected}")
+    sleep(poll_interval)
+print(f"[Result] Final files in tmp: {len(list(tmp_dir.glob('*.txt')))})")
+
+# 7. Stop the scheduler
+print("Stopping the scheduler...")
+mem_scheduler.stop()
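The comment at step 6 mentions a timeout, but the loop in the demo only exits once the remaining count reaches zero. A bounded wait could be sketched as below. wait_until_drained and its parameters are hypothetical names introduced for illustration. In the demo, the callable would presumably wrap mem_scheduler.get_tasks_status()["remaining"]; a plain list stands in for it here so the block runs on its own.

```python
import time
from typing import Callable


def wait_until_drained(
    get_remaining: Callable[[], int],
    timeout_seconds: float = 30.0,
    poll_interval: float = 0.01,
) -> bool:
    """Poll get_remaining() until it returns 0 or the deadline passes.

    Returns True if the queue drained in time, False on timeout.
    """
    deadline = time.monotonic() + timeout_seconds
    while time.monotonic() < deadline:
        if get_remaining() == 0:
            return True
        time.sleep(poll_interval)
    return False


# Stand-in for a scheduler whose remaining count falls to zero.
pending_counts = [3, 2, 1, 0]
drained = wait_until_drained(lambda: pending_counts.pop(0), timeout_seconds=1.0)

# Stand-in for a stuck scheduler: the count never reaches zero.
stuck = wait_until_drained(lambda: 1, timeout_seconds=0.05)
```

Using time.monotonic() for the deadline keeps the wait unaffected by wall-clock adjustments, and returning a boolean lets the caller decide whether to stop the scheduler or report a failure.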

src/memos/api/handlers/add_handler.py

Lines changed: 3 additions & 1 deletion
@@ -47,7 +47,9 @@ def handle_add_memories(self, add_req: APIADDRequest) -> MemoryResponse:
         Returns:
             MemoryResponse with added memory information
         """
-        self.logger.info(f"[AddHandler] Add Req is: {add_req}")
+        self.logger.info(
+            f"[DIAGNOSTIC] server_router -> add_handler.handle_add_memories called (Modified at 2025-11-29 18:46). Full request: {add_req.model_dump_json(indent=2)}"
+        )

         if add_req.info:
             exclude_fields = list_all_fields()
