

@satp42 satp42 commented Oct 24, 2025

Added new settings to control embedding performance in packages/types/src/config.types.ts. Specifically:

  1. embedding_batch_size (number, default: 64)
  2. embedding_max_threads (number, default: 4)
  3. embedding_max_connections (number, default: 8)
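The actual definitions are TypeScript interfaces in config.types.ts; a minimal Rust-side mirror of the three settings and their defaults might look like this (the struct name is hypothetical):

```rust
// Hypothetical Rust-side mirror of the new embedding settings; the real
// definitions are TypeScript types in packages/types/src/config.types.ts.
#[derive(Debug, Clone, Copy, PartialEq)]
struct EmbeddingSettings {
    batch_size: usize,      // embedding_batch_size, default 64
    max_threads: usize,     // embedding_max_threads, default 4
    max_connections: usize, // embedding_max_connections, default 8
}

impl Default for EmbeddingSettings {
    fn default() -> Self {
        Self { batch_size: 64, max_threads: 4, max_connections: 8 }
    }
}

fn main() {
    let s = EmbeddingSettings::default();
    println!("{} {} {}", s.batch_size, s.max_threads, s.max_connections);
}
```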

Modified packages/backend-server/src/main.rs to accept additional command-line arguments for batch size, max threads, and max connections.
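The PR doesn't show the exact flag names or parser used, so the following std-only sketch assumes `--embedding-batch-size`-style flags; the real main.rs may use a dedicated argument parser:

```rust
use std::env;

// Parse an optional `--flag value` pair, falling back to a default.
// Flag names here are hypothetical; the PR only states that batch size,
// max threads, and max connections are accepted on the command line.
fn arg_or_default(args: &[String], flag: &str, default: usize) -> usize {
    args.iter()
        .position(|a| a == flag)
        .and_then(|i| args.get(i + 1))
        .and_then(|v| v.parse().ok())
        .unwrap_or(default)
}

fn main() {
    let args: Vec<String> = env::args().collect();
    let batch_size = arg_or_default(&args, "--embedding-batch-size", 64);
    let max_threads = arg_or_default(&args, "--embedding-max-threads", 4);
    let max_connections = arg_or_default(&args, "--embedding-max-connections", 8);
    println!("batch={batch_size} threads={max_threads} conns={max_connections}");
}
```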

Updated packages/backend-server/src/server/mod.rs:

  • Added max_connections field to LocalAIServer
  • Implemented a semaphore-style connection counter to cap concurrent client connections (previously a thread was spawned per client with no limit)
  • Configured rayon global thread pool using rayon::ThreadPoolBuilder before starting the server
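For the thread pool, rayon's real API for this is `rayon::ThreadPoolBuilder::new().num_threads(n).build_global()`, which must run before any rayon work starts. The connection cap below is a std-only sketch of the counter approach (field and method names are guesses, not the PR's actual code):

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;

// Std-only sketch of a connection cap: reserve a slot before spawning a
// handler thread, release it when the handler finishes. The real server
// also configures the rayon global pool up front, e.g.
//   rayon::ThreadPoolBuilder::new().num_threads(max_threads).build_global()?;
struct ConnectionLimiter {
    active: AtomicUsize,
    max: usize,
}

impl ConnectionLimiter {
    fn new(max: usize) -> Arc<Self> {
        Arc::new(Self { active: AtomicUsize::new(0), max })
    }

    // Returns true if a slot was acquired; callers must later call `release`.
    fn try_acquire(&self) -> bool {
        let mut cur = self.active.load(Ordering::Relaxed);
        loop {
            if cur >= self.max {
                return false; // at capacity: reject instead of spawning
            }
            match self.active.compare_exchange(
                cur, cur + 1, Ordering::AcqRel, Ordering::Relaxed,
            ) {
                Ok(_) => return true,
                Err(actual) => cur = actual, // raced with another thread; retry
            }
        }
    }

    fn release(&self) {
        self.active.fetch_sub(1, Ordering::AcqRel);
    }
}

fn main() {
    let limiter = ConnectionLimiter::new(8);
    assert!(limiter.try_acquire());
    limiter.release();
}
```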

Modified packages/backend-server/src/embeddings/model.rs:

  • Added batch_size field to EmbeddingModel struct
  • Replaced hardcoded batch size Some(1) at line 71 with configurable self.batch_size
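A sketch of what the batching change buys (only `EmbeddingModel` and `batch_size` come from the PR; the helper method is hypothetical):

```rust
// Sketch of the batching change: the model now carries a configurable
// batch_size instead of the previous hardcoded Some(1).
struct EmbeddingModel {
    batch_size: usize, // new field added by the PR
}

impl EmbeddingModel {
    // Hypothetical helper: split inputs into model-sized batches so each
    // forward pass embeds up to `batch_size` texts at once.
    fn batches<'a>(&self, texts: &'a [String]) -> Vec<&'a [String]> {
        texts.chunks(self.batch_size).collect()
    }
}

fn main() {
    let model = EmbeddingModel { batch_size: 64 };
    let texts: Vec<String> = (0..130).map(|i| format!("chunk {i}")).collect();
    // 130 chunks -> 3 forward passes (64 + 64 + 2) instead of 130.
    println!("{}", model.batches(&texts).len());
}
```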

Passed Configuration from Electron Main Process

Implemented Lazy Embeddings for Large Document Types

  • Extended lazy embeddings logic to include ResourceTextContentType::PDF, ResourceTextContentType::Document, and ResourceTextContentType::Article
  • These document types will get a generateLazyEmbeddings tag instead of immediate embedding generation
  • Embeddings will then be generated on-demand when documents are accessed in chat/search
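The decision logic above can be sketched as a simple match over content types. The three lazy variants are from the PR; the eager variant and function name below are hypothetical, and the real tagging likely happens in the TypeScript layer:

```rust
// Sketch of the lazy-embedding decision. Only the three lazy variants
// (PDF, Document, Article) come from the PR; `Note` is a hypothetical
// example of a type that still embeds eagerly at upload time.
#[derive(Debug, PartialEq)]
enum ResourceTextContentType {
    PDF,
    Document,
    Article,
    Note,
}

// Large document types get tagged `generateLazyEmbeddings` and are embedded
// on demand when first accessed in chat/search, instead of at upload time.
fn embeds_lazily(t: &ResourceTextContentType) -> bool {
    matches!(
        t,
        ResourceTextContentType::PDF
            | ResourceTextContentType::Document
            | ResourceTextContentType::Article
    )
}

fn main() {
    println!("{}", embeds_lazily(&ResourceTextContentType::PDF));
}
```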

Optimized Chunking Strategy

  • Increased max_chunk_size from 2000 to 2500 characters (reduces total chunks by ~20% while maintaining quality)
  • Kept overlap_sentences at 1 for continuity
  • This change reduced the number of embeddings needed per document
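A back-of-envelope check of the ~20% figure: with the overlap held fixed, chunk count scales roughly inversely with chunk size, so 2000 → 2500 gives about 1 − 2000/2500 = 20% fewer chunks. Sketch (the 100k-character document length is a made-up example):

```rust
// Back-of-envelope check of the ~20% chunk reduction: ignoring the
// one-sentence overlap, chunk count ~= ceil(doc_len / max_chunk_size).
fn chunk_count(doc_len: usize, max_chunk_size: usize) -> usize {
    doc_len.div_ceil(max_chunk_size)
}

fn main() {
    let doc_len = 100_000; // hypothetical 100k-character document
    let before = chunk_count(doc_len, 2000); // 50 chunks
    let after = chunk_count(doc_len, 2500); // 40 chunks
    let reduction = 100 * (before - after) / before;
    println!("{before} -> {after} ({reduction}% fewer)");
}
```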

The expected impact of this PR:

  • Batch size increase (1 → 64): reduces CPU overhead through better model utilization
  • Thread pool limits: prevent CPU saturation and keep usage under control
  • Connection limits: prevent thread explosion during bulk uploads
  • Lazy embeddings for large docs: defer expensive embedding work until documents are actually accessed
  • Larger chunks (2000 → 2500): fewer embeddings to generate and store

Related to #28

@satp42 satp42 marked this pull request as draft October 24, 2025 22:10
@satp42 satp42 marked this pull request as ready for review October 24, 2025 22:11
@satp42 satp42 marked this pull request as draft October 24, 2025 22:12
@satp42 satp42 marked this pull request as ready for review October 24, 2025 22:13
@aavshr aavshr self-requested a review October 31, 2025 12:33
