-
Notifications
You must be signed in to change notification settings - Fork 57
feat(pdf-batch-upload-frontend-retry-change): Knowledgebase-Vespa-Ing… #1032
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
|
Note Other AI code review bot(s) detectedCodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review. WalkthroughAdds enum-based NodeType/UploadStatus checks in the FileTree, implements an async per-file retry flow in KnowledgeManagement that calls a new server endpoint, introduces a server InsertFileDocumentApi and route POST /document/:fileId/insert, and refactors worker initialization into a generic createWorker. Changes
Sequence Diagram(s)sequenceDiagram
autonumber
actor User
participant FileTree as Frontend FileTree
participant KM as KnowledgeManagement
participant Server as POST /document/:fileId/insert
participant Worker as Processing Worker
User->>FileTree: Click "Retry" on failed file
FileTree->>KM: onRetry(node, path)
KM->>KM: set node.uploadStatus = PROCESSING\nstatusMessage = "Retrying..."
KM->>Server: POST /document/:fileId/insert
Server->>Server: auth + validate fileId + ownership
Server->>Worker: processDocumentDirect(fileId)
Worker-->>Server: processing result (success / failure)
alt success
Server-->>KM: 200 {result: success}
KM->>KM: set node.uploadStatus = COMPLETED
KM-->>User: show success toast
else failure
Server-->>KM: error / failed result
KM->>KM: set node.uploadStatus = FAILED
KM-->>User: show destructive toast
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related PRs
Suggested reviewers
Poem
Pre-merge checks and finishing touches❌ Failed checks (2 warnings)
✅ Passed checks (1 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
📜 Recent review detailsConfiguration used: CodeRabbit UI Review profile: CHILL Plan: Pro 📒 Files selected for processing (1)
🧰 Additional context used🧬 Code graph analysis (1)frontend/src/routes/_authenticated/knowledgeManagement.tsx (3)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
🔇 Additional comments (1)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Summary of ChangesHello @MohdShoaib-18169, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request enhances the knowledge base management system by implementing a robust retry mechanism for failed document uploads. It introduces a new backend API endpoint to facilitate the re-processing of individual files and integrates this functionality into the frontend's file tree view. Users can now explicitly retry failed uploads, with the UI providing clear visual feedback throughout the re-processing cycle, improving the overall reliability and user experience of document ingestion. Highlights
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request introduces a retry mechanism for failed file uploads in the knowledge base. A new API endpoint is added to re-trigger file processing, and the frontend is updated to include a retry button and handle the retry logic.
My review focuses on improving code quality and performance. I've suggested refactoring the frontend retry handler to eliminate significant code duplication, making the state updates more robust and maintainable. On the backend, I've recommended parallelizing database queries in the new API handler for better efficiency.
The changes are functionally sound and address an important user-facing feature.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
frontend/src/components/FileTree.tsx(3 hunks)frontend/src/routes/_authenticated/knowledgeManagement.tsx(1 hunks)server/api/documentInsertion.ts(1 hunks)server/server.ts(2 hunks)
🧰 Additional context used
🧬 Code graph analysis (3)
server/server.ts (1)
server/api/documentInsertion.ts (1)
InsertFileDocumentApi(52-124)
frontend/src/routes/_authenticated/knowledgeManagement.tsx (3)
frontend/src/utils/fileUtils.ts (1)
FileNode(134-146)frontend/src/api.ts (1)
api(5-5)frontend/src/hooks/use-toast.ts (1)
toast(201-201)
server/api/documentInsertion.ts (5)
server/logger/index.ts (2)
getLogger(36-93)Subsystem(15-15)server/queue/fileProcessor.ts (2)
ProcessingJob(32-35)processJob(100-122)server/api/agent.ts (2)
getAuth(134-142)safeGet(124-130)server/db/user.ts (1)
getUserByEmail(147-156)server/db/knowledgeBase.ts (2)
getCollectionItemById(130-141)getCollectionById(40-49)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: build
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
♻️ Duplicate comments (1)
frontend/src/routes/_authenticated/knowledgeManagement.tsx (1)
1988-2049: CRITICAL: Race condition in onRetry still not fixed.The state management issue flagged in previous reviews remains unresolved. The current implementation captures a stale snapshot of
collectionsbefore the async API call, then overwrites state after the call completes, which will clobber concurrent updates from polling or other operations.Current problematic pattern (line 1995):
const newCollections = [...collections] // Stale snapshot captured here // ... await happens ... setCollections(newCollections) // Overwrites any concurrent updatesRequired fix:
Use functional setters for all three state updates (PROCESSING, COMPLETED, FAILED) to always operate on the latest state:- onRetry={async (node, path) => { - if (node.type !== "file" || node.uploadStatus !== UploadStatus.FAILED || !node.id) { - return - } - - const updateNodeStatusInState = (status: UploadStatus, message?: string) => { - setCollections(prevCollections => { - const newCollections = [...prevCollections] - const coll = newCollections.find(c => c.id === collection.id) - if (!coll) return prevCollections - - const updateRecursively = (nodes: FileNode[]): FileNode[] => { - return nodes.map(n => { - if (n.id === node.id) { - return { ...n, uploadStatus: status, statusMessage: message } - } - if (n.children) { - return { ...n, children: updateRecursively(n.children) } - } - return n - }) - } - - coll.items = updateRecursively(coll.items) - return newCollections - }) - } - - try { - updateNodeStatusInState(UploadStatus.PROCESSING, "Retrying...") - - const response = await api.document[":fileId"].insert.$post({ - param: { fileId: node.id }, - }) - - if (!response.ok) { - const errorBody = await response.text() - throw new Error(`API returned ${response.status}: ${errorBody}`) - } - - const result = await response.json() - if (!result.success || result.status !== "completed") { - throw new Error(result.message || "Processing failed after successful API call.") - } - - toast({ - title: "File processed successfully", - description: `${node.name} has been processed and added to the knowledge base.`, - }) - updateNodeStatusInState(UploadStatus.COMPLETED, undefined) - - } catch (error) { - console.error("Retry failed:", error) - const errorMessage = error instanceof Error ? error.message : `Failed to process ${node.name}. Please try again.` - toast({ - title: "Retry failed", - description: errorMessage, - variant: "destructive", - }) - updateNodeStatusInState(UploadStatus.FAILED, "Retry failed") - } - }} + onRetry={async (node, path) => { + if (node.type !== "file" || node.uploadStatus !== UploadStatus.FAILED || !node.id) { + return + } + + // Helper to update node status immutably + const updateNodeStatus = ( + nodes: FileNode[], + targetId: string, + status: UploadStatus, + message?: string + ): FileNode[] => { + return nodes.map(n => { + if (n.id === targetId) { + return { ...n, uploadStatus: status, statusMessage: message } + } + if (n.children) { + return { ...n, children: updateNodeStatus(n.children, targetId, status, message) } + } + return n + }) + } + + try { + // Set PROCESSING status using functional setter + setCollections(prev => + prev.map(c => + c.id !== collection.id + ? c + : { ...c, items: updateNodeStatus(c.items, node.id!, UploadStatus.PROCESSING, "Retrying...") } + ) + ) + + const response = await api.document[":fileId"].insert.$post({ + param: { fileId: node.id }, + }) + + if (!response.ok) { + const errorBody = await response.text() + throw new Error(`API returned ${response.status}: ${errorBody}`) + } + + const result = await response.json() + if (!result.success || result.status !== "completed") { + throw new Error(result.message || "Processing failed after successful API call.") + } + + toast({ + title: "File processed successfully", + description: `${node.name} has been processed and added to the knowledge base.`, + }) + + // Set COMPLETED status using functional setter + setCollections(prev => + prev.map(c => + c.id !== collection.id + ? c + : { ...c, items: updateNodeStatus(c.items, node.id!, UploadStatus.COMPLETED, undefined) } + ) + ) + + } catch (error) { + console.error("Retry failed:", error) + const errorMessage = error instanceof Error ? error.message : `Failed to process ${node.name}. Please try again.` + toast({ + title: "Retry failed", + description: errorMessage, + variant: "destructive", + }) + + // Set FAILED status using functional setter + setCollections(prev => + prev.map(c => + c.id !== collection.id + ? c + : { ...c, items: updateNodeStatus(c.items, node.id!, UploadStatus.FAILED, "Retry failed") } + ) + ) + } + }}Why this matters:
During theawait api.document[":fileId"].insert.$post()call, the polling logic (lines 534-606) runs every 5 seconds and updatescollectionsstate with fresh status information from the server. When the await completes andsetCollections(newCollections)runs, it overwrites those polling updates with the stale snapshot, causing:
- Loss of status updates for other files
- UI showing incorrect states
- Potential infinite polling loops
🧹 Nitpick comments (1)
frontend/src/routes/_authenticated/knowledgeManagement.tsx (1)
625-641: Remove debug console.logs before merging.The added console.log statements will pollute the browser console in production and may expose internal state details.
Apply this diff to remove the debug logs:
const checkItems = (items: FileNode[]): boolean => { - return items.some((item) => { - console.log( - `Checking file: ${item.name}, status: ${item.uploadStatus}`, - ) - return ( - item.uploadStatus === UploadStatus.PROCESSING || - item.uploadStatus === UploadStatus.PENDING || - (item.children && checkItems(item.children)) - ) - }) + return items.some((item) => + item.uploadStatus === UploadStatus.PROCESSING || + item.uploadStatus === UploadStatus.PENDING || + (item.children && checkItems(item.children)) + ) } return checkItems(collection.items) }) - console.log( - `hasProcessingFiles: ${hasProcessingFiles}, isPolling: ${isPolling}`, - ) - if (hasProcessingFiles && !isPolling) {
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
frontend/src/routes/_authenticated/knowledgeManagement.tsx(19 hunks)server/api/documentInsertion.ts(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- server/api/documentInsertion.ts
🧰 Additional context used
🧬 Code graph analysis (1)
frontend/src/routes/_authenticated/knowledgeManagement.tsx (5)
frontend/src/hooks/useScopedFind.ts (1)
useScopedFind(43-434)frontend/src/contexts/UploadProgressContext.tsx (1)
useUploadProgress(41-47)frontend/src/utils/fileUtils.ts (2)
uploadFileBatch(214-263)FileNode(134-146)frontend/src/api.ts (1)
api(5-5)frontend/src/hooks/use-toast.ts (1)
toast(201-201)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: build
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (1)
frontend/src/routes/_authenticated/knowledgeManagement.tsx (1)
626-628: Remove debug console.log statements.These debug logs should be removed before merging to production.
Apply this diff to remove the logs:
- console.log( - `Checking file: ${item.name}, status: ${item.uploadStatus}`, - ) +- console.log( - `hasProcessingFiles: ${hasProcessingFiles}, isPolling: ${isPolling}`, - ) +Also applies to: 639-641
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
frontend/src/routes/_authenticated/knowledgeManagement.tsx(19 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
frontend/src/routes/_authenticated/knowledgeManagement.tsx (5)
frontend/src/hooks/useScopedFind.ts (1)
useScopedFind(43-434)frontend/src/contexts/UploadProgressContext.tsx (1)
useUploadProgress(41-47)frontend/src/utils/fileUtils.ts (2)
uploadFileBatch(214-263)FileNode(134-146)frontend/src/api.ts (1)
api(5-5)frontend/src/hooks/use-toast.ts (1)
toast(201-201)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: build
🔇 Additional comments (1)
frontend/src/routes/_authenticated/knowledgeManagement.tsx (1)
1988-2088: LGTM! The stale state issue has been resolved.The onRetry handler now correctly uses functional state setters throughout, which prevents the race condition flagged in previous reviews. The implementation properly:
- Uses
setCollections((prev) => ...)functional form in all three state transitions (PROCESSING, COMPLETED, FAILED)- Operates on the latest state snapshot each time via the
prevparameter- Immutably updates the matching collection and node via
updateRecursively- Avoids capturing and mutating a stale
collectionssnapshotThis ensures that concurrent polling updates are merged correctly rather than being clobbered.
…estion-Thread-Flow-0
d1c848d to
92f5057
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
🧹 Nitpick comments (1)
frontend/src/components/FileTree.tsx (1)
60-66: Type this prop asUploadStatusYou now pass enum values into
UploadStatusIndicator, but the prop is still typed as a string. Tightening it toUploadStatuskeeps the component aligned with the new enum-based checks.-const UploadStatusIndicator = ({ - uploadStatus, +const UploadStatusIndicator = ({ + uploadStatus, statusMessage }: { - uploadStatus: string + uploadStatus: UploadStatus statusMessage?: string }) => {
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (6)
frontend/src/components/FileTree.tsx(8 hunks)frontend/src/routes/_authenticated/knowledgeManagement.tsx(1 hunks)server/api/documentInsertion.ts(1 hunks)server/server.ts(3 hunks)server/shared/types.ts(1 hunks)server/worker.ts(2 hunks)
🧰 Additional context used
🧬 Code graph analysis (4)
server/api/documentInsertion.ts (5)
server/logger/index.ts (2)
getLogger(36-93)Subsystem(15-15)server/queue/fileProcessor.ts (2)
ProcessingJob(32-35)processJob(100-122)server/api/agent.ts (2)
getAuth(134-142)safeGet(124-130)server/db/user.ts (1)
getUserByEmail(147-156)server/db/knowledgeBase.ts (2)
getCollectionItemById(130-141)getCollectionById(40-49)
server/worker.ts (2)
server/queue/api-server-queue.ts (3)
boss(9-12)FileProcessingQueue(14-14)PdfFileProcessingQueue(15-15)server/queue/fileProcessor.ts (1)
ProcessingJob(32-35)
server/server.ts (1)
server/api/documentInsertion.ts (1)
InsertFileDocumentApi(53-133)
frontend/src/routes/_authenticated/knowledgeManagement.tsx (2)
frontend/src/utils/fileUtils.ts (1)
FileNode(134-146)frontend/src/api.ts (1)
api(5-5)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: build
…estion-Thread-Flow-1
…estion-Thread-Flow-0
Description
Testing
Additional Notes
Summary by CodeRabbit
New Features
Improvements
Bug Fixes