Releases: oceanbase/powerrag
v0.2.1
What's Changed
- fix:chat with knowledge base return not found by @Zhangg7723 in #21
- feat:add PowerRAG SDK and API Proxy by @Zhangg7723 in #23
- hotfix: use locally built deps image in GHA by @whhe in #24
- ci: change to publish Docker images to DockerHub for main branch and tags by @whhe in #26
- Add powerrag sdk publish action by @Zhangg7723 in #27
- fix MinerU API configuration error by @Zhangg7723 in #28
Full Changelog: v0.2.0...v0.2.1
v0.2.0
Release Notes
RAGFlow Integration
This release integrates RAGFlow updates from v0.21.1 to v0.22.1, bringing the following improvements:
From RAGFlow v0.22.1:
- Agent: Supports exporting Agent outputs in Word or Markdown formats
- Agent: Adds a List operations component
- Agent: Adds a Variable aggregator component
- Data sources: Supports S3-compatible data sources, e.g., MinIO
- Data sources: Adds data synchronization with JIRA
- Continues the redesign of the Profile page layouts
- Upgrades the Flask web framework from synchronous to asynchronous, increasing concurrency and preventing blocking issues caused when requesting upstream LLM services
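To illustrate why asynchronous request handling helps when handlers wait on upstream LLM services, here is a minimal sketch using only Python's standard `asyncio`. It is not RAGFlow's or Flask's code: `fake_llm_call`, `handle_requests`, and `run_demo` are hypothetical names, and the upstream request is simulated with a sleep.

```python
import asyncio
import time


async def fake_llm_call(prompt: str, delay: float = 0.2) -> str:
    """Hypothetical stand-in for a slow upstream LLM request."""
    # Awaiting yields control to the event loop, so other requests can proceed.
    await asyncio.sleep(delay)
    return f"answer to {prompt!r}"


async def handle_requests(prompts: list[str]) -> list[str]:
    # All simulated requests wait concurrently instead of one after another.
    return await asyncio.gather(*(fake_llm_call(p) for p in prompts))


def run_demo() -> tuple[list[str], float]:
    """Run three simulated requests and report the wall-clock time taken."""
    start = time.perf_counter()
    answers = asyncio.run(handle_requests(["a", "b", "c"]))
    return answers, time.perf_counter() - start
```

With three simulated calls of 0.2 s each, the concurrent version should finish in roughly the time of one call, whereas a blocking handler would need about the sum of all three.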
From RAGFlow v0.22.0:
- Dataset: Supports data synchronization from five online sources (AWS S3, Google Drive, Notion, Confluence, and Discord)
- Dataset: RAPTOR can be built across an entire dataset or on individual documents
- Ingestion pipeline: Supports Docling document parsing in the Parser component
- Launches a new administrative Web UI dashboard for graphical user management and service status monitoring
- Agent: Supports structured output
- Agent: Supports metadata filtering in the Retrieval component
- Agent: Introduces a Variable aggregator component with data operation and session variable definition capabilities
- Upgrades RAGFlow's document engine Infinity to v0.6.5
New Features
- Optimized Gotenberg Functions (#7)
  - Enhanced document conversion capabilities
- OceanBase Docker Configuration (#4)
  - Updated OceanBase Docker configuration for better deployment
- Enhanced Search Performance (#17)
  - Improved search functionality for better performance
Improvements
- Refactored Title and Regex Based Chunk Method (#16)
  - Improved chunking logic for better document processing
- Updated Merging Logic in split_with_title_chunks (#8)
  - Enhanced chunk merging algorithm
- Simplified String Escaping (#13)
  - Refactored string escaping in get_value_str and OBConnection for better maintainability
- Docker Configuration and Documentation (#12)
  - Updated Docker configurations and README
- Build Workflow (#9)
  - Added workflow to build dev Docker image
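The title-and-regex chunking and chunk merging mentioned in the Improvements above can be pictured with a small sketch. This is not the project's split_with_title_chunks implementation: the function names, the Markdown-style heading pattern, and the `min_chars` threshold are all assumptions made for illustration.

```python
import re

# Assumed heading pattern: Markdown-style "#" titles. The real rules may differ.
TITLE_RE = re.compile(r"^#{1,6}\s+\S")


def split_by_titles(text: str) -> list[str]:
    """Start a new chunk at every line that matches TITLE_RE."""
    groups: list[list[str]] = []
    for line in text.splitlines():
        # Open a new chunk on a title line, or for text before the first title.
        if TITLE_RE.match(line) or not groups:
            groups.append([])
        groups[-1].append(line)
    joined = ("\n".join(g).strip() for g in groups)
    return [chunk for chunk in joined if chunk]


def merge_small_chunks(chunks: list[str], min_chars: int = 40) -> list[str]:
    """Append a chunk to its predecessor while the predecessor is under min_chars."""
    merged: list[str] = []
    for chunk in chunks:
        if merged and len(merged[-1]) < min_chars:
            merged[-1] = merged[-1] + "\n" + chunk
        else:
            merged.append(chunk)
    return merged
```

A typical call would be `merge_small_chunks(split_by_titles(document_text))`, which keeps very short sections attached to the section that follows them rather than indexing them as tiny standalone chunks.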
Bug Fixes
- Fixed PowerRAG Server Timeout Error (#5)
  - Resolved timeout issues in PowerRAG server
- Fixed Image Source Lost in Smart Chunks (#11)
  - Fixed issue where image sources were lost during smart chunking
- Fixed Security Alerts and Chunk Saved Error (#19)
  - Resolved security issues and chunk saving errors
Contributors
Thanks to all contributors who made this release possible.
Full Changelog: v0.1.0...v0.2.0
v0.1.0
New features
- New parsing and processing workflow: Introduces a custom parsing and processing pipeline with more flexible document parsing and chunking strategies. It covers parsing, extraction, conversion, and splitting, so users can tailor data processing to their business requirements.
- Custom Parsers: Adds support for multiple parsing methods including MinerU, vLLM, and DotsOCR. The MinerU parser supports calling remote services via HTTP API, the vLLM parser enables document understanding using large language models, and the DotsOCR parser specializes in processing documents containing charts and formulas.
- Flow Components: Includes core components such as converters, extractors, parsers, and splitters. Converters handle document format conversion, extractors support entity extraction and metadata extraction, parsers process various document formats, and splitters provide intelligent chunking strategies.
- Server and APIs: Provides complete backend service support, including a standalone PowerRAG server, RESTful APIs, and task queue management. Supports asynchronous task processing and task status queries.
- Frontend Updates: Adds PowerRAG-related configuration interfaces and operational components, including parser configuration forms, flow design interfaces, and task monitoring panels, improving user experience.
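The flow components listed above (converters, parsers, extractors, splitters) can be pictured as stages chained over a shared document object. The sketch below is not PowerRAG's API: `Document`, the four stage functions, `run_pipeline`, and the stage order are hypothetical, and each stage body is a trivial stand-in for the real work.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Document:
    """Hypothetical container passed between stages."""
    text: str
    metadata: dict = field(default_factory=dict)
    chunks: list[str] = field(default_factory=list)


Stage = Callable[[Document], Document]


def convert(doc: Document) -> Document:
    # Stand-in for format conversion: normalize line endings.
    doc.text = doc.text.replace("\r\n", "\n")
    return doc


def parse(doc: Document) -> Document:
    # Stand-in for parsing: strip surrounding whitespace.
    doc.text = doc.text.strip()
    return doc


def extract(doc: Document) -> Document:
    # Stand-in for metadata extraction: record the first line as a title.
    doc.metadata["title"] = doc.text.split("\n", 1)[0]
    return doc


def split(doc: Document) -> Document:
    # Stand-in for splitting: one chunk per blank-line-separated paragraph.
    doc.chunks = [p.strip() for p in doc.text.split("\n\n") if p.strip()]
    return doc


def run_pipeline(doc: Document, stages: list[Stage]) -> Document:
    """Apply each stage in order, passing the document along."""
    for stage in stages:
        doc = stage(doc)
    return doc
```

Because every stage shares one signature, a custom pipeline in this sketch is just a different list, for example `run_pipeline(doc, [convert, parse, split])` to skip extraction.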