Collaborator

@chenyushuo commented Nov 28, 2024

Unlike #489, this PR merges equivalence classes with a multi-process union-find (disjoint-set) structure implemented on Ray actors.
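
For readers unfamiliar with the technique, here is a minimal single-process sketch of union-find used to merge duplicate pairs into equivalence classes. It is not the code from this PR: the names `UnionFind` and `merge_duplicates` are hypothetical, and the distribution of the structure across Ray actors is left out entirely.

```python
class UnionFind:
    """Disjoint-set forest with path compression and union by size."""

    def __init__(self):
        self.parent = {}
        self.size = {}

    def find(self, x):
        # Register unseen items as singleton classes.
        if x not in self.parent:
            self.parent[x] = x
            self.size[x] = 1
            return x
        # Walk up to the root of x's class.
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        # Path compression: point every node on the path at the root.
        while self.parent[x] != root:
            self.parent[x], x = root, self.parent[x]
        return root

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return ra
        # Union by size: attach the smaller tree under the larger one.
        if self.size[ra] < self.size[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra
        self.size[ra] += self.size[rb]
        return ra


def merge_duplicates(pairs):
    """Group items into equivalence classes from (a, b) duplicate pairs."""
    uf = UnionFind()
    for a, b in pairs:
        uf.union(a, b)
    groups = {}
    for item in list(uf.parent):
        groups.setdefault(uf.find(item), []).append(item)
    return sorted(sorted(g) for g in groups.values())
```

In a deduplicator, each pair would be two sample IDs that collided in a MinHash LSH band, and one sample per resulting class would be kept. A distributed variant would additionally need to shard the parent map and exchange cross-shard edges, which this sketch does not attempt.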

@yxdyc requested review from HYLcool, pan-x-c and yxdyc December 11, 2024 08:04
@yxdyc added the dj:op (issues/PRs about some specific OPs) and dj:dist (issues/PRs about distributed data processing) labels Dec 11, 2024
@yxdyc added the dj:efficiency (regarding to efficiency issues and enhancements) label Dec 20, 2024
@chenyushuo changed the title from "[WIP] Add minhash deduplicator based on RAY." to "Add minhash deduplicator based on RAY." Dec 20, 2024
Collaborator

@pan-x-c left a comment

Please see the inline comments

Collaborator

@pan-x-c left a comment

LGTM

@pan-x-c merged commit 1fe821f into modelscope:main Dec 31, 2024
2 checks passed

Labels

dj:dist (issues/PRs about distributed data processing)
dj:efficiency (regarding to efficiency issues and enhancements)
dj:op (issues/PRs about some specific OPs)

3 participants