Use a hash-based pick #41
Open
Labels
enhancement (New feature or request)
Use a hash-based pick: sort tokens by a stable hash (e.g., hash(token) or sha1(token)), then take the first N. This approximates MinHash and tends to distribute blocks more evenly across the token space.
This change shrinks `candidate_indices` per block, so the inner Jaccard loop runs far fewer times.
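
A minimal sketch of the hash-based pick, assuming the project is in Python and that each record is already tokenized into strings. The names `stable_hash` and `pick_block_tokens` are hypothetical and do not come from the codebase. The sketch uses `sha1` rather than the built-in `hash()`, because Python salts string hashes per process unless `PYTHONHASHSEED` is fixed, so `hash(token)` would not give the same ordering across runs.

```python
import hashlib


def stable_hash(token: str) -> int:
    """Map a token to a 64-bit integer that is identical across runs and machines."""
    digest = hashlib.sha1(token.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def pick_block_tokens(tokens, n):
    """Return the n distinct tokens with the smallest stable hash.

    Hypothetical helper: duplicates are dropped first, and ties on the hash
    value (extremely unlikely) are broken by the token itself so the result
    is fully deterministic.
    """
    unique = set(tokens)
    return sorted(unique, key=lambda t: (stable_hash(t), t))[:n]


if __name__ == "__main__":
    a = ["acme", "widgets", "inc", "new", "york"]
    b = ["york", "new", "acme", "widgets", "ltd"]
    # Input order does not matter; only the token set and the hash order do.
    print(pick_block_tokens(a, 2))
    print(pick_block_tokens(b, 2))
```

Because the selection depends only on the token set, two records with heavily overlapping tokens tend to keep the same lowest-hash tokens and therefore land in the same blocks, which is the MinHash-like behavior described above.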