-
Notifications
You must be signed in to change notification settings - Fork 2.2k
feat: add sample method to document array #3003
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Codecov Report
@@ Coverage Diff @@
## master #3003 +/- ##
==========================================
- Coverage 90.05% 89.96% -0.09%
==========================================
Files 140 140
Lines 9510 9520 +10
==========================================
+ Hits 8564 8565 +1
- Misses 946 955 +9
Flags with carried forward coverage won't be shown. Click here to find out more.
Continue to review full report at Codecov.
|
jina/types/arrays/document.py
Outdated
if k < len(self): | ||
raise ValueError('DocumentArray do not have enough Document to sample') | ||
indices = random.sample(range(len(self)), k) | ||
sampled = DocumentArray() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think u can do sampled = self[indices]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's not woking
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why not? we should maybe understand why first?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@JoanFM it's not a python syntax, it's supported by numpy only
Latency summaryCurrent PR yields:
Breakdown
Backed by latency-tracking. Further commits will update this comment. |
jina/types/arrays/search_ops.py
Outdated
from .document import DocumentArray | ||
|
||
if k > len(self): | ||
raise ValueError('DocumentArray do not have enough Document to sample') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
better tell user how many in DA are k must be smaller than this number
jina/types/arrays/search_ops.py
Outdated
|
||
if k > len(self): | ||
raise ValueError('DocumentArray do not have enough Document to sample') | ||
if seed: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
seed =0 wont trigger this line (which should), so has to be:
if seed: | |
if seed is not None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
sampled_seed_2 = da.sample(5, seed=1) | ||
assert len(sampled_seed_1) == 5 | ||
assert sampled_seed_1 == sampled_seed_2 | ||
assert sampled != sampled_seed_1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This can break.
@@ -397,3 +397,17 @@ def test_cache_invalidation_sort_reverse(docarray_for_cache): | |||
docarray_for_cache.reverse() | |||
assert docarray_for_cache[0].id == '2' | |||
assert docarray_for_cache[1].id == '1' | |||
|
|||
|
|||
def test_sample(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you can break this unit test into two: one testing that correct amount of items are returned, another testing that setting random seed works as expected
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
true, PR updated
add
sample
to DocumentArray