8000 feat: add sample method to document array by bwanglzu · Pull Request #3003 · jina-ai/serve · GitHub
[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to content

feat: add sample method to document array #3003

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 14 commits into from
Jul 26, 2021
Merged

feat: add sample method to document array #3003

merged 14 commits into from
Jul 26, 2021

Conversation

bwanglzu
Copy link
Member
@bwanglzu bwanglzu commented Jul 23, 2021

add sample to DocumentArray

@jina-bot jina-bot added size/S area/core This issue/PR affects the core codebase component/type labels Jul 23, 2021
@codecov
Copy link
codecov bot commented Jul 23, 2021

Codecov Report

Merging #3003 (ebbbd13) into master (a7e5384) will decrease coverage by 0.08%.
The diff coverage is 100.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #3003      +/-   ##
==========================================
- Coverage   90.05%   89.96%   -0.09%     
==========================================
  Files         140      140              
  Lines        9510     9520      +10     
==========================================
+ Hits         8564     8565       +1     
- Misses        946      955       +9     
Flag Coverage Δ
daemon 44.03% <20.00%> (-0.03%) ⬇️
jina 89.95% <100.00%> (-0.09%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
jina/types/arrays/search_ops.py 100.00% <100.00%> (ø)
jina/peapods/peas/__init__.py 92.26% <0.00%> (-3.87%) ⬇️
jina/peapods/pods/compound.py 90.41% <0.00%> (-1.37%) ⬇️
jina/flow/base.py 89.02% <0.00%> (-0.03%) ⬇️
jina/helloworld/chatbot/my_executors.py 94.44% <0.00%> (ø)
jina/helloworld/fashion/my_executors.py 93.05% <0.00%> (ø)
jina/types/arrays/neural_ops.py 98.07% <0.00%> (+0.03%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update a7e5384...ebbbd13. Read the comment docs.

if k < len(self):
raise ValueError('DocumentArray do not have enough Document to sample')
indices = random.sample(range(len(self)), k)
sampled = DocumentArray()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think u can do sampled = self[indices]

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's not woking

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not? we should maybe understand why first?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@JoanFM it's not a python syntax, it's supported by numpy only

@github-actions
Copy link
github-actions bot commented Jul 23, 2021

Latency summary

Current PR yields:

  • 😶 index QPS at 1497, delta to last 2 avg.: +2%
  • 😶 query QPS at 17, delta to last 2 avg.: -4%
  • 😶 dam extend QPS at 58315, delta to last 2 avg.: +4%
  • 😶 avg flow time within 1.8202 seconds, delta to last 2 avg.: -1%
  • 😶 import jina within 0.3477 seconds, delta to last 2 avg.: -5%

Breakdown

Version Index QPS Query QPS DAM Extend QPS Avg Flow Time (s) Import Time (s)
current 1497 17 58315 1.8202 0.3477
2.0.10 1458 17 56445 1.8664 0.3661
2.0.9 1464 17 55455 1.815 0.3734

Backed by latency-tracking. Further commits will update this comment.

@jina-bot jina-bot added the area/testing This issue/PR affects testing label Jul 23, 2021
@bwanglzu bwanglzu marked this pull request as ready for review July 23, 2021 19:47
@bwanglzu bwanglzu requested a review from a team as a code owner July 23, 2021 19:47
@bwanglzu bwanglzu linked an issue Jul 23, 2021 that may be closed by this pull request
@bwanglzu bwanglzu requested a review from davidbp July 23, 2021 19:55
@bwanglzu bwanglzu changed the title feat: add sample method feat: add sample method to document array Jul 23, 2021
@bwanglzu bwanglzu removed the request for review from imsergiy July 23, 2021 19:59
@bwanglzu bwanglzu self-assigned this Jul 23, 2021
from .document import DocumentArray

if k > len(self):
raise ValueError('DocumentArray do not have enough Document to sample')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

better tell user how many in DA are k must be smaller than this number


if k > len(self):
raise ValueError('DocumentArray do not have enough Document to sample')
if seed:
Copy link
Member
@hanxiao hanxiao Jul 24, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seed =0 wont trigger this line (which should), so has to be:

Suggested change
if seed:
if seed is not None:

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

sampled_seed_2 = da.sample(5, seed=1)
assert len(sampled_seed_1) == 5
assert sampled_seed_1 == sampled_seed_2
assert sampled != sampled_seed_1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can break.

@@ -397,3 +397,17 @@ def test_cache_invalidation_sort_reverse(docarray_for_cache):
docarray_for_cache.reverse()
assert docarray_for_cache[0].id == '2'
assert docarray_for_cache[1].id == '1'


def test_sample():
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you can break this unit test into two: one testing that correct amount of items are returned, another testing that setting random seed works as expected

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

true, PR updated

@jina-bot jina-bot added the area/housekeeping This issue/PR is housekeeping label Jul 24, 2021
@hanxiao hanxiao merged commit ba2e537 into master Jul 26, 2021
@hanxiao hanxiao deleted the feat-sample-da branch July 26, 2021 07:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/core This issue/PR affects the core codebase area/housekeeping This issue/PR is housekeeping area/testing This issue/PR affects testing component/type size/S
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add sample to DocumentArray
5 participants
0