8000 fix: put siblings back by florian-hoenicke · Pull Request #2234 · jina-ai/serve · GitHub
[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to content

fix: put siblings back #2234

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 13 commits into from
Apr 2, 2021
Merged

fix: put siblings back #2234

merged 13 commits into from
Apr 2, 2021

Conversation

florian-hoenicke
Copy link
Member

Removing the length attribute broke the BiMatchRanker.
This pr restores the length attribute and renames it to siblings.

@jina-bot jina-bot added size/M area/core This issue/PR affects the core codebase area/entrypoint This issue/PR affects the entrypoint codebase area/helper This issue/PR affects the helper functionality area/network This issue/PR affects network functionality component/client component/driver component/proto component/type labels Mar 25, 2021
@codecov
Copy link
codecov bot commented Mar 25, 2021

Codecov Report

Merging #2234 (e8931cd) into master (6228128) will decrease coverage by 3.18%.
The diff coverage is 100.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #2234      +/-   ##
==========================================
- Coverage   90.59%   87.40%   -3.19%     
==========================================
  Files         209      209              
  Lines       11352    11372      +20     
==========================================
- Hits        10284     9940     -344     
- Misses       1068     1432     +364     
Flag Coverage Δ
daemon 51.27% <44.44%> (-0.01%) ⬇️
jina 87.64% <100.00%> (-3.41%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
jina/proto/jina_pb2.py 100.00% <ø> (ø)
jina/clients/request/__init__.py 100.00% <100.00%> (ø)
jina/clients/request/asyncio.py 89.47% <100.00%> (ø)
jina/drivers/segment.py 78.78% <100.00%> (-1.86%) ⬇️
jina/types/document/__init__.py 92.12% <100.00%> (-0.13%) ⬇️
jina/types/sets/chunk.py 93.54% <100.00%> (-2.29%) ⬇️
jina/drivers/predict.py 33.87% <0.00%> (-54.84%) ⬇️
jina/drivers/multimodal.py 37.83% <0.00%> (-54.06%) ⬇️
jina/jaml/parsers/flow/v1.py 48.14% <0.00%> (-44.45%) ⬇️
jina/clients/asyncio.py 50.00% <0.00%> (-40.63%) ⬇️
... and 33 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 6228128...e8931cd. Read the comment docs.

@github-actions
Copy link
github-actions bot commented Mar 25, 2021

Latency summary

Current PR yields:

  • 😶 index QPS at 987, delta to last 3 avg.: +1%
  • 😶 query QPS at 14, delta to last 3 avg.: +0%

Breakdown

Version Index QPS Query QPS
current 987 14
1.1.0 990 13
1.0.16 952 13

Backed by latency-tracking. Further commits will update this comment.

@florian-hoenicke florian-hoenicke marked this pull request as ready for review March 25, 2021 12:25
@florian-hoenicke florian-hoenicke requested a review from a team as a code owner March 25, 2021 12:25
@florian-hoenicke florian-hoenicke requested review from maximilianwerk, BastinJafari and JoanFM and removed request for BastinJafari March 25, 2021 12:25
Copy link
Contributor
@Yongxuanzhang Yongxuanzhang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe need to add tests?

florian-hoenicke and others added 2 commits March 26, 2021 12:03
Co-authored-by: Nan Wang <nan.wang@jina.ai>
@florian-hoenicke
Copy link
Member Author
florian-hoenicke commented Mar 26, 2021

Maybe need to add tests?

@Yongxuanzhang thanks, I added the test

@jina-bot jina-bot added the area/testing This issue/PR affects testing label Mar 26, 2021
nan-wang
nan-wang previously approved these changes Mar 26, 2021
Copy link
Member
@nan-wang nan-wang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM👍

Copy link
Contributor
@cristianmtr cristianmtr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you also add an integration test? Or adapt an existing one with segmentation to assert nr of siblings?

@@ -54,8 +54,9 @@ def _apply_all(self, docs: 'DocumentSet', *args, **kwargs):

@staticmethod
def _add_chunks(doc, chunks):
num_siblings = len(chunks)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
num_siblings = len(chunks)
num_siblings = len(chunks) + len(doc.chunks)

@maximilianwerk maximilianwerk force-pushed the fix-put-siblings-back branch from 0d39275 to 3cba88b Compare March 30, 2021 09:58
@maximilianwerk
Copy link
Member

Rebased to fix broken test

8000 Copy link
Contributor
@JoanFM JoanFM left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we recompute the siblings in the ChunkSet and add it in every add operation?

@@ -54,8 +54,9 @@ def _apply_all(self, docs: 'DocumentSet', *args, **kwargs):

@staticmethod
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we add an assertion in the segment driver test to see if the resulting chunks have the right amount of siblings?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now I added a test.

@@ -48,17 +56,41 @@ def append(self, document: 'Document', **kwargs) -> 'Document':
chunk.update_content_hash()
return chunk

def extend(self, iterable: Iterable['Document']) -> None:
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added the sibling update functionality to the extend method. It is a bit inconsistent because append does not update the siblings. But if we would update this attribute for append as well it could slow down the system when having many chunks. We could argue that the system is slowed down anyways if we have a lot of chunks and have the doc.siblings = len(self) also in the append.
Anyways, I think there is the goal to remove siblings in the long term. So A3E2 thinking about all the cases might not be necessary

@nan-wang
Copy link
Member
nan-wang commented Apr 1, 2021

Can you also add an integration test? Or adapt an existing one with segmentation to assert nr of siblings?

@cristianmtr I've added an integration test

Copy link
Member
@nan-wang nan-wang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM👍

@hanxiao hanxiao merged commit d9ca73d into master Apr 2, 2021
@hanxiao hanxiao deleted the fix-put-siblings-back branch April 2, 2021 02:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/core This issue/PR affects the core codebase area/entrypoint This issue/PR affects the entrypoint codebase area/helper This issue/PR affects the helper functionality area/network This issue/PR affects network functionality area/testing This issue/PR affects testing component/client component/driver component/proto component/type size/M
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants
0