8000 refactor: meth update for document primitive type by bwanglzu · Pull Request #2465 · jina-ai/serve · GitHub
[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to content

refactor: meth update for document primitive type #2465

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 11 commits into from
May 26, 2021

Conversation

bwanglzu
Copy link
Member
@bwanglzu bwanglzu commented May 22, 2021
  1. make update easier to use & understand, get rid of included_fields and exclude_fields and all the complex logic with fields, if user specify fields, then only updated specified fields. Otherwise update all fields.
  2. Resolve a bug when calling update, e.g. doc1.update(doc2), in current implementation will call CopyFrom, CopyFrom will clear everything in doc1 and assign doc2 to doc1. This is incorrect when doc1 has more attributes than doc2, the extra attributes in doc1 will gone. We only want to replace the existent fields.

@jina-bot jina-bot added size/S area/core This issue/PR affects the core codebase component/type labels May 22, 2021
@codecov
Copy link
codecov bot commented May 22, 2021

Codecov Report

Merging #2465 (a9279a4) into master (162c9d2) will increase coverage by 0.03%.
The diff coverage is 92.30%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #2465      +/-   ##
==========================================
+ Coverage   83.39%   83.42%   +0.03%     
==========================================
  Files         151      152       +1     
  Lines        9417     9414       -3     
==========================================
+ Hits         7853     7854       +1     
+ Misses       1564     1560       -4     
Flag Coverage Δ
daemon 47.07% <0.00%> (-0.13%) ⬇️
jina 83.24% <92.30%> (+0.03%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
jina/types/document/__init__.py 96.21% <92.30%> (+0.53%) ⬆️
jina/types/arrays/document.py 81.25% <0.00%> (-3.89%) ⬇️
jina/parsers/client.py 100.00% <0.00%> (ø)
jina/helloworld/chatbot/app.py 0.00% <0.00%> (ø)
jina/types/document/generators.py 92.00% <0.00%> (ø)
jina/flow/base.py 92.54% <0.00%> (+0.03%) ⬆️
jina/helloworld/multimodal/app.py 86.00% <0.00%> (+0.28%) ⬆️
jina/__init__.py 69.69% <0.00%> (+0.46%) ⬆️
jina/clients/base.py 79.00% <0.00%> (+1.10%) ⬆️
jina/peapods/runtimes/asyncio/grpc/async_call.py 98.11% <0.00%> (+3.77%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 162c9d2...a9279a4. Read the comment docs.

@jina-bot jina-bot added size/M and removed size/S labels May 22, 2021
@hanxiao
Copy link
Member
hanxiao commented May 24, 2021

test broken

@jina-bot jina-bot added the area/testing This issue/PR affects testing label May 24, 2021
@bwanglzu bwanglzu self-assigned this May 24, 2021
@bwanglzu bwanglzu marked this pull request as ready for review May 24, 2021 10:30
@bwanglzu bwanglzu requested a review from a team as a code owner May 24, 2021 10:30
@bwanglzu bwanglzu requested review from nan-wang and imsergiy May 24, 2021 10:30
@bwanglzu bwanglzu changed the title refactor: update api for document refactor: meth: update for document primitive type May 24, 2021
@bwanglzu bwanglzu changed the title refactor: meth: update for document primitive type refactor: meth update for document primitive type May 24, 2021
doc1.update(source=doc2, fields=['tags', 'embedding', 'chunks'])
assert doc1.id != doc2.id
assert doc1.content != doc2.content # None was updated by source's content.
assert doc1.tags == doc2.tags
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As for the tags, it is better to stay consistent with the python update method. To be specific, if the key exists, replace the old value with the new one. Otherwise, append the key-value.

Suggested change
assert doc1.tags == doc2.tags
target = {'a': 'b'}
target.update({'c': 'd'})
assert doc1.tags == target

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nan-wang code updated

Copy link
Member
@hanxiao hanxiao May 26, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As for the tags, it is better to stay consistent with the python update method.

im not really convinced by this logic, see my comment below

@bwanglzu bwanglzu requested a review from JoanFM May 25, 2021 21:19
@hanxiao
Copy link
Member
hanxiao commented May 26, 2021

very good observation and good PR

for field in fields_to_preserve:
setattr(destination, field, getattr(dest_copy, field))
# For the tags, stay consistent with the python update method.
self._pb_body.tags.update(dest_copy.tags)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so all other fields are overridden during the update, but tags are expanded? I'm not really convinced by this inconsistency.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nan-wang how do you think? shall we replace the tags or expand it?

Copy link
Member
@nan-wang nan-wang May 26, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so all other fields are overridden during the update

@hanxiao During update, the existing fields are overridden and the non-existing fields are added. The same logic goes for tags because tags is a dictionary by itself. The existing key/value pairs in the tags are overridden and only the non-existing ones are added.

Copy link
Contributor
@JoanFM JoanFM left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

8000

I like it better actually. What do you think @nan-wang?

:param exclude_fields: a tuple of field names that excluded from the current document,
when not given the non-empty fields of the current document is considered as ``exclude_fields``
:param include_fields: a tuple of field names that included from the source document
:param fields: a list of field names that included from the current document,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this wording is weird. These fields are from self or from source?

Copy link
Member
@nan-wang nan-wang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM👍

@hanxiao hanxiao merged commit 3f94956 into master May 26, 2021
@hanxiao hanxiao deleted the refactor-document-update branch May 26, 2021 14:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/core This issue/PR affects the core codebase area/testing This issue/PR affects testing component/type size/M
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants
0