refactor: meth update for document primitive type #2465

bwanglzu · 2021-05-22T17:16:28Z

make update easier to use & understand, get rid of included_fields and exclude_fields and all the complex logic with fields, if user specify fields, then only updated specified fields. Otherwise update all fields.
Resolve a bug when calling update, e.g. doc1.update(doc2), in current implementation will call CopyFrom, CopyFrom will clear everything in doc1 and assign doc2 to doc1. This is incorrect when doc1 has more attributes than doc2, the extra attributes in doc1 will gone. We only want to replace the existent fields.

codecov · 2021-05-22T17:19:47Z

Codecov Report

Merging #2465 (a9279a4) into master (162c9d2) will increase coverage by 0.03%.
The diff coverage is 92.30%.

@@            Coverage Diff             @@
##           master    #2465      +/-   ##
==========================================
+ Coverage   83.39%   83.42%   +0.03%     
==========================================
  Files         151      152       +1     
  Lines        9417     9414       -3     
==========================================
+ Hits         7853     7854       +1     
+ Misses       1564     1560       -4

Flag	Coverage Δ
daemon	`47.07% <0.00%> (-0.13%)`	⬇️
jina	`83.24% <92.30%> (+0.03%)`	⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files	Coverage Δ
jina/types/document/__init__.py	`96.21% <92.30%> (+0.53%)`	⬆️
jina/types/arrays/document.py	`81.25% <0.00%> (-3.89%)`	⬇️
jina/parsers/client.py	`100.00% <0.00%> (ø)`
jina/helloworld/chatbot/app.py	`0.00% <0.00%> (ø)`
jina/types/document/generators.py	`92.00% <0.00%> (ø)`
jina/flow/base.py	`92.54% <0.00%> (+0.03%)`	⬆️
jina/helloworld/multimodal/app.py	`86.00% <0.00%> (+0.28%)`	⬆️
jina/__init__.py	`69.69% <0.00%> (+0.46%)`	⬆️
jina/clients/base.py	`79.00% <0.00%> (+1.10%)`	⬆️
jina/peapods/runtimes/asyncio/grpc/async_call.py	`98.11% <0.00%> (+3.77%)`	⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 162c9d2...a9279a4. Read the comment docs.

hanxiao · 2021-05-24T04:48:49Z

test broken

nan-wang · 2021-05-24T14:36:50Z

tests/unit/types/document/test_document.py

+    doc1.update(source=doc2, fields=['tags', 'embedding', 'chunks'])
+    assert doc1.id != doc2.id
+    assert doc1.content != doc2.content  # None was updated by source's content.
+    assert doc1.tags == doc2.tags


As for the tags, it is better to stay consistent with the python update method. To be specific, if the key exists, replace the old value with the new one. Otherwise, append the key-value.

Suggested change

assert doc1.tags == doc2.tags

target = {'a': 'b'}

target.update({'c': 'd'})

assert doc1.tags == target

@nan-wang code updated

As for the tags, it is better to stay consistent with the python update method.

im not really convinced by this logic, see my comment below

hanxiao · 2021-05-26T02:33:34Z

very good observation and good PR

hanxiao · 2021-05-26T02:37:42Z

jina/types/document/__init__.py

+                for field in fields_to_preserve:
+                    setattr(destination, field, getattr(dest_copy, field))
+        # For the tags, stay consistent with the python update method.
+        self._pb_body.tags.update(dest_copy.tags)


so all other fields are overridden during the update, but tags are expanded? I'm not really convinced by this inconsistency.

@nan-wang how do you think? shall we replace the tags or expand it?

so all other fields are overridden during the update

@hanxiao During update, the existing fields are overridden and the non-existing fields are added. The same logic goes for tags because tags is a dictionary by itself. The existing key/value pairs in the tags are overridden and only the non-existing ones are added.

JoanFM

8000

I like it better actually. What do you think @nan-wang?

JoanFM · 2021-05-26T10:41:03Z

jina/types/document/__init__.py

-        :param exclude_fields: a tuple of field names that excluded from the current document,
-                when not given the non-empty fields of the current document is considered as ``exclude_fields``
-        :param include_fields: a tuple of field names that included from the source document
+        :param fields: a list of field names that included from the current document,


this wording is weird. These fields are from self or from source?

nan-wang

LGTM👍

refactor: update api for document

64d391f

jina-bot added size/S area/core This issue/PR affects the core codebase component/type labels May 22, 2021

refactor: add preserved fields

753666e

jina-bot added size/M and removed size/S labels May 22, 2021

refactor: add update logic

d3ebb0b

refactor: add tests

fe83d8b

jina-bot added the area/testing This issue/PR affects testing label May 24, 2021

bwanglzu added 2 commits May 24, 2021 12:02

Merge branch 'master' into refactor-document-update

7555483

refactor: update black

8551fb1

bwanglzu self-assigned this May 24, 2021

bwanglzu marked this pull request as ready for review May 24, 2021 10:30

bwanglzu requested a review from a team as a code owner May 24, 2021 10:30

bwanglzu requested review from nan-wang and imsergiy May 24, 2021 10:30

bwanglzu changed the title ~~refactor: update api for document~~ refactor: meth: update for document primitive type May 24, 2021

bwanglzu changed the title ~~refactor: meth: update for document primitive type~~ refactor: meth update for document primitive type May 24, 2021

refactor: add more test cases

89133c0

nan-wang requested changes May 24, 2021

View reviewed changes

feat: change tags field update logic

c593c4f

bwanglzu requested a review from JoanFM May 25, 2021 21:19

hanxiao reviewed May 26, 2021

View reviewed changes

bwanglzu added 3 commits May 26, 2021 12:03

feat: refine update logic

657dc55

feat: remove unused import

2985012

feat: update comments

a9279a4

JoanFM reviewed May 26, 2021

View reviewed changes

nan-wang approved these changes May 26, 2021

View reviewed changes

hanxiao merged commit 3f94956 into master May 26, 2021

hanxiao deleted the refactor-document-update branch May 26, 2021 14:30

bwanglzu mentioned this pull request May 30, 2021

fix: use update instead of MergeFrom #2512

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

refactor: meth update for document primitive type #2465

refactor: meth update for document primitive type #2465

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

-    assert doc1.tags == doc2.tags
+    target = {'a': 'b'}
+    target.update({'c': 'd'})
+    assert doc1.tags == target

refactor: meth update for document primitive type #2465

refactor: meth update for document primitive type #2465

Uh oh!

Conversation

Uh oh!

Uh oh!

Uh oh!

Codecov Report

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!