-
Notifications
You must be signed in to change notification settings - Fork 2.2k
refactor: meth update for document primitive type #2465
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Codecov Report
@@ Coverage Diff @@
## master #2465 +/- ##
==========================================
+ Coverage 83.39% 83.42% +0.03%
==========================================
Files 151 152 +1
Lines 9417 9414 -3
==========================================
+ Hits 7853 7854 +1
+ Misses 1564 1560 -4
Flags with carried forward coverage won't be shown. Click here to find out more.
Continue to review full report at Codecov.
|
test broken |
doc1.update(source=doc2, fields=['tags', 'embedding', 'chunks']) | ||
assert doc1.id != doc2.id | ||
assert doc1.content != doc2.content # None was updated by source's content. | ||
assert doc1.tags == doc2.tags |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As for the tags
, it is better to stay consistent with the python update
method. To be specific, if the key exists, replace the old value with the new one. Otherwise, append the key-value.
assert doc1.tags == doc2.tags | |
target = {'a': 'b'} | |
target.update({'c': 'd'}) | |
assert doc1.tags == target |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@nan-wang code updated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As for the tags, it is better to stay consistent with the python update method.
im not really convinced by this logic, see my comment below
very good observation and good PR |
jina/types/document/__init__.py
Outdated
for field in fields_to_preserve: | ||
setattr(destination, field, getattr(dest_copy, field)) | ||
# For the tags, stay consistent with the python update method. | ||
self._pb_body.tags.update(dest_copy.tags) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so all other fields are overridden during the update, but tags
are expanded? I'm not really convinced by this inconsistency.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@nan-wang how do you think? shall we replace the tags or expand it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so all other fields are overridden during the update
@hanxiao During update
, the existing fields are overridden and the non-existing fields are added. The same logic goes for tags
because tags
is a dictionary by itself. The existing key/value pairs in the tags
are overridden and only the non-existing ones are added.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like it better actually. What do you think @nan-wang?
:param exclude_fields: a tuple of field names that excluded from the current document, | ||
when not given the non-empty fields of the current document is considered as ``exclude_fields`` | ||
:param include_fields: a tuple of field names that included from the source document | ||
:param fields: a list of field names that included from the current document, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this wording is weird. These fields are from self
or from source
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM👍
update
easier to use & understand, get rid ofincluded_fields
andexclude_fields
and all the complex logic withfields
, if user specifyfields
, then only updated specified fields. Otherwise update all fields.update
, e.g.doc1.update(doc2)
, in current implementation will callCopyFrom
,CopyFrom
will clear everything indoc1
and assigndoc2
todoc1
. This is incorrect whendoc1
has more attributes thandoc2
, the extra attributes indoc1
will gone. We only want to replace the existent fields.