refactor: try to speed up update content hash #3045

bwanglzu · 2021-07-29T21:38:49Z

The idea:

When calling `update_content_hash` we never actually change the parameters, so `include_fields` is never used except in tests. This PR also gets rid of the mutually exclusive check.
Maybe this is a better solution for hashing: we don't specify `exclude_fields`, but instead specify the fields necessary for hashing, which is safer.
`CopyFrom` and `FieldMask` are expensive; we manage to use only `FieldMask`, without creating the `empty_proto`.
`SerializeToString` = `SerializePartialToString` + a check that the message is initialised. We initialise the pb message in the first line of the method, so the check is unnecessary.
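The include-list idea can be sketched without protobuf. This is only an illustration under stated assumptions, not the code in this PR: the document is modelled as a plain `dict`, and `DEFAULT_HASH_FIELDS`, `content_hash`, and the field names are hypothetical.

```python
import hashlib
import json

# Hypothetical include list: only these fields define a document's content.
DEFAULT_HASH_FIELDS = ("text", "uri", "tags")


def content_hash(doc: dict, fields=DEFAULT_HASH_FIELDS) -> str:
    """Hash only the listed fields of a dict-based stand-in for a document."""
    # Keep the listed fields that are actually set, in a stable order.
    selected = {name: doc[name] for name in sorted(fields) if name in doc}
    payload = json.dumps(selected, sort_keys=True).encode("utf-8")
    return hashlib.blake2b(payload, digest_size=8).hexdigest()


doc_a = {"id": "1", "text": "hello", "scores": {"cosine": 0.1}}
doc_b = {"id": "2", "text": "hello", "scores": {"cosine": 0.9}}

# id and scores are not in the include list, so both documents hash the same.
print(content_hash(doc_a) == content_hash(doc_b))
```

With an include list, a field added to the schema later stays out of the hash until someone opts it in, which is the "safer" property argued for above.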

codecov · 2021-07-29T21:42:55Z

Codecov Report

Merging #3045 (c35652b) into master (f40d796) will increase coverage by 0.00%.
The diff coverage is 100.00%.

@@           Coverage Diff           @@
##           master    #3045   +/-   ##
=======================================
  Coverage   89.35%   89.35%           
=======================================
  Files         141      141           
  Lines        9489     9483    -6     
=======================================
- Hits         8479     8474    -5     
+ Misses       1010     1009    -1

| Flag | Coverage Δ | |
|---|---|---|
| daemon | `43.99% <75.00%> (-0.03%)` | ⬇️ |
| jina | `89.34% <100.00%> (+<0.01%)` | ⬆️ |


| Impacted Files | Coverage Δ | |
|---|---|---|
| jina/types/document/__init__.py | `96.53% <100.00%> (+0.15%)` | ⬆️ |
| jina/types/document/multimodal.py | `96.29% <100.00%> (ø)` | |
| jina/peapods/runtimes/gateway/http/models.py | `100.00% <0.00%> (ø)` | |

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update f40d796...c35652b.

github-actions · 2021-07-29T21:46:06Z

Latency summary

Current PR yields:

😶 index QPS at 1510, delta to last 2 avg.: +1%
🐎🐎🐎🐎 query QPS at 20, delta to last 2 avg.: +13%
🐎🐎🐎🐎 dam extend QPS at 56789, delta to last 2 avg.: +1%
🐎🐎🐎🐎 avg flow time within 1.8804 seconds, delta to last 2 avg.: +0%
🐢🐢 import jina within 0.3368 seconds, delta to last 2 avg.: -8%

Breakdown

| Version | Index QPS | Query QPS | DAM Extend QPS | Avg Flow Time (s) | Import Time (s) |
|---|---|---|---|---|---|
| current | 1510 | 20 | 56789 | 1.8804 | 0.3368 |
| `2.0.13` | 1513 | 17 | 57412 | 1.8485 | 0.3575 |
| `2.0.12` | 1473 | 17 | 54512 | 1.9095 | 0.3759 |

Backed by latency-tracking. Further commits will update this comment.

hanxiao · 2021-07-30T03:06:15Z

> SerializeToString = SerializePartialToString + check if message is initialised, we initialise the pb message in the first line of the method, so checking is unnecessary.

if this is true and brings more efficiency, then we should replace it globally

bwanglzu · 2021-07-30T09:32:19Z

> if this is true and brings more efficiency, then we should replace it globally

It is true (link). Checking whether it brings more efficiency.

bwanglzu · 2021-07-30T10:48:15Z

On master, `update_content_hash` accounts for 27.71% of the execution time.

On the feature branch, `update_content_hash` accounts for 16.06% of the entire execution time.

Note: line 295 is the call to `update_content_hash`.
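The two profile shares can be turned into an estimated saving. The sketch below is a back-of-the-envelope calculation, assuming that everything outside `update_content_hash` costs the same on both branches (the profiles do not state this); the variable names are illustrative.

```python
# Profile shares reported above for update_content_hash.
share_master = 0.2771  # fraction of total run time on master
share_branch = 0.1606  # fraction of total run time on the feature branch

# Assumption: all non-hash work takes the same time on both branches.
# Normalise the master run to 1.0 time unit.
rest = 1.0 - share_master                    # non-hash time, unchanged by assumption
total_branch = rest / (1.0 - share_branch)   # branch total, in master units
hash_branch = share_branch * total_branch    # branch hash time, in master units

total_reduction = 1.0 - total_branch
hash_reduction = 1.0 - hash_branch / share_master

print(f"total run time reduction: {total_reduction:.1%}")
print(f"update_content_hash reduction: {hash_reduction:.1%}")
```

Under that assumption the hashing step itself takes roughly half as long, while the whole run shrinks by about 14%.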

bwanglzu · 2021-07-30T10:54:08Z

Average time for `match` on 200 random docs:
on master branch : --- 13.43830680847168 seconds ---
on feature branch: --- 11.4239821434021 seconds ---

Not a significant performance increase: about 15% less time, similar to what we got from the profiling tool (a 10-15 percent speed-up).
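For reference, the arithmetic behind the 15% figure, using only the two timings above (variable names are illustrative):

```python
master_s = 13.43830680847168  # match on 200 random docs, master
branch_s = 11.4239821434021   # match on 200 random docs, feature branch

time_saved = (master_s - branch_s) / master_s  # fraction of wall time removed
speedup = master_s / branch_s                  # how many times faster

print(f"time saved: {time_saved:.1%}")
print(f"speed-up factor: {speedup:.2f}x")
```

Note that 15% less wall time corresponds to a speed-up factor of about 1.18x; the two numbers describe the same measurement from different directions.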

bwanglzu added 2 commits July 29, 2021 21:57

perf: improve update content hash

9172e77

perf: try to speed up update content hash

ba02b3e

github-actions bot added size/S, area/core, area/testing, component/type labels Jul 29, 2021

bwanglzu changed the title ~~refactor: update content hash~~ refactor: try to speed up update content hash Jul 29, 2021

hanxiao linked an issue Jul 30, 2021 that may be closed by this pull request

performance improvement on update content hash #3043

Closed

bwanglzu added 2 commits July 30, 2021 11:23

perf: do not serializer evaluations and scores

5e054d1

perf: use partial serialize

4a23657

bwanglzu self-assigned this Jul 30, 2021

bwanglzu added 2 commits July 30, 2021 11:45

perf: specify default fields to hash

999740f

perf: adopt fields to multimodal document

a37821d

github-actions bot added size/M and removed size/S labels Jul 30, 2021

perf: adopt fields to multimodal document

c35652b

bwanglzu marked this pull request as ready for review July 30, 2021 10:55

bwanglzu requested a review from a team as a code owner July 30, 2021 10:55

bwanglzu requested review from CatStark and JoanFM July 30, 2021 10:55

davidbp mentioned this pull request Jul 30, 2021

Perf match fast #3055

Closed

hanxiao merged commit ab657d9 into master Jul 31, 2021

hanxiao deleted the refactor-content-hash branch July 31, 2021 06:19
