8000 refactor: try to speed up update content hash by bwanglzu · Pull Request #3045 · jina-ai/serve · GitHub
[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to content

refactor: try to speed up update content hash #3045

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 7 commits into from
Jul 31, 2021
Merged

Conversation

bwanglzu
Copy link
Member
@bwanglzu bwanglzu commented Jul 29, 2021

the idea:

  1. when calling update_content_hash, we actually never change parameters, so include_fields never being used except test. We also get rid of the mutually exclusive check.
  2. maybe this is a better solution for hashing, we don’t specify exclude_fields ,but specify fields necessary for hashing, safer
  3. CopyFrom and FieldMask is expensive, we manage to only use FieldMask without creating the empty_proto
  4. SerializeToString = SerializePartialToString + check if message is initialised, we initialise the pb message in the first line of the method, so checking is unnecessary please refer to here.

@github-actions github-actions bot added size/S area/core This issue/PR affects the core codebase area/testing This issue/PR affects testing component/type labels Jul 29, 2021
@bwanglzu bwanglzu changed the title refactor: update content hash refactor: try to speed up update content hash Jul 29, 2021
@codecov
Copy link
codecov bot commented Jul 29, 2021

Codecov Report

Merging #3045 (c35652b) into master (f40d796) will increase coverage by 0.00%.
The diff coverage is 100.00%.

Impacted file tree graph

@@           Coverage Diff           @@
##           master    #3045   +/-   ##
=======================================
  Coverage   89.35%   89.35%           
=======================================
  Files         141      141           
  Lines        9489     9483    -6     
=======================================
- Hits         8479     8474    -5     
+ Misses       1010     1009    -1     
Flag Coverage Δ
daemon 43.99% <75.00%> (-0.03%) ⬇️
jina 89.34% <100.00%> (+<0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
jina/types/document/__init__.py 96.53% <100.00%> (+0.15%) ⬆️
jina/types/document/multimodal.py 96.29% <100.00%> (ø)
jina/peapods/runtimes/gateway/http/models.py 100.00% <0.00%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update f40d796...c35652b. Read the comment docs.

@github-actions
Copy link
github-actions bot commented Jul 29, 2021

Latency summary

Current PR yields:

  • 😶 index QPS at 1510, delta to last 2 avg.: +1%
  • 🐎🐎🐎🐎 query QPS at 20, delta to last 2 avg.: +13%
  • 🐎🐎🐎🐎 dam extend QPS at 56789, delta to last 2 avg.: +1%
  • 🐎🐎🐎🐎 avg flow time within 1.8804 seconds, delta to last 2 avg.: +0%
  • 🐢🐢 import jina within 0.3368 seconds, delta to last 2 avg.: -8%

Breakdown

Version Index QPS Query QPS DAM Extend QPS Avg Flow Time (s) Import Time (s)
current 1510 20 56789 1.8804 0.3368
2.0.13 1513 17 57412 1.8485 0.3575
2.0.12 1473 17 54512 1.9095 0.3759

Backed by latency-tracking. Further commits will update this comment.

@hanxiao
Copy link
Member
hanxiao commented Jul 30, 2021

SerializeToString = SerializePartialToString + check if message is initialised, we initialise the pb message in the first line of the method, so checking is unnecessary.

if this is true and brings more efficiency, then we should replace it globally

@hanxiao hanxiao linked an issue Jul 30, 2021 that may be closed by this pull request
@bwanglzu bwanglzu self-assigned this Jul 30, 2021
@bwanglzu
Copy link
Member Author
bwanglzu commented Jul 30, 2021

if this is true and brings more efficiency, then we should replace it globally

it is true link, checking if it brings more efficiency

@github-actions github-actions bot added size/M and removed size/S labels Jul 30, 2021
@bwanglzu
Copy link
Member Author
bwanglzu commented Jul 30, 2021

on master: update content hash account for 27.71% of execution time
0288B4E6-2286-4C92-BA39-AE78EFA83E59

on feature branch, update content hash accounts for 16.06% of entire execution time:

30D497FE-3281-4053-8D23-CA7642AAAB20

note line 295 is calling update_content_hash

@bwanglzu
Copy link
Member Author
bwanglzu commented Jul 30, 2021

match on 200 random docs avg time
on master branch : --- 13.43830680847168 seconds ---
on feature branch: --- 11.4239821434021 seconds ---

not significant performance increase, speed up 15% percent, similar as what we got in the profiling tool, 10-15 percent speed up.

@bwanglzu bwanglzu marked this pull request as ready for review July 30, 2021 10:55
@bwanglzu bwanglzu requested a review from a team as a code owner July 30, 2021 10:55
@bwanglzu bwanglzu requested review from CatStark and JoanFM July 30, 2021 10:55
@davidbp davidbp mentioned this pull request Jul 30, 2021
@hanxiao hanxiao merged commit ab657d9 into master Jul 31, 2021
@hanxiao hanxiao deleted the refactor-content-hash branch July 31, 2021 06:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/core This issue/PR affects the core codebase area/testing This issue/PR affects testing component/type size/M
Projects
None yet
Development

Successfully merging this pull request may close these issues.

performance improvement on update content hash
2 participants
0