8000 fix: proper context management of handlers by JoanFM · Pull Request #2322 · jina-ai/serve · GitHub
[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to content

fix: proper context management of handlers #2322

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 13 commits into from
Apr 23, 2021
Merged

fix: proper context management of handlers #2322

merged 13 commits into from
Apr 23, 2021

Conversation

JoanFM
Copy link
Contributor
@JoanFM JoanFM commented Apr 20, 2021

Try to have a better and more robust implementation of BinaryPbIndexer and the access to the Writing and ReadHandlers.

I believe this can be a cause of #2320

@JoanFM JoanFM requested a review from a team as a code owner April 20, 2021 16:58
@jina-bot jina-bot added size/S area/core This issue/PR affects the core codebase component/executor executor/indexer labels Apr 20, 2021
@codecov
Copy link
codecov bot commented Apr 20, 2021

Codecov Report

Merging #2322 (1722714) into master (82fb810) will increase coverage by 1.05%.
The diff coverage is 97.14%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #2322      +/-   ##
==========================================
+ Coverage   89.77%   90.83%   +1.05%     
==========================================
  Files         222      222              
  Lines       11958    11976      +18     
==========================================
+ Hits        10735    10878     +143     
+ Misses       1223     1098     -125     
Flag Coverage Δ
daemon 50.90% <11.42%> (-0.06%) ⬇️
jina 90.99% <97.14%> (+1.11%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
jina/executors/indexers/keyvalue.py 98.12% <96.82%> (-0.44%) ⬇️
jina/executors/indexers/dbms/keyvalue.py 97.43% <100.00%> (+0.13%) ⬆️
jina/executors/indexers/query/keyvalue.py 93.75% <100.00%> (-0.37%) ⬇️
jina/executors/crafters/__init__.py 100.00% <0.00%> (ø)
jina/executors/encoders/__init__.py 100.00% <0.00%> (ø)
jina/executors/segmenters/__init__.py 100.00% <0.00%> (ø)
jina/executors/encoders/numeric/__init__.py 100.00% <0.00%> (ø)
jina/executors/__init__.py 92.74% <0.00%> (+0.40%) ⬆️
jina/peapods/runtimes/container/__init__.py 85.71% <0.00%> (+0.89%) ⬆️
jina/flow/base.py 92.53% <0.00%> (+0.99%) ⬆️
... and 11 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 82fb810...1722714. Read the comment docs.

@github-actions
Copy link
github-actions bot commented Apr 20, 2021

Latency summary

Current PR yields:

  • 😶 index QPS at 940, delta to last 3 avg.: +0%
  • 😶 query QPS at 13, delta to last 3 avg.: -4%

Breakdown

Version Index QPS Query QPS
current 940 13
1.1.9 940 13

Backed by latency-tracking. Further commits will update this comment.

@JoanFM JoanFM force-pushed the ctxt-managers branch 2 times, most recently from 294155f to e15f54e Compare April 21, 2021 07:30
@jina-bot jina-bot added size/M and removed size/S labels Apr 21, 2021
@jina-bot jina-bot added the area/testing This issue/PR affects testing label Apr 21, 2021
@cristianmtr
Copy link
Contributor

I am not sure now about the delete_on_dump and the general closing of the executor

Can you perhaps try to apply the fix without the refactoring first? See if we can isolate it

@JoanFM
Copy link
Contributor Author
JoanFM commented Apr 22, 2021

I am not sure now about the delete_on_dump and the general closing of the executor

Can you perhaps try to apply the fix without the refactoring first? See if we can isolate it

I am not having a fix, It is just a feeling that the structure is not robust and may cause it

@JoanFM JoanFM changed the title refactor: proper context management of handlers fix: proper context management of handlers Apr 22, 2021
@JoanFM
Copy link
Contributor Author
JoanFM commented Apr 22, 2021

Fixes #2320

@JoanFM JoanFM requested review from nan-wang and cristianmtr April 22, 2021 15:39
@JoanFM
Copy link
Contributor Author
JoanFM commented Apr 22, 2021

Is it possible to add a test to reproduce the issue? It seems this fix will slow down the adding by frequent closing and opening the files

@nan-wang can u review again, I made sure it only flushes at every run as a context managed instance. Which I also think that may help in providing ACIDity

@JoanFM JoanFM closed this Apr 22, 2021
@JoanFM JoanFM reopened this Apr 22, 2021
Copy link
Contributor
@cristianmtr cristianmtr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you run a benchmark? When I tried to do it with ctx mger, it slowed it down a bit. I think we should know. Use some of the crud tests or the benchmark in test_dump_dbms

@JoanFM
Copy link
Contributor Author
JoanFM commented Apr 23, 2021

can you run a benchmark? When I tried to do it with ctx mger, it slowed it down a bit. I think we should know. Use some of the crud tests or the benchmark in test_dump_dbms

Run on master:

### baseline - query time with 1 on 100000 docs takes 2 seconds (2.67s)
### baseline - indexing: 100000 docs takes 11 seconds (11.65s)
### indexing 100000 docs takes 19 seconds (19.82s)
### dumping 100000 docs takes 5 seconds (5.33s)
### reloading 100000 takes 3 seconds (3.75s)

on this branch:

### baseline - query time with 1 on 100000 docs takes 2 seconds (2.66s)
### baseline - indexing: 100000 docs takes 12 seconds (12.13s)
### indexing 100000 docs takes 19 seconds (19.61s)
### dumping 100000 docs takes 5 seconds (5.16s)
### reloading 100000 takes 4 seconds (4.11s)

I am not seeing any big conclusion.

@JoanFM JoanFM requested a review from cristianmtr April 23, 2021 07:55
Comment on lines +28 to +29
self.body = open(self.path, self.mode)
self.header = open(self.path + '.head', self.mode)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are we opening the files in __init__? Shouldn't we only open them in __enter__?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To avoid the concerns from @nan-wang , we avoid opening and closing at everybatch which could damage performance. But we still check the need for opening

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. So we're kinda using "half" of what a ctx mger should do (open and close within in the context, right)?

return self

def __exit__(self, exc_type, exc_val, exc_tb):
self.flush()
Copy link
A3E2
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why don't we close() on exit?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because we wanted to avoid the need to reopen all the time, but at least we assume the data is flushed

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Closing the file after every 'add' will be quite expensive.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, got it. That's what I tried in the previous approach. And we're kinda doing "half" of the ctx management. Makes sense

Copy link
Contributor
@cristianmtr cristianmtr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

2 questions

Copy link
Member
@nan-wang nan-wang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@JoanFM JoanFM requested a review from cristianmtr April 23, 2021 08:31
@JoanFM JoanFM merged commit 4b3c180 into master Apr 23, 2021
@JoanFM JoanFM deleted the ctxt-managers branch April 23, 2021 08:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/core This issue/PR affects the core codebase area/testing This issue/PR affects testing component/executor executor/indexer size/M
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants
0