feat: k8s failure testing by jacobowitz · Pull Request #4743 · jina-ai/serve · GitHub

feat: k8s failure testing #4743

Merged
merged 17 commits into master from feat-k8s-chaos-testing
May 6, 2022

@jacobowitz jacobowitz (Contributor) commented May 3, 2022

This adds a more comprehensive test to our K8s-based test suite. I separated it so that it runs as its own step and does not prolong the K8s tests even more. Unfortunately, this duplicates a lot of code in CI/CD.

The test covers:

  • Scaling up/down deployments
  • Killing a Pod
  • Restarting a Pod
  • Load testing
  • Random gRPC request failures (each request fails with 5% probability)
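
To make the last bullet concrete, here is a minimal sketch of injecting failures at a fixed probability. It is not the mechanism this PR uses in the cluster; `InjectedFailure`, `call_with_fault_injection`, and `observed_failure_rate` are hypothetical names made up for the illustration.

```python
import random


class InjectedFailure(Exception):
    """Hypothetical stand-in for a transport-level gRPC error."""


def call_with_fault_injection(func, rng, failure_probability=0.05):
    """Run func(), but raise InjectedFailure with the given probability."""
    if rng.random() < failure_probability:
        raise InjectedFailure('injected failure')
    return func()


def observed_failure_rate(num_requests, rng, failure_probability=0.05):
    """Send num_requests dummy requests and return the fraction that failed."""
    failures = 0
    for _ in range(num_requests):
        try:
            call_with_fault_injection(lambda: 'ok', rng, failure_probability)
        except InjectedFailure:
            failures += 1
    return failures / num_requests
```

Passing a seeded `random.Random` instance keeps a run reproducible, which helps when a load test has to be debugged after the fact.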

Besides the test, I also added retries for gRPC requests that fail for reasons outside our control (e.g. non-Executor errors).
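
The retry idea can be sketched roughly as follows. This is an illustration under assumptions, not the code added in this PR: `TransientError`, `ExecutorError`, and `call_with_retries` are hypothetical names, and the real change works on gRPC calls rather than plain Python callables.

```python
import time


class TransientError(Exception):
    """Hypothetical stand-in for a failure outside the Executor's control."""


class ExecutorError(Exception):
    """Hypothetical stand-in for an error raised by user Executor code."""


def call_with_retries(func, max_attempts=3, backoff_seconds=0.0):
    """Call func(), retrying only on TransientError.

    ExecutorError (and any other exception) propagates immediately,
    because retrying a bug in user code would not help.
    """
    if max_attempts < 1:
        raise ValueError('max_attempts must be at least 1')
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except TransientError:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last transient error
            time.sleep(backoff_seconds * attempt)  # linear backoff between attempts
```

Assuming failures are independent and each attempt fails with 5% probability, three attempts all fail with probability 0.05 ** 3 = 0.000125.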

  • Separate this test from the other K8s tests if possible/necessary
  • Install the linkerd SMI extension on the test cluster
  • Add to CD
  • Fix all issues in CI

Closes #4629

@github-actions github-actions bot added the size/M, area/core, and area/testing labels May 3, 2022
@JoanFM JoanFM (Contributor) left a comment

Let's have a specific file for this specific test

@github-actions github-actions bot commented May 3, 2022

Latency summary

Current PR yields:

  • 🐎🐎🐎🐎 index QPS at 1374, delta to last 2 avg.: +15%
  • 🐎🐎🐎🐎 query QPS at 78, delta to last 2 avg.: +16%
  • 🐢🐢 avg flow time within 1.6546 seconds, delta to last 2 avg.: -8%
  • 🐢🐢 import jina within 0.4664 seconds, delta to last 2 avg.: -8%

Breakdown

Version  Index QPS  Query QPS  Avg Flow Time (s)  Import Time (s)
current  1374       78         1.6546             0.4664
3.3.25   1048       53         1.9937             0.5858
3.3.24   1333       80         1.6309             0.4363

Backed by latency-tracking. Further commits will update this comment.

@codecov codecov bot commented May 3, 2022

Codecov Report

Merging #4743 (b93d630) into master (73718c9) will increase coverage by 0.20%.
The diff coverage is 84.75%.

@@            Coverage Diff             @@
##           master    #4743      +/-   ##
==========================================
+ Coverage   87.46%   87.66%   +0.20%     
==========================================
  Files         117      119       +2     
  Lines        8526     8668     +142     
==========================================
+ Hits         7457     7599     +142     
  Misses       1069     1069              
Flag  Coverage Δ
jina  87.66% <84.75%> (+0.27%) ⬆️

Flags with carried forward coverage won't be shown.

Impacted Files                                 Coverage Δ
jina/__init__.py                               65.88% <0.00%> (-1.59%) ⬇️
jina/enums.py                                  88.07% <ø> (-0.32%) ⬇️
jina/helper.py                                 81.72% <ø> (-1.79%) ⬇️
jina/jaml/__init__.py                          94.36% <ø> (-0.03%) ⬇️
jina/orchestrate/deployments/config/helper.py  98.24% <ø> (+1.75%) ⬆️
jina/parsers/__init__.py                       97.77% <ø> (-0.03%) ⬇️
jina/serve/runtimes/monitoring.py              100.00% <ø> (ø)
jina/hubble/hubio.py                           86.02% <44.44%> (-1.34%) ⬇️
jina/orchestrate/flow/base.py                  89.57% <72.72%> (-0.47%) ⬇️
jina/serve/runtimes/gateway/__init__.py        89.65% <72.72%> (-10.35%) ⬇️
... and 63 more

Legend
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 06fc848...b93d630.

@github-actions github-actions bot added the area/cicd and area/housekeeping labels May 4, 2022
@github-actions github-actions bot added the size/L label May 4, 2022
@jacobowitz jacobowitz marked this pull request as ready for review May 5, 2022 12:00
@jacobowitz jacobowitz requested a review from JoanFM May 5, 2022 14:04
@JoanFM JoanFM (Contributor) left a comment

make the test required

@jacobowitz jacobowitz merged commit 01dc9da into master May 6, 2022
@jacobowitz jacobowitz deleted the feat-k8s-chaos-testing branch May 6, 2022 07:47
Development

Successfully merging this pull request may close these issues.

K8s system test with chaos failures