10000 roles: added dep and var cache to improve performance by monsdar · Pull Request #85249 · ansible/ansible · GitHub
[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to content

roles: added dep and var cache to improve performance #85249

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft
wants to merge 52 commits into
base: devel
Choose a base branch
from

Conversation

monsdar
Copy link
Contributor
@monsdar monsdar commented Jun 3, 2025
SUMMARY

These changes introduce updating the plays role_cache after calculating the role dependencies, default vars and role vars to avoid those being recalculated during each Task. This PR also contains a fix to not add duplicate dependencies to the roles dep-list.

These fixes improve overhead in deployment where a lot of role dependencies are involved, see related issue.

Fixes #85206

ISSUE TYPE
  • Feature Pull Request

@ansibot ansibot added feature This issue/PR relates to a feature request. needs_triage Needs a first human triage before being processed. has_issue labels Jun 3, 2025
@ansibot ansibot added the needs_revision This PR fails CI tests or a maintainer has requested a review/revision of the PR. label Jun 3, 2025
@ansibot

This comment was marked as outdated.

@monsdar
Copy link
Contributor Author
monsdar commented Jun 3, 2025

I need to put down work on this for the time being due to something else coming up. I've planned to continue working on this by the end of June latest.

def get_all_dependencies(self):
"""
Returns a list of all deps, built recursively from all child dependencies,
in the proper order in which they should be executed or evaluated.
"""
if self._all_dependencies is None:

self._all_dependencies = []
self._all_dependencies = set()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

while sets are more efficient, they are also 'unordered' so this will break expected variable precedence and will change the display of 'role name' variables

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed the implementation to use a list again, but check whether the dep is already in it. Would this be alright regarding var precedence? It seemed to work ok in my test run at least.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unsure, I need to add precedence test for this case, but either this works or you need to reverse the operation to preserve it: add dupe, but remove previous instance

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed it to remove previous instances. Guess this is reflecting the current behavior in the correct way. Having a test would surely help here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The 8000 reason will be displayed to describe this comment to others. Learn more.

I extended the role_dep_chain test to include different types of variables, including overwriting them in parent roles. Please check 35e35eb and let me know if this fits.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Made a minor copy/paste error when pasting the test into my machine connected to GitHub. Should be fixed now, let's see whether the tests now pass.

@bcoca bcoca removed the needs_triage Needs a first human triage before being processed. label Jun 3, 2025
@monsdar
Copy link
Contributor Author
monsdar commented Jun 3, 2025

With the current implementation I reduced the runtime of one of our deployments from 68min to 16min. This deployment in particular uses one of our high-level roles but limits what is actually been executed with tags.

@ansibot

This comment was marked as outdated.

@webknjaz webknjaz added the ci_verified Changes made in this PR are causing tests to fail. label Jun 4, 2025
@ansibot ansibot removed the ci_verified Changes made in this PR are causing tests to fail. label Jun 5, 2025
@ansibot
Copy link
Contributor
ansibot commented Jun 5, 2025

The test ansible-test sanity --test mypy [explain] failed with the error:

Command "/root/.ansible/test/venv/sanity.mypy/3.13/cfef9f44/bin/python /root/ansible/test/sanity/code-smell/mypy.py" returned exit status 1.
>>> Standard Error
Traceback (most recent call last):
  File "/root/ansible/test/sanity/code-smell/mypy.py", line 235, in <module>
    main()
    ~~~~^^
  File "/root/ansible/test/sanity/code-smell/mypy.py", line 49, in main
    unfiltered_messages.extend(test_context(python_version, context, paths))
                               ~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/ansible/test/sanity/code-smell/mypy.py", line 163, in test_context
    raise Exception(f'{stdout=} {stderr=}')
Exception: stdout='lib/ansible/playbook/role/__init__.py:133:9: note: By default the bodies of untyped functions are not checked, consider using --check-untyped-defs  [annotation-unchecked]\n' stderr=''

click here for bot help

@webknjaz
Copy link
Member
webknjaz commented Jun 9, 2025

NOTE: This PR is still work in progress.

@monsdar if it's WIP, you could mark it as draft. Also, some people put WIP or DNM into the PR title to indicate that it's not ready to be reviewed or merged.

Copy link
Member
@webknjaz webknjaz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you include a few cosmetic improvements, while on it?

Comment on lines 15 to 22
- assert:
that:
- (inherit_var is defined) and (inherit_var == 'overwritten_as_well')
- (inherit_default is defined) and (inherit_default == 'overwritten')
- (inherit_playbook is defined) and (inherit_playbook == 'playbook')
- (inherit_include is defined) and (inherit_include == 'include')
- (include_default_var is defined) and (include_default_var == 'overwritten_again')
when: (overwritten is defined) and overwritten
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are the two opposite assertion variants needed in the same flow?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The 1st flow tests the variables when they still have their initial values, the 2nd flow tests what happens when values get overwritten from a location with higher precedence. It's not a full precedence test, but it helps to check whether this works as intended when running with mixed import/include roles.

In the end this tests different cases for import/include, therefore I chose not to add a separate test for it. I can do so if requested though.

@bcoca bcoca changed the title 85206: Added dep and var cache to improve performance roles: added dep and var cache to improve performance Jun 24, 2025
Copy link
Member
@sivel sivel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm unable to complete a full review of this right now, but I see a few things that concern me, such as a module level global.

I'll need to do a much more in depth review after 2.19 has released, and probably rope in a few others as well.

@ansibot ansibot added the needs_revision This PR fails CI tests or a maintainer has requested a review/revision of the PR. label Jun 24, 2025
@bcoca
Copy link
Member
bcoca commented Jun 24, 2025

role cache should be at play level

mattclay pushed a commit that referenced this pull request Jun 24, 2025
PR #85346

This patch adds type-checking to the `Role.__init__()` method.

Context: #85249 (comment)

Co-authored-by: Sloane Hertel <19572925+s-hertel@users.noreply.github.com>
Co-authored-by: Sviatoslav Sydorenko <webknjaz@redhat.com>
Co-authored-by: Nils Brinkmann <nils.brinkmann@rheinmetall.com>
(cherry picked from commit 578d25f)
monsdar and others added 8 commits June 25, 2025 08:49
Co-authored-by: 🇺🇦 Sviatoslav Sydorenko (Святослав Сидоренко) <wk.cvs.github@sydorenko.org.ua>
Co-authored-by: 🇺🇦 Sviatoslav Sydorenko (Святослав Сидоренко) <wk.cvs.github@sydorenko.org.ua>
Co-authored-by: 🇺🇦 Sviatoslav Sydorenko (Святослав Сидоренко) <wk.cvs.github@sydorenko.org.ua>
Co-authored-by: 🇺🇦 Sviatoslav Sydorenko (Святослав Сидоренко) <wk.cvs.github@sydorenko.org.ua>
Co-authored-by: 🇺🇦 Sviatoslav Sydorenko (Святослав Сидоренко) <wk.cvs.github@sydorenko.org.ua>
Co-authored-by: 🇺🇦 Sviatoslav Sydorenko (Святослав Сидоренко) <wk.cvs.github@sydorenko.org.ua>
Co-authored-by: 🇺🇦 Sviatoslav Sydorenko (Святослав Сидоренко) <wk.cvs.github@sydorenko.org.ua>
Co-authored-by: 🇺🇦 Sviatoslav Sydorenko (Святослав Сидоренко) <wk.cvs.github@sydorenko.org.ua>
@ansibot ansibot added the stale_review Updates were made after the last review and the last review is more than 7 days old. label Jun 25, 2025
…defaults/main.yml

Co-authored-by: 🇺🇦 Sviatoslav Sydorenko (Святослав Сидоренко) <wk.cvs.github@sydorenko.org.ua>
@ansibot
Copy link
Contributor
ansibot commented Jun 25, 2025

The test ansible-test sanity --test pep8 [explain] failed with 2 errors:

lib/ansible/playbook/role/__init__.py:504:1: W293: blank line contains whitespace
lib/ansible/playbook/role/__init__.py:548:1: W293: blank line contains whitespace

click here for bot help

@monsdar
Copy link
Contributor Author
monsdar commented Jun 25, 2025

role cache should be at play level

I see that there's already a role_cache at play level. Sorry, I didn't notice when I made myself familiar with the codebase first. I think we should utilize this instead of adding a global cache.

At first glance it looks like the role in the cache isn't updated after initializing it, so any queried dependencies, default and role vars aren't added to the cached role (and therefore are read over and over again).

I'll check how this could be integrated. Switching the PR back to draft status again.

@monsdar monsdar marked this pull request as draft June 25, 2025 09:24
@ansibot ansibot removed the needs_revision This PR fails CI tests or a maintainer has requested a review/revision of the PR. label Jun 25, 2025
@monsdar
Copy link
Contributor Author
monsdar commented Jun 26, 2025

Seems it's enough to store the deps and vars as cached_properties, rest is done by the already existing cache. The fix is turning out to be pretty simple the more I look into it...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature This issue/PR relates to a feature request. has_issue stale_review Updates were made after the last review and the last review is more than 7 days old.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Ansible execution slows down with a large role dependency tree
6 participants
0