10000 copy: Reduce calls to stat() by 50% in chown_recursive by moreati · Pull Request #75741 · ansible/ansible · GitHub
[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to content

copy: Reduce calls to stat() by 50% in chown_recursive #75741

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related e 10000 mails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: devel
Choose a base branch
from

Conversation

moreati
Copy link
Contributor
@moreati moreati commented Sep 21, 2021
SUMMARY

This reduces the number of times lstat() or chown() are called by the copy module. Reductions come from

  • combining loops for owner and group, thus not looping twice
  • not looping over the list of directories returned by os.walk(), each directory is already handled when os.walk() descends into it
ISSUE TYPE
  • Feature Pull Request
COMPONENT NAME

copy

ADDITIONAL INFORMATION

Possible further reductions could be achieved by

I propose to do these seperately, if there is interest.

@ansibot ansibot added affects_2.12 core_review In order to be merged, this PR must follow the core review workflow. feature This issue/PR relates to a feature request. module This issue/PR relates to a module. needs_triage Needs a first human triage before being processed. support:core This issue/PR relates to code supported by the Ansible Engineering Team. labels Sep 21, 2021
moreati added a commit to moreati/ansible that referenced this pull request Sep 21, 2021
…rent

This merges implementations of `AnsibleModule.set_owner_if_different()`
& `AnsibleModule.set_group_if_different()` into a single method. The
benefits are
- reduction in code duplication
- makes possible halving the number of calls to `os.lchown()` when the
  copy module sets both owner and group.

There should be no outward change in behavior. The two original methods
become thin wrappers around the new one.

This PR is related to ansible#75741. If both are merged, then the second
benefit above can be realised - by replacing calls in
`copy.chown_recursive()`.

Open questions:

- Should this new method be public (without a leading underscore)?
- Is there a better name?
moreati added a commit to moreati/ansible that referenced this pull request Sep 21, 2021
…rent

This merges implementations of `AnsibleModule.set_owner_if_different()`
& `AnsibleModule.set_group_if_different()` into a single method. The
benefits are
- reduction in code duplication
- makes possible halving the number of calls to `os.lchown()` when the
  copy module sets both owner and group.

There should be no outward change in behavior. The two original methods
become thin wrappers around the new one.

This PR is related to ansible#75741. If both are merged, then the second
benefit above can be realised - by replacing calls in
`copy.chown_recursive()`.

Open questions:

- Should this new method be public (without a leading underscore)?
- Is there a better name?
@s-hertel s-hertel removed the needs_triage Needs a first human triage before being processed. label Sep 21, 2021
@moreati moreati changed the title copy: Reduce calls to stat() in chown_recursive copy: Reduce calls to lstat() in chown_recursive Sep 22, 2021
8000
@moreati
Copy link
Contributor Author
moreati commented Sep 22, 2021

Some profiling of an extreme case, calls to lstat are cut by 50%

Setup

Create a deep directory tree of approx 18 000 nodes

# rm -rf /tmp/{src,dst} && mkdir -p /tmp/src/{a..z}/{a..z}/{a..z}
# ls -l /tmp/src | head -n2
drwxr-xr-x 28 root root 4096 Sep 22 21:24 a
drwxr-xr-x 28 root root 4096 Sep 22 21:24 b
# find /tmp/src/ | wc -l
18279

Without

# rm -rf /tmp/{src,dst} && mkdir -p /tmp/src/{a..z}/{a..z}/{a..z}
# python3 -m cProfile -s calls -m ansible.modules.copy<<<'{"ANSIBLE_MODULE_ARGS": {"src":"/tmp/src", "dest":"/tmp/dst", "owner":"alex", "group":"alex", "remote_src":true}}' | egrep '(os|posix)\.(chown|lchown|lstat|stat)'
   219396    0.517    0.000    0.517    0.000 {built-in method posix.lstat}
    92156    0.189    0.000    0.189    0.000 {built-in method posix.stat}
    36560    0.143    0.000    0.143    0.000 {built-in method posix.lchown}

With

# rm -rf /tmp/{src,dst} && mkdir -p /tmp/src/{a..z}/{a..z}/{a..z}
# python3 -m cProfile -s calls -m ansible.modules.copy<<<'{"ANSIBLE_MODULE_ARGS": {"src":"/tmp/src", "dest":"/tmp/dst", "owner":"alex", "group":"alex", "remote_src":true}}' | egrep '(os|posix)\.(chown|lchown|lstat|stat)'
   109806    0.261    0.000    0.261    0.000 {built-in method posix.lstat}
    92158    0.191    0.000    0.191    0.000 {built-in method posix.stat}
    36560    0.137    0.000    0.137    0.000 {built-in method posix.lchown}

without.cprof.gz
with.cprof.gz

@ansibot ansibot added the stale_ci This PR has been tested by CI more than one week ago. Close and re-open this PR to get it retested. label Sep 30, 2021
@moreati moreati closed this Nov 12, 2021
@moreati moreati reopened this Nov 12, 2021
@ansibot ansibot removed the stale_ci This PR has been tested by CI more than one week ago. Close and re-open this PR to get it retested. label Nov 12, 2021
@ansibot ansibot added the stale_ci This PR has been tested by CI more than one week ago. Close and re-open this PR to get it retested. label Nov 24, 2021
@mattclay mattclay added the P3 Priority 3 - Approved, No Time Limitation label Jul 13, 2022
@moreati moreati force-pushed the copy-reduce-stats branch from bd42af2 to 1e0134b Compare July 29, 2022 18:08
@moreati moreati changed the title copy: Reduce calls to lstat() in chown_recursive copy: copy: Reduce calls to stat() by 50% in chown_recursive Jul 29, 2022
@moreati
Copy link
Contributor Author
moreati commented Jul 29, 2022

Rebased on devel, and included the headline number in the summary

@ansibot ansibot removed the stale_ci This PR has been tested by CI more than one week ago. Close and re-open this PR to get it retested. label Jul 29, 2022
moreati added a commit to moreati/ansible that referenced this pull request Jul 29, 2022
…rent

This merges implementations of `AnsibleModule.set_owner_if_different()`
& `AnsibleModule.set_group_if_different()` into a single method. The
benefits are
- reduction in code duplication
- makes possible halving the number of calls to `os.lchown()` when the
  copy module sets both owner and group.

There should be no outward change in behavior. The two original methods
become thin wrappers around the new one.

This PR is related to ansible#75741. If both are merged, then the second
benefit above can be realised - by replacing calls in
`copy.chown_recursive()`.
@ansibot ansibot added the stale_ci This PR has been tested by CI more than one week ago. Close and re-open this PR to get it retested. label Aug 6, 2022
moreati added a commit to moreati/ansible that referenced this pull request Jul 8, 2023
…rent

This merges implementations of `AnsibleModule.set_owner_if_different()`
& `AnsibleModule.set_group_if_different()` into a single method. The
benefits are
- reduction in code duplication
- makes possible halving the number of calls to `os.lchown()` when the
  copy module sets both owner and group.

There should be no outward change in behavior. The two original methods
become thin wrappers around the new one.

This PR is related to ansible#75741. If both are merged, then the second
benefit above can be realised - by replacing calls in
`copy.chown_recursive()`.
@moreati moreati changed the title copy: copy: Reduce calls to stat() by 50% in chown_recursive copy: Reduce calls to stat() by 50% in chown_recursive Jul 8, 2023
@moreati moreati force-pushed the copy-reduce-stats branch from 1e0134b to 13e8188 Compare July 8, 2023 00:11
@ansibot ansibot removed the stale_ci This PR has been tested by CI more than one week ago. Close and re-open this PR to get it retested. label Jul 8, 2023
@ansibot
Copy link
Contributor
ansibot commented Jul 8, 2023

The test ansible-test sanity --test pylint [explain] failed with 2 errors:

lib/ansible/modules/copy.py:370:21: disallowed-name: Disallowed name "_"
lib/ansible/modules/copy.py:383:17: disallowed-name: Disallowed name "_"

click here for bot help

@ansibot ansibot added ci_verified Changes made in this PR are causing tests to fail. needs_revision This PR fails CI tests or a maintainer has requested a review/revision of the PR. labels Jul 8, 2023
@moreati moreati force-pushed the copy-reduce-stats branch from 13e8188 to 60cbd7f Compare July 8, 2023 00:44
@ansibot ansibot removed ci_verified Changes made in this PR are causing tests to fail. needs_revision This PR fails CI tests or a maintainer has requested a review/revision of the PR. labels Jul 8, 2023
@ansibot ansibot added the stale_ci This PR has been tested by CI more than one week ago. Close and re-open this PR to get it retested. label Jul 18, 2023
@webknjaz
Copy link
Member

/azp run

Copy link
Azure Pipelines successfully started running 1 pipeline(s).

@ansibot ansibot added needs_revision This PR fails CI tests or a maintainer has requested a review/revision of the PR. and removed stale_ci This PR has been tested by CI more than one week ago. Close and re-open this PR to get it retested. labels Oct 25, 2024
@webknjaz
Copy link
Member

@moreati would you mind rebasing?

@webknjaz webknjaz added the ci_verified Changes made in this PR are causing tests to fail. label Oct 29, 2024
@moreati
Copy link
Contributor Author
moreati commented Oct 29, 2024

@moreati would you mind rebasing?

on it

moreati added a commit to moreati/ansible that referenced this pull request Oct 29, 2024
…rent

This merges implementations of `AnsibleModule.set_owner_if_different()`
& `AnsibleModule.set_group_if_different()` into a single method. The
benefits are
- reduction in code duplication
- makes possible halving the number of calls to `os.lchown()` when the
  copy module sets both owner and group.

There should be no outward change in behavior. The two original methods
become thin wrappers around the new one.

This PR is related to ansible#75741. If both are merged, then the second
benefit above can be realised - by replacing calls in
`copy.chown_recursive()`.

Open questions:

- Should this new method be public (without a leading underscore)?
- Is there a better name?
< 8000 /div>
This reduces the number of times `stat()` or `chown()` are called by the
copy module. Reductions come from

- combining loops for owner and group, thus not looping twice
- not looping over the list of directories returned by `os.walk()`, each
  directory is already handled when `os.walk()` descends into it

Possible further reductions could be achieved by
- exiting early from the check_mode loop
- Replacing `m.set_owner_if_different()` & `m.set_group_if_different()`
  calls with a single method.

I propose to do these separately, if there is interest.
@moreati
Copy link
Contributor Author
moreati commented Oct 29, 2024

Notes to self:

  • os.walk() is still the lowest common denominator as of upcoming Ansible 11 (ansible-core 2.18, Python 3.11-3.13 on controller, 3.8-3.13 on targets) . pathlib.Path.walk() (using scandir(3)) was introduced in Python 3.12

@ansibot ansibot removed needs_revision This PR fails CI tests or a maintainer has requested a review/revision of the PR. ci_verified Changes made in this PR are causing tests to fail. labels Oct 29, 2024
@bcoca
Copy link
Member
bcoca commented Oct 29, 2024

check the action plugin, it has it's own 'recursive' code and might need mirroring of these changes

@moreati
Copy link
Contributor Author
moreati commented Oct 29, 2024

ok, I'll try to take a look tomorrow.

@moreati
Copy link
Contributor Author
moreati commented Oct 29, 2024

check the action plugin, it has it's own 'recursive' code and might need mirroring of these changes

I don't think ansible.plugins.action.copy._walk_dirs() requires any changes. It's possible there are speedup opportunities, but they would require more in-depth investigation due to extra symlink handling that function does. That could be tackled independantly of this PR. There's no other directory walking code in that Python module that I noticed.

The other major directory walking code I'm aware of are the ansible.module_utils.basic.set_*_if_different() functions. PR #75742 combines the owner and group variants so that chown_recursive() can be sped up even more. If you prefer, then I could combine all the speed-ups into a single PR.

@bcoca
Copy link
Member
bcoca commented Oct 29, 2024

no, separate is fine, just wanted to make sure they kept as in sync as possible ... hate the duplicity

@ansible ansible deleted a comment from ansibot Oct 31, 2024
@ansibot ansibot added the stale_ci This PR has been tested by CI more than one week ago. Close and re-open this PR to get it retested. label Nov 7, 2024
@s-hertel
Copy link
Contributor

FYI, I have an alternative PR that simplifies this bit of code further #84313.

@ansibot ansibot added the needs_rebase https://docs.ansible.com/ansible/devel/dev_guide/developing_rebasing.html label Dec 13, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
affects_2.12 core_review In order to be merged, this PR must follow the core review workflow. feature This issue/PR relates to a feature request. module This issue/PR relates to a module. needs_rebase https://docs.ansible.com/ansible/devel/dev_guide/developing_rebasing.html P3 Priority 3 - Approved, No Time Limitation stale_ci This PR has been tested by CI more than one week ago. Close and re-open this PR to get it retested. support:core This issue/PR relates to code supported by the Ansible Engineering Team.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants
0