8000 Parallelise outerloop quarantined test execution by RussKie · Pull Request #8618 · dotnet/aspire · GitHub
[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to content

Parallelise outerloop quarantined test execution #8618

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 7 commits into from
Apr 10, 2025

Conversation

RussKie
Copy link
Contributor
@RussKie RussKie commented Apr 8, 2025

This should address unexplainable "ran out of disk space" errors on GHA (example):
image

Use the Arcade SDK capabilities to provide a custom test runner (see in eng/QuarantinedTestRunsheetBuilder) to intercept test execution to pretend to run tests. Instead, only query test assemblies whether those contain quarantined tests by running dotnet test --list-tests. Then inspect the output, and if a test project contains quarantined tests include the test project into a runsheet.

The runsheet generation itself is a two step process.

  1. The first step is performed by each project during RunTests target execution (in QuarantinedTestRunsheetBuilder) and it results in project-specific runsheets being generated. E.g., Aspire.Cli.Tests_net8.0_x64.linux.runsheet.json.

  2. After all test projects have been inspected, all the individual runsheets are aggregated into a single runsheet, which is then used as a matrix by GHA workflow.
    This is happening in eng/AfterSolutionBuild.targets.

@Copilot Copilot AI review requested due to automatic review settings April 8, 2025 06:52
@RussKie RussKie requested review from radical and eerhardt as code owners April 8, 2025 06:52
@github-actions github-actions bot added the needs-area-label An area label is needed to ensure this gets routed to the appropriate area owners label Apr 8, 2025
Copy link
Contributor
@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot reviewed 1 out of 4 changed files in this pull request and generated no comments.

Files not reviewed (3)
  • eng/AfterSolutionBuild.targets: Language not supported
  • eng/QuarantinedTestRunsheetBuilder/QuarantinedTestRunsheetBuilder.targets: Language not supported
  • tests/Aspire.Templates.Tests/Aspire.Templates.Tests.csproj: Language not supported
Comments suppressed due to low confidence (3)

.github/workflows/tests-outerloop.yml:131

  • Changing the condition to 'if: failure()' for the log upload step may result in logs not being captured for successful runs. Consider using 'if: always()' if maintaining complete log records is desired.
if: failure()

.github/workflows/tests-outerloop.yml:66

  • There is an inconsistency between the log directory defined here and the path used later for uploading test result artifacts. Please ensure both paths are aligned for proper artifact extraction.
$logDirectory = "${{ github.workspace }}/artifacts/TestResults"

.github/workflows/tests-outerloop.yml:32

  • The use of '/p:Restore=false' and '/p:Build=false' in the build command might lead to outdated binaries if dependencies have changed. Verify that skipping these steps is safe for all test projects involved.
./build.cmd -test /p:TestRunnerName=QuarantinedTestRunsheetBuilder /p:RunQuarantinedTests=true /bl -c Release -ci /p:CI=false /p:Restore=false /p:Build=false

@RussKie
Copy link
Contributor Author
RussKie commented Apr 8, 2025

I tested this pretty extensively in my fork. Here's one of the latest tests: https://github.com/RussKie/aspire/actions/runs/14328854911/job/40159969972.

@RussKie RussKie added area-engineering-systems infrastructure helix infra engineering repo stuff and removed needs-area-label An area label is needed to ensure this gets routed to the appropriate area owners labels Apr 8, 2025
@RussKie RussKie force-pushed the igveliko/split_outerloop branch from 282369d to 2b72077 Compare April 8, 2025 07:04
@RussKie RussKie marked this pull request as draft April 8, 2025 07:41
@radical
Copy link
Member
radical commented Apr 8, 2025

The outerloop tests are hitting the "no space" issue!

@RussKie
Copy link
Contributor Author
RussKie commented Apr 8, 2025

This will address that.

@RussKie RussKie force-pushed the igveliko/split_outerloop branch from 2b72077 to 8f7927c Compare April 8, 2025 08:36
@RussKie RussKie self-assigned this Apr 8, 2025
@RussKie RussKie marked this pull request as ready for review April 8, 2025 08:36
@radical
Copy link
Member
radical commented Apr 8, 2025

We should add a job to run at the end of all the matrix jobs to generate a combined report.

@RussKie
Copy link
Contributor Author
RussKie commented Apr 8, 2025

We should add a job to run at the end of all the matrix jobs to generate a combined report.

We can do it as a follow up after we unblock the outerloop workflow.

@RussKie RussKie force-pushed the igveliko/split_outerloop branch from 8f7927c to c3ef280 Compare April 8, 2025 08:57
8000
Copy link
Member
@radical radical left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is looking good. Posted some comments.

@dotnet-policy-service dotnet-policy-service bot added the needs-author-action An issue or pull request that requires more info or actions from the author. label Apr 8, 2025
@radical
Copy link
Member
radical commented Apr 8, 2025

/azp run

Copy link
Azure Pipelines successfully started running 1 pipeline(s).

@dotnet-policy-service dotnet-policy-service bot removed the needs-author-action An issue or pull request that requires more info or actions from the author. label Apr 8, 2025
@RussKie RussKie force-pushed the igveliko/split_outerloop branch from c3ef280 to 9cde5b8 Compare April 8, 2025 23:25
@RussKie
Copy link
Contributor Author
RussKie commented Apr 9, 2025

Could you please try running this job on ubuntu-latest to see if the run will faster? On other build setups we see a difference.

For the outerloop the run time is not critical (unlike for PR validation).
At present the job can't run on Ubuntu as it fails because of the strange SDK-related error. This is the current and unrelated to this change issue.

[xUnit.net 00:00:00.00] xUnit.net VSTest Adapter v3.0.2+dd36e86129 (64-bit .NET 9.0.2)
You must install or update .NET to run this application.
App: /home/runner/work/aspire/aspire/artifacts/bin/Aspire.Npgsql.EntityFrameworkCore.PostgreSQL.Tests/Release/net9.0/Aspire.Npgsql.EntityFrameworkCore.PostgreSQL.Tests
Architecture: x64
Framework: 'Microsoft.NETCore.App', version '9.0.0' (x64)
.NET location: /usr/lib/dotnet
The following frameworks were found:
  8.0.14 at [/usr/lib/dotnet/shared/Microsoft.NETCore.App]

@RussKie RussKie force-pushed the igveliko/split_outerloop branch from 9cde5b8 to f9b45d5 Compare April 9, 2025 01:18
Copy link
Member
@radical radical left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great, and will really help with getting results from all the quarantined tests! Except for the last comment that I posted, LGTM! 👍

@radical
Copy link
Member
radical commented Apr 10, 2025

The last Outerloop workflow run for this branch seems to be from 2-3 days ago. Could we please kick off a run before merging, since I think a fair bit has changed since then?

@RussKie

This comment was marked as resolved.

@Youssef1313

This comment was marked as resolved.

@radical

This comment was marked as resolved.

@RussKie
6D40

This comment was marked as resolved.

@RussKie RussKie force-pushed the igveliko/split_outerloop branch from f9b45d5 to 5a9013d Compare April 10, 2025 06:39
@RussKie
Copy link
Contributor Author
RussKie commented Apr 10, 2025

With @Youssef1313's help, it's back online 🎉

Here's the latest test run: https://github.com/RussKie/aspire/actions/runs/14375744440/job/40308019729#step:4:67

RussKie and others added 6 commits April 11, 2025 08:59
Use the Arcade SDK capabilities to provide a custom test runner
(see in eng/QuarantinedTestRunsheetBuilder) to intercept test execution
to pretend to run tests. Instead, only query test assemblies whether those
contain quarantined tests by running `dotnet test --list-tests`.
Then inspect the output, and if a test project contains quarantined tests
include the test project into a runsheet.

The runsheet generation itself is a two step process.

1. The first step is performed by each project during RunTests target
execution (in QuarantinedTestRunsheetBuilder) and it
results in project-specific runsheets being generated.
E.g., Aspire.Cli.Tests_net8.0_x64.linux.runsheet.json.

2. After all test projects have been inspected, all the individual runsheets
are aggregated into a single runsheet, which is then used as a matrix
by GHA workflow.
This is happening in eng/AfterSolutionBuild.targets.
@RussKie RussKie force-pushed the igveliko/split_outerloop branch from d0b649c to a93aa01 Compare April 10, 2025 22:59
@RussKie RussKie enabled auto-merge (squash) April 10, 2025 23:28
@RussKie RussKie merged commit 5afa640 into main Apr 10, 2025
177 of 178 checks passed
@RussKie RussKie deleted the igveliko/split_outerloop branch April 10, 2025 23:39
@RussKie
Copy link
Contributor Author
RussKie commented Apr 10, 2025

🎉

RussKie added a commit that referenced this pull request Apr 11, 2025
@RussKie RussKie mentioned this pull request Apr 11, 2025
RussKie added a commit that referenced this pull request Apr 11, 2025
RussKie added a commit that referenced this pull request Apr 11, 2025
@github-actions github-actions bot locked and limited conversation to collaborators May 11, 2025
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-engineering-systems infrastructure helix infra engineering repo stuff
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants
0