This article explains, in detail, the need for this action and the problem it solves.
In summary, the core features are:
- The ability to finish thousands of tests within a predefined target run time (for example, 2 or 5 minutes).
- Smart runner orchestration that scales runners up or down based on your test load, measured in execution time rather than just test count.
- Even test load distribution across runners, again by execution time rather than just test count.
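To make "distribution by time, not count" concrete, here is a minimal TypeScript sketch using the classic greedy "longest processing time first" (LPT) heuristic. This is a well-known technique swapped in for illustration, not necessarily how RunWright balances its runners; the names `TimedUnit` and `balanceByTime` are hypothetical.

```typescript
// A test (or whole test file) together with its recorded duration in milliseconds.
interface TimedUnit {
  id: string;
  durationMs: number;
}

// Greedy "longest processing time first" (LPT) balancing: sort units from
// longest to shortest and always hand the next one to the runner with the
// smallest total so far. Runners are balanced by time, not by test count.
function balanceByTime(units: TimedUnit[], runnerCount: number): TimedUnit[][] {
  if (!Number.isInteger(runnerCount) || runnerCount < 1) {
    throw new Error("runnerCount must be a positive integer");
  }
  const buckets: TimedUnit[][] = Array.from({ length: runnerCount }, (): TimedUnit[] => []);
  const loads: number[] = new Array<number>(runnerCount).fill(0);
  const sorted = [...units].sort((a, b) => b.durationMs - a.durationMs);

  for (const unit of sorted) {
    // Find the runner with the smallest accumulated load.
    let target = 0;
    for (let r = 1; r < runnerCount; r++) {
      if (loads[r] < loads[target]) target = r;
    }
    buckets[target].push(unit);
    loads[target] += unit.durationMs;
  }
  return buckets;
}
```

For example, four units of 60 s, 30 s, 20 s and 10 s on 2 runners end up as 60 s and 60 s (60 | 30 + 20 + 10), whereas dealing them out alternately by count would give 80 s and 40 s.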
In one example run, more than three thousand tests finished in just 1.5 minutes against a target run time of 2 minutes.
** At the time of writing, I am not aware of any other solution (paid or open source) that can do this with Playwright and GitHub.
This action covers both Playwright execution modes:
- When `fullyParallel=true`: all individual test cases are run in parallel across runners.
- When `fullyParallel=false`: all individual test files are run in parallel across runners.
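The execution mode changes what the smallest schedulable unit is. As a hypothetical sketch (the `ListedTest` and `WorkUnit` types and the `toWorkUnits` function are illustrative names, not part of the action), with `fullyParallel=true` each test can be placed on a runner on its own, while with `fullyParallel=false` all tests of one file, per project, have to stay together on one runner, so their durations are summed:

```typescript
// One test as reported by `--list`, plus its recorded duration (ms).
interface ListedTest {
  project: string;
  file: string;
  title: string;
  durationMs: number;
}

// The smallest chunk of work that may be placed on a single runner.
interface WorkUnit {
  key: string;
  tests: ListedTest[];
  durationMs: number;
}

function toWorkUnits(tests: ListedTest[], fullyParallel: boolean): WorkUnit[] {
  if (fullyParallel) {
    // fullyParallel=true: every test case is its own unit.
    return tests.map((t) => ({
      key: `${t.project} > ${t.file} > ${t.title}`,
      tests: [t],
      durationMs: t.durationMs,
    }));
  }
  // fullyParallel=false: tests of the same file (per project) stay together,
  // so their durations are summed into a single file-level unit.
  const byFile = new Map<string, WorkUnit>();
  for (const t of tests) {
    const key = `${t.project} > ${t.file}`;
    const unit: WorkUnit = byFile.get(key) ?? { key, tests: [], durationMs: 0 };
    unit.tests.push(t);
    unit.durationMs += t.durationMs;
    byFile.set(key, unit);
  }
  return Array.from(byFile.values());
}
```

The resulting units, not the raw tests, are what a time-based balancer would then spread across runners.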
There are 3 main steps involved:
- Add a `pre-commit` hook file as shown here.
  - This will run `--only-changed` tests on local commits.
- Copy the `state-reporter.js` file into the root of your repository.
  - This will create a `state.json` file that maps each test path to the time it took to run (in ms).
- Update the `reporter` option in `playwright.config.ts` to include this reporter, as shown below.

  ```typescript
  // playwright.config.ts (excerpt, inside defineConfig({ ... }))
  reporter: [["list"], ["html"], ["github"], ["./state-reporter.js"]],
  ```
- Add a `post-commit` hook file as shown here.
  - This will automatically stage and commit the updated `state.json` file to the feature branch.
- Add a reusable workflow that takes inputs from the user to run Playwright commands and finish the tests in x minutes.
- Add an example trigger workflow that shows how to use the reusable workflow to run the desired tests. Here are a few examples of trigger workflows.
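The actual `state-reporter.js` is not reproduced here. As a rough sketch of what such a reporter could look like, the TypeScript below records each test's duration in `onTestEnd` and merges the results into `state.json` in `onEnd`. Assumptions: the key format (title path segments joined with `" > "`) is hypothetical, and small local interfaces stand in for Playwright's `TestCase` and `TestResult` types (whose `titlePath()` and `duration` members it uses) so the sketch runs without `@playwright/test` installed.

```typescript
import * as fs from "node:fs";

// Local stand-ins for the members of Playwright's TestCase and TestResult that
// this sketch uses; a real reporter would import the types from
// "@playwright/test/reporter" instead.
interface TestLike {
  titlePath(): string[];
}
interface ResultLike {
  duration: number; // milliseconds
}

// Hypothetical state reporter: collects each test's duration and merges it
// into state.json when the run ends, keeping entries for tests not run this time.
class StateReporterSketch {
  private readonly statePath: string;
  private readonly durations: Record<string, number> = {};

  constructor(statePath = "state.json") {
    this.statePath = statePath;
  }

  onTestEnd(test: TestLike, result: ResultLike): void {
    // Assumed key format: non-empty title path segments (project, file,
    // describe blocks, title) joined with " > ".
    const key = test.titlePath().filter(Boolean).join(" > ");
    this.durations[key] = result.duration; // a retry overwrites earlier attempts
  }

  onEnd(): void {
    const existing: Record<string, number> = fs.existsSync(this.statePath)
      ? JSON.parse(fs.readFileSync(this.statePath, "utf8"))
      : {};
    const merged = { ...existing, ...this.durations };
    fs.writeFileSync(this.statePath, JSON.stringify(merged, null, 2) + "\n");
  }
}
```

In a real reporter file you would `export default` the class and let Playwright pass in its own `TestCase` and `TestResult` objects. Because `onTestEnd` fires for every attempt, a retried test keeps the duration of its last attempt in this sketch.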
Step 2: Run tests based on the defined triggers (push to main, pull_request to main, schedule, on demand, etc.).
- To test the setup, use one of the workflows below (either in your own test repository, or by forking the sandbox repository mentioned above). For reference, an `example-workflow.yml` file is also available in the root of the RunWright GitHub project.
- If you find any issues, use the issues page to raise them.
- Do not use sharding-related options (such as `--shard`) in the input Playwright command. This solution is meant to overcome the shortcomings of sharding, and using sharding again would reintroduce them.
- If you use custom (larger) GitHub runners, use the same runner type for the job that evaluates "RunWright" as for the subsequent job that runs the tests. This is important for correctly calculating the number of workers (threads), which is half the number of cores on the runner.
No. | Test Description | Test Condition | Expected Result | Actual Result | Status |
---|---|---|---|---|---|
1 | Run with 0 tests found | `npx playwright test --grep="non-existent-test"` | Action should handle this gracefully, set `RUNNER_COUNT=1`, and exit successfully | ✅ Action exits gracefully with a "No tests found" message | ✅ PASS |
2 | Test with fullyParallel=true | `npx playwright test` with `fullyParallel=true` | Should distribute individual tests across runners | ✅ In the RunWright job, tests from the same file are distributed across different runners | ✅ PASS |
3 | Test with fullyParallel=false | `npx playwright test` with `fullyParallel=false` | Should distribute test files across runners | ✅ File-level distribution works correctly: no tests from the same file + project appear on different runners. With each file taking 102 seconds and 2 workers, each runner getting 2 files is also accurate. Slowest runner time was 2 mins 48 seconds. Total run time in HTML report = 1.9m | ✅ PASS |
4 | Run with very few tests (< 2 min total run time) | `npx playwright test --grep='Wait for 5 seconds'` | Should create 1 runner with optimal worker allocation | ✅ Creates 1 runner and completes in less than 2 mins | ✅ PASS |
5 | Run with all tests (~30 mins when run sequentially) | `npx playwright test` | Should create around 8 runners with optimal worker allocation | ✅ Creates 9 runners. Slowest runner time was 2 mins 59 seconds. Total run time in HTML report: 2.1m | ✅ PASS |
6 | Run ~1.5k tests in 2 minutes | [Test Command Placeholder] | Should create the optimal number of runners to finish within 2 minutes | ✅ Completes in ~2 minutes with dynamic runner allocation (as seen in the early test image; runner ID not available) | ✅ PASS |
7 | Run ~3k tests in 2 minutes | `npx playwright test` | Should scale up runners appropriately to meet the time constraint | ✅ Scales to multiple runners and finishes in ~2 minutes | ✅ PASS |
8 | Test missing from state.json | Delete the 20- and 25-second tests of file06 for firefox from `state.json`, then run `npx playwright test --grep='Wait for 20 seconds' --project=firefox` | Should fail with a clear error message and suggestions. No graceful failure, to avoid a "false positive" run | ✅ Provides a clear error with post-commit hook guidance | ✅ PASS |
9 | Single project configuration | `npx playwright test --project='chromium'` | Should work with a single browser project | ✅ Handles single-project scenarios correctly | ✅ PASS |
10 | Multiple project configuration | `npx playwright test --project='chromium' --project='webkit'` | Should group tests by project within runners | ✅ Correctly groups and distributes multi-project tests | ✅ PASS |
11 | CPU core detection | [check any of previous runs] | Should detect available cores and calculate optimal workers | ✅ Detects cores correctly and sets workers = cores/2 | ✅ PASS |
12 | Dynamic matrix generation | [check any of previous runs] | Should create a proper GitHub Actions matrix format | ✅ Generates a valid JSON array for the matrix strategy | ✅ PASS |
13 | Load balancing accuracy | [check any of previous runs] | Distribution should be based on execution time, not test count | ✅ Uses actual test execution times for optimal distribution | ✅ PASS |
14 | Runner utilization | [check any of previous runs] | All runners should finish at approximately the same time | ✅ Even load distribution across all runners | ✅ PASS |
15 | Browser caching | [check any of previous runs] | Should cache and reuse Playwright browsers efficiently | ✅ Implements a proper browser caching strategy | ✅ PASS |
16 | Error handling for malformed state.json | [Test Command Placeholder] | Should provide a clear error when state.json is corrupted | ✅ Handles JSON parsing errors gracefully | ✅ PASS |
17 | Large test suite scalability | Run with 0.5 seconds (before the minimum time restriction was added) | Should handle test suites with 5k+ tests efficiently | ✅ Scales appropriately for large test suites (tested with 3k+ tests) | ✅ PASS |
18 | Custom runner types compatibility | [Test Command Placeholder] | Should work with custom GitHub runner configurations | Compatible with custom runner specifications | NOT-YET-TESTED |
19 | Output format validation | [check any of previous runs] | All outputs should be valid JSON and consumable by workflows | ✅ All outputs are properly formatted and consumable | ✅ PASS |
20 | Invalid time input (< 1 minute) | [Test Command Placeholder] | Should handle the minimum time constraint appropriately | ✅ Validates the minimum 1-minute requirement | ✅ PASS |
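Rows 8 and 20 describe deliberate hard failures rather than silent fallbacks. A minimal sketch of such a pre-flight check, assuming `state.json` is a flat map from a test identifier to a duration in ms (the identifier format, the `collectDurations` function name and the error wording are all hypothetical):

```typescript
// Hypothetical shape of state.json: test identifier -> duration in ms.
type StateMap = Record<string, number>;

// Hypothetical pre-flight check: reject a target below 1 minute and fail
// loudly (rather than guessing a duration) when any listed test has no entry.
function collectDurations(listedTests: string[], state: StateMap, targetMinutes: number): number[] {
  if (!Number.isFinite(targetMinutes) || targetMinutes < 1) {
    throw new Error(`TargetRunTime must be at least 1 minute (got ${targetMinutes}).`);
  }
  // Own-property check, so keys like "toString" are not mistaken for entries.
  const missing = listedTests.filter(
    (id) => !Object.prototype.hasOwnProperty.call(state, id),
  );
  if (missing.length > 0) {
    throw new Error(
      `${missing.length} test(s) missing from state.json:\n  ${missing.join("\n  ")}\n` +
        "Run them locally and commit, so the post-commit hook can update state.json.",
    );
  }
  return listedTests.map((id) => state[id]);
}
```

Failing here, instead of assigning a default duration, keeps an unknown test from silently skewing the runner plan.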
For the curious, here are the equations I used to devise this solution.
Definitions
Let:
- TargetRunTime = total desired time to complete the run (in minutes).
  - We get this as input from the user.
- T_i = TestRunTimeForEachTest(i) = execution time of test i (in ms, from `state.json`).
  - We get this value from the `state.json` file, which is updated by a post-commit hook.
- N = total number of tests to run.
  - We get this by running the Playwright command with the `--list` option.
- TotalLoad = Σ T_i = total test load (in terms of test run time).
  - We iterate over each runner to keep its Σ T_i <= TargetRunTime.
  - Note that the total run time for each runner is affected by the number of parallel threads, as explained in more detail in the next section.
- Cores = number of cores per runner.
  - For Linux runners, `NUM_CORES=$(nproc)`.
- Threads = number of parallel threads (workers) per runner.
  - The recommended number of threads per runner is half the number of cores, i.e. Threads = Cores / 2.
- Runners = total number of required runners.
  - We get this from the equation given in the next section.
Equation
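Since we know every individual TestRunTimeForEachTest(i) from `state.json`, the total workload is:

$$\text{TotalLoad} = \sum_{i=1}^{N} T_i$$

Total parallel capacity available on the runners (each runner runs Threads tests at a time for up to TargetRunTime):

$$\text{Capacity} = \text{Runners} \times \text{Threads} \times \text{TargetRunTime}$$

Equating Load and Capacity:

$$\sum_{i=1}^{N} T_i = \text{Runners} \times \text{Threads} \times \text{TargetRunTime}$$

Solving for Runners (with T_i and TargetRunTime in the same time unit, and rounding up to a whole number of runners):

$$\text{Runners} = \left\lceil \frac{\sum_{i=1}^{N} T_i}{\text{Threads} \times \text{TargetRunTime}} \right\rceil$$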
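The solved relation, Runners = ⌈Σ T_i / (Threads × TargetRunTime)⌉ with Threads = Cores / 2, can be checked with a few lines of TypeScript. This is a sketch under stated assumptions: durations are in ms (as in `state.json`) and TargetRunTime is in minutes, so the sketch converts minutes to ms; the result is rounded up and clamped to at least one runner; threads are clamped to at least one on single-core machines. The function names are hypothetical.

```typescript
const MS_PER_MINUTE = 60_000;

// Threads = Cores / 2, rounded down and never below one thread.
function threadsForCores(cores: number): number {
  return Math.max(1, Math.floor(cores / 2));
}

// Runners = ceil( sum(T_i) / (Threads * TargetRunTime) ).
// durationsMs are the T_i values (ms); targetMinutes is TargetRunTime.
function requiredRunners(durationsMs: number[], cores: number, targetMinutes: number): number {
  const totalLoadMs = durationsMs.reduce((sum, t) => sum + t, 0);
  const capacityPerRunnerMs = threadsForCores(cores) * targetMinutes * MS_PER_MINUTE;
  // Round up to whole runners; always ask for at least one.
  return Math.max(1, Math.ceil(totalLoadMs / capacityPerRunnerMs));
}
```

For row 5 of the test table (about 30 minutes of tests run sequentially), assuming 4-core runners (so 2 threads) and a 2-minute target, `requiredRunners([30 * 60_000], 4, 2)` gives ⌈7.5⌉ = 8, in line with the expected "around 8 runners".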
- It may be a good idea to regenerate the `state.json` file from scratch every few days or weeks, to avoid accumulating redundant test paths and names.
- Add an option for users who don't want to limit by time but want to limit the maximum number of runners.