Extend tests with clang-10 #484
Codecov Report
@@            Coverage Diff             @@
##           master     #484      +/-   ##
==========================================
- Coverage   95.50%   95.44%   -0.07%
==========================================
  Files          20       20
  Lines        2604     2610       +6
  Branches      441      443       +2
==========================================
+ Hits         2487     2491       +4
  Misses         55       55
- Partials       62       64       +2
==========================================
Just one question
Closes #134
The option is documented but isn't accepted. Raise a RuntimeError if an unknown option is used for gcov.
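The validation described here can be sketched roughly as follows. This is an illustrative approximation, not the actual gcovr code: the option names and the `KNOWN_GCOV_OPTIONS` set are made up for the example.

```python
# Hypothetical sketch of rejecting unknown gcov options.
# The option names below are invented for illustration only.
KNOWN_GCOV_OPTIONS = {"exclude-unreachable-branches", "use-gcov-files"}

def validate_gcov_option(option: str) -> str:
    """Return the option unchanged, or raise RuntimeError if gcov won't accept it."""
    if option not in KNOWN_GCOV_OPTIONS:
        raise RuntimeError(f"unknown gcov option: {option!r}")
    return option
```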
In general this looks good to me. Just two minor questions/remarks.
Testing with Clang would be really good! Since everything seems to work, this could be merged as is.
But it seems this kind of work touches on lots of tangential concerns. I've added some inline comments below. It could also make sense to mention the Clang tests in installation.rst.
I have only looked at the code, not at the generated reference files. So I'm assuming/hoping that the coverage data is reasonable.
lgtm
Summary: This commit introduces a completely rewritten parser for gcov's human-readable report format. The implementation pays closer attention to the upstream file format, e.g. with respect to the exact number formatting. Compared to the previous parser, the data flow has been greatly clarified, making this parser more obviously correct. The new parser will make it easier to react to future changes in the format, and to make use of improvements from newer GCC versions – such as the `Working directory` metadata.

Background
----------

The old parser had some problems that could be addressed by this rewrite.

* The old parser was very complex due to mixing low-level parsing and high-level decisions about coverage exclusion. Fixed in the new parser: parsing is now split into multiple phases that clearly separate concerns. The low-level parsing phase produces a data model that is safe to use for the other phases.

* The old parser used an object-oriented design where parsing state was stored in instance fields. This created a huge "state space" that was difficult to track, potentially leading to bugs. For example, it was not immediately clear in the investigation of <gcovr#511> whether the problem was just that the parser forgot to update its line number correctly. Fixed in the new parser: using a more functional/procedural design, the data flows are very clear. State is tracked explicitly. By separating parsing phases, the state space is much smaller.

* To expand on the previous point: the old parser essentially consisted of multiple interleaved state machines. There was one state machine for processing coverage data, and an interleaved state machine for tracking active coverage exclusion markers. This interleaving made it difficult to understand whether the state transitions were correct. Fixed in the new parser: coverage exclusion patterns are collected in a separate phase, before the actual coverage data is processed.

* The old parser made use of very fragile parsing strategies, such as using `str.split()` excessively. This gave rise to fragile assumptions about the exact format. For example, the IndexError in <gcovr#226> was an example of wrong assumptions. The new parser uses regular expressions to parse tag lines, and only uses `str.split()` for the more structured source code lines. This is more self-documenting. Treatment of numerical values was aligned with the routines in the gcov source code. Should the format deviate in the future, the regexes will fail to match, making it possible to detect and fix the errors. (Until then, `--gcov-ignore-parse-errors` can be used.)

Design of the new parser
------------------------

The new parser is more complex in the sense that there is a lot more code. But there is a clearer separation of concerns, and the parser was closely informed by the gcov documentation and source code. As a result, I am confident that it handles far more edge cases correctly, in particular relating to the handling of numbers/percentages.

There are three items for external use:

**`parse_metadata(lines)`** creates a dict of values from the metadata lines. The old parser handled the very first line of the report separately to extract the filename. The new parser uses the same, more robust parsing code for this metadata.

**`ParserFlags`** is a flag-enum that describes various boolean features. A single object with flags seems simpler to handle than multiple variables like `exclude_throw_branches`.

**`parse_coverage(lines, ...)`** is the main function for parsing the coverage. It performs multiple phases:

* Each input line is parsed/tokenized into an internal data model. The data model is strongly typed. The various classes like `_SourceLine` are implemented as NamedTuples, which is both very convenient and very memory-efficient. Relevant items: `_parse_line()`, `_parse_tag_line()`, data model

* Exclusion markers are extracted from source code lines and arranged into a data structure for later lookup. Relevant items: `_find_excluded_ranges()`, `_make_is_in_any_range()`

* Parsed lines are processed to populate a `FileCoverage` model. At this stage, exclusions are applied. The state space is very small, with only four variables that have to be tracked. Relevant items: `_ParserState`, `_gather_coverage_from_line()`, `_line_noncode_and_count()`, `_function_can_be_excluded()`, `_branch_can_be_excluded()`, `_add_coverage_for_function()`

* Warnings are reported, and any potential errors re-raised. This is equivalent to the previous parser. Relevant items: `_report_lines_with_errors()`

Impact on tests
---------------

The new parser is almost completely bug-compatible with the old parser. This is e.g. visible in the potentially unintuitive handling of the `noncode` status. The new separation between low-level parsing and high-level decisions makes it more clear what is actually going on.

There was a significant change in the **Nautilus parser test**. The input file contains the following pattern:

    ------------------
    #####:   52:foo() ? bar():

Previously, this line 52 was not reported as uncovered. I consider that to be an error, and have updated the expected output correspondingly. This could indicate that the new parser is in fact more robust than the old parser when it comes to template specialization sections.

In the **excl-branch test**, gcovr will encounter gcov input such as the following when using GCC-8 or later:

    #####:    9:    virtual ~Bar()
    #####:   10:    {} // ...
    ------------------
    Bar::~Bar():
    function Bar::~Bar() called 0 returned 0% blocks executed 0%
    #####:    9:    virtual ~Bar()
    #####:   10:    {} // ...
    call    0 never executed
    call    1 never executed
    ------------------
    Bar::~Bar():
    function Bar::~Bar() called 0 returned 0% blocks executed 0%
    #####:    9:    virtual ~Bar()
    #####:   10:    {} // ...
    ------------------

The old parser associated the `function` annotations with line 11. This was clearly incorrect. The test reference was updated to associate the destructor with line 9.

Other than that, the tests were only updated to account for the different parser APIs. Internally, the new parser uses a lot of doctests.

Future directions
-----------------

The new parser extracts *all* available data, only to throw it away. It might now become feasible to make use of more of this data. In particular:

* handling template specialization sections properly
* collecting block-level coverage data
* using the `working directory` metadata field

Conflicts with other development efforts
----------------------------------------

* <gcovr#503> report of excluded coverage: makes a small patch to the parser. The same effect can be achieved by adding a few lines in `_gather_coverage_from_line()`.

* <gcovr#484> tests with clang-10: touches neighboring lines. Will be reported as a merge conflict by Git, but there's no semantic conflict.

* <gcovr#474> abstract interface for reader/writer: small change in the parser code regarding `sys.exit(1)` (new parser: `raise SystemExit(1)`). It's worth noting that this is effectively unreachable code. Lines will only be reported if there was an exception, and if there was an exception it will be re-thrown.

* <gcovr#361> --no-markers to ignore exclusion markers: touches the exclusion handling code. This is of course totally changed by the new parser. But the new parser would make it even easier to implement that functionality.

* <gcovr#350> decision coverage: adds significant new parsing code, but most of it is outside of the gcov parser. These changes could be ported with moderate effort to the new parser.
Extend the test suite with clang-10 inside a Docker container, and add reference data for clang-10 based on the gcc-8 data.
As far as I can see, gcc adds a branch to closing braces which is missing in clang-10.
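The compiler difference described here can be illustrated with a small sketch. The gcov excerpts below are made up for illustration (not taken from the actual reference data); they only show the kind of `branch` annotation that gcc emits on a closing brace and clang-10 does not.

```python
# Made-up, simplified gcov excerpts illustrating the gcc/clang-10 difference
# described above: gcc annotates the closing brace with branch records.
gcc_gcov = """\
        1:    5:}
branch  0 taken 1 (fallthrough)
branch  1 taken 0 (throw)
"""

clang_gcov = """\
        1:    5:}
"""

def branch_lines(report: str):
    # Collect only the "branch ..." annotation lines from a gcov report.
    return [ln for ln in report.splitlines() if ln.startswith("branch")]

# gcc reports branches on the closing brace; clang-10 reports none, which is
# why separate clang-10 reference data is needed in the test suite.
```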