Extend tests with clang-10 #484
Codecov Report
@@            Coverage Diff             @@
##           master     #484      +/-   ##
==========================================
- Coverage   95.50%   95.44%   -0.07%
==========================================
  Files          20       20
  Lines        2604     2610       +6
  Branches      441      443       +2
==========================================
+ Hits         2487     2491       +4
  Misses         55       55
- Partials       62       64       +2
==========================================
Just one question
Closes #134
The option is documented but isn't accepted. Raise a RuntimeError if an unknown option is used for gcov.
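The validation described here can be sketched roughly as follows. This is an illustrative approximation, not the actual gcovr code: the option names and the `KNOWN_GCOV_OPTIONS` set are made up for the example.

```python
# Hypothetical sketch of rejecting unknown gcov options.
# The option names below are invented for illustration only.
KNOWN_GCOV_OPTIONS = {"exclude-unreachable-branches", "use-gcov-files"}

def validate_gcov_option(option: str) -> str:
    """Return the option unchanged, or raise RuntimeError if gcov won't accept it."""
    if option not in KNOWN_GCOV_OPTIONS:
        raise RuntimeError(f"unknown gcov option: {option!r}")
    return option
```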
In general this looks good to me. Just two minor questions/remarks.
Testing with Clang would be really good! Since everything seems to work, this could be merged as is.
But it seems this kind of work touches on lots of tangential concerns. I've added some inline comments below. It could also make sense to mention the Clang tests in installation.rst.
I have only looked at the code, not at the generated reference files. So I'm assuming/hoping that the coverage data is reasonable.
lgtm
Summary: This commit introduces a completely rewritten parser for gcov's human-readable report format. The implementation pays closer attention to the upstream file format, e.g. with respect to the exact number formatting. Compared to the previous parser, the data flow has been greatly clarified, making this parser more obviously correct. The new parser will make it easier to react to future changes in the format, and to make use of improvements from newer GCC versions – such as the `Working directory` metadata.

Background
----------

The old parser had some problems that could be addressed by this rewrite.

* The old parser was very complex due to mixing low-level parsing and high-level decisions about coverage exclusion. Fixed in the new parser: parsing is now split into multiple phases that clearly separate concerns. The low-level parsing phase produces a data model that is safe to use for the other phases.

* The old parser used an object-oriented design where parsing state was stored in instance fields. This created a huge "state space" that was difficult to track, potentially leading to bugs. For example, it was not immediately clear in the investigation of <gcovr#511> whether the problem was just that the parser forgot to update its line number correctly. Fixed in the new parser: using a more functional/procedural design, the data flows are very clear. State is tracked explicitly. By separating parsing phases, the state space is much smaller.

* To expand on the previous point: the old parser essentially consisted of multiple interleaved state machines. There was one state machine for processing coverage data, and an interleaved state machine for tracking active coverage exclusion markers. This interleaving made it difficult to understand whether the state transitions were correct. Fixed in the new parser: coverage exclusion patterns are collected in a separate phase, before the actual coverage data is processed.

* The old parser made use of very fragile parsing strategies, such as using `str.split()` excessively. This gave rise to fragile assumptions about the exact format. For example, the IndexError in <gcovr#226> was an example of wrong assumptions. The new parser uses regular expressions to parse tag lines, and only uses `str.split()` for the more structured source code lines. This is more self-documenting. Treatment of numerical values was aligned with the routines in the gcov source code. Should the format deviate in the future, the regexes will fail to match, making it possible to detect and fix the errors. (Until then, `--gcov-ignore-parse-errors` can be used.)

Design of the new parser
------------------------

The new parser is more complex in the sense that there is a lot more code. But there is a clearer separation of concerns, and the parser was closely informed by the gcov documentation and source code. As a result, I am confident that it handles far more edge cases correctly, in particular relating to the handling of numbers/percentages.

There are three items for external use:

**`parse_metadata(lines)`** creates a dict of values from the metadata lines. The old parser handled the very first line of the report separately to extract the filename. The new parser uses the same, more robust parsing code for this metadata.

**`ParserFlags`** is a flag-enum that describes various boolean features. A single object with flags seems simpler to handle than multiple variables like `exclude_throw_branches`.

**`parse_coverage(lines, ...)`** is the main function for parsing the coverage. It performs multiple phases:

* Each input line is parsed/tokenized into an internal data model. The data model is strongly typed. The various classes like `_SourceLine` are implemented as NamedTuples, which is both very convenient and very memory-efficient. Relevant items: `_parse_line()`, `_parse_tag_line()`, data model

* Exclusion markers are extracted from source code lines and arranged into a data structure for later lookup. Relevant items: `_find_excluded_ranges()`, `_make_is_in_any_range()`

* Parsed lines are processed to populate a `FileCoverage` model. At this stage, exclusions are applied. The state space is very small, with only four variables that have to be tracked. Relevant items: `_ParserState`, `_gather_coverage_from_line()`, `_line_noncode_and_count()`, `_function_can_be_excluded()`, `_branch_can_be_excluded()`, `_add_coverage_for_function()`

* Warnings are reported, and any potential errors re-raised. This is equivalent to the previous parser. Relevant items: `_report_lines_with_errors()`

Impact on tests
---------------

The new parser is almost completely bug-compatible with the old parser. This is e.g. visible in the potentially unintuitive handling of the `noncode` status. The new separation between low-level parsing and high-level decisions makes it more clear what is actually going on.

There was a significant change in the **Nautilus parser test**. The input file contains the following pattern:

    ------------------
    #####:   52:foo() ? bar():

Previously, this line 52 was not reported as uncovered. I consider that to be an error, and have updated the expected output correspondingly. This could indicate that the new parser is in fact more robust than the old parser when it comes to template specialization sections.

In the **excl-branch test**, gcovr will encounter gcov input such as the following when using GCC-8 or later:

    #####:    9:    virtual ~Bar()
    #####:   10:    {} // ...
    ------------------
    Bar::~Bar():
    function Bar::~Bar() called 0 returned 0% blocks executed 0%
    #####:    9:    virtual ~Bar()
    #####:   10:    {} // ...
    call    0 never executed
    call    1 never executed
    ------------------
    Bar::~Bar():
    function Bar::~Bar() called 0 returned 0% blocks executed 0%
    #####:    9:    virtual ~Bar()
    #####:   10:    {} // ...
    ------------------

The old parser associated the `function` annotations with line 11. This was clearly incorrect. The test reference was updated to associate the destructor with line 9.

Other than that, the tests were only updated to account for the different parser APIs. Internally, the new parser uses a lot of doctests.

Future directions
-----------------

The new parser extracts *all* available data, only to throw it away. It might now become feasible to make use of more of this data. In particular:

* handling template specialization sections properly
* collecting block-level coverage data
* using the `working directory` metadata field

Conflicts with other development efforts
----------------------------------------

* <gcovr#503> report of excluded coverage: makes a small patch to the parser. The same effect can be achieved by adding a few lines in `_gather_coverage_from_line()`.

* <gcovr#484> tests with clang-10: touches neighboring lines. Will be reported as a merge conflict by Git, but there's no semantic conflict.

* <gcovr#474> abstract interface for reader/writer: small change in the parser code regarding `sys.exit(1)` (new parser: `raise SystemExit(1)`). It's worth noting that this is effectively unreachable code. Lines will only be reported if there was an exception, and if there was an exception it will be re-thrown.

* <gcovr#361> --no-markers to ignore exclusion markers: touches the exclusion handling code. This is of course totally changed by the new parser. But the new parser would make it even easier to implement that functionality.

* <gcovr#350> decision coverage: adds significant new parsing code, but most of it is outside of the gcov parser. These changes could be ported with moderate effort to the new parser.
Extend the test suite with clang-10 inside a Docker container, and add reference data for clang-10 based on the gcc-8 data.
As far as I can see, gcc adds a branch to closing braces which is missing in clang-10.
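The compiler difference described here can be illustrated with a small sketch. The gcov excerpts below are made up for illustration (not taken from the actual reference data); they only show the kind of `branch` annotation that gcc emits on a closing brace and clang-10 does not.

```python
# Made-up, simplified gcov excerpts illustrating the gcc/clang-10 difference
# described above: gcc annotates the closing brace with branch records.
gcc_gcov = """\
        1:    5:}
branch  0 taken 1 (fallthrough)
branch  1 taken 0 (throw)
"""

clang_gcov = """\
        1:    5:}
"""

def branch_lines(report: str):
    # Collect only the "branch ..." annotation lines from a gcov report.
    return [ln for ln in report.splitlines() if ln.startswith("branch")]

# gcc reports branches on the closing brace; clang-10 reports none, which is
# why separate clang-10 reference data is needed in the test suite.
```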