8000 Extend tests with clang-10 by Spacetown · Pull Request #484 · gcovr/gcovr · GitHub
[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to content

Extend tests with clang-10 #484

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 15 commits into from
Sep 13, 2021
Merged

Extend tests with clang-10 #484

merged 15 commits into from
Sep 13, 2021

Conversation

Spacetown
Copy link
Member

Extend test suite with clang-10 inside docker container and add reference data for clang-10 based on gcc-8.
As far as I can see, gcc adds a branch to closing braces which is missing in clang-10.

@Spacetown Spacetown requested a review from latk May 2, 2021 19:33
@Spacetown Spacetown added this to the 4.3 milestone May 2, 2021
@codecov
Copy link
codecov bot commented May 2, 2021

Codecov Report

Merging #484 (63fc704) into master (e0b7afe) will decrease coverage by 0.06%.
The diff coverage is 66.66%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #484      +/-   ##
==========================================
- Coverage   95.50%   95.44%   -0.07%     
==========================================
  Files          20       20              
  Lines        2604     2610       +6     
  Branches      441      443       +2     
==========================================
+ Hits         2487     2491       +4     
  Misses         55       55              
- Partials       62       64       +2     
Flag Coverage Δ
ubuntu-18.04 94.90% <66.66%> (-0.07%) ⬇️
windows-2019 94.90% <66.66%> (-0.07%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
gcovr/gcov.py 87.06% <66.66%> (-0.58%) ⬇️
gcovr/tests/test_gcovr.py 98.00% <0.00%> (+0.64%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update e0b7afe...63fc704. Read the comment docs.

Copy link
Contributor
@chrta chrta left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just one question

@Spacetown
Copy link
Member Author

Closes #134

Option is documented but it isn't accepted. Raise RuntimeError if unknown option is used for gcov.
@Spacetown Spacetown requested a review from chrta August 26, 2021 19:00
Copy link
Contributor
@chrta chrta left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general this looks good to me. Just two minor questions/remarks.

Copy link
Member
@latk latk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Testing with Clang would be really good! Since everything seems to work, this could be merged as is.

But it seems this kind of work touches on lots of tangential concerns. I've added some inline comments below. It could also make sense to mention the Clang tests in installation.rst.

I have only looked at the code, not at the generated reference files. So I'm assuming/hoping that the coverage data is reasonable.

Copy link
Contributor
@chrta chrta left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

latk added a commit to latk/gcovr that referenced this pull request Sep 10, 2021
Summary: This commit introduces a completely rewritten parser for gcov's
human-readable report format. The implementation pays closer attention to the
upstream file format, e.g. with respect to the exact number formatting. Compared
to the previous parser, the data flow has been greatly clarified, making this
parser more obviously correct. The new parser will make it easier to react to
future changes in the format, and to make use of improvements from newer GCC
versions – such as the `Working directory` metadata.

Background
----------

The old parser had some problems that could be addressed by this rewrite.

* The old parser was very complex due to mixing low level parsing and high-level
  decisions about coverage exclusion.

  Fixed in the new parser: now, staging is split into multiple phases that
  clearly separate concerns. The low-level parsing phase produces a data model
  that is safe to use for the other phases.

* The old parser used an object-oriented design were parsing state was stored in
  instance fields. This created a huge "state space" that was difficult to
  track, potentially leading to bugs. For example, it was not immediately clear
  in the investigation of <gcovr#511> whether
  the problem was just that the parser forgot to update its line number
  correctly.

  Fixed in the new parser: using a more functional/procedural design, the data
  flows are very clear. State is tracked explicitly. By separating parsing
  phases, the state space is much smaller.

* To expand on the previous point: The old parser essentially consisted of
  multiple interleaved state machines. There was one state machine for
  processing coverage data, and an interleaved state machine for tracking active
  coverage exclusion markers. This interleaving made it difficult to understand
  whether the state transitions were correct.

  Fixed in the new parser: coverage exclusion patterns are collected in a
  separate phase, before the actual coverage data is processed.

* The old parser made use of very fragile parsing strategies, such as using
  `str.split()` excessively. This gave rise to fragile assumptions about the
  exact format. For example, the IndexError in
  <gcovr#226> was an example of wrong
  assumptions.

  The new parser uses regular expressions to parse tag lines, and only uses
  `str.split()` for the more structured source code lines. This is more
  self-documenting. Treatment of numerical values was aligned with the routines
  in the gcov source code. Should the format deviate in the future, the regexes
  will fail to match, making it possible to detect and fix the errors. (Until
  then, `--gcov-ignore-parse-errors` can be used).

Design of the new parser
------------------------

The new parser is more complex in the sense that there is a lot more code. But
there is a clearer separation of concerns, and the parser was closely informed
by the gcov documentation and source code. As a result, I am confident that it
handles far more edge cases correctly, in particular relating to the handling of
numbers/percentages.

There are three items for external use:

**`parse_metadata(lines)`** creates a dict of values from the metadata lines.
The old parser handled the very first line of the report separately to extract
the filename. The new parser uses the same more robust parsing code for this
metadata.

**`ParserFlags`** is a flag-enum that describes various boolean features. A single
object with flags seems simpler to handle than multiple variables like
`exclude_throw_branches`.

**`parse_coverage(lines, ...)`** is the main function for parsing the coverage.
It performs multiple phases:

* Each input line is parsed/tokenized into an internal data model. The data
  model is strongly typed. The various classes like `_SourceLine` are
  implemented as NamedTuples, which is both very convenient and very
  memory-efficient.

  Relevant items: `_parse_line()`, `_parse_tag_line()`, data model

* Exclusion markers are extracted from source code lines and arranged into a
  data structure for later lookup.

  Relevant items: `_find_excluded_ranges()`, `_make_is_in_any_range()`

* Parsed lines are processed to populate a `FileCoverage` model. At this stage,
  exclusions are applied. The state space is very small, with only four
  variables that have to be tracked.

  Relevant items: `_ParserState`, `_gather_coverage_from_line()`,
  `_line_noncode_and_count()`, `_function_can_be_excluded()`,
  `_branch_can_be_excluded()`, `_add_coverage_for_function()`

* Warnings are reported, and any potential errors re-raised. This is equivalent
  to the previous parser.

  Relevant items: `_report_lines_with_errors()`

Impact on tests
---------------

The new parser is almost completely bug-compatible with the old parser. This is
e.g. visible in the potentially unintuitive handling of the `noncode` status.
The new separation between low-level parsing and high-level decisions makes it
more clear what is actually going on.

There was a significant change in the Nautilus parser test. The input file
contains the following pattern:

    ------------------
        #####:   52:foo() ? bar():

Previously, this line 52 was not reported as uncovered. I consider that to be an
error, and have updated the expected output correspondingly. This could indicate
that the new parser is in fact more robust than the old parser when it comes to
template specialization sections.

Other than that, the tests were only updated to account for the different parser
APIs.

Internally, the new parser uses a lot of doctests.

Future directions
-----------------

The new parser extracts *all* available data, only to throw it away. It might
now become feasible to make use of more of this data. In particular:

* handling template specialization sections properly
* collecting block-level coverage data
* using the `working directory` metadata field

Conflicts with other development efforts
----------------------------------------

* <gcovr#503> report of excluded coverage

  Makes a small patch to the parser. The same effect can be achieved by adding a
  few lines in `_gather_coverage_from_line()`.

* <gcovr#484> tests with clang-10

  Touches neighboring lines. Will be reported as a merge conflict by Git, but
  there's no semantic conflict.

* <gcovr#474> abstract interface for reader/writer

  Small change in the parser code regarding `sys.exit(1)` (new parser:
  `raise SystemExit(1)`). It's worth noting that this is effectively unreachable
  code. Lines will only be reported if there was an exception, and if there was
  an exception it will 
6D47
be re-thrown.

* <gcovr#361> --no-markers to ignore exclusion markers

  Touches the exclusion handling code. This is of course totally changed by the
  new parser. But the new parser would make it even easier to implement that
  functionality.

* <gcovr#350> decision coverage

  Adds significant new parsing code, but most of it is outside of the
  gcov-parser. These changes could be ported with moderate effort to the new
  parser.
latk added a commit to latk/gcovr that referenced this pull request Sep 10, 2021
Summary: This commit introduces a completely rewritten parser for gcov's
human-readable report format. The implementation pays closer attention to the
upstream file format, e.g. with respect to the exact number formatting. Compared
to the previous parser, the data flow has been greatly clarified, making this
parser more obviously correct. The new parser will make it easier to react to
future changes in the format, and to make use of improvements from newer GCC
versions – such as the `Working directory` metadata.

Background
----------

The old parser had some problems that could be addressed by this rewrite.

* The old parser was very complex due to mixing low level parsing and high-level
  decisions about coverage exclusion.

  Fixed in the new parser: now, staging is split into multiple phases that
  clearly separate concerns. The low-level parsing phase produces a data model
  that is safe to use for the other phases.

* The old parser used an object-oriented design were parsing state was stored in
  instance fields. This created a huge "state space" that was difficult to
  track, potentially leading to bugs. For example, it was not immediately clear
  in the investigation of <gcovr#511> whether
  the problem was just that the parser forgot to update its line number
  correctly.

  Fixed in the new parser: using a more functional/procedural design, the data
  flows are very clear. State is tracked explicitly. By separating parsing
  phases, the state space is much smaller.

* To expand on the previous point: The old parser essentially consisted of
  multiple interleaved state machines. There was one state machine for
  processing coverage data, and an interleaved state machine for tracking active
  coverage exclusion markers. This interleaving made it difficult to understand
  whether the state transitions were correct.

  Fixed in the new parser: coverage exclusion patterns are collected in a
  separate phase, before the actual coverage data is processed.

* The old parser made use of very fragile parsing strategies, such as using
  `str.split()` excessively. This gave rise to fragile assumptions about the
  exact format. For example, the IndexError in
  <gcovr#226> was an example of wrong
  assumptions.

  The new parser uses regular expressions to parse tag lines, and only uses
  `str.split()` for the more structured source code lines. This is more
  self-documenting. Treatment of numerical values was aligned with the routines
  in the gcov source code. Should the format deviate in the future, the regexes
  will fail to match, making it possible to detect and fix the errors. (Until
  then, `--gcov-ignore-parse-errors` can be used).

Design of the new parser
------------------------

The new parser is more complex in the sense that there is a lot more code. But
there is a clearer separation of concerns, and the parser was closely informed
by the gcov documentation and source code. As a result, I am confident that it
handles far more edge cases correctly, in particular relating to the handling of
numbers/percentages.

There are three items for external use:

**`parse_metadata(lines)`** creates a dict of values from the metadata lines.
The old parser handled the very first line of the report separately to extract
the filename. The new parser uses the same more robust parsing code for this
metadata.

**`ParserFlags`** is a flag-enum that describes various boolean features. A single
object with flags seems simpler to handle than multiple variables like
`exclude_throw_branches`.

**`parse_coverage(lines, ...)`** is the main function for parsing the coverage.
It performs multiple phases:

* Each input line is parsed/tokenized into an internal data model. The data
  model is strongly typed. The various classes like `_SourceLine` are
  implemented as NamedTuples, which is both very convenient and very
  memory-efficient.

  Relevant items: `_parse_line()`, `_parse_tag_line()`, data model

* Exclusion markers are extracted from source code lines and arranged into a
  data structure for later lookup.

  Relevant items: `_find_excluded_ranges()`, `_make_is_in_any_range()`

* Parsed lines are processed to populate a `FileCoverage` model. At this stage,
  exclusions are applied. The state space is very small, with only four
  variables that have to be tracked.

  Relevant items: `_ParserState`, `_gather_coverage_from_line()`,
  `_line_noncode_and_count()`, `_function_can_be_excluded()`,
  `_branch_can_be_excluded()`, `_add_coverage_for_function()`

* Warnings are reported, and any potential errors re-raised. This is equivalent
  to the previous parser.

  Relevant items: `_report_lines_with_errors()`

Impact on tests
---------------

The new parser is almost completely bug-compatible with the old parser. This is
e.g. visible in the potentially unintuitive handling of the `noncode` status.
The new separation between low-level parsing and high-level decisions makes it
more clear what is actually going on.

There was a significant change in the **Nautilus parser test**. The input file
contains the following pattern:

    ------------------
        #####:   52:foo() ? bar():

Previously, this line 52 was not reported as uncovered. I consider that to be an
error, and have updated the expected output correspondingly. This could indicate
that the new parser is in fact more robust than the old parser when it comes to
template specialization sections.

In the **excl-branch test**, gcovr will encounter gcov input as the following
when using GCC-8 or later:

        #####:    9:    virtual ~Bar()
        #####:   10:    {} // ...
    ------------------
    Bar::~Bar():
    function Bar::~Bar() called 0 returned 0% blocks executed 0%
        #####:    9:    virtual ~Bar()
        #####:   10:    {} // ...
    call    0 never executed
    call    1 never executed
    ------------------
    Bar::~Bar():
    function Bar::~Bar() called 0 returned 0% blocks executed 0%
        #####:    9:    virtual ~Bar()
        #####:   10:    {} // ...
    ------------------

The old parser associated the `function` annotations with line 11. This was
clearly incorrect. The test reference was updated to associate the destructor
with line 9.

Other than that, the tests were only updated to account for the different parser
APIs.

Internally, the new parser uses a lot of doctests.

Future directions
-----------------

The new parser extracts *all* available data, only to throw it away. It might
now become feasible to make use of more of this data. In particular:

* handling template specialization sections properly
* collecting block-level coverage data
* using the `working directory` metadata field

Conflicts with other development efforts
----------------------------------------

* <gcovr#503> report of excluded coverage

  Makes a small patch to the parser. The same effect can be achieved by adding a
  few lines in `_gather_coverage_from_line()`.

* <gcovr#484> tests with clang-10

  Touches neighboring lines. Will be reported as a merge conflict by Git, but
  there's no semantic conflict.

* <gcovr#474> abstract interface for reader/writer

  Small change in the parser code regarding `sys.exit(1)` (new parser:
  `raise SystemExit(1)`). It's worth noting that this is effectively unreachable
  code. Lines will only be reported if there was an exception, and if there was
  an exception it will be re-thrown.

* <gcovr#361> --no-markers to ignore exclusion markers

  Touches the exclusion handling code. This is of course totally changed by the
  new parser. But the new parser would make it even easier to implement that
  functionality.

* <gcovr#350> decision coverage

  Adds significant new parsing code, but most of it is outside of the
  gcov-parser. These changes could be ported with moderate effort to the new
  parser.
latk added a commit to latk/gcovr that referenced this pull request Sep 10, 2021
Summary: This commit introduces a completely rewritten parser for gcov's
human-readable report format. The implementation pays closer attention to the
upstream file format, e.g. with respect to the exact number formatting. Compared
to the previous parser, the data flow has been greatly clarified, making this
parser more obviously correct. The new parser will make it easier to react to
future changes in the format, and to make use of improvements from newer GCC
versions – such as the `Working directory` metadata.

Background
----------

The old parser had some problems that could be addressed by this rewrite.

* The old parser was very complex due to mixing low level parsing and high-level
  decisions about coverage exclusion.

  Fixed in the new parser: now, staging is split into multiple phases that
  clearly separate concerns. The low-level parsing phase produces a data model
  that is safe to use for the other phases.

* The old parser used an object-oriented design were parsing state was stored in
  instance fields. This created a huge "state space" that was difficult to
  track, potentially leading to bugs. For example, it was not immediately clear
  in the investigation of <gcovr#511> whether
  the problem was just that the parser forgot to update its line number
  correctly.

  Fixed in the new parser: using a more functional/procedural design, the data
  flows are very clear. State is tracked explicitly. By separating parsing
  phases, the state space is much smaller.

* To expand on the previous point: The old parser essentially consisted of
  multiple interleaved state machines. There was one state machine for
  processing coverage data, and an interleaved state machine for tracking active
  coverage exclusion markers. This interleaving made it difficult to understand
  whether the state transitions were correct.

  Fixed in the new parser: coverage exclusion patterns are collected in a
  separate phase, before the actual coverage data is processed.

* The old parser made use of very fragile parsing strategies, such as using
  `str.split()` excessively. This gave rise to fragile assumptions about the
  exact format. For example, the IndexError in
  <gcovr#226> was an example of wrong
  assumptions.

  The new parser uses regular expressions to parse tag lines, and only uses
  `str.split()` for the more structured source code lines. This is more
  self-documenting. Treatment of numerical values was aligned with the routines
  in the gcov source code. Should the format deviate in the future, the regexes
  will fail to match, making it possible to detect and fix the errors. (Until
  then, `--gcov-ignore-parse-errors` can be used).

Design of the new parser
------------------------

The new parser is more complex in the sense that there is a lot more code. But
there is a clearer separation of concerns, and the parser was closely informed
by the gcov documentation and source code. As a result, I am confident that it
handles far more edge cases correctly, in particular relating to the handling of
numbers/percentages.

There are three items for external use:

**`parse_metadata(lines)`** creates a dict of values from the metadata lines.
The old parser handled the very first line of the report separately to extract
the filename. The new parser uses the same more robust parsing code for this
metadata.

**`ParserFlags`** is a flag-enum that describes various boolean features. A single
object with flags seems simpler to handle than multiple variables like
`exclude_throw_branches`.

**`parse_coverage(lines, ...)`** is the main function for parsing the coverage.
It performs multiple phases:

* Each input line is parsed/tokenized into an internal data model. The data
  model is strongly typed. The various classes like `_SourceLine` are
  implemented as NamedTuples, which is both very convenient and very
  memory-efficient.

  Relevant items: `_parse_line()`, `_parse_tag_line()`, data model

* Exclusion markers are extracted from source code lines and arranged into a
  data structure for later lookup.

  Relevant items: `_find_excluded_ranges()`, `_make_is_in_any_range()`

* Parsed lines are processed to populate a `FileCoverage` model. At this stage,
  exclusions are applied. The state space is very small, with only four
  variables that have to be tracked.

  Relevant items: `_ParserState`, `_gather_coverage_from_line()`,
  `_line_noncode_and_count()`, `_function_can_be_excluded()`,
  `_branch_can_be_excluded()`, `_add_coverage_for_function()`

* Warnings are reported, and any potential errors re-raised. This is equivalent
  to the previous parser.

  Relevant items: `_report_lines_with_errors()`

Impact on tests
---------------

The new parser is almost completely bug-compatible with the old parser. This is
e.g. visible in the potentially unintuitive handling of the `noncode` status.
The new separation between low-level parsing and high-level decisions makes it
more clear what is actually going on.

There was a significant change in the **Nautilus parser test**. The input file
contains the following pattern:

    ------------------
        #####:   52:foo() ? bar():

Previously, this line 52 was not reported as uncovered. I consider that to be an
error, and have updated the expected output correspondingly. This could indicate
that the new parser is in fact more robust than the old parser when it comes to
template specialization sections.

In the **excl-branch test**, gcovr will encounter gcov input as the following
when using GCC-8 or later:

        #####:    9:    virtual ~Bar()
        #####:   10:    {} // ...
    ------------------
    Bar::~Bar():
    function Bar::~Bar() called 0 returned 0% blocks executed 0%
        #####:    9:    virtual ~Bar()
        #####:   10:    {} // ...
    call    0 never executed
    call    1 never executed
    ------------------
    Bar::~Bar():
    function Bar::~Bar() called 0 returned 0% blocks executed 0%
        #####:    9:    virtual ~Bar()
        #####:   10:    {} // ...
    ------------------

The old parser associated the `function` annotations with line 11. This was
clearly incorrect. The test reference was updated to associate the destructor
with line 9.

Other than that, the tests were only updated to account for the different parser
APIs.

Internally, the new parser uses a lot of doctests.

Future directions
-----------------

The new parser extracts *all* available data, only to throw it away. It might
now become feasible to make use of more of this data. In particular:

* handling template specialization sections properly
* collecting block-level coverage data
* using the `working directory` metadata field

Conflicts with other development efforts
----------------------------------------

* <gcovr#503> report of excluded coverage

  Makes a small patch to the parser. The same effect can be achieved by adding a
  few lines in `_gather_coverage_from_line()`.

* <gcovr#484> tests with clang-10

  Touches neighboring lines. Will be reported as a merge conflict by Git, but
  there's no semantic conflict.

* <gcovr#474> abstract interface for reader/writer

  Small change in the parser code regarding `sys.exit(1)` (new parser:
  `raise SystemExit(1)`). It's worth noting that this is effectively unreachable
  code. Lines will only be reported if there was an exception, and if there was
  an exception it will be re-thrown.

* <gcovr#361> --no-markers to ignore exclusion markers

  Touches the exclusion handling code. This is of course totally changed by the
  new parser. But the new parser would make it even easier to implement that
  functionality.

* <gcovr#350> decision coverage

  Adds significant new parsing code, but most of it is outside of the
  gcov-parser. These changes could be ported with moderate effort to the new
  parser.
@Spacetown Spacetown merged commit aa7c0a1 into gcovr:master Sep 13, 2021
@Spacetown Spacetown deleted the add_clang branch September 13, 2021 20:30
latk added a commit to latk/gcovr that referenced this pull request Sep 15, 2021
Summary: This commit introduces a completely rewritten parser for gcov's
human-readable report format. The implementation pays closer attention to the
upstream file format, e.g. with respect to the exact number formatting. Compared
to the previous parser, the data flow has been greatly clarified, making this
parser more obviously correct. The new parser will make it easier to react to
future changes in the format, and to make use of improvements from newer GCC
versions – such as the `Working directory` metadata.

Background
----------

The old parser had some problems that could be addressed by this rewrite.

* The old parser was very complex due to mixing low level parsing and high-level
  decisions about coverage exclusion.

  Fixed in the new parser: now, staging is split into multiple phases that
  clearly separate concerns. The low-level parsing phase produces a data model
  that is safe to use for the other phases.

* The old parser used an object-oriented design were parsing state was stored in
  instance fields. This created a huge "state space" that was difficult to
  track, potentially leading to bugs. For example, it was not immediately clear
  in the investigation of <gcovr#511> whether
  the problem was just that the parser forgot to update its line number
  correctly.

  Fixed in the new parser: using a more functional/procedural design, the data
  flows are very clear. State is tracked explicitly. By separating parsing
  phases, the state space is much smaller.

* To expand on the previous point: The old parser essentially consisted of
  multiple interleaved state machines. There was one state machine for
  processing coverage data, and an interleaved state machine for tracking active
  coverage exclusion markers. This interleaving made it difficult to understand
  whether the state transitions were correct.

  Fixed in the new parser: coverage exclusion patterns are collected in a
  separate phase, before the actual coverage data is processed.

* The old parser made use of very fragile parsing strategies, such as using
  `str.split()` excessively. This gave rise to fragile assumptions about the
  exact format. For example, the IndexError in
  <gcovr#226> was an example of wrong
  assumptions.

  The new parser uses regular expressions to parse tag lines, and only uses
  `str.split()` for the more structured source code lines. This is more
  self-documenting. Treatment of numerical values was aligned with the routines
  in the gcov source code. Should the format deviate in the future, the regexes
  will fail to match, making it possible to detect and fix the errors. (Until
  then, `--gcov-ignore-parse-errors` can be used).

Design of the new parser
------------------------

The new parser is more complex in the sense that there is a lot more code. But
there is a clearer separation of concerns, and the parser was closely informed
by the gcov documentation and source code. As a result, I am confident that it
handles far more edge cases correctly, in particular relating to the handling of
numbers/percentages.

There are three items for external use:

**`parse_metadata(lines)`** creates a dict of values from the metadata lines.
The old parser handled the very first line of the report separately to extract
the filename. The new parser uses the same more robust parsing code for this
metadata.

**`ParserFlags`** is a flag-enum that describes various boolean features. A single
object with flags seems simpler to handle than multiple variables like
`exclude_throw_branches`.

**`parse_coverage(lines, ...)`** is the main function for parsing the coverage.
It performs multiple phases:

* Each input line is parsed/tokenized into an internal data model. The data
  model is strongly typed. The various classes like `_SourceLine` are
  implemented as NamedTuples, which is both very convenient and very
  memory-efficient.

  Relevant items: `_parse_line()`, `_parse_tag_line()`, data model

* Exclusion markers are extracted from source code lines and arranged into a
  data structure for later lookup.

  Relevant items: `_find_excluded_ranges()`, `_make_is_in_any_range()`

* Parsed lines are processed to populate a `FileCoverage` model. At this stage,
  exclusions are applied. The state space is very small, with only four
  variables that have to be tracked.

  Relevant items: `_ParserState`, `_gather_coverage_from_line()`,
  `_line_noncode_and_count()`, `_function_can_be_excluded()`,
  `_branch_can_be_excluded()`, `_add_coverage_for_function()`

* Warnings are reported, and any potential errors re-raised. This is equivalent
  to the previous parser.

  Relevant items: `_report_lines_with_errors()`

Impact on tests
---------------

The new parser is almost completely bug-compatible with the old parser. This is
e.g. visible in the potentially unintuitive handling of the `noncode` status.
The new separation between low-level parsing and high-level decisions makes it
more clear what is actually going on.

There was a significant change in the **Nautilus parser test**. The input file
contains the following pattern:

    ------------------
        #####:   52:foo() ? bar():

Previously, this line 52 was not reported as uncovered. I consider that to be an
error, and have updated the expected output correspondingly. This could indicate
that the new parser is in fact more robust than the old parser when it comes to
template specialization sections.

In the **excl-branch test**, gcovr will encounter gcov input as the following
when using GCC-8 or later:

        #####:    9:    virtual ~Bar()
        #####:   10:    {} // ...
    ------------------
    Bar::~Bar():
    function Bar::~Bar() called 0 returned 0% blocks executed 0%
        #####:    9:    virtual ~Bar()
        #####:   10:    {} // ...
    call    0 never executed
    call    1 never executed
    ------------------
    Bar::~Bar():
    function Bar::~Bar() called 0 returned 0% blocks executed 0%
        #####:    9:    virtual ~Bar()
        #####:   10:    {} // ...
    ------------------

The old parser associated the `function` annotations with line 11. This was
clearly incorrect. The test reference was updated to associate the destructor
with line 9.

Other than that, the tests were only updated to account for the different parser
APIs.

Internally, the new parser uses a lot of doctests.

Future directions
-----------------

The new parser extracts *all* available data, only to throw it away. It might
now become feasible to make use of more of this data. In particular:

* handling template specialization sections properly
* collecting block-level coverage data
* using the `working directory` metadata field

Conflicts with other development efforts
----------------------------------------

* <gcovr#503> report of excluded coverage

  Makes a small patch to the parser. The same effect can be achieved by adding a
  few lines in `_gather_coverage_from_line()`.

* <gcovr#484> tests with clang-10

  Touches neighboring lines. Will be reported as a merge conflict by Git, but
  there's no semantic conflict.

* <gcovr#474> abstract interface for reader/writer

  Small change in the parser code regarding `sys.exit(1)` (new parser:
  `raise SystemExit(1)`). It's worth noting that this is effectively unreachable
  code. Lines will only be reported if there was an exception, and if there was
  an exception it will be re-thrown.

* <gcovr#361> --no-markers to ignore exclusion markers

  Touches the exclusion handling code. This is of course totally changed by the
  new parser. But the new parser would make it even easier to implement that
  functionality.

* <gcovr#350> decision coverage

  Adds significant new parsing code, but most of it is outside of the
  gcov-parser. These changes could be ported with moderate effort to the new
  parser.
@Spacetown Spacetown mentioned this pull request Sep 15, 2021
@Spacetown Spacetown added QA related to testing, build infrastructure, etc Internal change Internal change not visible to user and removed needs review labels Dec 8, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Internal change Internal change not visible to user QA related to testing, build infrastructure, etc
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants
0