replace test XML diffing: yaxmldiff instead of pyutilib #495

latk · 2021-06-13T15:10:59Z

Pyutilib is a wonderful collection of Python helpers developed by the same people who created gcovr. But as gcovr is converging towards more standard Python practices, we've only been using it for XML diffing.

It turns out that XML diffing is a very popular problem, but most libraries do something that's too clever for our tests or don't produce readable output.

The lxml.doctestcompare module is very good but annotates differences inline, e.g. produces this when comparing <a/> with <b/>:

Expected:
  <a></a>

Got:
  <b></b>

Diff:
  <a (got: b)></a (got: b)>

In large documents, this can be hard to spot.

There are additional issues, such as special <any/> tags, any="" attributes, and ... ellipsis that are useful for doctests, but not useful for comparing two actual XML documents.

Another tool is xmldiff, which is also very good, except that it outputs the diff as edit instructions that are more machine-readable than human-accessible. For the same problem, it will output:

[rename, /a[1], b]

So because I'm dissatisfied with available tools, I've written Yet Another XML Differ (yaxmldiff). It uses lxml and tries to approximate a unified diff:

- <a/>
+ <b/>

On larger problems, it will try to collapse irrelevant sub-trees and attributes. For example, here is a diff between the gcc-5 and gcc-8 reference files for the simple1 test:

  <coverage branch-rate="0.5" complexity="0.0"
-   branches-covered="4"
+   branches-covered="2"
-   branches-valid="8"
+   branches-valid="4"
-   line-rate="0.8"
+   line-rate="0.7777777777777778"
-   lines-covered="8"
+   lines-covered="7"
-   lines-valid="10"
+   lines-valid="9"
-   timestamp=""
+   timestamp="1601411485"
-   version=""
+   version="gcovr 5.0"
  >
    <sources>
    ...
    </sources>
    <packages>
      <package branch-rate="0.5" complexity="0.0" name=""
-       line-rate="0.8"
+       line-rate="0.7777777777777778"
      >
        <classes>
          <class branch-rate="0.5" complexity="0.0" filename="..." name="..."
-           line-rate="0.8"
+           line-rate="0.7777777777777778"
          >
            <methods/>
            <lines>
              <line .../>
              <line ...>
              ...
              </line>
              <line .../>
              <line .../>
              <line .../>
              <line branch="..." hits="1"
-               number="17"
+               number="14"
              />
              <line ...>
              ...
              </line>
              <line branch="..." hits="0"
-               number="22"
+               number="19"
              />
              <line .../>
-             <line ...>...</line>
            </lines>
          </class>
        </classes>
      </package>
    </packages>
  </coverage>
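
To make the attribute collapsing concrete, here is a minimal sketch using only the standard library. It is not yaxmldiff's implementation: `diff_attributes` is a hypothetical helper that looks at a single element, keeps unchanged attributes on the opening line as context, and emits `-`/`+` lines only for attributes that differ. It ignores child elements and does no attribute escaping.

```python
import xml.etree.ElementTree as ET


def diff_attributes(left_xml: str, right_xml: str) -> str:
    """Hypothetical helper: diff the attributes of two root elements."""
    left = ET.fromstring(left_xml)
    right = ET.fromstring(right_xml)
    if left.tag != right.tag:
        return f"- <{left.tag}/>\n+ <{right.tag}/>\n"

    names = sorted(set(left.attrib) | set(right.attrib))
    same = [n for n in names if left.get(n) == right.get(n)]
    changed = [n for n in names if left.get(n) != right.get(n)]
    if not changed:
        return ""  # nothing to report

    # Unchanged attributes stay on the opening line as context.
    head = " ".join([left.tag] + [f'{n}="{left.get(n)}"' for n in same])
    lines = [f"  <{head}"]
    for n in changed:
        # An attribute missing on one side yields only a "-" or a "+" line.
        if n in left.attrib:
            lines.append(f'-   {n}="{left.get(n)}"')
        if n in right.attrib:
            lines.append(f'+   {n}="{right.get(n)}"')
    lines.append("  />")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(diff_attributes(
        '<line branch="false" hits="1" number="17"/>',
        '<line branch="false" hits="1" number="14"/>',
    ))
```

Listing each changed attribute on its own line is what keeps the diff readable when an element carries many attributes, as the `<coverage>` root does.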

This tool did spot a problem in the existing reference files. In the GCC-5 reference for the "shadow" test, Pyutilib considered the following two attributes equivalent, likely because it tries to compare things that look like numbers as numbers:

condition-coverage="50% (1/2)"
condition-coverage="50% (2/4)"

The reference file was updated manually.

This also hints at the drawback of yaxmldiff: it treats everything as text, and cannot apply rounding like the Pyutilib functions.
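
One possible workaround, sketched here as an assumption rather than something this PR does, is to normalise decimal attributes before handing both documents to a text-based differ. `round_float_attributes` is a hypothetical helper built on the standard library. Values that are not plain decimals, such as `50% (1/2)`, are left alone and therefore still compare as different text.

```python
import re
import xml.etree.ElementTree as ET

# Matches attribute values that are plain decimal numbers such as "0.8".
_DECIMAL = re.compile(r"-?\d+\.\d+")


def round_float_attributes(xml_text: str, digits: int = 3) -> str:
    """Hypothetical pre-processing step: round decimal attribute values."""
    root = ET.fromstring(xml_text)
    for element in root.iter():
        # Copy the items so the attributes can be updated while looping.
        for name, value in list(element.attrib.items()):
            if _DECIMAL.fullmatch(value):
                element.set(name, f"{float(value):.{digits}f}")
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    a = round_float_attributes('<coverage line-rate="0.7777777777777778"/>')
    b = round_float_attributes('<coverage line-rate="0.77777778"/>')
    print(a == b)  # the rounded documents are textually identical
```

The rounding is explicit and happens before the comparison, so a strict text diff afterwards cannot silently treat two different strings as equal.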

Links:

yaxmldiff on PyPI: https://pypi.org/project/yaxmldiff/
yaxmldiff on GitHub: https://github.com/latk/yaxmldiff.py

This PR contains two commits: one for the actual change, one to re-format the test_gcovr.py file. The reformatting will be redundant once #493 is merged.

codecov · 2021-06-13T15:19:30Z

Codecov Report

Merging #495 (5d69cfb) into master (07d419e) will increase coverage by 0.30%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##           master     #495      +/-   ##
==========================================
+ Coverage   95.40%   95.71%   +0.30%     
==========================================
  Files          20       20              
  Lines        2481     2472       -9     
  Branches      428      424       -4     
==========================================
- Hits         2367     2366       -1     
+ Misses         50       46       -4     
+ Partials       64       60       -4

Flag	Coverage Δ
ubuntu-18.04	`95.10% <100.00%> (+0.30%)`	⬆️
windows-2019	`95.26% <100.00%> (+0.30%)`	⬆️

Impacted Files	Coverage Δ
gcovr/tests/test_gcovr.py	`98.01% <100.00%> (+4.88%)`	⬆️

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

Spacetown

LGTM. The suggestions are because this part shall never be reached in the CI.

The driver for the end-to-end integration tests contains some checks that can't fire in a successful run, so ignore their coverage to massage our codecov figures.

Co-authored-by: Michael Förderer <40258682+Spacetown@users.noreply.github.com>
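
As an illustration of excluding such a check from coverage, the sketch below uses coverage.py's default exclusion comment, `# pragma: no cover`. Whether the commit uses exactly this mechanism is an assumption, and `assert_equals` is a hypothetical stand-in for the real test driver code.

```python
def assert_equals(reference: str, actual: str) -> None:
    """Hypothetical test-driver check that only fails on a broken run."""
    if reference == actual:
        return
    # Unreachable while the test suite passes, so exclude it from coverage.
    raise AssertionError(f"expected {reference!r}, got {actual!r}")  # pragma: no cover


if __name__ == "__main__":
    assert_equals("<a/>", "<a/>")  # passes silently
```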

latk added the Type: Enhancement, QA, and dependencies labels Jun 13, 2021

latk added this to the UpcomingRelease milestone Jun 13, 2021

latk force-pushed the xml-testing branch from c8f7c63 to c9920c4 Compare June 13, 2021 15:16

Spacetown approved these changes Jun 13, 2021

latk force-pushed the xml-testing branch from d611a97 to 48e57d4 Compare June 13, 2021 21:02

latk and others added 3 commits June 13, 2021 23:10

sort requirements.txt

5d69cfb

latk force-pushed the xml-testing branch from 5883379 to 5d69cfb Compare June 13, 2021 21:12

latk merged commit 84143ba into gcovr:master Jun 13, 2021

latk deleted the xml-testing branch June 13, 2021 21:21

Spacetown added the Internal change label Dec 8, 2021
