test-result produces incorrect counts #44

alsora · 2021-11-20T11:24:20Z

Hi, I'm running colcon test-result in my workspace but it looks like tests are counted multiple times.

$ colcon build
$ colcon test

then

$ colcon test-result --test-result-base ./_ws/build
_ws/build/hello_world/Testing/20211120-1117/Test.xml: 5 tests, 0 errors, 1 failure, 0 skipped
_ws/build/hello_world/test_results/hello_world/test_hello.gtest.xml: 1 test, 0 errors, 1 failure, 0 skipped

Summary: 19 tests, 0 errors, 2 failures, 0 skipped

My workspace only has 5 tests and only 1 of them is failing.

Other invocations show the same totals:

$ colcon test-result --test-result-base ./_ws/build --all
_ws/build/hello_world/Testing/20211120-1117/Test.xml: 5 tests, 0 errors, 1 failure, 0 skipped
_ws/build/hello_world/test_results/hello_world/cppcheck.xunit.xml: 4 tests, 0 errors, 0 failures, 0 skipped
_ws/build/hello_world/test_results/hello_world/cpplint.xunit.xml: 4 tests, 0 errors, 0 failures, 0 skipped
_ws/build/hello_world/test_results/hello_world/lint_cmake.xunit.xml: 1 test, 0 errors, 0 failures, 0 skipped
_ws/build/hello_world/test_results/hello_world/test_hello.gtest.xml: 1 test, 0 errors, 1 failure, 0 skipped
_ws/build/hello_world/test_results/hello_world/uncrustify.xunit.xml: 4 tests, 0 errors, 0 failures, 0 skipped

Summary: 19 tests, 0 errors, 2 failures, 0 skipped

If I inspect the test logs, however, CTest reports only 5 tests and 1 failure:

$ cat _ws/log/latest_test/hello_world/stdout.log
.....
The following tests passed:
	cppcheck
	uncrustify
	cpplint
	lint_cmake

80% tests passed, 1 tests failed out of 5

Label Time Summary:
cppcheck      =   0.50 sec*proc (1 test)
cpplint       =   0.46 sec*proc (1 test)
gtest         =   0.23 sec*proc (1 test)
lint_cmake    =   0.42 sec*proc (1 test)
linter        =   1.80 sec*proc (4 tests)
uncrustify    =   0.42 sec*proc (1 test)

Total Test time (real) =   2.04 sec

The following tests FAILED:
	  5 - test_hello (Failed)

cottsay · 2021-12-09T23:07:15Z

ament orchestrates tests by registering each one as a CTest test and then making a single ctest invocation. Both CTest and gtest (and many other testing suites) produce result files. The Test.xml in the list was produced by CTest, while the individual .gtest.xml and .xunit.xml files were produced by the individual testing suites, which can be more granular about what constitutes a single test. For example, gtest considers each test function within the file to be a separate test, and ament_uncrustify considers each C++ file to be a separate test.

We want the individual test results to give us more information about what went wrong, such as the name of the gtest function that failed. This is pretty important when looking for trends and flaky tests when you're dealing with a large codebase. We want the top-level CTest result because there may be tests which don't produce their own result file, and we want colcon test-result to notice if they failed.

So the CTest results often (but not always) contain a "duplicate" instance, and individual suites often break what CTest considers to be a single test into multiple instances.

You're right, colcon test-result is certainly over-reporting the raw count here. Given that we don't have a rigid definition of what constitutes a single test "instance", I'm not sure how that count could be reliably used anyway.
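To make the over-reporting concrete, here is a small arithmetic sketch. The per-file counts are copied from the `--all` output earlier in this thread; the `summarize` helper and the dictionary layout are only illustrative and are not how colcon test-result is actually implemented.

```python
# Per-file (tests, failures) counts copied from the
# `colcon test-result --all` output earlier in this thread.
CTEST_FILE = "Testing/20211120-1117/Test.xml"  # written by CTest itself

reported = {
    CTEST_FILE: (5, 1),
    "cppcheck.xunit.xml": (4, 0),
    "cpplint.xunit.xml": (4, 0),
    "lint_cmake.xunit.xml": (1, 0),
    "test_hello.gtest.xml": (1, 1),
    "uncrustify.xunit.xml": (4, 0),
}


def summarize(counts):
    """Sum tests and failures over all result files (illustrative helper)."""
    tests = sum(t for t, _ in counts.values())
    failures = sum(f for _, f in counts.values())
    return tests, failures


# Summing every result file reproduces the printed summary line.
total = summarize(reported)
# Looking at the CTest file alone matches what stdout.log shows.
ctest_only = summarize({CTEST_FILE: reported[CTEST_FILE]})

print(total)       # (19, 2)
print(ctest_only)  # (5, 1)
```

The 19 is simply 5 + 4 + 4 + 1 + 1 + 4, and the second failure is the same test_hello failure seen once by CTest and once by gtest.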

alsora · 2021-12-09T23:17:02Z

Thanks for the explanation!

I agree that how to count the tests can be debated, but double counting them shouldn't be an option.
The utility is reporting 2 failures but there is only 1 and it's a gtest with a single test case inside.
I don't think there should be much room for interpretation here =)

cottsay · 2021-12-09T23:22:04Z

> The utility is reporting 2 failures but there is only 1 and it's a gtest with a single test case inside.

I'm not sure how to fix this. I'm not aware of any mechanism to tell CTest not to report a result because the invoked process does its own reporting, and I don't see any reliable way for colcon test-result to work out which CTest invocation produced which result file so that the counts could be adjusted.

If you see something I don't, please consider providing a PR.

alsora · 2021-12-09T23:38:48Z

OK, I'll have a look and see if I come up with any ideas.
Honestly, what I was looking for was just a way to condense the various

80% tests passed, 1 tests failed out of 5

lines from the individual packages' stdout.log files.
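As a possible workaround, the CTest summary lines could be collected with a short script. This is only a sketch: it assumes each package writes its log to `<log root>/<package>/stdout.log` (as with `_ws/log/latest_test/hello_world/stdout.log` above) and that the summary line has exactly the format shown in this thread. The names `parse_summary` and `condense` are made up for this example and are not part of colcon.

```python
import re
from pathlib import Path

# Matches CTest's summary line, e.g.
# "80% tests passed, 1 tests failed out of 5"
SUMMARY_RE = re.compile(r"(\d+)% tests passed, (\d+) tests failed out of (\d+)")


def parse_summary(text):
    """Return (failed, total) from the last CTest summary line, or None."""
    matches = SUMMARY_RE.findall(text)
    if not matches:
        return None
    _, failed, total = matches[-1]
    return int(failed), int(total)


def condense(log_root):
    """Map package name -> (failed, total) for each <log_root>/<pkg>/stdout.log."""
    results = {}
    for log in sorted(Path(log_root).glob("*/stdout.log")):
        summary = parse_summary(log.read_text(errors="replace"))
        if summary is not None:  # skip packages without a CTest summary
            results[log.parent.name] = summary
    return results


def report(results):
    """Format one line per package plus a workspace-wide total."""
    lines = [f"{pkg}: {failed} failed out of {total}"
             for pkg, (failed, total) in results.items()]
    failed_sum = sum(f for f, _ in results.values())
    total_sum = sum(t for _, t in results.values())
    lines.append(f"Total: {failed_sum} failed out of {total_sum}")
    return "\n".join(lines)
```

Calling `print(report(condense("_ws/log/latest_test")))` would then print one line per package followed by the workspace total. Packages whose log has no summary line are silently skipped, which may or may not be what you want.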
