V2 #1300

Open: wants to merge 338 commits into base `v2`.


yvettep321

Thank you

LebedevRI and others added 30 commits September 19, 2018 15:59
My knowledge of Python is not great, so this is kinda horrible.

Two things:
1. If there were repetitions, for the RHS (i.e. the new value) we were always using the first repetition,
    which naturally results in incorrect change reports for the second and subsequent repetitions.
    Even worse, that completely broke the U test. :(
2. Proper support for differing repetition counts in the U test was missing.
    It's important if we are to be able to report 'iteration as repetition',
    since it is rather likely that the iteration counts will mismatch.

Now, the rough idea of how this is implemented. I think this is the right solution.
1. Get all benchmark names (in order) from the lhs benchmark.
2. While preserving the order, keep the unique names.
3. Get all benchmark names (in order) from the rhs benchmark.
4. While preserving the order, keep the unique names.
5. Intersect `2.` and `4.`, get the list of unique benchmark names that exist on both sides.
6. Now, we want to group (partition) all the benchmarks with the same name.
   ```
   BM_FOO:
       [lhs]: BM_FOO/repetition0 BM_FOO/repetition1
       [rhs]: BM_FOO/repetition0 BM_FOO/repetition1 BM_FOO/repetition2
   ...
   ```
   We also drop mismatches in `time_unit` here.
   _(whose bright idea was it to store arbitrarily scaled timers in JSON **?!** )_
7. Iterate over each partition:
7.1. Conditionally, diff the overlapping repetitions (the repetition counts may differ).
7.2. Conditionally, do the U test:
7.2.1. Get **all** the values of the `"real_time"` field from the lhs benchmark.
7.2.2. Get **all** the values of the `"cpu_time"` field from the lhs benchmark.
7.2.3. Get **all** the values of the `"real_time"` field from the rhs benchmark.
7.2.4. Get **all** the values of the `"cpu_time"` field from the rhs benchmark.
          NOTE: the repetition count may differ, but we want *all* the values!
7.2.5. Do the rest of the U test computation.
7.2.6. Print the U test results.
8. ???
9. **PROFIT**!

Fixes google#677
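The partitioning in steps 1 to 6, together with the "collect *all* values" part of step 7.2, can be sketched as follows. This is a C++ illustration of the idea only, not the Python code in the tooling. The `Run` and `Partition` structs and the `UniqueInOrder`, `Select`, `PartitionRuns` and `CollectValues` helpers are hypothetical; run names are assumed to be identical across repetitions, and taking the reference `time_unit` from the first lhs entry is an assumption about how mismatches get dropped.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical, simplified view of one benchmark entry from the JSON output.
struct Run {
  std::string name;       // same for every repetition of a benchmark
  std::string time_unit;  // e.g. "ns"
  double real_time;
  double cpu_time;
};

// Steps 1-4: benchmark names in first-seen order, each kept once.
std::vector<std::string> UniqueInOrder(const std::vector<Run>& runs) {
  std::vector<std::string> names;
  for (const Run& r : runs) {
    if (std::find(names.begin(), names.end(), r.name) == names.end()) {
      names.push_back(r.name);
    }
  }
  return names;
}

struct Partition {
  std::string name;
  std::vector<Run> lhs;  // all lhs repetitions of `name`
  std::vector<Run> rhs;  // all rhs repetitions; the count may differ
};

// Keeps the runs called `name` whose time unit equals `unit`.
std::vector<Run> Select(const std::vector<Run>& runs, const std::string& name,
                        const std::string& unit) {
  std::vector<Run> out;
  for (const Run& r : runs) {
    if (r.name == name && r.time_unit == unit) out.push_back(r);
  }
  return out;
}

// Steps 5-6: intersect the unique names (keeping lhs order) and group all
// repetitions by name. The first lhs entry's time unit is the reference;
// entries with any other time unit are dropped on both sides.
std::vector<Partition> PartitionRuns(const std::vector<Run>& lhs,
                                     const std::vector<Run>& rhs) {
  const std::vector<std::string> rhs_names = UniqueInOrder(rhs);
  std::vector<Partition> partitions;
  for (const std::string& name : UniqueInOrder(lhs)) {
    if (std::find(rhs_names.begin(), rhs_names.end(), name) == rhs_names.end()) {
      continue;  // present only on the lhs
    }
    const auto first = std::find_if(lhs.begin(), lhs.end(),
                                    [&](const Run& r) { return r.name == name; });
    const std::string unit = first->time_unit;
    partitions.push_back({name, Select(lhs, name, unit), Select(rhs, name, unit)});
  }
  return partitions;
}

// Steps 7.2.1-7.2.4: every value of one field, from every repetition.
std::vector<double> CollectValues(const std::vector<Run>& runs,
                                  double Run::*field) {
  std::vector<double> values;
  for (const Run& r : runs) values.push_back(r.*field);
  return values;
}
```

With two lhs and three rhs repetitions of the same benchmark, `PartitionRuns` yields a single partition, and `CollectValues(p.rhs, &Run::cpu_time)` returns all three rhs CPU times, so the U test sees unequal sample sizes rather than only the overlapping repetitions.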
The State constructor should not be part of the public API. Adding a
utility method to BenchmarkInstance allows us to avoid leaking the
RunInThread method into the public API.
OK, so, I'm still trying to get to the point where it will be a trivial change to report all the separate iterations.
The old code (LHS of the diff) was rather convoluted, I'd say.
I have tried to refactor it a bit into *small* logical chunks, with proper comments.
As far as I can tell, I preserved the intent of the code, i.e. what it was doing before.
The road forward still isn't clear, but I'm quite sure it's not with the old code :)
For several versions now, CMake has by default referred to macOS's Clang as `AppleClang` instead of just `Clang`, which makes a `STREQUAL` comparison fail. Fixed by changing it to `MATCHES`.
As previously written, `--benchmark_color=auto` was treated as true,
because `IsTruthyFlagValue("auto")` returned true. The fix is to
rely on the `IsColorTerminal` test only if the flag value is "auto",
and fall back to `IsTruthyFlagValue` otherwise. I also integrated
the `force_no_color` check into the same block.
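The decision order described above can be sketched as a small function. `ShouldUseColor` and `IsTruthy` are hypothetical stand-ins for the library's internal logic: `IsTruthy` is a deliberately simplified, case-sensitive version of `IsTruthyFlagValue`, and `is_color_terminal` replaces the real `IsColorTerminal()` check so the sketch has no terminal dependency.

```cpp
#include <string>

// Hypothetical, simplified stand-in for IsTruthyFlagValue(): anything other
// than a few "false-like" spellings counts as true, so "auto" would be true.
bool IsTruthy(const std::string& value) {
  return !(value == "0" || value == "false" || value == "no" || value == "off");
}

// force_no_color wins; "auto" defers to the terminal; every other value is
// parsed as a boolean. Checking "auto" first is what fixes the bug.
bool ShouldUseColor(const std::string& flag, bool is_color_terminal,
                    bool force_no_color) {
  if (force_no_color) return false;
  if (flag == "auto") return is_color_terminal;
  return IsTruthy(flag);
}
```

The key design point is ordering: because a generic truthiness check accepts "auto", the special value has to be recognized before falling back to boolean parsing.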
* Fix SOURCE_DIR in HandleGTest.cmake

If benchmark is added as a CMake subproject, HandleGTest throws an error, as  does return absolute source dir.
Change it to , so that it refers to its own source dir.

Also see PR google#703.
…le#707)

That is the real purpose of that bool. A follow-up change will
make it consider something other than repetitions.
google#708)

It is better to let `RunBenchmarks()`'s `report()` decide
whether to actually output *only* aggregates or not,
depending on whether there actually are aggregates.

It's subtle indeed.

Previously, `BenchmarkRunner()` always said that "if there are no repetitions,
then you should never output only the aggregates". And `report()` simply assumed
that the `report_aggregates_only` bool it received made sense, and used it as-is.

Now, the logic is the same, but the blame has shifted.
`BenchmarkRunner()` always propagates what those benchmarks would have wanted
to happen with respect to the aggregates. And the `report()` lambda has to consider
both the `report_aggregates_only` bool and whether it is meaningful.

To put it in the context of the patch series: if the repetition count was `1`,
but `*_report_aggregates_only` was set to `true`, and we capture each iteration separately,
then we would compute the aggregates, but then output everything, both the iterations
and the aggregates, despite `*_report_aggregates_only` being set to `true`.
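A minimal sketch of where the decision now lives: the runner passes the requested flag through unchanged, and the reporting step honors it only when aggregates actually exist. `RunResults` and `ReportedRuns` are hypothetical names, and reports are modeled as plain strings for brevity.

```cpp
#include <string>
#include <vector>

// Hypothetical container for what one benchmark produced.
struct RunResults {
  std::vector<std::string> non_aggregates;  // per-repetition or per-iteration reports
  std::vector<std::string> aggregates;      // e.g. mean, median, stddev
  bool report_aggregates_only = false;      // what the user asked for, unmodified
};

// The flag only suppresses the individual runs when there is something to
// show instead of them; with no aggregates, everything is reported.
std::vector<std::string> ReportedRuns(const RunResults& results) {
  const bool aggregates_only =
      results.report_aggregates_only && !results.aggregates.empty();
  std::vector<std::string> out;
  if (!aggregates_only) {
    out = results.non_aggregates;
  }
  out.insert(out.end(), results.aggregates.begin(), results.aggregates.end());
  return out;
}
```

In the scenario above (one repetition, iterations captured separately, aggregates computed), this logic outputs only the aggregates, as the flag requested.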
It is incorrect to say that an aggregate is computed over
a run's iterations, because those iterations were already averaged.
Similarly, if there are N repetitions with 1 iteration each,
an aggregate will be computed over N measurements, not 1.
Thus it is best to simply use the count of separate reports.

Fixes google#586.
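A sketch of the counting rule: each report already averaged its own iterations, so an aggregate such as the mean has one sample per report. The `Report` and `Aggregate` structs and `MeanOverReports` are hypothetical, not the library's statistics code.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical per-repetition report: `iterations` were already averaged
// into `seconds_per_iteration`.
struct Report {
  long iterations;
  double seconds_per_iteration;
};

struct Aggregate {
  double mean;
  std::size_t sample_count;  // number of reports, not the sum of iterations
};

Aggregate MeanOverReports(const std::vector<Report>& reports) {
  double sum = 0.0;
  for (const Report& r : reports) sum += r.seconds_per_iteration;
  const std::size_t n = reports.size();  // N reports -> N measurements
  return {n == 0 ? 0.0 : sum / static_cast<double>(n), n};
}
```

Note that the `iterations` field never enters the computation: three reports give a sample count of 3 whether they ran 1 or 5000 iterations each.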
On s390, the processor field has a different line structure,
so it has to be parsed differently.
This is the copy of patch proposed to LLVM's copy of benchmark via
https://reviews.llvm.org/D52998.
Used that example as a snippet, and it took a moment to notice
what needed to be changed to make it compile.
std::tmpnam is deprecated and its use is discouraged. For our purposes
in the tests, we really just need a file name which is unlikely to
exist.

This patch converts the tests to use a dummy random file name
generator, which should hopefully avoid name conflicts.
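A sketch of such a generator. The `GetRandomFileName` and `GetTempFileName` names, the `test.` prefix, the 12-character suffix and the three retries are all illustrative assumptions. Like `std::tmpnam`, checking for an existing file and using the name later is still racy; the random suffix only makes collisions unlikely.

```cpp
#include <cstddef>
#include <fstream>
#include <random>
#include <string>

// Hypothetical helper: "test." followed by 12 random lowercase letters/digits.
std::string GetRandomFileName() {
  static const char kChars[] = "abcdefghijklmnopqrstuvwxyz0123456789";
  static std::mt19937 gen{std::random_device{}()};
  // sizeof(kChars) includes the trailing '\0', which must not be picked.
  std::uniform_int_distribution<std::size_t> pick(0, sizeof(kChars) - 2);
  std::string name = "test.";
  for (int i = 0; i < 12; ++i) name += kChars[pick(gen)];
  return name;
}

// Retries a few times if a file with that name can already be opened.
// Another process could still create the file after this check.
std::string GetTempFileName() {
  for (int attempt = 0; attempt < 3; ++attempt) {
    std::string name = GetRandomFileName();
    if (!std::ifstream(name).good()) return name;
  }
  return "";  // callers should treat an empty name as failure
}
```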
)

Unit-tests fail to build due to the following errors:

```
/home/cfx/Dev/google-benchmark/benchmark.git/test/string_util_gtest.cc:12:5: required from here
/home/cfx/Applications/googletest-1.8.1/include/gtest/gtest.h:1444:11: error: comparison between signed and unsigned integer expressions [-Werror=sign-compare]
   if (lhs == rhs) {
       ~~~~^~~~~~
```

Fixes google#741
* Adding Host Name and test

* Addressing Review Comments

* Adding Test for JSON Reporter

* Adding HOST_NAME_MAX for MacOS systems

* Adding Explanation for MacOS HOST_NAME_MAX Addition

* Addressing Peer Review Comments

* Adding codecvt in windows header guard

* Changing name SystemInfo and adding empty message in case host name fetch fails

* Adding Comment on Struct SystemInfo
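A POSIX-only sketch of fetching the host name with an empty-string fallback on failure. `GetHostName` is a hypothetical name, the 64-byte fallback used when `HOST_NAME_MAX` is not defined (as on macOS) is an assumption, and a Windows build would need a different API.

```cpp
#include <climits>
#include <string>
#include <unistd.h>  // gethostname (POSIX)

#ifndef HOST_NAME_MAX
// Assumption: macOS does not define HOST_NAME_MAX, so use 64 as a fallback.
#define HOST_NAME_MAX 64
#endif

// Returns the host name, or an empty string if it cannot be fetched.
std::string GetHostName() {
  char buf[HOST_NAME_MAX + 1];
  if (gethostname(buf, sizeof(buf)) != 0) return "";
  buf[HOST_NAME_MAX] = '\0';  // POSIX does not guarantee termination on truncation
  return std::string(buf);
}
```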
As pointed out in IRC, these are not documented.
Some benchmarks are particularly sensitive and they run in less than
a nanosecond. In order for the console reporter to provide meaningful
output for such benchmarks it needs to be able to display the times
using more resolution than a single nanosecond.

This patch changes the console reporter to print at least three
significant digits for all results.

Unlike the initial attempt, this patch does not align the decimal point.
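One possible way to print at least three significant digits without aligning the decimal point is to pick the number of decimals from the value's magnitude. `FormatSignificant` is a hypothetical helper for typical (non-negative, moderately sized) benchmark times, not the console reporter's actual code.

```cpp
#include <cmath>
#include <cstdio>
#include <string>

// Formats a non-negative time with at least three significant digits:
// 123456 -> "123456", 12.3456 -> "12.3", 0.123456 -> "0.123".
std::string FormatSignificant(double value) {
  int decimals = 0;
  if (value > 0.0 && value < 100.0) {
    // Digits before the decimal point (zero or negative for values below 1).
    const int int_digits = static_cast<int>(std::floor(std::log10(value))) + 1;
    decimals = 3 - int_digits;
  }
  char buf[64];
  std::snprintf(buf, sizeof(buf), "%.*f", decimals, value);
  return buf;
}
```

Values of 100 and above already have three integer digits, so they are printed without decimals; smaller values get just enough decimals to reach three significant digits.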
dominichamon and others added 28 commits May 30, 2021 09:58
* Fix argument order in StrSplit

* Update AUTHORS, CONTRIBUTORS
…erve()

It takes the total new capacity, not the increment.
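A short sketch of the pitfall: `std::vector::reserve()` takes the total capacity wanted, so appending `n` more elements needs `reserve(size() + n)`. `AppendAll` is a hypothetical helper.

```cpp
#include <vector>

// Appends `extra` to `dst`. reserve() takes the *total* capacity, so the
// argument is the final size, not the number of elements being added.
// dst.reserve(extra.size()) would do nothing whenever dst's capacity is
// already at least extra.size(), leaving the insert to reallocate.
template <typename T>
void AppendAll(std::vector<T>& dst, const std::vector<T>& extra) {
  dst.reserve(dst.size() + extra.size());
  dst.insert(dst.end(), extra.begin(), extra.end());
}
```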
It may be useful for those wishing to further post-process JSON results,
but it is mainly geared towards better support for run interleaving,
where results from the same family may not be close-by in the JSON.

While we won't be able to do much about that for outputs,
the tools can, and perhaps should, reorder the results so that
at least in their output they are in proper order, not run order.

Note that this only counts the families that were filtered-in,
so if e.g. there were three families, and we filtered-out
the second one, the two families (which were first and third)
will have family indexes 0 and 1.
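A sketch of the numbering rule: indexes are assigned after filtering, so they stay dense over the families that actually run. `FamilyIndexes` is a hypothetical helper, and modeling the filter with `std::regex_search` is an assumption for illustration.

```cpp
#include <map>
#include <regex>
#include <string>
#include <vector>

// Assigns consecutive indexes, in registration order, to the families that
// match `filter`. Filtered-out families get no index at all.
std::map<std::string, int> FamilyIndexes(const std::vector<std::string>& families,
                                         const std::string& filter) {
  const std::regex re(filter);
  std::map<std::string, int> indexes;
  int next = 0;
  for (const std::string& family : families) {
    if (std::regex_search(family, re)) indexes[family] = next++;
  }
  return indexes;
}
```

With families `BM_A`, `BM_B`, `BM_C` and a filter that drops `BM_B`, the first and third families receive indexes 0 and 1, matching the example above.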
Much like it makes sense to enumerate all the families,
it makes sense to enumerate stuff within families.
Alternatively, we could have a global instance index,
but I'm not sure why that would be better.

This will be useful when the benchmarks are run not in order,
for the tools to sort the results properly.
While the current variant works, it assumes that all the instances of
a single family will be run together, with nothing in between them.
Naturally, that won't work once the runs may be interleaved.
Currently, the tooling just keeps whatever benchmark order
was present, and this is fine nowadays, but once the benchmarks
can optionally be run interleaved, that will be rather suboptimal.

So, now that I have introduced the family index and the per-family instance index,
we can define an order for the benchmarks, and sort them accordingly.

There is a caveat with aggregates: we assume that they are in order,
and hopefully we won't mess that order up.
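With both indexes available, a tool can restore a canonical order with a stable sort keyed on (family index, per-family instance index). Stability keeps entries with equal keys, such as an instance's repetitions followed by its aggregates, in the order they were emitted, which is the assumption noted above. The `Result` struct and `SortResults` are hypothetical; the field names mirror the two indexes but are not guaranteed to match the JSON keys exactly.

```cpp
#include <algorithm>
#include <string>
#include <tuple>
#include <vector>

// Hypothetical in-memory form of one benchmark result.
struct Result {
  int family_index;
  int per_family_instance_index;
  std::string name;
};

// Orders results by family, then by instance within the family. stable_sort
// keeps entries with equal keys (repetitions, then aggregates) in run order.
void SortResults(std::vector<Result>& results) {
  std::stable_sort(results.begin(), results.end(),
                   [](const Result& a, const Result& b) {
                     return std::tie(a.family_index, a.per_family_instance_index) <
                            std::tie(b.family_index, b.per_family_instance_index);
                   });
}
```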
Based on original implementation by Hai Huang @haih-g in
google#1105
Currently the lifetime of a single BenchmarkRunner is constrained
to a RunBenchmark(), but that will have to change for interleaved
benchmark execution, because we'll need to keep it around so as not to
forget how many repetitions of an instance we've done.
…e#1169)

* Fix leak in test, and provide path to remove leak from library

* make doc change
…le#1051) (google#1163)

Inspired by the original implementation by Hai Huang @haih-g
from google#1105.

The original implementation had design deficiencies that
weren't really addressable without redesign, so it was reverted.

In essence, the original implementation consisted of two separable parts:
* reducing the amount of time each repetition is run for, and symmetrically increasing the repetition count
* running the repetitions in random order

While it worked fine for the usual case, it broke down when the user specified repetitions
(it would completely ignore that request), or specified a per-repetition min time (while it would
still adjust the repetition count, it would not adjust the per-repetition time,
leading to much greater run times).

Here, as I suggested in the original review, I'm separating the features,
and only dealing with a single one: running repetitions in random order.

Now that the runs/repetitions are no longer in-order, the tooling may wish to sort the output,
and indeed `compare.py` has been updated to do that: google#1168.
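The scheduling idea can be sketched as: build one entry per (instance, repetition) pair, shuffle the entries, and run them in that order, while a per-instance counter records progress. That counter is the kind of state a long-lived runner has to keep. `ShuffledSchedule` and `RunSchedule` are hypothetical names, and seeding `std::mt19937` and using `std::shuffle` is an assumption, not necessarily how the library picks its order.

```cpp
#include <algorithm>
#include <cstddef>
#include <random>
#include <vector>

// One entry per repetition to run; the value is the instance's index.
// The user's requested repetition count per instance is kept as-is.
std::vector<std::size_t> ShuffledSchedule(const std::vector<int>& repetitions,
                                          unsigned seed) {
  std::vector<std::size_t> schedule;
  for (std::size_t i = 0; i < repetitions.size(); ++i) {
    for (int r = 0; r < repetitions[i]; ++r) schedule.push_back(i);
  }
  std::mt19937 rng(seed);
  std::shuffle(schedule.begin(), schedule.end(), rng);
  return schedule;
}

// Walks the schedule; done[i] counts the repetitions completed for
// instance i, which has to outlive any single instance's run.
std::vector<int> RunSchedule(const std::vector<std::size_t>& schedule,
                             std::size_t num_instances) {
  std::vector<int> done(num_instances, 0);
  for (std::size_t instance : schedule) {
    // A real runner would execute one repetition of `instance` here.
    ++done[instance];
  }
  return done;
}
```

Only the order changes: each instance still runs exactly the number of repetitions requested, which avoids the first deficiency of the original implementation.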
* Enable various sanitizer builds in github actions

* try with off the shelf versions

* nope

* specific version?

* rats

* oops

* remove msan for now

* reorder so env is set before building libc++
* Use modern clang/libc++ for sanitizers

* update ubuntu

* new llvm builds differently

* clang, not clang-3.8

* just build what we need
Some downstream projects (e.g. V8) treat warnings as errors and cannot roll
the latest changes.
…#1179)

This can be used together with ArgsProduct() to allow multiple ranges
with different multipliers and mixing dense and sparse ranges.

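Example:
```cpp
#include <benchmark/benchmark.h>
// Placeholder body so the snippet compiles on its own.
static void MyTest(benchmark::State& state) { for (auto _ : state) {} }
BENCHMARK(MyTest)->ArgsProduct({
  benchmark::CreateRange(0, 1024, /*multi=*/32),
  benchmark::CreateRange(0, 100, /*multi=*/4),
  benchmark::CreateDenseRange(0, 4, /*step=*/1)
});
BENCHMARK_MAIN();
```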

Co-authored-by: Jen-yee Hong <pcmantw@google.com>
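`ArgsProduct()` runs the benchmark once per combination that takes one value from each list, i.e. over the Cartesian product of the lists. The standalone sketch below computes that product for plain vectors; `CartesianProduct` is a hypothetical helper, not the library's implementation, and its ordering (last list varies fastest) is an arbitrary choice that need not match the library's.

```cpp
#include <cstdint>
#include <utility>
#include <vector>

// Returns every combination taking one value from each list.
// The last list varies fastest; that ordering is an arbitrary choice here.
std::vector<std::vector<std::int64_t>> CartesianProduct(
    const std::vector<std::vector<std::int64_t>>& lists) {
  std::vector<std::vector<std::int64_t>> result(1);  // one empty combination
  for (const std::vector<std::int64_t>& list : lists) {
    std::vector<std::vector<std::int64_t>> next;
    for (const std::vector<std::int64_t>& prefix : result) {
      for (std::int64_t value : list) {
        std::vector<std::int64_t> combo = prefix;
        combo.push_back(value);
        next.push_back(std::move(combo));
      }
    }
    result = std::move(next);
  }
  return result;
}
```

The number of benchmark instances is the product of the list sizes, so mixing a sparse `CreateRange` with a dense `CreateDenseRange` keeps that product manageable compared to dense ranges everywhere.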
* Add missing trailing commas

Fixes google#1181

* Better trailing commas
This avoids clashes with other libraries that might define the same flags.
google-cla bot added the `cla: no` label Dec 6, 2021
dmah42 (Member) commented Dec 6, 2021

Is this bringing the v2 branch up to date with main?
