Skip to content
This repository has been archived by the owner on Sep 23, 2024. It is now read-only.

Add reporting design #6

Open
wants to merge 12 commits into
base: master
Choose a base branch
from

Conversation

spbnick
Copy link
Contributor

@spbnick spbnick commented Jul 2, 2018

Here's a very early, incomplete draft of the attempt to redesign skt reporting, so that e-mail sending can be moved to sktm, and also so it's easier to test skt, and have it running different stages on different hosts.

It's only a sketch of a beginning, but it's something we can already discuss :) Bring it on!

@spbnick spbnick force-pushed the reporting_design branch from 6030829 to 67e8dea Compare July 2, 2018 16:38
* Configuration type: "tinyconfig", "rh-configs" (works only if Makefile
supports "rh-configs" target), other configuration-generating target
supported by the Makefile in "source".
* Shell glob, relative to the "source" root, matching a configuration file
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we could build RPMs and use the 'target' option to avoid this globbing. :)

* "kernel.tar.gz" - the built kernel tarball
* "kernel.config" - the used kernel configuration
* "build.log" - diagnostics output of the building stage
* "build.done" - empty file, created only if build stage succeeded
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd suggest having a kernelrelease.txt that contains the kernel release text. That would make it easy to parse.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I expect those are not all the files we need, just a sketch. That can be a YAML file as well. What do you want a kernelrelease.txt for?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought we need that string from make kernelrelease when we make tarballs and such? If not, then ignore that request. ;)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure, let's see where this takes us and what the code needs :)

@veruu
Copy link

veruu commented Jul 3, 2018

Some thoughts, as I've been thinking about possible reporting implementation in sktm:

  • I'd prefer the success files to have .pass suffixes instead of .done, simply because I find it more clear. Not a requirement at all ;)
  • Can we include build.arch file which would contain the string representing architecture the kernel was built for, to make it easier to include in the report?
  • Can we rename the kernel config to build.config? It would be neat to have the output file names prefixed with the stage they were created in, for easier manipulation.
  • Runner section is missing in the pull, but I think it should be enough to add a runner or beaker file which would contain the list of jobs and sktm would go over the results. I don't think we should get the result XMLs directly since we'd need to retrieve links to logs etc. and include beaker logic anyways so there's no reason to go half-way.
  • Can we include a merge.info file which would contain the info about the merge (I'm thinking a simple json dump, a list of patches applied, baserepo, commit etc. I can implement this based on what data is needed). I was thinking sktm's reporting should be plugged into baseline and patchwork testing (the merge data should be available there) and have the ability for a standalone command "report results in this directory" as well (here the info dump would be needed).
  • I agree with @major that we need the kernel version string (for console log data). Going with the stage naming, we can do build.release file for that. But, since this is additional info we'd need to retrieve from build stage (so far architecture and version release string), I'd propose a build.info file which would be equivalent of the above merge.info just for build stage.

Additional ideas? Comments?

@spbnick
Copy link
Contributor Author

spbnick commented Jul 4, 2018

@veruu, thank you for your ideas and comments!

OK, I added another commit adding a little boiler-plate run stage description. Simply didn't have time to add it earlier. We can squash all the commits in the end, just thought it might be easier for everyone to see what's changing. Actually, how about we just merge this and continue making our own PRs changing this so it's easier to see what each of us is talking about?

Now, onto answering your points.
We gotta decide what those empty files will actually mean. Options:

  • "The stage ran". No matter which result we got, the file gets created. In this case "pass" would not be appropriate. We can use such files in reporting to determine if we should report about the stage.

    OTOH, we can simply resort to detecting if any of the stage-produced files are present to determine if it ran. This can be made simpler, if we e.g. put all stage files into a sub-directory. I.e. make "merge", "build", and "run" sub-directories in the output directory, and then put files like "log", or "kernel.tar.gz", or "config" in them. This might be less friendly to humans trying to take a glance at how things are doing, since they'll all be in separate directories, but maybe not.

  • "The stage completed successfully". The file gets created only if merge/build/tests succeeded/passed. This forces us into detecting if the stage ran at all some other way. E.g. by checking stage files, or, if we go with stage sub-directories, just by checking for the directory.

    This might make them redundant, though, and so leave space for contradicting data. Because, e.g. for tests we will maybe have junit data, which has pass/fail information, and for the kernel we have the presence of the tarball. That could lead into situations, where e.g. junit says "fail", but the file says "pass", or there's a kernel, but no "pass" file due to a bug.

    So, the question here is rather who does the interpretation of the result? If it's skt, then it should produce everything: the nice text to put in the mail, how to interpret results, etc., and the empty file marking the stage as success. Then sktm would only take the summary and decide on how to put together all the stages in the report. Or it could be sktm, which would look at all the files and generate the text, and then skt shouldn't create the empty "success" file to avoid contradiction. This means, though, that sktm will have to know rather a lot about what happens in skt.

    In any case, I'm not sure we should have "pass" in the file name, because skt is not the final test runner, and it's sktm who decides whether something passed or not. E.g. a repeated fail on one host, but not the other might indicate a host-related failure, and overall pass. So, I would rather call it a "success".

So far I like the approach of "stage sub-directories" (created by skt), so it's easier to determine which one have run (and also to avoid filename clashes between stages), and skt making stage report texts and the "success" file. Then sktm would glue them together and designate which stage means what.

OK, gotta submit this comment before my browser crashes, and continue in another one.

@spbnick
Copy link
Contributor Author

spbnick commented Jul 4, 2018

  • Can we include build.arch file which would contain the string representing architecture the kernel was built for, to make it easier to include in the report?

I think that it should be sktm (or the pipeline) who decide which architectures to build/run. Since they decide that, they should know what the architecture was. I think skt shouldn't serve as a storage for what, whoever invokes it, decided or wanted to remember. Skt doesn't need to know the architecture, at this moment, strictly speaking (it only needs to know which environment variables and arguments to pass to make and what beakerjob.xml to use), so it shouldn't store it. If, OTOH, we decide to introduce that knowledge into skt, then, sure.

@spbnick
Copy link
Contributor Author

spbnick commented Jul 4, 2018

  • Can we rename the kernel config to build.config? It would be neat to have the output file names prefixed with the stage they were created in, for easier manipulation.

Yeah, I think we can do that, or we can go with stage sub-directories and simply name it "config" (i.e. path relative to the output dir would be "build/config").

@spbnick
Copy link
Contributor Author

spbnick commented Jul 4, 2018

  • Runner section is missing in the pull, but I think it should be enough to add a runner or beaker file which would contain the list of jobs and sktm would go over the results. I don't think we should get the result XMLs directly since we'd need to retrieve links to logs etc. and include beaker logic anyways so there's no reason to go half-way.

I added a small section on "run", but we gotta take it much further.

I'm not sure what exactly you mean in the last sentence, but I think sktm should know nothing about Beaker, if possible. It would be great if skt provided an abstract interface to sktm, and it could be as simple as a text to include in the report (be it e-mail, or a web page), along with files to attach, with possible links to more data. That report can concern itself only with the stage run in question, and only speak about it, so that sktm could then meaningfully combine such reports.

@spbnick
Copy link
Contributor Author

spbnick commented Jul 4, 2018

I'm aiming the previous comment at the idea of having support for different runners, e.g. libvirt or direct qemu, or such.

@spbnick
Copy link
Contributor Author

spbnick commented Jul 4, 2018

  • Can we include a merge.info file which would contain the info about the merge (I'm thinking a simple json dump, a list of patches applied, baserepo, commit etc. I can implement this based on what data is needed). I was thinking sktm's reporting should be plugged into baseline and patchwork testing (the merge data should be available there) and have the ability for a standalone command "report results in this directory" as well (here the info dump would be needed).

This is again the question of "should skt store data in its output it doesn't care about"? I.e. skt doesn't care which patches were merged, once merge stage is done. It can use that information to e.g. produce a merge stage report, but neither build nor run stage care about them. OTOH, sktm knows which patches it asked to merge, and all the extra circumstances, and it should be able to associate those.

The "report results in this directory" would be interesting functionality, however, I think making skt store sktm's data would not hold in the long run anyway. If we go that route, we will end up with more and more data being stored by skt, and it being less and less relevant to it, as we build complex pipelines and combinations of skt runs. Instead, if sktm needs to quit during a pipeline and resume it later, it should create its own persistent storage scheme (in filesystem, or in database, or in both) and encapsulate skt stage outputs into it.

@spbnick
Copy link
Contributor Author

spbnick commented Jul 4, 2018

  • I agree with @major that we need the kernel version string (for console log data). Going with the stage naming, we can do build.release file for that. But, since this is additional info we'd need to retrieve from build stage (so far architecture and version release string), I'd propose a build.info file which would be equivalent of the above merge.info just for build stage.

This is something that indeed only skt can know, because it's its job to actually build the kernel and only it can run make. Yes, something like build.release, or build.info, or (going with stage sub-directories) build/info would work.

@spbnick
Copy link
Contributor Author

spbnick commented Jul 4, 2018

I would love to hear what you think about all this and am excited to find out what we'll come up with :) So, shall we perhaps merge this as is (or with tweaks to what's already there), and go on to submitting more PRs amending and discussing it, each with our own ideas? Do you think that would work? Otherwise I feel like we'll end up miscommunicating more through hand-waiving in the comments :)

@veruu
Copy link

veruu commented Jul 4, 2018

I really don't want to complicate things too much. If the stage successfully finished, get a .pass file, if there are any issues, .log one. sktm can then come and put the report together based on what files it sees. If the stage ran, there will always be one of these files present so a "done" file seems redundant. I really don't see how different files would be contradicting (except when there's a bug that needs to be fixed), skt knows if the stage succeeded or not and should output the results accordingly.

If the output files are always going to be named differently, do we really need to deal with subdirectories? It seems like additional complication to deal with on both sides.

I think that it should be sktm (or the pipeline) who decide which architectures to build/run.

Then we need to figure a way for the sktm/pipeline to get that info to the reports (if we want to report it). Should they be creating the files in the correct directories? We need to be able to associate the runs with the arches.

Skt doesn't need to know the architecture, at this moment, strictly speaking (it only needs to know which environment variables and arguments to pass to make and what beakerjob.xml to use), so it shouldn't store it. If, OTOH, we decide to introduce that knowledge into skt, then, sure.

The "report results in this directory" would be interesting functionality, however, I think making skt store sktm's data would not hold in the long run anyway. If we go that route, we will end up with more and more data being stored by skt, and it being less and less relevant to it

I'd say having skt describe "this is what I did" would be pretty useful. Then if needed, the end user can get to the directory and see exactly what was done, instead of trying to match a command from last week to the right results. As I said before, we can go with .info file which can be a dump of parameters skt was called with or something similar (and then we can get rid of the mess discussed above).

I don't want skt to store any data, I want it to output "this is what I did and these are the results", so it's easy to match the command and output without having to dig around too much. We already dealt with it not being easy to match reports to the originating runs, and I really don't want to do it again.

I'm aiming the previous comment at the idea of having support for different runners, e.g. libvirt or direct qemu, or such.

Good point. What would you say if we had all supporter runners create a same output, and use that (runner.results or whatever)? It could contain the list of tests that were run, and for each run, it's status and, if needed, links to logs or filenames which should be attached (which would be created in the directory). If each runner had the same style of output, sktm could deal with it easily, and based on the presence of .pass or .log file will know if the stage completed successfully or not (eg. the job could fail due to host issues but the results are still good).

@spbnick
Copy link
Contributor Author

spbnick commented Jul 5, 2018

Let me try to sell you one idea at a time :)

How about skt's interface to sktm for each stage was (however expressed) a success/failure/error status, a human-readable report text, and a list of files to attach to that report? And that's all?

I.e. sktm wouldn't really need to know anything about what a kernel config is, or a merge log, or a junit file? It seems to me that sktm doesn't really needs to know that. It just needs to know if merge failed, a build failed, or tests failed and make a report.

For example, an "skt merge" can fail and produce a report text like this (e.g. in a file merge/report.txt or merge.report.txt):

We applied the following patches:

  - [RHEL7.6 BZ 1595687 1/6] net/mlx5: E-Switch, Add callback to get representor device,
    grabbed from <HOSTNAME>/patch/358242

  - [RHEL7.6 BZ 1595687 2/6] net/mlx5: E-Switch, Move representors definition to a global scope,
    grabbed from <HOSTNAME>/patch/358243

  - [RHEL7.6 BZ 1595687 3/6] net/mlx5: E-Switch, Avoid setup attempt if not being e-switch manager,
    grabbed from <HOSTNAME>/patch/358246

on top of commit 291a12fd75d8 from the repository at
  git://git.repo.org/kernel.git


However, the application of the last patch above failed with the
following output:

    Applying: net/mlx5: E-Switch, Avoid setup attempt if not being e-switch manager
    error: patch failed: drivers/net/ethernet/mellanox/mlx5/core/en_rep.c:822
    error: drivers/net/ethernet/mellanox/mlx5/core/en_rep.c: patch does not apply
    Patch failed at 0001 net/mlx5: E-Switch, Avoid setup attempt if not being e-switch manager

Please note that if there are subsequent patches in the series, they weren't
applied because of the error message stated above.

An skt build can fail and produce a report like this:

Please see build.log for details.

Skt can actually put specially-formatted references in the reports it produces instead of plain filenames, so that sktm can reformat them with any prefixes it sees necessary (and also get a list of attachments from them). E.g. skt can produce this report instead:

Please see [build.log] for details.

And if e.g. that was a build for ppc64le, sktm would take the build.log from the skt output directory, rename it to e.g. ppc64le.build.log, attach it to the e-mail, and replace [build.log] with the attached ppc64le.build.log in the report text. And if we were making a web-page report it could replace it with markup for the link instead.

For the skt run stage, skt can produce a report text detailing which tests were ran, and attaching any logs produced. E.g.:

The following tests ran:
    Boot test
    LTP tests

LTP tests failed, please see [LTP.log] and [console.log] for details.

Then sktm would be able to attach the (renamed) files and rewrite references in the report in a similar way.

When all the various stages from all architectures are collected, sktm would be able to put them together and summarize. E.g. something like this:

MERGE FAILED
<skt merge report piece from above>

or

BUILD FAILED for ppc64le
<skt build report piece from above>

I.e.

BUILD FAILED for ppc64le
Please see the attached ppc64le.build.log for details.

or

TESTS FAILED on ppc64le
<skt run report piece from above>

I.e.

TESTS FAILED on ppc64le
The following tests ran:
    Boot test
    LTP tests

LTP tests failed, please see the attached ppc64le.LTP.log and ppc64le.console.log for details.

(we can measure the distance between references and add "the attached " only if the previous one further than e.g. 100 characters, so we don't end up repeating it too often).

What do you think?

@veruu
Copy link

veruu commented Jul 5, 2018

That can work. I like the idea with the placeholders, especially when eg. multiple builds failed and are supposed to be reported in a single report, we'd need to distinguish between them. I'd still create the .pass files - that way it would be easier to see if the stage completed fine or not, both if we want to take a look in the directory and for the standalone command. If skt would create the reports itself, we can avoid the .info files. We'd need to create a smart way of passing the arch info from pipeline to sktm since sktm currently doesn't know anything about that, but that's a different problem.

@spbnick
Copy link
Contributor Author

spbnick commented Jul 9, 2018

OK, I added a paragraph and output directory item describing the report files in another commit. Please take a look.

Now, regarding the .pass files. I wouldn't like to have "pass" in their name, because it is not for skt to decide what is a pass and what is a fail, ultimately. That has to be left to a test framework, such as sktm. I think stk's job should be to try to do merge/build/test and report if it failed or succeeded doing them.

Only the entity which actually gates something can give a verdict whether something (e.g. a patch series) "passed" or "failed" the tests. Skt lacks the context for that. I.e. it can assume that a "pass" is when a patch merges/builds/and passes tests, but when its results are going to be integrate into the sktm framework they will become confusing.

E.g. skt's "pass" can become a "fail" in overall sktm result, and vice versa (e.g. when testing for a faulty host). We can take special care to hide that in sktm, but it will eventually show up, either in an occasional confusing reports, in awkward code, or in an actual bug, where results would be interpreted incorrectly.

OTOH, "success" is neutral and is not giving any verdict. It just says that the merge/build/test stage succeeded. Anyway, that's my idea behind not using "pass" in the names of those files. I'll probably cringe a little when using "pass", but I'll easily survive that :)

@spbnick
Copy link
Contributor Author

spbnick commented Jul 9, 2018

Another thing about those files, is that just marking "success"/"pass" is not enough. We need to differentiate between "failures" (merge/build/test failures) and "errors" (infrastructure/code failures) too, and just presence/absence of the "success"/"pass" failure doesn't cover that.

How about we have either a "success" or "result" file encoding that? Let's say that each stage will create an executable .result file. I.e. merge.result, build.result, or run.result, and it should return either success (zero exit status), or failure (non-zero exit status) for the stage success/failure respectively. Then absence of that file (but presence of other stage files) would indicate there was an error.

A simple way to implement those would be simply having an executable file containing either true or false text and nothing else.

@spbnick
Copy link
Contributor Author

spbnick commented Jul 9, 2018

We can name them merge.success, build.success, and run.success, or even merge.pass/build.pass/run.pass, but then that might create the illusion that the stage succeeded based on just the file existence, and would be confusing.

@veruu
Copy link

veruu commented Jul 9, 2018

I think stk's job should be to try to do merge/build/test and report if it failed or succeeded doing them.

I agree, and that's what I want to be marked clearly in the filename. If it's going to be .pass or .success doesn't matter to me, it means the same - that the stage passed (completed without failures). I was against using .done because it doesn't say the actual result and .pass was the first thing that made sense to me, but if you like .success more, I'm totally fine with that :)

E.g. skt's "pass" can become a "fail" in overall sktm result, and vice versa (e.g. when testing for a faulty host).

How? It's skt's BeakerRunner (or possibly a different runner in the future) which tries to account for host-specific issues, resubmits the jobs as needed, and reports success if the resubmitted jobs passed. I don't see how sktm would even go around finding out the failures are bound to a specific host.

that might create the illusion that the stage succeeded based on just the file existence, and would be confusing.

sktm should trust skt's reports, and not try to overwrite the results. If the stage reports success, then the stage succeeded and we shouldn't report otherwise. The only thing sktm should be putting together is aggregation of different runs.

<snip about differentiating between failures and errors>

I agree that we need to differentiate between them, but I don't think sktm should be the one doing it. I think that no files for that stage should be created and the return code should be checked in the pipeline if needed (and eg. re-run the stage later if we want to). sktm should be okay with creating reports that didn't go through every stage, and it doesn't need to know about failed infrastructure.

Or, another idea that just occurred to me, if we don't want to mess around removing the created files when the infrastructure fails, we can save the exit status in general into the .result file (0 is success, 1 is failure, 2 is infrastructure issue). On success/failure we can do it at the end of the stage, for infrastructure we can do it on caught exceptions. sktm can skip the report from that stage as it sees 2.

Hope I got everything :) thoughts?

@spbnick
Copy link
Contributor Author

spbnick commented Jul 10, 2018

Riiight, I keep getting confused where that host retry logic is. Yes, sktm doesn't interpret skt results as anything but straight pass/fail.

I'm suggesting encoding infrastructure failure as a missing .result file because in the worst case scenario (skt crash) it simply won't be created. We don't really need that file for sktm/pipeline itself, because it will do with just the exit status. Skt doesn't really need it either. I thought it would be convenient to humans when inspecting stage artifacts. Having it as an executable returning exit status makes it easy to use in scripts, having contents of true or false makes it easy to interpret for humans without running. Let's call it .result, and not .success or .pass so that humans listing the files don't jump to a conclusion without looking inside, or running it.

Yeah, I don't want to remove files when stage fails, because we won't be able to do that in general (e.g. in case skt crashes with a segfault). Let the absence of the .result file speak for that ("hey, there was no result!"), and let`s clear the output files at the (re-)start of the stage for consistency.

Unless you have strong objections, let's move onto more important stuff. We can change it later. I feel like we're going way too deep into gory details and are slowing down.

@veruu
Copy link

veruu commented Jul 10, 2018

Let the absence of the .result file speak for that ("hey, there was no result!"), and let`s clear the output files at the (re-)start of the stage for consistency.

Okay, works for me. Glad that we are finally getting somewhere :D

@spbnick
Copy link
Contributor Author

spbnick commented Jul 10, 2018

OK, let's talk about the structure.

Below are two mock-ups of artifacts for a complete run through all stages, flat
vs. subdirs.

Flat

I chose the scheme of files sharing the same name for the same stage, but
different extensions, so it's easier to aggregate/separate them. I'm also not
sure about the run stage output. However, we can just say that extra
unspecified files are possible, and that we can refer to them from the report.

merge.source
merge.result
merge.log
merge.report
build.kernel.tar.gz
build.kernel.config
build.log
build.report
build.result
run.log
run.report
run.result

Subdirs

merge/
    source
    result
    log
    report
build/
    kernel.tar.gz
    kernel.config
    log
    report
    result
run/
    log
    report
    result

I like the subdirs approach better when several stages are in the same dir,
but when they're alone it looks a bit awkward. I.e. if you just run the merge
stage the output directory would contain the merge directory, and then all
the files for that. At this moment, I'm OK with either approach. Which one do
you like?

@spbnick
Copy link
Contributor Author

spbnick commented Jul 10, 2018

Ah, one advantage of the "subdirs" approach, is that it's easier to determine/check if the stage ran or not.

@veruu
Copy link

veruu commented Jul 10, 2018

I prefer the first one, mainly because of multireports -- I was thinking we could have

  • all files in the single specified directory for a single-run report
  • merge results in the specified directory and build/run results in subdirectories, which we'd need to traverse to make sure we got them all (since the merge results are the same for all multiruns) - no recursing further, only checking the subdirs of the passed directory

I feel like the the subdirectories for stages would add too much nesting. I mean, we could deal with it if really needed but it's too much structure for little benefit (you can already see if the stage ran based on the file names). If you have alternative idea how to handle multireports please let me know, but this is the only smart idea I have right now :)

Thinking about what we discussed previously, do we need both .log and .report? I'd say that the report one is enough, and if there are any logs or other attachments that should be added they should be referenced in the report. Or listed separately in a .attach file. Logs are only created on failures, in case of merge failures the text is in the report while with build failures the logs are supposed to be attached (together with the kernel config) and run stage doesn't have any logs at all currently. So I'd leave the .log out completely.

Other than that, we already talked about the file naming so nothing to add there and I agree

@spbnick
Copy link
Contributor Author

spbnick commented Jul 10, 2018

I added description of .result files and exit status.

So far I agree, extra subdirs can be unwieldy, let's go with just flat directory and have the stage files start with the stage name and a dot. I'll add another commit describing that.

Right, how about we simply don't define the .log files, only define the .report file, and say that there can be extra files referenced by the report, for the start? Later we can add requirement of specific files to the documentation, if they're e.g. needed by other tools or for some other reason besides just reporting.

I don't think we need a separate .attach file. The references in the report text should be enough.

@spbnick spbnick force-pushed the reporting_design branch from de01860 to b08ac34 Compare July 10, 2018 13:18
@veruu
Copy link

veruu commented Jul 10, 2018

👍

@spbnick spbnick force-pushed the reporting_design branch from b08ac34 to ecc787d Compare July 10, 2018 13:34
spbnick added 2 commits July 10, 2018 16:37
We do not need to describe test result representation now that we have
reports.
@spbnick spbnick force-pushed the reporting_design branch from ecc787d to c882add Compare July 10, 2018 13:37
@spbnick
Copy link
Contributor Author

spbnick commented Jul 10, 2018

OK, did that. What else do we need to decide/change?

@veruu
Copy link

veruu commented Jul 10, 2018

For skt, nothing else comes to my mind right now. For the reporting part, it would be nice to know which architecture the build/run belongs to (and without that ppc chaos), especially for multiruns. I'm assuming it would be best for the pipeline to pass these information from since sktm has no notion of architectures and skt's one is a bit skewed. Should the pipeline create the .arch file I mentioned earlier? Or do you have a better idea?

For the record, I don't think the architecture should be coded into the directory name - what if we want to have separate runs for AMD/Intel CPUs, the architecture is the same and the directories would just be overwritten.

@spbnick
Copy link
Contributor Author

spbnick commented Jul 10, 2018

Provided we're still going with the Jenkins pipeline deciding which builds to do and which tests to run, we need to give it a way of communicating those to sktm. I don't think skt should be involved in that.

I think we need to stop skt making junit outputs and let pipeline present skt output directories to sktm, along with their relations. I'll think about this for a bit. Write if you have any ideas!

@spbnick
Copy link
Contributor Author

spbnick commented Jul 10, 2018

Technically, we have Jenkins build artifacts and JUnit results. The latter can be generated by the pipeline itself.

@spbnick
Copy link
Contributor Author

spbnick commented Jul 10, 2018

Build artifacts too, of course.

@veruu
Copy link

veruu commented Jul 10, 2018

I think we need to stop skt making junit outputs and let pipeline present skt output directories to sktm, along with their relations

WFM. More independent code if we remove the junit generation \o/

Provided we're still going with the Jenkins pipeline deciding which builds to do and which tests to run, we need to give it a way of communicating those to sktm. I don't think skt should be involved in that.

Also agreed. Or whatever pipeline we decide to go with. Neither sktm nor skt should be deciding by themselves what they'll run out of their own will.

The question stays how we pass that info. I'm advocating the .arch file since the pipeline will be creating the directories for specific runs, so it can create the file in them as well. We need a way to pass the directory to sktm's watcher on job completions, but passing the arch info the same way wouldn't work for the standalone report command which would be nice to have. Plus, the arch info is bound to the build so the data should be present with the stage data, not only for reports but for users as well (doing a bunch of builds and trying to find out which ones are for which architecture afterwards).

@spbnick
Copy link
Contributor Author

spbnick commented Jul 10, 2018

Still thinking about this and your ideas. However, how about a radical approach (similar to what we decided with skt): let the pipeline generate the report text, leaving only sending/publishing the results to sktm?

We can let skt provide all the necessary tools for manipulating its reports (e.g. adding prefixes to output files / references in reports), to make that easier. The pipeline report can be a text report with references plus a bunch of files (same as skt), only it will be single report for the whole run. The pipeline is the best one to know what it was testing and how all these stages are related to each other.

@spbnick
Copy link
Contributor Author

spbnick commented Jul 10, 2018

The above would also work if the "pipeline" was e.g. just a shell script.

@veruu
Copy link

veruu commented Jul 10, 2018

I understand where you're coming from, but I feel that would make the pipeline even more complicated than it already is. I don't want to get too dependent on the pipeline, instead it should be just glue between sktm and skt, doing nothing except running skt's stages and getting their results back to sktm.

Yes, the pipeline knows what it runs, but then everybody who wants a custom pipeline file would need to implement the report merging there, and I'm really not a fan of it. Or we'd need to support various ones and implement them ourselves, writing X solutions for the same thing, which is pretty error prone (and I really don't want us writing the same thing multiple times, and then having to propagate a single change into each file etc). Especially if we still going with the ability to use different pipelines / run without jenkins. Supporting this with different solutions would be a disaster.

skt stage command replaces existing files completely. E.g. it will need to
remove all files that might have been created by the previous run of the same
command, before starting adding its own, so the output file set is not
inconsistent.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer that we leave this to the administrator/operator/pipeline to clean up rather than make these applications aware that they need to remove or clear something. They should output whatever they need to without needing to know that something needs to be deleted.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@major, I think I understand where you're going with this, and I like the idea. However, since skt's output here is a particular set of files, which can be different depending on results, and which represent the results (i.e. which files are there changes meaning), I think it's right to let skt reset that set of files before output. It's kind-of similar to using O_TRUNC with open(2).

either the `true` or `false` strings, for the overall stage status of success
or failure respectively. If the file is missing, but any of the other output
files are present, the stage can be considered as incomplete, i.e. it must
have exprienced an error, and there was no conclusive result.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: s/exprienced/experienced/


#### Exit status
Execution of each stage should return an exit status of zero, if the stage
completed succesfully, one if it failed, and something else if an error
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: s/succesfully/successfully/

Build
-----

Produce a kernel tarball.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Totally agree here. A tarball is so much easier to move around.

Copy link
Contributor

@major major left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops, didn't mean to hit approve. ;)

@spbnick
Copy link
Contributor Author

spbnick commented Jul 10, 2018

@veruu, let me think about that and see if I can come up with some examples which can convince you it won't be that bad :) I might not be able to, and it might not be good, but I'll try :)

#### Output directory

* "build.kernel.tar.gz" - the built kernel tarball, as required by "run" input.
* "build.kernel.config" - the used kernel configuration
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmm, we are always compressing the kernel config because it's huge. Given that we want just to attach any referenced files without additional logic, I think it would make sense to create a compressed build.kernel.config.gzfile. Similarly with anything we'd want to compress, although this is the only thing now.

If not, we'd need to pass additional data to the reporter to know which files should be compressed, which I think adds useless complexity.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, we can compress them. Another approach can be to let the reporter decide whether to compress them or not, depending on the file type and/or size.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That occurred to me too, but we aren't really mentioning any file types as suffixes and determining the size or type would just add complexity we don't need. "Here are the results, merge and send them out". If it's really needed we can of course deal with finding out size and type, but that's something we can change if it turns out a better option in the future.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW, we can remove this from the specification as well for now. The report will be referencing it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed.

@spbnick
Copy link
Contributor Author

spbnick commented Jul 11, 2018

Thank you, Major, I fixed the typos. I also removed build.kernel.config from the output requirements, since that is only needed for the report, which can reference it.

then e.g. the resulting e-mail would read:

x86_64 build failed:
See the attached x86_64_build.log.gz for details.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Based on what do you plan skt knowing what to add here as prefixes since it doesn't know much about the arch (unless it's going to be a random string)? This is also super confusing, why do we need to suddenly nest reports when we decided that skt will generate a single text report file? TBH I don't see any value in this change, besides complicating things

@spbnick spbnick force-pushed the reporting_design branch from 7eb2189 to a97a826 Compare July 11, 2018 13:30
@spbnick
Copy link
Contributor Author

spbnick commented Jul 11, 2018

@veruu, OK I added a couple commits updating the "Report text" section. I propose adding support for including "reports" into other reports, with "namespacing". Please see the text before reading further. So that the pipeline (whether it's a Jenkins pipeline, or a shell script, or whatever) doesn't have to know how to assemble them or use any special tools. It just has to output the report text referring to all the skt reports in question.

We can require the Jenkins pipeline to produce a specifically-named artifact archive containing file "report" in this format, which will include all other reports, and files to attach. For example, we can have this structure:

report
merge.report
x86_64/
    build.report
    run.report
ppc64/
    build.report
    run.report
ppc64le/
    build.report
    run.report

The report file would then be able to refer to the specific reports as e.g. <:merge.report> or <x86_64:x86_64/build.report>.

That would only require the pipeline to be able to output a text file, which pretty much any programming language can do without difficulty. The overall job status can be returned as pass/fail/error. For Jenkins it would be SUCCESS/FAILURE/<ANYTHING_ELSE>, for a shell script it can be 0/1/<ANYTHING_ELSE>.

A nice perk would be that the pipeline would be taking care of all the branding and recipient-related niceties and we wouldn't have to supply any sort of e-mail templates and such to sktm.

What do you think?

@spbnick
Copy link
Contributor Author

spbnick commented Jul 11, 2018

Another nice thing is that only sktm would have to have the code for rendering these reports as both skt and the pipeline would only write them.

@spbnick
Copy link
Contributor Author

spbnick commented Jul 11, 2018

Oh, and the pipeline would be able to add its own files to the overall report easily.

@veruu
Copy link

veruu commented Jul 11, 2018

Just so I understand it correctly (please correct me if not, since I'm really confused by this proposal):

  • This doesn't get rid of the requirement of implementing multiple pipeline solutions if we want to support them.
  • "We can require the Jenkins pipeline to ...." so now we don't plan to support running without Jenkins at all?
  • Where does that "report" file comes from? If Jenkins is supposed to be creating it based on the what it sees, see my previous two points - it's too much logic for the pipeline which is just supposed to be a glue. If skt is supposed to do something with it, then I'm curious how it would create eg. the summary for multireporting.
  • Regarding the templates, if sktm should be responsible for sending the reports, then sktm should be the one which parses and puts the template together.

This is starting to move towards "let's replace sktm's responsibilities by the pipeline" direction, and I'm not a fan of it at all. Let me know your answers to my questions since I don't see them :)

@veruu
Copy link

veruu commented Jul 16, 2018

Okay, let me write up the directory structure that's currently expected by sktm/#108 pull.

For a single test run, just create a single directory with a random name, which is then passed to skt[m]. The directory is used as both input and output directory for all skt stages, and passed as a directory to report results from for sktm.

For multiple runs (which is what's most interesting), it's a bit more complicated, so I'll illustrate it with this example.

  1. Create a random directory. Let's call it parent_dir for the example. This will be the directory that is input and output for skt's merge stage, and passed to sktm's reporter at the end.
  2. For each separate run (here is where we branch out), create a subdirectory of the parent_dir. Let's say we want to have two different runs, one for x86_64 with Intel CPUs and one for AMD, so we call the subdirectories intel and amd. The names can be random but unique in regards to parent_dir contents.
  3. For build, the parent_dir with merge results is the input, and the associated subdirectory we created in 2. is the output. For runner, the subdirectory is both the input and output.

In the end, if all the stages completed, we'd end up with

parent_dir/
    merge.result
    merge.report
    <source_dir and anything else the merge stage wants to output>
    amd/
        build.result
        build.report
        <kernel config, built kernel etc>
        runner.result
        runner.report
        <any other runner output>
    intel/
        build.result
        build.report
        <kernel config, built kernel etc>
        runner.result
        runner.report
        <any other runner output>

Does it make sense?

@spbnick
Copy link
Contributor Author

spbnick commented Sep 4, 2018

Regarding your last example, yes, that's how it could look. Personally, I wouldn't put data from build/run stages under the directory from merge stage, but it could work that way too.

Regarding your previous comments/questions:

  • This doesn't get rid of the requirement of implementing multiple pipeline solutions if we want to support them.

We have agreed that we'll do report merging in sktm completely (I don't like that, but OK). I'll just try to answer what my idea was regarding making pipeline link the reports together, and having to deal with multiple Jenkins pipeline implementations.

I think it would work OK, since Jenkins pipeline would need to know what skt stages it runs and how many in parallel, and OTOH doesn't need to know about anything except the .report files (for merge/build/run stages), because the .report files include everything else necessary. The idea was that sktm would deal with the details of including and attaching the files, and skt would deal with formatting particular stage reports. The pipeline would only need to write the summary - what it actually did, and sktm wouldn't need to care about anything except call the mechanical report-assembly function and interpreting the pipeline result.

  • "We can require the Jenkins pipeline to ...." so now we don't plan to support running without Jenkins at all?

We absolutely plan supporting whatever our idea would be to replace the Jenkins pipeline. We can require that "alternative tester" to produce a similar tarball, or something else, our interface would be just PASS/FAIL/ERROR status and a report file which we can feed to that report-assembly function and produce an e-mail/HTML/whatever. Sktm won't need to know anything about any other sub-reports, or files they include.

  • Where does that "report" file comes from? If Jenkins is supposed to be creating it based on the what it sees, see my previous two points - it's too much logic for the pipeline which is just supposed to be a glue.

Jenkins is supposed to be creating the report file based on which skt commands/stages it ran, and only that, not what it sees from skt. It won't need to know what skt actually does, because skt would produce its own reports for the stages. The pipeline would only "link" them together through references.

If skt is supposed to do something with it, then I'm curious how it would create eg. the summary for multireporting.

I'll try to explain what I mean with a trivial example, which is much simpler than what we have, but I might be able to explain it more easily this way.

Let's say we have a "single-arch" pipeline (it doesn't have to be Jenkins, but we can imagine it's Jenkins). Just skt merge, skt build, and skt run, which can run in the same directory. On success, the pipeline would need to produce a report file like this (totally made up):

We tested your patch and it PASSED with flying colors!

<:build.report>

<:merge.report>

<:run.report>

Send more patches!
The Kernel CI team.

Which then could be rendered by sktm into something like this:

We tested your patch and it PASSED with flying colors!

We took the kernel from http://git.kernel.org/torvalds/linux.git
and applied the patch downloaded from http://patchwork.kernel.org/patch/235221

The build for x86_64 went without a hitch, but if you'd like
you can see the build log (attached in build.log).

Finally, we ran the following tests, which all PASSED:
  - /kernel/distribution/ltp/lite: PASS
  - /kernel/filesystems/nfs/connectathon: PASS
  - /kernel/standards/usex/1.9-29: PASS
  - /kernel/RHEL7/vm/hugepage/libhugetlbfs: PASS

Send more patches!
The Kernel CI team.
  • Regarding the templates, if sktm should be responsible for sending the reports, then sktm should be the one which parses and puts the template together.

Sure. It could have something like render_report(format, path) function which would accept the format to output in (e-mail/HTML) and a path to the top report file.

Anyway, these are just my musings, I'm not asking to change anything, let's proceed as planned for now.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants