[WIP] First proof of concept for Nextflow logs #45

Closed


@ewels ewels commented Mar 12, 2024

Aim

The aim is to support log files from Nextflow, a workflow manager / analysis pipeline builder commonly used in bioinformatics research (e.g. DNA sequencing and more). Every run produces a very long and verbose .nextflow.log file, which is great for debugging but extremely difficult to read. Formatting would greatly help with this.

Continuing our discussion on Discord, here are the results of my very quick and dirty play with Nextflow logs this evening.

How it looks

CleanShot.2024-03-12.at.23.42.21.mov

Reproducing locally

Example log file: .nextflow.log (renamed to make GitHub happy, typically called .nextflow.log, or .nextflow.log.6 etc.)

Can generate yourself by running Nextflow:

# Install Nextflow
curl -s https://get.nextflow.io | bash
# Run a pipeline with a tiny test dataset
nextflow run nf-core/rnaseq -profile test,docker --outdir results
# Check the log
tl .nextflow.log

Discussion

  • Things I like
    • Formatting of the logs, I think that they look pretty good
    • Native toolong functionality, such as tailing, search, navigation etc.
  • Things I don't like
    • My code
    • The fact that I disabled a bunch of functionality for other log types to make this work.

These logs have a few additional bits of complexity:

  1. Log outputs can span multiple lines
    • While looking at this when submitting the PR, I realised that any line without a timestamp is a continuation of the last log message that had one. It should keep the formatting of that log level (e.g. dim for debug, red background for error).
  2. There are multiple format styles
    • Nextflow produces output in a bunch of standardised ways. I ended up writing 5 different formatters to handle the different cases
    • There can also be arbitrary unstructured output from multi-line strings. The other 4 formatters are effectively special cases of this.
  3. Some of the formatting rules applied by the other formatters broke things or made the output look worse
    • I simply turn them off in this PR, but that's obviously not a real solution
    • This would need some kind of switch mechanism to tell toolong that we're looking at a Nextflow log file and only use that set of formatters for the rest of the session.
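To make point 1 concrete, here is a minimal sketch of how continuation lines could inherit the log level of the last timestamped line. It is not the code from this PR. The line shape in the regex (a `Mon-DD HH:MM:SS.mmm` timestamp, a bracketed thread name, then the level) is an assumption about how Nextflow logs typically look, and `tag_levels` is a hypothetical helper name.

```python
import re
from typing import Iterable, Iterator, Tuple

# ASSUMPTION: a log line starts with "Mon-DD HH:MM:SS.mmm [thread] LEVEL ...".
# Adjust the pattern if real .nextflow.log files differ.
LOG_LINE = re.compile(
    r"^(?P<timestamp>[A-Z][a-z]{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) "
    r"\[(?P<thread>[^\]]*)\] "
    r"(?P<level>[A-Z]+)\s+"
    r"(?P<rest>.*)$"
)


def tag_levels(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (level, line) pairs.

    A line without a timestamp is treated as a continuation of the
    previous timestamped line and inherits its level.
    """
    # Fallback level for continuation lines seen before any timestamped line.
    current_level = "INFO"
    for line in lines:
        match = LOG_LINE.match(line)
        if match:
            current_level = match.group("level")
        yield current_level, line
```

A formatter could then pick its style (dim, red background and so on) from the yielded level rather than from the text of each individual line.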

Going forward

This PR is just my way of asking for advice. I'm curious to see whether you think that this could / should live within the toolong codebase still, or whether it should somehow extend toolong as a separate package / plugin. Or whether it should be destroyed with fire and never be spoken of again 🔥

@abhi18av

Thanks @ewels for being the Nextflow champion!

This would need some kind of switch mechanism to tell toolong that we're looking at a Nextflow log file and only use that set of formatters for the rest of the session.

Agreed - from a user-experience point of view, perhaps a $HOME/.tl.yaml file could be used to specify formatter-level settings?

Ideally, this would let formatter-specific development proceed independently, since only the relevant scope in the $HOME/.tl.yaml file would be affected when turning features on or off.


ewels commented Mar 14, 2024

Hi @willmcgugan!

After coming back to this, I think what would make most sense is:

  1. Add a new --language CLI option to force parsing log files in a specific format, which can turn different highlighters on and off
  2. Add some entry points or similar so that 3rd party tooling can add custom formatters without bloating the main toolong codebase

This would mean that the above code could move to somewhere like https://github.com/nf-core/tools and "just work" if a user has both toolong and nf-core installed (we would add toolong as a dependency to nf-core, so that would always be the case that way around). Then we can add to that custom code without bothering you and messing with the main codebase. Even the CLI flags could be handled this way if you wanted.
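As a rough sketch of what the entry-point idea could look like on the toolong side, the snippet below discovers formatters registered by other installed packages using the standard library's `importlib.metadata`. The group name `toolong.formatters` and the function name are hypothetical; nothing in this thread says toolong defines them.

```python
from importlib.metadata import entry_points
from typing import Any, Dict

# HYPOTHETICAL group name, used here for illustration only.
FORMATTER_GROUP = "toolong.formatters"


def load_plugin_formatters(group: str = FORMATTER_GROUP) -> Dict[str, Any]:
    """Return {entry point name: loaded object} for every plugin in the group."""
    all_entry_points = entry_points()
    if hasattr(all_entry_points, "select"):
        # Python 3.10+ returns an object with a select() method.
        selected = all_entry_points.select(group=group)
    else:
        # Python 3.8 / 3.9 return a plain dict keyed by group name.
        selected = all_entry_points.get(group, [])

    formatters: Dict[str, Any] = {}
    for entry_point in selected:
        # load() imports the module and returns the referenced object.
        formatters[entry_point.name] = entry_point.load()
    return formatters
```

A third-party package would then declare an entry point under the same group in its packaging metadata, and the formatter would be picked up whenever both packages are installed. If no package registers anything, the function simply returns an empty dict.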

I can't figure out how to get the Nextflow formatters to work without disabling the other core formatters. Maybe you can, but otherwise the --language flag could force a specific set. This could also be set automatically based on filenames (these files are always called .nextflow.log*, though of course could be stdin etc).
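For the automatic detection, something along these lines might be enough. This is only a sketch: `detect_language`, the pattern table and the `forced` argument (standing in for the proposed `--language` flag, which does not exist yet) are all hypothetical names.

```python
from fnmatch import fnmatch
from pathlib import Path
from typing import Optional

# HYPOTHETICAL mapping from filename glob to a language name.
LANGUAGE_PATTERNS = {
    ".nextflow.log*": "nextflow",  # matches .nextflow.log, .nextflow.log.6, ...
}


def detect_language(path: Optional[str], forced: Optional[str] = None) -> Optional[str]:
    """Pick a language: explicit choice first, then filename match, else None."""
    if forced:
        # An explicit choice (e.g. from a CLI flag) always wins.
        return forced
    if path is None:
        # Nothing to match on, e.g. when reading from stdin.
        return None
    name = Path(path).name
    for pattern, language in LANGUAGE_PATTERNS.items():
        if fnmatch(name, pattern):
            return language
    return None
```

Returning `None` leaves the default set of highlighters in place, so other log types would be unaffected unless the filename matches or the user asks for a language explicitly.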

How does this sound? If you like the idea, I can close this PR and open a different one with some entrypoints that I can start using to port the code into the nf-core codebase.

Cheers,

Phil

@ewels ewels mentioned this pull request Mar 22, 2024

ewels commented Mar 22, 2024

Closing in favour of #47 and nf-core/tools#2895

@ewels ewels closed this Mar 22, 2024