Plugin descriptors with several instances with different arguments #138

Open
magicDGS opened this issue May 22, 2018 · 4 comments

Comments

@magicDGS
Contributor

The normal use case for plugin descriptors in GATK is to have a single instance of each class applied to the data; providing the same class twice is not allowed (e.g., read/variant filters). Nevertheless, in some cases it makes sense to apply two different instances of the same class to the data with different arguments, and I cannot find a way to implement that with the current system.

As an example, let's say that we would like to implement a ReadFilter plugin different from GATK's and a filter called IntegerTagReadFilter with two arguments: --int-filter-tag (String) and --int-filter-tag-value (Integer). The user should then be able to provide the following command line: --myReadFilter IntegerTagReadFilter --int-filter-tag NM --int-filter-tag-value 2 --myReadFilter IntegerTagReadFilter --int-filter-tag AS --int-filter-tag-value 3.

This can be implemented as a single filter whose arguments are specified as List<String> and List<Integer>, following the same implementation as GATK's ReadFilter. Nevertheless, this is not desirable here, because we would like to keep every instance separate so that we can count the number of reads in each filter.
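
To make the drawback concrete, here is a minimal sketch of the list-based workaround. Everything in it is hypothetical: the class name, the use of a plain Map<String, Integer> as a stand-in for a read's integer tags, and the assumption that the filter keeps a read only when each tag equals its configured value. It does not use the real GATK or Barclay classes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical single-instance filter configured with two parallel lists. */
final class ListBasedIntegerTagFilter {
    private final List<String> tags;     // stand-in for the --int-filter-tag values
    private final List<Integer> values;  // stand-in for the --int-filter-tag-value values
    private long filteredOut = 0;        // one shared counter for all tag/value pairs

    ListBasedIntegerTagFilter(List<String> tags, List<Integer> values) {
        if (tags.size() != values.size()) {
            throw new IllegalArgumentException("each tag needs exactly one value");
        }
        this.tags = new ArrayList<>(tags);
        this.values = new ArrayList<>(values);
    }

    /** Assumed semantics: keep a read only if every configured tag has the expected value. */
    boolean test(Map<String, Integer> readTags) {
        for (int i = 0; i < tags.size(); i++) {
            // A missing tag yields null, which never equals the expected value.
            if (!values.get(i).equals(readTags.get(tags.get(i)))) {
                filteredOut++;
                return false;
            }
        }
        return true;
    }

    long getFilteredOut() {
        return filteredOut;
    }
}
```

Because there is only one instance, there is only one counter, so the summary cannot say how many reads were removed by the NM pair and how many by the AS pair without extra bookkeeping inside the filter.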

This is just a toy example, but it does not look like the plugin system allows this kind of implementation. Maybe I am missing something...

@cmnbroad
Collaborator

cmnbroad commented May 22, 2018

@magicDGS In addition to plugin framework changes, I think this would require parser changes. Currently the parser assumes all command line argument names are unique and independent (in the sense that command line order doesn't matter). This would require some kind of name qualification or grouping mechanism so that --int-filter-tag NM --int-filter-tag-value 2 would target one instance of a class, and --int-filter-tag AS --int-filter-tag-value 3 would target a different instance.
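
One possible shape for such a grouping mechanism, sketched under loud assumptions: arguments are attached to the most recent occurrence of the plugin flag, so command line order becomes significant. The class and method names are hypothetical and this is not how the current parser behaves; it also assumes every flag takes exactly one value.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical order-sensitive grouping of arguments by plugin occurrence. */
final class PositionalGrouping {

    /** One plugin occurrence plus the arguments that followed it on the command line. */
    static final class Group {
        final String pluginName;
        final Map<String, String> args = new LinkedHashMap<>();

        Group(String pluginName) {
            this.pluginName = pluginName;
        }
    }

    /** Splits flag/value pairs into groups, starting a new group at each pluginFlag. */
    static List<Group> group(String pluginFlag, String... tokens) {
        if (tokens.length % 2 != 0) {
            throw new IllegalArgumentException("expected flag/value pairs");
        }
        List<Group> groups = new ArrayList<>();
        for (int i = 0; i < tokens.length; i += 2) {
            String flag = tokens[i];
            String value = tokens[i + 1];
            if (flag.equals(pluginFlag)) {
                groups.add(new Group(value));      // a new plugin instance starts here
            } else if (groups.isEmpty()) {
                throw new IllegalArgumentException(flag + " appears before any " + pluginFlag);
            } else {
                groups.get(groups.size() - 1).args.put(flag, value);  // belongs to the latest instance
            }
        }
        return groups;
    }
}
```

Applied to the command line from the first comment, this would yield two groups for IntegerTagReadFilter, one holding NM/2 and one holding AS/3. The cost is that the parser's current assumption of order-independent, unique argument names no longer holds.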

@magicDGS
Contributor Author

I see the problem with that, and I don't have a concrete solution. One idea is to treat plugins as tagged arguments or something similar. For example, --myReadFilter IntegerTagReadFilter:NM=2 --myReadFilter IntegerTagReadFilter:AS=3. This might require changing how plugins are handled and providing a way to populate @Argument fields from tag-like strings and to show them that way in the CLI help.
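
A rough sketch of how a tag-like string in that form could be split into a plugin name and its per-instance arguments. The Name:TAG=VALUE syntax is only the idea floated above, and the class below is hypothetical; nothing like it exists in the parser today.

```java
/** Hypothetical parser for specs of the form PluginName:TAG=VALUE. */
final class TaggedPluginSpec {
    final String pluginName;
    final String tag;
    final int value;

    private TaggedPluginSpec(String pluginName, String tag, int value) {
        this.pluginName = pluginName;
        this.tag = tag;
        this.value = value;
    }

    /** Parses e.g. "IntegerTagReadFilter:NM=2"; rejects specs with an empty part. */
    static TaggedPluginSpec parse(String spec) {
        int colon = spec.indexOf(':');
        int eq = spec.indexOf('=', colon + 1);
        if (colon <= 0 || eq <= colon + 1 || eq == spec.length() - 1) {
            throw new IllegalArgumentException("expected PluginName:TAG=VALUE but got: " + spec);
        }
        String name = spec.substring(0, colon);
        String tag = spec.substring(colon + 1, eq);
        // Integer.parseInt throws NumberFormatException for a non-integer value.
        int value = Integer.parseInt(spec.substring(eq + 1));
        return new TaggedPluginSpec(name, tag, value);
    }
}
```

Each --myReadFilter occurrence would then produce its own spec, and the plugin descriptor could create one filter instance per spec. How such values would be mapped onto @Argument fields and rendered in the help output is left open here.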

@cmnbroad
Collaborator

If it's mostly about the summary counts, it might be easier to explore ways to allow read filters to have custom counting and summary display behavior.

@magicDGS
Contributor Author

@cmnbroad - I gave that as an example, but I am implementing other plugins to compute statistics from reads. A simple example is counting separately the number of reads with NM=2, NM=3 and AS=4 per window. As I said, I can always implement this with List arguments, but I would rather have one instance for each of them and a common implementation of a simple "counter" for a tag-value pair.
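
For illustration, the kind of per-pair "counter" meant here could look like the sketch below. The class name is hypothetical, a Map<String, Integer> again stands in for a read's integer tags, and windowing is left out entirely.

```java
import java.util.Map;

/** Hypothetical counter for reads carrying one specific integer tag value. */
final class TagValueCounter {
    private final String tag;
    private final int value;
    private long count = 0;

    TagValueCounter(String tag, int value) {
        this.tag = tag;
        this.value = value;
    }

    /** Increments the count when the read has this counter's tag with exactly this value. */
    void accept(Map<String, Integer> readTags) {
        Integer observed = readTags.get(tag);
        if (observed != null && observed == value) {
            count++;
        }
    }

    long getCount() {
        return count;
    }

    String label() {
        return tag + "=" + value;
    }
}
```

With one instance each for NM=2, NM=3 and AS=4, every read is passed to all three counters and each keeps its own total, which is exactly what a single class limited to one instance cannot express without List arguments.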
