Refactor #31
Closed
Added a connections parameter
…ecide if it is better to read the README or the website documentation
…at the plugins model can be specified by the user. Still need to determine how to do this but this is the first step
--- 1. Allowed the user to set an environment variable called IBDCLUSTER_CUSTOM_PLUGINS. The user can turn the stock plugins on or off, so the program either uses only the custom plugins or uses the custom plugins together with the main plugins.
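A minimal sketch of how such a lookup might work. Only the variable name IBDCLUSTER_CUSTOM_PLUGINS comes from the commit message; the toggle variable `IBDCLUSTER_USE_STOCK_PLUGINS`, the `plugins` directory, and the function name are hypothetical and may not match the project.

```python
import os
from pathlib import Path

# Named in the commit message.
CUSTOM_ENV = "IBDCLUSTER_CUSTOM_PLUGINS"
# Hypothetical: the commit does not say how the stock plugins are switched off.
STOCK_TOGGLE_ENV = "IBDCLUSTER_USE_STOCK_PLUGINS"
# Hypothetical location of the stock plugins.
STOCK_PLUGIN_DIR = Path("plugins")


def plugin_directories(environ=None):
    """Return the plugin directories to search, custom directory first."""
    env = os.environ if environ is None else environ
    directories = []

    custom = env.get(CUSTOM_ENV)
    if custom:
        directories.append(Path(custom))

    # Stock plugins stay enabled unless the toggle is explicitly falsy.
    toggle = env.get(STOCK_TOGGLE_ENV, "1").strip().lower()
    if toggle not in {"0", "false", "no"}:
        directories.append(STOCK_PLUGIN_DIR)

    return directories
```

Passing the environment as an argument keeps the function easy to test without modifying `os.environ`.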
Plugin configuration
…ronmental variables
Bumps [ipython](https://github.com/ipython/ipython) from 8.4.0 to 8.10.0. - [Release notes](https://github.com/ipython/ipython/releases) - [Commits](ipython/ipython@8.4.0...8.10.0) --- updated-dependencies: - dependency-name: ipython dependency-type: indirect ... Signed-off-by: dependabot[bot] <support@github.com>
Bump ipython from 8.4.0 to 8.10.0
Bumps [jupyter-core](https://github.com/jupyter/jupyter_core) from 4.10.0 to 4.11.2. - [Release notes](https://github.com/jupyter/jupyter_core/releases) - [Changelog](https://github.com/jupyter/jupyter_core/blob/main/CHANGELOG.md) - [Commits](jupyter/jupyter_core@4.10.0...4.11.2) --- updated-dependencies: - dependency-name: jupyter-core dependency-type: indirect ... Signed-off-by: dependabot[bot] <support@github.com>
…4.11.2 Bump jupyter-core from 4.10.0 to 4.11.2
… version of program to 1.2.1 to represent this change
…eing improperly formatted. Also recreated the environment.yml and requirements.txt files so that they reflect the need for Python version >= 3.10, not >= 3.8
----
1. Added the ability for the user to implement a sliding window approach to determine loci of interest: The user can now pass the flag --sliding-window to the program. This value defaults to false. If true, the program expects the gene info file to have just 1 line, and the program will create a range from the loci start to the loci end with a step size of 1000. This range is inclusive of the end. The program then creates a Genes namedtuple that it uses in the genes_generator.
2. Updated the cluster.load_gene_info function: This function now accepts an argument called sliding_window, which will be either true or false. The value of this argument is checked in a match statement. If True, the program will read the only line in the genes file into memory and then extract the full range of the loci. It will then create 1 MB steps to the inclusive end of the range. Note: the final step may be shorter than 1 MB, but is not expected to be > 1 MB. It will then yield a Genes namedtuple where the name is the first column in the genes info file with the range for each window appended to it. If sliding_window == False, the program behaves as before.
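The windowing described above can be sketched roughly as follows. This is not the project's load_gene_info implementation: the function name, the field names other than `chr`, and the window naming scheme are assumptions, and the step is a parameter because the message mentions both a step of 1000 and 1 MB steps.

```python
from typing import Iterator, NamedTuple


class Genes(NamedTuple):
    # Field names are assumed; the commits only confirm a `chr` attribute.
    name: str
    chr: int
    start: int
    end: int


def sliding_windows(
    name: str, chrom: int, start: int, end: int, step: int = 1_000_000
) -> Iterator[Genes]:
    """Yield consecutive windows that cover start..end, inclusive of end."""
    window_start = start
    while window_start < end:
        # The last window is clipped so it never runs past the locus end.
        window_end = min(window_start + step, end)
        yield Genes(f"{name}_{window_start}-{window_end}", chrom, window_start, window_end)
        window_start = window_end
```

For a 2.5 MB locus this produces two full 1 MB windows and one final 0.5 MB window.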
… the Genes namedtuple was not being created correctly when the sliding window option was true
---
1. Created unit tests for the cluster.main.load_gene_info method. This method is responsible for reading in information about the loci, such as gene name, chromosome, start and end position, from a gene info file provided by the user. The method returns a generator that yields a Genes object, which is a namedtuple that has attributes for this information. Created a class, TestLociLoader, that has three methods: test_load_loci_info_no_sliding_window, test_values_of_sliding_windows_formed, test_number_of_sliding_windows_formed.
* *test_load_loci_info_no_sliding_window*: tests when the user doesn't use a sliding window. Checks to make sure the namedtuple has the proper keys
* *test_values_of_sliding_windows_formed*: tests when the user uses a sliding window. Makes sure that the first window formed has the correct name, chromosome, start and end position
* *test_number_of_sliding_windows_formed*: tests the number of sliding windows formed for the range in the gene_info.txt file.
Sliding window
made sure the chr in Genes namedtuple is an integer
…oups for the user, testing, and development
…dependencies. The development dependencies are optional so the user will have to specify that those should be installed if they wish to install them.
---
1. Removed the gene info file input: Replaced this flag with 2 flags: gene_position and gene_name. These can be provided by the user. The motivation is that the program was not designed to run multiple gene targets from the gene info file, so this file was restricted to one line anyway and was unnecessary. This input is recorded in the log file, so the user can still tell what they ran. The --gene-position flag expects the user to provide a string of the format "chromosome:start_position-end_position". The --gene-name flag allows the user to give a name to the gene. This flag defaults to test if the user doesn't provide anything.
2. Updated the load_gene_info function in the cluster.main file: This function now returns a list of Genes instead of a generator. The function uses a regular expression to pull out the chromosome number and start/end positions from the provided chromo_pos_str. It still retains the ability to create a sliding window every 1 MB.
3. Added a callback to check the format of the --gene-position string: Added a function called check_gene_pos_str in callbacks.check_inputs.py. This function makes sure that the string is formatted as "chromosome:start_position-end_position". The format is checked using re.split to make sure that the resulting split list is of length 3. If there is a format issue, then a ValueError is raised with a message.

Things to do:
---
These changes broke the unit test for the sliding window function, so this needs to be fixed.
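A rough sketch of the validation and parsing steps described above. The name check_gene_pos_str comes from the commit message, but its body here is only a guess at the described re.split check; `parse_gene_pos` and its regular expression are hypothetical stand-ins for the parsing done inside load_gene_info.

```python
import re


def check_gene_pos_str(gene_pos: str) -> str:
    """Validate the 'chromosome:start_position-end_position' format."""
    # Splitting on ':' and '-' should give exactly three pieces.
    parts = re.split(r"[:-]", gene_pos)
    if len(parts) != 3:
        raise ValueError(
            f"Expected 'chromosome:start_position-end_position', got '{gene_pos}'"
        )
    return gene_pos


def parse_gene_pos(gene_pos: str) -> tuple[int, int, int]:
    """Hypothetical helper: extract chromosome, start, and end as integers."""
    match = re.fullmatch(r"(\d+):(\d+)-(\d+)", gene_pos)
    if match is None:
        raise ValueError(f"Could not parse gene position '{gene_pos}'")
    chrom, start, end = (int(value) for value in match.groups())
    return chrom, start, end
```

Returning the validated string from the checker lets it be used directly as a conversion function for a command-line argument.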
…window was being done in kilobases and not MB
Removed the need for the gene info file:
…on a set
The append method was being called on a set in line 157 of case_file_parser.py. Sets do not have an append method, so the program was crashing. The call was switched to the add method, which is appropriate for sets.
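For illustration, the failure and the fix look like this in isolation. The variable names are hypothetical and are not taken from case_file_parser.py.

```python
cases = set()

# Calling append on a set raises AttributeError, because sets only provide add.
try:
    cases.append("id1")
except AttributeError as err:
    message = str(err)

# The fix: use add, which inserts the element if it is not already present.
cases.add("id1")
cases.add("id1")  # adding a duplicate leaves the set unchanged
```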
This project originally used typer because of some of the limited features of the argparse library. Now argparse has acceptable features so this push replaces all of the typer dependencies with argparse.
feat: switched from using typer to just argparse
…umn in a matrix
Added a new property to the PhenotypeFileParser called specific_phenotype. This property allows the user to provide a specific phenotype name to the DRIVE program using the argument '--phenotype-name'. This change allows the user to select a specific phenotype column from a file without having to recreate a bunch of phenotype files.
…not identifying the appropriate index if the user only wished to find a specific haplotype
The parser would attempt to get the index of the phenotype value in the list. By default it starts at zero, but if the user specified only one column, such as 10, the parser would first try to get the value at index 0, which returned None. This None value would then cause the later code to fail.
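One way to picture the phenotype selection described in the two commits above is to look the column up by name rather than walking indices from zero. The argument name '--phenotype-name' is from the commit message; the helper `phenotype_columns` and the assumption that the first header column holds individual IDs are hypothetical.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--phenotype-name",
        default=None,
        help="Analyze only this phenotype column (default: all columns).",
    )
    return parser


def phenotype_columns(header, specific_phenotype=None):
    """Map phenotype names to column indices, optionally keeping only one."""
    # Assumption: column 0 holds the individual IDs, so it is skipped.
    columns = {name: index for index, name in enumerate(header) if index > 0}
    if specific_phenotype is None:
        return columns
    if specific_phenotype not in columns:
        raise ValueError(f"Phenotype '{specific_phenotype}' not found in header")
    return {specific_phenotype: columns[specific_phenotype]}
```

Because the index is read from the header, a single requested column keeps its true position in the file instead of being treated as index 0.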
…ng ibd file chunks
When the ID columns were numeric instead of alphanumeric, pandas would read the columns in as mixed type. This affected a downstream process where the IDs and phase values are concatenated, because a type error was being thrown.
…hared pairwise IBD segments and graph was improperly formed
There was an edge case where the redopd dataframe would be empty when it was formed. This happened when none of the individuals in network.haplotypes shared pairwise segments. The empty dataframe would then be used in the generate_graph function, and instead of failing, the ig.Graph.DataFrame constructor would return an object with an empty list for the edge list (.es) attribute. The GraphBase.community_walktrap function would later try to use this attribute, and since it was empty the code would fail. Now DRIVE checks whether the redopd or redo_vs dataframes are empty. If they are empty, DRIVE continues on to the next network, except in debug mode, where DRIVE gives a logging message explaining that a graph could not be constructed during the reclustering of that network.
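The guard described above can be sketched without pandas or igraph. DRIVE itself checks whether dataframes are empty; this dependency-free version uses plain edge lists instead, and the function name, logger name, and data layout are all hypothetical.

```python
import logging

logger = logging.getLogger("drive_sketch")  # hypothetical logger name


def networks_to_recluster(networks):
    """Yield (network_id, edges) only for networks that have at least one edge.

    `networks` maps a network ID to a list of (individual_1, individual_2)
    pairs that share an IBD segment.
    """
    for network_id, edges in networks.items():
        if not edges:
            # Only visible when the logger is set to debug level.
            logger.debug(
                "Network %s: no shared pairwise IBD segments, "
                "so a graph could not be constructed during reclustering",
                network_id,
            )
            continue
        yield network_id, edges
```

Skipping the empty case before graph construction avoids handing an edgeless graph to the community detection step.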
…eflect the change in the code file
…because it is more efficient
refactor: fixed styling errors and a type annotation
Added an autofix option
Adding an autofix feature on the push
Update black_on_push.yml
Update black.yml
updated version of DRIVE that catches the error when the graphs cannot be properly constructed because individuals do not share pairwise IBD segments with each other