Refactor #31

jtb324 · 2024-02-23T18:24:26Z

Updated version of DRIVE that catches the error raised when the graphs cannot be properly constructed because individuals do not share pairwise IBD segments with each other.

Added a connections parameter

…ecide if it is better to read the README or the website documentation

…at the plugins model can be specified by the user. Still need to determine how to do this but this is the first step

1. Allowed the user to create an environment variable called IBDCLUSTER_CUSTOM_PLUGINS. The user can turn the stock plugins on or off, so the program either uses only the custom plugins or uses the custom plugins together with the main plugins.

Plugin configuration

…ronmental variables

Bumps [ipython](https://github.com/ipython/ipython) from 8.4.0 to 8.10.0.

- [Release notes](https://github.com/ipython/ipython/releases)
- [Commits](ipython/ipython@8.4.0...8.10.0)

---
updated-dependencies:
- dependency-name: ipython
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>

Bump ipython from 8.4.0 to 8.10.0

Bumps [jupyter-core](https://github.com/jupyter/jupyter_core) from 4.10.0 to 4.11.2.

- [Release notes](https://github.com/jupyter/jupyter_core/releases)
- [Changelog](https://github.com/jupyter/jupyter_core/blob/main/CHANGELOG.md)
- [Commits](jupyter/jupyter_core@4.10.0...4.11.2)

---
updated-dependencies:
- dependency-name: jupyter-core
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>

…4.11.2 Bump jupyter-core from 4.10.0 to 4.11.2

… version of program to 1.2.1 to represent this change

…eing improperly formatted. Also recreated the environment.yml and requirements.txt file so that it reflects the need for python version >= 3.10 not >= 3.8

1. Added the ability for the user to implement a sliding window approach to determine loci of interest: The user can now pass the flag --sliding-window to the program. This value defaults to false. If true, the program expects the gene info file to have just 1 line, and it creates a range from the loci start to the loci end with a step size of 1000. This range is inclusive of the end. The program then creates a Genes namedtuple that it uses in the genes_generator.

2. Updated the cluster.load_gene_info function: This function now accepts an argument called sliding_window, which is either true or false. The value of this argument is checked in a match statement. If True, the program reads the only line in the genes file into memory and extracts the full range of the loci. It then creates 1 MB steps up to the inclusive end of the range. Note: the final step may not be 1 MB, but it is not expected to be > 1 MB. The function then yields a Genes namedtuple whose name is the first column in the genes info file with the range for each window appended to it. If sliding_window == False, the program behaves as before.
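The windowing described above can be sketched roughly as follows. This is a minimal illustration, not the code in cluster.load_gene_info: the field names of Genes, the helper name sliding_windows, the way the range is appended to the name, and reading 1 MB as 1,000,000 base pairs are all assumptions.

```python
from collections import namedtuple
from typing import Iterator

# Hypothetical field names; the source only says the tuple holds the
# name, chromosome, start, and end position of each locus.
Genes = namedtuple("Genes", ["name", "chrom", "start", "end"])

WINDOW_SIZE = 1_000_000  # assumed meaning of "1 MB"


def sliding_windows(
    name: str, chrom: int, start: int, end: int, step: int = WINDOW_SIZE
) -> Iterator[Genes]:
    """Yield consecutive windows covering start..end, inclusive of the end."""
    window_start = start
    while window_start < end:
        # The final window is clipped to the end of the range, so it may be
        # shorter than `step` but never longer.
        window_end = min(window_start + step, end)
        yield Genes(f"{name}_{window_start}-{window_end}", chrom, window_start, window_end)
        window_start = window_end


windows = list(sliding_windows("GENE", 1, 0, 2_500_000))
```

Here a 2.5 MB range produces two full windows and one shorter final window.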

… the Genes namedtuple was not being created correctly when the sliding window option was true

1. Created a unit test for the cluster.main.load_gene_info method. This method is responsible for reading in information about the loci, such as gene name, chromosome, start and end position, from a gene info file provided by the user. The method returns a generator that yields a Genes object, which is a namedtuple that has attributes for this information. Created a class, TestLociLoader, that has three methods: test_load_loci_info_no_sliding_window, test_values_of_sliding_windows_formed, test_number_of_sliding_windows_formed.

* *test_load_loci_info_no_sliding_window*: tests the case where the user doesn't use a sliding window. Checks that the namedtuple has the proper keys.
* *test_values_of_sliding_windows_formed*: tests the case where the user uses a sliding window. Makes sure that the first window formed has the correct name, chromosome, start and end position.
* *test_number_of_sliding_windows_formed*: tests the number of sliding windows formed for the range in the gene_info.txt file.

Sliding window

made sure the chr in Genes namedtuple is an integer

…oups for the user, testing, and development

…dependencies. The development dependencies are optional so the user will have to specify that those should be installed if they wish to install them.

1. Removed the gene info file input: Replaced this flag with 2 flags, gene_position and gene_name, which can be provided by the user. The motivation is that the program was not designed to run multiple gene targets from the gene info file, so this file was restricted to one line anyway and was unnecessary. This input is recorded in the log file so the user can still tell what they ran. The --gene-position flag expects the user to provide a string of the format "chromosome:start_position-end_position". The --gene-name flag allows the user to give a name to the gene. This flag defaults to test if the user doesn't provide anything.

2. Updated the load_gene_info function in the cluster.main file: This function now returns a list of Genes instead of a generator. The function uses a regular expression to pull out the chromosome number and start/end positions from the provided chromo_pos_str. It still retains the ability to create a sliding window every 1 MB.

3. Added a callback to check the format of the --gene-position string: Added a function called check_gene_pos_str in callbacks.check_inputs.py. This function makes sure that the string is formatted as "chromosome:start_position-end_position". The format is checked using re.split to make sure that the resulting split list is of length 3. If there is a format issue, then a ValueError is raised with a message.

Things to do:
---
These changes broke the unit test for the sliding window function, so this needs to be fixed.
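A format check of the kind described above might look like the sketch below. The function name check_gene_pos_str and the use of re.split with a length-3 check come from the commit message; the exact split pattern, the error text, the helper parse_gene_pos, and the assumption that the chromosome is a plain integer are illustrative guesses.

```python
import re


def check_gene_pos_str(gene_pos: str) -> str:
    """Return gene_pos unchanged if it splits into chromosome, start, and end."""
    # Assumed pattern: split on either ':' or '-'.
    parts = re.split(r"[:-]", gene_pos)
    if len(parts) != 3:
        raise ValueError(
            f"Expected 'chromosome:start_position-end_position', got '{gene_pos}'"
        )
    return gene_pos


def parse_gene_pos(gene_pos: str) -> tuple[int, int, int]:
    """Hypothetical helper: convert a validated string into integers."""
    chrom, start, end = re.split(r"[:-]", check_gene_pos_str(gene_pos))
    return int(chrom), int(start), int(end)


parsed = parse_gene_pos("10:1234-5678")  # (10, 1234, 5678)
```

A string such as "10:1234" splits into only two parts and therefore raises the ValueError.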

…window was being done in kilobases and not MB

Removed the need for the gene info file:

…on a set

The append method was being called on a set on line 157 of case_file_parser.py. Sets have no append method, so the program was crashing. The call was switched to the add method, which is the appropriate method for sets.
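For reference, the failure mode is easy to reproduce outside of DRIVE. The variable names below are made up for illustration and are not taken from case_file_parser.py.

```python
case_ids = set()

try:
    # Sets do not define append, so this raises AttributeError.
    case_ids.append("id1")
except AttributeError as err:
    error_message = str(err)

# add is the set equivalent; adding the same value twice is a no-op.
case_ids.add("id1")
case_ids.add("id1")
```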

This project originally used typer because of some of the limited features of the argparse library. Now argparse has acceptable features so this push replaces all of the typer dependencies with argparse.

feat: switched from using typer to just argparse

…umn in a matrix

Added a new property to the PhenotypeFileParser called specific_phenotype. This property allows the user to provide a specific phenotype name to the drive program using the argument '--phenotype-name'. This change allows the user to select a specific phenotype column from a file without having to recreate a bunch of phenotype files.
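Selecting a single named column from a phenotype matrix can be sketched as below. This is not the PhenotypeFileParser implementation: the function name, the tab-separated layout, and the assumption that the first column holds individual IDs are all hypothetical.

```python
import csv
import io
from typing import TextIO


def read_phenotype_column(
    handle: TextIO, phenotype_name: str, sep: str = "\t"
) -> dict[str, str]:
    """Map each individual ID (assumed first column) to one phenotype value."""
    reader = csv.reader(handle, delimiter=sep)
    header = next(reader)
    if phenotype_name not in header:
        raise ValueError(f"Phenotype '{phenotype_name}' not found in header")
    # Look up the column position by name instead of assuming it starts at 0.
    column_index = header.index(phenotype_name)
    return {row[0]: row[column_index] for row in reader}


example = "grid\tphenoA\tphenoB\nid1\t1\t0\nid2\t0\t1\n"
statuses = read_phenotype_column(io.StringIO(example), "phenoB")
```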

…not identifying the appropriate index if the user only wished to find a specific haplotype

The parser would attempt to get the index of where the phenotype value was in the list. By default it starts at zero, but if the user only wants to specify a specific column, a bug appeared: if the user specified a column such as 10, the parser would first attempt to get the value at index 0, which would return a None value. This None value would then cause the later code to fail.

…ng ibd file chunks

When the id columns were numeric instead of alphanumeric, pandas would read in the columns as mixed type. This affected a downstream process where the ids and phase values are concatenated, because a type error was being thrown.
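One common way to avoid this class of error is to tell pandas to read the ID and phase columns as strings, so that concatenation always operates on text. Whether DRIVE's fix does exactly this is not stated here; the column names, separator, and sample data below are invented for illustration.

```python
import io

import pandas as pd

# Invented sample: numeric IDs, each followed by a phase value.
ibd_text = "1001\t1\t1002\t2\n1003\t2\t1004\t1\n1005\t1\t1006\t2\n"

chunks = pd.read_csv(
    io.StringIO(ibd_text),
    sep="\t",
    header=None,
    names=["id1", "phase1", "id2", "phase2"],  # hypothetical column names
    dtype=str,  # force text so numeric-looking IDs are not parsed as numbers
    chunksize=2,
)

haplotypes = []
for chunk in chunks:
    # String concatenation is safe because every column is read as text.
    haplotypes.extend((chunk["id1"] + "." + chunk["phase1"]).tolist())
```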

…hared pairwise IBD segments and graph was improperly formed

There was an edge case in which redopd would be empty when it was formed. This happened when none of the individuals in network.haplotypes shared pairwise segments. The empty dataframe would then be used in the generate_graph function, and instead of failing, the ig.Graph.DataFrame constructor would return an object with an empty edge list (the .es attribute). The GraphBase.community_walktrap function would later try to use this attribute, and since it was empty the code would fail. Now DRIVE checks whether the redopd or redo_vs dataframes are empty. If they are, DRIVE continues on to the next network; in debug mode, DRIVE also gives a logging message explaining that a graph could not be constructed during the reclustering of that network.
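The guard described above can be sketched as follows. Only the names redopd and redo_vs come from the commit message; the function recluster_networks, the build_graph callback (a stand-in for the real graph construction), and the shape of the input are assumptions made to keep the example self-contained.

```python
import logging
from typing import Callable

import pandas as pd

logger = logging.getLogger(__name__)


def recluster_networks(
    networks: dict[str, tuple[pd.DataFrame, pd.DataFrame]],
    build_graph: Callable[[pd.DataFrame, pd.DataFrame], object],
) -> dict[str, object]:
    """Build a graph per network, skipping networks with no edges or vertices."""
    graphs = {}
    for network_id, (redopd, redo_vs) in networks.items():
        if redopd.empty or redo_vs.empty:
            # Only visible when the logger is configured at DEBUG level.
            logger.debug(
                "Could not construct a graph while reclustering network %s",
                network_id,
            )
            continue
        graphs[network_id] = build_graph(redopd, redo_vs)
    return graphs
```

Checking DataFrame.empty before graph construction keeps the failure from surfacing later, inside the community detection step.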

…eflect the change in the code file

Refactor

… performance

…lower than open

…because it is more efficient

Refactor

refactor: fixed styling errors and a type annotation

Added an autofix option

Adding an autofix feature on the push

Update black_on_push.yml

Update black.yml

jtb324 and others added 30 commits August 15, 2022 12:00

Merge pull request jtb324#21 from jtb324/refactor

c9716f5

Added a connections parameter

moved the documentation link to the top of the README so people can d…

938ca73

…ecide if it is better to read the README or the website documentation

Merge branch 'main' of https://github.com/jtb324/IBDCluster

be9b542

moved the factory.py and loader.py to a different factory model so th…

79ebdb9

…at the plugins model can be specified by the user. Still need to determine how to do this but this is the first step

updated docstrings

1321e64

Merge pull request jtb324#22 from jtb324/plugin_configuration

dc79f62

Plugin configuration

updated the poetry minor number to reflect the new change to the envi…

e7e243c

…ronmental variables

Merge pull request jtb324#23 from jtb324/dependabot/pip/ipython-8.10.0

716d99a

Bump ipython from 8.4.0 to 8.10.0

Merge pull request jtb324#24 from jtb324/dependabot/pip/jupyter-core-…

bee0460

…4.11.2 Bump jupyter-core from 4.10.0 to 4.11.2

bumped pytest to version 7.2.1 to fix a security bug and then updated…

b47133c

… version of program to 1.2.1 to represent this change

updated the python version to 3.10

bc3493f

Recreated the lock file since there was an issue with the lock file b…

9c75cf9

…eing improperly formatted. Also recreated the environment.yml and requirements.txt file so that it reflects the need for python version >= 3.10 not >= 3.8

updated the lock file with the new dependencies

dea2e21

fixed a bug in the cluster.load_gene_info function where the name for…

377bbd0

… the Genes namedtuple was not being created correctly when the sliding window option was true

Merge pull request jtb324#25 from jtb324/sliding_window

0e12c64

Sliding window

made sure the chr in Genes namedtuple is an integer

efd5ca3

Merge pull request jtb324#26 from jtb324/sliding_window

42a5eb5

made sure the chr in Genes namedtuple is an integer

updated the pyproject.toml to use dependency groups. There are now gr…

f00c11d

…oups for the user, testing, and development

updated the pyproject.toml to use groups for testing and development …

46cec84

…dependencies. The development dependencies are optional so the user will have to specify that those should be installed if they wish to install them.

added toml to a dependency

8eeb920

changed to python 3.11 and adjusted dependencies for that

3f12f37

updated the python version to 3.11

5138fa1

fixed a typo in the IBDCluster.py and fixed a typo where the sliding …

b7855c4

…window was being done in kilobases and not MB

Merge pull request jtb324#27 from jtb324/hh_version_refactor

28e9556

Removed the need for the gene info file:

jtb324 and others added 12 commits September 27, 2023 12:23

docs(README.md): updated the readme to reflect current versioning

0e0886d

build: updated pyproject to have the correct patch number

e212f57

feat: switched from using typer to just argparse

cff903b

This project originally used typer because of some of the limited features of the argparse library. Now argparse has acceptable features so this push replaces all of the typer dependencies with argparse.

Merge branch 'main' into refactor

f6c3d6c

Merge pull request #5 from belowlab/refactor

192dbb4

feat: switched from using typer to just argparse

refactor(drive.py): fixed a typo in the recluster flag help message

3134c76

docs(pyproject.toml): updated version number and added a comment to r…

d741aab

…eflect the change in the code file

jtb324 added the bug Something isn't working label Feb 23, 2024

jtb324 and others added 16 commits February 23, 2024 12:25

Merge pull request #6 from belowlab/refactor

750acb9

Refactor

feat(case_file_parser.py): Changed the parser to use xopen for better…

0fd706a

… performance

refactor(pyproject.toml): updated the version number

94e6f06

refactor: removed the xopen dependencies because it might have been s…

56c622c

…lower than open

refactor(case_file_parser.py): changed to using pandas in the parser …

838c6c8

…because it is more efficient

Merge pull request #7 from belowlab/refactor

eeec05d

Refactor

refactor: fixed styling errors and a type annotation

246c563

Merge pull request #8 from belowlab/refactor

97b966b

refactor: fixed styling errors and a type annotation

Update black_on_push.yml

9a20da0

Added an autofix option

Update black_on_push.yml

5324e19

Adding an autofix feature on the push

Merge pull request #9 from belowlab/refactor

28ea380

Update black_on_push.yml

Update black.yml

3dcf31b

Merge pull request #10 from belowlab/refactor

1477ded

Update black.yml

ci(black.yml): added an autofix flag

0c01400

ci(black.yml): added an autofix option

9eda08a

ci: removed the black linting because it is not working at the moment

da79d05

jtb324 closed this Mar 8, 2024
