Refactor #31

Closed
wants to merge 253 commits into from

Conversation

jtb324
Owner

@jtb324 jtb324 commented Feb 23, 2024

Updated version of DRIVE that catches the error raised when a graph cannot be constructed because the individuals do not share pairwise IBD segments with each other.

jtb324 and others added 30 commits August 15, 2022 12:00
Added a connections parameter
…ecide if it is better to read the README or the website documentation
…at the plugins model can be specified by the user. Still need to determine how to do this but this is the first step
---
1. Allowed the user to create an environment variable called
IBDCLUSTER_CUSTOM_PLUGINS. The user can turn the stock plugins on or
off, so the program uses either the custom plugins alone or the custom
plugins together with the main plugins.
Bumps [ipython](https://github.com/ipython/ipython) from 8.4.0 to 8.10.0.
- [Release notes](https://github.com/ipython/ipython/releases)
- [Commits](ipython/ipython@8.4.0...8.10.0)

---
updated-dependencies:
- dependency-name: ipython
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Bumps [jupyter-core](https://github.com/jupyter/jupyter_core) from 4.10.0 to 4.11.2.
- [Release notes](https://github.com/jupyter/jupyter_core/releases)
- [Changelog](https://github.com/jupyter/jupyter_core/blob/main/CHANGELOG.md)
- [Commits](jupyter/jupyter_core@4.10.0...4.11.2)

---
updated-dependencies:
- dependency-name: jupyter-core
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
…4.11.2

Bump jupyter-core from 4.10.0 to 4.11.2
… version of program to 1.2.1 to represent this change
…eing improperly formatted. Also recreated the environment.yml and requirements.txt file so that it reflects the need for python version >= 3.10 not >= 3.8
----
1. Added the ability for the user to implement a sliding window approach
to determine loci of interest:

The user can now pass the flag --sliding-window to the program. This
value defaults to false. If true then the program expects the gene info
file to just have 1 line and the program will create a range from the
loci start to loci end with a step size of 1000. This range is inclusive
of the end. The program then creates a Genes namedtuple that it uses in
the genes_generator.

2. Updated the cluster.load_gene_info function:

This function now accepts an argument called sliding_window, which is
either True or False. The value of this argument is checked in a match
statement. If True, the program reads the only line in the genes file
into memory and extracts the full range of the loci. It then creates
1 MB steps up to the inclusive end of the range. Note: the final step
may be shorter than 1 MB, but is not expected to be > 1 MB. It then
yields a Genes namedtuple whose name is the first column in the genes
info file with the range for each window appended to it. If
sliding_window == False, the program behaves as before.
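
A minimal sketch of the windowing described above, not the project's actual code. The field names of `Genes`, the helper name `sliding_windows`, and the `name_start-end` naming format are assumptions made for illustration; the commit message only says that the range is appended to the name.

```python
from collections import namedtuple
from typing import Iterator

# Hypothetical field names: the commit message only says Genes is a namedtuple.
Genes = namedtuple("Genes", ["name", "chr", "start", "end"])


def sliding_windows(
    name: str, chrom: int, start: int, end: int, step: int = 1_000_000
) -> Iterator[Genes]:
    """Yield consecutive windows covering start..end.

    The last window is clipped to `end`, so it may be shorter than `step`.
    """
    for window_start in range(start, end, step):
        window_end = min(window_start + step, end)
        # Assumed naming scheme: append the window range to the locus name.
        yield Genes(f"{name}_{window_start}-{window_end}", chrom, window_start, window_end)


windows = list(sliding_windows("GENE", 1, 0, 2_500_000))
```

With a 2.5 MB locus this produces two full 1 MB windows and one final 0.5 MB window.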
… the Genes namedtuple was not being created correctly when the sliding window option was true
---
1. Created unit test for the cluster.main.load_gene_info method

This method is responsible for reading in information about the loci,
such as gene name, chromosome, and start and end position, from a gene
info file provided by the user. The method returns a generator that
yields a Genes object, a namedtuple with an attribute for each of these
values.
Created a class, TestLociLoader, that has three methods:
test_load_loci_info_no_sliding_window,
test_values_of_sliding_windows_formed,
test_number_of_sliding_windows_formed.

* *test_load_loci_info_no_sliding_window*: test when the user doesn't
use a sliding window. Checks to make sure the named tuple has the proper
keys

* *test_values_of_sliding_windows_formed*: test when the user uses a
sliding window. Makes sure that the first window formed has the correct
name, chromosome, start and end position

* *test_number_of_sliding_windows_formed*: test the number of sliding
windows formed for the range in the gene_info.txt file.
made sure the chr in Genes namedtuple is an integer
…dependencies. The development dependencies are optional so the user will have to specify that those should be installed if they wish to install them.
---
1. Removed the gene info file input:

Replaced this flag with 2 flags: gene_position and gene_name. These can
be provided by the user. The motivation is that the program was not
designed to run multiple gene targets from the gene info file, so the
file was restricted to one line anyway and was unnecessary.
This input is recorded in the log file so the user can still tell what
they ran. The --gene-position flag expects the user to provide a string
of the format "chromosome:start_position-end_position". The --gene-name
flag allows the user to give a name to the gene. This flag defaults to
"test" if the user doesn't provide anything.

2. Updated the load_gene_info function in the cluster.main file:

This function now returns a list of Genes instead of a generator. The
function uses a regular expression to pull out the chromosome number and
start/end positions from the provided chromo_pos_str. It still retains
the ability to create a sliding window every 1 MB.

3. Add a callback to check the format of the --gene-position string:

Added a function called check_gene_pos_str in callbacks.check_inputs.py.
This function makes sure that the string is formatted as
"chromosome:start_position-end_position". The format is checked by using
re.split and making sure that the resulting list has length 3.
If there is a format issue, then a ValueError is raised with a message.
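
As a rough illustration of the check described above: the function name comes from the commit message, but the signature, the split pattern, and the error text below are assumptions rather than the project's code.

```python
import re


def check_gene_pos_str(gene_pos: str) -> str:
    """Validate a 'chromosome:start_position-end_position' string.

    Splits on ':' and '-' and requires exactly three pieces, as the
    commit message describes. The pattern itself is an assumption.
    """
    parts = re.split(r"[:-]", gene_pos)
    if len(parts) != 3:
        raise ValueError(
            "Expected the gene position to be formatted as "
            f"'chromosome:start_position-end_position', got '{gene_pos}'"
        )
    return gene_pos
```

A length check alone accepts any three pieces, so it does not verify that the positions are numeric; a stricter callback could also test each piece with `str.isdigit()`.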

Things to do:
---
These changes broke the unit test for the sliding window function, so
this needs to be fixed.
…window was being done in kilobases and not MB
Removed the need for the gene info file:
jtb324 and others added 12 commits September 27, 2023 12:23
…on a set

The append method was being called on a set on line 157 of case_file_parser.py. Sets do not have an append method, so the program was crashing. The call was switched to the add method, which is the appropriate method for sets.
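
The difference can be seen in a few lines. This is a generic sketch of the fix, not the code from case_file_parser.py; `collect_cases` and its argument are hypothetical names.

```python
def collect_cases(ids: list[str]) -> set[str]:
    """Gather individual IDs into a set, dropping duplicates."""
    cases: set[str] = set()
    for individual_id in ids:
        # cases.append(individual_id) would raise AttributeError:
        # 'set' object has no attribute 'append'
        cases.add(individual_id)
    return cases


cases = collect_cases(["id1", "id2", "id1"])
```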
This project originally used typer because of some of the limited features of the argparse library. Now argparse has acceptable features so this push replaces all of the typer dependencies with argparse.
feat: switched from using typer to just argparse
…umn in a matrix

Added a new property to the PhenotypeFileParser called specific_phenotype. This property allows the user to provide a specific phenotype name to the drive program using the argument '--phenotype-name'. This change allows the user to select a single phenotype column from a file without having to recreate a bunch of phenotype files.
…not identifying the appropriate index if the user only wished to find a specific haplotype

The parser would attempt to get the index of the phenotype value in the list. By default it starts at zero, but a bug appeared when the user only wanted a specific column: if the user specified a column such as 10, the parser would first try to get the value at index 0, which returned None. This None value would then cause the later code to fail.
…ng ibd file chunks

When the id columns were numeric instead of alphanumeric, pandas would read in the columns as mixed type. This affected a downstream process where the ids and phase values are concatenated, because a type error was being thrown.
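
One common way to avoid this class of error is to force the relevant columns to strings when reading each chunk. The sketch below assumes hypothetical column names (`id1`, `phase1`, ...), a tab separator, and a `.` joiner; none of these are taken from DRIVE itself, and the commit message does not say which fix was used.

```python
import io

import pandas as pd

# Hypothetical IBD-like input mixing numeric and alphanumeric IDs.
IBD_TEXT = "id1\tphase1\tid2\tphase2\n1001\t1\t1002\t2\nA17\t2\t1003\t1\n"


def haplotype_ids(text: str, chunksize: int = 1) -> list[str]:
    """Read the file in chunks and build 'id.phase' strings for the first individual."""
    ids: list[str] = []
    # dtype=str keeps every column as text, so a chunk containing only
    # numeric IDs is not inferred as integers.
    for chunk in pd.read_csv(io.StringIO(text), sep="\t", dtype=str, chunksize=chunksize):
        ids.extend((chunk["id1"] + "." + chunk["phase1"]).tolist())
    return ids


hap_ids = haplotype_ids(IBD_TEXT)
```

Without the explicit dtype, a chunk whose IDs are all digits would be parsed as integers, and adding a string to that column raises a type error.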
…hared pairwise IBD segments and graph was improperly formed

There was an edge case where the redopd dataframe would be empty when it was formed. This happened when none of the individuals in network.haplotypes shared pairwise segments. The empty dataframe was then passed to the generate_graph function, and instead of failing, the ig.Graph.DataFrame constructor returned an object with an empty edge list (the .es attribute). The Graphbase.community_walktrap function later tried to use this attribute and failed because it was empty. Now DRIVE checks whether the redopd or redo_vs dataframes are empty. If so, DRIVE continues on to the next network; in debug mode it also logs a message explaining that a graph could not be constructed during the reclustering of that network.
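
The guard might look roughly like the sketch below. The dataframe names follow the commit message, but the helper `can_build_graph`, its signature, the log text, and the choice to skip when either dataframe is empty are assumptions for illustration.

```python
import logging

import pandas as pd

logger = logging.getLogger(__name__)


def can_build_graph(redopd: pd.DataFrame, redo_vs: pd.DataFrame, network_id: str) -> bool:
    """Return False when there are no edges or vertices to build a graph from."""
    if redopd.empty or redo_vs.empty:
        # Only visible when logging is configured at DEBUG level.
        logger.debug(
            "Could not construct a graph while reclustering network %s: "
            "no pairwise IBD segments are shared",
            network_id,
        )
        return False
    return True


empty_edges = pd.DataFrame(columns=["idnum1", "idnum2", "cm"])
vertices = pd.DataFrame({"idnum": [0, 1]})
edges = pd.DataFrame({"idnum1": [0], "idnum2": [1], "cm": [5.2]})
```

In a reclustering loop, the caller would `continue` to the next network whenever this returns False, before any graph constructor or community detection is called.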
@jtb324 jtb324 added the bug Something isn't working label Feb 23, 2024
@jtb324 jtb324 closed this Mar 8, 2024