WTF??!! LocIdx starts at 1 on every locus, but that seems to break create_integer_genotype_matrix() #9

Open
eriqande opened this issue Jul 30, 2024 · 4 comments

Comments

@eriqande
Owner

Eric here...

I am running some stuff on the Tobique salmon using a ckmr object that includes the chromosomes (it is 12 microsats, with two of them on the same chromosome). When I do that, everything seems to work fine until I start going about doing the pairwise comparisons (actually...I am getting failures looking for close matching samples).

The issue seems to be here:

tidyr::pivot_wider(data = ., names_from = LocIdx, values_from = GenoIdx)

The problem is that LocIdx restarts at 1 on each chromosome, so loci on different chromosomes share the same LocIdx and the pivoted result does not have one column per unique locus.
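
To make the collision concrete, here is a minimal base-R sketch on made-up data. The table `long_geno`, its values, and the names `locA`/`locB`/`locC` are hypothetical, and this is not the package's actual code path; it only shows why LocIdx alone cannot key the columns when the index restarts on each chromosome.

```r
# Hypothetical toy data: one individual genotyped at three loci,
# two of which (locA, locB) sit on chromosome 1.
long_geno <- data.frame(
  Indiv   = "fish1",
  Chrom   = c("1", "1", "2"),
  Locus   = c("locA", "locB", "locC"),
  LocIdx  = c(1L, 2L, 1L),   # index restarts at 1 on chromosome 2
  GenoIdx = c(3L, 1L, 2L),
  stringsAsFactors = FALSE
)

# Pivoting wider on LocIdx needs exactly one row per (Indiv, LocIdx) pair.
# Count the rows that share each pair; any count above 1 is a collision.
pair_counts   <- table(long_geno$Indiv, long_geno$LocIdx)
has_collision <- any(pair_counts > 1)

# Number of distinct loci versus number of columns LocIdx can produce.
n_loci    <- length(unique(paste(long_geno$Chrom, long_geno$Locus, sep = ".")))
n_columns <- length(unique(long_geno$LocIdx))
```

In this toy case `n_loci` is larger than `n_columns`, which is the mismatch described above.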

For now, the workaround is to create a ckmr object with everything on the same chromosome, and then use that when doing the pairwise comps. But that is a huge PITA.

I think this can be fixed by modifying reindex_markers() so that it gives each locus a unique serial index across the whole genome, rather than starting again at 1 on each new chromosome. The only thing that would change, I think, would be the names of the loci that are used internally (i.e., the chrom.Locus.pos nomenclature). But I don't think that this would break anything. In fact, I don't think LocIdx plays into that at all, anyway.
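
A rough base-R sketch of the idea, on a hypothetical marker table with one row per allele (the data and the helper column names `LocIdx_per_chrom` and `locus_key` are made up for illustration; this is not the reindex_markers() implementation):

```r
# Hypothetical marker table: three loci with two alleles each,
# locA and locB on chromosome 1, locC on chromosome 2.
markers <- data.frame(
  Chrom  = c("1", "1", "1", "1", "2", "2"),
  Locus  = c("locA", "locA", "locB", "locB", "locC", "locC"),
  Pos    = c(100L, 100L, 200L, 200L, 50L, 50L),
  Allele = c("a", "b", "a", "b", "a", "b"),
  stringsAsFactors = FALSE
)

# Sort by chromosome and position, as the indexing assumes genome order.
markers <- markers[order(markers$Chrom, markers$Pos), ]

# Key each locus by chromosome plus name so equal names on different
# chromosomes would still be treated as different loci.
locus_key <- paste(markers$Chrom, markers$Locus, sep = ".")

# Per-chromosome index: restarts at 1 within each chromosome.
markers$LocIdx_per_chrom <- ave(
  seq_along(locus_key), markers$Chrom,
  FUN = function(i) match(locus_key[i], unique(locus_key[i]))
)

# Genome-wide serial index: order of first appearance across all rows.
markers$LocIdx <- match(locus_key, unique(locus_key))
```

With the genome-wide version, every locus gets its own LocIdx, so a pivot keyed on LocIdx yields one column per locus.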

I need to implement this and test it and make sure it is working.

@arianacerreta

Hi Eric,
I found the same issue with some SNP microhap data that I have been working with. It all worked great until I tried to identify close matching samples. I'll let you know what my workaround ends up being if I figure one out.

@eriqande
Owner Author

eriqande commented Sep 4, 2024 via email

@arianacerreta

arianacerreta commented Sep 4, 2024

Hi Eric,

I was following along with your tutorial. Since you mentioned that LocIdx might have been the problem, I modified the reindex_markers function:

reindex_markers <- function(M) {
  M %>%
    dplyr::ungroup() %>%
    dplyr::arrange(Chrom, Pos, dplyr::desc(Freq)) %>%
    # dplyr::group_by(Chrom) %>%  # commented out so LocIdx no longer restarts on each chromosome
    dplyr::mutate(locidx = as.integer(factor(Locus, levels = unique(Locus)))) %>%
    dplyr::group_by(Chrom, Locus) %>%
    dplyr::mutate(
      alleidx = as.integer(factor(Allele, levels = unique(Allele))),
      newfreq = Freq / sum(Freq)
    ) %>%
    dplyr::select(-AlleIdx, -LocIdx, -Freq) %>%
    dplyr::rename(Freq = newfreq, AlleIdx = alleidx, LocIdx = locidx) %>%
    dplyr::ungroup()
}

That seemed to give unique identifiers for each unique locus, even when I had loci on the same chromosomes.
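
One way to double check this on a reindexed table is a small helper like the one below. It is only a sketch: `check_unique_locidx` is a hypothetical name, not a package function, and it assumes the table has Chrom, Locus, and LocIdx columns.

```r
# Hypothetical helper: TRUE when every (Chrom, Locus) pair maps to exactly
# one LocIdx and no LocIdx is shared by two different loci.
check_unique_locidx <- function(M) {
  # Collapse the per-allele rows down to one row per distinct combination.
  loci <- unique(M[, c("Chrom", "Locus", "LocIdx")])
  !anyDuplicated(loci[, c("Chrom", "Locus")]) && !anyDuplicated(loci$LocIdx)
}

# Made-up example tables, one row per allele.
good <- data.frame(
  Chrom  = c("1", "1", "1", "2"),
  Locus  = c("locA", "locA", "locB", "locC"),
  LocIdx = c(1L, 1L, 2L, 3L),
  stringsAsFactors = FALSE
)
bad <- good
bad$LocIdx[4] <- 1L  # locC reuses index 1, as with per-chromosome indexing

ok_good <- check_unique_locidx(good)
ok_bad  <- check_unique_locidx(bad)
```

Here `ok_good` is TRUE and `ok_bad` is FALSE, since the second table reuses an index across chromosomes.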

find_close_matching_genotypes still threw an error after this, so I double checked create_integer_genotype_matrix on its own, and it ran with no problem. So instead of calling create_integer_genotype_matrix within find_close_matching_genotypes, I saved the integer matrix separately and then created the matchers object using the source code of find_close_matching_genotypes. My workflow was as follows:

mat_GT <- create_integer_genotype_matrix(long_geno_sub, afreqs_ready)
max_mismatch <- 5
# Row names are taken from mat_GT; the S in the copied source presumably referred to this same matrix
matchers <- pairwise_geno_id(mat_GT, max_miss = max_mismatch) %>%
  dplyr::arrange(num_mismatch) %>%
  dplyr::mutate(indiv_1 = rownames(mat_GT)[ind1], indiv_2 = rownames(mat_GT)[ind2]) %>%
  dplyr::select(indiv_1, indiv_2, dplyr::everything())

That is as far as I have gotten so far, but I'm going to keep working through the tutorials you have online with this data. I will let you know if I find anything else.

Ariana

@eriqande
Owner Author

eriqande commented Sep 4, 2024

Thanks for the update, Ariana. Also, for a better tutorial that discusses some of the things that can be done about physical linkage, please check out: https://eriqande.github.io/tws-ckmr-2022/kin-finding-lab.html

Cheers,

eric
