WTF??!! LocIdx starts at 1 on every chromosome, but that seems to break create_integer_genotype_matrix() #9

eriqande · 2024-07-30T21:27:03Z

Eric here...

I am running some stuff on the Tobique salmon using a ckmr object that includes the chromosomes (it is 12 microsats, with two of them on the same chromosome). When I do that, everything seems to work fine until I start going about doing the pairwise comparisons (actually...I am getting failures looking for close matching samples).

The issue seems to be here:

CKMRsim/R/create_integer_genotype_matrix.R

Line 49 in b54e324

tidyr::pivot_wider(data = ., names_from = LocIdx, values_from = GenoIdx)

The problem is that LocIdx is reset for each chromosome, so this ends up not being a matrix with unique Loci in it.
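
A minimal sketch of the collision on made-up data. The toy table and the select() step before the pivot are assumptions for illustration, not the package's actual code; only the pivot_wider() call mirrors the line quoted above.

```r
library(dplyr)
library(tidyr)

# Toy long-format genotypes (hypothetical data): 2 individuals, 3 loci.
# Loci A and B sit on chromosome 1 and locus C on chromosome 2, so a
# per-chromosome LocIdx gives A and C the same index (1).
toy <- tibble::tribble(
  ~Indiv, ~Chrom, ~Locus, ~LocIdx, ~GenoIdx,
  "ind1", "1",    "A",    1L,      0L,
  "ind1", "1",    "B",    2L,      1L,
  "ind1", "2",    "C",    1L,      2L,
  "ind2", "1",    "A",    1L,      1L,
  "ind2", "1",    "B",    2L,      0L,
  "ind2", "2",    "C",    1L,      1L
)

# Pivot on LocIdx alone. tidyr warns that the values are not uniquely
# identified and falls back to list-columns instead of integer columns.
wide <- suppressWarnings(
  toy %>%
    select(Indiv, LocIdx, GenoIdx) %>%
    pivot_wider(names_from = LocIdx, values_from = GenoIdx)
)

n_loci <- nrow(distinct(toy, Chrom, Locus))  # distinct loci in the data
n_geno_cols <- ncol(wide) - 1                # genotype columns after the pivot
```

Comparing n_loci with n_geno_cols shows fewer genotype columns than loci, which is the symptom described here: loci that share a LocIdx are folded into one column.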

For now, the workaround is to create a ckmr object with everything on the same chromosome, and then use that when doing the pairwise comps. But that is a huge PITA.

I think this can be fixed like this...

I think that I should be able to just modify reindex_markers() so that it gives each locus a unique serial index throughout the whole genome, so that the numbers don't start at 1 again on each new chromosome. The only thing that would change, I think, would be the names of the loci that are used internally (i.e., the chrom.Locus.pos nomenclature). But I don't think that this would break anything. In fact, I don't think LocIdx plays into that at all, anyway.

I need to implement this and test it and make sure it is working.

arianacerreta · 2024-09-04T19:16:48Z

Hi Eric,
I found the same issue with some SNP microhap data that I have been working with. It all worked great until I tried to identify close matching samples. I'll let you know what my workaround ends up being if I figure out one.

eriqande · 2024-09-04T20:12:39Z

Hi ariana, thanks for pinging me about this. Let me know if no simple workarounds work for you and I can fast-track the fix on this for you next week. Cheers, eric

arianacerreta · 2024-09-04T21:02:21Z

Hi Eric,

I was following along with your tutorial. Since you mentioned that LocIdx might be the problem, I modified the reindex_markers function:

reindex_markers <- function(M) {
  M %>%
    dplyr::ungroup() %>%
    dplyr::arrange(Chrom, Pos, dplyr::desc(Freq)) %>%
    # dplyr::group_by(Chrom) %>%  # removed so LocIdx is not reset on each chromosome
    dplyr::mutate(locidx = as.integer(factor(Locus, levels = unique(Locus)))) %>%
    dplyr::group_by(Chrom, Locus) %>%
    # index alleles within each locus and renormalize their frequencies
    dplyr::mutate(
      alleidx = as.integer(factor(Allele, levels = unique(Allele))),
      newfreq = Freq / sum(Freq)
    ) %>%
    dplyr::select(-AlleIdx, -LocIdx, -Freq) %>%
    dplyr::rename(Freq = newfreq, AlleIdx = alleidx, LocIdx = locidx) %>%
    dplyr::ungroup()
}

That seemed to give unique identifiers for each unique locus, even when I had loci on the same chromosomes.
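
One way to confirm that claim is a small uniqueness check on the reindexed table. This is only a sketch: locidx_is_unique() is a hypothetical helper, not part of CKMRsim, and the toy tables are made up. It also assumes locus names are not reused across chromosomes.

```r
library(dplyr)

# Hypothetical helper (not part of CKMRsim): TRUE when every (Chrom, Locus)
# pair has exactly one LocIdx and no LocIdx is shared between two loci.
locidx_is_unique <- function(M) {
  loci <- distinct(M, Chrom, Locus, LocIdx)
  one_idx_per_locus <- nrow(distinct(loci, Chrom, Locus)) == nrow(loci)
  no_shared_idx <- anyDuplicated(loci$LocIdx) == 0
  one_idx_per_locus && no_shared_idx
}

# Toy allele table with LocIdx restarting at 1 on each chromosome.
per_chrom <- tibble::tibble(
  Chrom  = c("1", "1", "1", "2", "2"),
  Locus  = c("A", "A", "B", "C", "C"),
  Allele = c("a1", "a2", "b1", "c1", "c2"),
  LocIdx = c(1L, 1L, 2L, 1L, 1L)
)

# Same table with a genome-wide serial index, in order of first appearance.
genome_wide <- mutate(per_chrom, LocIdx = match(Locus, unique(Locus)))

check_per_chrom <- locidx_is_unique(per_chrom)
check_genome_wide <- locidx_is_unique(genome_wide)
```

Running the same helper on the output of the modified reindex_markers should flag any leftover collisions before the genotype matrix is built.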

find_close_matching_genotypes still threw an error after this, so I double-checked create_integer_genotype_matrix, and it ran with no problem on its own. So instead of calling create_integer_genotype_matrix within find_close_matching_genotypes, I saved the integer matrix separately and then created the matchers object using the source code of find_close_matching_genotypes. My workflow was as follows:

mat_GT <- create_integer_genotype_matrix(long_geno_sub, afreqs_ready)
max_mismatch <- 5
matchers <- pairwise_geno_id(mat_GT, max_miss = max_mismatch) %>%
  dplyr::arrange(num_mismatch) %>%
  # S from the copied source is not defined in this workflow; use mat_GT
  dplyr::mutate(
    indiv_1 = rownames(mat_GT)[ind1],
    indiv_2 = rownames(mat_GT)[ind2]
  ) %>%
  dplyr::select(indiv_1, indiv_2, dplyr::everything())

That is as far as I have gotten so far, but I'm going to keep working through the tutorials you have online with this data. I will let you know if I find anything else.

Ariana

eriqande · 2024-09-04T22:15:43Z

Thanks for the update Ariana. Also, for a better tutorial, that also discusses some of the things that can be done about physical linkage, please check out: https://eriqande.github.io/tws-ckmr-2022/kin-finding-lab.html

Cheers,

eric
