-
Notifications
You must be signed in to change notification settings - Fork 4
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
WTF??!! LocIdx starts at 1 on every locus, but that seems to break create_integer_genotype_matrix() #9
Comments
Hi Eric, |
Hi ariana,
thanks for pinging me about this. Let me know if no simple workarounds
work for you and I can fast-track the fix on this for you next week.
Cheers,
eric
…On Wed, Sep 4, 2024 at 1:17 PM arianacerreta ***@***.***> wrote:
Hi Eric,
I found the same issue with some SNP microhap data that I have been
working with. It all worked great until I tried to identify close matching
samples. I'll let you know what my workaround ends up being if I figure out
one.
—
Reply to this email directly, view it on GitHub
<#9 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AAPQ4JWL2T7Z2QLVVFTKDEDZU5MDLAVCNFSM6AAAAABLXFXQMWVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDGMRZG44TOOBVGI>
.
You are receiving this because you authored the thread.Message ID:
***@***.***>
|
Hi Eric, I was following along your tutorial. Since you mentioned that it might have been the LocIdx that was the problem, I modified the reindex_markers function: reindex_markers<- function(M){
M %>% dplyr::ungroup() %>% dplyr::arrange(Chrom, Pos, desc(Freq)) %>%
#dplyr::group_by(Chrom) %>%
dplyr::mutate(locidx = as.integer(factor(Locus, levels = unique(Locus)))) %>%
dplyr::group_by(Chrom, Locus) %>%
dplyr::mutate(alleidx = as.integer(factor(Allele, levels = unique(Allele))), newfreq = Freq/sum(Freq)) %>%
dplyr::select(-AlleIdx, -LocIdx, -Freq) %>%
rename(Freq = newfreq, AlleIdx = alleidx, LocIdx = locidx) %>%
dplyr::ungroup()
} That seemed to give unique identifiers for each unique locus, even when I had loci on the same chromosomes. The find_close_matching_genotypes still threw an error after this, so I double checked the function create_integer_genotype_matrix and it ran with no problem separately. So, I decided instead of calling create_integer_genotype_matrix within find_close_matching_genotypes, I would save the integer matrix separately. Then, I created the matchers object using the source code for the find_close_matching_genotypes function. My work flow was as follows: mat_GT<-create_integer_genotype_matrix(long_geno_sub,afreqs_ready)
max_mismatch<-5
matchers <- pairwise_geno_id(mat_GT, max_miss = max_mismatch) %>%
dplyr::arrange(num_mismatch) %>% dplyr::mutate(indiv_1 = rownames(S)[ind1],
indiv_2 = rownames(S)[ind2]) %>% dplyr::select(indiv_1,
indiv_2, dplyr::everything()) That is as far as I have gotten so far, but I'm going to keep working through the tutorials you have online with this data. I will let you know if I find anything else. Ariana |
Thanks for the update Ariana. Also, for a better tutorial, that also discusses some of the things that can be done about physical linkage, please check out: https://eriqande.github.io/tws-ckmr-2022/kin-finding-lab.html Cheers, eric |
Eric here...
I am running some stuff on the Tobique salmon using a ckmr object that includes the chromosomes (it is 12 microsats, with two of them on the same chromosome). When I do that, everything seems to work fine until I start going about doing the pairwise comparisons (actually...I am getting failures looking for close matching samples).
The issue seems to be here:
CKMRsim/R/create_integer_genotype_matrix.R
Line 49 in b54e324
The problem is that LocIdx is reset for each chromosome, so this ends up not being a matrix with unique Loci in it.
For now, the workaround is to create a ckmr object with everything on the same locus, and then use that when doing the pairwise comps. But that is a huge PITA.
I think this can be fixed like this...
I think that I should be able to just modify
reindex_markers()
so that it gives each locus a unique serial index throughout thewhole genome, so that the numbers don't start up at 1 again, on each new chromosome. The only thing that would change, I think, would be the names of the loci that are used internally (i.e., the
chrom.Locus.pos
nomenclature. But I don't thinkthat this would break anything. In fact I don't think LocIdx plays into that at all, anyway.
I need to implement this and test it and make sure it is working.
The text was updated successfully, but these errors were encountered: