GO results #54

Open
wjq1981 opened this issue Jul 28, 2023 · 2 comments

Comments

wjq1981 commented Jul 28, 2023

Hi, when I run the command "deepgoplus --data-root /hdd/Software/deepgoplus/data/ --in-file test.fasta --out-file out.file", the result is just an ID number. How can I get the corresponding GO number?

Rohit-Satyam commented Oct 4, 2023

Hi @wjq1981

If I understood correctly, you want to convert GO IDs to their descriptions, right? If so, here is a little trick in R that I tried yesterday to get the corresponding GO terms:

library(tidyverse)
## Reads the TSV as a 19-column data frame (for this file)
deepgoplus <- read_tsv("deepgoplus_results.tsv", col_names = FALSE)
## Replace the tabs left inside the last column with semicolons
last_col <- ncol(deepgoplus)
deepgoplus[[last_col]] <- gsub("\t", ";", deepgoplus[[last_col]])

## Merge columns 2..n into a single column, skipping empty (NA) cells
deepgoplus <- unite(deepgoplus, col = 'go_term', c(2:ncol(deepgoplus)), sep = ';', na.rm = TRUE)

## Convert the GO IDs in each row to their descriptions using pRoloc::goIdToTerm
invisible(lapply(1:nrow(deepgoplus), function(x){

    ## Take one row at a time and split it into a data frame of GO ID and confidence;
    ## convert = TRUE makes conf numeric so that max() compares numbers, not strings
    tempdf <- data.frame(go = stringr::str_split(deepgoplus$go_term[x], pattern = "[;]")[[1]]) %>%
        separate(go, into = c('go', 'conf'), sep = "\\|", convert = TRUE)

    ## Convert GO IDs to descriptions
    tempdf$desc <- pRoloc::goIdToTerm(tempdf$go, names = TRUE, keepNA = TRUE)

    ## At this point you can also keep only the highest-confidence GO terms
    tempdf <- subset(tempdf, conf == max(conf))

    ## Update the second column with the descriptions of the highest-confidence GO terms
    deepgoplus$go_term[x] <<- paste(tempdf$desc, tempdf$conf, sep = "|", collapse = ";")
}))
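
If you only need the GO IDs and their scores, without the descriptions, a base-R sketch along these lines might be enough. It assumes each output line has the layout implied by the snippet above: a protein ID followed by tab-separated `GO:xxxxxxx|score` entries. The helper name `parse_deepgoplus_line`, the `min_conf` argument and the example line are all made up for illustration and are not part of deepgoplus.

```r
## Hypothetical helper (not part of deepgoplus): parse one output line,
## assuming the layout "protein_id<TAB>GO:xxxxxxx|score<TAB>..."
parse_deepgoplus_line <- function(line, min_conf = 0) {
  fields <- strsplit(line, "\t", fixed = TRUE)[[1]]
  ## Everything after the first field is treated as a "GO ID|score" pair
  preds <- strsplit(fields[-1], "|", fixed = TRUE)
  df <- data.frame(
    protein = rep(fields[1], length(preds)),
    go      = vapply(preds, `[`, character(1), 1),
    conf    = as.numeric(vapply(preds, `[`, character(1), 2)),
    stringsAsFactors = FALSE
  )
  ## Keep predictions at or above the threshold, highest confidence first
  df <- df[which(df$conf >= min_conf), , drop = FALSE]
  df[order(-df$conf), , drop = FALSE]
}

## Made-up example line; the scores are illustrative, not real deepgoplus output
example <- "protein1\tGO:0008150|0.45\tGO:0003674|0.91\tGO:0005575|0.12"
res <- parse_deepgoplus_line(example, min_conf = 0.3)
print(res)
```

Filtering on a numeric threshold instead of keeping only the single top score leaves more specific, lower-scoring terms in the table, which may help when the top hits are very general.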

Now, I would like to mention that most of the high-confidence GO terms provided by deepgoplus are useless, as in the screenshot shown below, and are not very informative when working with hypothetical proteins.

image

I also tried to update the GO database, but the script keeps failing with the following error:

./update.py
Checking new release date...
ERROR:root:'x-uniprot-release'

wjq1981 commented Oct 5, 2023

Thank you so much for your help. I will give it a try.
