Duplicate rows in GTEx v8 PFB export #113

Open
ianfore opened this issue Dec 21, 2020 · 0 comments

Collaborator

ianfore commented Dec 21, 2020

This came up while following the instructions on accessing GTEx v8 phenotypic data here.

When PyPFB is run on the export of the GTEx data, the resulting sequencing.tsv file contains duplicates of almost all rows. The other TSVs do not have duplicates. It is not clear whether the duplicates exist in the PFB file or are generated when PyPFB converts it to TSV. Because sequencing.tsv is the only file that shows this problem, the more likely explanation is that the duplication is present in the PFB.

46 rows in sequencing.tsv do not appear to be duplicates. These are the sequencing files related to the project as a whole rather than to samples (see parent_type). This also suggests the duplicates are present in the PFB/Avro and are not generated by PyPFB.
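
For anyone who wants to reproduce the count, here is a minimal sketch of one way to summarize exact duplicate rows in a TSV, grouped by the parent_type column mentioned above. It uses only the Python standard library and does not touch PyPFB. The function name `summarize_duplicates` and the keys of the returned dictionary are made up for illustration, and it assumes the file has a header row and that "duplicate" means every column matches exactly.

```python
import csv
from collections import Counter


def summarize_duplicates(tsv_path, group_column="parent_type"):
    """Summarize exact duplicate rows in a TSV, grouped by one column.

    Hypothetical helper for illustration; not part of PyPFB.
    Returns {group_value: {"distinct": n, "repeated": n, "extra_copies": n}}.
    """
    with open(tsv_path, newline="") as handle:
        reader = csv.reader(handle, delimiter="\t")
        header = next(reader)
        # Count how many times each full row (all columns) occurs.
        counts = Counter(tuple(row) for row in reader)

    # Fall back to a single "" group if the column is absent.
    group_index = header.index(group_column) if group_column in header else None

    summary = {}
    for row, occurrences in counts.items():
        if group_index is not None and group_index < len(row):
            group = row[group_index]
        else:
            group = ""
        stats = summary.setdefault(
            group, {"distinct": 0, "repeated": 0, "extra_copies": 0}
        )
        stats["distinct"] += 1
        if occurrences > 1:
            stats["repeated"] += 1
            stats["extra_copies"] += occurrences - 1
    return summary
```

Calling `summarize_duplicates("sequencing.tsv")` on the exported file should show whether the repeated rows are confined to particular parent_type values. It cannot by itself tell whether the duplication originates in the PFB/Avro file or in the conversion step.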

See also the related pull request which deals with linking between tsvs.

@ianfore ianfore changed the title Duplicate rows in GTEx v8 export Duplicate rows in GTEx v8 PFB export Dec 21, 2020