For data files transferred from EGA to Collab, add EGA repo to 'File copy' section of the portal file index #460

Open
junjun-zhang opened this issue Jan 19, 2018 · 4 comments
@junjun-zhang

Here is one such file: https://dcc.icgc.org/repositories/files/FI743257. It originated from EGA and we transferred it to Collaboratory, but the file page only shows that the file exists in Collab, not in EGA.

We need a way to let the portal repo indexer know that an additional copy of the file exists in EGA. This could be as easy as checking whether dataBundleId starts with EGA; if it does, a copy of the file must exist in EGA.
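
The prefix check could be as small as the following Python sketch. The function name is hypothetical, and it assumes dataBundleId is available to the indexer as a plain string (or is missing).

```python
from typing import Optional


def has_ega_copy(data_bundle_id: Optional[str]) -> bool:
    """Return True if the bundle ID suggests the file also has a copy in EGA.

    Heuristic proposed in this issue: bundles that originated from EGA
    have IDs starting with "EGA". The function name is hypothetical.
    """
    if not data_bundle_id:
        return False  # missing or empty bundle ID: no evidence of an EGA copy
    return data_bundle_id.startswith("EGA")
```

The check is deliberately case-sensitive, matching the upper-case EGA accession prefix as written.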

We may also need additional EGA-specific information for the file copy, such as repoFileId. In that case we need the EGAFxxxxx ID to be populated, so we will need a way to pass it to the indexer.
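
The indexer-side change might look roughly like this minimal Python sketch. It only relies on the fields named in this issue (repoDataBundleId, repoFileId, repoDataSetIds); the fileCopies key, the repoCode field and its "ega" value, and both function names are hypothetical placeholders, not the portal indexer's actual API.

```python
from typing import Any, Dict, List

# Hypothetical repository code for EGA; the real value would come from the
# portal's repository configuration.
EGA_REPO_CODE = "ega"


def make_ega_file_copy(bundle_id: str, ega_file_id: str,
                       dataset_ids: List[str]) -> Dict[str, Any]:
    """Build an EGA fileCopy entry with the fields discussed in this issue."""
    return {
        "repoCode": EGA_REPO_CODE,          # hypothetical field and value
        "repoDataBundleId": bundle_id,      # same as the SONG Analysis ID
        "repoFileId": ega_file_id,          # the EGAFxxxxx ID
        "repoDataSetIds": list(dataset_ids),
    }


def add_ega_file_copy(file_doc: Dict[str, Any], bundle_id: str,
                      ega_file_id: str, dataset_ids: List[str]) -> None:
    """Append an EGA copy to the file document unless one is already there."""
    copies = file_doc.setdefault("fileCopies", [])  # 'fileCopies' key is assumed
    if any(c.get("repoCode") == EGA_REPO_CODE for c in copies):
        return  # keep re-indexing idempotent
    copies.append(make_ega_file_copy(bundle_id, ega_file_id, dataset_ids))
```

Skipping the append when an EGA entry already exists keeps repeated indexer runs from piling up duplicate copies.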

@junjun-zhang added the enhancement (New feature or request) label Feb 10, 2018
@Fgerthoffert added the backlog (AB:1:Backlog) label Feb 27, 2018
@junjun-zhang
Author

junjun-zhang commented Mar 5, 2018

@baminou can you please mirror all of the EGA file transfer git repositories hosted internally under http://142.1.177.124/jt-hub to the public GitHub organization at https://github.com/icgc-dcc/?

There should be 5 or 6 of them.

@junjun-zhang
Author

The goal is to add a new fileCopy entry for files already transferred from EGA to Collaboratory. We first identify the SONG Analysis in Collab, which can be done by Analysis ID. The ID takes the form EGAZ000000000 or EGAR000000000. We then need two fields specific to the EGA fileCopy entry: repoDataSetIds and repoFileId. Below we describe how the values for these fields can be found.

The git repos for EGA transfer jobs have been mirrored from our internal server to GitHub:

The files containing the needed information are:

1. Job JSON file with the EGA Dataset ID. File path/name pattern: ega-file-transfer-to-collab-*-jtracker/blob/master/ega-file-transfer-to-collab.*.jtracker/job_state.completed/job.*/job.*.json. Fields: 'bundle_id' (use as 'repoDataBundleId'; same as SONG's Analysis ID) and 'ega_dataset_id' (use as 'repoDataSetIds').
2. Upload task file for each data file uploaded to Collab. File path/name pattern: ega-file-transfer-to-collab-*-jtracker/blob/master/ega-file-transfer-to-collab.*.jtracker/job_state.completed/job.*/task_state.completed/worker.*/task.upload.*/task.upload.{ega_file_id}.json. Use the {ega_file_id} value in the file name as 'repoFileId'.
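
A rough Python sketch of pulling these values out of a local clone of one of the mirrored repositories. It assumes the clone root corresponds to ega-file-transfer-to-collab-*-jtracker (so the blob/master part of the GitHub URL path is dropped), that 'ega_dataset_id' holds either a single ID or a list, and that every upload task file is named task.upload.{ega_file_id}.json. The function names and the repoFileIds key are hypothetical.

```python
import json
from pathlib import Path
from typing import Dict, List

# Path patterns from the list above, relative to the root of a local clone.
JOB_GLOB = "ega-file-transfer-to-collab.*.jtracker/job_state.completed/job.*/job.*.json"
UPLOAD_GLOB = "task_state.completed/worker.*/task.upload.*/task.upload.*.json"


def ega_file_id_from_name(name: str) -> str:
    """Extract {ega_file_id} from a 'task.upload.{ega_file_id}.json' file name."""
    prefix, suffix = "task.upload.", ".json"
    if not (name.startswith(prefix) and name.endswith(suffix)):
        raise ValueError(f"unexpected upload task file name: {name}")
    return name[len(prefix):-len(suffix)]


def collect_ega_copies(repo_root: Path) -> List[Dict[str, object]]:
    """Return one record per completed job with the values the indexer needs."""
    records = []
    for job_json in sorted(repo_root.glob(JOB_GLOB)):
        job = json.loads(job_json.read_text())
        datasets = job["ega_dataset_id"]
        dataset_ids = list(datasets) if isinstance(datasets, list) else [datasets]
        # Upload task files live under the same job.* directory as the job JSON.
        file_ids = sorted(
            ega_file_id_from_name(p.name) for p in job_json.parent.glob(UPLOAD_GLOB)
        )
        records.append({
            "repoDataBundleId": job["bundle_id"],  # same as SONG's Analysis ID
            "repoDataSetIds": dataset_ids,
            "repoFileIds": file_ids,  # one repoFileId per uploaded data file
        })
    return records
```

Each record carries one EGA file ID per uploaded data file, so the indexer would still need to match every ID to its file document in Collab before writing a repoFileId.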

Example files:

@junjun-zhang
Author

Just to give two examples here:

@junjun-zhang added the in progress (AB:3:In Progress) and review (AB:4:Review) labels and removed the backlog (AB:1:Backlog) and in progress (AB:3:In Progress) labels Mar 14, 2018
@Fgerthoffert added the backlog (AB:1:Backlog) label and removed the review (AB:4:Review) label Jun 27, 2018
@rosibaj
Contributor

rosibaj commented Mar 7, 2019

EGA indexing is to be investigated in the future; do not work on this until we know more about the specs, OR until we have more EGA data to transfer to Collaboratory.

@rosibaj removed the backlog (AB:1:Backlog) label Mar 7, 2019