Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Clean database from undeleted replicated subjects (NO PRIVATE DATA HERE) #2531

Open
julien-louis opened this issue Dec 4, 2024 · 2 comments
Assignees

Comments

@julien-louis
Copy link
Collaborator

ON OFSEP

(count subjects in the datasets schema that have no parent subject in studies schema, then count linked exams, acq and ds)

mysql@debian2 ofsep-qualif-datasets> select count(id) from datasets.subject where datasets.subject.id not in (select id from studies.subject);
+-----------+
| count(id) |
+-----------+
|        61 |
+-----------+
1 row in set (0.015 sec)


mysql@debian2 ofsep-qualif-datasets> select count(*) from examination where subject_id in (select id from datasets.subject where datasets.subject.id not in (select id from studies.subject));
+----------+
| count(*) |
+----------+
|      493 |
+----------+
1 row in set (0.066 sec)

mysql@debian2 ofsep-qualif-datasets> select count(*) from dataset_acquisition where examination_id in (select id from examination where subject_id in (select id from datasets.subject where datasets.subject.id no
t in (select id from studies.subject)));
+----------+
| count(*) |
+----------+
|     5949 |
+----------+
1 row in set (0.061 sec)

mysql@debian2 ofsep-qualif-datasets> select count(*) from dataset where dataset_acquisition_id in (select id from dataset_acquisition where examination_id in (select id from examination where subject_id in (sele
ct id from datasets.subject where datasets.subject.id not in (select id from studies.subject))));
+----------+
| count(*) |
+----------+
|     7101 |
+----------+
1 row in set (0.131 sec)
@julien-louis
Copy link
Collaborator Author

ON NEURINFO

mysql@debian2 neurinfo-qualif-datasets> select count(id) from datasets.subject where datasets.subject.id not in (select id from studies.subject);
+-----------+
| count(id) |
+-----------+
|        58 |
+-----------+
1 row in set (0.037 sec)

mysql@debian2 neurinfo-qualif-datasets> select count(*) from examination where subject_id in (select id from datasets.subject where datasets.subject.id not in (select id from studies.subject));
+----------+
| count(*) |
+----------+
|       31 |
+----------+
1 row in set (0.069 sec)

mysql@debian2 neurinfo-qualif-datasets> select count(*) from dataset_acquisition where examination_id in (select id from examination where subject_id in (select id from datasets.subject where datasets.subject.id not in (select id from studies.subject)));
+----------+
| count(*) |
+----------+
|      342 |
+----------+
1 row in set (0.034 sec)

mysql@debian2 neurinfo-qualif-datasets> select count(*) from dataset where dataset_acquisition_id in (select id from dataset_acquisition where examination_id in (select id from examination where subject_id in (select id from datasets.subject where datasets.subject.id not in (select id from studies.subject))));
+----------+
| count(*) |
+----------+
|      114 |
+----------+
1 row in set (0.051 sec)

@julien-louis
Copy link
Collaborator Author

Hints on dependencies

mysql> SELECT DISTINCT TABLE_NAME FROM INFORMATION_SCHEMA.COLUMNS WHERE COLUMN_NAME IN ('subject_id') AND TABLE_SCHEMA='datasets';
+------------------+
| TABLE_NAME       |
+------------------+
| dataset          |
| examination      |
| shanoir_metadata |
| subject_study    |
+------------------+
4 rows in set (0.01 sec)

mysql> SELECT DISTINCT TABLE_NAME FROM INFORMATION_SCHEMA.COLUMNS WHERE COLUMN_NAME IN ('examination_id') AND TABLE_SCHEMA='datasets';
+-----------------------------+
| TABLE_NAME                  |
+-----------------------------+
| dataset_acquisition         |
| extra_data_file_path        |
| instrument_based_assessment |
| shanoir_metadata            |
+-----------------------------+
4 rows in set (0.00 sec)

mysql> SELECT DISTINCT TABLE_NAME FROM INFORMATION_SCHEMA.COLUMNS WHERE COLUMN_NAME IN ('dataset_acquisition_id') AND TABLE_SCHEMA='datasets';
+------------+
| TABLE_NAME |
+------------+
| dataset    |
+------------+
1 row in set (0.00 sec)

mysql> SELECT DISTINCT TABLE_NAME FROM INFORMATION_SCHEMA.COLUMNS WHERE COLUMN_NAME IN ('dataset_id') AND TABLE_SCHEMA='datasets';
+-----------------------------+
| TABLE_NAME                  |
+-----------------------------+
| channel                     |
| dataset_expression          |
| dataset_property            |
| dataset_tag                 |
| event                       |
| input_of_dataset_processing |
| processing_resource         |
| related_datasets            |
| shanoir_metadata            |
+-----------------------------+
9 rows in set (0.00 sec)

@julien-louis julien-louis changed the title Clean database fomr undeleted replicated subjects Clean database from undeleted replicated subjects (NO PRIVATE DATA HERE) Dec 5, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants