Change CSV Outputs to a more suitable format (HDF5) #83

Open
Luxxii opened this issue Aug 15, 2024 · 1 comment · May be fixed by #85

@Luxxii
Member

Luxxii commented Aug 15, 2024

We discussed the problem that our CSV files are too complex and not human-readable. We could therefore switch entirely to a different format. We settled on HDF5!

@di-hardt
Collaborator

From our discussion about mzQC compliance

Base_Peak_Intensity_Max --> not available --> LOCAL:01 (single value: intensity of the highest peak in the MS1 (/MS2?) map)
Base_Peak_Intensity_Max_Up_To_105 --> not available --> LOCAL:02 (single value: intensity of the highest peak in the MS1 (/MS2?) map up to a given retention time)
MS1_TIC_Change_Q... --> MS:40000057 --> note: in the OBO these are fixed size --> create issue for variable length, as for MS:40000061
MS1-TIC_Q... --> MS:40000058 --> note: in the OBO these are fixed size --> create issue for variable length, as for MS:40000061
MS1_Density_Q... --> MS:40000061
MS1_Freq_Max ---> MS:40000065
MS2_Density_Q... --> MS:40000062
MS2_Freq_Max ---> MS:40000066
MS2_Prec_Z_1-5 and also more ----> MS:40000063
MS2_Prec_unknown ---> first in MS:40000063 ---> This value does not exist for the time being ---> Create issue and possibly add to MS:40000063 (charge 0?)
RT_MS1_Q_000-100 --> MS:40000055 --> also fixed size --> create issue for variable length, as for MS:40000061
RT_MS2_Q_000-100 --> MS:40000056 --> also fixed size --> create issue for variable length, as for MS:40000061
RT_TIC_Q_000-100 --> MS:40000054 --> also fixed size --> create issue for variable length, as for MS:40000061
RT_duration --> MS:40000070 --> the minimum must be added here (the RT of the first spectrum) (perhaps also specify this one, since we have everything: MS:40000067)
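
The mapping above could be kept as a small lookup table, with the per-quantile columns collapsed into one list-valued metric so that the number of quantiles is not fixed. The following is only a minimal sketch, not the project's implementation: the function name `build_metrics`, the example row, and the assumption that quantile columns are named like `RT_MS1_Q_025` are hypothetical. The accessions are copied verbatim from the notes above, and the `accession` / `name` / `value` keys follow the mzQC quality metric layout. The `name` field holds the column prefix only as a placeholder; a compliant file would use the CV term name.

```python
import json
import re

# Column prefix -> accession, copied verbatim from the notes above.
QUANTILE_PREFIXES = {
    "RT_MS1_Q": "MS:40000055",
    "RT_MS2_Q": "MS:40000056",
    "RT_TIC_Q": "MS:40000054",
}
# Single-value columns that get a local (non-CV) accession.
SINGLE_VALUES = {
    "Base_Peak_Intensity_Max": "LOCAL:01",
    "Base_Peak_Intensity_Max_Up_To_105": "LOCAL:02",
}

# Assumed column naming: "<prefix>_<quantile>", e.g. "RT_MS1_Q_025".
_QUANTILE_COLUMN = re.compile(r"^(?P<prefix>.+_Q)_(?P<q>\d+)$")


def build_metrics(row):
    """Turn one flat CSV-style row (a dict) into a list of metric dicts."""
    metrics = []
    quantiles = {}  # prefix -> list of (quantile, value) pairs

    for column, value in row.items():
        if column in SINGLE_VALUES:
            metrics.append(
                {"accession": SINGLE_VALUES[column], "name": column, "value": value}
            )
            continue
        match = _QUANTILE_COLUMN.match(column)
        if match and match["prefix"] in QUANTILE_PREFIXES:
            quantiles.setdefault(match["prefix"], []).append((int(match["q"]), value))

    # One metric per prefix; values are ordered by quantile, any length allowed.
    for prefix, pairs in quantiles.items():
        values = [value for _, value in sorted(pairs)]
        metrics.append(
            {"accession": QUANTILE_PREFIXES[prefix], "name": prefix, "value": values}
        )
    return metrics


row = {
    "Base_Peak_Intensity_Max": 1.5e9,
    "RT_MS1_Q_050": 3600.0,
    "RT_MS1_Q_000": 0.5,
    "RT_MS1_Q_100": 7200.0,
}
metrics = build_metrics(row)
print(json.dumps(metrics, indent=2))
```

Storing the quantiles as one list per metric is what makes the "variable length" request above cheap on the writer side: adding or dropping a quantile column changes only the list length, not the set of metrics.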

SPIKEINS
---> create a table type that can display this generically --> then file it as an issue

THERMO|BRUKER difficult (perhaps leave as is)


Total_Ion_current_Max --> LOCAL:03 (single values)
Total_Ion_current_Max_up_to105 --> LOCAL:04 (single values)
accumulated_Ms1_Tic --> MS:40000029
accumulated_Ms2_Tic --> MS:40000030
feature_data --> omit
filteres_psms_ppm_error --> LOCAL:05 --> ppm error for each individual identified PSM --> calculate the individual values for MS:40000178 / MS:40000179 (the table is of course still included)
ms1_map_intens/mz/rt --> LOCAL:06 --> raw data from which a plot is made --> we turn it into a table (which is precisely defined in the metadata)
ms1_rt/tic array --> recalculate it together with MS2 --> then store under MS:40000029 --> create query/issue on whether it should also be done individually
ms2_rt/tic/mz array --> see above
num_feature_charge_ --> LOCAL:06 ---> create issue and request --> are called quantification data points there
num_feature_ident_charge_ --> LOCAL:07 ---> Create and request issue --> are called quantification data points there
number_of_filtered_peptides --> MS:1003250 --> we take the peptidoforms here (doesn't quite fit either)
number_of_filtered_psms --> we take MS:1003251 (note that we may have several hits per spectrum for the PSMs, which is not quite correct) --> create issue for a term that also works for PSMs
number_of_proteins --> FDR-filtered --> MS:1003327
number_ungrouped_proteins --> FDR-filtered --> LOCAL:08 --> create issue, because it is missing
pia_output_z
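
Two of the items above describe small computations: turning the parallel `ms1_map_intens/mz/rt` arrays into one table, and deriving summary values from the per-PSM ppm errors. A rough sketch under stated assumptions follows. The function names and column labels are hypothetical, and using mean and standard deviation as the "individual values" is an illustrative guess, not taken from the project or from the definitions of MS:40000178 / MS:40000179. The ppm formula itself is the usual relative mass deviation scaled by one million.

```python
import statistics


def to_table(intensities, mzs, rts):
    """Combine three parallel arrays into one column-oriented table."""
    if not (len(intensities) == len(mzs) == len(rts)):
        raise ValueError("all columns must have the same length")
    # Column labels are assumptions; the notes only say the table
    # would be defined precisely in the metadata.
    return {"intensity": list(intensities), "mz": list(mzs), "rt": list(rts)}


def ppm_error(observed_mz, theoretical_mz):
    """Relative deviation of an observed m/z in parts per million."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


def summarise_ppm(errors):
    """Mean and sample standard deviation (assumed summary values)."""
    return {"mean": statistics.mean(errors), "stdev": statistics.stdev(errors)}


table = to_table([100.0, 250.0], [400.2, 512.3], [12.5, 13.0])
errors = [ppm_error(500.001, 500.0), ppm_error(499.999, 500.0)]
summary = summarise_ppm(errors)
print(table)
print(summary)
```

Keeping the full per-PSM list alongside the summary matches the note that the table is still included even when single values are calculated from it.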
