trishorts edited this page Feb 14, 2022 · 1 revision

The mass accuracy of MS1 and MS2 spectra gathered during a proteomics experiment can vary significantly within a single run and across several runs. Systematic drift, random noise, and changes in temperature and other environmental conditions all contribute to this variation. Calibrating spectral masses before the final analysis can therefore improve peptide identification accuracy.

MetaMorpheus uses a machine-learning algorithm to calibrate both MS1 and MS2 spectra. The process begins with a preliminary search of the uncalibrated file to identify a set of confident peptide spectral matches (PSMs). The mass spectral peaks of these confident PSMs serve as calibration points. Each point is accompanied by several additional values, including the difference between observed m/z and theoretical m/z (the “m/z error”), the absolute m/z, the retention time, the total ion current, and the ion injection time. These values are the input to a random forest regression that models the m/z error as a function of the other, explanatory variables. The fitted function is then used to shift the m/z of all peaks in all scans in the run, and the calibrated spectra file is used for a complete proteomics analysis.
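The idea of modelling m/z error from calibration points and then shifting peaks can be sketched in a few lines of Python. This is only an illustration, not the MetaMorpheus implementation: to stay dependency-free it swaps the random forest for an ordinary least-squares line, uses retention time as the sole explanatory variable, and runs on synthetic points. All function names and values here are hypothetical.

```python
def ppm_error(observed_mz, theoretical_mz):
    """Relative m/z error in parts per million (ppm)."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def calibrate_mz(observed_mz, retention_time, slope, intercept):
    """Remove the predicted ppm error from an observed m/z value."""
    predicted_ppm = slope * retention_time + intercept
    return observed_mz / (1 + predicted_ppm * 1e-6)


# Synthetic calibration points (retention time, observed m/z, theoretical m/z).
# The drift of 2 ppm + 0.1 ppm per minute is invented for this example.
points = []
for rt, theo in [(10.0, 400.0), (30.0, 650.0), (50.0, 900.0), (70.0, 1150.0)]:
    drift_ppm = 2.0 + 0.1 * rt
    points.append((rt, theo * (1 + drift_ppm * 1e-6), theo))

# Fit the error model on the calibration points.
rts = [rt for rt, _, _ in points]
errors = [ppm_error(obs, theo) for _, obs, theo in points]
slope, intercept = fit_line(rts, errors)

# Apply the model and check what error remains after calibration.
residuals = [
    ppm_error(calibrate_mz(obs, rt, slope, intercept), theo)
    for rt, obs, theo in points
]
print("ppm error before:", [round(e, 3) for e in errors])
print("ppm error after: ", [round(r, 3) for r in residuals])
```

A random forest plays the same role as `fit_line` here but can capture non-linear dependence on several variables at once, such as m/z, total ion current, and ion injection time.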

Creating a New Calibrate Task

  • Load .raw or .mzML data files
  • Load database(s)
  • Select "New Calibrate Task" tab
  • Search Parameters: Adjust the settings as needed. Try to use standard search tolerances (15 ppm precursor and 25 ppm product for high-resolution Orbitrap data)
  • Modifications: Use standard modifications (e.g. fixed carbamidomethylation and variable oxidation, with "localize all modifications" checked if using an .xml database)
  • Select "Add the Calibration Task"
  • Run all tasks!

What should I expect?

The calibration task can take several minutes. Each confident peptide identification is used to calibrate the MS1 and MS2 spectra in the vicinity of its scan. The size of this "vicinity" depends on the density of identifications. The calibration function considers retention time, m/z and charge state.

Output

A calibrated .mzML file is produced for each input data file, with the text "-calib" appended to the original filename. Each file is written in mzML format version 1.1.0.

A .toml file is produced for each calibrated .mzML file and shares the same filename. The .toml file contains recommended precursor and product mass tolerances for each file, which are equal to four times the interquartile range of the mass errors observed in PSMs at a 1% FDR. As long as the .toml file is located in the same directory as the .mzML file, the recommended tolerances specified in the .toml will be used.
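The "four times the interquartile range" rule can be worked through with a short Python sketch. The function name is hypothetical, and the quartile convention (the default of `statistics.quantiles`) is an assumption; MetaMorpheus may compute quartiles slightly differently, so treat the numbers as illustrative only.

```python
import statistics


def recommended_tolerance_ppm(mass_errors_ppm):
    """Return 4 x IQR of the given PSM mass errors (in ppm).

    Assumes the errors come from PSMs that already passed a 1% FDR filter.
    """
    q1, _median, q3 = statistics.quantiles(mass_errors_ppm, n=4)
    return 4 * (q3 - q1)


# Made-up mass errors (ppm) for seven PSMs after calibration.
example_errors = [-3, -2, -1, 0, 1, 2, 3]
print(recommended_tolerance_ppm(example_errors))
```

With these example values the quartiles are -2 and 2 ppm, so the interquartile range is 4 ppm and the suggested tolerance is 16 ppm.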

A results file is generated that specifies the median mass difference and interquartile range for PSM delta mass before and after calibration.

Troubleshooting

Input files are occasionally so badly calibrated that no training points are found, and calibration fails. A good remedy is to increase the parent (precursor) mass tolerance. In practice, we often run a standard search of the uncalibrated data with a liberal parent mass tolerance, plot a histogram of the identifications with ppm mass error on the x-axis to see their range, and then set the calibration tolerance accordingly.
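The histogram step can be sketched with the Python standard library alone. The error values below are invented, and how they are extracted from the search results is left out; the helper name and the 5 ppm bin width are arbitrary choices for illustration.

```python
import math
from collections import Counter


def ppm_histogram(errors_ppm, bin_width=5.0):
    """Count mass errors per bin; each key is the lower edge of a bin in ppm."""
    counts = Counter(math.floor(e / bin_width) * bin_width for e in errors_ppm)
    return dict(sorted(counts.items()))


# Hypothetical precursor mass errors (ppm) from a search with a liberal tolerance.
example_errors = [18.2, 21.5, 22.1, 23.9, 24.4, 26.0, 27.3, 31.8]

for lower_edge, count in ppm_histogram(example_errors).items():
    # One '#' per identification in the bin.
    print(f"{lower_edge:6.1f} to {lower_edge + 5.0:6.1f} ppm | {'#' * count}")
```

In this made-up case every identification lies between 15 and 35 ppm, so a 15 ppm calibration tolerance would find almost no training points, while a tolerance wide enough to cover the observed range would.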
