# ALGORITHM MAIN PARAMETERS, THRESHOLDS AND FILTERS:
#####################################################
inputType = FILE_TYPE
# This parameter specifies the type/format of the input data. The possible values are:
# CENSUS_CHRO - for data that was acquired based on isobaric isotopologue labeling of peptides
# CENSUS_OUT - from IP2, for data that was acquired in SILAC format.
# SEPARATED_VALUES - tab delimited data file containing the following columns:
# Column 1: PSM identifier
# Column 2: peptide sequence
# Column 3: ratio value (not log values!!).
# Column 4: ratio weight. This value can be any value measuring the quality of the ratio measurement. For example, for iTRAQ ratios, it could be the intensity of the higher peak contributing to the ratio; for SILAC it could be a measurement of the MS1 area divided by the fitting curve quality.
# Column 5: protein accession [OPTIONAL]. This column can be avoided if a fasta file is provided with the 'fastaFile' parameter.
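# Illustrative sketch of one data row for SEPARATED_VALUES (all values are hypothetical; <TAB> stands for a tab character):
# PSM_0001<TAB>AEFVEVTK<TAB>1.85<TAB>125000<TAB>P12345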
isobaricRatioType = Rc/Ri
# This parameter defines which type of ratio is used with the isobaric isotopologues approach, that is, when inputType = CENSUS_CHRO.
# If this parameter is used with an inputType other than CENSUS_CHRO, it will throw an error. Similarly, if this parameter is empty or is not present when inputType = CENSUS_CHRO, it will throw an error.
quantChannels = TEXT/TEXT
# If the input files contain more than 2 channels (that is, more than Light over Heavy), specify here which ones you want to use for calculating the ratios of the analysis.
# The format is a ratio of supported labels (separating them by '/')
# Possible values of labels are: LIGHT, HEAVY, MEDIUM, TMT_6PLEX_126, TMT_6PLEX_127, TMT_6PLEX_128, TMT_6PLEX_129, TMT_6PLEX_130, TMT_6PLEX_131, N14, N15
# Default value if not provided: LIGHT/HEAVY
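# Example (a sketch, assuming a TMT 6-plex input in which channels 126 and 127 are the two conditions to compare):
# quantChannels = TMT_6PLEX_126/TMT_6PLEX_127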
ignorePTMs = TRUE/FALSE
# In case this toggle is set to TRUE, modified and unmodified peptides will be considered as the same peptide. Otherwise, they will be considered as different and will therefore be displayed in different peptide nodes.
collapseIndistinguishablePeptides = TRUE/FALSE
# In case this toggle is set to TRUE, peptide nodes of individual peptides that are connected to exactly the same protein nodes are collapsed into one single peptide node. This reduces peptide node redundancies in connected components.
# Default value if not provided: TRUE
collapsePeptidesBySites = TEXT, TEXT, ...
# In case of having site specific quantitative data, you can specify here the amino acid(s) for which you get the ratios, and PCQ will then do two things differently:
# - PCQ will parse isobaric isotopologue ratios according to the sites (discarding isobaric ratios from ions that do not exclusively contribute to the site specificity of the quantitation).
# - PCQ will collapse peptide nodes by sites in proteins, so different peptides that cover the same site of the same proteins are collapsed together.
# The value of this property is a comma separated list of amino acid letters, or just one amino acid letter if the quantitation has only been performed on one particular amino acid.
# If this property is not present or is empty, the peptides will be collapsed by how they are shared with proteins.
collapsePeptidesByPTMs = TEXT,TEXT
# In case of having a PTM analysis you may want to quantify the PTM positions in the proteins. In other words, quantitative values for peptides containing the same PTM will be averaged together even when they are not coming from peptides with the same sequence.
# For that purpose, this parameter will specify which PTMs you want to use to collapse quantitative values.
# The value of this parameter is a comma separated string collection in which each string is like:
# - "+79.96@PST", meaning PTMs with a mass shift of +79.96 at P, S or T amino acids
# - "K", meaning any PTM at a K amino acid
# - "+79.96", meaning any amino acid with a mass shift of +79.96
# Default value if not provided: FALSE
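# Example (a sketch combining two of the forms listed above; the choice of PTMs is only an assumption):
# collapsePeptidesByPTMs = +79.96@PST,K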
createProteinPTMStates = TRUE/FALSE
# In case of enabling this parameter (TRUE), protein nodes connected to peptides with modifications will be split to represent the different modification states that a protein may have, keeping all the possibilities based on their peptides containing PTMs.
# For example, for a protein p with 3 different PTM sites (PTM1-3), there will be 2^3 = 8 possible protein nodes: non modified p, p with PTM1, p with PTM2, p with PTM3, p with PTM1 and PTM2, p with PTM1 and PTM3, p with PTM2 and PTM3, p with PTM1 and PTM2 and PTM3.
# In case of having collapsePeptidesByPTMs=FALSE, this parameter will be ignored.
# Default value if not provided: FALSE
maxNumPTMsPerProtein = NUMERIC_VALUE
# When collapsePeptidesByPTMs is set and createProteinPTMStates is set to TRUE, PCQ will collect all detected PTM sites for each protein and then create a protein node for each combination of PTMs that supports the detected peptides.
# The example above shows how a protein with 3 different PTMs can result in 8 different protein nodes. However, a protein with many PTM sites could end up generating a huge number of protein nodes, most of which are not interesting. For that reason, this parameter limits when a protein with multiple PTMs will be split into multiple protein nodes, and that limit is determined by the number of different PTM sites in the protein.
# Default value if not provided: 4
# Recommended values: any value less than 8.
collapseIndistinguishableProteins = TRUE/FALSE
# In case this toggle is set to TRUE, protein nodes of individual proteins that are connected to exactly the same peptide nodes are collapsed into one single protein node. This reduces protein node redundancies in a connected component.
# Default value if not provided: TRUE
removeFilteredNodes = TRUE/FALSE
# In case this toggle is set to TRUE and some filters are applied, any peptide node that is filtered out will be removed from the network, and the network will be rearranged accordingly.
# In case of being set to FALSE, filtered-out nodes will remain in the output Cytoscape networks (.XGMML files), but they will be colored in gray.
# In any case, filtered-out nodes will not be considered for the ratio integration algorithm or for the classification schemata.
onlyOneSpectrumPerChromatographicPeakAndPerSaltStep = TRUE/FALSE
# This parameter is only valid for MS1-based quantitation data (like SILAC) and inputType = CENSUS_OUT.
# When this parameter is set to TRUE, only the best PSM will be taken from each chromatographic peak and salt step, irrespective of whether the quantification was based on the heavy or the light peak that was identified. This prevents a chromatographic pair from being included in the calculations twice because it was identified in light as well as in heavy. Note that when using DTASelect, the parameter -t 1 should be used prior to quantification and use of PCQ.
ionsPerPeptideNodeThresholdOn = TRUE/FALSE
# Toggle that, when set to TRUE, requires a minimum number of ions per peptide node for the node not to be discarded.
# Default value if not provided: FALSE.
ionsPerPeptideNodeThreshold = NUMERIC_VALUE
# This parameter works with the parameter 'ionsPerPeptideNodeThresholdOn' to set the minimum amount of ions per peptide node to consider it in further analysis.
# This filter is only applied if inputType = CENSUS_CHRO, that is, for data acquired based on isobaric isotopologue labeling of peptides.
# Default value if not provided: 0.
psmsPerPeptideNodeThresholdOn = TRUE/FALSE
# Toggle that, when set to TRUE, enables the threshold on the minimum number of PSMs per peptide node.
# Default value if not provided: FALSE.
psmsPerPeptideNodeThreshold = NUMERIC_VALUE
# This parameter works with the parameter 'psmsPerPeptideNodeThresholdOn' to define a minimum number of PSMs per peptide node to consider it in further analysis.
# Default value if not provided: 0.
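# Example (a sketch; the threshold of 3 PSMs is an arbitrary illustrative choice):
# psmsPerPeptideNodeThresholdOn = TRUE
# psmsPerPeptideNodeThreshold = 3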
replicatesPerPeptideNodeThresholdOn = TRUE/FALSE
# Toggle that, when set to TRUE, enables the threshold on the minimum number of replicates per peptide node.
# Default value if not provided: FALSE.
replicatesPerPeptideNodeThreshold = NUMERIC_VALUE
# This parameter works with the parameter 'replicatesPerPeptideNodeThresholdOn' to define a minimum number of Replicates per peptide node to consider it in further analysis.
# Default value if not provided: 0.
skipSingletons = TRUE/FALSE
# This parameter is only valid for MS1-based quantitation data (like SILAC) and inputType = CENSUS_OUT.
# When this parameter is set to TRUE, the singletons (quantified peptides only detected in one of the two conditions) are skipped and not considered in further analysis.
# Default value if not provided: FALSE.
labelSwap = TRUE/FALSE
# This parameter should only be set to TRUE when a label swap experiment is analyzed, e.g. when experiments are analyzed simultaneously in which sample 1 was labeled light and sample 2 heavy in one set of experiments, and sample 1 was labeled heavy and sample 2 light in another set of experiments. NOTE: use correct operators to denote experiments in parameter 'inputFiles'.
# Default value if not provided: FALSE.
useMajorityRulesForInfinities = TRUE/FALSE
# This parameter states how to proceed when averaging a set of ratios containing INFINITY values (either positive or negative).
# If this parameter is set to TRUE, the majority rule is applied, which means that if there are more non-infinity ratios than infinities,
# then it will report the average of the ratios, otherwise it will report the infinity.
# More detailed cases are shown in these examples:
# - If useMajorityRulesForInfinities=TRUE and we have +INF, +INF, +INF, +2.5, +1.5, then the averaged value will be +INF.
# - If useMajorityRulesForInfinities=FALSE and we have +INF, +INF, +INF, +2.5, +1.5, then the averaged value will be +2.0.
# - If useMajorityRulesForInfinities=TRUE and we have +INF, +INF, +INF, +2.5, +1.5, +4.0 then the averaged value will be +4.0.
# - If useMajorityRulesForInfinities=FALSE and we have +INF, +INF, +INF, +2.5, then the averaged value will be +2.5.
# Default value if not provided: TRUE.
##########################################
# PEPTIDE NODE RATIO INTEGRATION ALGORITHM
##########################################
# Minimal parameters needed to perform the quantitative analysis within the protein-peptide network based on the publication Mol Cell Proteomics. 2011 Jan;10(1):M110.003335. doi: 10.1074/mcp.M110.003335. Epub 2010 Aug 31.
performRatioIntegration = TRUE/FALSE
# This parameter toggles whether PCQ applies the SanXot algorithm to calculate the final peptide node ratios. If the parameter is set to FALSE, the final peptide node ratios will be calculated as the average of all individual quantitative ratios (in case of SILAC data) or as Ion Count ratios (Rc) (see publication for details) in case of isobaric isotopologues quantitation.
# Default value if not provided: FALSE.
sanxotPath = FOLDERPATH
# If performRatioIntegration is enabled, SanXot scripts will be used in order to calculate the final peptide node ratios.
# SanXot scripts are available upon request from the Jesús Vázquez Cobos proteomics group at the CNIC, Spain.
outliersRemovalFDR = NUMERIC_VALUE
# At each level of quantitative data integration (for example from PSM to peptide), outlier values are removed in case they are below a specified FDR. This removes measurement values that result from a measurement error.
# In case no numeric value is specified, the removal of outliers will not be performed.
# Examples: outliersRemovalFDR = 0.01
significantFDRThreshold = NUMERIC_VALUE
# This parameter refers to a threshold applied to the final ratio of the peptide node calculated by the ratio integration algorithm. The algorithm will report an FDR associated with each final peptide node ratio, and this threshold will determine whether the peptide node is considered as significantly changing.
# If this parameter is set, some statistics about the number of significantly regulated peptide nodes will be available in the summary file, and an additional network file (XGMML) will be created containing the clusters that contain at least one peptide node with an FDR under the threshold.
# If not provided, no statistics and no additional network file are created.
# Example: significantFDRThreshold = 0.05
####################################
# PROTEIN PAIR ANALYSIS PARAMETERS
####################################
applyClassificationsByProteinPairs = TRUE/FALSE
# If this parameter is set to TRUE, the protein pair analysis is performed. If it is set to FALSE, all the other parameters in this section will be ignored.
# Default value if not provided: FALSE.
thresholdForSignificance = NUMERIC_VALUE
# This parameter sets the threshold of the fold change (NOT as log2 value) used to decide whether something is significantly regulated in the protein pair analysis. The higher the value, the larger the difference in ratios of peptides or proteins required to be considered significantly regulated.
# This parameter is ignored if applyClassificationsByProteinPairs = FALSE.
# Default value if not provided: 2.
statisticalTestForProteinPairApplied = TRUE/FALSE
# If this parameter is set to TRUE, an Iglewicz-Hoaglin statistical test is applied for detecting outliers in a protein pair analysis.
# This test will only be performed when having enough measurements (10 individual ratios) in the protein pair.
iglewiczHoaglinTest = NUMERIC_VALUE
# This parameter sets the threshold for the Iglewicz-Hoaglin test. Usually 3.5 is considered a standard threshold value. This test is applied in the protein pair analysis to check whether there are any peptide node outliers in each protein pair. If the result of the test for a peptide node is equal to or greater than this value, the node will be considered an outlier in the protein pair analysis.
# Given a protein pair and a peptide node with ratio xi under test, the result of the test is calculated as Mi = 0.6745 * (xi - x~) / MAD, where x~ is the median of the population and MAD is the median absolute deviation of the population. The population consists of all the individual PSM-level ratios in the rest of the peptide nodes of the protein pair.
# Default value if not provided: 3.5
# This parameter is ignored if applyClassificationsByProteinPairs = FALSE.
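# Worked example (hypothetical numbers, only to illustrate the arithmetic): if the remaining PSM-level ratios of the protein pair have median x~ = 1.0 and MAD = 0.2,
# a peptide node with ratio xi = 2.2 gives Mi = 0.6745 * (2.2 - 1.0) / 0.2 = 4.047, which is greater than 3.5, so that node would be flagged as an outlier with the default threshold.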
uniquePepOnly = TRUE/FALSE
# This parameter, when set to TRUE, directs PCQ to compile protein pairs with peptide nodes that are unique within a connected component. When set to FALSE a 'protein pair centric' view is applied meaning that protein pairs are assembled independent of whether a peptide node for a protein pair is unique within the connected component.
# Default value if not provided: TRUE.
#######################################
# PEPTIDE SEQUENCE ALIGNMENT PARAMETERS
#######################################
# As an option for the user, peptide-to-peptide sequence alignments can be performed in order to detect analogous and orthologous sequences and cluster them in the same connected component. This could be useful for a multi-species comparison analysis.
makeAlignments = TRUE/FALSE
# This parameter, when set to TRUE, aligns the peptides by similarity and denotes the similarity by an edge within the output protein-peptide network (XGMML files).
# This parameter, as well as the rest of the parameters in this section, is based on the Needleman-Wunsch alignment algorithm. This algorithm takes into consideration point mutations, deletions and insertions in peptide sequences and uses the BLOSUM62 matrix.
# Note that it is only executed in case peptide nodes are based on single, individual peptide sequences (e.g. the value of the parameter 'collapseIndistinguishablePeptides' is set to FALSE), otherwise PCQ will report an error.
# Default value if not provided: FALSE.
finalAlignmentScore = NUMERIC_VALUE
# This parameter sets the limit for the final alignment score from the Needleman-Wunsch alignment algorithm. Only alignments of peptide sequences with a score greater than this limit will be reported in the results.
# Default value if not provided: 30.
# Example: finalAlignmentScore = 30
sequenceIdentity = NUMERIC_VALUE
# This parameter sets the threshold on the sequence identity score from the Needleman-Wunsch alignment algorithm. Only alignments above that threshold will be considered in the analysis.
# Default value if not provided: 0.8.
# Example: sequenceIdentity = 0.8
minConsecutiveIdenticalAlignment = NUMERIC_VALUE
# This value determines how many consecutive amino acids have to be identical for an alignment to be considered valid.
# Default value if not provided: 6.
# Example: minConsecutiveIdenticalAlignment = 6
############################################################################
# PSEA-Quant input file generator
############################################################################
writePSEAQuantInputFiles = TRUE/FALSE
# Whether or not to write output files that can serve as input files for the PSEA-Quant annotation enrichment analysis.
############################################################################
# PARAMETERS TO PROPERLY READ IN THE DATABASE AND ASSIGN PEPTIDES TO PROTEINS:
############################################################################
# In case the input files don't contain all the peptide-to-protein connections (because the software that generated them removed some of the subset proteins or performed a protein inference heuristic),
# you can use a protein sequence database in order to map all the possible peptide-to-protein connections, for example without losing any isoform.
# Note: These parameters should be set according to the initial parameters used to search the database for identified peptide hits as well as how the search results were filtered.
enzymeArray = TEXT, TEXT, ...
# This parameter specifies the amino acids at which the protease cleaves. Note: amino acids that are recognized as cleavage sites are provided as comma-separated (',') values.
# Trypsin cleaves at K and R, so 'K,R' should be set here.
# LysC cleaves at K only, so 'K' should be set here.
# Default value if not provided: K,R
# Example: enzymeArray = K,R
missedCleavages = NUMERIC_VALUE
# This parameter allows the user to select how many missed cleavages were allowed in the search of the spectra.
# Default value if not provided: 0.
# Example: missedCleavages = 2
semiCleavage = TRUE/FALSE
# This parameter determines whether semi-cleaved peptides are allowed or not.
# Default value if not provided: FALSE
# Example: semiCleavage = TRUE
discardDecoys = JAVA_REGEXP
# Regular expression for discarding decoys from input files.
# If empty or not provided, no decoys will be discarded from input files.
# Example: discardDecoys = Reverse_
ignoreNotFoundPeptidesInDB = TRUE/FALSE
# If this parameter is set to TRUE, PCQ will check that all the peptides provided in the input files are found in the FASTA file (if provided). This is useful in order to check that the fasta in-silico digestion parameters are set correctly as well as in order to ensure that the correct database is being used.
# Default value if not provided: FALSE.
fastaFile = FILEPATH
# Full path in which a FASTA database file is located
# C-TERM OR N-TERM CLEAVAGE PARAMETER IS MISSING. --- ANY OTHER PARAMETERS MISSING HERE???????
peptideFilterRegexp = JAVA_REGEXP
# Regular expression for discarding peptides from input files and fasta file indexing
lookInUniprotForProteoforms = TRUE/FALSE
# If this parameter is TRUE, PCQ will consider all the proteoforms for the input proteins that are available
# in UniprotKB. This will include available isoforms but also alternative products, mutations, sequence conflicts, etc...
# Even if these proteins were not searched before, they will be taken into account when building the network
# between proteins and peptides.
# Default value if not provided: FALSE.
##########################
# INPUT FILES AND FOLDERS:
##########################
inputFilePath = FILEPATH
# Full path of the folder in which all the input files are located.
# Default value if not provided: the system user's home folder.
# Example: inputFilePath = C:\\Users\\Salva\\Desktop\\data\\PINT projects\\Fibulin\\LacZ
inputFiles = EXP_NAME[FILE_NAME1, FILE_NAME2, ...] | EXP2_NAME[FILE_NAME4, FILE_NAME5, ...]
# This parameter specifies the quantification input files to be used and groups them into groups, in order to be treated properly. EXP_NAME is an internal tag for the group of files provided subsequently between brackets. Several groups are separated by '|'. Several file names are separated by ','. File names don't include path names, because they are expected to be found in the folder specified by the inputFilePath parameter.
# This parameter is required.
# Examples:
# DmDv[DmDv_rep1.xml, DmDv_rep2.xml, DmDv_rep3.xml] | DvDm[DvDm_rep1.xml, DvDm_rep2.xml, DvDm_rep3.xml]
# A_Embryos[A_Embryos_rep1.xml, A_Embryos_rep2.xml, A_Embryos_rep3.xml]
# isobaric_HEK[isobaric_HEK.xml]
# CFBEvsHSB[CFBEvsHBE.xml]
# Fibulin_LacZ[Fibulin_LacZ_rep1.txt, Fibulin_LacZ_rep2.txt, Fibulin_LacZ_rep3.txt]
inputIDFiles = EXP_NAME[FILE_NAME1, FILE_NAME2, ...] | EXP2_NAME[FILE_NAME4, FILE_NAME5, ...]
# This parameter specifies the identification input files to be used and groups them into groups, in order to be treated properly. EXP_NAME is an internal tag for the group of files provided subsequently between brackets. Several groups are separated by '|'. Several file names are separated by ','. File names don't include path names, because they are expected to be found in the folder specified by the inputFilePath parameter.
# This parameter is required.
# Examples:
# DmDv[DmDv_rep1.xml, DmDv_rep2.xml, DmDv_rep3.xml] | DvDm[DvDm_rep1.xml, DvDm_rep2.xml, DvDm_rep3.xml]
# A_Embryos[A_Embryos_rep1.xml, A_Embryos_rep2.xml, A_Embryos_rep3.xml]
# isobaric_HEK[isobaric_HEK.xml]
# CFBEvsHSB[CFBEvsHBE.xml]
# Fibulin_LacZ[Fibulin_LacZ_rep1.txt, Fibulin_LacZ_rep2.txt, Fibulin_LacZ_rep3.txt]
uniprotReleasesFolder = FILEPATH
# This parameter points to the folder where the UniProt annotations are going to be automatically stored on the machine where PCQ runs. UniProt annotations are retrieved from the latest version of UniProtKB in order to associate gene names, taxonomies, etc... that may not be available in the protein description line provided in the input files.
# This process may take some time and some disk space, depending on the dataset size. However, once the annotations are locally stored, PCQ will retrieve the annotations from the local resource, taking just a few seconds.
# Default value if not provided: the system user's home directory.
# Example: uniprotReleasesFolder = C:\\Users\\Salva\\Desktop\\tmp\\uniprotKB
uniprotVersion = FORMATED_DATE
# This parameter states the Uniprot release to use, corresponding to the subfolder name created in the folder stated by the uniprotReleasesFolder.
# Leave blank for getting latest version annotations. In that case, each month, UniProtKB releases a new version of the database and therefore PCQ will retrieve the information from Uniprot again and will store a new version of the annotations, creating a subfolder with the release date.
# Example: uniprotVersion = 2016_07
outputFilePath = FILEPATH
# This parameter points to the folder in which all output files will be generated.
# Default value if not provided: system user's home folder.
# Example: outputFilePath = C:\\Users\\Salva\\Desktop\\data\\Fibulin\\LacZ\\quant
outputPrefix = TEXT
# Prefix appended to all output files.
# This parameter is required.
# Example: outputPrefix = Fibulin_LacZ_OLD_DB
outputSuffix = TEXT
# Suffix appended to all output files.
# This parameter is required.
# Example: outputSuffix = 3PSMsPerPeptideNode_NoSingletons
recognizeACCFormat = TRUE/FALSE
# If this parameter is set to TRUE, PCQ will try to extract a known formatted accession from the accessions in the input files.
# For example, sp|P12345|P2P_HUMAN will be recognized as a UniProt accession, P12345.
# If this parameter is set to FALSE, any accession will be used as it is. This is appropriate for non standard accessions in order to save parsing time.
printPTMPositionInProtein = TRUE/FALSE
# If this parameter is set to TRUE, PCQ will print two new columns: one with the position of the PTM of interest on the protein sequence and another one with the position of the PTM on the peptide.
# Default value if not provided: TRUE
###############################
# SPECIES COMPARISON PARAMETERS
###############################
# These parameters are meant for experiments in which each of the samples used for relative quantification may be from a different species. If two different species have been labeled heavy and light, respectively, these parameters will allow the program to estimate the number of incorrect quantifications. The species names need to match the scientific names of the species in UniProt.
ignoreTaxonomies = TRUE/FALSE
# When this is set to TRUE, taxonomies from proteins will be ignored, and the following parameters will also be ignored.
# This parameter should be set to TRUE when analyzing single species samples if the number of proteins is large and the proteins do not have UniProtKB accessions,
# because parsing the taxonomy from the protein description is expensive for big datasets.
lightSpecies = TEXT
# Species that was labeled light. When this parameter is set, some statistics are shown in the output summary file.
# Example: lightSpecies = Drosophila virilis
heavySpecies = TEXT
# Species that was labeled heavy. When this parameter is set, some statistics are shown in the output summary file.
# Example: heavySpecies = Drosophila virilis
##############################################
# GRAPHIC OPTIONS FOR CYTOSCAPE EXPORT NETWORK
##############################################
printOnlyFirstGene = TRUE/FALSE
# When this is set to TRUE, the program prints only the first gene when collapsing several proteins.
# This is applicable to the output files, but it is particularly useful for the tooltip on the proteins in the Cytoscape network visualization in order to avoid over-crowded tooltip text boxes.
# Color settings are encoded in hex-RGB color code
colorsByTaxonomy = TEXT, hex-RGB
# Defines the color of the protein nodes depending on the taxonomy of the protein(s) in the node. It is a comma separated string where the first string is the scientific name of the taxonomy and the second string is the hex-RGB color code. If more than one taxonomy is present, the additional name and color pairs should be concatenated after a comma (see example).
# Default value if not provided: WHITE color.
# Example: colorsByTaxonomy = Drosophila Virilis,#ff0000,Drosophila Melanogaster,#00ff00
colorMultiTaxonomy = hex-RGB
# Defines the color of the protein nodes that contain more than one taxonomy.
# Default value if not provided: WHITE color.
# Example: colorMultiTaxonomy = #123456
colorAlignedPeptidesEdge = hex-RGB
# Defines the color of the edge which connects two peptide nodes that have been aligned (makeAlignments = TRUE).
# Default value if not provided: RED color.
proteinLabel = PROTEIN_LABEL_TYPE
# This parameter indicates which information will be displayed as the label of the protein nodes in the output Cytoscape networks. The possible values of this parameter are:
# ACC. The labels on the protein nodes will be as: 'P12345' or 'P12345, O12345'.
# ID. The labels on the protein nodes will be as: 'APO_HUMAN'
# GENE. The labels on the protein nodes will be as: 'ALDOA'
# Default value if not provided: ACC.
minimumRatioForColor = NUMERIC_VALUE
# This parameter states the lower value of a log2 ratio range defining the gradient of colors between colorRatioMin and colorRatioMax.
# Any log2 ratio value lower than this value will be shown with the color defined in parameter colorRatioMin.
# Default value if not provided: -10.
maximumRatioForColor = NUMERIC_VALUE
# This parameter states the higher value of a log2 ratio range defining the gradient of colors between colorRatioMin and colorRatioMax.
# Any log2 ratio value higher than this value will be shown with the color defined in parameter colorRatioMax.
# Default value if not provided: 10.
colorRatioMin = hex-RGB
# This parameter defines the lower color of the color gradient for the peptide nodes.
# The peptide nodes will be shown with a color depending on their final peptide node ratio value and the ratio range defined by minimumRatioForColor and maximumRatioForColor.
# Default value if not provided: "#0000ff" (BLUE).
colorRatioMax = hex-RGB
# This parameter defines the higher color of the color gradient for the peptide nodes.
# The peptide nodes will be shown with a color depending on their final peptide node ratio value and the ratio range defined by minimumRatioForColor and maximumRatioForColor.
# Default value if not provided: "#ff0000" (RED).
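# Example (a sketch; the log2 range of -3 to 3 is an assumed choice, while the two colors are the documented defaults):
# minimumRatioForColor = -3
# maximumRatioForColor = 3
# colorRatioMin = #0000ff
# colorRatioMax = #ff0000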
colorNonRegulatedPeptides = hex-RGB
# If provided, this parameter defines the color of the peptide nodes that are not significantly (up/down)-regulated.
# If not provided, the peptide nodes will be colored using the color scale defined by colorRatioMin and colorRatioMax.
proteinNodeWidth = NUMERIC_VALUE
# This parameter defines the width of the protein nodes in the Cytoscape network view.
# Default value if not provided: 70.
proteinNodeHeight = NUMERIC_VALUE
# This parameter defines the height of the protein nodes in the Cytoscape network view.
# Default value if not provided: 30.
peptideNodeWidth = NUMERIC_VALUE
# This parameter defines the width of the peptide nodes in the Cytoscape network view.
# Default value if not provided: 70.
peptideNodeHeight = NUMERIC_VALUE
# This parameter defines the height of the peptide nodes in the Cytoscape network view.
# Default value if not provided: 30.
proteinNodeShape = SHAPE_TYPE
# This parameter defines the shape of the protein nodes in the Cytoscape network view. The possible values of this parameter are:
# ELLIPSE, RECTANGLE, TRIANGLE, DIAMOND, HEXAGON, OCTAGON, PARALLELOGRAM, ROUNDRECT, VEE
# Default value if not provided: ELLIPSE.
peptideNodeShape = SHAPE_TYPE
# This parameter defines the shape of the peptide nodes in the Cytoscape network view. The possible values are the same as those listed for proteinNodeShape.
# Default value if not provided: ROUNDRECT.
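#
# Illustrative example (a sketch, not taken from the PCQ distribution): protein nodes drawn larger than peptide nodes.
# The values 100 and 40 are arbitrary; all other values are the documented defaults:
# proteinNodeWidth = 100
# proteinNodeHeight = 40
# peptideNodeWidth = 70
# peptideNodeHeight = 30
# proteinNodeShape = ELLIPSE
# peptideNodeShape = ROUNDRECT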
showCasesInEdges = TRUE/FALSE
# If this parameter is set to TRUE, the edges will contain a label showing the classification cases for the protein pair analysis.
# This parameter will not have any effect if significantProteinPairAnalysis = FALSE.
# Default value if not provided: TRUE.
remarkSignificantPeptides = TRUE/FALSE
# If this parameter is set to TRUE, the borders of significant peptide nodes and edges will be shown with the highlight color.
# Default value if not provided: TRUE.
uniprot_xpath = [XPATH, SUB_XPATH, COLUMNNAME]
# New columns to include in the XGMML files, so that they can be queried in Cytoscape.
# Using the UniProt XML structure (http://www.uniprot.org/docs/uniprot.xsd), you can select a specific annotation to include in the XGMML files.
# An example of a UniProt XML file can be found at: http://www.uniprot.org/uniprot/B4P3Q9.xml
# Values in this property are a list of triplets enclosed in brackets.
# Each triplet is composed of 3 elements (separated by commas):
# - XPATH: XPath to the XML element containing the information you want to include. It may have a condition value (see examples below).
# - SUB_XPATH: XPath relative to the last (deepest) element of the previous XPATH, indicating the actual value to retrieve from the XML.
# - COLUMNNAME: name to assign to that property in the XGMML. It will appear as a new column in the data tables in Cytoscape.
#
# You can specify as many triplets as you want (no separation character): [tri,plet,1] [tri,plet,2]
#
# Example 1:
# Consider the following fragment of a UniProt XML file:
#
# <dbReference type="Proteomes" id="UP000002282">
# <property type="component" value="Unassembled WGS sequence"/>
# </dbReference>
# <dbReference type="GO" id="GO:0005739">
# <property type="term" value="C:mitochondrion"/>
# <property type="evidence" value="ECO:0000501"/>
# <property type="project" value="UniProtKB-SubCell"/>
# </dbReference>
# <dbReference type="GO" id="GO:0005524">
# <property type="term" value="F:ATP binding"/>
# <property type="evidence" value="ECO:0000501"/>
# <property type="project" value="UniProtKB-KW"/>
# </dbReference>
#
# You may be interested in including all the GO term names for your proteins, so you want a new column that, for this particular protein,
# will have the value: "C:mitochondrion, F:ATP binding"
#
# To tell PCQ to retrieve those annotations, you have to write a triplet like:
# - XPATH: dbReference$type=GO
# - SUB_XPATH: property$value
# - COLUMNNAME: GO_Term_Name
# So it would be:
# uniprot_xpath = [dbReference$type=GO, property$value, GO_Term_Name]
#
# Note that the first element, XPATH, indicates the element used to filter the annotations: all 'dbReference' elements having
# an attribute 'type' equal to "GO". The second element, SUB_XPATH, is the path, starting at the 'dbReference' element, to the data to retrieve:
# the attribute 'value' of the subelement 'property' of the element 'dbReference'.
#
#
#
# Example 2:
# Consider the following fragment of a UniProt XML file:
#
# <comment type="subcellular location">
# <subcellularLocation>
# <location evidence="1">Mitochondrion</location>
# </subcellularLocation>
# </comment>
#
# You may be interested in including all the subcellular locations of your proteins, so you want a new column that, for this particular protein,
# will have the value: "Mitochondrion"
#
# To tell PCQ to retrieve those annotations, you have to write a triplet like:
# - XPATH: comment$type=subcellular location
# - SUB_XPATH: subcellularLocation/location
# - COLUMNNAME: Subcellular_Location
# So it would be:
# uniprot_xpath = [comment$type=subcellular location, subcellularLocation/location, Subcellular_Location]
#
# Note that the first element, XPATH, indicates the element used to filter the annotations: all 'comment' elements having
# an attribute 'type' equal to "subcellular location". The second element, SUB_XPATH, is the path, starting at the 'comment' element, to the data
# to retrieve: the text value (note that here there is no attribute notation '$') of the subelement 'location' inside the subelement
# 'subcellularLocation' of the element 'comment'.
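#
# Illustrative example combining Example 1 and Example 2 in a single property (a sketch, not taken from the PCQ distribution).
# It assumes that triplets are simply written one after another, as in "[tri,plet,1] [tri,plet,2]" above:
# uniprot_xpath = [dbReference$type=GO, property$value, GO_Term_Name] [comment$type=subcellular location, subcellularLocation/location, Subcellular_Location]
# With such a setting, two new columns, GO_Term_Name and Subcellular_Location, would be expected in the Cytoscape data tables.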