=item --format

This may theoretically be any IO subsystem and the format understood by
that subsystem to parse the input file(s). IO subsystem and format must
be separated by a double colon. See below for which subsystems are
currently supported.

The default IO subsystem is TreeIO. 'Bio::' will automatically be
prepended if not already present. As of now the other supported
subsystem is ClusterIO. All input files must have the same format.

Formats understood by Bio::TreeIO:

    newick     Newick tree format
    nexus      Nexus tree format
    nhx        NHX tree format
    svggraph   SVG graphical representation of tree
    tabtree    ASCII text representation of tree
    lintree    lintree output format
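
The following is a minimal sketch, not code taken from this script, of how a --format value could be split into the module to load and the format name. It assumes the rules stated above: the last double colon separates subsystem from format, TreeIO is the default subsystem, and 'Bio::' is prepended when missing. The helper name parse_format is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the script): split a --format value into
# the IO module to load and the format name to pass to it.
sub parse_format {
    my ($spec) = @_;
    my ($subsystem, $format);
    if ($spec =~ /^(.*)::([^:]+)$/) {
        ($subsystem, $format) = ($1, $2);           # e.g. "ClusterIO::unigene"
    } else {
        ($subsystem, $format) = ('TreeIO', $spec);  # bare format: default subsystem
    }
    # prepend 'Bio::' unless the user already supplied it
    $subsystem = "Bio::$subsystem" unless $subsystem =~ /^Bio::/;
    return ($subsystem, $format);
}

my ($module, $format) = parse_format('nexus');
print "$module -format => $format\n";   # prints "Bio::TreeIO -format => nexus"
```

The module name obtained this way could then be loaded with require and instantiated in the usual BioPerl manner, e.g. Bio::TreeIO->new(-format => 'nexus', -file => $file); the script's own handling may differ in detail.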

=item --fmtargs

Use this argument to specify initialization parameters for the parser
for the input format. The argument value is expected to be a string
with parameter names and values delimited by commas.

Usually you will want to protect the argument list from interpretation
by the shell, so surround it with double or single quotes.

If a parameter value contains a comma, escape it with a backslash
(which means you must also protect the whole argument from the shell
in order to preserve the backslash).

Examples:

    # turn parser exceptions into warnings (don't try this at home)
    --fmtargs "-verbose,-1"
    # verbose parser with an additional path argument
    --fmtargs "-verbose,1,-indexpath,/home/luke/warp"
    # escape commas in values
    --fmtargs "-myspecialchar,\,"
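
A minimal sketch of how a comma-delimited --fmtargs string with backslash-escaped commas might be split into name/value pairs. The helper parse_fmtargs is hypothetical, and the splitting rule (a comma preceded by a backslash is taken literally) is inferred from the description above rather than copied from the script.

```perl
use strict;
use warnings;

# Hypothetical helper: split a --fmtargs string on commas that are not
# escaped with a backslash, then turn each "\," back into a plain comma.
sub parse_fmtargs {
    my ($str) = @_;
    return map { my $p = $_; $p =~ s/\\,/,/g; $p }
           split /(?<!\\),/, $str, -1;
}

# name/value pairs can be collected into a hash for a constructor call
my %args = parse_fmtargs('-verbose,1,-indexpath,/home/luke/warp');
print "$_ => $args{$_}\n" for sort keys %args;
```

The resulting list would typically be appended to the parser's constructor arguments, e.g. Bio::TreeIO->new(-format => $format, %args). This sketch cannot express a value that ends in a literal backslash.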

=item --pipeline

This is a sequence of modules implementing Bio::Factory::SeqProcessorI
(see L<Bio::Factory::SeqProcessorI>) that will be instantiated and
chained in exactly this order. This allows you to write reusable
modules for custom post-processing of objects after the stream parser
returns them. See L<Bio::Seq::BaseSeqProcessor> for a base
implementation for such modules.

Modules are separated by the pipe character '|'. In addition, you can
specify initialization parameters for each of the modules by enclosing
a comma-separated list of alternating parameter names and values in
parentheses or angle brackets directly after the module.

This option will be ignored if no value is supplied.

Examples:

    # one module
    --pipeline "My::SeqProc"
    # two modules in the specified order
    --pipeline "My::SeqProc|My::SecondSeqProc"
    # two modules, the first of which has two initialization parameters
    --pipeline "My::SeqProc(-maxlength,1500,-minlength,300)|My::SecondProc"
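
A sketch of how a --pipeline value could be parsed into module names and their initialization parameters. It assumes that '|' never appears inside a parameter list and that parameter values contain no commas. parse_pipeline is a hypothetical helper, not part of the script.

```perl
use strict;
use warnings;

# Hypothetical helper: parse a --pipeline value into [module, [params]] pairs.
# Parameters may follow the module name in parentheses or angle brackets.
sub parse_pipeline {
    my ($spec) = @_;
    my @stages;
    foreach my $elem (split /\|/, $spec) {
        $elem =~ /^\s*([\w:]+)\s*(?:\((.*)\)|<(.*)>)?\s*$/
            or die "cannot parse pipeline element '$elem'\n";
        my ($module, $paren, $angle) = ($1, $2, $3);
        my $args   = defined $paren ? $paren : $angle;
        my @params = defined $args ? split(/,/, $args) : ();
        push @stages, [$module, \@params];
    }
    return @stages;
}

foreach my $stage (parse_pipeline(
        'My::SeqProc(-maxlength,1500,-minlength,300)|My::SecondProc')) {
    my ($module, $params) = @$stage;
    print "$module: @$params\n";
}
```

Each stage could then be instantiated with $module->new(@$params) and chained to the previous one; consult L<Bio::Factory::SeqProcessorI> for the methods a processor offers for attaching its input stream.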

=item --seqfilter

This is either a string or a file defining a closure to be used as a
sequence filter. The value is interpreted as a file if it refers to a
readable file, and as a string otherwise. See add_condition() in
L<Bio::Seq::SeqBuilder> for more information about what the code will
be used for. The closure will be passed a hash reference with an
accumulated list of initialization parameters for the prospective
object. It should return TRUE if the object is to be built and FALSE
otherwise.

Note that this closure operates at the stream parser level. Objects it
rejects will be skipped by the parser. Objects it accepts can still be
intercepted at a later stage (options --remove, --update, --noupdate,
--mergeobjs).

Note that not all stream parsers necessarily support a
Bio::Factory::ObjectBuilderI (see L<Bio::Factory::ObjectBuilderI>)
object. Email bioperl-l@bioperl.org to find out which ones do. In
fact, at the time of writing, only Bio::SeqIO::genbank supports it.

This option will be ignored if no value is supplied.
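
The string-or-file rule described above could look roughly like the following sketch; the same helper would serve --mergeobjs. Both closure_from_arg and the example filter are hypothetical, and the '-display_id' hash key is an assumption about what the parser accumulates, so check L<Bio::Seq::SeqBuilder> for the keys actually passed.

```perl
use strict;
use warnings;

# Hypothetical helper: turn a --seqfilter or --mergeobjs value into a code
# reference. A readable file is slurped; otherwise the value itself is code.
sub closure_from_arg {
    my ($arg) = @_;
    my $src = $arg;
    # skip the file test for multi-line code strings (avoids a stat warning)
    if ($arg !~ /\n/ && -r $arg) {
        open(my $fh, '<', $arg) or die "cannot open $arg: $!\n";
        local $/;                   # slurp mode: read the whole file at once
        $src = <$fh>;
        close($fh);
    }
    my $code = eval $src;           # the source must evaluate to a sub { ... }
    die "cannot compile closure: $@" if $@;
    die "value does not evaluate to a code reference\n"
        unless ref($code) eq 'CODE';
    return $code;
}

# Example filter: skip entries whose display ID starts with "TMP_".
# ASSUMPTION: the parser collects the ID under the key '-display_id'.
my $filter = closure_from_arg(q{
    sub {
        my ($params) = @_;
        my $id = defined $params->{-display_id} ? $params->{-display_id} : '';
        return $id =~ /^TMP_/ ? 0 : 1;
    }
});
```

Putting the same sub { ... } source into a file and passing the file name behaves identically, since a readable file is read first. Because the source is compiled with a string eval, only use filter code you trust.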

=item --mergeobjs

This is also a string or a file defining a closure. If provided, the
closure is called if a lookup for the unique key of the new object
was successful. Hence, it will never be called unless --lookup is
supplied at the same time.

Note that --noupdate will B<not> prevent the closure from being
called. That is, if your merge script makes changes to the database
rather than only modifying the object, --noupdate will B<not>
prevent those changes. This is a feature, not a bug. Obviously,
modifications to the in-memory object will have no effect with
--noupdate, since the database won't be updated with it.

The closure will be passed three arguments: the object found by
lookup, the new object to be submitted, and the Bio::DB::DBAdaptorI
(see L<Bio::DB::DBAdaptorI>) implementing object for the desired
database. If the closure returns a value, it must be the object to be
inserted or updated in the database (if $obj->primary_key returns a
value, the object will be updated). If it returns undef, the script
will skip to the next object in the input stream.

The closure can serve many purposes. It was originally conceived as a
means to merge attributes or associated objects of the new object into
the existing (found) one in a customized way, in order to avoid
duplications while still capturing additional information (e.g.,
annotation). However, it can be used for a multitude of other
operations, such as physically deleting or altering certain associated
information in the database (the found object and all its associated
objects will implement Bio::DB::PersistentObjectI, see
L<Bio::DB::PersistentObjectI>). Since the third argument is the
persistent object and adaptor factory for the database, there is
literally no limit to the database operations the closure could
perform.

This option will be ignored if no value is supplied.
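
A hedged example of a --mergeobjs closure implementing a simple "update only if newer" policy. It assumes that both objects respond to version() and that primary_key() is a getter/setter as described in L<Bio::DB::PersistentObjectI>; neither the policy nor these method choices are prescribed by the script.

```perl
use strict;
use warnings;

# Sketch of a --mergeobjs closure: update the stored record only when the
# incoming one carries a higher version; otherwise skip the new object.
# ASSUMPTION: both objects provide version() and a primary_key() getter/setter.
my $merge = sub {
    my ($found, $new, $db) = @_;    # $db (the adaptor) is not needed here
    my $old_version = $found->version() // 0;
    my $new_version = $new->version()   // 0;
    return undef if $new_version <= $old_version;   # skip: nothing newer
    # giving $new the found primary key makes the script update, not insert
    $new->primary_key($found->primary_key());
    return $new;
};
```

To use it with the script, the sub { ... } part would go into the string or file passed to --mergeobjs. Returning undef for an older or equal version makes the script skip that entry, as described above.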

=item --logchunk

If supplied with an integer argument n greater than zero, progress
will be logged to stderr every n entries of the input file(s). Default
is no progress logging.
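
The progress-logging rule can be illustrated with the following sketch; progress_due and the loop below are hypothetical stand-ins for the script's own loading loop.

```perl
use strict;
use warnings;

# Sketch of --logchunk semantics: report progress after every $n entries;
# $n of 0 or undef (the default) disables progress logging.
sub progress_due {
    my ($count, $n) = @_;
    return 0 unless defined $n && $n > 0;
    return $count % $n == 0 ? 1 : 0;
}

my $logchunk = 2;                 # as if --logchunk 2 had been given
my @entries  = qw(a b c d e);     # stand-ins for parsed input entries
my $count    = 0;
foreach my $entry (@entries) {
    $count++;
    # ... load $entry into the database here ...
    print STDERR "$count entries processed\n" if progress_due($count, $logchunk);
}
```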

=item --debug

Turn on verbose and debugging mode. This will produce a *lot* of
logging output, so you will want to capture the output in a
file. This option is useful if you get some mysterious failure
somewhere in the course of loading or updating a record, and you would
like to see, e.g., precisely which SQL statement fails. Usually you
turn on this option because you've been asked to do so by someone
responding to a problem you posted to the Bioperl mailing list.

=item -u, -z, or --uncompress

Uncompress the input file(s) on the fly by piping them through
gunzip. gunzip must be in your PATH for this option to work.

=item more args

The remaining arguments will be treated as files to parse and load. If
there are no additional arguments, input is expected to come from
standard input.

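
A sketch of how input could be opened under these rules: one handle per file argument, standard input when there are none, and a gunzip pipe when -u, -z, or --uncompress is given. The helper input_handles is hypothetical; it uses a list-form piped open, which assumes a Unix-like system, and it does not decompress standard input.

```perl
use strict;
use warnings;

# Sketch: return one read handle per input, following the options above.
# With no file arguments, standard input is used. With -u/-z/--uncompress,
# each file is piped through "gunzip -c" (gunzip must be on the PATH).
sub input_handles {
    my ($uncompress, @files) = @_;
    return (\*STDIN) unless @files;
    my @handles;
    foreach my $file (@files) {
        my $fh;
        if ($uncompress) {
            # list-form pipe open: no shell is involved, so odd file names are safe
            open($fh, '-|', 'gunzip', '-c', $file)
                or die "cannot start gunzip for $file: $!\n";
        } else {
            open($fh, '<', $file) or die "cannot open $file: $!\n";
        }
        push @handles, $fh;
    }
    return @handles;
}
```
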
=back