<!DOCTYPE html>
<html lang="en-US">
<head>
<title>ERDDAP™ - Working with the datasets.xml File</title>
<meta charset="UTF-8">
<link rel="shortcut icon" href="https://coastwatch.pfeg.noaa.gov/erddap/images/favicon.ico">
<link href="../images/erddap2.css" rel="stylesheet" type="text/css">
<meta name="viewport" content="width=device-width, initial-scale=1">
</head>
<body>
<table class="compact nowrap" style="width:100%; background-color:#128CB5;">
<tr>
<td style="text-align:center; width:80px;"><a rel="bookmark"
href="https://www.noaa.gov/"><img
title="National Oceanic and Atmospheric Administration"
src="../images/noaab.png" alt="NOAA"
style="vertical-align:middle;"></a></td>
<td style="text-align:left; font-size:x-large; color:#FFFFFF; ">
<strong>ERDDAP™</strong>
<br><small><small><small>Easier access to scientific data</small></small></small>
</td>
<td style="text-align:right; font-size:small;">
<br>Brought to you by
<a title="National Oceanic and Atmospheric Administration" rel="bookmark"
href="https://www.noaa.gov">NOAA</a>
<a title="National Marine Fisheries Service" rel="bookmark"
href="https://www.fisheries.noaa.gov">NMFS</a>
<a title="Southwest Fisheries Science Center" rel="bookmark"
href="https://www.fisheries.noaa.gov/about/southwest-fisheries-science-center">SWFSC</a>
<a title="Environmental Research Division" rel="bookmark"
href="https://www.fisheries.noaa.gov/about/environmental-research-division-southwest-fisheries-science-center">ERD</a>
</td>
</tr>
</table>
<div class="standard_width">
<!-- <br><button type="button" onClick="history.go(-1);return true;">Back</button> -->
<h1 style="text-align:center">Working with the datasets.xml File</h1>
[This web page will only be of interest to ERDDAP™ administrators.]
<p>After you have followed the ERDDAP™
<a rel="help" href="https://erddap.github.io/setup.html">installation instructions</a>,
you must edit the datasets.xml file
in <i>tomcat</i>/content/erddap/ to describe the datasets that your ERDDAP™ installation will serve.
<h2><a class="selfLink" id="TableOfContents" href="#TableOfContents" rel="bookmark">Table of Contents</a></h2>
<ul>
<li><a rel="help" href="#introduction"><strong>Introduction</strong></a> (Please read all of this.)
<ul>
<li><a rel="help" href="#effort">Some Assembly Required</a>
<li><a rel="help" href="#DataProviderForm">Data Provider Form</a>
<li><a rel="help" href="#Tools">Tools</a>
<li><a rel="help" href="#basicStructure">The basic structure of the datasets.xml file</a>
<li><a rel="help" href="#xinclude">XInclude</a>
<br>
</ul>
<li><a rel="help" href="#notes">Notes</a> (Please read all of this.)
<ul>
<li><a rel="help" href="#useCtrlF">Use Ctrl-F To Find Things On This Web Page</a>
<li><a rel="help" href="#InternalLinks">Internal Links</a>
<li><a rel="help" href="#ChoosingTheDatasetType">Choosing the Dataset Type</a>
<li><a rel="help" href="#ServingTheDataAsIs">Serving the Data As Is</a>
<li><a rel="help" href="#encodingSpecialCharacters">Encoding Special Characters</a>
<li><a rel="help" href="#noSyntaxErrors">XML doesn't tolerate syntax errors.</a>
<li><a rel="help" href="#diagnoseProblems">Other Ways To Help Diagnose Problems With Datasets</a>
<li><a rel="help" href="#LLAT">The longitude, latitude, altitude (or depth), and time (LLAT)
variables are special.</a>
<li><a rel="help" href="#dataStructures">Why just two basic data structures?</a>
<li><a rel="help" href="#differentDimensions">What if the grid variables in the source dataset DON'T share the
same axis variables?</a>
<li><a rel="help" href="#projections">Projected Gridded Data</a>
<li><a rel="help" href="#dataTypes">Data Types</a>
<li><a rel="help" href="#MediaFiles">Media Files</a>
<li><a rel="help" href="#AwsS3Files">AWS S3 Files</a>
<li><a rel="help" href="#NcML">NcML</a>
<li><a rel="help" href="#NCO">NCO</a>
<li><a rel="help" href="#limits">Limits to the Size of a Dataset</a>
<li><a rel="help" href="#switchToACDD13">Switch to ACDD-1.3</a>
<li><a rel="help" href="#Zarr">Zarr</a>
<br>
</ul>
<li><a rel="help" href="#datasetTypes"><strong>List of Dataset Types</strong></a> (Read as needed)
<br>
<li><a rel="help" href="#datasetDescriptions">Detailed Descriptions of Dataset Types</a> (Read as needed)
<br>
<li><a rel="help" href="#details">Details</a> (Read as needed)
<ul>
<li><a rel="help" href="#cacheMinutes"><kbd>&lt;cacheMinutes&gt;</kbd></a>
<li><a rel="help" href="#convertInterpolateDatasetIDVariableExample"><kbd>&lt;convertInterpolateDatasetIDVariableExample&gt;</kbd></a>
<li><a rel="help" href="#convertInterpolateDatasetIDVariableList"><kbd>&lt;convertInterpolateDatasetIDVariableList&gt;</kbd></a>
<li><a rel="help" href="#convertToPublicSourceUrl"><kbd>&lt;convertToPublicSourceUrl&gt;</kbd></a>
<li><a rel="help" href="#dataImagePngBase64"><kbd>data:image/png;base64</kbd></a>
<li><a rel="help" href="#drawLandMask"><kbd>&lt;drawLandMask&gt;</kbd></a>
<li><a rel="help" href="#graphBackgroundColor"><kbd>&lt;graphBackgroundColor&gt;</kbd></a>
<li><a rel="help" href="#ipAddressMaxRequests"><kbd>&lt;ipAddressMaxRequests&gt;</kbd></a>
<li><a rel="help" href="#ipAddressMaxRequestsActive"><kbd>&lt;ipAddressMaxRequestsActive&gt;</kbd></a>
<li><a rel="help" href="#ipAddressUnlimited"><kbd>&lt;ipAddressUnlimited&gt;</kbd></a>
<li><a rel="help" href="#loadDatasetsMinMinutes"><kbd>&lt;loadDatasetsMinMinutes&gt;</kbd></a>
<li><a rel="help" href="#loadDatasetsMaxMinutes"><kbd>&lt;loadDatasetsMaxMinutes&gt;</kbd></a>
<li><a rel="help" href="#logLevel"><kbd>&lt;logLevel&gt;</kbd></a>
<li><a rel="help" href="#partialRequestMaxBytes"><kbd>&lt;partialRequestMaxBytes&gt;</kbd></a>
<li><a rel="help" href="#partialRequestMaxCells"><kbd>&lt;partialRequestMaxCells&gt;</kbd></a>
<li><a rel="help" href="#requestBlacklist"><kbd>&lt;requestBlacklist&gt;</kbd></a>
<li><a rel="help" href="#slowDownTroubleMillis"><kbd>&lt;slowDownTroubleMillis&gt;</kbd></a>
<li><a rel="help" href="#standardText">Standard Text</a>
<li><a rel="help" href="#subscriptionEmailBlacklist"><kbd>&lt;subscriptionEmailBlacklist&gt;</kbd></a>
<li><a rel="help" href="#unusualActivity"><kbd>&lt;unusualActivity&gt;</kbd></a>
<li><a rel="help" href="#updateMaxEvents"><kbd>&lt;updateMaxEvents&gt;</kbd></a>
<li><a rel="help" href="#user"><kbd>&lt;user&gt;</kbd></a>
<li><a rel="help" href="#dataset"><kbd>&lt;dataset&gt;</kbd></a>
<ul>
<li><a rel="help" href="#datasetID"><kbd>datasetID="..."</kbd></a>
<li><a rel="help" href="#active"><kbd>active="..."</kbd></a>
<li><a rel="help" href="#accessibleTo"><kbd>&lt;accessibleTo&gt;</kbd></a>
<li><a rel="help" href="#graphsAccessibleTo"><kbd>&lt;graphsAccessibleTo&gt;</kbd></a>
<li><a rel="help" href="#accessibleViaFiles"><kbd>&lt;accessibleViaFiles&gt;</kbd></a>
<li><a rel="help" href="#accessibleViaWMS"><kbd>&lt;accessibleViaWMS&gt;</kbd></a>
<li><a rel="help" href="#addVariablesWhere"><kbd>&lt;addVariablesWhere&gt;</kbd></a>
<li><a rel="help" href="#defaultDataQuery"><kbd>&lt;defaultDataQuery&gt;</kbd></a>
<li><a rel="help" href="#defaultGraphQuery"><kbd>&lt;defaultGraphQuery&gt;</kbd></a>
<li><a rel="help" href="#fgdcFile"><kbd>&lt;fgdcFile&gt;</kbd></a>
<li><a rel="help" href="#iso19115File"><kbd>&lt;iso19115File&gt;</kbd></a>
<li><a rel="help" href="#onChange"><kbd>&lt;onChange&gt;</kbd></a>
<li><a rel="help" href="#reloadEveryNMinutes"><kbd>&lt;reloadEveryNMinutes&gt;</kbd></a>
<li><a rel="help" href="#updateEveryNMillis"><kbd>&lt;updateEveryNMillis&gt;</kbd></a>
<li><a rel="help" href="#sourceCanConstrainStringEQNE"><kbd>&lt;sourceCanConstrainStringEQNE&gt;</kbd></a>
<li><a rel="help" href="#sourceCanConstrainStringGTLT"><kbd>&lt;sourceCanConstrainStringGTLT&gt;</kbd></a>
<li><a rel="help" href="#sourceCanConstrainStringRegex"><kbd>&lt;sourceCanConstrainStringRegex&gt;</kbd></a>
<li><a rel="help" href="#sourceNeedsExpandedFP_EQ"><kbd>&lt;sourceNeedsExpandedFP_EQ&gt;</kbd></a>
<li><a rel="help" href="#sourceUrl"><kbd>&lt;sourceUrl&gt;</kbd></a>
<li><a rel="help" href="#addAttributes"><kbd>&lt;addAttributes&gt;</kbd></a>
<li><a rel="help" href="#globalAttributes">Global Attributes / Global <kbd>&lt;addAttributes&gt;</kbd></a>
<!-- ul>
<li><a rel="help" href="#cdm_data_type">cdm_data_type</a>
<li><a rel="help" href="#globalDrawLandMask">drawLandMask</a>
<li><a rel="help" href="#history">history</a>
<li><a rel="help" href="#infoUrl">infoUrl</a>
<li><a rel="help" href="#institution">institution</a>
<li><a rel="help" href="#license">license</a>
<li><a rel="help" href="#sourceUrlAttribute">sourceUrl</a>
<li><a rel="help" href="#subsetVariables">subsetVariables</a>
<li><a rel="help" href="#summary">summary</a>
<li><a rel="help" href="#title">title</a>
</ul -->
<li><a rel="help" href="#axisVariable"><kbd>&lt;axisVariable&gt;</kbd></a>
<li><a rel="help" href="#dataVariable"><kbd>&lt;dataVariable&gt;</kbd></a>
<li><a rel="help" href="#variableAttributes">Variable Attributes / Variable <kbd>&lt;addAttributes&gt;</kbd></a>
<!-- ul>
<li><a rel="help" href="#actual_range">actual_range</a>
<li><a rel="help" href="#colorBar">Color Bar Attributes</a>
<li><a rel="help" href="#data_min">data_min and data_max</a>
<li><a rel="help" href="#variableDrawLandMask">drawLandMask</a>
<li><a rel="help" href="#ioos_category">ioos_category</a>
<li><a rel="help" href="#long_name">long_name</a>
<li><a rel="help" href="#missing_value">missing_value and _FillValue</a>
<li><a rel="help" href="#scale_factor">scale_factor and add_offset</a>
<li><a rel="help" href="#standard_name">standard_name</a>
<li><a rel="help" href="#units">units</a>
</ul -->
<li><a rel="help" href="#removeMVRows"><kbd>&lt;removeMVRows&gt;&lt;/removeMVRows&gt;</kbd></a>
<br>
</ul>
</ul>
<li><a rel="bookmark" href="#contact">Contact</a>
</ul>
<br>
<hr>
<h2><a class="selfLink" id="introduction" href="#introduction" rel="bookmark">Introduction</a></h2>
<p><a class="selfLink" id="effort" href="#effort" rel="bookmark"><strong>Some Assembly Required</strong></a>
<br>Setting up a dataset in ERDDAP™ isn't just a matter of pointing to the dataset's
directory or URL. You have to write a chunk of XML for datasets.xml which describes the dataset.
<ul>
<li>For gridded datasets, in order to make the dataset conform to ERDDAP's data structure for gridded data,
you have to identify a subset of the dataset's variables which share the same dimensions.
(<a rel="help" href="#dataStructures">Why?</a> <a rel="help" href="#differentDimensions">How?</a>)
<li>The dataset's current metadata is imported automatically.
But if you want to modify that metadata or add other metadata, you have to specify it in datasets.xml.
And ERDDAP™ needs other metadata, including <a rel="help" href="#globalAttributes">global attributes</a>
(such as infoUrl, institution,
sourceUrl, summary, and title) and <a rel="help" href="#variableAttributes">variable attributes</a>
(such as long_name and units).
Just as the metadata that is currently in the dataset adds descriptive information to the dataset,
the metadata requested by ERDDAP™ adds descriptive information to the dataset.
The additional metadata is a good addition to your dataset and helps ERDDAP™ do a better job of
presenting your data to users who aren't familiar with it.
<li>ERDDAP™ needs you to do special things with the
<a rel="help" href="#LLAT">longitude, latitude, altitude (or depth), and time variables</a>.
</ul>
If you buy into these ideas and expend the effort to create the XML for datasets.xml,
you get all the advantages of ERDDAP™, including:
<ul>
<li>Full text search for datasets
<li>Search for datasets by category
<li>Data Access Forms (<i>datasetID</i>.html) so you can request a subset of data in lots of different file formats
<li>Forms to request graphs and maps (<i>datasetID</i>.graph)
<li>Web Map Service (WMS) for gridded datasets
<li>RESTful access to your data
</ul>
Making the datasets.xml takes considerable effort for the first few datasets, but <strong>it gets easier</strong>.
After the first dataset, you can often re-use a lot of your work for the next dataset.
Fortunately, ERDDAP™ comes with two <a rel="help" href="#Tools">Tools</a> to help you create the XML for each
dataset in datasets.xml.
<br>If you get stuck, please send an email with the details to <kbd>erd dot data at noaa dot gov</kbd>.
<br>Or, you can join the <a rel="help"
href="#ERDDAPMailingList">ERDDAP™ Google Group / Mailing List</a>
and post your question there.
<p><a class="selfLink" id="DataProviderForm" href="#DataProviderForm" rel="bookmark"><strong>Data Provider Form</strong></a>
<br>When a data provider comes to you hoping to add some data to your ERDDAP,
it can be difficult and time-consuming to collect all of the metadata
(information about the dataset) needed to add the dataset into ERDDAP.
Many data sources (for example, .csv files, Excel files, databases)
have no internal metadata,
so ERDDAP™ has a Data Provider Form which gathers metadata
from the data provider and gives the data provider
some other guidance, including extensive guidance for
<a rel="help" href="https://coastwatch.pfeg.noaa.gov/erddap/dataProviderForm1.html#databases"
>Data In Databases</a>.
The information submitted is converted into the datasets.xml format and then
emailed to the ERDDAP™ administrator (you) and written (appended) to
<i>bigParentDirectory</i>/logs/dataProviderForm.log .
Thus, the form semi-automates the process of getting a dataset into ERDDAP,
but the ERDDAP™ administrator still has to complete the datasets.xml chunk
and deal with getting the data file(s) from the provider or connecting to the database.
<p>The submission of actual data files from external sources is a huge security risk,
so ERDDAP™ does not deal with that. You have to figure out a solution that
works for you and the data provider, for example, email (for small files),
pull from the cloud (for example, Dropbox or Google Drive),
an sftp site (with passwords), or sneakerNet (a USB thumb drive or external hard drive).
You should probably only accept files from people you know.
You will need to scan the files for viruses and take other security precautions.
<p>There isn't a link in ERDDAP™ to the Data Provider Form
(for example, on the ERDDAP™ home page).
Instead, when someone tells you they want to have their data served by your ERDDAP,
you can send them an email saying something like:
<br><kbd>Yes, we can get your data into ERDDAP. To get started,
please fill out the form at https://<i>yourUrl</i>/erddap/dataProviderForm.html (or http:// if https:// isn't enabled).
<br>After you finish, I'll contact you to work out the final details.
</kbd>
<br>If you just want to look at the form (without filling it out),
you can see the form on ERD's ERDDAP:
<a rel="help" href="https://coastwatch.pfeg.noaa.gov/erddap/dataProviderForm.html"
>Introduction</a>,
<a rel="help" href="https://coastwatch.pfeg.noaa.gov/erddap/dataProviderForm1.html"
>Part 1</a>,
<a rel="help" href="https://coastwatch.pfeg.noaa.gov/erddap/dataProviderForm2.html"
>Part 2</a>,
<a rel="help" href="https://coastwatch.pfeg.noaa.gov/erddap/dataProviderForm3.html"
>Part 3</a>, and
<a rel="help" href="https://coastwatch.pfeg.noaa.gov/erddap/dataProviderForm4.html"
>Part 4</a>.
These links on the ERD ERDDAP™ send information to me, not you, so don't submit
information with them unless you actually want to add data to the ERD ERDDAP.
<p>If you want to remove the Data Provider Form from your ERDDAP™, put
<br><kbd>&lt;dataProviderFormActive&gt;false&lt;/dataProviderFormActive&gt;</kbd>
<br>in your setup.xml file.
<p>The impetus for this was NOAA's 2014 <a rel="help"
href="https://www.glerl.noaa.gov/review2016/reviewer_docs/NOAA_PARR_Plan_v5.04.pdf"
>Public Access to Research Results (PARR) directive<img
src="../images/external.png" alt=" (external link)"
title="This is a link to an external website."/></a>,
which requires that all NOAA environmental data funded through taxpayer dollars
be made available via a data service (not just files) within 12 months of creation.
So there is increased interest in using ERDDAP™ to make datasets available via a service ASAP.
We needed a more efficient way to deal with a large number of data providers.
<p>Feedback/Suggestions?
This form is new, so please email erd dot data at noaa dot gov
if you have any feedback or suggestions for improving this.
<p><a class="selfLink" id="Tools" href="#Tools" rel="bookmark"><strong>Tools</strong></a>
<br>ERDDAP™ comes with two command line programs which are tools to help you create the XML
for each dataset that you want your ERDDAP™ to serve.
Once you have set up ERDDAP™ and run it (at least one time),
you can find and use these programs in the <i>tomcat</i>/webapps/erddap/WEB-INF directory.
There are Linux/Unix shell scripts (with the extension .sh) and
Windows scripts (with the extension .bat) for each program.
[On Linux, run these tools as the same user (tomcat?) that will run Tomcat.]
When you run each program, it will ask you questions.
For each question, type a response, then press Enter.
Or press ^C to exit a program at any time.
<p><a class="selfLink" id="OldVersionOfJava" href="#OldVersionOfJava" rel="bookmark">Program won't run?</a>
<ul>
<li>If you get an <kbd>unknown program</kbd> (or similar) error message,
the problem is probably that the operating system couldn't find Java.
You need to figure out where Java is on your computer, then
edit the java reference in the .bat or .sh file that you are trying to use.
<li>If you get a <kbd>jar file not found</kbd> or <kbd>class not found</kbd>
error message, then Java couldn't find one of the classes listed in the
.bat or .sh file you are trying to use. The solution is to figure out
where that .jar file is, and edit the java reference to it in the .bat or .sh file.
<li>If you are using a version of Java that is too old for a program,
the program won't run and you will see an error message like
<br><kbd>Exception in thread "main" java.lang.UnsupportedClassVersionError:
<br><i>some/class/name</i>: Unsupported major.minor version <i>someNumber</i></kbd>
<br>The solution is to update to the most recent version of Java
and make sure the .sh or .bat file for the program is using it.
</ul>
<p><a class="selfLink" id="ErrorVsWarning" href="#ErrorVsWarning" rel="bookmark">The tools print various diagnostic messages:</a>
<ul>
<li>The word "ERROR" is used when something went so wrong that the procedure failed to complete.
Although it is annoying to get an error, the error forces you to deal with the problem.
<li>The word "WARNING" is used when something went wrong but the procedure still completed.
These are pretty rare.
<li>Anything else is just an informative message.
You can add <kbd>-verbose</kbd> to the
<a rel="help" href="#GenerateDatasetsXml">GenerateDatasetsXml</a> or
<a rel="help" href="#DasDds">DasDds</a>
command line to get
additional informative messages, which sometimes helps solve problems.
</ul>
<p>The two tools are a big help, but you still must read all of these instructions
on this page carefully and make important decisions yourself.
<ul>
<li><a class="selfLink" id="GenerateDatasetsXml" href="#GenerateDatasetsXml" rel="bookmark"><strong>GenerateDatasetsXml</strong></a>
is a command line program that can generate a rough draft
of the dataset XML for almost any type of dataset.
<p>We STRONGLY RECOMMEND that you use GenerateDatasetsXml instead of creating
chunks of datasets.xml by hand because:
<ul>
<li>GenerateDatasetsXml works in seconds. Doing this by hand is at least an hour's work,
even when you know what you're doing.
<li>GenerateDatasetsXml does a better job.
Doing this by hand requires extensive knowledge of
how ERDDAP™ works. It is unlikely that you will do a better job by hand.
(Bob Simons always uses GenerateDatasetsXml for the first draft, and he wrote ERDDAP.)
<li>GenerateDatasetsXml always generates a valid chunk of datasets.xml.
Any chunk of datasets.xml
that you write will probably have at least a few errors that prevent
ERDDAP™ from loading the dataset.
It often takes people hours to diagnose these problems.
Don't waste your time. Let GenerateDatasetsXml do the hard work.
Then you can refine the .xml by hand if you want.
</ul>
<p>When you use the GenerateDatasetsXml program:
<ul>
<li>On Windows, the first time you run GenerateDatasetsXml, you need to edit the
GenerateDatasetsXml.bat file with a text editor to change the path to the java.exe file
so that Windows can find Java.
<li>GenerateDatasetsXml first asks you to specify the EDDType
(Erd Dap Dataset Type)
of the dataset. See the
<a rel="help" href="#datasetTypes">List of Dataset Types</a>
(in this document)
to figure out which type is appropriate for the dataset you are working on.
In addition to the regular EDDTypes, there are also a few
<a rel="help" href="#SpecialPseudoDatasetTypes">Special/Pseudo Dataset Types</a>
(e.g., one which crawls a THREDDS catalog to generate a chunk of
datasets.xml for each of the datasets in the catalog).
<li>GenerateDatasetsXml then asks you a series of questions
specific to that EDDType.
The questions gather the information needed for ERDDAP™ to access the
dataset's source.
To understand what ERDDAP™ is asking for,
see the documentation for the EDDType that you specified
by clicking on the same dataset type in the
<a rel="help" href="#datasetTypes">List of Dataset Types</a>.
<p>If you need to enter a string with special characters (e.g.,
whitespace characters at the beginning or end, non-ASCII characters),
enter a
<a rel="help" href="https://www.json.org/json-en.html" >JSON-style string<img
src="../images/external.png" alt=" (external link)"
title="This link to an external website does not constitute an endorsement."></a>
(with special characters escaped with \ characters).
For example, to enter just a tab character, enter "\t" (with the surrounding double quotes,
which tell ERDDAP™ that this is a JSON-style string).
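<p>As a rough illustration (not part of ERDDAP), Python's standard <kbd>json.dumps</kbd>
produces strings in this JSON style, so it can be used to check how an answer
should be typed. The helper name <kbd>to_json_answer</kbd> is hypothetical.

```python
import json

def to_json_answer(text):
    """Return text as a JSON-style string: double-quoted, special characters escaped."""
    return json.dumps(text)

print(to_json_answer("\t"))          # prints "\t"  (a single tab character)
print(to_json_answer("  padded  "))  # prints "  padded  "  (surrounding spaces kept)
print(to_json_answer("café"))        # prints "caf\u00e9"  (non-ASCII escaped)
```

<p>Each printed value, including its double quotes, is what you would type as the answer.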
<li>Often, one of your answers won't be what GenerateDatasetsXml needs.
You can then try again, with revised answers to the questions,
until GenerateDatasetsXml can successfully find and understand the source data.
<li>If you answer the questions correctly (or sufficiently correctly),
GenerateDatasetsXml will connect
to the dataset's source and gather basic information (for example, variable names and metadata).
<br>For datasets that are from local NetCDF .nc and related files,
GenerateDatasetsXml will often print the ncdump-like structure of the
file after it first reads the file. This may give you information
to answer the questions better on a subsequent loop through GenerateDatasetsXml.
<li>GenerateDatasetsXml will then generate a rough draft of the dataset XML for that dataset.
<li>Diagnostic information and the rough draft of the dataset XML will be written to
<i>bigParentDirectory</i>/logs/GenerateDatasetsXml.log .
<li>The rough draft of the dataset XML will be written to
<i>bigParentDirectory</i>/logs/GenerateDatasetsXml.out .
<li><a class="selfLink" id="GenerateDatasetsXml_0Files" href="#GenerateDatasetsXml_0Files" rel="bookmark">"0 files" Error Message</a>
<br>If you run GenerateDatasetsXml or
<a rel="help" href="#DasDds">DasDds</a>,
or if you try to load an
EDDGridFrom...Files or EDDTableFrom...Files dataset in ERDDAP™,
and you get a "0 files" error message indicating that
ERDDAP™ found 0 matching files in the directory
(when you think that there are matching files in that directory):
<ul>
<li>Check that you have specified the full name of the directory.
And if you specified the sample filename, make sure you specified
the file's full name, including the full directory name.
<li>Check that the files really are in that directory.
<li>Check the spelling of the directory name.
<li>Check the fileNameRegex. It's really, really easy to make mistakes with regexes.
For test purposes, try the regex .* which should match all filenames.
(See this <a rel="help"
href="https://docs.oracle.com/en/java/javase/17/docs/api/java.base/java/util/regex/Pattern.html"
>regex documentation<img
src="../images/external.png" alt=" (external link)"
title="This link to an external website does not constitute an endorsement."></a>
and
<a rel="help" href="https://www.vogella.com/tutorials/JavaRegularExpressions/article.html"
>regex tutorial<img
src="../images/external.png" alt=" (external link)"
title="This link to an external website does not constitute an endorsement."></a>.)
<li>Check that the user who is running the program (e.g., user=tomcat (?) for Tomcat/ERDDAP)
has 'read' permission for those files.
<li>On some systems (for example, Linux with SELinux enabled) and depending on system settings,
the user who ran the program must have 'read' permission for the
whole chain of directories leading to the directory that has the files.
</ul>
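<p>Some of the checks above can be partly automated. The following Python sketch
(not an ERDDAP tool) lists the files in one directory whose names match a regex
and reports whether the current user can read each one.
It assumes that the regex must match the whole file name, it looks only at the
top level of the directory, and it uses Python's regex engine, which agrees with
Java's for simple patterns such as <kbd>.*\.asc</kbd> but not for every construct.
The function name <kbd>check_files</kbd> is hypothetical.

```python
import os
import re

def check_files(directory, file_name_regex):
    """Return (file name, readable) pairs for files in directory whose whole name matches the regex."""
    pattern = re.compile(file_name_regex)
    results = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if os.path.isfile(path) and pattern.fullmatch(name):
            # os.access tests the permissions of the user running this script.
            results.append((name, os.access(path, os.R_OK)))
    return results
```

<p>Run it as the same user that runs Tomcat, for example
<kbd>check_files("/u00/data/", r".*\.asc")</kbd>.
An empty result with the regex <kbd>.*</kbd> suggests that the directory name is the problem, not the regex.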
<li>If you have problems that you can't solve,
<a rel="help" href="#diagnoseProblems">send an email to Bob</a> with as much
information as possible.
Similarly, if it seems like the appropriate EDDType for a given dataset
doesn't work with that dataset, or if there is no appropriate EDDType,
please send an email to Bob with the details (and a sample file if relevant).
<br>
<li><a class="selfLink" id="EditingGDXOutput" href="#EditingGDXOutput" rel="bookmark"
><strong>You need to edit the output from GenerateDatasetsXml to make it better.</strong></a>
<br>
<ul>
<li>DISCLAIMER:
<br>THE CHUNK OF datasets.xml MADE BY GenerateDatasetsXml ISN'T PERFECT.
YOU MUST READ AND EDIT THE XML BEFORE USING IT IN A PUBLIC ERDDAP.
GenerateDatasetsXml RELIES ON A LOT OF RULES-OF-THUMB WHICH AREN'T ALWAYS CORRECT.
YOU ARE RESPONSIBLE FOR ENSURING THE CORRECTNESS OF THE XML THAT YOU
ADD TO ERDDAP'S datasets.xml FILE.
<p>(Fun Fact: I'm not shouting. For historical legal reasons, disclaimers must be written in all caps.)
<p>The output of GenerateDatasetsXml is a rough draft.
<br>You will almost always need to edit it.
<br>We've made and continue to make a huge effort
to make the output as ready-to-go as possible, but there are limits.
Often, needed information is simply not available from the source metadata.
<p>A fundamental problem is that we're asking a computer program (GenerateDatasetsXml)
to do a task where, if you gave the same task to 100 people,
you would get 100 different results. There is no single "right" answer.
Obviously, the program comes closest to reading Bob's mind (not yours),
but even so, it isn't an all-understanding AI program,
just a bunch of heuristics cobbled together to do an AI-like task.
(That day of an all-understanding AI program may come, but it hasn't yet.
If/when it does, we humans may have bigger problems. Be careful what you wish for.)
<li>For informational purposes, the output shows the global
sourceAttributes and variable sourceAttributes as comments.
ERDDAP™ combines sourceAttributes and addAttributes (which have
precedence) to make the combinedAttributes that are shown to the user.
(And other attributes are automatically added to longitude, latitude,
altitude, depth, and time variables when ERDDAP™ actually makes the dataset).
<br>
<li>If you don't like a sourceAttribute, overwrite it by adding an
addAttribute with the same name but a different value
(or no value, if you want to remove it).
<br>
<li>All of the addAttributes are computer-generated suggestions. Edit them!
If you don't like an addAttribute, change it.
<br>
<li>If you want to add other addAttributes, add them.
<br>
<li>If you want to change a destinationName, change it.
But don't change sourceNames.
<br>
<li>You can change the order of the dataVariables or remove any of them.
<br>
</ul>
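<p>The rules above (addAttributes take precedence over sourceAttributes, and an
addAttribute with no value removes the attribute) can be pictured with a small
Python model. This is only a conceptual sketch, not ERDDAP's implementation:
the function name, the use of <kbd>None</kbd> to stand for "no value",
and the sample attribute values are all made up for illustration.

```python
def combine_attributes(source_attributes, add_attributes):
    """Model of combinedAttributes: start from the source, then apply addAttributes."""
    combined = dict(source_attributes)
    for name, value in add_attributes.items():
        if value is None or value == "":
            combined.pop(name, None)   # no value: remove the attribute
        else:
            combined[name] = value     # same name: addAttributes wins
    return combined

# Hypothetical sample attributes.
source = {"title": "data", "units": "degree_C", "comment": "obsolete note"}
add = {"title": "A clearer title", "comment": None, "institution": "Example Institute"}
combined = combine_attributes(source, add)
```

<p>Here <kbd>combined</kbd> keeps <kbd>units</kbd> from the source, takes the new
<kbd>title</kbd> and <kbd>institution</kbd>, and no longer has <kbd>comment</kbd>.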
<li>You can then use <a rel="help" href="#DasDds">DasDds</a>
(see below) to repeatedly test the XML for that dataset
to ensure that the resulting dataset appears as you want it to in ERDDAP.
<li>Feel free to make small changes to the datasets.xml chunk that was generated,
for example, supply a better <kbd>infoUrl, summary,</kbd> or <kbd>title</kbd>.
<li><a class="selfLink" id="doNotAddStandardNames" href="#doNotAddStandardNames" rel="bookmark">-doNotAddStandardNames</a> --
If you include <kbd>-doNotAddStandardNames</kbd>
as a command line parameter when you run GenerateDatasetsXml,
GenerateDatasetsXml will not add <kbd>standard_name</kbd> to the <kbd>addAttributes</kbd>
for any variables other than variables named <kbd>latitude, longitude, altitude, depth</kbd> or
<kbd>time</kbd> (which have obvious standard_names).
This can be useful if you are using the output from GenerateDatasetsXml directly in
ERDDAP™ without editing the output, because GenerateDatasetsXml often guesses
standard_names incorrectly. (Note that we always recommend that you
do edit the output before using it in ERDDAP.) Using this parameter
will have other minor related effects because the guessed standard_name
is often used for other purposes, e.g., to create a new long_name,
and to create the colorBar settings.
<li><a class="selfLink" id="ScriptingGenerateDatasetsXml" href="#ScriptingGenerateDatasetsXml" rel="bookmark">Scripting:</a>
As an alternative to answering the questions interactively
at the keyboard and looping to generate additional datasets,
you can provide command line arguments to answer
all of the questions to generate one dataset.
GenerateDatasetsXml will process those parameters,
write the output to the output file, and exit the program.
<p>To set this up, first use the program in interactive mode
and write down your answers. Here's a partial example:
<br>Let's say you run the script: <kbd>./GenerateDatasetsXml.sh</kbd>
<br>Then enter: <kbd>EDDTableFromAsciiFiles</kbd>
<br>Then enter: <kbd>/u00/data/</kbd>
<br>Then enter: <kbd>.*\.asc</kbd>
<br>Then enter: <kbd>/u00/data/sampleFile.asc</kbd>
<br>Then enter: <kbd>ISO-8859-1</kbd>
<p>To run this in a non-interactive way,
use this command line:
<br><kbd>./GenerateDatasetsXml.sh EDDTableFromAsciiFiles /u00/data/ .*\.asc /u00/data/sampleFile.asc
ISO-8859-1</kbd>
<br>So basically, you just list all the answers on the command line.
<br>This should be useful for datasets that change frequently in a way that
necessitates re-running GenerateDatasetsXml (notably EDDGridFromThreddsCatalog).
<p>Details:
<ul>
<li>If a parameter contains a space or some special character, then encode the
parameter as a
<a rel="help" href="https://www.json.org/json-en.html" >JSON-style string<img
src="../images/external.png" alt=" (external link)"
title="This link to an external website does not constitute an endorsement."></a>, e.g.,
<kbd>"my parameter with spaces and two\nlines"</kbd>.
<li>If you want to specify an empty string as a parameter, use: <kbd>nothing</kbd>
<li>If you want to specify the default value of a parameter, use: <kbd>default</kbd>
<br>
</ul>
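<p>If you drive GenerateDatasetsXml from another program, a small helper can apply
the encoding rules listed above. The following Python sketch is one possible approach.
The helper names are hypothetical, using <kbd>None</kbd> to stand for "use the default"
is an assumption of this sketch, and so is the choice to JSON-encode only parameters
that contain whitespace or a double quote.

```python
import json


def encode_param(value):
    """Encode one answer as a GenerateDatasetsXml command line parameter."""
    if value is None:
        return "default"      # ask for the parameter's default value
    if value == "":
        return "nothing"      # stands for an empty string
    if any(ch.isspace() or ch == '"' for ch in value):
        return json.dumps(value)  # JSON-style string, e.g. "a b\nc"
    return value


def build_command(script, answers):
    """Return an argument list: the script followed by the encoded answers."""
    return [script] + [encode_param(answer) for answer in answers]


command = build_command(
    "./GenerateDatasetsXml.sh",
    ["EDDTableFromAsciiFiles", "/u00/data/", r".*\.asc",
     "/u00/data/sampleFile.asc", "ISO-8859-1"],
)
print(command)
# To run it: import subprocess; subprocess.run(command, check=True)
```

<p>Passing the arguments as a list (rather than as one shell string) means that
no shell is involved, so the regular expression reaches the program unchanged.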
<li>GenerateDatasetsXml supports a -i<i>datasetsXmlName</i>#<i>tagName</i>
command line parameter which inserts the output into the specified datasets.xml file
(the default is <i>tomcat</i>/content/erddap/datasets.xml).
GenerateDatasetsXml looks for two lines in datasetsXmlName:
<br><kbd><!-- Begin GenerateDatasetsXml #<i>tagName someDatetime</i> --></kbd>
<br>and
<br><kbd><!-- End GenerateDatasetsXml #<i>tagName someDatetime</i> --></kbd>
<br>and replaces everything in between those lines with the new content, and changes the someDatetime.
<ul>
<li>The -i switch is only processed (and changes to datasets.xml are only made)
if you run GenerateDatasetsXml with command line arguments which specify all
the answers to all of the questions for one loop of the program. (See 'Scripting' above.)
(The thinking is: This parameter is for use with scripts.
If you use the program in interactive mode (typing info on the keyboard), you are
likely to generate some incorrect chunks of XML before you generate
the one you want.)
<li>If the Begin and End lines are not found, then those lines and the new content
are inserted right before </erddapDatasets>.
<li>There is also a -I (capital i) switch for testing purposes which
works the same as -i,
but creates a file called datasets.xml<i>DateTime</i> and doesn't make
changes to datasets.xml.
<li>Don't run GenerateDatasetsXml with -i in two processes at once.
There is a chance only one set of changes will be kept.
There may be serious trouble (for example, corrupted files).
</ul>
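<p>To make the replace-or-insert behavior of -i concrete, here is a minimal Python sketch
of the logic as described above. It is not the code that GenerateDatasetsXml uses.
The function name <kbd>insert_chunk</kbd> and the datetime format in the example are assumptions.

```python
import re


def insert_chunk(xml_text, tag_name, new_content, stamp):
    """Replace the chunk between the Begin/End lines for tag_name,
    or insert a new chunk right before </erddapDatasets>."""
    begin = f"<!-- Begin GenerateDatasetsXml #{tag_name} {stamp} -->"
    end = f"<!-- End GenerateDatasetsXml #{tag_name} {stamp} -->"
    block = f"{begin}\n{new_content.rstrip()}\n{end}"

    tag = re.escape(tag_name)
    pattern = re.compile(
        rf"<!-- Begin GenerateDatasetsXml #{tag} [^\n]*?-->.*?"
        rf"<!-- End GenerateDatasetsXml #{tag} [^\n]*?-->",
        re.DOTALL,
    )
    if pattern.search(xml_text):
        # Existing chunk: replace it, which also updates the datetime.
        return pattern.sub(lambda match: block, xml_text, count=1)

    closing = "</erddapDatasets>"
    index = xml_text.rfind(closing)
    if index < 0:
        raise ValueError("no </erddapDatasets> tag found")
    return xml_text[:index] + block + "\n" + xml_text[index:]


xml = "<erddapDatasets>\n</erddapDatasets>\n"
first = insert_chunk(xml, "sst", '<dataset datasetID="old" />',
                     "2024-01-01T00:00:00")
second = insert_chunk(first, "sst", '<dataset datasetID="new" />',
                      "2024-02-02T00:00:00")
print(second)
```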
</ul>
If you use "GenerateDatasetsXml -verbose", it will print more diagnostic messages than usual.
<p><a class="selfLink" id="SpecialPseudoDatasetTypes" href="#SpecialPseudoDatasetTypes" rel="bookmark">Special/Pseudo Dataset Types</a>
<br>In general, the EDDType options in GenerateDatasetsXml
match the EDD types described in this document
(see the
<a rel="help" href="#datasetTypes">List of Dataset Types</a>)
and
generate one datasets.xml chunk to create one dataset from one specific data source.
There are a few exceptions and special cases:
<ul>
<li>EDDGridFromErddap
<br>This EDDType generates all of the datasets.xml chunks needed to make
<a rel="help" href="#EDDGridFromErddap">EDDGridFromErddap</a> datasets
from all of the EDDGrid datasets in a remote ERDDAP.
You will have the option of keeping the original datasetIDs
(which may duplicate some datasetIDs already in your ERDDAP)
or generating new names which will be unique (but usually aren't as human-readable).
<br>
<li>EDDTableFromErddap
<br>This EDDType generates all of the datasets.xml chunks needed to make
<a rel="help" href="#EDDTableFromErddap">EDDTableFromErddap</a> datasets
from all of the EDDTable datasets in a remote ERDDAP.
You will have the option of keeping the original datasetIDs
(which may duplicate some datasetIDs already in your ERDDAP)
or generating new names which will be unique (but usually aren't as human-readable).
<br>
<li><a class="selfLink" id="EDDGridFromThreddsCatalog" href="#EDDGridFromThreddsCatalog" rel="bookmark">EDDGridFromThreddsCatalog</a>
<br>This EDDType generates all of the datasets.xml chunks needed for all of
the <a rel="help" href="#EDDGridFromDap">EDDGridFromDap</a> datasets
that it can find by crawling recursively through a THREDDS (sub) catalog.
There are many forms of THREDDS catalog URLs.
This option REQUIRES a THREDDS .xml URL with <kbd>/catalog/</kbd> in it, for example,
<br>https://oceanwatch.pfeg.noaa.gov/thredds/catalog/catalog.xml or
<br>https://oceanwatch.pfeg.noaa.gov/thredds/catalog/Satellite/aggregsatMH/chla/catalog.xml
<br>(a related .html catalog is at
<br>https://oceanwatch.pfeg.noaa.gov/thredds/Satellite/aggregsatMH/chla/catalog.html
, which is not acceptable for EDDGridFromThreddsCatalog).
<br>If you have problems with EDDGridFromThreddsCatalog:
<ul>
<li>Make sure the URL you are using is valid, includes <kbd>/catalog/</kbd>,
and ends with /catalog.xml .
<li>If possible, use a public domain name (for example, https://oceanwatch.pfeg.noaa.gov)
in the URL, not a local numeric IP address (for example, https://12.34.56.78).
If the THREDDS is only accessible via the local numeric IP address, you can use
<a rel="help" href="#convertToPublicSourceUrl"><kbd><convertToPublicSourceUrl></kbd></a>
so ERDDAP™ users see the public address, even though ERDDAP™ gets data from the
local numeric address.
<li>If you have problems that you can't solve,
<a rel="help" href="#diagnoseProblems">send an email to Bob</a> with as much
information as possible.
<li>The low level code for this now uses the Unidata netcdf-java
catalog crawler code (thredds.catalog classes)
so that it can handle all THREDDS catalogs
(which can be surprisingly complex).
Thanks to Unidata for that code.
<br>
</ul>
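<p>A quick pre-check of the URL can catch the most common mistake (using the .html
catalog) before you run the program. This Python sketch only tests the two requirements
stated above. The function name is hypothetical, and the check does not contact the server.

```python
from urllib.parse import urlparse


def is_acceptable_thredds_catalog_url(url):
    """True if the URL includes /catalog/ and ends with /catalog.xml."""
    parsed = urlparse(url)
    return (
        parsed.scheme in ("http", "https")
        and "/catalog/" in parsed.path
        and parsed.path.endswith("/catalog.xml")
    )


good = "https://oceanwatch.pfeg.noaa.gov/thredds/catalog/Satellite/aggregsatMH/chla/catalog.xml"
bad = "https://oceanwatch.pfeg.noaa.gov/thredds/Satellite/aggregsatMH/chla/catalog.html"
print(is_acceptable_thredds_catalog_url(good))
print(is_acceptable_thredds_catalog_url(bad))
```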
<li><a class="selfLink" id="EDDGridLonPM180FromErddapCatalog" href="#EDDGridLonPM180FromErddapCatalog" rel="bookmark">EDDGridLonPM180FromErddapCatalog</a>
<br>This EDDType generates the datasets.xml to make
<a rel="help" href="#EDDGridLonPM180">EDDGridLonPM180</a> datasets
from all of the EDDGrid datasets in an ERDDAP
that have any longitude values greater than 180.
<ul>
<li>If possible, use a public domain name (for example, https://oceanwatch.pfeg.noaa.gov)
in the URL, not a local numeric IP address (for example, https://12.34.56.78).
If the ERDDAP™ is only accessible via the local numeric IP address, you can use
<a rel="help" href="#convertToPublicSourceUrl"><kbd><convertToPublicSourceUrl></kbd></a>
so ERDDAP™ users see the public address, even though ERDDAP™ gets data from the
local numeric address.
<br>
</ul>
<li><a class="selfLink" id="EDDGridLon0360FromErddapCatalog" href="#EDDGridLon0360FromErddapCatalog" rel="bookmark">EDDGridLon0360FromErddapCatalog</a>
<br>This EDDType generates the datasets.xml to make
<a rel="help" href="#EDDGridLon0360">EDDGridLon0360</a> datasets
from all of the EDDGrid datasets in an ERDDAP
that have any longitude values less than 0.
<ul>
<li>If possible, use a public domain name (for example, https://oceanwatch.pfeg.noaa.gov)
in the URL, not a local numeric IP address (for example, https://12.34.56.78).
If the ERDDAP™ is only accessible via the local numeric IP address, you can use
<a rel="help" href="#convertToPublicSourceUrl"><kbd><convertToPublicSourceUrl></kbd></a>
so ERDDAP™ users see the public address, even though ERDDAP™ gets data from the
local numeric address.
<br>
</ul>
<li><a class="selfLink" id="EDDsFromFiles" href="#EDDsFromFiles" rel="bookmark">EDDsFromFiles</a>
<br>Given a start directory,
this traverses the directory and all subdirectories and tries
to create a dataset for each group of data files that it finds.
<ul>
<li>This assumes that when a dataset is found, the dataset includes all
subdirectories.
<li>If a dataset is found, similar sibling directories will be treated
as separate datasets
(for example, directories for the 1990's, the 2000's, the 2010's,
will generate separate datasets).
They should be easy to combine by hand -- just change the
first dataset's <kbd><fileDir></kbd> to the parent directory and delete all the
subsequent sibling datasets.
<li>This will only try to generate a chunk of datasets.xml for the most
common type of file extension in a directory (not counting .md5, which is ignored).
So, given a directory with 10 .nc files and 5 .txt files,
a dataset will be generated for the .nc files only.
<li>This assumes that all files in a directory with the same extension
belong in the same dataset. If a directory has some .nc files with SST data
and some .nc files with chlorophyll data, just one sample .nc
file will be read (SST? chlorophyll?) and just one dataset
will be created for that type of file. That dataset will probably fail
to load because of complications from trying to load two types
of files into the same dataset.
<li>If there are fewer than 4 files with the most common extension in a directory,
this assumes that they aren't data files and just skips the directory.
<li>If there are 4 or more files in a directory,
but this can't successfully generate a chunk of datasets.xml for the files
(for example, an unsupported file type), this will generate an
<a rel="help" href="#EDDTableFromFileNames">EDDTableFromFileNames</a>
dataset for the files.
<li>At the end of the diagnostics that this writes to the log file, just before
the datasets.xml chunks, this will print a table with a summary of information
gathered by traversing all the subdirectories. The table will
list every subdirectory and indicate the most common type of file extension,
the total number of files, and which type of dataset was created for
these files (if any). If you are faced with a complex, deeply nested
file structure, consider running GenerateDatasetsXml with EDDType=EDDsFromFiles
just to generate this information.
<li>This option may not do a great job of guessing the best EDDType for a given
group of data files, but it is quick, easy, and worth a try.
If the source files are suitable, it works well and is a good first
step in generating the datasets.xml for a file system with lots of
subdirectories, each with data files from different datasets.
<br>
</ul>
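<p>If you want to preview which directories EDDsFromFiles is likely to act on,
the two file-counting rules above (most common extension, at least 4 files) are easy
to approximate. The Python sketch below is an illustration of those rules only, not
ERDDAP's implementation. The function names are hypothetical, and how ties between
extensions are broken is an assumption (here, the first extension encountered wins).

```python
import os
from collections import Counter

MIN_FILES = 4  # directories with fewer matching files are skipped


def pick_extension(file_names):
    """Return the most common extension (ignoring .md5 files),
    or None if fewer than MIN_FILES files have it."""
    counts = Counter()
    for name in file_names:
        extension = os.path.splitext(name)[1]
        if extension != ".md5":
            counts[extension] += 1
    if not counts:
        return None
    extension, count = counts.most_common(1)[0]
    return extension if count >= MIN_FILES else None


def summarize(start_dir):
    """Yield (directory, chosen extension or None, total number of files)."""
    for dir_path, _dir_names, file_names in os.walk(start_dir):
        yield dir_path, pick_extension(file_names), len(file_names)


names = ([f"sst{i}.nc" for i in range(10)]
         + [f"notes{i}.txt" for i in range(5)]
         + ["sst0.nc.md5"])
print(pick_extension(names))
```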
<li><a class="selfLink" id="EDDTableFromEML" href="#EDDTableFromEML" rel="bookmark">EDDTableFromEML and EDDTableFromEMLBatch</a>
<br>These special EDDTypes generate the datasets.xml to make an
<a rel="help" href="#EDDTableFromAsciiFiles">EDDTableFromAsciiFiles</a> dataset
from each of the tables described in an
<a rel="help"
href="https://knb.ecoinformatics.org/external//emlparser/docs/index.html"
>Ecological Metadata Language<img
src="../images/external.png" alt=" (external link)"
title="This link to an external website does not constitute an endorsement."></a>
XML file.
The "Batch" variant works on all of the EML files in a local or remote directory.
Please see the separate
<a rel="help"
href="https://erddap.github.io/EDDTableFromEML.html"
>documentation for EDDTableFromEML</a>.
<br>
<li><a class="selfLink" id="EDDTableFromInPort" href="#EDDTableFromInPort" rel="bookmark">EDDTableFromInPort</a>
<br>This special EDDType generates the datasets.xml to make an
<a rel="help" href="#EDDTableFromAsciiFiles">EDDTableFromAsciiFiles</a> dataset
from the information in an
<a rel="help" href="https://inport.nmfs.noaa.gov/inport"
>inport-xml<img
src="../images/external.png" alt=" (external link)"
title="This link to an external website does not constitute an endorsement."></a>
file.
If you can get access to the source data file (the inport-xml file should
have clues for where to find it), you can make a working dataset in ERDDAP.
<p>The following steps outline how to use GenerateDatasetsXml
with an inport-xml file in order to get a working dataset in ERDDAP.
<ol>
<li>Once you have access to the inport-xml file (either as a URL or a local file):
run GenerateDatasetsXml, specify <kbd>EDDType=EDDTableFromInPort</kbd>,
specify the inport-xml URL or full filename,
specify <kbd>whichChild=0</kbd>, and specify the other requested information (if known).
(At this point, you don't need to have the source data file or specify its name.)
The <kbd>whichChild=0</kbd> setting tells GenerateDatasetsXml to
write out the information for <strong>all</strong> of the
<entity-attribute-information><entity>'s in the inport-xml file
(if there are any).
It also prints out a <kbd>Background</kbd> information summary, including
all of the <kbd>download-url</kbd>'s listed in the inport-xml file.
<li>Look through all that information (including the <kbd>Background</kbd>
information that GenerateDatasetsXml prints)
and visit the <kbd>download-url</kbd>(s)
in order to try to find the source data file(s).
If you can find them, download them into a directory that is
accessible to ERDDAP.
(If you can't find any source data files, there is no point in proceeding.)
<li>Run GenerateDatasetsXml again.
<br>If the source data file
corresponds to one of the inport-xml file's
<entity-attribute-information><entity>'s,
specify <kbd>whichChild=<i>thatEntity'sNumber</i></kbd> (e.g., 1, 2, 3, ...).
ERDDAP™ will try to match the column names in the source data file
to names in the entity information, and prompt to accept/reject/fix
any discrepancies.
<br>Or, if the inport-xml file doesn't have any
<entity-attribute-information><entity>'s, specify <kbd>whichChild=0</kbd>.
<li>In the chunk of datasets.xml that was made by GenerateDatasetsXml,
revise the
<a rel="help" href="#globalAttributes">global <addAttributes></a>
as needed/desired.
<li>In the chunk of datasets.xml that was made by GenerateDatasetsXml,
add/revise the
<a rel="help" href="#dataVariable"><dataVariable></a>
information as needed/desired to describe each of the variables.
Be sure you properly identify each variable's
<br><a rel="help" href="#sourceName"><sourceName></a>
(as it appears in the source),
<br><a rel="help" href="#destinationName"><destinationName></a>
(which has more limitations on allowed characters than sourceName),
<br><a rel="help" href="#units"><units></a>
(especially if it is a
<a rel="help" href="#timeStampVariable">time or timestamp variable</a>
where the units need to
specify the format), and
<br><a rel="help" href="#missing_value"><missing_value></a>.
<li>When you are close to finishing, repeatedly use the
<a rel="help" href="#DasDds">DasDds</a>
tool to quickly see if the dataset description is valid and if
the dataset will appear in ERDDAP™ as you want it to.
<br>
</ol>
It would be great if groups using InPort to document their datasets
would also use ERDDAP™ to make the actual data available:
<ul>
<li>ERDDAP™ is a solution that can be used right now so you can fulfill NOAA's
<a rel="help"
href="https://nosc.noaa.gov/EDMC/PD.DSP.php"
>Public Access to Research Results (PARR) requirements</a>
right now, not at some vague time in the future.
<li>ERDDAP™ makes the actual data available to users, not just the metadata.
(What good is metadata without data?)
<li>ERDDAP™ supports metadata (notably, the units of variables),
unlike some other data server software being considered.
(What good is data without metadata?)
To use software that doesn't support metadata is to invite the data to be
misunderstood and misused.
<li>ERDDAP™ is free and open-source software
unlike some other software being considered.
Ongoing development of ERDDAP™ is already paid for.
Support for ERDDAP™ users is free.
<li>ERDDAP's appearance can be easily customized to reflect
and highlight your group (not ERD or ERDDAP).
<li>ERDDAP™ offers a consistent way to access all datasets.
<li>ERDDAP™ can read data from many types of data files and from relational
databases.
<li>ERDDAP™ can deal with large datasets, including datasets where
the source data is in many data files.
<li>ERDDAP™ can write data to many types of data files, at the user's request,
including scientific data file types like netCDF, ESRI .csv, and ODV .txt.
<li>ERDDAP™ can make custom graphs and maps of subsets of the data,
based on the user's specifications.
<li>ERDDAP™ can deal with non-data datasets such as collections
of image, video, or audio files.
<li>ERDDAP™ has been installed and used at
<a rel="bookmark"
href="https://erddap.github.io/setup.html#organizations"
>more than 60 institutions around the world</a>.
<li>ERDDAP™ is listed as one of the data servers recommended for use within NOAA
in the
<a rel="bookmark"
href="https://www.ngdc.noaa.gov/wiki/index.php/Data_Access_Technical_Recommendations#Software_implementations">NOAA Data Access Procedural Directive<img
src="../images/external.png" alt=" (external link)"
title="This is a link to an external website."/></a>,
unlike some other software being considered.
<li>ERDDAP™ is a product of NMFS/NOAA, so using it within NMFS and NOAA
should be a point of pride for NMFS and NOAA.
</ul>
Please give ERDDAP™ a try. If you need help, please post a message in the ERDDAP™ Google group.
<br>
<li>addFillValueAttributes
<br>This special EDDType option isn't a dataset type. It is a tool
which can add _FillValue attributes to some variables in some datasets.
See
<a href="#addFillValueAttributes" rel="bookmark">addFillValueAttributes</a>.
<br>
<li><a class="selfLink" id="findDuplicateTime" href="#findDuplicateTime" rel="bookmark">findDuplicateTime</a>
<br>This special EDDType option isn't a dataset type.
Instead, it tells GenerateDatasetsXml to search through a collection of
gridded .nc (and related) files to find and print out a list of files with duplicate time values.
When it looks at the time values, it converts them from the original units
to "seconds since 1970-01-01" in case different files use different units strings.
You need to provide the starting directory (with or without the trailing slash),
the file name regular expression (e.g., .*\.nc ), and the name of the
time variable in the files.
<br>
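<p>The reason for converting to "seconds since 1970-01-01" can be seen in a small
Python sketch. It is not ERDDAP's code: the in-memory dictionary stands in for time
values that would really be read from .nc files, the function names are hypothetical,
and the units parser handles only a few simple "<i>unit</i> since <i>ISO date</i>" strings.

```python
from collections import defaultdict
from datetime import datetime, timezone

SECONDS_PER_UNIT = {"seconds": 1, "minutes": 60, "hours": 3600, "days": 86400}


def to_epoch_seconds(value, units):
    """Convert a value with units like 'days since 1970-01-01'
    to seconds since 1970-01-01T00:00:00Z."""
    unit, _since, base = units.split(" ", 2)
    base_time = datetime.fromisoformat(base.replace("Z", "+00:00"))
    if base_time.tzinfo is None:
        base_time = base_time.replace(tzinfo=timezone.utc)  # assume UTC
    return base_time.timestamp() + value * SECONDS_PER_UNIT[unit.lower()]


def find_duplicate_times(files):
    """files maps a file name to (units string, list of time values).
    Return {epoch seconds: [file names]} for times that occur more than once."""
    seen = defaultdict(list)
    for file_name, (units, values) in files.items():
        for value in values:
            seen[to_epoch_seconds(value, units)].append(file_name)
    return {t: names for t, names in seen.items() if len(names) > 1}


files = {
    "a.nc": ("days since 1970-01-01", [1, 2]),
    "b.nc": ("hours since 1970-01-01T00:00:00Z", [48, 72]),
}
duplicates = find_duplicate_times(files)
print(duplicates)
```

<p>Here the raw values 2 and 48 look different, but both are 1970-01-03 once they
are converted to the same units, so both files are reported.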
<li><a class="selfLink" id="ncdump" href="#ncdump" rel="bookmark">ncdump</a>
<br>This special EDDType option isn't a dataset type.
Instead, it tells GenerateDatasetsXml to print an
<a rel="help"
href="https://linux.die.net/man/1/ncdump"
>ncdump<img
src="../images/external.png" alt=" (external link)"
title="This link to an external website does not constitute an endorsement."></a>-like
printout of an .nc, .ncml, or .hdf file.
It actually uses netcdf-java's
<a rel="help"
href="https://docs.unidata.ucar.edu/netcdf-java/5.4/javadoc/ucar/nc2/write/Ncdump.html"
>NCdump<img
src="../images/external.png" alt=" (external link)"
title="This link to an external website does not constitute an endorsement."></a>,
which is a more limited tool than the C version of NCdump.
If you use this option, GenerateDatasetsXml will ask you to use one
of the options: "-h" (header), "-c" (coordinate vars),
"-vall" (default), "-v var1;var2", "-v var1(0,0:10,0:20)".
This is useful because, without ncdump, it is hard to know what is in
an .nc, .ncml, or .hdf file and thus which EDDType you should specify for GenerateDatasetsXml.
For an .ncml file, this will print the ncdump output for the result
of the .ncml file changes applied to the underlying .nc or .hdf file.
<br>
</ul>
<li><a class="selfLink" id="DasDds" href="#DasDds" rel="bookmark"><strong>DasDds</strong></a> is a command line program that you can use
after you have created a first attempt at the XML for a new dataset in datasets.xml.
With DasDds, you can repeatedly test and refine the XML.
When you use the DasDds program:
<ol>
<li>On Windows, the first time you run DasDds, you need to edit the
DasDds.bat file with a text editor to change the path to the java.exe file
so that Windows can find Java.
<li>DasDds asks you for the datasetID for the dataset you are working on.
<li>DasDds tries to create the dataset with that datasetID.
<ul>
<li>DasDds always prints lots of diagnostic messages.
<br>If you use "DasDds -verbose", DasDds will print more diagnostic messages than usual.
<li>For safety, DasDds always deletes all of the cached dataset information (files)
for the dataset before trying
to create the dataset.
This is the equivalent of setting a
<a rel="help" href="https://erddap.github.io/setup.html#hardFlag">hard flag</a>.
So for aggregated datasets, you might want to adjust the
fileNameRegex temporarily to limit the number of files the dataset constructor finds.
<li>If the dataset fails to load (for whatever reason),
DasDds will stop and show you the error message for the first error it finds.
<br><strong>Don't try to guess what the problem might be. Read the ERROR message carefully.</strong>
<br>If necessary, read the preceding diagnostic messages to find more clues and information, too.
<li><strong>Make a change to the dataset's XML to try to solve THAT problem</strong>
<br>and let DasDds try to create the dataset again.
<li><strong>If you repeatedly solve each problem, you will eventually solve all the problems</strong>
<br>and the dataset will load.
</ul>
<li>All DasDds output (diagnostics and results) is written to the screen and to
<i>bigParentDirectory</i>/logs/DasDds.log .
<li>If DasDds can create the dataset, DasDds will then show you the
<a rel="help"
href="https://coastwatch.pfeg.noaa.gov/erddap/griddap/documentation.html#fileType_das"
>.das (Dataset Attribute Structure)</a>,
<a rel="help"
href="https://coastwatch.pfeg.noaa.gov/erddap/griddap/documentation.html#fileType_dds"
>.dds (Dataset Descriptor Structure)</a>, and
<a rel="help"
href="https://coastwatch.pfeg.noaa.gov/erddap/griddap/documentation.html#timeGaps"
>.timeGaps (time gaps)</a>
information for the dataset on your screen and write them to
<i>bigParentDirectory</i>/logs/DasDds.out .
<li>Often, you will want to make some small
change to the dataset's XML to clean up the dataset's metadata and rerun DasDds.
</ol>
</ul>
<a class="selfLink" id="ERDDAPlint" href="#ERDDAPlint" rel="bookmark"><strong>Bonus Third-Party Tool: ERDDAP-lint</strong></a>
<br>ERDDAP-lint is a program from Rob Fuller and Adam Leadbetter of the Irish Marine Institute
that you can use to improve the metadata of your ERDDAP™ datasets.
ERDDAP-lint "contains rules and a simple static web application for running
some verification tests against your ERDDAP™ server. All the tests are run in the web browser."
Like the
<a rel="help"
href="https://en.wikipedia.org/wiki/Lint_(software)"
>Unix/Linux lint tool<img
src="../images/external.png" alt=" (external link)"
title="This link to an external website does not constitute an endorsement."></a>,
you can edit the existing rules or add new rules.
See <a rel="help"
href="https://github.com/IrishMarineInstitute/erddap-lint"
>ERDDAP-lint<img
src="../images/external.png" alt=" (external link)"
title="This link to an external website does not constitute an endorsement."></a>
for more information.
<p>This tool is especially useful for datasets that you created some time ago
and now want to bring up-to-date with your current metadata preferences.
For example, early versions of GenerateDatasetsXml didn't put any effort
into creating global creator_name, creator_email, creator_type, or creator_url
metadata. You could use ERDDAP-lint to identify the datasets that lack
those metadata attributes.
<p>Thanks to Rob and Adam for creating this tool
and making it available to the ERDDAP™ community.
<br>
<p><strong><a class="selfLink" id="basicStructure" href="#basicStructure" rel="bookmark">The Basic Structure of the datasets.xml File</a></strong>
<br>The required and optional tags allowed in a datasets.xml file
(and the number of times they may appear) are shown below.
In practice, your datasets.xml will have lots of <dataset> tags and
only use the other tags within <erddapDatasets> as needed.
<pre>
<?xml version="1.0" encoding="ISO-8859-1" ?>
<erddapDatasets>
<a rel="help" href="#angularDegreeUnits"><angularDegreeUnits></a>...</angularDegreeUnits> <!-- 0 or 1 -->
<a rel="help" href="#angularDegreeTrueUnits"><angularDegreeTrueUnits></a>...</angularDegreeTrueUnits> <!-- 0 or 1 -->
<a rel="help" href="#cacheMinutes"><cacheMinutes></a>...</cacheMinutes> <!-- 0 or 1 -->
<a rel="help" href="#commonStandardNames"><commonStandardNames></a>...</commonStandardNames> <!-- 0 or 1 -->
<a rel="help" href="#convertInterpolateDatasetIDVariableExample"><convertInterpolateDatasetIDVariableExample /></a> <!-- 0 or more -->
<a rel="help" href="#convertInterpolateDatasetIDVariableList"><convertInterpolateDatasetIDVariableList /></a> <!-- 0 or more -->
<a rel="help" href="#convertToPublicSourceUrl"><convertToPublicSourceUrl /></a> <!-- 0 or more -->