---
title: "Geoconnex Structured Metadata Guidance"
title-block-banner: '#181868'
number-sections: true
website:
  search:
    type: overlay
format:
  html:
    toc: true
    toc-depth: 7
    toc-expand: 6
    toc-title: Table of Contents
    toc-location: left
    code-tools: true
    code-overflow: wrap
    code-line-numbers: true
    code-annotations: hover
    # embed-resources: true
    anchor-sections: true
    link-external-newwindow: true
comments:
  # utterances:
  #   repo: internetofwater/geoconnex-guidance
  hypothesis: true
editor: visual
---
::: {align="center"}
<img src="images/geoconnex-logo.png" alt="Logo" width="600"/>
:::
# Introduction {#sec-introduction}
The Geoconnex project is about providing technical infrastructure and guidance to create an open, community-contribution model for a knowledge graph linking hydrologic features in the United States, published in accordance with [Spatial Data on the Web best practices](https://www.w3.org/TR/sdw-bp/) as an implementation of [Internet of Water](https://github.com/opengeospatial/SELFIE/blob/master/docs/demo/internet_of_water.md) principles. The development of geoconnex.us takes place on GitHub. See [here](https://github.com/internetofwater/about.geoconnex.us) for the system of repositories.
Geoconnex will allow data users to answer questions like: "What datasets are available about the portions of [**Colorado River**](https://geoconnex.us/ref/mainstems/29559 "This is an HTTP identifier in the geoconnex system to unambiguously denote the Colorado River") *upstream* of [**Hoover Dam**](https://geoconnex.us/ref/dams/1080095 "This is an HTTP identifier in the Geoconnex system to unambiguously identify the Hoover Dam, in particular its location along the Colorado River") *within* **Nevada** and **Utah** regarding *variables* **discharge** and **total suspended solids** with measurements taken at least **daily** with *coverage* **between 2002 and 2007**?" and be returned metadata for all relevant datasets from all participating organizations, including federal, state, private, and NGO organizations.
See <https://geoconnex.us/demo> for a mockup of data discovery and access workflows that `https://geoconnex.us` aspires to enable.
Geoconnex rests on data providers publishing metadata to the system. Thus, Geoconnex involves the publication of Web Resources, which include structured, embedded metadata that describe water datasets and the real-world environmental features (e.g. rivers, wells, dams, catchments) or the cataloging features (e.g. government jurisdiction areas, statistical summary reporting areas) that they are relevant to. This document provides guidance, including general principles as well as specific templates, for data providers on how to structure this metadata using the JSON-LD format.
**Related materials, presentations, and publications**
[National Hydrography Infrastructure and Geoconnex](https://drive.google.com/file/d/1J0NKYOq3pGjQXr58FKO8sd7uHpGA8kNB/view?usp=sharing)
[New Mexico Water Data Initiative including geoconnex.us](https://docs.google.com/presentation/d/1yuNpBbQPcmb_Nw8DXiuNTazAjIM8UF7o/edit?usp=sharing&ouid=102421334323378854304&rtpof=true&sd=true)
[Roundtable presentation including geoconnex.us](https://www.westernstateswater.org/wp-content/uploads/2020/06/CO_Roundable_IoW.pdf)
[Second Environmental Linked Features Interoperability Experiment](https://github.com/opengeospatial/SELFIE)
[ESIP Sessions on Structured Data in the Web](https://2020esipsummermeeting.sched.com/event/cIvv/structured-data-on-the-web-putting-best-practice-to-work) [slides](https://docs.google.com/presentation/d/1LSXHz2_Y7hrkGZPC_sNoJWl8AIujI8AAWktl9amIR4E/edit#slide=id.g8250495469_1_30)
## Basic Information Model {#sec-infomodel}
The model used to organize information in the Geoconnex system is shown in @fig-info-model.
![Basic information model for resources in geoconnex](images/screenshot.png){#fig-info-model}
- **Data providers** refer to specific systems that publish water-related **datasets** on the web. Many times a provider will simply be the data dissemination arm of an organization, such as the [Reclamation Information Sharing Environment (RISE)](https://data.usbr.gov) of the US Bureau of Reclamation. Some organizations may have multiple data providers, such as US Geological Survey, which administers the [National Water Information System](https://waterdata.usgs.gov) as well as the [National Groundwater Monitoring Network](https://cida.usgs.gov/ngwmn/), among others. Some data providers are aggregators of other organizations' data, such as the [Hydrologic Information System](https://data.cuahsi.org) of CUAHSI.
- **Datasets** refer to specific collections of data that are published by data providers. In the context of Geoconnex, a single dataset generally refers to one that is collected from, or summarizable to, a specific spatial **location** on earth, as part of a specific activity. For example, a dataset would be the stage, discharge and water quality sensor data coming from a single stream gage, but not the collection of all stream gage readings from all stream gages operated by a given organization. A dataset could also be the time-series of a statistical summary of water use at the county level.
- **Locations** are specific locations on earth that datasets are collected from or about, such as stream gages, groundwater wells, and dams. In the case of data that is reported at a summary unit such as a state, county, or hydrologic unit code (HUC), these can also be considered Locations. Conceptually, multiple datasets from multiple providers can be about the same Location, as might occur when a USGS streamgage and a state DEQ water quality sampling site are both located at a specific bridge.
- **Hydrologic features** are elements of the water system that are related to locations. For example, a point may be on a river, which is within a watershed, and whose flow influences an aquifer. Each of these are distinct, identifiable features which many Locations are hydrologically related to, and which a user of a given dataset might also want to use.
- **Cataloging features** are areas on earth that commonly group datasets. They are a superset of summary features such as HUCs, counties and states. For example, a state-level dataset summarizing average annual surface water availability would not have states as a cataloging feature, since the state is itself the Location that the dataset is about. However, a streamgage is within a state, county, HUC, congressional district, etc., and may be tagged with these features in metadata, and thus be filtered alongside other streamgages within the same state.
This Geoconnex guidance concerns how to explicitly publish metadata that describes Datasets and how they are related to each of the other elements of the information model.
## JSON-LD Primer {#sec-primer}
JSON-LD is an extension of JSON, the popular data exchange format used by web APIs, for expressing linked data. Linked Data is an approach to data publication that allows data from various sources to be easily integrated. JSON-LD accomplishes this by mapping terms from a source data system to a machine-readable definition of that term available on the web, allowing different attribute names from different data sources to be consistently interpreted together. Commonly, JSON-LD is embedded within websites, allowing search engines and applications to parse the information available from web addresses (URLs). For an in-depth exploration and multimedia resources, refer to the [JSON-LD official site](https://json-ld.org) and its [learning section](https://json-ld.org/learn.html). JSON-LD documents should be embedded in the HTML of websites using `<script>` elements. A brief overview of the JSON-LD format follows below.
Below is an example JSON-LD document as embedded in a `<script>` element within the `<head>` or `<body>` section of an HTML page, followed by an explanation of its major elements.
``` json
<script type="application/ld+json">
{
  "@context": {
    "@vocab": "https://schema.org/",
    "schema": "https://schema.org/",
    "ex": "https://example.com/schema/",
    "locType": "https://www.opengis.net/def/schema/hy_features/hyf/HY_HydroLocationType"
  },
  "@id": "https://example.com/well/1234",
  "@type": "schema:Place",
  "name": "Well 1234",
  "description": "Well at 1234 Place St., USA",
  "locType": "well",
  "subjectOf": {
    "@id": "https://datasystem.org/dataset1",
    "@type": "schema:Dataset",
    "name": "Well Locations Dataset",
    "ex:recordCount": 500
  }
}
</script>
```
**`<script type="application/ld+json">`, `</script>`** These opening and closing HTML tags are fixed; they tell machines to interpret everything between them as JSON-LD.
**`@context`** The `@context` keyword in JSON-LD sets the stage for interpreting the data by mapping terms to IRIs (Internationalized Resource Identifiers). By doing so, properties and values are clearly defined and identified. Our example context has the following entries:
- `@vocab`: Sets the default document vocabulary to `https://schema.org/`, which is a standard vocabulary for web-based structured data. This means that in general, attributes in the document will be assumed to have `https://schema.org/` as a prefix, so JSON-LD parsers will map `name` to <https://schema.org/name>. A term can also be written with an explicit prefix, such as `schema:Place`, as long as that prefix is declared in the `@context`.
- `ex`: This is a custom context prefix representing `https://example.com/schema/`, signifying extensions or custom data definitions specific to our website. The prefix can be used on other attributes so that JSON-LD parsers do the appropriate mapping. Thus, `ex:recordCount` will be parsed as `https://example.com/schema/recordCount`.
- `locType`: This is a custom direct attribute mapping, specifying that this attribute exactly matches the concept identified by the HTTP identifier <https://www.opengis.net/def/schema/hy_features/hyf/HY_HydroLocationType>. Using this direct mapping approach allows data publishers to map their arbitrary terminology to any publicly accessible and well-identified standard term.
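The three kinds of context entry described above can be illustrated with a few lines of Python. This is only a rough sketch of the idea, not how a conforming JSON-LD processor works: the `expand_term` function is a hypothetical helper written for this illustration, and it handles only direct term mappings, declared prefixes, and the `@vocab` fallback.

```python
def expand_term(term, context):
    """Expand a JSON-LD term to a full IRI using a simple, flat context.

    Simplified for illustration: a real JSON-LD processor handles many
    more cases (nested contexts, term definitions as objects, and so on).
    """
    if term.startswith("@"):
        # Keywords such as @id and @type are never expanded.
        return term
    if term in context:
        # Direct attribute mapping, e.g. locType.
        return context[term]
    if ":" in term:
        prefix, suffix = term.split(":", 1)
        if prefix in context:
            # Declared prefix, e.g. ex:recordCount.
            return context[prefix] + suffix
        # Either a full IRI already, or an undeclared prefix: left as-is.
        return term
    # Plain term: fall back to the default vocabulary.
    return context.get("@vocab", "") + term


context = {
    "@vocab": "https://schema.org/",
    "ex": "https://example.com/schema/",
    "locType": "https://www.opengis.net/def/schema/hy_features/hyf/HY_HydroLocationType",
}

for term in ("name", "ex:recordCount", "locType"):
    print(term, "->", expand_term(term, context))
```

Note that a prefixed term whose prefix is missing from the context is returned unchanged, which is why every prefix used in a document needs a matching `@context` entry.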
**`@id`** The `@id` keyword furnishes a uniform resource identifier (URI) for subjects in the JSON-LD document, enabling the subjects to be interconnected with data elsewhere. In this example:
- Well 1234 has the identifier `https://example.com/well/1234`.
- The dataset that is about it, "Well Locations Dataset", has the unique identifier `https://datasystem.org/dataset1`.
**`@type`** The `@type` keyword stipulates the type or nature of the subject or node in the JSON-LD. It aids in discerning the entity being depicted. In the given context:
- Well 1234 is specified as a "Place" from the schema.org vocabulary (`schema:Place`).
- Well Locations Dataset's type is a "Dataset" from the schema.org vocabulary (`schema:Dataset`).
**Nodes** Nodes represent entities in JSON-LD, with each entity having properties associated with it. In the example:
- The main node is Well 1234, possessing properties like "name", "description", "locType", and "subjectOf".
- The `subjectOf` property is itself a node representing a dataset that is about Well 1234. Apart from the "name" property, the dataset also has a property called "ex:recordCount" (using the `ex:` prefix from `@context`) indicating the number of rows in the dataset. This extension showcases the flexibility and strength of JSON-LD, where you can seamlessly integrate standard vocabulary with custom definitions, ensuring rich and well-structured interconnected data representations. Below, you can see how JSON-LD tools would parse and standardize the JSON-LD in the example.
```{=html}
<iframe width="780" height="500" src="https://tinyurl.com/29qaectm" title="JSON-LD playground"></iframe>
```
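To make the embedding pattern concrete from the consumer's side, the following sketch shows one way an application might pull embedded JSON-LD out of a web page using only the Python standard library. The `JsonLdExtractor` class, the `extract_jsonld` function, and the sample page are all hypothetical illustrations; Geoconnex's own harvesting tools may work differently.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collect the parsed contents of every application/ld+json script element."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.documents = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        # Script content may arrive in several chunks, so buffer it.
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self.documents.append(json.loads("".join(self._buffer)))
            self._in_jsonld = False


def extract_jsonld(html):
    """Return a list of JSON-LD documents embedded in an HTML string."""
    parser = JsonLdExtractor()
    parser.feed(html)
    parser.close()
    return parser.documents


# A made-up page for illustration only.
page = """<html><head>
<script type="application/ld+json">
{"@context": {"@vocab": "https://schema.org/"},
 "@id": "https://example.com/well/1234",
 "@type": "Place",
 "name": "Well 1234"}
</script>
</head><body><h1>Well 1234</h1></body></html>"""

docs = extract_jsonld(page)
print(docs[0]["@id"])
```

A page with a missing closing `</script>` tag would yield no documents here, which is one reason the opening and closing tags need to be written exactly.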
## Geoconnex JSON-LD elements {#sec-jsonldelem}
A Geoconnex JSON-LD document should be embedded in a human-readable website that is about either a **Location** or a **Dataset**. Documents about **Locations** should ideally include references to relevant **Hydrologic Features**, **Cataloging Features**, and **Datasets**. Documents about **Datasets** *must* include references to one or more relevant Reference **Monitoring Locations** or **Hydrologic Features** or **Cataloging Features**, or declare their spatial coverage.
### Context {#sec-context}
Geoconnex JSON-LD documents can have varying contexts. Beyond `schema.org`, several other vocabularies may be useful, depending on the type of location and dataset being described and the level of specificity for which metadata is produced by the data provider. The example context below can serve as a general-purpose starting point, although simpler contexts may be sufficient for many documents:
``` json
"@context": {
  "@vocab": "https://schema.org/",
  "xsd": "http://www.w3.org/2001/XMLSchema#",
  "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
  "dc": "http://purl.org/dc/terms/",
  "dcat": "http://www.w3.org/ns/dcat#",
  "freq": "http://purl.org/cld/freq/",
  "qudt": "http://qudt.org/schema/qudt/",
  "qudt-units": "http://qudt.org/vocab/unit/",
  "qudt-quantkinds": "http://qudt.org/vocab/quantitykind/",
  "gsp": "http://www.opengis.net/ont/geosparql#",
  "locType": "http://vocabulary.odm2.org/sitetype/",
  "odm2var": "http://vocabulary.odm2.org/variablename/",
  "odm2varType": "http://vocabulary.odm2.org/variabletype/",
  "hyf": "https://www.opengis.net/def/schema/hy_features/hyf/",
  "skos": "http://www.w3.org/2004/02/skos/core#",
  "ssn": "http://www.w3.org/ns/ssn/",
  "ssn-system": "http://www.w3.org/ns/ssn/systems/"
}
```
- `@vocab` specifies [`schema`](https://schema.org/) as the default vocabulary from https://schema.org
- [`xsd`](https://www.w3.org/TR/xmlschema-2/) is a general web-enabled data types vocabulary (e.g., text vs number vs. datetime)
- [`rdfs`](https://www.w3.org/TR/rdf12-schema/) is a general vocabulary for basic relationships
- [`dc`](https://www.dublincore.org/specifications/dublin-core/dcmi-terms/#) is the [Dublin Core](https://www.dublincore.org/specifications/dublin-core/dcmi-terms/) vocabulary for general information metadata attributes
- [`dcat`](https://www.w3.org/ns/dcat#) is the [Data Catalog (DCAT) Vocabulary](https://www.w3.org/TR/vocab-dcat-3), a vocabulary for dataset metadata attributes
- [`freq`](http://purl.org/cld/freq/) is the [Dublin Core Collection Frequency Vocabulary](https://www.dublincore.org/specifications/dublin-core/collection-description/frequency/), a vocabulary for dataset temporal resolution and update frequency
- [`qudt-units`](https://www.qudt.org/doc/DOC_VOCAB-UNITS.html) provides standard identifiers for units (e.g. [cubic feet per second](https://qudt.org/vocab/unit/FT3-PER-SEC))
- [`qudt-quantkinds`](https://www.qudt.org/doc/DOC_VOCAB-QUANTITY-KINDS.html) provides ids for general phenomena (e.g. [Volume flow rate](https://qudt.org/vocab/quantitykind/VolumeFlowRate)) which may be measured in various units
- [`gsp`](http://defs.opengis.net/vocprez/object?uri=http://www.opengis.net/def/function/geosparql) provides ids for spatial relationships (e.g. intersects)
- [`odm2var`](http://vocabulary.odm2.org/variablename) is a supplement to `qudt-quantkinds`, and includes ids for many variables relevant to water science and management (e.g. [turbidity](http://vocabulary.odm2.org/variablename/turbidity/))
- [`odm2varType`](http://vocabulary.odm2.org/variabletype/) is a supplement to `odm2var` that includes ids for large groupings of variables (e.g. [Water Quality](http://vocabulary.odm2.org/variabletype/WaterQuality/))
- [`hyf`](https://www.opengis.net/def/schema/hy_features/hyf/) provides ids for surface water hydrology concepts (e.g. [streams](https://defs.opengis.net/vocprez/object?uri=https%3A//www.opengis.net/def/schema/hy_features/hyf/HY_River))
- [`skos`](https://www.w3.org/TR/swbp-skos-core-spec/) provides general properties for relating different concepts (e.g. broader, [narrower](https://www.w3.org/2009/08/skos-reference/skos.html#narrower), exactMatch)
- [`ssn`](https://www.w3.org/TR/vocab-ssn/) and `ssn-system` provide ids for aspects of observations and measurement (e.g. measurement methods)
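Because a prefix only works if it is declared in the `@context`, it can be useful to check a document for prefixes that are used but never declared. The sketch below is one hypothetical way to do this in Python; `used_prefixes` and `undeclared_prefixes` are illustrative names, the check assumes the `@context` is a single JSON object at the top of the document, and the sample document is invented.

```python
def used_prefixes(node):
    """Yield the prefix of every compact IRI used as a key or @type value."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "@context":
                continue
            if not key.startswith("@") and ":" in key:
                yield key.split(":", 1)[0]
            if key == "@type":
                types = value if isinstance(value, list) else [value]
                for type_name in types:
                    if ":" in type_name:
                        yield type_name.split(":", 1)[0]
            else:
                yield from used_prefixes(value)
    elif isinstance(node, list):
        for item in node:
            yield from used_prefixes(item)


def undeclared_prefixes(doc):
    """Return prefixes used in the document but missing from its @context."""
    declared = set(doc.get("@context", {}))
    # URL schemes of full IRIs are not prefixes that need declaring.
    ignored = {"http", "https"}
    return sorted(set(used_prefixes(doc)) - declared - ignored)


# Invented example: `locType` and `dc` are used but not declared.
doc = {
    "@context": {
        "@vocab": "https://schema.org/",
        "hyf": "https://www.opengis.net/def/schema/hy_features/hyf/",
    },
    "@type": ["hyf:HY_HydroLocation", "locType:stream"],
    "name": "Example gage",
    "dc:conformsTo": {"@id": "https://example.com/spec"},
}

print(undeclared_prefixes(doc))  # ['dc', 'locType']
```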
### Reference Features {#sec-ref}
Embedding links to URIs of Reference Features is the best way to ensure that your data can be related to other data providers' data. URIs for reference features are available from [the Geoconnex reference feature server](https://reference.geoconnex.us/collections). Reference features can be one of three types:
- **Monitoring Locations** which are common locations that many organizations might have data about such as a streamgage station e.g. <https://geoconnex.us/ref/gages/1143822>
- **Hydrologic Features** which are common specific features of the hydrologic landscape that many organizations have data about. These could include confluence points, aquifers, stream segments and river mainstems and named tributaries, e.g. <https://geoconnex.us/ref/mainstems/29559>.
- **Cataloging Features** which are larger area units that are commonly used to group and filter data, such as [HUCs](https://geoconnex.us/ref/hu04/0308)[^1], [states](https://geoconnex.us/ref/states/48)[^2], [counties](https://geoconnex.us/ref/counties/37003)[^3], PLSS grids, public agency operating districts, etc.
[^1]: https://geoconnex.us/ref/hu04/0308
[^2]: https://geoconnex.us/ref/states/48
[^3]: https://geoconnex.us/ref/counties/37003
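The example URIs above all share the shape `https://geoconnex.us/ref/{collection}/{id}`. The sketch below builds and parses URIs of that shape. Note that this pattern is inferred only from the examples in this section, not from a documented rule, and the function names are hypothetical; always confirm a URI against the reference feature server before publishing it.

```python
from urllib.parse import urlparse

# Assumed base, inferred from the example URIs in this section.
REF_BASE = "https://geoconnex.us/ref/"


def reference_uri(collection, feature_id):
    """Build a reference feature URI following the pattern seen in the examples."""
    return f"{REF_BASE}{collection}/{feature_id}"


def parse_reference_uri(uri):
    """Split a reference feature URI into (collection, feature_id)."""
    parts = urlparse(uri)
    segments = parts.path.strip("/").split("/")
    if parts.netloc != "geoconnex.us" or len(segments) != 3 or segments[0] != "ref":
        raise ValueError(f"not a Geoconnex reference feature URI: {uri}")
    return segments[1], segments[2]


print(parse_reference_uri("https://geoconnex.us/ref/mainstems/29559"))
print(reference_uri("states", "48"))
```

Identifiers are kept as strings so that leading zeros, as in the HUC `0308`, are preserved.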
# Building Geoconnex Web Resources, Step-by-Step {#sec-step-by-step}
This section provides step-by-step guidance to build Geoconnex Web Resources, each of which should be an HTML webpage with a unique URL within which is embedded a JSON-LD document (see @sec-primer). See @sec-complete-examples for completed example documents to skip the step-by-step.
## Location or Dataset oriented?
Depending on what kind of resource (i.e. location or dataset) and the level of metadata you have available to publish, you can use different elements of the `@context` or use Reference Features in various ways. Below we will work through creating a JSON-LD document depending on your situation.
There are two basic patterns to think about:
1. `Location-oriented` webpages that include a catalog of parameters and periods of record for which there is data about the location. This pattern may be suitable where data can be accessed separately for each location and possibly for each parameter for each location. This is typical of streamgages, monitoring wells, water diversions, reservoirs, regulated effluent discharge locations, etc. where there is an ongoing monitoring or modeling program that includes data collection or generation for multiple parameters. The Monitor My Watershed Site pages published by the [Stroud Center](https://stroudcenter.org) are an example of this pattern. At [this page](https://monitormywatershed.org/sites/RH_MD/), one finds a variety of information about a specific location, such as that location's identifier and name and a map of where it is. In addition there is information about which continuous sensor and field water quality sample data are available about the location, and links to download these data.
2. `Dataset-oriented` webpages that tag which locations are relevant to the dataset described at a given page. This pattern may be suitable for static datasets where data was collected or modeled for a consistent set of parameters for a pre-specified research question and time period across one or more locations, and where it would not make sense to publish separate metadata for the parts of the dataset that are relevant to each individual feature and parameter. This is typical of datasets created for, and published in association with, scientific and regulatory studies. [This dataset record](https://www.hydroshare.org/resource/11dd1840fe6a48abb9a33380ecaa6e1d/) published on [CUAHSI](https://cuahsi.org)'s [Hydroshare](https://hydroshare.org) platform is an example, where there is a "Related Geospatial Features" section that explicitly identifies several features that the dataset has data about.
In some cases, it is possible to set up a web architecture that implements both patterns. For example, the [Wyoming State Engineer's Office Web Portal](https://seoflow.wyo.gov) conceptualizes a time series for a specific parameter at a specific location as a dataset. Thus, webpages exist for both [Locations](https://seoflow.wyo.gov/Data/Location/Summary/Location/06280300/Interval/Latest) and [Datasets](https://seoflow.wyo.gov/Data/DataSet/Summary/Location/06280300/DataSet/Discharge/Discharge/Interval/Latest), and they link to each other where relevant. In this case, it is only necessary to implement Geoconnex embedded JSON-LD at either the Location or Dataset level, although both could be done as well.
Having chosen one of the patterns, proceed to the [location-oriented](#sec-loc) or [dataset-oriented](#sec-data) guidance to start building a JSON-LD document.
### Location-oriented {#sec-loc}
The purpose of the location-oriented page is to give enough information about the location and the data available about that location that a water data user would be able to quickly determine whether and how to download the data after reading. We will use the USGS Monitoring Location [08282300](https://geoconnex.us/usgs/monitoring-location/08282300) as an example for the type of content to put in location-oriented Geoconnex landing page web resources and how to map that content to embedded JSON-LD documents.
::: callout-note
Scroll up and down to view elements of the example landing page
:::
```{=html}
<iframe width="780" height="500" src="https://waterdata.usgs.gov/monitoring-location/08282300" title="USGS Example"></iframe>
```
This location-oriented web resource includes this type of information:
- "[This is my HTTP identifier](https://geoconnex.us/usgs/monitoring-location/08282300)"[^4]
- "I am the same thing as [Geoconnex Reference Gage 1018463](https://geoconnex.us/ref/gages/1018463)"[^5]
- "My unique USGS ID is `08282300`"
- "My name is `Rio Brazos at Fishtail Road NR Tierra Amarilla, NM`"
- "Data about me is provided by the `USGS Water Data for the Nation`"
- "I am a `hydrometric station`[^6]"
- "My lat/long is `36.738 -106.471`"
- "I am on the [Rio Brazos](https://geoconnex.us/ref/mainstems/1611418)"[^7]
- "There is data about me for the parameter `Discharge` from June 6, 2014 to the present at a 15 minute time resolution. This data is generated from `in-situ observation`, in particular using [USGS discharge measurement methods](https://pubs.usgs.gov/publication/tm3A8). You can download it [here](https://waterservices.usgs.gov/nwis/iv/?sites=08282300&parameterCd=00060&startDT=2023-08-13T03:08:21.313-06:00&endDT=2023-08-20T03:08:21.313-06:00&siteStatus=all&format=rdb) using the [USGS Instantaneous Values REST Web Service](https://waterservices.usgs.gov/rest/IV-Test-Tool.html) in the [RDB format](https://waterdata.usgs.gov/nwis/?tab_delimited_format_info). You can also download it [here](https://labs.waterdata.usgs.gov/sta/v1.1/Datastreams('0adb31f7852e4e1c9a778a85076ac0cf')?$expand=Thing,Observations) using the [SensorThings API standard](https://docs.ogc.org/is/15-078r6/15-078r6.html) in `JSON` or `CSV` formats."[^8]
- "There is data about me for the parameter `Gage Height` from June 6, 2014 to the present at a 15 minute time resolution. This data is generated from `in-situ observation`, in particular using [USGS stage measurement methods](https://pubs.usgs.gov/publication/tm3A7). You can download it [here](https://waterservices.usgs.gov/nwis/iv/?sites=08282300&parameterCd=00065&startDT=2023-08-13T03:08:21.313-06:00&endDT=2023-08-20T03:08:21.313-06:00&siteStatus=all&format=rdb) from the [USGS Instantaneous Values REST Web Service](https://waterservices.usgs.gov/rest/IV-Test-Tool.html) in the [RDB format](https://waterdata.usgs.gov/nwis/?tab_delimited_format_info). You can also download it [here](https://labs.waterdata.usgs.gov/sta/v1.1/Datastreams('ba774169b3e542cdb9c02e8d705b4d0f')?$expand=Thing,Observations) using the [SensorThings API standard](https://docs.ogc.org/is/15-078r6/15-078r6.html) in `JSON` or `CSV` formats."
[^4]: This is ideally a persistent geoconnex URI. See [here](https://github.com/internetofwater/geoconnex.us/blob/master/CONTRIBUTING.md) for how to create these. It is optional if the "same thing" geoconnex URI in the next bullet is provided, in which case this could just be the URL of the web resource for the location, or omitted.
[^5]: Where possible, it will be useful to tag your organization's locations with pre-existing identifiers for reference locations, since many organizations collect data at the same location.
[^6]: This ideally would come from a [codelist](https://docs.ogc.org/is/14-111r6/14-111r6.html#annexB_1) so that data providers use consistent terminology
[^7]: Note that ideally this would be a geoconnex URI for a river mainstem, in this case <https://geoconnex.us/ref/mainstems/1611418>
[^8]: This is towards the 'more detailed' end of the spectrum. If data is not available via API, it is still good to include links to data file downloads or web apps that provide access to the data
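Download links like the USGS Instantaneous Values links in the list above are easier to keep correct when they are generated rather than typed by hand. The sketch below assembles such a link with Python's standard library. The endpoint and query parameter names (`sites`, `parameterCd`, `startDT`, `endDT`, `siteStatus`, `format`) are taken from the example links in this section; the `iv_download_url` function is hypothetical, and the date-only values for `startDT` and `endDT` are an assumption that should be checked against the USGS service documentation.

```python
from urllib.parse import urlencode

# Endpoint as it appears in the example download links in this section.
IV_ENDPOINT = "https://waterservices.usgs.gov/nwis/iv/"


def iv_download_url(site, parameter_cd, start, end, fmt="rdb"):
    """Build a download URL using the query parameters seen in the examples."""
    query = {
        "sites": site,
        "parameterCd": parameter_cd,  # e.g. 00060 in the Discharge example
        "startDT": start,
        "endDT": end,
        "siteStatus": "all",
        "format": fmt,
    }
    # urlencode joins the pairs with "&" and percent-encodes unsafe characters.
    return IV_ENDPOINT + "?" + urlencode(query)


url = iv_download_url("08282300", "00060", "2023-08-13", "2023-08-20")
print(url)
```

Site numbers and parameter codes are passed as strings so that leading zeros are not lost.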
#### JSON-LD
Here we will build the equivalent JSON-LD content step-by-step. The steps are:
1. [Identifiers and provenance](#sec-ident)
2. [Spatial geometry and hydrologic references](#sec-spatial)
3. [Datasets](#sec-loc-data)
These culminate in the [complete example](#sec-complete-examples).
##### Identifiers and provenance {#sec-ident}
A first group of information helps identify the location and its provenance.
- "[This is my HTTP identifier](https://geoconnex.us/usgs/monitoring-location/08282300)"[^9]
- "I am a `hydrometric station`[^10]"
- "I am the same thing as [Geoconnex Reference Gage 1018463](https://geoconnex.us/ref/gages/1018463)"[^11]
- "My unique USGS ID is `08282300`"
- "My name is `Rio Brazos at Fishtail Road NR Tierra Amarilla, NM`"
- "Data about me is provided by the `USGS Water Data for the Nation`"
[^9]: This is ideally a persistent geoconnex URI. See [here](https://github.com/internetofwater/geoconnex.us/blob/master/CONTRIBUTING.md) for how to create these. It is optional if the "same thing" geoconnex URI in the next bullet is provided, in which case this could just be the URL of the web resource for the location, or omitted.
[^10]: This ideally would come from a [codelist](https://docs.ogc.org/is/14-111r6/14-111r6.html#annexB_1) so that data providers use consistent terminology
[^11]: Where possible, it will be useful to tag your organization's locations with pre-existing identifiers for reference locations, since many organizations collect data at the same location.
``` json
{
  "@context": {
    "@vocab": "https://schema.org/",
    "hyf": "https://www.opengis.net/def/schema/hy_features/hyf/",
    "locType": "http://vocabulary.odm2.org/sitetype/"
  },
  "@id": "https://geoconnex.us/usgs/monitoring-location/08282300",
  "@type": [
    "hyf:HY_HydrometricFeature",
    "hyf:HY_HydroLocation",
    "locType:stream"
  ],
  "hyf:HydroLocationType": "hydrometric station",
  "sameAs": {"@id": "https://geoconnex.us/ref/gages/1018463"},
  "identifier": {
    "@type": "PropertyValue",
    "propertyID": "USGS site number",
    "value": "08282300"
  },
  "name": "Rio Brazos at Fishtail Road NR Tierra Amarilla, NM",
  "description": "Stream/River Site",
  "provider": {
    "url": "https://waterdata.usgs.gov",
    "@type": "GovernmentOrganization",
    "name": "U.S. Geological Survey Water Data for the Nation"
  }
}
```
Here we construct the JSON-LD document by adding a context which includes the <https://schema.org/> vocabulary, as well as the <https://www.opengis.net/def/schema/hy_features/hyf/> vocabulary which defines specific concepts in surface hydrology, and the ODM2 [sitetype vocabulary](http://vocabulary.odm2.org/sitetype/) which defines types of water data collection locations.
- The `@id` element of <https://geoconnex.us/usgs/monitoring-location/08282300> in this case is a persistent geoconnex URI. See [here](https://github.com/internetofwater/geoconnex.us/blob/master/CONTRIBUTING.md) for how to create these. It is optional if the "same thing" geoconnex URI (the `sameAs` element described below) is provided, in which case this could just be the URL of the web resource for the location, or omitted.
- The `@type` element here specifies that <https://geoconnex.us/usgs/monitoring-location/08282300> is a [Place](https://schema.org/Place) (i.e. a generic place on earth), a [Hydrometric Feature](https://www.opengis.net/def/schema/hy_features/hyf/HY_HydrometricFeature) (i.e. a data collection station) and a [HydroLocation](https://www.opengis.net/def/schema/hy_features/hyf/HY_HydroLocation) (i.e. a specific location that could in principle define a catchment). The `locType` further specifies the type of location using the ODM2 sitetype vocabulary <http://vocabulary.odm2.org/sitetype/>, which expresses the location type in terms of the feature of interest (e.g. a stream, a groundwater system). If the location is instead meant to represent a general location about which non-hydrologic data is being provided, as might be the case with a data provider publishing data about dams, levees, culverts, bridges, etc. but not associated water data, then `locType` and `hyf:HY_HydrometricFeature` can be omitted.
- The `hyf:HydroLocationType` can be used to identify the type of site with greater specificity and customization by using text values from any codelist, but preferably the [HY_Features HydroLocationType codelist](https://docs.ogc.org/is/14-111r6/14-111r6.html#annexB_1) instead of identifiers. It can be useful to describe something like a dam, weir, culvert, bridge, etc.
- The `sameAs` element is optional if the `@id` element is included as a persistent geoconnex URI. However, wherever possible, it should be populated with a Geoconnex Reference Feature URI. If all data providers tag their own location metadata with these, it becomes much easier for users of the Geoconnex system to find data collected by other providers about the same location. Reference features of all sorts are available to browse in a web map at <https://geoconnex.us/iow/map>, to access via API at <https://reference.geoconnex.us/collections>, or to download in bulk as GeoPackage files from [HydroShare](https://www.hydroshare.org/resource/3cc04df349cd45f38e1637305c98529c/). If your location does not appear to be represented by a reference feature, please consider contributing your location. You can start this process by [submitting an issue at the geoconnex.us GitHub repository](https://github.com/internetofwater/geoconnex.us/issues/new?assignees=&labels=&projects=&template=general.md&title=%5Bgeneral%5D). In this case `sameAs` is a persistent geoconnex URI for a "Reference Gage". Reference Gages is an open source, continuously updated set of all known surface water monitoring locations with data being collected by all known organizations. It is managed on GitHub at <https://github.com/internetofwater/ref_gages>.
- The `identifier` element specifies the ID scheme name (`propertyID`) for the location in the data source and the ID itself (`value`).
- The `name` (required) and `description` (optional) elements are self-explanatory and can follow the conventions of the data provider.
- The `provider` element describes the data provider, which is generally conceptualized in Geoconnex as being a data system available on the web. Note that under `provider`, in addition to an identifying `name`, there is a `url` if available for the website of the providing data system, and a `@type`, which is most likely a sub type of <https://schema.org/Organization>, which includes [GovernmentOrganization](https://schema.org/GovernmentOrganization), [NGO](https://schema.org/NGO), [ResearchOrganization](https://schema.org/ResearchOrganization), [EducationalOrganization](https://schema.org/EducationalOrganization), and [Corporation](https://schema.org/Corporation), among others.
##### Spatial geometry and hydrologic references {#sec-spatial}
The second group of information provides specific location and spatial context:
- "My lat/long is `36.738 -106.471`"
- "I am on the [Rio Brazos](https://geoconnex.us/ref/mainstems/1611418)"[^12]
[^12]: Note that ideally this would be a geoconnex URI for a river mainstem, in this case <https://geoconnex.us/ref/mainstems/1611418>
Adding this information to the bottom of the JSON-LD document:
``` json
{
"@context": {
"@vocab":"https://schema.org/",
"hyf": "https://www.opengis.net/def/schema/hy_features/hyf/",
"locType": "http://vocabulary.odm2.org/sitetype/",
"gsp": "http://www.opengis.net/ont/geosparql#"
},
"@id": "https://geoconnex.us/usgs/monitoring-location/08282300",
"@type": [
"hyf:HY_HydrometricFeature",
"hyf:HY_HydroLocation",
"locType:stream"
],
"hyf:HydroLocationType": "hydrometric station",
"sameAs": {"@id":"https://geoconnex.us/ref/gages/1018463"},
"identifier": {
"@type": "PropertyValue",
"propertyID": "USGS site number",
"value": "08282300"
},
"name": "Rio Brazos at Fishtail Road NR Tierra Amarilla, NM",
"description": "Stream/River Site",
"provider": {
"url": "https://waterdata.usgs.gov",
"@type": "GovernmentOrganization",
"name": "U.S. Geological Survey Water Data for the Nation"
},
"geo": {
"@type": "GeoCoordinates",
"longitude": -106.4707722,
"latitude": 36.7379333
},
"gsp:hasGeometry": {
"@type": "http://www.opengis.net/ont/sf#Point",
"gsp:asWKT": {
"@type": "http://www.opengis.net/ont/geosparql#wktLiteral",
"@value": "POINT (-106.4707722 36.7379333)"
},
"gsp:crs": {
"@id": "http://www.opengis.net/def/crs/OGC/1.3/CRS84"
}
},
"hyf:referencedPosition":{
"hyf:HY_IndirectPosition":{
"hyf:linearElement":{
"@id": "https://geoconnex.us/ref/mainstems/1611418"
}
}
}
}
```
We have added a context element `gsp` and three blocks: `geo`, `gsp:hasGeometry`, and `hyf:referencedPosition`.
- `gsp` is the [GeoSPARQL](https://www.ogc.org/standard/geosparql/) ontology used to standardize the representation of spatial data and relationships in knowledge graphs like the Geoconnex system
- `geo` is the `schema.org` [standard for representing spatial data](https://schema.org/geo). It is what search engines like Google and Bing use to place webpages on a map. While useful, it has no standard way to represent multipoint, multipolyline, or multipolygon features, or to specify coordinate reference systems or projections, so we also need to provide a GeoSPARQL version of the geometry. In this case, we are simply providing a point with a longitude and latitude via the [GeoCoordinates](https://schema.org/GeoCoordinates) type. It is also possible to represent [lines](https://schema.org/line) and [polygons](https://schema.org/polygon).
- `gsp:hasGeometry` is the GeoSPARQL version of geometry, with which we can embed [WKT](https://en.wikipedia.org/wiki/Well-known_text_representation_of_geometry) representations of geometry in structured metadata in the `@value` element, and declare the coordinate reference system or projection in the `gsp:crs` element using a URI from the OGC register of reference systems, which also encodes [EPSG codes](http://www.opengis.net/def/crs/EPSG/0/). In this case we use <http://www.opengis.net/def/crs/OGC/1.3/CRS84>, which is WGS 84 with coordinates in longitude-latitude order, matching the order of the WKT point.
- `hyf:referencedPosition` uses the [HY_Features](https://www.opengis.net/def/schema/hy_features/hyf/) model to declare that this location is on a specific river, in this case the [Rio Brazos in New Mexico](https://geoconnex.us/ref/mainstems/1611418) as identified in the Reference Mainstems dataset, which is available via API at <https://reference.geoconnex.us/collections/mainstems> and managed on GitHub at <https://github.com/internetofwater/ref_rivers>. All surface water locations should include this type of element.
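Because `geo` and `gsp:hasGeometry` encode the same point twice, it can help to generate both from a single coordinate pair so they cannot drift apart. The following is a minimal Python sketch, not part of any Geoconnex tooling: the function name `point_geometry_blocks` is hypothetical, and it assumes coordinates are given in longitude-latitude order (CRS84), as in the example above. The `@type` is written as `GeoCoordinates`, which resolves against the `@vocab` of `https://schema.org/`.

```python
import json


def point_geometry_blocks(longitude: float, latitude: float) -> dict:
    """Return schema.org and GeoSPARQL encodings of a single point.

    Hypothetical helper for illustration; assumes CRS84 (longitude, latitude).
    """
    return {
        "geo": {
            "@type": "GeoCoordinates",
            "longitude": longitude,
            "latitude": latitude,
        },
        "gsp:hasGeometry": {
            "@type": "http://www.opengis.net/ont/sf#Point",
            "gsp:asWKT": {
                "@type": "http://www.opengis.net/ont/geosparql#wktLiteral",
                # WKT lists x (longitude) before y (latitude)
                "@value": f"POINT ({longitude} {latitude})",
            },
            "gsp:crs": {"@id": "http://www.opengis.net/def/crs/OGC/1.3/CRS84"},
        },
    }


blocks = point_geometry_blocks(-106.4707722, 36.7379333)
print(json.dumps(blocks, indent=2))
```

The returned dictionary can be merged into the location document with `dict.update` before serializing it.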
::: callout-note
###### What about groundwater?
Groundwater monitoring locations may use the `hyf:referencedPosition` element if data providers wish their wells to be associated with specific streams. However, groundwater sample and monitoring locations such as wells can also be referenced to hydrogeologic unit or aquifer identifiers where available using this pattern, instead of using the `hyf:referencedPosition` pattern:
``` json
"http://www.w3.org/ns/sosa/isSampleOf": {
"@id": "https://geoconnex.us/ref/sec_hydrg_reg/S26"
}
```
USGS Principal Aquifers and Secondary Hydrogeologic Unit URIs are available from <https://reference.geoconnex.us/collections>.
If reference URIs are not available for the groundwater unit you'd like to reference, but an ID does exist in a dataset that is available online, you may use this pattern:
``` json
"http://www.w3.org/ns/sosa/isSampleOf": {
"@type": "GW_HydrogeoUnit",
"name": "name of the aquifer",
"identifier": {
"@type": "PropertyValue",
"propertyID": "Source aquifer dataset id field name",
"value": "aq-id-1234"
},
"subjectOf": {
"@type": "Dataset",
"url": "url where dataset that describes or includes the aquifer can be accessed"
}
}
```
:::
##### Datasets {#sec-loc-data}
Now that we have described our location's provenance, geospatial geometry, and association with any reference features, we can describe the data that can be accessed about that location. The simplest way to do this is to add a block like this to the bottom of the JSON-LD document we have created so far:
``` json
"subjectOf": {
"@type": "Dataset",
"name": "Discharge data from USGS-08282300",
"description": "Discharge data from USGS-08282300 at Rio Brazos at Fishtail Road NR Tierra Amarilla, NM",
"url": "https://waterdata.usgs.gov/monitoring-location/08282300/#parameterCode=00060&period=P7D"
}
```
Here, we simply declare that the location we have been working with is the `subjectOf` a `Dataset` with a name, description, and URL where information about the dataset can be found.
However, more detailed metadata lets data users (and search engines) filter for your data by standardized variable names and by temporal coverage and resolution, decide whether to use the data based on the methods used (such as whether it is observed or modeled/forecasted data), and possibly preview actual data values. In general, following [Science-on-Schema.org Guidelines](https://github.com/ESIPFed/science-on-schema.org/blob/master/guides/Dataset.md) is recommended. We implement this guidance, with some extension, for the USGS Monitoring Location example. Hover over the code annotation bubbles on the right for translation and explanation:
``` json
{
"subjectOf":{ // <1>
"@type": "Dataset", // <2>
"name": "Discharge data from USGS Monitoring Location 08282300", // <3>
"description": "Discharge data from USGS Streamgage at Rio Brazos at Fishtail Road NR Tierra Amarilla, NM", // <3>
"license": "https://spdx.org/licenses/CC-BY-4.0", // <4>
"isAccessibleForFree": "true", // <5>
"variableMeasured": { // <6>
"@type": "PropertyValue", // <7>
"name": "discharge", // <7>
"description": "Discharge in cubic feet per second", // <7>
"propertyID": "https://www.wikidata.org/wiki/Q8737769", // <8>
"url": "https://en.wikipedia.org/wiki/Discharge_(hydrology)", // <9>
"unitText": "cubic feet per second", // <10>
"qudt:hasQuantityKind": "qudt-quantkinds:VolumeFlowRate", // <11>
"unitCode": "qudt-units:FT3-PER-SEC", // <12>
"measurementTechnique": "observation", // <13>
"measurementMethod": { // <14>
"name":"Discharge Measurements at Gaging Stations", // <14>
"publisher": "U.S. Geological Survey", // <14>
"url": "https://doi.org/10.3133/tm3A8" // <14>
} // <14>
}, // <14>
"temporalCoverage": "2014-06-30/..", // <15>
"dc:accrualPeriodicity": "freq:daily", // <16>
"dcat:temporalResolution": {"@value":"PT15M","@type":"xsd:duration"}, // <16>
"distribution": [ // <17>
{
"@type": "DataDownload", // <17>
"name": "USGS Instantaneous Values Service", // <17>
"contentUrl": "https://waterservices.usgs.gov/nwis/iv/?sites=08282300&parameterCd=00060&format=rdb", // <17>
"encodingFormat": ["text/tab-separated-values"], // <17>
"dc:conformsTo": "https://pubs.usgs.gov/of/2003/ofr03123/6.4rdb_format.pdf" // <17>
},
{
"@type": "DataDownload", // <18>
"name": "USGS SensorThings API", // <18>
"contentUrl": "https://labs.waterdata.usgs.gov/sta/v1.1/Datastreams('0adb31f7852e4e1c9a778a85076ac0cf')?$expand=Thing,Observations", // <18>
"encodingFormat": ["application/json"], // <18>
"dc:conformsTo": "https://labs.waterdata.usgs.gov/docs/sensorthings/index.html" // <18>
}
]
}
}
```
1. This node (we are continuing from the above JSON-LD document, so the USGS Monitoring Location) is `subjectOf` the node that follows.
2. This node is a `Dataset` (<https://schema.org/Dataset>).
3. The dataset's name and description.
4. The dataset's license, which is most easily populated by a URI for the license appropriate for your data. Federal agencies, many state agencies, and academics use open licenses such as those provided by [opendatacommons.org](https://opendatacommons.org/licenses/) and [creativecommons.org](https://creativecommons.org/licenses). URIs for licenses are available from <https://spdx.org/licenses/>.
5. Either `true` or `false`, depending on whether the dataset is available for free.
6. The dataset includes information on a variable (in schema.org called [variableMeasured](https://schema.org/variableMeasured)). Multiple `variableMeasured` can be specified for a dataset by using [arrays](https://www.w3.org/TR/json-ld11/#example-135-indexing-language-tagged-strings-and-set), which is useful for datasets that must be downloaded in bulk and include multiple variables of interest. In general it is clearer to specify one "dataset" per `variableMeasured` if the data has different temporal coverage per variable, or can be downloaded on a per-variable basis.
7. `PropertyValue` is a generic type for extending schema.org properties and, as a rule, should be used on `variableMeasured` nodes.
8. `propertyID` should be a URI that resolves to a machine-readable resource defining the variable. In this case, we are using a Wikidata link to the concept of stream discharge. In general, a good source for URIs is the [ODM2 variable vocabulary](http://vocabulary.odm2.org/variablename/).
9. Here `url` points to a human-readable resource describing the variable. In this case, we are using a Wikipedia link to the concept of stream discharge.
10. Here we use the units as written in the data source.
11. While `name` and `propertyID` specify the variable as being "discharge" in this case, multiple data sources might use different words and identifiers for their variables, so it can be useful to reference a more general category that we can use to group variables across sources. We can use identifiers for [QuantityKinds](https://qudt.org/schema/qudt/QuantityKind) from QUDT, which we reference with the `qudt-quantkinds` prefix as described in the `@context` in @sec-context.
12. While `unitText` above specifies the units, multiple data sources might use different words for the same unit, so to improve interoperability we can use identifiers for units provided by QUDT, which we reference with the `qudt-units` vocabulary prefix as described in the `@context` in @sec-context. If units from QUDT are unavailable, first check whether `unitText` can be filled with a term name from <http://vocabulary.odm2.org/units/>.
13. `measurementTechnique` is meant to be a highly general account of the data generating procedure, primarily to distinguish between observed and modeled data. It is highly recommended for this to be `model` or `observation`, or if more specificity is required, to restrict these values to the ODM2 [methodType](http://vocabulary.odm2.org/methodtype/) vocabulary.
14. `measurementMethod` specifies the method used to generate the data to as great a degree of specificity as possible. Ideally it would be a persistent identifier that directs to a machine-readable web resource that unambiguously describes that method. This would look something like this: `"measurementMethod": {"@id": "https://www.nemi.gov/methods/method_summary/4680/"}`. In lieu of that, a name, description, and URL to a human-readable web resource like an explanatory webpage, technical report, standards document, or academic article would be appropriate, as in this example for USGS discharge measurement.
15. `temporalCoverage` refers to the first and last time for which data is available. It can be specified using [ISO 8601 interval format](https://en.wikipedia.org/wiki/ISO_8601#Time_intervals) (`YYYY-MM-DD/YYYY-MM-DD`, with the start date first and the end date after the `/`). It can also include time, like so: `YYYY-MM-DDTHH:MM:SS/YYYY-MM-DDTHH:MM:SS`. If the dataset has no end date, as with an active monitoring program, this can be indicated like so: `YYYY-MM-DD/..`.
16. `dc:accrualPeriodicity` refers to the update schedule of the published dataset. The value can be taken from the [Dublin Core frequency vocabulary](https://www.dublincore.org/specifications/dublin-core/collection-description/frequency/) (here in `@context` as `freq:`). `dcat:temporalResolution` refers to the minimum intended time spacing between observations in the case of regular time series data. The value should be an [xsd duration encoded string](http://www.datypic.com/sc/xsd/t-xsd_duration.html), e.g. "PT15M" for 15-minute, "P1D" for daily, "PT1H" for hourly, "P7D" for weekly, "P1M" for monthly, "P1Y" for annual.
17. `distribution` provides a way to structure information about data access points. This can range in complexity from a specification of a URL and format to specifications for how to interact with an API. In this example, each entry provides a name, a URL (`contentUrl`), and a format (`encodingFormat`, populated by [MIME type](https://www.iana.org/assignments/media-types/media-types.xhtml)). `dc:conformsTo` is optional and should be a document that helps interpret the data structure. This could be a link to a data dictionary in the case of simple tabular data, documentation of a data model for a complex database, or an API specification document for an API endpoint.
18. Multiple `distribution` entries can be specified using JSON arrays.
This translates roughly to:
- There is the following information about me: a `Dataset`
- for the variable (`variableMeasured`) `Discharge`
- It has values from `June 30, 2014` to the `present`
- at a `15` `minute` time resolution
- updated/published daily
- in units of `cubic feet per second`
- generated by `observation`
- generated in particular using [USGS discharge measurement methods](https://pubs.usgs.gov/publication/tm3A8).
- You can download it:
- [here](https://waterservices.usgs.gov/nwis/iv/?sites=08282300&paramete)
- Using the [USGS Instantaneous Values REST Web Service](https://waterservices.usgs.gov/rest/IV-Service.html)
- in the [RDB format](https://waterdata.usgs.gov/nwis/?tab_delimited_format_info)
- You can also download it [here](https://labs.waterdata.usgs.gov/sta/v1.1/Datastreams('0adb31f7852e4e1c9a778a85076ac0cf')?$expand=Thing,Observations)
- Using the [USGS SensorThings API implementation](https://labs.waterdata.usgs.gov/docs/sensorthings/index.html)
- in JSON
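The `temporalCoverage` interval strings described in annotation 15 are easy to get subtly wrong by hand, so a small helper can be useful. This is a minimal Python sketch under stated assumptions: the function name `temporal_coverage` is hypothetical, and it handles date-only intervals, using `..` for a dataset with no end date.

```python
from datetime import date
from typing import Optional


def temporal_coverage(start: date, end: Optional[date] = None) -> str:
    """Format an ISO 8601 date interval; an open-ended interval ends in '..'."""
    if end is not None and end < start:
        raise ValueError("end date must not precede start date")
    end_text = end.isoformat() if end is not None else ".."
    return f"{start.isoformat()}/{end_text}"


# Active monitoring program with no end date
open_ended = temporal_coverage(date(2014, 6, 30))
# Closed interval with both a start and an end date
closed = temporal_coverage(date(2002, 1, 1), date(2020, 12, 31))
print(open_ended)
print(closed)
```

The first call reproduces the value used in the USGS example, and the second reproduces the value used in the dataset-oriented example later in this guide.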
### Dataset-oriented {#sec-data}
The purpose of the dataset-oriented page is to give enough information about the available data, and the area, locations, or features it is relevant to, that a water data user can quickly determine whether and how to download the data. We will use this [data resource about water utility treated water demand that has been published at HydroShare](https://www.hydroshare.org/resource/4cf2a4298eca418f980201c1c5505299/) as an example of the type of content to put in dataset-oriented Geoconnex landing page web resources and of how to map that content to embedded JSON-LD documents.
::: callout-note
Scroll up and down to view elements of the example landing page
:::
```{=html}
<iframe width="780" height="500" src="https://www.hydroshare.org/resource/4cf2a4298eca418f980201c1c5505299/" title="Hydroshare Example"></iframe>
```
This dataset-oriented web resource includes this type of information:
- "This is my URI (which is a DOI-URL): <https://doi.org/10.4211/hs.4cf2a4298eca418f980201c1c5505299>"[^13]
- "This is my permanent identifier, which is a DOI": [^14]
- "This is my URL <https://www.hydroshare.org/resource/4cf2a4298eca418f980201c1c5505299>"[^15]
- "My creator is <name>"
- "I am provided by HydroShare"
- "My spatial coverage is the bounding box `"35.5463 -79.1235 36.0520 -78.3765"`"[^16]
- "I have data between January 1, 2002 and December 31, 2020" [^17]
- "My data is at a `1` `month` time step frequency" [^18]
- "I am about the following features":[^19]
- [Raleigh Public Water System](https://geoconnex.us/ref/pws/NC0392010)
- [Cary Public Water System](https://geoconnex.us/ref/pws/NC0392020)
- [Durham Public Water System](https://geoconnex.us/ref/pws/NC0332010)
- [Apex Public Water System](https://geoconnex.us/ref/pws/NC0392045)
- [Orange Water and Sewer Authority](https://geoconnex.us/ref/pws/NC0368010)
- "I have the following variables":
- Monthly Water demand measured in units of averaged millions of gallons per day
- Historic Mean monthly water demand over the period of record measured in units of millions of gallons per day
- The monthly water demand divided by historic mean monthly water demand, as a percent
- "You can download me [here](https://www.hydroshare.org/hsapi/resource/4cf2a4298eca418f980201c1c5505299/) on HydroShare as a zipped csv file"
- "I am accessible for free subject to this [license](http://creativecommons.org/licenses/by/4.0/)."
[^13]: If a permanent identifier like a DOI is available
[^14]: for identifiers that are not HTTP URLs
[^15]: The actual URL where the resource can be accessed
[^16]: Spatial coverage refers to the maximum areal extent that the data is about. For Geoconnex purposes, this is not necessary if "about" elements with links to Geoconnex Reference Features are used
[^17]: refers to the first and last time for which data is available. It can be specified using [ISO 8601 interval format](https://en.wikipedia.org/wiki/ISO_8601#Time_intervals) (`YYYY-MM-DD/YYYY-MM-DD`, with the start date first and the end date after the `/`). It can also include time, like so: `YYYY-MM-DDTHH:MM:SS/YYYY-MM-DDTHH:MM:SS`. If the dataset has no end date, as with an active monitoring program, this can be indicated like so: `YYYY-MM-DD/..`
[^18]: refers to the minimum intended time spacing between observations in the case of regular time series data
[^19]: These should be geoconnex reference feature URIs. If the locations the dataset is about are not within <https://reference.geoconnex.us/collections>, then consider [creating location-based resources and minting geoconnex identifiers](https://github.com/internetofwater/geoconnex.us/blob/master/CONTRIBUTING.md). If the dataset is extensive over a vector feature spatial fabric, like all Census Tracts or HUC12s or NHD Catchments, then this can be a reference to a single reference fabric dataset rather than an array of identifiers for every single feature. If the dataset is extensive over an area but has no tie to a particular reference feature set, like a raster dataset, then this can be omitted.
#### JSON-LD
Much is similar to the [Datasets guidance for location-oriented web resources](#sec-loc-data), so here we focus on the differences. Note that HydroShare automatically embeds JSON-LD. The JSON-LD examples below vary somewhat from HydroShare's default content to illustrate optional elements that would be useful for Geoconnex that are not currently implemented in HydroShare.
##### Identifiers, provenance, license, and distribution
For basic identifying and descriptive information, [science-on-schema.org has appropriate guidance](https://github.com/ESIPFed/science-on-schema.org/blob/master/examples/dataset/minimal.jsonld). In this case, note that a specific file download URL has been provided rather than an API endpoint, and that `dc:conformsTo` points to a data dictionary that is supplied at the same web resource.
``` json
{
"@context": {
"@vocab": "https://schema.org/",
"rdfs": "http://www.w3.org/2000/01/rdf-schema#",
"dc": "http://purl.org/dc/terms/",
"qudt": "http://qudt.org/schema/qudt/",
"qudt-units": "http://qudt.org/vocab/unit/",
"qudt-quantkinds": "http://qudt.org/vocab/quantitykind/",
"gsp": "http://www.opengis.net/ont/geosparql#",
"locType": "http://vocabulary.odm2.org/sitetype/",
"odm2var":"http://vocabulary.odm2.org/variablename/",
"odm2varType": "http://vocabulary.odm2.org/variabletype/",
"hyf": "https://www.opengis.net/def/schema/hy_features/hyf/",
"skos": "https://www.opengis.net/def/schema/hy_features/hyf/HY_HydroLocationType",
"ssn": "http://www.w3.org/ns/ssn/",
"ssn-system": "http://www.w3.org/ns/ssn/systems/"
},
"@type": "Dataset",
"@id": "https://doi.org/10.4211/hs.4cf2a4298eca418f980201c1c5505299",
"provider": {
"url": "https://hydroshare.org",
"@type": "ResearchOrganization",
"name": "HydroShare"
},
"creator": {
"@type": "Person",
"affiliation": {
"@type": "Organization",
"name": "Internet of Water;Center for Geospatial Solutions;Lincoln Institute of Land Policy"
},
"email": "konda@lincolninst.edu",
"name": "Kyle Onda",
"url": "https://www.hydroshare.org/user/4850/"
},
"identifier": "doi:hs.4cf2a4298eca418f980201c1c5505299",
"name": "Geoconnex Dataset Example: Data for Triangle Water Supply Dashboard",
"description": "This is a dataset meant as an example of dataset-level schema.org markup for https://geoconnex.us. It uses as an example data from the NC Triangle Water Supply Dashboard, which collects and visualizes finished water deliveries by water utilities to their service areas in the North Carolina Research Triangle area",
"url": "https://www.hydroshare.org/resource/4cf2a4298eca418f980201c1c5505299",
"keywords": ["water demand", "water supply", "geoconnex"],
"license": "https://creativecommons.org/licenses/by/4.0/",
"isAccessibleForFree": "true",
"distribution": [
{
"@type": "DataDownload",
"name": "HydroShare file URL",
"contentUrl": "https://www.hydroshare.org/resource/4cf2a4298eca418f980201c1c5505299/data/contents/demand_over_time.csv",
"encodingFormat": ["text/csv"],
"dc:conformsTo": "https://www.hydroshare.org/resource/4cf2a4298eca418f980201c1c5505299/data/contents/dataDictionary.xlsx"
},
...
```
##### Variables and Methods
Again, this follows the [dataset guidance](#sec-loc-data). In the example below, multiple `variableMeasured` are specified using an array. Other differences to point out:
- The unit of "million gallons per day" is not available from the QUDT units vocabulary. It is in the [ODM2 units codelist](http://vocabulary.odm2.org/units/), so we populate `unitCode` with the URL listed there.
- The `measurementMethod` for both variables, which are simply different aggregation statistics for the same variable, has no known web resource or specific identifier available, so we use `description` to clarify the method.
``` json
...,
"variableMeasured": [
{
"@type": "PropertyValue",
"name": "water demand",
"description": "treated water delivered to distribution system",
"propertyID": "http://vocabulary.odm2.org/variablename/waterUsePublicSupply/",
"url": "http://vocabulary.odm2.org/variablename/waterUsePublicSupply/",
"unitText": "million gallons per day",
"qudt:hasQuantityKind": "qudt-quantkinds:VolumeFlowRate",
"unitCode": "http://his.cuahsi.org/mastercvreg/edit_cv11.aspx?tbl=Units&id=1125579048",
"measurementTechnique": "observation",
"measurementMethod": {
"name":"water meter",
"description": "metered bulk value, accumulated over one month",
"url": "https://www.wikidata.org/wiki/Q268503"
}
},
{
"@type": "PropertyValue",
"name": "water demand (monthly average)",
"description": "average monthly treated water delivered to distribution system",
"propertyID": "http://vocabulary.odm2.org/variablename/waterUsePublicSupply/",
"url": "http://vocabulary.odm2.org/variablename/waterUsePublicSupply/",
"unitText": "million gallons per day",
"qudt:hasQuantityKind": "qudt-quantkinds:VolumeFlowRate",
"unitCode": "http://his.cuahsi.org/mastercvreg/edit_cv11.aspx?tbl=Units&id=1125579048",
"measurementTechnique": "observation",
"measurementMethod": {
"name":"water meter",
"description": "metered bulk value, average accumulated over each month for multiple years",
"url": "https://www.wikidata.org/wiki/Q268503"
}
}
],
"temporalCoverage": "2002-01-01/2020-12-31",
"ssn-system:frequency": {
"value": "1",
"unitCode": "qudt-units:Month"
},
```
##### Geoconnex Reference Feature Links and Spatial Coverage
Unlike the location-based example, where a location is explicitly the `subjectOf` the dataset, here, the dataset must be described as being `about` certain features. If the dataset is not explicitly about any discrete features, such as raster datasets, then a Spatial Coverage should be specified.
Using the `about` construction, either a single geoconnex URI or an array of several can be provided. The example below uses several. Note the nesting of nodes within the array so that each URI has an `@id` keyword and is `@type` `Place`. In this example, URIs from the geoconnex [reference features set for Public Water Systems](https://reference.geoconnex.us/collections/pws) are used.
``` json
...,
"about": [
{
"@id": "https://geoconnex.us/ref/pws/NC0332010",
"@type": "Place"
},
{
"@id": "https://geoconnex.us/ref/pws/NC0368010",
"@type": "Place"
},
{
"@id": "https://geoconnex.us/ref/pws/NC0392010",
"@type": "Place"
},
{
"@id": "https://geoconnex.us/ref/pws/NC0392020",
"@type": "Place"
},
{
"@id": "https://geoconnex.us/ref/pws/NC0392045",
"@type": "Place"
}
],
...
```
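When the list of features is long, the `about` array can be generated rather than typed by hand. The following is a minimal Python sketch: the helper name `about_nodes` is hypothetical, and the base URI is simply the pattern visible in the example above.

```python
import json

# Base URI pattern for Public Water System reference features, as in the example above
PWS_BASE = "https://geoconnex.us/ref/pws/"


def about_nodes(pws_ids: list[str]) -> list[dict]:
    """Build one {"@id", "@type"} node per unique PWS ID (hypothetical helper)."""
    # Deduplicate and sort so the output is stable across runs
    return [
        {"@id": f"{PWS_BASE}{pws_id}", "@type": "Place"}
        for pws_id in sorted(set(pws_ids))
    ]


pws_ids = ["NC0392010", "NC0392020", "NC0332010", "NC0392045", "NC0368010"]
about = about_nodes(pws_ids)
print(json.dumps({"about": about}, indent=2))
```

Sorting and deduplicating the IDs is a design choice made here for reproducible output; the order of nodes in the array carries no meaning.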
To assist in finding reference features, <https://reference.geoconnex.us> allows queries following the [OGC-API Features](https://ogcapi.ogc.org/features/) API standard and the [Common Query Language (CQL) standard](https://portal.ogc.org/files/96288).
For example, to find the Geoconnex URI for the Raleigh public water system (PWS), we can construct the URL step by step:
- start from the items endpoint for the PWS feature collection, which accepts CQL filters: <https://reference.geoconnex.us/collections/pws/items>
- add a filter on the name field `pws_name`: <https://reference.geoconnex.us/collections/pws/items?filter=pws_name>
- filter for a name that includes "Raleigh": [https://reference.geoconnex.us/collections/pws/items?filter=pws_name ILIKE '%Raleigh%'](https://reference.geoconnex.us/collections/pws/items?filter=pws_name%20ILIKE%20%27%25Raleigh%25%27)
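The percent signs, spaces, and quotes in a CQL filter must be URL-encoded, which is easy to get wrong manually. The following is a minimal Python sketch that builds the query URL with the standard library. The function name `pws_name_query` is hypothetical, and doubling single quotes to escape them inside the CQL string literal is an assumption about how the server parses CQL text.

```python
from urllib.parse import quote, urlencode

# Items endpoint for the PWS feature collection, as given above
ITEMS_URL = "https://reference.geoconnex.us/collections/pws/items"


def pws_name_query(name_fragment: str) -> str:
    """Return an items URL that filters pws_name with a case-insensitive match."""
    # Assumption: a single quote inside a CQL string literal is escaped by doubling it
    safe = name_fragment.replace("'", "''")
    cql = f"pws_name ILIKE '%{safe}%'"
    # quote_via=quote encodes spaces as %20 rather than '+'
    return f"{ITEMS_URL}?{urlencode({'filter': cql}, quote_via=quote)}"


query_url = pws_name_query("Raleigh")
print(query_url)
```

The resulting URL can be opened in a browser or fetched with any HTTP client; the Geoconnex URI for each match would then be read from the returned features.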
Sometimes it is impossible to use feature URIs because the relevant specific features are not available from <https://reference.geoconnex.us/collections>. If so, feel free to [submit an issue to the geoconnex.us GitHub repository](https://github.com/internetofwater/geoconnex.us/issues/new/choose) requesting a reference feature set.
Sometimes it is impractical to list all applicable reference features, whether they are in <https://reference.geoconnex.us> or another source. This is common for comprehensive datasets that cover an entire reference dataset or another dataset like a hydrofabric, such as datasets summarizing values to U.S. Counties, or the National Water Model generating values for all NHDPlusV2 COMID flowlines. In this case it is best to declare that the Dataset [isBasedOn](https://schema.org/isBasedOn) the source geospatial fabric. For example, if the example dataset were about all public water systems instead of just the 5 listed, then instead of `about` we should specify an identifier, name, description, and any URLs for other resources that describe the source fabric and how to interpret it:
``` json
...,
"isBasedOn": {
"@id": "https://www.hydroshare.org/resource/9ebc0a0b43b843b9835830ffffdd971e/",
"name": "U.S. Community Water Systems Service Boundaries, v4.0.0",
"description": "This is a layer of water service boundaries for 45,973 community water systems that deliver tap water to 307.7 million people in the US.",
"url": "https://github.com/SimpleLab-Inc/wsb"
},
...
```
Sometimes there are no particular features that a dataset is explicitly about. This is common with remote sensing raster data. In this case, it is best to specify a `spatialCoverage` polygon using WKT encoded geometry:
``` json
"spatialCoverage": {
"@type": "Place",
"gsp:hasGeometry": {
"@type": "http://www.opengis.net/ont/sf#MultiPolygon",
"gsp:asWKT": {
"@type": "http://www.opengis.net/ont/geosparql#wktLiteral",
"@value": "MULTIPOLYGON (((-85.67957299999999 32.799514, -85.679637 32.822002999999995, -85.67199699999999 32.822063, -85.66421 32.821711, -85.647989 32.82224, -85.627966 32.822331, -85.627781 32.800716, -85.627496 32.778602, -85.635931 32.778656999999995, -85.645034 32.778146, -85.653352 32.778481, -85.67933699999999 32.778239, -85.67936399999999 32.784064, -85.679808 32.792068, -85.67957299999999 32.799514)))"
}
}
}
```
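If only a bounding box is known, as in the dataset landing page content listed earlier, a `spatialCoverage` polygon can be derived from it. The following is a minimal Python sketch: the helper name `bbox_spatial_coverage` is hypothetical, and it assumes the box is given as south, west, north, east in decimal degrees, with WKT coordinates written in longitude-latitude order.

```python
import json


def bbox_spatial_coverage(south: float, west: float, north: float, east: float) -> dict:
    """Build a spatialCoverage node whose geometry is the bounding-box rectangle.

    Hypothetical helper; WKT vertices are written as "longitude latitude".
    """
    # Closed ring: the first and last vertex are identical
    ring = [(west, south), (east, south), (east, north), (west, north), (west, south)]
    ring_text = ", ".join(f"{lon} {lat}" for lon, lat in ring)
    return {
        "spatialCoverage": {
            "@type": "Place",
            "gsp:hasGeometry": {
                "@type": "http://www.opengis.net/ont/sf#Polygon",
                "gsp:asWKT": {
                    "@type": "http://www.opengis.net/ont/geosparql#wktLiteral",
                    "@value": f"POLYGON (({ring_text}))",
                },
            },
        }
    }


coverage = bbox_spatial_coverage(35.5463, -79.1235, 36.0520, -78.3765)
print(json.dumps(coverage, indent=2))
```

A simple rectangle uses the `Polygon` type; the `MultiPolygon` type in the example above is only needed when the coverage consists of several separate areas.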
## Complete Examples {#sec-complete-examples}
Below are complete examples for the general JSON-LD document types depending on the location or dataset orientation and data type.
They are viewable together below, or available for download:
- [location-oriented example](https://raw.githubusercontent.com/internetofwater/geoconnex-guidance/main/examples/location-complete.jsonld)
- [dataset-oriented example](https://raw.githubusercontent.com/internetofwater/geoconnex-guidance/main/examples/dataaset-complete.jsonld)
### Location-oriented {#sec-loc-complete-example}
``` json
{
"@context": {
"@vocab": "https://schema.org/",
"xsd": "https://www.w3.org/TR/xmlschema-2/#",
"rdfs": "http://www.w3.org/2000/01/rdf-schema#",
"dc": "http://purl.org/dc/terms/",
"dcat": "https://www.w3.org/ns/dcat#",
"freq": "http://purl.org/cld/freq/",
"qudt": "http://qudt.org/schema/qudt/",
"qudt-units": "http://qudt.org/vocab/unit/",
"qudt-quantkinds": "http://qudt.org/vocab/quantitykind/",
"gsp": "http://www.opengis.net/ont/geosparql#",
"locType": "http://vocabulary.odm2.org/sitetype/",
"odm2var":"http://vocabulary.odm2.org/variablename/",
"odm2varType": "http://vocabulary.odm2.org/variabletype/",
"hyf": "https://www.opengis.net/def/schema/hy_features/hyf/",
"skos": "https://www.opengis.net/def/schema/hy_features/hyf/HY_HydroLocationType",
"ssn": "http://www.w3.org/ns/ssn/",
"ssn-system": "http://www.w3.org/ns/ssn/systems/"
},
"@id": "https://geoconnex.us/usgs/monitoring-location/08282300",
"@type": [
"hyf:HY_HydrometricFeature",
"hyf:HY_HydroLocation",
"locType:stream"
],
"hyf:HydroLocationType": "hydrometric station",
"sameAs": {
"@id": "https://geoconnex.us/ref/gages/1018463"
},
"identifier": {
"@type": "PropertyValue",
"propertyID": "USGS site number",
"value": "08282300"
},
"name": "Rio Brazos at Fishtail Road NR Tierra Amarilla, NM",
"description": "Stream/River Site",
"provider": {
"url": "https://waterdata.usgs.gov",
"@type": "GovernmentOrganization",
"name": "U.S. Geological Survey Water Data for the Nation"
},
"geo": {
"@type": "GeoCoordinates",
"longitude": -106.4707722,
"latitude": 36.7379333
},
"gsp:hasGeometry": {
"@type": "http://www.opengis.net/ont/sf#Point",
"gsp:asWKT": {
"@type": "http://www.opengis.net/ont/geosparql#wktLiteral",
"@value": "POINT (-106.4707722 36.7379333)"
},
"gsp:crs": {
"@id": "http://www.opengis.net/def/crs/OGC/1.3/CRS84"
}
},
"hyf:referencedPosition": {
"hyf:HY_IndirectPosition": {
"hyf:linearElement": {
"@id": "https://geoconnex.us/ref/mainstems/1611418"
}
}
},
"subjectOf": {
"@type": "Dataset",
"name": "Discharge data from USGS Monitoring Location 08282300",
"description": "Discharge data from USGS Streamgage at Rio Brazos at Fishtail Road NR Tierra Amarilla, NM",
"provider": {
"url": "https://waterdata.usgs.gov",
"@type": "GovernmentOrganization",
"name": "U.S. Geological Survey Water Data for the Nation"
},
"url": "https://waterdata.usgs.gov/monitoring-location/08282300/#parameterCode=00060",
"variableMeasured": {
"@type": "PropertyValue",
"name": "discharge",
"description": "Discharge in cubic feet per second",
"propertyID": "https://www.wikidata.org/wiki/Q8737769",
"url": "https://en.wikipedia.org/wiki/Discharge_(hydrology)",
"unitText": "cubic feet per second",
"qudt:hasQuantityKind": "qudt-quantkinds:VolumeFlowRate",
"unitCode": "qudt-units:FT3-PER-SEC",
"measurementTechnique": "observation",
"measurementMethod": {
"name": "Discharge Measurements at Gaging Stations",
"publisher": "U.S. Geological Survey",
"url": "https://doi.org/10.3133/tm3A8"
}
},
"temporalCoverage": "2014-06-30/..",
"dc:accrualPeriodicity": "freq:daily",
"dcat:temporalResolution": {"@value":"PT15M","@type":"xsd:duration"},
"distribution": [
{
"@type": "DataDownload",
"name": "USGS Instantaneous Values Service",
"contentUrl": "https://waterservices.usgs.gov/nwis/iv/?sites=08282300&parameterCd=00060&format=rdb",
"encodingFormat": [
"text/tab-separated-values"
],
"dc:conformsTo": "https://pubs.usgs.gov/of/2003/ofr03123/6.4rdb_format.pdf"
},
{
"@type": "DataDownload",
"name": "USGS SensorThings API",
"contentUrl": "https://labs.waterdata.usgs.gov/sta/v1.1/Datastreams('0adb31f7852e4e1c9a778a85076ac0cf')?$expand=Thing,Observations",
"encodingFormat": [
"application/json"
],
"dc:conformsTo": "https://labs.waterdata.usgs.gov/docs/sensorthings/index.html"
}
]
}
}
```
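A consumer of these documents may want to pull out the download links regardless of document type. The sketch below is a hedged illustration, not an official Geoconnex client: the helper names `as_list` and `list_distributions` are hypothetical, and the two input dictionaries are heavily abridged versions of the location-oriented example above and the dataset-oriented example that follows. It accounts for `distribution` appearing either as a list of objects or as a single object, both of which occur in this guidance.

```python
def as_list(value):
    """Wrap a single JSON-LD value in a list; pass lists through; None becomes []."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def list_distributions(doc):
    """Collect name, contentUrl, and encodingFormat for every distribution.

    Looks at the document itself when it is typed as a Dataset, and at any
    datasets nested under `subjectOf` (the location-oriented pattern).
    """
    datasets = [doc] if "Dataset" in as_list(doc.get("@type")) else []
    datasets += as_list(doc.get("subjectOf"))
    rows = []
    for dataset in datasets:
        for dist in as_list(dataset.get("distribution")):
            rows.append({
                "name": dist.get("name"),
                "contentUrl": dist.get("contentUrl"),
                "encodingFormat": as_list(dist.get("encodingFormat")),
            })
    return rows


# Abridged location-oriented document: distributions nested under subjectOf.
location = {
    "@id": "https://geoconnex.us/usgs/monitoring-location/08282300",
    "@type": ["hyf:HY_HydrometricFeature", "hyf:HY_HydroLocation", "locType:stream"],
    "subjectOf": {
        "@type": "Dataset",
        "distribution": [
            {
                "@type": "DataDownload",
                "name": "USGS Instantaneous Values Service",
                "contentUrl": "https://waterservices.usgs.gov/nwis/iv/?sites=08282300&parameterCd=00060&format=rdb",
                "encodingFormat": ["text/tab-separated-values"],
            },
            {
                "@type": "DataDownload",
                "name": "USGS SensorThings API",
                "contentUrl": "https://labs.waterdata.usgs.gov/sta/v1.1/Datastreams('0adb31f7852e4e1c9a778a85076ac0cf')?$expand=Thing,Observations",
                "encodingFormat": ["application/json"],
            },
        ],
    },
}

# Abridged dataset-oriented document: a single distribution object.
dataset_doc = {
    "@type": "Dataset",
    "@id": "https://doi.org/10.4211/hs.4cf2a4298eca418f980201c1c5505299",
    "distribution": {
        "@type": "DataDownload",
        "name": "HydroShare file URL",
        "contentUrl": "https://www.hydroshare.org/resource/4cf2a4298eca418f980201c1c5505299/data/contents/demand_over_time.csv",
        "encodingFormat": ["text/csv"],
    },
}

for row in list_distributions(location) + list_distributions(dataset_doc):
    print(row["name"], row["encodingFormat"], row["contentUrl"])
```

This plain-dictionary approach assumes compacted JSON-LD shaped like the examples here; documents using other contexts or expanded form would need a JSON-LD processor instead.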
### Dataset-oriented {#sec-data-complete-example}
``` json
{
"@context": {
"@vocab": "https://schema.org/",
"xsd": "https://www.w3.org/TR/xmlschema-2/#",
"rdfs": "http://www.w3.org/2000/01/rdf-schema#",
"dc": "http://purl.org/dc/terms/",
"dcat": "https://www.w3.org/ns/dcat#",
"freq": "http://purl.org/cld/freq/",
"qudt": "http://qudt.org/schema/qudt/",
"qudt-units": "http://qudt.org/vocab/unit/",
"qudt-quantkinds": "http://qudt.org/vocab/quantitykind/",
"gsp": "http://www.opengis.net/ont/geosparql#",
"locType": "http://vocabulary.odm2.org/sitetype/",
"odm2var":"http://vocabulary.odm2.org/variablename/",
"odm2varType": "http://vocabulary.odm2.org/variabletype/",
"hyf": "https://www.opengis.net/def/schema/hy_features/hyf/",
"skos": "https://www.opengis.net/def/schema/hy_features/hyf/HY_HydroLocationType",
"ssn": "http://www.w3.org/ns/ssn/",
"ssn-system": "http://www.w3.org/ns/ssn/systems/"
},
"@type": "Dataset",
"@id": "https://doi.org/10.4211/hs.4cf2a4298eca418f980201c1c5505299",
"url": "https://doi.org/10.4211/hs.4cf2a4298eca418f980201c1c5505299",
"identifier": "doi:10.4211/hs.4cf2a4298eca418f980201c1c5505299",
"name": "Geoconnex Dataset Example: Data for Triangle Water Supply Dashboard",
"description": "This is a dataset meant as an example of dataset-level schema.org markup for https://geoconnex.us. It uses as an example data from the NC Triangle Water Supply Dashboard, which collects and visualizes finished water deliveries by water utilities to their service areas in the North Carolina Research Triangle area",
"url": "https://www.hydroshare.org/resource/4cf2a4298eca418f980201c1c5505299",
"provider": {
"url": "https://hydroshare.org",
"@type": "ResearchOrganization",
"name": "HydroShare"
},
"creator": {
"@type": "Person",
"affiliation": {
"@type": "Organization",
"name": "Internet of Water;Center for Geospatial Solutions;Lincoln Institute of Land Policy"
},
"email": "konda@lincolninst.edu",
"name": "Kyle Onda",
"url": "https://www.hydroshare.org/user/4850/"
},
"keywords": [
"water demand",
"water supply",
"geoconnex"
],
"license": "https://creativecommons.org/licenses/by/4.0/",
"isAccessibleForFree": true,
"distribution": {
"@type": "DataDownload",
"name": "HydroShare file URL",
"contentUrl": "https://www.hydroshare.org/resource/4cf2a4298eca418f980201c1c5505299/data/contents/demand_over_time.csv",
"encodingFormat": [
"text/csv"
],
"dc:conformsTo": "https://www.hydroshare.org/resource/4cf2a4298eca418f980201c1c5505299/data/contents/dataDictionary.xlsx"
},
"variableMeasured": [
{