In October 2024, GBIF launched a pilot phase of the Metabarcoding Data Programme, intended to improve how GBIF integrates DNA metabarcoding data on biodiversity.
The pilot programme is open to GBIF nodes that wish to administer an MDT (Metabarcoding Data Toolkit). Nodes can apply through the application form.
The GBIF Secretariat will provide each participating node with a hosted installation of the MDT, and will maintain and update it as new features and versions are introduced. Installations can be configured to operate in one of two modes. The choice of mode depends on the features required and on any restrictions on where data may be hosted.
Publishing mode: MDT admins (and users, if given permission) can register datasets for publication in GBIF under the organizations with which they are associated. In this mode, the MDT functions much like an installation of GBIF’s Integrated Publishing Toolkit (IPT) and serves as a publishing platform into GBIF.
Unique features
- Direct dataset publication to GBIF.org by admins (and by users when/if given permission).
- Easy correction, updating, re-mapping, re-annotation of sequences, etc., followed by publication of the update.
- Extra BIOM endpoints for improved interoperability and visualization.
Conversion-only mode: MDT admins and users can use the MDT to reshape their datasets into GBIF-ready Darwin Core Archive (DwC-A) files, but must download these for hosting and publication through another repository, such as an IPT. This mode is offered because data sovereignty requirements may oblige some nodes to host datasets on servers within their national boundaries, so it may be most appropriate where nodes or data holders have such concerns. However, this "download-and-publish-elsewhere" procedure disables most of the user-friendly features of the MDT (see table below).
Unique features
- The final DwC-A cannot be published directly from the MDT; it must be downloaded and hosted elsewhere.
Feature | Publishing | Conversion-only | Comments |
---|---|---|---|
Admins can publish directly to GBIF.org | ✅ | ❌ | |
Users can be given rights to publish to GBIF.org | ✅ * | ❌ | *) Admins can opt to withhold publishing rights from users and manage all publication themselves |
Datasets can be updated from the MDT | ✅ | ❌ * | *) In a conversion-only MDT, the original input files can be updated, uploaded again, processed, and downloaded as an updated DwC-A to replace the previous endpoint wherever it is hosted |
Dataset on GBIF.org contains the extra endpoint type BIOM_2_1 | ✅ | ❌ | The BIOM formats ensure better reuse and retain future visualization possibilities |
Dataset on GBIF.org contains the extra endpoint type BIOM_1_0 | ✅ | ❌ | The BIOM formats ensure better reuse and retain future visualization possibilities |
Unmappable fields are retained for potential future re-mapping | ✅ | ❌ | The BIOM format holds all uploaded data fields (including unmapped fields), so the mapping can be extended, e.g. when new DwC terms become available |
Dataset has to be downloaded and published elsewhere | ❌ | ✅ | |
Dataset can be downloaded and published elsewhere | ✅ * | ✅ * | *) Difficult to update/correct datasets. No BIOM endpoints |
Dataset (endpoint) hosted by GBIF | ✅ | ❌ * | *) The entire dataset will remain in the hosted MDT unless deleted from there |
Both MDT administrators and users can start using a hosted MDT immediately. NB: Administrators typically do not need to take any action until they receive an email from a user of the MDT. This process (automatic email generation) is explained in detail from the user’s perspective in the Publishing Step of the [detailed_guidance].
Basic workflow with a new user
- A new user (data holder) logs into the MDT with their GBIF user account.
- The user uploads a dataset, processes it, and optionally publishes it to the test environment (steps 1 to 6). In step 7 (Publish), the workflow differs between MDTs in Publishing mode and MDTs in Conversion-only mode:
Publishing mode: In step 7 (Publish), the user selects one of two options, both of which trigger an automated email to the MDT administrator:
- "Ask for access to publish under this institution/organisation": used when the user can find their institution in the drop-down menu (NB: the drop-down menu is populated from the GBIF Registry).
OR
- "Ask for help with registering your institution/organisation": used when the user cannot find their institution in the drop-down menu.
The MDT administrator then receives an email from the user with the information needed to either:
- Associate the user with the publishing institution/organization in the MDT (see section: [man_org]). The email contains a link that leads directly to the relevant dialogue box for the administrator.
OR
- Identify an existing publisher, or start the process of endorsing a new institution/organization as a GBIF publisher. Once this is done, associate the user with this publisher in the MDT (see section: [man_org]).
The email also contains a link to the dataset in the MDT and, if the dataset was published to the test environment (UAT), a link to the dataset there.
Note: Returning users can upload and publish datasets under any of the publishing institutions/organizations they have been associated with in the MDT.
Conversion-only mode: In step 7 (Publish), the user clicks the link "Ready to publish a dataset? Reach out to the administrator for assistance". This opens a pre-formulated email to the MDT administrator requesting help.
On receiving the email, the MDT administrator can start the process of ensuring that the user (and/or dataset) is associated with an endorsed GBIF publisher and that the dataset is suitable for publication, and can help publish the prepared DwC Archive through another procedure (see: Publish through IPT or elsewhere).
Only MDT administrators have access to the Administration section. The Administration section has two tabs:
All Datasets (tab)
Manage Organizations (tab)
Here the administrator can add organizations that are already registered in GBIF to the MDT. Organizations can also be deleted from the MDT (this does not affect the GBIF Registry). Users (GBIF username) can be associated with an organization by pressing the plus (+) sign under Users. This enables those users to publish datasets with that organization as publisher.
Administrators of MDTs in conversion-only mode need to download each [dwc-a] from the MDT and publish it through some other channel.
The Integrated Publishing Toolkit (commonly referred to as the IPT) is free, open-source software developed by GBIF and used by organizations around the world to create and manage repositories for sharing biodiversity datasets. If you have access to an IPT and know how to use it, you can download the [dwc-a] produced by the MDT at the Export step (step 6) and publish it through the IPT.
Note: By downloading a dataset from the MDT and publishing it elsewhere, you lose the ability to easily update, re-process and visualize the dataset in the MDT.
The MDT produces a fully publishable [dwc-a] that needs no changes or additions, and the archive can be validated with the GBIF data validator.
IPT users/administrators may run into problems if they use an older version of the IPT and/or if the DNA-derived data extension has not been installed. There is also a known issue that requires the values of the license fields to be set manually.
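Because older IPTs may lack the DNA-derived data extension, it can help to check which core and extensions an MDT archive declares before importing it. The following is a minimal Python sketch, assuming the archive has a meta.xml at its root as described in the Darwin Core text guide, and assuming the DNA-derived data extension uses the rowType http://rs.gbif.org/terms/1.0/DNADerivedData. The function names are hypothetical; the license issue above is not checked.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Namespace of meta.xml elements in the Darwin Core text guide
DWC_TEXT_NS = "{http://rs.tdwg.org/dwc/text/}"
OCCURRENCE = "http://rs.tdwg.org/dwc/terms/Occurrence"
# Assumption: rowType of GBIF's DNA-derived data extension
DNA_DERIVED = "http://rs.gbif.org/terms/1.0/DNADerivedData"


def describe_archive(zip_source):
    """Return (core rowType, [extension rowTypes]) declared in meta.xml.

    zip_source can be a file path or a binary file-like object (hypothetical helper).
    """
    with zipfile.ZipFile(zip_source) as zf:
        meta = ET.fromstring(zf.read("meta.xml"))
    core = meta.find(f"{DWC_TEXT_NS}core")
    core_type = core.get("rowType") if core is not None else None
    extensions = [ext.get("rowType") for ext in meta.findall(f"{DWC_TEXT_NS}extension")]
    return core_type, extensions


def check_for_ipt(zip_source):
    """Return messages worth reviewing before importing the archive into an IPT."""
    core_type, extensions = describe_archive(zip_source)
    messages = []
    if core_type != OCCURRENCE:
        messages.append(f"core rowType is {core_type}, expected Occurrence")
    if DNA_DERIVED in extensions:
        messages.append("uses the DNA-derived data extension: check that it is installed in the IPT")
    return messages


# Example: a tiny in-memory archive containing only a meta.xml
META = """<archive xmlns="http://rs.tdwg.org/dwc/text/">
  <core rowType="http://rs.tdwg.org/dwc/terms/Occurrence"/>
  <extension rowType="http://rs.gbif.org/terms/1.0/DNADerivedData"/>
</archive>"""
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("meta.xml", META)
buf.seek(0)
messages = check_for_ipt(buf)
```

For a real export, pass the path of the downloaded archive.zip instead of the in-memory buffer.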
Publishing an archive from the MDT via IPT
- Download the DwC-A (archive.zip) from the MDT.
- Log in to the IPT.
- Press Manage Resources.
- Press Create new.
- Give your dataset a Shortname.
- Select Occurrence under Type.
- Choose Import from an archived resource and select the archive on your computer.
- Press Create.
- Validate and verify that the data look as expected.
- Publish the data as you normally would in the IPT.
A Darwin Core Archive produced with the MDT may also be placed elsewhere on the web, preferably in a stable repository (e.g. Zenodo, GitHub), and can then be indexed by GBIF. This requires someone to register the new resource with GBIF.
- Download the DwC-A (archive.zip) from the MDT.
- Put the archive in a stable repository so that it has a persistent URL, e.g. www/xxx/archive.zip.
- Register the dataset with the relevant publisher in the GBIF Registry (How is that done?).
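One possible route for the registration step is the GBIF Registry API. The Python sketch below only builds the two HTTP requests without sending them, and rests on several assumptions: the base URL https://api.gbif.org/v1, a POST to /dataset with the fields shown, a follow-up POST to /dataset/{key}/endpoint with endpoint type DWC_ARCHIVE pointing at the hosted archive.zip, HTTP Basic authentication with a GBIF account allowed to register datasets for the publisher, and an existing installation key. The helper names are hypothetical; check the required fields against the GBIF Registry API documentation (and try it against the test environment) before relying on it.

```python
import base64
import json
import urllib.request

# Assumption: base URL of the GBIF Registry API (switch to the test environment while trying this out)
API = "https://api.gbif.org/v1"


def _request(path, payload, username, password):
    """Build (but do not send) an authenticated JSON POST request."""
    token = base64.b64encode(f"{username}:{password}".encode()).decode()
    return urllib.request.Request(
        url=f"{API}{path}",
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={"Content-Type": "application/json", "Authorization": f"Basic {token}"},
    )


def dataset_request(title, publisher_key, installation_key, username, password):
    """Step 1 (assumed): create the dataset record under an endorsed publisher."""
    payload = {
        "title": title,
        "type": "OCCURRENCE",
        "publishingOrganizationKey": publisher_key,
        "installationKey": installation_key,
    }
    return _request("/dataset", payload, username, password)


def endpoint_request(dataset_key, archive_url, username, password):
    """Step 2 (assumed): point the dataset at the hosted archive.zip so GBIF can index it."""
    payload = {"type": "DWC_ARCHIVE", "url": archive_url}
    return _request(f"/dataset/{dataset_key}/endpoint", payload, username, password)
```

Each request could then be sent with urllib.request.urlopen; the dataset key needed for the second request would come from the response to the first (an assumption about the response format).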