This project is actively under development. Reach out to bayden.willms@noaa.gov for questions.
Creating new data templates and updating the data dictionary accordingly are now automated, in addition to the term markdown file generation in `generate_term_docs.py`. The new script, `generate_template.py`, performs the following tasks:
- Get Latest MIMARKS Terms: Retrieve the latest terms and their comments from the latest MIMARKS package (Excel file).
- Get up-to-date AOML and Darwin Core Terms: Obtain up-to-date AOML and Darwin Core terms from the data dictionary (Google Sheet).
- Compare and Create New Data Template: Compare terms from those files and an existing data template to create a new data template (Google Sheet) for a new environment.
- Update Data Dictionary: Add new terms to the data dictionary (the `study-data-template-dict` Google Sheet) and update existing terms as needed.
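The comparison step listed above can be pictured with plain sets. This is only a minimal sketch, not the logic in `generate_template.py`: the function name `plan_template_terms`, the returned keys, and the rule used (keep every MIMARKS term plus any existing template term that is still in the data dictionary) are all assumptions made for illustration.

```python
def plan_template_terms(mimarks_terms, dictionary_terms, existing_template_terms):
    """Hypothetical helper: work out which terms a new template would hold.

    Each argument is an iterable of term names (strings).
    """
    mimarks = set(mimarks_terms)
    dictionary = set(dictionary_terms)
    existing = set(existing_template_terms)

    # Assumed rule: keep all MIMARKS package terms, plus terms from the
    # copied template that the data dictionary still defines.
    template_terms = mimarks | (existing & dictionary)

    # MIMARKS terms the data dictionary does not list yet.
    new_to_dictionary = mimarks - dictionary

    # Terms in the copied template that neither source knows about.
    dropped = existing - mimarks - dictionary

    return {
        "template_terms": sorted(template_terms),
        "new_to_dictionary": sorted(new_to_dictionary),
        "dropped": sorted(dropped),
    }


if __name__ == "__main__":
    plan = plan_template_terms(
        mimarks_terms=["depth", "ph"],
        dictionary_terms=["depth", "eventID"],
        existing_template_terms=["eventID", "salinity"],
    )
    print(plan)
```

The term names in the usage example are placeholders; the real script reads them from the MIMARKS Excel file and the Google Sheets.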
Additionally, the `test_auth.py` script verifies the Google Sheets API connection.
An AOML Omics Google Drive admin must grant permission for the Google Sheets and Drive API.
- Go to the Google Cloud Console.
- Log in with the AOML Omics account.
- The project name is `data-templates`.
- Go to APIs & Services > Credentials.
- Click on "Create Credentials" and select "Service Account".
- Fill in the service account details (name, ID, description) and click "Create".
- In the next step, assign the "Editor" role to the service account to grant access to modify Google Sheets.
- Click "Continue".
- Under the service account you created, click "Add Key" and select "JSON".
- This will download a JSON file with the credentials to your computer.
- Share this JSON file securely with the user who needs to set up the environment.
- The service account has an email address (e.g., `service-account-name@data-templates.iam.gserviceaccount.com`).
- Share the Google Sheets with this email address to grant access.
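The address to share the Google Sheets with can also be read straight from the downloaded key file, since service account JSON keys carry a `client_email` field. The following is a small sketch using only the standard library; the function name `service_account_email` is hypothetical and not part of this project.

```python
import json
from pathlib import Path


def service_account_email(credentials_path):
    """Return the client_email stored in a service account JSON key file."""
    with Path(credentials_path).open(encoding="utf-8") as fh:
        key = json.load(fh)

    # Service account keys identify themselves with type == "service_account".
    if key.get("type") != "service_account":
        raise ValueError(f"{credentials_path} is not a service account key")

    return key["client_email"]
```

Calling `service_account_email("credentials.json")` on the downloaded file returns the address to paste into the Google Sheets sharing dialog.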
This project works on both macOS and Windows. Ensure you have Git and Anaconda installed on your computer:
- Clone the GitHub project to your local machine:
git clone https://github.com/NOAA-Omics/noaa-omics-templates.git
cd noaa-omics-templates
- Configure the Anaconda environment using the provided `environment.yml`:
conda env create -f environment.yml
conda activate data-templates-env
- The admin will provide you with your `credentials.json` file.
- Place the `credentials.json` file in your `script-dependencies` folder.
- You can use the `test_auth.py` script to test the API connection.
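Before running `test_auth.py`, it can help to confirm offline that the key file is where the scripts expect it. The sketch below is an assumption-laden pre-flight check, not the contents of `test_auth.py`: the function name `check_credentials` and the `REQUIRED_KEYS` tuple are hypothetical, and it does not contact the Google Sheets API.

```python
import json
from pathlib import Path

# Fields present in a service account JSON key (a subset, chosen for this sketch).
REQUIRED_KEYS = ("type", "client_email", "private_key")


def check_credentials(project_root, credentials_filename):
    """Return the path of the key file inside script-dependencies, or raise."""
    path = Path(project_root) / "script-dependencies" / credentials_filename
    if not path.is_file():
        raise FileNotFoundError(f"Expected credentials file at {path}")

    key = json.loads(path.read_text(encoding="utf-8"))
    missing = [name for name in REQUIRED_KEYS if name not in key]
    if missing:
        raise ValueError(f"{path} is missing fields: {', '.join(missing)}")

    return path
```

Run from the repository root, `check_credentials(".", "credentials.json")` either returns the resolved path or raises an error that says what is wrong.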
- Create a Copy of an Existing Data Template:
  - Go to the AOML Omics Google Drive.
  - Create a copy of an existing data template.
  - Rename the copy according to your new environment name.
  - Rename the sheet, for example, from `water_sample_data` to `sediment_sample_data`.
- Download MIMARKS File:
  - Download the latest MIMARKS package from NCBI Biosample Templates.
  - Move the downloaded Excel file to the `script-dependencies` folder in the project.
  - Copy and paste the filename into the `generate_template.py` script at the line:
    mimarks_filename = 'MIMARKS.survey.sediment.6.0.xlsx' # EDIT ME
- Configure Credentials:
  - Copy your `credentials.json` filename and paste it into the `generate_template.py` script at the line:
    credentials_filename = 'data-templates-c7159dc891a7.json' # EDIT ME
- Set Up Google Sheet IDs:
  - The Google Sheet ID is available in the URL of the Google Sheet, between `/d/` and `/edit`. For example, in the URL `https://docs.google.com/spreadsheets/d/abc123XYZ456/edit`, the Google Sheet ID is `abc123XYZ456`.
  - You will need two Google Sheet IDs:
    - One for `study-data-template-dict`.
    - One for your newly created data template (the one that is a copy of an existing template).
  - Then, copy and paste those IDs into the `generate_template.py` script at the lines:
    study_template_dict_sheet_id = "abc123XYZ456" # EDIT ME
    new_template_id = "abc123XYZ456" # EDIT ME
- Navigate to the Scripts Folder and Run the Script:
    cd scripts
    python generate_template.py
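The Google Sheet ID rule from the setup steps (the segment of the URL after `/d/`) is easy to get wrong when copying by hand, so a small helper can pull it out instead. This is a sketch under the assumption that the URL has the usual `docs.google.com/spreadsheets/d/<id>/...` shape; `extract_sheet_id` is a hypothetical name and is not defined in `generate_template.py`.

```python
import re

# The ID is the run of letters, digits, hyphens, and underscores after /d/.
_SHEET_ID_PATTERN = re.compile(r"/spreadsheets/d/([A-Za-z0-9_-]+)")


def extract_sheet_id(url):
    """Return the Google Sheet ID embedded in a spreadsheet URL."""
    match = _SHEET_ID_PATTERN.search(url)
    if match is None:
        raise ValueError(f"No Google Sheet ID found in: {url}")
    return match.group(1)


if __name__ == "__main__":
    url = "https://docs.google.com/spreadsheets/d/abc123XYZ456/edit"
    print(extract_sheet_id(url))  # prints abc123XYZ456
```

The returned string is what goes into `study_template_dict_sheet_id` and `new_template_id`.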