294 create branch with content of PRs #285 and #289 for testing value sets/enumerations/subsets integration with EnvO #295

@turbomam turbomam commented Jan 22, 2025

better input/output locations
…iated-ones-that-are-ready-then-build-and-release' into 294-create-branch-with-content-of-prs-285-and-289-for-testing-value-setsenumerationssubsets-integration-with-envo
@turbomam turbomam changed the title 294 create branch with content of prs 285 and 289 for testing value setsenumerationssubsets integration with envo 294 create branch with content of PRs #285 and #289 for testing value sets/enumerations/subsets integration with EnvO Jan 22, 2025

turbomam commented Jan 22, 2025

  • build submission-schema: make squeaky-clean all test
  • build EnvO: cd src/envo ; ./run.sh make -B modules/nmdc_env_context_subset_membership.owl all
  • copy the new src/envo/envo.owl into submission-schema's notebooks/environmental_context_value_sets/
  • force regeneration of the subsets ROBOT template in submission-schema: make -B notebooks/environmental_context_value_sets/nmdc_env_context_subset_membership.tsv
    • oops, -B forces regeneration of the whole project, since nmdc_env_context_subset_membership.tsv depends on src/nmdc_submission_schema/schema/nmdc_submission_schema.yaml. I should have just deleted the old nmdc_env_context_subset_membership.tsv and then made it without -B
  • cut -f3 notebooks/environmental_context_value_sets/nmdc_env_context_subset_membership.tsv | sort | uniq -c
      1 AI http://www.geneontology.org/formats/oboInOwl#inSubset
     54 ENVO:03605013
     92 ENVO:03605014
    102 ENVO:03605015
      1 subset

Still only getting subset assignments to the three Soil value sets: ENVO:03605013, ENVO:03605014 and ENVO:03605015
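For anyone who would rather do the per-subset tally without cut/sort/uniq, here is a minimal Python sketch of the same count. The helper name `count_column` and the sample rows are made up for illustration; only the third column's header, template string, and subset CURIEs mirror what the real file shows above.

```python
import csv
import io
from collections import Counter


def count_column(tsv_text: str, column_index: int = 2) -> Counter:
    """Count the values found in one zero-based column of tab-separated text."""
    # QUOTE_NONE makes the reader split strictly on tabs, like `cut` does.
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE)
    return Counter(row[column_index] for row in reader if len(row) > column_index)


# Placeholder rows; the EX: identifiers and labels are not real data.
sample = (
    "id\tlabel\tsubset\n"
    "ID\t\tAI http://www.geneontology.org/formats/oboInOwl#inSubset\n"
    "EX:0000001\texample a\tENVO:03605013\n"
    "EX:0000002\texample b\tENVO:03605013\n"
    "EX:0000003\texample c\tENVO:03605014\n"
)

counts = count_column(sample)
for value, n in sorted(counts.items()):
    print(f"{n:7d} {value}")
```

On the real file, the text would come from reading notebooks/environmental_context_value_sets/nmdc_env_context_subset_membership.tsv instead of the inline sample.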

Check NMDC subset annotations by querying envo.owl with SPARQL

one way to do that: docker run -p 127.0.0.1:7200:7200 --name graphdb-instance-name -t ontotext/graphdb:10.8.2

create a new graphdb repo with no reasoning and choose it/make it default

import/upload notebooks/environmental_context_value_sets/envo.owl

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX ENVO: <http://purl.obolibrary.org/obo/ENVO_>
select * where {
    ?s rdfs:subPropertyOf* ENVO:03605010 .
    optional {
        ?s rdfs:label ?l 
    }
    optional {
        ?s rdfs:seeAlso ?a .
        filter(strstarts(str(?a), "https://microbiomedata.github.io/submission-schema/"))
    }
}
order by ?a ?l

?s ?l ?a
obo:ENVO_03605022 "NMDC PlantAssociated broad scale value set"@en  
obo:ENVO_03605024 "NMDC PlantAssociated environmental medium value set"@en  
obo:ENVO_03605023 "NMDC PlantAssociated local scale value set"@en  
obo:ENVO_03605016 "NMDC PlantAssociated value sets"@en  
obo:ENVO_03605011 "NMDC Soil value sets"@en  
obo:ENVO_03605012 "NMDC Water value sets"@en  
obo:ENVO_03605010 "NMDC environmental context value sets"@en  
obo:ENVO_03605013 "NMDC Soil broad scale value set"@en https://microbiomedata.github.io/submission-schema/EnvBroadScaleSoilEnum/
obo:ENVO_03605017 "NMDC Water broad scale value set"@en https://microbiomedata.github.io/submission-schema/EnvBroadScaleWaterEnum/
obo:ENVO_03605014 "NMDC Soil local scale value set"@en https://microbiomedata.github.io/submission-schema/EnvLocalScaleSoilEnum/
obo:ENVO_03605018 "NMDC Water local scale value set"@en https://microbiomedata.github.io/submission-schema/EnvLocalScaleWaterEnum/
obo:ENVO_03605015 "NMDC Soil environmental medium value set"@en https://microbiomedata.github.io/submission-schema/EnvMediumSoilEnum/
obo:ENVO_03605019 "NMDC Water environmental medium value set"@en https://microbiomedata.github.io/submission-schema/EnvMediumWaterEnum/

yq '.enums.[].name' src/nmdc_submission_schema/schema/nmdc_submission_schema.yaml | egrep '^Env[BLM]' | sort

EnvBroadScaleSoilEnum
EnvLocalScaleSoilEnum
EnvMediumSoilEnum
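
The comparison I'm doing by eye here (rdfs:seeAlso links in EnvO versus enums present in the schema at this point) can be sketched in a few lines of Python. The helper name `enum_name_from_doc_url` is hypothetical; the URLs and enum names are copied from the query results and yq output above.

```python
def enum_name_from_doc_url(url: str) -> str:
    """Return the last path segment of a documentation URL."""
    return url.rstrip("/").rsplit("/", 1)[-1]


# rdfs:seeAlso values returned by the SPARQL query above.
see_also_urls = [
    "https://microbiomedata.github.io/submission-schema/EnvBroadScaleSoilEnum/",
    "https://microbiomedata.github.io/submission-schema/EnvBroadScaleWaterEnum/",
    "https://microbiomedata.github.io/submission-schema/EnvLocalScaleSoilEnum/",
    "https://microbiomedata.github.io/submission-schema/EnvLocalScaleWaterEnum/",
    "https://microbiomedata.github.io/submission-schema/EnvMediumSoilEnum/",
    "https://microbiomedata.github.io/submission-schema/EnvMediumWaterEnum/",
]

# Enum names reported by the yq command above.
schema_enums = {"EnvBroadScaleSoilEnum", "EnvLocalScaleSoilEnum", "EnvMediumSoilEnum"}

envo_enums = {enum_name_from_doc_url(url) for url in see_also_urls}

# Enums that EnvO links to but that the schema does not define yet.
missing_from_schema = sorted(envo_enums - schema_enums)
print(missing_from_schema)
```

With the values above, the difference is the three Water enums, which lines up with only the Soil subsets showing up in the template.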


turbomam commented Jan 22, 2025

I need a better understanding of when the ingest-triad target is made in submission-schema

 make ingest-triad

Processing all matching TSV files in notebooks/environmental_context_value_sets...
Processing notebooks/environmental_context_value_sets/plant_associated/env_broad_scale/post_google_sheets_plant_associated_env_broad_scale.tsv...
Processing notebooks/environmental_context_value_sets/plant_associated/env_medium/post_google_sheets_plant_associated_env_medium.tsv...
Processing notebooks/environmental_context_value_sets/soil/env_local_scale/post_google_sheets_soil_env_local_scale.tsv...
Processing notebooks/environmental_context_value_sets/soil/env_broad_scale/post_google_sheets_soil_env_broad_scale.tsv...
Processing notebooks/environmental_context_value_sets/soil/env_medium/post_google_sheets_soil_env_medium.tsv...
Processing notebooks/environmental_context_value_sets/water/env_local_scale/post_google_sheets_water_env_local_scale.tsv...
Processing notebooks/environmental_context_value_sets/water/env_broad_scale/post_google_sheets_water_env_broad_scale.tsv...
Processing notebooks/environmental_context_value_sets/water/env_medium/post_google_sheets_water_env_medium.tsv...

yq '.enums.[].name' src/nmdc_submission_schema/schema/nmdc_submission_schema.yaml | egrep '^Env[BLM]' | sort

EnvBroadScalePlantAssociatedEnum
EnvBroadScaleSoilEnum
EnvBroadScaleWaterEnum
EnvLocalScaleSoilEnum
EnvLocalScaleWaterEnum
EnvMediumPlantAssociatedEnum
EnvMediumSoilEnum
EnvMediumWaterEnum


turbomam commented Jan 22, 2025

rm -rf notebooks/environmental_context_value_sets/nmdc_env_context_subset_membership.tsv ; \
    make notebooks/environmental_context_value_sets/nmdc_env_context_subset_membership.tsv
cut -f3 notebooks/environmental_context_value_sets/nmdc_env_context_subset_membership.tsv | \
    sort | uniq -c
      1 AI http://www.geneontology.org/formats/oboInOwl#inSubset
     52 ENVO:03605013
     83 ENVO:03605014
     85 ENVO:03605015
     56 ENVO:03605017
     86 ENVO:03605018
     96 ENVO:03605019
      1 subset
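
To see what changed between the first template and this regenerated one, the two tallies can be diffed. This is only a sketch: the function name `count_deltas` is made up, and the numbers are copied from the two uniq -c outputs in this thread.

```python
def count_deltas(before: dict, after: dict) -> dict:
    """Return after-minus-before for every key seen in either tally."""
    keys = sorted(set(before) | set(after))
    return {key: after.get(key, 0) - before.get(key, 0) for key in keys}


# Tally from the first run (Soil subsets only).
before = {"ENVO:03605013": 54, "ENVO:03605014": 92, "ENVO:03605015": 102}

# Tally after deleting the TSV and remaking it without -B.
after = {
    "ENVO:03605013": 52,
    "ENVO:03605014": 83,
    "ENVO:03605015": 85,
    "ENVO:03605017": 56,
    "ENVO:03605018": 86,
    "ENVO:03605019": 96,
}

deltas = count_deltas(before, after)
for subset, delta in deltas.items():
    print(f"{subset}\t{delta:+d}")
```

Besides the three Water subsets being new, the Soil counts are lower than in the first run (by 2, 9 and 17).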

@turbomam

implementation in submission-schema looks good now

In EnvO, I had mistakenly used rdfs:comment instead of rdfs:seeAlso for the links from the PlantAssociated value sets to the submission-schema documentation pages. That is now edited, but not built or pushed yet.


turbomam commented Jan 22, 2025

submission-schema:

  • copy latest src/envo/envo.owl from EnvO into submission-schema's notebooks/environmental_context_value_sets/
  • execute any
make squeaky-clean all test ingest-triad env-triad-robot-all

src/nmdc_submission_schema/scripts/create_env_context_robot_template.py reports the classes in the submission-schema enumerations that are not defined in EnvO, and therefore shouldn't be added to the ROBOT template:

[ERROR] Missing label for PO:0004518
[ERROR] Missing label for PO:0025356
[ERROR] Missing label for PO:0025355
[ERROR] Missing label for PO:0025623
[ERROR] Missing label for PO:0020103
[ERROR] Missing label for PO:0009046
[ERROR] Missing label for PO:0009001
[ERROR] Missing label for PO:0025034
[ERROR] Missing label for PO:0020038
[ERROR] Missing label for PO:0006001
[ERROR] Missing label for PO:0006109
[ERROR] Missing label for PO:0005052
[ERROR] Missing label for PO:0025626
[ERROR] Missing label for PO:0025281
[ERROR] Missing label for PO:0020031
[ERROR] Missing label for PO:0030078
[ERROR] Missing label for PO:0004542
[ERROR] Missing label for PO:0009005
[ERROR] Missing label for PO:0003023
[ERROR] Missing label for PO:0004513
[ERROR] Missing label for PO:0005848
[ERROR] Missing label for PO:0009010
[ERROR] Missing label for PO:0008037
[ERROR] Missing label for PO:0009047
[ERROR] Missing label for PO:0025522
[ERROR] Missing label for PO:0025417
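
The kind of filtering described above could look roughly like the sketch below. This is not the code in create_env_context_robot_template.py; the function name `split_by_known_label`, the label lookup, and the EX: identifier are assumptions made up for illustration. Only the message format and PO:0004518 come from the log.

```python
def split_by_known_label(curies, labels):
    """Separate CURIEs that have a label in the lookup from those that do not."""
    kept, missing = [], []
    for curie in curies:
        if labels.get(curie):
            kept.append(curie)
        else:
            missing.append(curie)
    return kept, missing


# Hypothetical lookup, standing in for the labels found in envo.owl.
labels = {"EX:0000001": "placeholder label"}

kept, missing = split_by_known_label(["EX:0000001", "PO:0004518"], labels)

# Report the skipped classes in the same format as the log above.
for curie in missing:
    print(f"[ERROR] Missing label for {curie}")
```

Only the `kept` list would go on to become rows in the ROBOT template.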

cut -f3 notebooks/environmental_context_value_sets/nmdc_env_context_subset_membership.tsv | \
    sort | uniq -c
      1 AI http://www.geneontology.org/formats/oboInOwl#inSubset
     52 ENVO:03605013
     83 ENVO:03605014
     85 ENVO:03605015
     56 ENVO:03605017
     86 ENVO:03605018
     96 ENVO:03605019
     72 ENVO:03605022
      2 ENVO:03605024
      1 subset
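
The single "subset" and "AI http://www.geneontology.org/formats/oboInOwl#inSubset" rows in that tally are the template's header row and ROBOT template-string row. Below is a minimal sketch of writing a file with that shape. The first two columns ("id" and "label"), the function name `build_template`, and the EX: rows are assumptions; only the third column is evidenced by the cut -f3 output.

```python
IN_SUBSET = "http://www.geneontology.org/formats/oboInOwl#inSubset"


def build_template(memberships):
    """Build ROBOT-template-style TSV text from (class, label, subset) triples."""
    rows = [
        # Row 1: human-readable headers (the first two names are guesses).
        ("id", "label", "subset"),
        # Row 2: template strings. "ID" marks the identifier column and
        # "AI <property>" an IRI-valued annotation. The empty string is
        # assumed to make ROBOT skip the label column.
        ("ID", "", f"AI {IN_SUBSET}"),
    ]
    rows.extend(memberships)
    return "".join("\t".join(row) + "\n" for row in rows)


# Placeholder memberships; the EX: identifiers are not real classes.
template_text = build_template(
    [
        ("EX:0000001", "example a", "ENVO:03605022"),
        ("EX:0000002", "example b", "ENVO:03605024"),
    ]
)
print(template_text, end="")
```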


turbomam commented Jan 22, 2025

EnvO:

  • copy submission-schema's nmdc_env_context_subset_membership.tsv into modules/
cd src/envo
./run.sh make -B nmdc-robot-all all

query that with something like

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX ENVO: <http://purl.obolibrary.org/obo/ENVO_>
PREFIX oio: <http://www.geneontology.org/formats/oboInOwl#>
select ?nmdc_subset ?nl (count(?envo_class) as ?envo_class_count)
where {
    ?envo_class oio:inSubset ?nmdc_subset .
    optional {
        ?envo_class rdfs:label ?el 
    }
    ?nmdc_subset rdfs:subPropertyOf* ENVO:03605010 .
    optional {
        ?nmdc_subset rdfs:label ?nl 
    }
}
group by ?nmdc_subset ?nl
order by ?nmdc_subset ?nl

?nmdc_subset ?nl ?envo_class_count
obo:ENVO_03605013 "NMDC Soil broad scale value set"@en "52"^^xsd:integer
obo:ENVO_03605014 "NMDC Soil local scale value set"@en "83"^^xsd:integer
obo:ENVO_03605015 "NMDC Soil environmental medium value set"@en "85"^^xsd:integer
obo:ENVO_03605017 "NMDC Water broad scale value set"@en "56"^^xsd:integer
obo:ENVO_03605018 "NMDC Water local scale value set"@en "86"^^xsd:integer
obo:ENVO_03605019 "NMDC Water environmental medium value set"@en "96"^^xsd:integer
obo:ENVO_03605022 "NMDC PlantAssociated broad scale value set"@en "72"^^xsd:integer
obo:ENVO_03605024 "NMDC PlantAssociated environmental medium value set"@en "2"^^xsd:integer
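
As a sanity check, the SPARQL counts can be compared against the template tally from submission-schema. This is a sketch that assumes result rows are pasted as text in the layout shown above; the regular expression and the function name `parse_result_row` are made up for this example.

```python
import re

# Matches rows like: obo:ENVO_03605013 "some label"@en "52"^^xsd:integer
ROW = re.compile(r'^obo:([A-Za-z]+)_(\d+)\s+"(.*)"@en\s+"(\d+)"\^\^xsd:integer$')


def parse_result_row(line: str):
    """Turn one pasted result row into (CURIE, label, count)."""
    match = ROW.match(line.strip())
    if match is None:
        raise ValueError(f"unrecognized row: {line!r}")
    prefix, local_id, label, count = match.groups()
    return f"{prefix}:{local_id}", label, int(count)


sparql_rows = """\
obo:ENVO_03605013 "NMDC Soil broad scale value set"@en "52"^^xsd:integer
obo:ENVO_03605014 "NMDC Soil local scale value set"@en "83"^^xsd:integer
obo:ENVO_03605015 "NMDC Soil environmental medium value set"@en "85"^^xsd:integer
obo:ENVO_03605017 "NMDC Water broad scale value set"@en "56"^^xsd:integer
obo:ENVO_03605018 "NMDC Water local scale value set"@en "86"^^xsd:integer
obo:ENVO_03605019 "NMDC Water environmental medium value set"@en "96"^^xsd:integer
obo:ENVO_03605022 "NMDC PlantAssociated broad scale value set"@en "72"^^xsd:integer
obo:ENVO_03605024 "NMDC PlantAssociated environmental medium value set"@en "2"^^xsd:integer
"""

# Counts from the uniq -c output on the template, minus the two header rows.
tsv_counts = {
    "ENVO:03605013": 52, "ENVO:03605014": 83, "ENVO:03605015": 85,
    "ENVO:03605017": 56, "ENVO:03605018": 86, "ENVO:03605019": 96,
    "ENVO:03605022": 72, "ENVO:03605024": 2,
}

sparql_counts = {}
for line in sparql_rows.splitlines():
    curie, _label, count = parse_result_row(line)
    sparql_counts[curie] = count

print("counts match" if sparql_counts == tsv_counts else "counts differ")
```

With the numbers in this thread, the two tallies agree for all eight subsets.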

…nto 294-create-branch-with-content-of-prs-285-and-289-for-testing-value-setsenumerationssubsets-integration-with-envo
…iated-ones-that-are-ready-then-build-and-release' into 294-create-branch-with-content-of-prs-285-and-289-for-testing-value-setsenumerationssubsets-integration-with-envo
@turbomam turbomam closed this Jan 22, 2025
@turbomam turbomam deleted the 294-create-branch-with-content-of-prs-285-and-289-for-testing-value-setsenumerationssubsets-integration-with-envo branch January 22, 2025 21:51