Identifies non-standard concepts used in concept set expressions, and compares the captured source codes and the domain changes among included concepts between two vocabulary versions.
- The vocabulary version (OHDSI Standardized Vocabularies) the cohorts were initially created on
- The vocabulary version you are going to migrate to
- The achilles_count_cc table (in resultSchema). This table is generated on top of Achilles results; see how to generate it here: https://github.com/OHDSI/WebAPI/blob/master/src/main/resources/ddl/achilles/achilles_result_concept_count.sql
2. An active Atlas instance that contains the cohorts (you don't need to run them; just create or import the cohorts in Atlas)
#install package
remotes::install_github("OHDSI/PhenotypeChangesInVocabUpdate")
library(dplyr)
library(openxlsx)
library(readr)
library(tibble)
library(PhenotypeChangesInVocabUpdate)
# set the base URL of your Atlas instance (replace the placeholder below with your own URL)
baseUrl <- "https://yourSecureAtlas.ohdsi.org/"
# if security is enabled, authorize use of the WebAPI
ROhdsiWebApi::authorizeWebApi(
  baseUrl = baseUrl,
  authMethod = "windows")
# specify the cohorts you want to run the comparison for
# you can define the cohorts as a vector:
cohorts <- c(1, 2, 3)
# specify excluded nodes ("nodes" are the concepts used directly in a concept set expression)
# excludedNodes is a text string listing the nodes you want to exclude from the analysis; it is set to 0 by default
# for example, some CPT4 and HCPCS codes are now mapped to Visit concepts, and we didn't implement this in the ETL,
# so we don't want them in the analysis (note: the tool doesn't look at the actual CDM, only at the mappings in the vocabulary, predicting how the ETL will be done)
# in that case, excludedNodes is defined like this:
excludedNodes <- "9202, 2514435,9203,2514436,2514437,2514434,2514433,9201" # visit concepts
# you can restrict the output to specific source vocabularies (only those that exist in your data as source concepts and thus play a role in event capture); if the variable isn't defined, all vocabularies are included in the analysis
# for example:
includedSourceVocabs <- "'ICD10', 'ICD10CM', 'CPT4', 'HCPCS', 'NDC', 'ICD9CM', 'ICD9Proc', 'ICD10PCS', 'ICDO3', 'JMDC'"
# set connectionDetails
# you can use keyring to store your credentials;
# see how to configure keyring: https://github.com/OHDSI/PhenotypeChangesInVocabUpdate/blob/modify_output/extras/KeyringSetup.R
# you can also define connectionDetails directly; see the DatabaseConnector documentation: https://ohdsi.github.io/DatabaseConnector/
connectionDetails <- DatabaseConnector::createConnectionDetails(
  dbms = keyring::key_get("YourDatabase", "dbms"),
  connectionString = keyring::key_get("YourDatabase", "connectionString"),
  user = keyring::key_get("YourDatabase", "username"),
  password = keyring::key_get("YourDatabase", "password")
)
# specify working schemas
newVocabSchema <- 'vocab_schema_n1' # schema containing the new vocabulary version
oldVocabSchema <- 'vocab_schema_n0' # schema containing the older vocabulary version
resultSchema <- 'achilles_results' # schema containing Achilles results
# create the data frame with concept set expressions using the getNodeConcepts function
Concepts_in_cohortSet <- getNodeConcepts(cohorts, baseUrl)
# resolve the concept sets, compare the outputs on the two vocabulary versions, and write the results to the Excel file "PhenChange.xlsx", saved in the session root folder
# note: excludedNodes must be passed the excludedNodes variable defined above
resultToExcel(connectionDetails = connectionDetails,
              Concepts_in_cohortSet = Concepts_in_cohortSet,
              newVocabSchema = newVocabSchema,
              oldVocabSchema = oldVocabSchema,
              resultSchema = resultSchema,
              excludedNodes = excludedNodes,
              includedSourceVocabs = includedSourceVocabs
)
#open the excel file
#Windows
shell.exec("PhenChange.xlsx")
#MacOS
#system(paste("open", "PhenChange.xlsx"))
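If you would rather inspect the results in R than in a spreadsheet application, the workbook can be read back with openxlsx. The following is only a minimal sketch: the helper name `readAllTabs` is hypothetical and not part of PhenotypeChangesInVocabUpdate, and it assumes the file sits in the current working directory. Tab names are discovered from the file rather than assumed.

```r
library(openxlsx)

# Hypothetical helper (not part of PhenotypeChangesInVocabUpdate):
# reads every tab of a workbook into a named list of data frames.
readAllTabs <- function(path = "PhenChange.xlsx") {
  if (!file.exists(path)) {
    stop("File not found: ", path)
  }
  # discover the tab names instead of hard-coding them
  tabs <- getSheetNames(path)
  # read each tab and keep the tab names as list names
  setNames(lapply(tabs, function(tab) read.xlsx(path, sheet = tab)), tabs)
}
```

Calling `results <- readAllTabs()` after `resultToExcel()` has finished should give one list element per comparison tab, which can then be filtered or summarised with dplyr.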
Writes an Excel file with a separate tab for each type of comparison.
"Node concept": a concept used directly in a concept set expression.
"drc": descendant record count, the total number of occurrences of the descendants of a given concept.
"source concept": the related source concept_id. Concept sets are usually defined with standard concepts, but the same standard concepts may capture different clinical events if a mapping has changed; that is why the tool tracks the related source concepts.
"Action": flags whether a concept or hierarchy branch was added or removed.
This tab lists non-standard concepts used in the concept set definition.
Note that the concept set definition JSON isn't updated when the vocabulary is updated, so you will not see concept changes in Atlas.
You therefore need to run this tool to see whether any concepts have become non-standard.
- For example, cohort_id 10729 has the concept set "Malignancies that spread to liver", which uses the node concept "4324190|History of malignant neoplasm of breast" with descendants included.
This concept is non-standard and is mapped as follows:
Maps to "1340204|History of event"
Maps to value "4112853|Malignant tumor of breast".
In this situation you get the output below, which shows the target concepts you need to use to capture the same clinical events with the new vocabulary version.
cohortid | 10729 |
cohortname | Malignant neoplasms |
conceptsetname | Malignancies that spread to liver |
conceptsetid | 15 |
isexcluded | 0 |
includedescendants | 1 |
nodeConceptId | 4324190 |
nodeConceptName | History of malignant neoplasm of breast |
drc | 20284048 |
mapsToConceptId | 1340204 |
mapsToConceptName | History of event |
mapsToValueConceptId | 4112853 |
mapsToValueConceptName | Malignant tumor of breast |
This tab shows related source concepts that were added or removed. The mapping in both vocabulary versions is shown.
This way the user can see why the related source concepts differ and can modify the concept set expression by adding or removing mapped concepts.
- In the example below, events with the ICD9CM concept "Neural hearing loss, unilateral" are now captured because of the mapping change. The OLD_MAPPED_CONCEPT "Unilateral neural hearing loss" didn't have the proper hierarchy, so the events weren't captured.
COHORTID | 12822 |
COHORTNAME | Nerve disorders |
CONCEPTSETNAME | Cranial nerve disorder |
CONCEPTSETID | 28 |
SOURCE_CONCEPT_ID | 44823107 |
RECORD_COUNT | 7115 |
ACTION | Added |
SOURCE_CONCEPT_NAME | Neural hearing loss, unilateral |
SOURCE_VOCABULARY_ID | ICD9CM |
SOURCE_CONCEPT_CODE | 389.13 |
OLD_MAPPED_CONCEPT_ID | 379831 |
OLD_MAPPED_CONCEPT_NAME | Unilateral neural hearing loss |
OLD_MAPPED_VOCABULARY_ID | SNOMED |
OLD_MAPPED_CONCEPT_CODE | 425601005 |
NEW_MAPPED_CONCEPT_ID | 381312 |
NEW_MAPPED_CONCEPT_NAME | Neural hearing loss |
NEW_MAPPED_VOCABULARY_ID | SNOMED |
NEW_MAPPED_CONCEPT_CODE | 73371001 |
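To judge how much a mapping change matters, it can help to total the record counts of added and removed source concepts. The sketch below is illustrative only: it rebuilds the single example row shown above as a data frame (a real tab would contain many rows), and it assumes the column names match those in the example.

```r
library(dplyr)

# Single example row copied from the table above; a real tab has many rows.
sourceChanges <- data.frame(
  SOURCE_CONCEPT_ID = 44823107,
  SOURCE_VOCABULARY_ID = "ICD9CM",
  RECORD_COUNT = 7115,
  ACTION = "Added"
)

# Total affected records per action and source vocabulary, largest first.
impact <- sourceChanges %>%
  group_by(ACTION, SOURCE_VOCABULARY_ID) %>%
  summarise(totalRecords = sum(RECORD_COUNT), .groups = "drop") %>%
  arrange(desc(totalRecords))

impact
```

Source concepts with large record counts are the ones most likely to change cohort membership, so they are a sensible place to start the review.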
This tab shows included concepts that changed their domain, so a different event table should be used. To show how these concepts are connected to actual events, source codes with their record counts are shown.
cohortid | 123 |
cohortname | Altered mental status |
conceptsetname | Altered mental status |
conceptsetid | 1 |
conceptId | 436222 |
conceptName | Altered mental status |
vocabularyId | SNOMED |
sourceconceptCode | R41.82 |
sourceconceptname | Altered mental status, unspecified |
sourceVocabularyId | ICD10CM |
oldDomainId | Condition |
newDomainId | Observation |
sourceConceptRecordCount | 88528142 |
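A domain change means the affected records land in a different OMOP CDM event table after the ETL is rerun. The lookup below is a minimal sketch covering only some common domains with their standard CDM table names; the names `domainToTable` and `eventTableForDomain` are hypothetical and not part of the package.

```r
# Illustrative lookup of OMOP CDM event tables by domain (common domains only).
domainToTable <- c(
  Condition   = "condition_occurrence",
  Drug        = "drug_exposure",
  Procedure   = "procedure_occurrence",
  Measurement = "measurement",
  Observation = "observation",
  Device      = "device_exposure"
)

# Hypothetical helper: returns the event table for each domain,
# or NA for a domain that is not in the lookup.
eventTableForDomain <- function(domainId) {
  unname(domainToTable[domainId])
}

# For the example above: records move from condition_occurrence to observation.
eventTableForDomain(c("Condition", "Observation"))
```

For the "Altered mental status" row, this indicates that a cohort entry event defined on the condition table would need to be redefined on the observation table to keep capturing the same records.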
After you have created updated versions of the cohorts in Atlas, run the "compareCohorts" function to check whether the new cohort on the new vocabulary resolves in the same way as the old cohort on the old vocabulary. It produces a result similar to "PhenChange.xlsx", but compares two cohorts on two vocabulary versions.