-
Notifications
You must be signed in to change notification settings - Fork 11
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Check if instantiated cohort conforms to cohort table form #36
Comments
@gowthamrao - would a function in CohortGenerator like this be acceptable:
The SQL would be something like: SELECT *
FROM @schema.@cohortTable
WHERE 1 = 0
; This way only the table structure is returned and not the full contents of the table. You could check it on any number of tables you are interested in via R. |
Thanks @anthonysena i think you bought up important validation of cohort table structure.
Those are important and they check the structure and data type of the cohort table. But I am most worried about the generated cohort. Is it conforming to cohort era rule as described here OHDSI/CohortAlgebra#2 So i am suggesting a third validation step to the generated cohort output: Now i dont know if CohortGenerator is appropriate for this function - because these are activities on a generated cohort. So may cohortAlgebra is the right place as it deals with generated cohorts, and does not generate cohorts itself. |
@gowthamrao - I think we're aligned on the cohort table structure part. I'm less clear on this part:
From my perspective, CohortGenerator is concerned with generating the cohort and putting the results into the cohort tables. Ensuring that the cohorts are validated is important but probably not a concern of the CohortGenerator package. I believe the Circe cohort definition editor in ATLAS has functionality to ensure a person does not have overlapping time in a cohort. So perhaps I'm missing when this could happen in a generated cohort? |
Well - remember, it is possible to write custom SQL and get a cohort table structure as output i.e. structure validation and type validation; but does the SQL ensure that required era'fy is present? If the cohort has any two rows for the same person that has overlapping cohort_dates - then it is NOT a cohort. CohortGenerator only requires a cohortDefinitionSet. As long as the cohortDefinitionSet has an SQL its fine with it. - correct? what if the output is not a cohort? |
You are correct - CohortGenerator only requires the SQL so this could pose a problem. Trying to validate that it produced a proper cohort could be tricky but if someone wanted to take a swing at implementing this type of check, I'd be happy to review. |
In OHDSI we define a cohort as a set of persons who satisfy certain criteria for duration of time (ie. cohort_start_date and cohort_end_date). A person cannot have overlapping dates within the same cohortId.
This is not checked, but is implicitly expected. If this is violated, it can cause headache.
How about adding a simple check. We can do it with a SQL like this
The text was updated successfully, but these errors were encountered: