We have implemented an approach to support cohort- and study period-level longitudinal multivariate analysis by allowing users to select year (i.e., study period) as an input feature in a multivariate request. This issue suggests that we now explore ways to support patient-level longitudinal multivariate analysis. The approach that I had originally conceived was to allow users to select PatientID (i.e., the dummy variable that links patients across years / study periods) as an input feature. This would allow users to retrieve a subset of the underlying deidentified integrated feature table. However, the approach is not computationally feasible, given the large patient sample sizes (e.g., roughly 160,000 total patients in the asthma cohort).
One approach might be to put a cap on the cohort size for which users are allowed to include PatientID as an input feature in a multivariate request. To implement this, we could (1) return an error when users include PatientID as an input feature in a multivariate request for a cohort that exceeds a maximum size (threshold TBD) and (2) update the documentation to reflect the limitation. While this approach seems relatively straightforward, it also seems rather arbitrary and statistically unsound.
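A minimal sketch of the check in step (1), to make the proposal concrete. Everything named here is an assumption for illustration: the function name, the error class, and especially the threshold value, which the proposal leaves TBD.

```python
# Hypothetical placeholder value: the actual cap is still TBD.
MAX_COHORT_SIZE_FOR_PATIENT_ID = 10_000


class CohortTooLargeError(ValueError):
    """Raised when PatientID is requested for a cohort above the cap."""


def check_patient_id_allowed(feature_names, cohort_size,
                             max_cohort_size=MAX_COHORT_SIZE_FOR_PATIENT_ID):
    """Reject requests that include PatientID for an oversized cohort.

    feature_names: input features selected in the multivariate request.
    cohort_size: number of patients in the requested cohort.
    """
    if "PatientID" in feature_names and cohort_size > max_cohort_size:
        raise CohortTooLargeError(
            f"PatientID cannot be used as an input feature for cohorts "
            f"larger than {max_cohort_size} patients "
            f"(requested cohort has {cohort_size})."
        )
```

With this placeholder cap, a request that includes PatientID for the asthma cohort (roughly 160,000 patients) would be rejected, while the same request without PatientID would pass through unchanged.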
Another approach might be to create a new multivariate endpoint, one that accepts the following user input: (1) a primary outcome / dependent variable, (2) a set of predictors / independent variables, (3) one or more optional factors to control for repeated observations (e.g., PatientID, year), and (4) a desired multivariate model (e.g., GLM, conditional random forest). The model would then be applied to the data on the backend, and the endpoint would return the model output rather than patient-level rows. This approach may work, although (1) we would have to develop general-purpose models and (2) the run time may be slow, but that's a lesser concern, IMO.
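To illustrate the four inputs listed above, here is a rough sketch of what the request payload for such an endpoint could look like, with basic validation. The class name, field names, and model identifiers are hypothetical; nothing here reflects an existing endpoint, and the model fitting itself is out of scope.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical identifiers for the example models named in the proposal.
SUPPORTED_MODELS = {"glm", "conditional_random_forest"}


@dataclass
class LongitudinalModelRequest:
    outcome: str                      # (1) primary outcome / dependent variable
    predictors: List[str]             # (2) predictors / independent variables
    model: str                        # (4) desired multivariate model
    # (3) optional factors controlling for repeated observations
    repeated_measures_factors: List[str] = field(default_factory=list)

    def __post_init__(self):
        if not self.predictors:
            raise ValueError("At least one predictor is required.")
        if self.outcome in self.predictors:
            raise ValueError("The outcome cannot also be a predictor.")
        if self.model not in SUPPORTED_MODELS:
            raise ValueError(
                f"Unsupported model {self.model!r}; "
                f"expected one of {sorted(SUPPORTED_MODELS)}."
            )
```

A request controlling for repeated observations per patient might then be built as `LongitudinalModelRequest(outcome="outcome_feature", predictors=["predictor_a", "predictor_b"], model="glm", repeated_measures_factors=["PatientID", "year"])`, where the outcome and predictor names are placeholders. Because PatientID is only used on the backend and only model output is returned, the endpoint would not need to ship the patient-level table to the user.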