Explore ways to support patient-level longitudinal multivariate analysis #290

Open
karafecho opened this issue Oct 13, 2023 · 2 comments
karafecho commented Oct 13, 2023

We have implemented an approach to support cohort- and study period-level longitudinal multivariate analysis by allowing users to select year (i.e., study period) as an input feature in a multivariate request. This issue suggests that we also explore ways to support patient-level longitudinal multivariate analysis. The approach that I had originally conceived was to allow users to select PatientID (i.e., the dummy variable that links patients across years / study periods) as an input feature. This would allow users to retrieve a subset of the underlying deidentified integrated feature table. However, the approach is not computationally feasible, given the large patient sample sizes (e.g., roughly 160,000 total patients in the asthma cohort).

One approach might be to cap the cohort size for which users are allowed to include PatientID as an input feature in a multivariate request. To implement this, we could (1) return an error when users include PatientID as an input feature in a multivariate request for a cohort that exceeds a size limit (TBD) and (2) update the documentation to reflect the limitation. While this approach seems relatively straightforward, it also seems rather arbitrary and statistically unsound.
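A minimal sketch of the size check in step (1), for discussion only. The names `MAX_COHORT_SIZE_FOR_PATIENT_ID`, `CohortTooLargeError`, and `check_patient_id_allowed` are hypothetical, and the limit of 10,000 is a placeholder, since the actual cap is still TBD.

```python
# Hypothetical placeholder; the real cap has not been decided (TBD).
MAX_COHORT_SIZE_FOR_PATIENT_ID = 10_000


class CohortTooLargeError(ValueError):
    """Raised when PatientID is requested for a cohort above the cap."""


def check_patient_id_allowed(feature_names, cohort_size,
                             max_cohort_size=MAX_COHORT_SIZE_FOR_PATIENT_ID):
    """Reject requests that include PatientID for an oversized cohort.

    Requests without PatientID pass regardless of cohort size.
    """
    if "PatientID" in feature_names and cohort_size > max_cohort_size:
        raise CohortTooLargeError(
            f"PatientID is only supported for cohorts of at most "
            f"{max_cohort_size} patients; this cohort has {cohort_size}."
        )
```

With this placeholder limit, a request for PatientID on the roughly 160,000-patient asthma cohort would be rejected, while the same request without PatientID would go through.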

Another approach might be to create a new multivariate endpoint that accepts the following user input: (1) a primary outcome / dependent variable, (2) a set of predictors / independent variables, (3) one or more optional factors to control for repeated observations (e.g., PatientID, year), and (4) a desired multivariate model (e.g., GLM, conditional random forest). The model would then be applied to the data on the backend, and the endpoint would return the model output. This approach may work, although (1) we would have to develop general-purpose models and (2) the run time may be slow, but that's a lesser concern, IMO.
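To make the four inputs concrete, here is a rough sketch of what a request payload for such an endpoint might look like. Everything here is an assumption for illustration: the class name `LongitudinalModelRequest`, the field names, and the model identifiers in `SUPPORTED_MODELS` are hypothetical, and no model fitting is shown.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical identifiers for the models mentioned above.
SUPPORTED_MODELS = {"glm", "conditional_random_forest"}


@dataclass
class LongitudinalModelRequest:
    outcome: str                     # (1) primary outcome / dependent variable
    predictors: List[str]            # (2) predictors / independent variables
    model: str                       # (4) desired multivariate model
    # (3) optional factors for repeated observations, e.g. PatientID, year
    repeated_factors: List[str] = field(default_factory=list)

    def __post_init__(self):
        # Basic consistency checks before any model is run on the backend.
        if not self.predictors:
            raise ValueError("at least one predictor is required")
        if self.outcome in self.predictors:
            raise ValueError("the outcome cannot also be a predictor")
        if self.model not in SUPPORTED_MODELS:
            raise ValueError(f"unsupported model: {self.model!r}")
```

Keeping the repeated-observation factors separate from the predictors would let the backend decide how to handle them per model, rather than treating PatientID as an ordinary feature.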

@karafecho
Contributor Author

Per discussion with Hong, 10.18.2023: Maybe include PatientID as an input parameter, similar to year? For example, PatientID=1 or PatientID=1-10.
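A possible way to parse such a parameter, sketched under the assumption that PatientID values are positive integers and that a range like `1-10` is inclusive on both ends. The function name `parse_patient_id_param` is hypothetical.

```python
def parse_patient_id_param(value: str) -> range:
    """Parse 'N' or 'N-M' into an inclusive range of patient IDs.

    Assumes IDs are positive integers; raises ValueError otherwise.
    """
    text = value.strip()
    if "-" in text:
        start_text, end_text = text.split("-", 1)
    else:
        start_text = end_text = text
    # int() raises ValueError on empty or non-numeric input.
    start, end = int(start_text), int(end_text)
    if start < 1 or end < start:
        raise ValueError(f"invalid PatientID range: {value!r}")
    # range() excludes its stop value, so add 1 to make the end inclusive.
    return range(start, end + 1)
```

Here `parse_patient_id_param("1")` covers only patient 1, and `parse_patient_id_param("1-10")` covers patients 1 through 10. A limit on the width of the range could be added to keep responses small.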

@karafecho
Contributor Author

Related to #286
