Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

wqb_create_epa_ecotox() table creation #4

Open
aylapear opened this issue Feb 14, 2023 · 3 comments
Open

wqb_create_epa_ecotox() table creation #4

aylapear opened this issue Feb 14, 2023 · 3 comments

Comments

@aylapear
Copy link
Collaborator

Currently the tables are created by passing a list of the table names and the corresponding primary keys

db_tbl_core_structure <- function() {
  # names are table names, values are primary keys 
  list(
    "chemical_carriers" = "carrier_id",
    "dose_response_details" = "dose_resp_detail_id",
    "dose_response_links" = "dose_resp_link_id",
    "dose_responses" = "dose_resp_id",
    "doses" = "dose_id",
    "media_characteristics" = "result_id",
    "results" = "result_id",
    "tests" = "test_id"
  )
}

### there are about 40 reference/validation tables but not all being pk'ed
db_tbl_validation_structure <- function() {
  # names are table names, values are primary keys 
  list(
    "application_frequency_codes" = "code",
    "application_type_codes" = "code",
    "chemical_analysis_codes" = "code",
    "chemical_formulation_codes" = NULL,
    "chemical_grade_codes" = NULL,
    "chemicals" = "cas_number",
    "concentration_type_codes" = NULL,
    "concentration_unit_codes" = NULL,
    "control_type_codes" = NULL,
    "dose_stat_method_codes" = NULL,
    "duration_unit_codes" = NULL, 
    "effect_codes" = "code",
    "endpoint_assigned_codes" = NULL,
    "endpoint_codes" = NULL,
    "exposure_type_codes" = "code",
    "field_study_type_codes" = "code",
    "gender_codes" = "code",
    "geographic_codes" = NULL,
    "habitat_codes" = NULL,
    "ion_codes" = NULL,
    "lifestage_codes" = "code",
    "measurement_codes" = "code",
    "media_char_unit_codes" = NULL,
    "media_type_codes" = "code",
    "organic_matter_type_codes" = NULL,
    "organism_source_codes" = "code",
    "radio_label_codes" = NULL,
    "references" = "reference_number", # make sure tbl name isnt issue
    "response_site_codes" = "code",
    "sample_size_unit_codes" = NULL,
    "season_codes" = NULL,
    "species_synonyms" = NULL, 
    "species" = "species_number",
    "statistical_significance_codes" = NULL,
    "substrate_codes" = NULL,
    "test_location_codes" = "code",
    "test_method_codes" = "code",
    "test_type_codes" = "code",
    "trend_codes" = "code",
    "weight_unit_codes" = NULL
  )
}
@aylapear
Copy link
Collaborator Author

This can be improved upon so each validation table has a primary key. Though it is known that the species_synonms table doesn't have a valid primary key anymore as there is some type of duplication in the table.

Only a few tables have been looked at and a primary key was picked as they are the more used tables at this point. Could take the time to go through each table and determine primary keys for the rest.

@aylapear
Copy link
Collaborator Author

Only the primary keys are being set. Could expand the list into a table and set the datatype for each table, foreign keys and more but it would take extensive time to compile that data for it to be used in an automated capacity.

    if (!is.null(validation_data[[i]])) {
      query <- paste0(
        "CREATE TABLE [", names(validation_data[i]), 
        "] (", paste(colnames(dt), collapse = ", "), 
        ", PRIMARY KEY (", validation_data[[i]], "))"
      )
      DBI::dbExecute(con, query)
    }
    DBI::dbWriteTable(con, 
      names(validation_data[i]), value = dt, append = TRUE, row.names = FALSE
    )

Tables with a primary key don't have the data types for each column set but ones without a primary key do have the data types of each column set because of how the DBI::dbWriteTable function works.

Screenshot 2023-02-14 at 8 46 10 AM

@aylapear
Copy link
Collaborator Author

Since the database is being created in its entirety each time and not being added to this should be fine but could be a future improvement.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

1 participant