add: schedules to get bimonthly tests data from csv to datalake #480

Merged on Mar 22, 2024 (92 commits)
Commits
97d3344
add: schedules to get bimonthly tests data from csv to datalake
BrunodePauloAlmeida Aug 7, 2023
be2df6f
add: encoding parameter for csv generated by dump_url pipeline
BrunodePauloAlmeida Aug 9, 2023
e871702
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Aug 9, 2023
c74c070
fix: encoding stop getting default value always
BrunodePauloAlmeida Aug 9, 2023
b45fcc0
add: on_bad_lines parameter to skip lines with errors
BrunodePauloAlmeida Aug 9, 2023
578c758
fix: merge
BrunodePauloAlmeida Aug 9, 2023
7f3007f
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Aug 9, 2023
c6ab828
add: on_bad_lines parameter to flow config
BrunodePauloAlmeida Aug 9, 2023
8d5eb1f
Merge branch 'staging/dump_educ_basica_prova_bim' of https://github.c…
BrunodePauloAlmeida Aug 9, 2023
051711e
just to trigger actions
BrunodePauloAlmeida Aug 9, 2023
b8b5b73
add: sep parameter to use in pandas read_csv
BrunodePauloAlmeida Aug 9, 2023
05aa1a4
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Aug 9, 2023
eadf01f
Merge branch 'master' into staging/dump_educ_basica_prova_bim
mergify[bot] Aug 10, 2023
c9317ce
Merge branch 'master' into staging/dump_educ_basica_prova_bim
mergify[bot] Aug 14, 2023
38cddef
Merge branch 'master' into staging/dump_educ_basica_prova_bim
mergify[bot] Aug 16, 2023
50a7803
Merge branch 'master' into staging/dump_educ_basica_prova_bim
mergify[bot] Aug 16, 2023
e0d9c71
Merge branch 'master' into staging/dump_educ_basica_prova_bim
mergify[bot] Aug 16, 2023
46e4711
Merge branch 'master' into staging/dump_educ_basica_prova_bim
mergify[bot] Aug 18, 2023
3886238
change schedule
BrunodePauloAlmeida Aug 25, 2023
f1a320f
Merge branch 'staging/dump_educ_basica_prova_bim' of https://github.c…
BrunodePauloAlmeida Aug 25, 2023
9613ce3
Merge branch 'master' into staging/dump_educ_basica_prova_bim
mergify[bot] Aug 25, 2023
9f6757c
Merge branch 'master' into staging/dump_educ_basica_prova_bim
mergify[bot] Aug 31, 2023
3041e97
Merge branch 'master' into staging/dump_educ_basica_prova_bim
mergify[bot] Sep 4, 2023
0eaa36d
Merge branch 'master' into staging/dump_educ_basica_prova_bim
mergify[bot] Sep 18, 2023
d726001
Merge branch 'master' into staging/dump_educ_basica_prova_bim
mergify[bot] Sep 19, 2023
266be8b
Merge branch 'master' into staging/dump_educ_basica_prova_bim
mergify[bot] Oct 23, 2023
3b3cbe7
Merge branch 'master' into staging/dump_educ_basica_prova_bim
mergify[bot] Oct 23, 2023
d8112ca
Merge branch 'master' into staging/dump_educ_basica_prova_bim
mergify[bot] Oct 23, 2023
f242274
Merge branch 'master' into staging/dump_educ_basica_prova_bim
mergify[bot] Oct 24, 2023
c06171e
Merge branch 'master' into staging/dump_educ_basica_prova_bim
mergify[bot] Oct 24, 2023
9a684be
Merge branch 'master' into staging/dump_educ_basica_prova_bim
mergify[bot] Oct 24, 2023
3132410
Merge branch 'master' into staging/dump_educ_basica_prova_bim
mergify[bot] Oct 26, 2023
0042593
Merge branch 'master' into staging/dump_educ_basica_prova_bim
mergify[bot] Oct 27, 2023
5c71315
Merge branch 'master' into staging/dump_educ_basica_prova_bim
mergify[bot] Oct 27, 2023
d66eaba
Merge branch 'master' into staging/dump_educ_basica_prova_bim
mergify[bot] Nov 13, 2023
7b85d99
Merge branch 'master' into staging/dump_educ_basica_prova_bim
mergify[bot] Nov 13, 2023
2d8e22a
Merge branch 'master' into staging/dump_educ_basica_prova_bim
mergify[bot] Nov 13, 2023
3ff3f38
Merge branch 'master' into staging/dump_educ_basica_prova_bim
mergify[bot] Nov 13, 2023
dfcc2a2
Merge branch 'master' into staging/dump_educ_basica_prova_bim
mergify[bot] Nov 13, 2023
f89bc2e
Merge branch 'master' into staging/dump_educ_basica_prova_bim
mergify[bot] Nov 14, 2023
a048c4d
Merge branch 'master' into staging/dump_educ_basica_prova_bim
mergify[bot] Nov 14, 2023
12b03f1
Merge branch 'master' into staging/dump_educ_basica_prova_bim
mergify[bot] Nov 15, 2023
2688375
Merge branch 'master' into staging/dump_educ_basica_prova_bim
mergify[bot] Nov 15, 2023
95c2fa4
Merge branch 'master' into staging/dump_educ_basica_prova_bim
mergify[bot] Nov 15, 2023
0b79ed3
Merge branch 'master' into staging/dump_educ_basica_prova_bim
mergify[bot] Nov 19, 2023
57ae74e
Merge branch 'master' into staging/dump_educ_basica_prova_bim
mergify[bot] Nov 19, 2023
6ec5b82
Merge branch 'master' into staging/dump_educ_basica_prova_bim
mergify[bot] Nov 24, 2023
536515f
Merge branch 'master' into staging/dump_educ_basica_prova_bim
mergify[bot] Dec 11, 2023
5c58936
Merge branch 'master' into staging/dump_educ_basica_prova_bim
mergify[bot] Dec 12, 2023
2907e18
Merge branch 'master' into staging/dump_educ_basica_prova_bim
mergify[bot] Dec 12, 2023
5ae6a55
Merge branch 'master' into staging/dump_educ_basica_prova_bim
mergify[bot] Dec 12, 2023
46aff64
Merge branch 'master' into staging/dump_educ_basica_prova_bim
mergify[bot] Dec 13, 2023
f333ba8
Merge branch 'master' into staging/dump_educ_basica_prova_bim
mergify[bot] Dec 14, 2023
042a12b
Merge branch 'master' into staging/dump_educ_basica_prova_bim
mergify[bot] Dec 14, 2023
3f0f007
Merge branch 'master' into staging/dump_educ_basica_prova_bim
mergify[bot] Dec 18, 2023
d36a6fc
Merge branch 'master' into staging/dump_educ_basica_prova_bim
mergify[bot] Dec 18, 2023
967576a
Merge branch 'master' into staging/dump_educ_basica_prova_bim
mergify[bot] Dec 18, 2023
c102018
Merge branch 'master' into staging/dump_educ_basica_prova_bim
mergify[bot] Dec 18, 2023
7ea2743
Merge branch 'master' into staging/dump_educ_basica_prova_bim
mergify[bot] Dec 21, 2023
dcca660
Merge branch 'master' into staging/dump_educ_basica_prova_bim
mergify[bot] Jan 2, 2024
0e41645
Merge branch 'master' into staging/dump_educ_basica_prova_bim
mergify[bot] Jan 8, 2024
baa7336
Merge branch 'master' into staging/dump_educ_basica_prova_bim
mergify[bot] Jan 14, 2024
20e43f7
Merge branch 'master' into staging/dump_educ_basica_prova_bim
mergify[bot] Jan 15, 2024
4f55654
Merge branch 'master' into staging/dump_educ_basica_prova_bim
mergify[bot] Jan 15, 2024
548c5c1
Merge branch 'master' into staging/dump_educ_basica_prova_bim
mergify[bot] Jan 16, 2024
7434a6d
Merge branch 'master' into staging/dump_educ_basica_prova_bim
mergify[bot] Jan 16, 2024
4f25132
Merge branch 'master' into staging/dump_educ_basica_prova_bim
mergify[bot] Jan 18, 2024
0f9d37d
Merge branch 'master' into staging/dump_educ_basica_prova_bim
mergify[bot] Jan 19, 2024
5a160bf
Merge branch 'master' into staging/dump_educ_basica_prova_bim
mergify[bot] Jan 22, 2024
7391e5c
Merge branch 'master' into staging/dump_educ_basica_prova_bim
mergify[bot] Jan 23, 2024
b67073a
Merge branch 'master' into staging/dump_educ_basica_prova_bim
mergify[bot] Jan 24, 2024
eb42ab8
Merge branch 'master' into staging/dump_educ_basica_prova_bim
mergify[bot] Jan 25, 2024
df7c279
Merge branch 'master' into staging/dump_educ_basica_prova_bim
mergify[bot] Jan 25, 2024
8ca8c1c
Merge branch 'master' into staging/dump_educ_basica_prova_bim
mergify[bot] Jan 26, 2024
6a7cbd7
Merge branch 'master' into staging/dump_educ_basica_prova_bim
mergify[bot] Feb 5, 2024
64d2f84
Merge branch 'master' into staging/dump_educ_basica_prova_bim
mergify[bot] Feb 6, 2024
5369694
Merge branch 'master' into staging/dump_educ_basica_prova_bim
mergify[bot] Feb 6, 2024
654e232
Merge branch 'master' into staging/dump_educ_basica_prova_bim
mergify[bot] Feb 7, 2024
98fb9c3
Merge branch 'master' into staging/dump_educ_basica_prova_bim
mergify[bot] Feb 7, 2024
9fab4ed
Merge branch 'master' into staging/dump_educ_basica_prova_bim
mergify[bot] Feb 7, 2024
7232594
Merge branch 'master' into staging/dump_educ_basica_prova_bim
mergify[bot] Feb 8, 2024
9254b3f
Merge branch 'master' into staging/dump_educ_basica_prova_bim
mergify[bot] Feb 8, 2024
cde92bf
Merge branch 'master' into staging/dump_educ_basica_prova_bim
mergify[bot] Feb 26, 2024
86a7e20
Merge branch 'master' into staging/dump_educ_basica_prova_bim
mergify[bot] Feb 26, 2024
36876ec
Merge branch 'master' into staging/dump_educ_basica_prova_bim
mergify[bot] Feb 28, 2024
9cfbdc2
Merge branch 'master' into staging/dump_educ_basica_prova_bim
mergify[bot] Feb 28, 2024
247bdef
Merge branch 'master' into staging/dump_educ_basica_prova_bim
mergify[bot] Mar 1, 2024
217d430
Merge branch 'master' into staging/dump_educ_basica_prova_bim
mergify[bot] Mar 1, 2024
361e2f3
Merge branch 'master' into staging/dump_educ_basica_prova_bim
mergify[bot] Mar 4, 2024
c807568
Merge branch 'master' into staging/dump_educ_basica_prova_bim
mergify[bot] Mar 4, 2024
5363a35
Merge branch 'master' into staging/dump_educ_basica_prova_bim
mergify[bot] Mar 15, 2024
2871828
fix: schedule to trigger github actions
BrunodePauloAlmeida Mar 22, 2024
12 changes: 6 additions & 6 deletions pipelines/rj_sme/dump_url_educacao_basica/flows.py
@@ -26,11 +26,11 @@
     ],
 )

-sme_gsheets_default_parameters = {
-    "dataset_id": "educacao_basica_alocacao",
-}
-sme_gsheets_flow = set_default_parameters(
-    sme_gsheets_flow, default_parameters=sme_gsheets_default_parameters
-)
+# sme_gsheets_default_parameters = {
+#     "dataset_id": "educacao_basica_alocacao",
+# }
+# sme_gsheets_flow = set_default_parameters(
+#     sme_gsheets_flow, default_parameters=sme_gsheets_default_parameters
+# )

 sme_gsheets_flow.schedule = gsheets_year_update_schedule
85 changes: 83 additions & 2 deletions pipelines/rj_sme/dump_url_educacao_basica/schedules.py
@@ -25,12 +25,93 @@
         "url_type": "google_drive",
         "materialize_after_dump": True,
         "dataset_id": "educacao_basica_alocacao",
-    }
+    },
+    "bimestral_2023": {
+        "dump_mode": "overwrite",
+        "url": "https://drive.google.com/file/d/1bC-I6mT9SdRVDDL583WpeK8WOJMuIhfz/view?usp=drive_link",
+        "url_type": "google_drive",
+        "materialize_after_dump": True,
+        "dataset_id": "educacao_basica_avaliacao",
+    },
+    "bimestral_2022": {
+        "dump_mode": "overwrite",
+        "url": "https://drive.google.com/file/d/19PFXJKvaOrbexnt_jA4otE-LnMfHUH0H/view?usp=drive_link",
+        "url_type": "google_drive",
+        "materialize_after_dump": True,
+        "dataset_id": "educacao_basica_avaliacao",
+        "encoding": "latin-1",
+        "on_bad_lines": "skip",
+        "separator": ";",
+    },
+    "bimestral_2021": {
+        "dump_mode": "overwrite",
+        "url": "https://drive.google.com/file/d/1k-taU8bMEYJ2U5EHvrNWQZnzN2Ht3uso/view?usp=drive_link",
+        "url_type": "google_drive",
+        "materialize_after_dump": True,
+        "dataset_id": "educacao_basica_avaliacao",
+        "encoding": "latin-1",
+    },
+    "bimestral_2019": {
+        "dump_mode": "overwrite",
+        "url": "https://drive.google.com/file/d/1Q_drlgajGOpSsNlqw1cV2pRJ30Oh47MJ/view?usp=drive_link",
+        "url_type": "google_drive",
+        "materialize_after_dump": True,
+        "dataset_id": "educacao_basica_avaliacao",
+    },
+    "bimestral_2018": {
+        "dump_mode": "overwrite",
+        "url": "https://drive.google.com/file/d/1b7wyFsX6T4W6U_VWIjPmJZ4HI9btaLah/view?usp=drive_link",
+        "url_type": "google_drive",
+        "materialize_after_dump": True,
+        "dataset_id": "educacao_basica_avaliacao",
+    },
+    "bimestral_2017": {
+        "dump_mode": "overwrite",
+        "url": "https://drive.google.com/file/d/1kclQeNuzDCy0Npny1ZZLPjqiPMScw_1P/view?usp=drive_link",
+        "url_type": "google_drive",
+        "materialize_after_dump": True,
+        "dataset_id": "educacao_basica_avaliacao",
+    },
+    "bimestral_2016": {
+        "dump_mode": "overwrite",
+        "url": "https://drive.google.com/file/d/1QH9VsphqPvFwUfE7FgQYI6YJ4TJFTptv/view?usp=drive_link",
+        "url_type": "google_drive",
+        "materialize_after_dump": True,
+        "dataset_id": "educacao_basica_avaliacao",
+    },
+    "bimestral_2015": {
+        "dump_mode": "overwrite",
+        "url": "https://drive.google.com/file/d/1VKDnvgOzrEdT5LkNYBDE_ayVvKsj5jR0/view?usp=drive_link",
+        "url_type": "google_drive",
+        "materialize_after_dump": True,
+        "dataset_id": "educacao_basica_avaliacao",
+    },
+    "bimestral_2014": {
+        "dump_mode": "overwrite",
+        "url": "https://drive.google.com/file/d/18pJonyKwV210dpXr_B2M0p708jYYGwKz/view?usp=drive_link",
+        "url_type": "google_drive",
+        "materialize_after_dump": True,
+        "dataset_id": "educacao_basica_avaliacao",
+    },
+    "bimestral_2013": {
+        "dump_mode": "overwrite",
+        "url": "https://drive.google.com/file/d/1rSi-UgB3qZDLh8U3geKRkMgSdmxddO5v/view?usp=drive_link",
+        "url_type": "google_drive",
+        "materialize_after_dump": True,
+        "dataset_id": "educacao_basica_avaliacao",
+    },
+    "bimestral_2012": {
+        "dump_mode": "overwrite",
+        "url": "https://drive.google.com/file/d/1scfnos9iER86QVMx7Y_qPM1SKVv0MUED/view?usp=drive_link",
+        "url_type": "google_drive",
+        "materialize_after_dump": True,
+        "dataset_id": "educacao_basica_avaliacao",
+    },
 }

 gsheets_clocks = generate_dump_url_schedules(
     interval=timedelta(days=365),
-    start_date=datetime(2022, 11, 4, 20, 0, tzinfo=pytz.timezone("America/Sao_Paulo")),
+    start_date=datetime(2024, 3, 22, 12, 0, tzinfo=pytz.timezone("America/Sao_Paulo")),
     labels=[
         constants.RJ_SME_AGENT_LABEL.value,
     ],
6 changes: 6 additions & 0 deletions pipelines/utils/dump_url/flows.py
@@ -46,6 +46,9 @@

     # Table parameters
     partition_columns = Parameter("partition_columns", required=False, default="")
+    encoding = Parameter("encoding", required=False, default="utf-8")
+    on_bad_lines = Parameter("on_bad_lines", required=False, default="error")
+    separator = Parameter("separator", required=False, default=",")

     # Materialization parameters
     materialize_after_dump = Parameter(
@@ -119,6 +122,9 @@
         save_path=DUMP_DATA_PATH,
         build_json_dataframe=build_json_dataframe,
         dataframe_key_column=dataframe_key_column,
+        encoding=encoding,
+        on_bad_lines=on_bad_lines,
+        separator=separator,
     )
     DUMP_CHUNKS_TASK.set_upstream(DOWNLOAD_URL_TASK)

15 changes: 13 additions & 2 deletions pipelines/utils/dump_url/tasks.py
@@ -151,12 +151,23 @@ def dump_files(
     chunksize: int = 10**6,
     build_json_dataframe: bool = False,
     dataframe_key_column: str = None,
+    encoding: str = "utf-8",
+    on_bad_lines: str = "error",
+    separator: str = ",",
 ) -> None:
     """
-    Dump files according to chunk size
+    Dump files according to chunk size and read mode
     """
     event_id = datetime.now().strftime("%Y%m%d-%H%M%S")
-    for idx, chunk in enumerate(pd.read_csv(Path(file_path), chunksize=chunksize)):
+    for idx, chunk in enumerate(
+        pd.read_csv(
+            Path(file_path),
+            chunksize=chunksize,
+            encoding=encoding,
+            on_bad_lines=on_bad_lines,
+            sep=separator,
+        )
+    ):
         log(f"Dumping batch {idx} with size {chunksize}")
         handle_dataframe_chunk(
             dataframe=chunk,
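
For reviewers who want to see what the three new read options do, here is a minimal sketch that is separate from the pipeline code. It assumes pandas 1.3 or later (the release that introduced `on_bad_lines`). The sample bytes and the helper name `read_chunks` are invented for illustration and are not part of this PR.

```python
import io

import pandas as pd

# Hypothetical sample: Latin-1 bytes, ';' separator, and one malformed row
# (the second data row carries an extra field). Not taken from the real files.
RAW = "id;nome;nota\n1;João;7,5\n2;Ana;8,0;extra\n3;José;9,0\n".encode("latin-1")


def read_chunks(raw, chunksize=2, encoding="utf-8", on_bad_lines="error", separator=","):
    """Read CSV bytes in chunks, forwarding the same options dump_files accepts."""
    reader = pd.read_csv(
        io.BytesIO(raw),
        chunksize=chunksize,
        encoding=encoding,
        on_bad_lines=on_bad_lines,
        sep=separator,
    )
    return list(reader)


# Same overrides as the bimestral_2022 entry in schedules.py.
chunks = read_chunks(RAW, encoding="latin-1", on_bad_lines="skip", separator=";")
frame = pd.concat(chunks, ignore_index=True)
```

With these overrides the malformed row is dropped and the accented names decode correctly. With the defaults (`utf-8`, `error`, `,`) the same bytes would be expected to fail, which is presumably why some yearly files need per-table settings.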
6 changes: 6 additions & 0 deletions pipelines/utils/dump_url/utils.py
@@ -135,6 +135,12 @@ def generate_dump_url_schedules( # pylint: disable=too-many-arguments,too-many-
             parameter_defaults["materialize_to_datario"] = parameters[
                 "materialize_to_datario"
             ]
+        if "encoding" in parameters:
+            parameter_defaults["encoding"] = parameters["encoding"]
+        if "on_bad_lines" in parameters:
+            parameter_defaults["on_bad_lines"] = parameters["on_bad_lines"]
+        if "separator" in parameters:
+            parameter_defaults["separator"] = parameters["separator"]
         # if "dbt_model_secret_parameters" in parameters:
         #     parameter_defaults["dbt_model_secret_parameters"] = parameters[
         #         "dbt_model_secret_parameters"
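
The three `if` blocks added to `generate_dump_url_schedules` share one copy-if-present pattern. As a minimal sketch of that pattern in isolation (the function name `forward_optional_parameters` and the `OPTIONAL_KEYS` tuple are hypothetical, not names from the repository):

```python
# Keys copied into the schedule's parameter defaults only when the table
# config sets them; otherwise the flow-level Parameter default applies.
OPTIONAL_KEYS = ("encoding", "on_bad_lines", "separator")


def forward_optional_parameters(parameters: dict, parameter_defaults: dict) -> dict:
    """Return a copy of parameter_defaults extended with any optional keys present."""
    result = dict(parameter_defaults)
    for key in OPTIONAL_KEYS:
        if key in parameters:
            result[key] = parameters[key]
    return result


# Trimmed-down versions of two table configs from schedules.py.
table_2022 = {
    "dump_mode": "overwrite",
    "encoding": "latin-1",
    "on_bad_lines": "skip",
    "separator": ";",
}
table_2023 = {"dump_mode": "overwrite"}

defaults_2022 = forward_optional_parameters(table_2022, {"dump_mode": "overwrite"})
defaults_2023 = forward_optional_parameters(table_2023, {"dump_mode": "overwrite"})
```

Tables that omit the keys, such as `bimestral_2023`, get no override and fall back to the flow defaults (`"utf-8"`, `"error"`, `","`).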