TIMX 441 - support v2 transform command #314
In an effort to make feature flag removal easier later, "v1" tests will be moved here during each phase of work. Once v2 is fully implemented and v1 is no longer supported, we can remove this file entirely.

Good approach!
@@ -0,0 +1,79 @@
# ruff: noqa: FBT003

from lambdas import commands

# NOTE: FEATURE FLAG: this file can be FULLY removed after v2 work is complete


def test_generate_transform_commands_required_input_fields(run_id):
    input_data = {
        "next-step": "transform",
        "run-date": "2022-01-02T12:13:14Z",
        "run-type": "full",
        "source": "testsource",
    }
    extract_output_files = [
        "testsource/testsource-2022-01-02-full-extracted-records-to-index.xml"
    ]
    assert commands.generate_transform_commands(
        extract_output_files, input_data, "2022-01-02", "test-timdex-bucket", run_id
    ) == {
        "files-to-transform": [
            {
                "transform-command": [
                    "--input-file=s3://test-timdex-bucket/testsource/"
                    "testsource-2022-01-02-full-extracted-records-to-index.xml",
                    "--output-file=s3://test-timdex-bucket/testsource/"
                    "testsource-2022-01-02-full-transformed-records-to-index.json",
                    "--source=testsource",
                ]
            }
        ]
    }


def test_generate_transform_commands_all_input_fields(run_id):
    input_data = {
        "next-step": "transform",
        "run-date": "2022-01-02T12:13:14Z",
        "run-type": "daily",
        "source": "testsource",
    }
    extract_output_files = [
        "testsource/testsource-2022-01-02-daily-extracted-records-to-index_01.xml",
        "testsource/testsource-2022-01-02-daily-extracted-records-to-index_02.xml",
        "testsource/testsource-2022-01-02-daily-extracted-records-to-delete.xml",
    ]
    assert commands.generate_transform_commands(
        extract_output_files, input_data, "2022-01-02", "test-timdex-bucket", run_id
    ) == {
        "files-to-transform": [
            {
                "transform-command": [
                    "--input-file=s3://test-timdex-bucket/testsource/"
                    "testsource-2022-01-02-daily-extracted-records-to-index_01.xml",
                    "--output-file=s3://test-timdex-bucket/testsource/"
                    "testsource-2022-01-02-daily-transformed-records-to-index_01.json",
                    "--source=testsource",
                ]
            },
            {
                "transform-command": [
                    "--input-file=s3://test-timdex-bucket/testsource/"
                    "testsource-2022-01-02-daily-extracted-records-to-index_02.xml",
                    "--output-file=s3://test-timdex-bucket/testsource/"
                    "testsource-2022-01-02-daily-transformed-records-to-index_02.json",
                    "--source=testsource",
                ]
            },
            {
                "transform-command": [
                    "--input-file=s3://test-timdex-bucket/testsource/"
                    "testsource-2022-01-02-daily-extracted-records-to-delete.xml",
                    "--output-file=s3://test-timdex-bucket/testsource/"
                    "testsource-2022-01-02-daily-transformed-records-to-delete.txt",
                    "--source=testsource",
                ]
            },
        ]
    }
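For readers skimming the expectations above, the v1 naming convention they encode can be sketched as a small standalone helper. This is only an illustration inferred from the test data, not the actual body of `commands.generate_transform_commands`; the function name `sketch_transform_command` is hypothetical.

```python
from typing import List


def sketch_transform_command(extract_key: str, bucket: str, source: str) -> List[str]:
    """Hypothetical helper mirroring the v1 expectations in the tests above."""
    # Files of records to delete are written as .txt, everything else as .json.
    extension = "txt" if "records-to-delete" in extract_key else "json"
    # Drop the ".xml" suffix and swap the first "extracted" for "transformed".
    # Assumption: the source name itself does not contain "extracted".
    stem = extract_key.rsplit(".", 1)[0].replace("extracted", "transformed", 1)
    return [
        f"--input-file=s3://{bucket}/{extract_key}",
        f"--output-file=s3://{bucket}/{stem}.{extension}",
        f"--source={source}",
    ]
```

Under this sketch, each extract file maps to exactly one three-argument command, which is the shape both tests assert inside `files-to-transform`.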
This is subtle but important: this effectively sets `/dataset` as the TIMDEX parquet dataset.

We may well want to explore setting this via an SSM param / env var at some point, but I'd propose to keep this hardcoded at the moment. We did similar things with `source`, where each application knew that the `source` at hand was the folder to look for in S3. Now, it's kind of a simplification where everyone just knows that `/dataset` is where the dataset is located.
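As a rough illustration of the trade-off described in this comment, the hardcoded location and a possible later override could look something like the sketch below. Everything here is an assumption for discussion: the helper name `sketch_dataset_location`, the environment variable `TIMDEX_DATASET_PREFIX`, and the idea that the dataset sits directly under the bucket root are not taken from the PR.

```python
import os
from typing import Mapping, Optional

# Hardcoded convention: every application "just knows" the dataset folder name.
DEFAULT_DATASET_PREFIX = "dataset"


def sketch_dataset_location(
    bucket: str, env: Optional[Mapping[str, str]] = None
) -> str:
    """Hypothetical helper: build the S3 URI of the TIMDEX parquet dataset."""
    env = os.environ if env is None else env
    # TIMDEX_DATASET_PREFIX is an invented name for a possible future override.
    prefix = env.get("TIMDEX_DATASET_PREFIX", DEFAULT_DATASET_PREFIX).strip("/")
    return f"s3://{bucket}/{prefix}"
```

With no override set, `sketch_dataset_location("test-timdex-bucket", {})` yields `s3://test-timdex-bucket/dataset`, matching the hardcoded behavior proposed for now.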
Just to recap, this will result in the following file tree:
where UUID uniquely identifies a run per source?