Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add azure_remote_url and corresponding test. #4629

Open
wants to merge 11 commits into
base: master
Choose a base branch
from

Conversation

cgrass
Copy link
Contributor

@cgrass cgrass commented Dec 21, 2023

Why are the changes needed?

This PR adds an Azure implementation for RemoteURLInterface. This is needed to keep feature parity with existing (S3, GCP) implementations and their use via AdminServiceServer. However, I'm not sure pre-signed URLs generated by this interface are actively used or supported at the moment. Please review the test section below for my findings on existing behavior (S3) and the limitations with this implementation given fsspec behavior.

What changes were proposed in this pull request?

Stow was recently updated to include a full azure blob storage implementation. That includes a feature to return SAS tokens (presigned urls). I created a PR for flyte that updated the flyte-binary helm chart to use the new stow config, but found that the presignedUrl feature (for some workflows) was not implemented for azure.

How was this patch tested?

Unit Test: I added a new unit test to cover basic functionality.
E2E Testing: This is somewhat difficult because the signed-url output does not seem to be used in the Go codebase. I tested e2e by deploying my code and setting up a little pytest pasted below. I was then able to step through the flytekit code, confirming the token returned from the Flyte API was properly constructed and valid.

def TestSaS() -> str:
    remote = FlyteRemote(
        config=Config.for_endpoint(endpoint="127.0.0.1:8089", insecure=True),
        default_project="flyte-az",
        default_domain="development",
    )

    lp = remote.fetch_launch_plan(name="workflows.lesssimpleworkflow.less_simple_workflow")
    execution = remote.execute(lp, inputs={})
    while test3.is_done is False:
        execution = remote.fetch_execution(name=execution.id.name, project="flyte-az", domain="development")
        time.sleep(1)

    return "test complete"


if __name__ == "__main__":
    print(TestSaS())

This test confirms my code works but also discovered a couple bugs related to fsspec:

  1. Long URLs (beyond filesystem limits) break fsspec get operations. I am discussing options with the fsspec community here, but the current workaround would require a fix to flytekit and is not ideal. The bug only pops when all of the below are true
    a) remoteData is configured to set signedUrls:true
    b) the presigned URL is larger than n (file system limit) characters (common for S3)
    c) the payload is larger than maxSizeInBytes
    d) FlyteRemote is used to interact with the execution - I am not 100% confident in this, but it's the only use case I could find
  2. fsspec has unexpected behavior in some scenarios. calling get with a URL for rpath and a filename for lpath results in lpath being used as a path dir. e.g., if rpath is http://mydomain/output.pb and lpath is /var/output.pb a file gets written at /var/output.pb/output.pb.
    this causes an issue in the flytekit code because remote._get_output_literal_map() creates tmp_name with the filename ("output.pb"), and then uses tmp_name as the location for the resulting file. in reality, that path is a dir that contains the actual file.
    to demonstrate what works with fsspec i put together a hacky update to remote._get_output_literal_map() for my particular use case:
elif execution_data.outputs.bytes > 0:
            with self.remote_context() as ctx:
                tmp_name = os.path.join(ctx.file_access.local_sandbox_dir)
                file_name = execution_data.outputs.url.rsplit("/", 1)[1]
                ctx.file_access.get_data(execution_data.outputs.url, tmp_name)
                return literal_models.LiteralMap.from_flyte_idl(
                    utils.load_proto_from_file(literals_pb2.LiteralMap, tmp_name + "/" + file_name)
                )

Obviously the above is not a prod solution, but is meant to highlight the issue and gives some hints as to a proper fix.

Notes

  • There is a discussion in slack that explains a little bit of the history and context.
  • Presigned URLs are certainly used by flytekit during registration/run scenarios to PUT the script_mode.tar.gz file, but that calls a different API (service.CreateUploadLocation()).
  • The S3/GCP implementations use their corresponding clients directly. I'm not clear on why that is, given the presign logic is already implemented and used via the stow client. I chose to use the latter, so I did not need to reproduce the presign logic for azure.

Setup process

Screenshots

Check all the applicable boxes

  • I updated the documentation accordingly. (no doc update needed)
  • All new and existing tests passed.
  • All commits are signed-off.

Related PRs

Docs link

Copy link

codecov bot commented Dec 21, 2023

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 47.71%. Comparing base (e72f102) to head (d0361a0).

Additional details and impacted files
@@             Coverage Diff             @@
##           master    #4629       +/-   ##
===========================================
+ Coverage   36.99%   47.71%   +10.72%     
===========================================
  Files        1318      637      -681     
  Lines      132449    54251    -78198     
===========================================
- Hits        48994    25885    -23109     
+ Misses      79202    26042    -53160     
+ Partials     4253     2324     -1929     
Flag Coverage Δ
unittests-datacatalog ?
unittests-flyteadmin ?
unittests-flytecopilot ?
unittests-flytectl ?
unittests-flyteidl ?
unittests-flyteplugins 53.85% <ø> (ø)
unittests-flytepropeller 42.60% <ø> (ø)
unittests-flytestdlib 55.17% <ø> (-0.02%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@cgrass
Copy link
Contributor Author

cgrass commented Jan 8, 2024

Codecov Report

Attention: 13 lines in your changes are missing coverage. Please review.

Comparison is base (513c3e1) 58.10% compared to head (c0a0e1d) 59.02%.

❗ Current head c0a0e1d differs from pull request most recent head 8393e2c. Consider uploading reports for the commit 8393e2c to get more accurate results

Files Patch % Lines
...admin/pkg/data/implementations/azure_remote_url.go 43.47% 11 Missing and 2 partials ⚠️
Additional details and impacted files
☔ View full report in Codecov by Sentry. 📢 Have feedback on the report? Share it here.

looks like the other implementations skip testing stow error responses, so i kept my unit test to a happy path

Signed-off-by: Chris Grass <chris.grass@mcg.com>
@cgrass cgrass marked this pull request as ready for review January 8, 2024 21:45
@dosubot dosubot bot added size:M This PR changes 30-99 lines, ignoring generated files. enhancement New feature or request labels Jan 8, 2024
@wild-endeavor
Copy link
Contributor

Signed-off-by: Chris Grass <chris.grass@mcg.com>
@davidmirror-ops
Copy link
Contributor

Is there any update here?
I'm working on a Flyte on Azure reference implementation and seems like without this change, it wouldn't be completely useful

@cgrass
Copy link
Contributor Author

cgrass commented Apr 15, 2024

@davidmirror-ops - this PR needs to be reviewed and merged if your reference impl needs to support presigned urls. that feature is not a hard requirement for flyte to work in azure but is required for feature parity with other cloud providers.

@davidmirror-ops
Copy link
Contributor

@cgrass Is there anything we can do to help you take this to the finish line? Thanks!

wild-endeavor
wild-endeavor previously approved these changes Dec 13, 2024
Chris Grass and others added 6 commits December 13, 2024 15:01
Signed-off-by: Chris Grass <chris.grass@mcg.com>
Signed-off-by: Flyte-Bot <admin@flyte.org>

Co-authored-by: flyte-bot <flyte-bot@users.noreply.github.com>
Signed-off-by: Chris Grass <chris.grass@mcg.com>
Signed-off-by: Flyte-Bot <admin@flyte.org>

Co-authored-by: evalsocket <evalsocket@users.noreply.github.com>
Signed-off-by: Chris Grass <chris.grass@mcg.com>
Signed-off-by: Chris Grass <chris.grass@mcg.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request size:M This PR changes 30-99 lines, ignoring generated files.
Projects
Status: Approved yet unmerged PRs
Development

Successfully merging this pull request may close these issues.

4 participants