Releases: flyteorg/flytekit

v1.13.15

09 Dec 20:09
272c9c5

What's Changed

  • Backport to v1.13 - Set map task metadata only for subnode by @eapolinario in #2993

Full Changelog: v1.13.14...v1.13.15

v1.14.0

06 Dec 23:33
0b4a60a

Flytekit 1.14.0 release notes

Important

flytekit 1.14.0 is not forward compatible with older flytekit versions if you use dict, dataclass, or Pydantic BaseModel.

Added

Introducing native support for Python dictionaries, dataclass, and Pydantic BaseModel (#2760)

Until now, Flyte used JSON strings inside a Protobuf struct to serialize certain data structures, which posed some challenges. For example, if you passed a Python dict with int elements, those were converted to float, the only numeric type supported by Protobuf’s struct. This led users to write boilerplate code or implement other suboptimal workarounds.
With this release, flytekit uses the MessagePack format by default to serialize dataclass, Pydantic’s BaseModel, and Python’s dict.

Before:

from flytekit import task

@task
def t1() -> dict:
    ...
    return {"a": 1}  # Protobuf Struct {"a": 1.0}

@task
def t2(d: dict):
    print(d["a"])  # this prints 1.0

After:

from flytekit import task

@task
def t1() -> dict:  # Literal(scalar=Scalar(binary=Binary(value=b'msgpack_bytes', tag="msgpack")))
    ...
    return {"a": 1}  # Protobuf Binary value=b'\x81\xa1a\x01', produced by msgpack

@task
def t2(d: dict):
    print(d["a"])  # this prints 1
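
To make the byte string in the comment above concrete, here is a minimal sketch that decodes it by hand. It is not how flytekit does it (flytekit relies on a MessagePack library); the hypothetical decode_msgpack_subset helper handles only the three MessagePack type tags that appear in this payload, and the float() conversion merely imitates what a Protobuf Struct did to numbers.

```python
def decode_msgpack_subset(data: bytes):
    """Decode a tiny subset of MessagePack: positive fixint, fixmap, fixstr."""

    def read(pos: int):
        tag = data[pos]
        if tag <= 0x7F:
            # positive fixint: the byte itself is the integer value
            return tag, pos + 1
        if 0x80 <= tag <= 0x8F:
            # fixmap: the low 4 bits hold the number of key/value pairs
            result = {}
            pos += 1
            for _ in range(tag & 0x0F):
                key, pos = read(pos)
                value, pos = read(pos)
                result[key] = value
            return result, pos
        if 0xA0 <= tag <= 0xBF:
            # fixstr: the low 5 bits hold the string length in bytes
            length = tag & 0x1F
            start = pos + 1
            return data[start:start + length].decode("utf-8"), start + length
        raise ValueError(f"unsupported type tag 0x{tag:02x}")

    value, end = read(0)
    if end != len(data):
        raise ValueError("trailing bytes after the first value")
    return value


payload = b"\x81\xa1a\x01"  # the bytes shown in the "After" example
decoded = decode_msgpack_subset(payload)
print(decoded)  # {'a': 1}

# Imitation of the old behavior: Struct numbers are doubles, so ints became floats.
struct_like = {key: float(value) for key, value in decoded.items()}
print(struct_like)  # {'a': 1.0}
```

The integer survives the MessagePack round trip as an int, which is the difference the two snippets above illustrate.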

Warning: this change is backward compatible only. Tasks registered with flytekit>=1.14 can use results produced by older versions of flytekit, but not vice versa, unless you set the FLYTE_USE_OLD_DC_FORMAT environment variable to true. If you try to reference flytekit>=1.14 tasks from flytekit<1.14 downstream tasks, you will get a TypeTransformerFailedError.

Considerations before upgrading

To experience the benefits of this change, you have to upgrade both flytekit and the flyte backend to 1.14. You can plan this in phases, starting with upgrading flyte to 1.14, which is compatible with older flytekit releases but won't let you leverage the new serialization format.

flytekit version    flyte version    Result
<1.14               <1.14            Old serialization format
<1.14               >=1.14           Compatible but old serialization format
>=1.14              <1.14            Compatible but old serialization format
>=1.14              >=1.14           NEW serialization format!
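
The compatibility matrix can be expressed as a small check, sketched below. The function names are hypothetical and not part of flytekit; the sketch compares only the major and minor components, so a pre-release such as 1.14.0b6 is treated as 1.14, which is a simplification.

```python
import re


def major_minor(version: str) -> tuple:
    """Extract (major, minor) from a version string such as '1.13.15'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"cannot parse version: {version!r}")
    return int(match.group(1)), int(match.group(2))


def serialization_format(flytekit_version: str, flyte_version: str) -> str:
    """Return which format the matrix predicts for a flytekit/flyte pair."""
    threshold = (1, 14)
    both_new = (
        major_minor(flytekit_version) >= threshold
        and major_minor(flyte_version) >= threshold
    )
    # Only the combination of new flytekit AND new backend gets msgpack.
    return "new (msgpack)" if both_new else "old"


for kit, backend in [("1.13.15", "1.13.0"), ("1.13.15", "1.14.0"),
                     ("1.14.0", "1.13.0"), ("1.14.0", "1.14.0")]:
    print(kit, backend, serialization_format(kit, backend))
```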

By introducing a new serialization format (which you will see in the Flyte console as msgpack), Flyte enables you to leverage robust data structures without writing glue code or sacrificing accuracy or reliability.

Notebooks support (#5907)

Now you can use Flyte from a Jupyter Notebook (or any other notebook) without resorting to a plugin. When you use FlyteRemote, Flyte automatically detects requests coming from a notebook environment and executes them accordingly, giving notebook users access to execution outputs, versioning, reproducibility, and all the infrastructure abstractions that Flyte provides.

Learn how it works in this blog by main contributor @mecoli1219.

Currently, @dynamic workflows are not supported in Notebooks. This is a planned enhancement as part of the improved eager mode, coming out early next year.

Flyte now leverages asyncio to speed up executions (#2829)

Both the type engine and the data persistence layer have been updated to support asynchronous, non-blocking I/O operations. These changes aim to improve the performance and scalability of I/O-bound operations. Examples include tasks that return large lists of FlyteFiles, which used to be serialized in batches but now benefit from better performance without any code changes.
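
As a rough illustration of why non-blocking I/O helps with large lists of files, the sketch below runs several simulated uploads concurrently with asyncio.gather. It is not flytekit code: upload_file is a hypothetical stand-in that sleeps instead of talking to blob storage, and the bucket name is made up.

```python
import asyncio
import time


async def upload_file(name: str, delay: float = 0.05) -> str:
    # Hypothetical stand-in for a non-blocking upload; the sleep simulates network I/O.
    await asyncio.sleep(delay)
    return f"s3://example-bucket/{name}"


async def upload_all(names):
    # gather() schedules all uploads at once and returns results in input order.
    return await asyncio.gather(*(upload_file(name) for name in names))


names = [f"file_{i}.txt" for i in range(5)]
start = time.perf_counter()
uris = asyncio.run(upload_all(names))
elapsed = time.perf_counter() - start

print(uris)
print(f"uploaded {len(uris)} files in {elapsed:.2f}s")
```

Five sequential uploads of 0.05 s each would take about 0.25 s; run concurrently, the total stays close to the duration of a single upload.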

Changed

Offloading of literals (#2872)

Flyte automates data movement between tasks using gRPC as the communication protocol. When users need to move large amounts of data or use MapTasks that produce a large literal collection output, they typically hit a limit in the payload size gRPC can handle, getting an error like the following:

[LIMIT_EXCEEDED] limit exceeded. 2.903926mb > 2mb

This has forced users to split up MapTasks, refactor their workflows to offload outputs to a FlyteFile or FlyteDirectory rather than returning literal values directly, or bump up the storage.limits.maxDownloadMBs parameter to arbitrary sizes, all of which lead to inconvenient or hard-to-maintain solutions.

For example, before upgrading flytekit, a simple workflow like the following:

import flytekit as fl

@fl.task
def print_arrays(arr1: str) -> None:
    print(f"Array 1: {arr1}")

@fl.task
def increase_size_of_arrays(n: int) -> str:
    # Build a string of n * 1024 characters (n KiB)
    arr1 = 'a' * n * 1024
    return arr1

# Workflow: Orchestrate the tasks
@fl.workflow
def simple_pipeline(n: int) -> int:
    arr1 = increase_size_of_arrays(n=n)
    print_arrays(arr1=arr1)
    return 2

if __name__ == "__main__":
    print(f"Running simple_pipeline() {simple_pipeline(n=11000)}")

Fails with the following message:

output is too large [11264029] bytes, max allowed [2097152] bytes

flytekit >=1.14 automatically offloads any object larger than 10 MB (the gRPC limit) to blob storage, allowing you to manage larger data and achieve higher degrees of parallelism effortlessly while continuing to use literal values.
After upgrading to 1.14, the above example runs and the outputs are stored in the metadata bucket:

s3://my-s3-bucket/metadata/propeller/flytesnacks-development-af5xxxkcqzzmnjhv2n4r/n0/data/0/outputs.pb
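
The sizes involved in this example can be checked with a few lines of arithmetic. This is only a sketch: the constant names are hypothetical, the 2097152-byte limit is taken from the error message above, and reading "10 MB" as 10 * 1024 * 1024 bytes is an assumption.

```python
MAX_OUTPUT_BYTES = 2_097_152                 # "max allowed" value from the error message
OFFLOAD_THRESHOLD_BYTES = 10 * 1024 * 1024   # assumption: "10 MB" read as 10 MiB


def payload_size(n: int) -> int:
    """Size in bytes of the string built by increase_size_of_arrays(n)."""
    return len(("a" * n * 1024).encode("utf-8"))


def would_offload(size_bytes: int, threshold: int = OFFLOAD_THRESHOLD_BYTES) -> bool:
    """True when a payload is larger than the offloading threshold."""
    return size_bytes > threshold


size = payload_size(11000)
print(size)                     # 11264000
print(size > MAX_OUTPUT_BYTES)  # True: too large to return inline before 1.14
print(would_offload(size))      # True: large enough to be offloaded in 1.14
```

The raw string is 11264000 bytes; the 11264029 bytes reported in the error message presumably include a small serialization overhead.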

This feature is enabled by default. If you need to turn it off, set propeller.literalOffloadingConfigEnabled to false in your Helm values.

The role you use to authenticate to your infrastructure provider will need to have read access to the metadata bucket so flytekit can retrieve the offloaded literal.

This feature won’t work if you use Flyte from a Jupyter Notebook, use fast registration (pyflyte run), or launch executions from the console. Support for these cases is a planned enhancement.

Breaking

BatchSize is removed (#2857)

This change affects MapTasks that relied on the PickleTransformer and the BatchSize class to optimize the serial uploading of big lists.
It was removed because the feature was not widely used and the asynchronous handling of pickles, introduced in this release, reduces the need for batching.

ArrayNode is not experimental anymore (#2900)

Because ArrayNode has been the default MapTask since flytekit 1.12, the feature no longer lives under flytekit.experimental.arraynode. Import it from the base package instead, for example flytekit.arraynode.

Full changelog

v1.13.14

22 Nov 22:16
01c51b9

What's Changed

Full Changelog: v1.13.13...v1.13.14

v1.14.0b6

07 Nov 19:02
3475ddc
Pre-release

What's Changed

New Contributors

Full Changelog: v1.14.0b5...v1.14.0b6

v1.13.13

05 Nov 22:33
61c066c

What's Changed

Full Changelog: v1.13.12...v1.13.13

v1.14.0b5

04 Nov 15:14
7c08d50
Pre-release

What's Changed

  • Add support for ContainerTask in PERIAN agent + os-storage parameter by @otarabai in #2867
  • Restrict Python Version Mismatch between Pickled Object and Remote Environment by @Mecoli1219 in #2848
  • Default nb task resolver msg by @cosmicBboy in #2889
  • [BUG] Blob uri isn't converted to str when source path is used as uri by @JiangJiaWei1103 in #2881

New Contributors

Full Changelog: v1.14.0b4...v1.14.0b5

v1.13.12

31 Oct 00:34
9d52cd1

What's Changed

Full Changelog: v1.13.11...v1.13.12

v1.14.0b4

29 Oct 00:29
ff2d0da
Pre-release

What's Changed

New Contributors

Full Changelog: v1.14.0b3...v1.14.0b4

v1.13.11

23 Oct 17:10
662bc66

What's Changed

Full Changelog: v1.13.9...v1.13.11

[Beta] v1.14.0b3

23 Oct 00:32
3fc51af
Pre-release

What's Changed

Full Changelog: v1.14.0b2...v1.14.0b3