Releases: flyteorg/flytekit
v1.13.15
What's Changed
- Backport to v1.13 - Set map task metadata only for subnode by @eapolinario in #2993
Full Changelog: v1.13.14...v1.13.15
v1.14.0
Flytekit 1.14.0 release notes
Important
flytekit 1.14.0 is not forward compatible with old flytekit if you are using dict, dataclass, or pydantic BaseModel.
Added
Introducing native support for Python dictionaries, dataclass, and Pydantic BaseModel (#2760)
Flyte has used JSON strings inside a Protobuf Struct to serialize certain data structures, but this approach has posed some challenges. For example, if you passed a Python dict with int elements, they were converted to float, the only numeric type supported by Protobuf’s Struct. This led users to write boilerplate code or implement other suboptimal solutions to circumvent these limitations.
With this release, flytekit adopts by default the MessagePack format to serialize dataclass, Pydantic’s BaseModel, and Python’s dict.
Before:
from flytekit import task

@task
def t1() -> dict:
    ...
    return {"a": 1}  # Protobuf Struct {"a": 1.0}

@task
def t2(d: dict):
    print(d["a"])  # this prints 1.0
After:
from flytekit import task

@task
def t1() -> dict:  # Literal(scalar=Scalar(binary=Binary(value=b'msgpack_bytes', tag="msgpack")))
    ...
    return {"a": 1}  # Protobuf Binary value=b'\x81\xa1a\x01', produced by msgpack

@task
def t2(d: dict):
    print(d["a"])  # this prints 1
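To make the byte string in the comment above concrete, here is a minimal sketch in plain Python with no flytekit or msgpack dependency. `pack_small` is a hypothetical helper that implements only the three MessagePack encodings needed for `{"a": 1}` (fixmap, fixstr, positive fixint), and `struct_like` merely imitates how a Protobuf Struct stores every number as a double. Neither is flytekit's actual code.

```python
def pack_small(obj):
    """Encode a tiny MessagePack subset: ints 0..127, short strings, small dicts."""
    if isinstance(obj, bool):
        raise TypeError("bool is outside this sketch")
    if isinstance(obj, int) and 0 <= obj <= 0x7F:
        return bytes([obj])  # positive fixint: the byte is the value itself
    if isinstance(obj, str):
        raw = obj.encode("utf-8")
        if len(raw) <= 31:
            return bytes([0xA0 | len(raw)]) + raw  # fixstr: 101xxxxx + bytes
    if isinstance(obj, dict) and len(obj) <= 15:
        out = bytes([0x80 | len(obj)])  # fixmap: 1000xxxx + key/value pairs
        for key, value in obj.items():
            out += pack_small(key) + pack_small(value)
        return out
    raise TypeError(f"unsupported in this sketch: {obj!r}")


def struct_like(d):
    """Imitate a Protobuf Struct, whose only numeric field is a double."""
    return {k: float(v) if isinstance(v, int) else v for k, v in d.items()}


print(struct_like({"a": 1}))  # {'a': 1.0}
print(pack_small({"a": 1}))   # b'\x81\xa1a\x01'
```

The integer survives the MessagePack encoding as a single byte, whereas the Struct-style path has no way to record that the value was ever an int.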
Warning: this change is backwards-compatible only. It means that tasks registered with flytekit>=1.14 can use results produced by older versions of flytekit but not vice-versa unless you set the FLYTE_USE_OLD_DC_FORMAT environment variable to true. If you try to reference flytekit>=1.14 tasks from flytekit<1.14 downstream tasks, you will get a TypeTransformerFailedError.
Considerations before upgrading
To experience the benefits of this change, you have to upgrade both flytekit and the flyte backend to 1.14. You can plan this in phases, starting with upgrading flyte to 1.14, which is compatible with older flytekit releases but won't let you leverage the new serialization format.
| flytekit version | flyte version | Result |
|---|---|---|
| <1.14 | <1.14 | Old serialization format |
| <1.14 | >=1.14 | Compatible but old serialization format |
| >=1.14 | <1.14 | Compatible but old serialization format |
| >=1.14 | >=1.14 | NEW serialization format! |
By introducing a new serialization format (which you will see in the Flyte console as msgpack), Flyte enables you to leverage robust data structures without writing glue code or sacrificing accuracy or reliability.
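The compatibility table above can be restated as a small lookup, which may help when planning a phased upgrade. This is only a sketch of the table's logic: `parse` and `serialization_format` are hypothetical names, not flytekit functions, and the comparison looks at major and minor version only.

```python
def parse(version):
    """Turn a version such as '1.13.15' into (1, 13) for a major/minor comparison."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def serialization_format(flytekit_version, flyte_version):
    """Return which format the table predicts for a flytekit/flyte pairing."""
    new_sdk = parse(flytekit_version) >= (1, 14)
    new_backend = parse(flyte_version) >= (1, 14)
    # Only the combination of a new SDK and a new backend yields msgpack.
    return "msgpack" if new_sdk and new_backend else "old"


print(serialization_format("1.13.15", "1.14.0"))  # old
print(serialization_format("1.14.0", "1.14.0"))   # msgpack
```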
Notebooks support (#5907)
Now you can consume Flyte from a Jupyter Notebook (or any other Notebook) without resorting to any plugin. When you use FlyteRemote, Flyte automatically detects requests coming from a Notebook environment and executes them accordingly, giving Notebook users access to execution outputs, versioning, reproducibility, and all the infrastructure abstractions that Flyte provides.
Learn how it works in this blog by main contributor @mecoli1219.
Currently, @dynamic workflows are not supported in Notebooks. This is a planned enhancement as part of the improved eager mode, coming out early next year.
Flyte now leverages asyncio to speed up executions (#2829)
Both the type engine and the data persistence layer have been updated to support asynchronous, non-blocking I/O operations. These changes aim to improve the performance and scalability of I/O-bound operations. Examples include tasks that return large lists of FlyteFiles, which used to be serialized in batches but now benefit from better performance without any code changes.
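As a rough illustration of why non-blocking I/O helps with something like a large list of files, the sketch below uses only the standard library. `upload` is a hypothetical stand-in for a network call (it just sleeps), and the bucket name is a placeholder; this is not how flytekit's type engine is implemented.

```python
import asyncio


async def upload(name, stats):
    """Pretend to upload one file, tracking how many uploads are in flight."""
    stats["active"] += 1
    stats["peak"] = max(stats["peak"], stats["active"])
    await asyncio.sleep(0.05)  # stands in for a network round trip
    stats["active"] -= 1
    return f"s3://my-s3-bucket/{name}"


async def upload_all(names):
    """Start every upload at once and wait for all of them together."""
    stats = {"active": 0, "peak": 0}
    uris = await asyncio.gather(*(upload(n, stats) for n in names))
    return uris, stats["peak"]


uris, peak = asyncio.run(upload_all([f"file_{i}.txt" for i in range(5)]))
print(peak)  # 5
```

All five simulated uploads overlap, so the total wait is close to one round trip rather than five.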
Changed
Offloading of literals (#2872)
Flyte automates data movement between tasks using gRPC as the communication protocol. When users need to move large amounts of data or use MapTasks that produce a large literal collection output, they typically hit a limit in the payload size gRPC can handle, getting an error like the following:
[LIMIT_EXCEEDED] limit exceeded. 2.903926mb > 2mb
This has forced users to split up MapTasks, refactor their workflows to offload outputs to a FlyteFile or FlyteDirectory rather than returning literal values directly, or bump up the storage.limits.maxDownloadMBs parameter to arbitrary sizes, leading to inconvenient or hard-to-maintain solutions.
For example, before upgrading flytekit, a simple workflow like the following:
import flytekit as fl

@fl.task
def print_arrays(arr1: str) -> None:
    print(f"Array 1: {arr1}")

@fl.task
def increase_size_of_arrays(n: int) -> str:
    arr1 = 'a' * n * 1024
    return arr1

# Workflow: Orchestrate the tasks
@fl.workflow
def simple_pipeline(n: int) -> int:
    arr1 = increase_size_of_arrays(n=n)
    print_arrays(arr1=arr1)
    return 2

if __name__ == "__main__":
    print(f"Running simple_pipeline() {simple_pipeline(n=11000)}")
Fails with the following message:
output is too large [11264029] bytes, max allowed [2097152] bytes
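The numbers in that message can be checked with a few lines of plain Python. The string built by the example is 11000 × 1024 bytes, and the limit quoted in the error is exactly 2 MiB; the 29 bytes by which the reported size exceeds the raw string length are presumably serialization overhead, which is an assumption on our part.

```python
n = 11000
payload = "a" * n * 1024
size = len(payload.encode("utf-8"))  # ASCII characters take one byte each
limit = 2 * 1024 * 1024              # the "max allowed" figure in the error

print(size, limit, size > limit)  # 11264000 2097152 True
print(11264029 - size)            # 29
```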
flytekit >=1.14 automatically offloads to blob storage any object larger than 10 MB (the gRPC limit), allowing you to manage larger data and achieve higher degrees of parallelism effortlessly while continuing to use literal values.
After upgrading to 1.14, the above example runs and the outputs are stored in the metadata bucket:
s3://my-s3-bucket/metadata/propeller/flytesnacks-development-af5xxxkcqzzmnjhv2n4r/n0/data/0/outputs.pb
This feature is enabled by default. If you need to turn it off, set propeller.literalOffloadingConfigEnabled to false in your Helm values.
The role you use to authenticate to your infrastructure provider will need to have read access to the metadata bucket so flytekit can retrieve the offloaded literal.
This feature won’t work if you use Flyte from a Jupyter Notebook, use fast registration (pyflyte run), or launch executions from the console. Support for these cases is a planned future enhancement.
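Conceptually, the offloading described in this section is a size check followed by a write to storage, with only a reference travelling in the message. The sketch below illustrates that idea with a local temporary directory standing in for the metadata bucket. `maybe_offload`, `resolve`, the dictionary keys, and the threshold value are all hypothetical and chosen for illustration; they do not reflect Flyte's internal data structures or configuration.

```python
import hashlib
import os
import tempfile


def maybe_offload(payload: bytes, store_dir: str, threshold: int):
    """Keep small payloads inline; write large ones to storage and return a reference."""
    if len(payload) <= threshold:
        return {"inline": payload}
    key = hashlib.sha256(payload).hexdigest()
    path = os.path.join(store_dir, key)
    with open(path, "wb") as f:
        f.write(payload)
    return {"offloaded_uri": path, "size_bytes": len(payload)}


def resolve(literal):
    """Return the payload, reading it back from storage if it was offloaded."""
    if "inline" in literal:
        return literal["inline"]
    with open(literal["offloaded_uri"], "rb") as f:
        return f.read()


with tempfile.TemporaryDirectory() as store:
    small = maybe_offload(b"tiny", store, threshold=1024)
    large = maybe_offload(b"a" * 4096, store, threshold=1024)
    print("inline" in small, "offloaded_uri" in large)  # True True
    print(len(resolve(large)))                           # 4096
```

The reader needs access to the storage location to resolve an offloaded value, which mirrors the note above about granting read access to the metadata bucket.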
Breaking
BatchSize is removed (#2857)
This change affects MapTasks that relied on the PickleTransformer and the BatchSize class to optimize the serial uploading of big lists.
It was removed because the feature was not widely used and the asynchronous handling of pickles, introduced in this release, reduces the need for batching.
ArrayNode is not experimental anymore (#2900)
Because ArrayNode has been the default MapTask since flytekit 1.12, the feature is no longer under flytekit.experimental.arraynode; use it as a base import instead, such as flytekit.arraynode.
Full changelog
- Fix array node map task for offloaded literal by @pmahindrakar-oss in #2772
- Support default label/annotation for the default launch plan creating from workflow definition by @Mecoli1219 in #2776
- [FlyteClient][FlyteDeck] Get Downloaded Artifact Signed URL via Data Proxy by @Future-Outlier in #2777
- Expose Options in Flytekit for Direct User Access by @Mecoli1219 in #2785
- Adds a simple async utilitiy that managers an async loop in another thread by @thomasjpfan in #2784
- Adds a random DOCSEARCH_API_KEY to get monodocs build to succeed by @thomasjpfan in #2787
- Binary IDL With MessagePack by @Future-Outlier in #2760
- Pickle remote task for Jupyter Notebook Environment by @Mecoli1219 in #2733
- Fix getting started link, remove extra parenthesis by @deepyaman in #2788
- update bigquery plugin reqs by @dansola in #2790
- Related to flyteorg/flyte#5805 [Flyte Deck] Extras has been added by @101rakibulhasan in #2786
- Async type engine by @wild-endeavor in #2752
- add support for mapping over remote launch plans by @pvditt in #2761
- Fixes boundary conditions for literal convertor by @kumare3 in #2596
- Fix assertion in test_type_engine_binary_idl by @thomasjpfan in #2801
- Add unit test for pickling by @wild-endeavor in #2805
- Update task.py by @RaghavMangla in #2791
- Instance generic empty case (#2802) by @wild-endeavor in #2807
- ensure a space is added if both args are set in ImageSpec by @blaketastic2 in #2806
- add links to register by @dansola in #2804
- Fix mypy errors caught in 1.11.2 by @eapolinario in #2808
- Fix dependabot alerts as of 2024-10-11 by @eapolinario in #2809
- Run active launchplan when available to launch, else run the latest one by @kumare3 in #2796
- Revise Pickle Remote Task for Jupyter Notebook Environment by @Mecoli1219 in https://gith...
v1.13.14
What's Changed
- Offload literals (#2872) by @eapolinario in #2950
Full Changelog: v1.13.13...v1.13.14
v1.14.0b6
What's Changed
- Remove _fix_structured_dataset_type to deprecate python 3.8 by @Future-Outlier in #2893
- Agent - missing type hint by @wild-endeavor in #2896
- Map/setup exec by @wild-endeavor in #2898
- Async/exists check should use async function by @wild-endeavor in #2901
- [Client][API] get control plane version by @Future-Outlier in #2874
- Remove array node map task from experimental by @eapolinario in #2900
- Make it easier to use commands with uv by @thomasjpfan in #2897
- Kill the vscode server itself when resume the task by @pingsutw in #2890
- Disable pytest live logs by @eapolinario in #2905
- [MSGPACK IDL] Gate feature by setting ENV by @Future-Outlier in #2894
- pod template inplace operations by @dansola in #2899
- Type Mismatching while Serializing Dataclass with Union by @mao3267 in #2859
- [Core feature] Flytekit should support unsafe mode for types by @Mecoli1219 in #2419
- Adds support for wait for execution with a configurable interval by @kumare3 in #2913
- [Housekeeping] stop support the python3.8 by @Terryhung in #2909
New Contributors
- @Terryhung made their first contribution in #2909
Full Changelog: v1.14.0b5...v1.14.0b6
v1.13.13
What's Changed
- Agents - missing type hint (#2896) by @wild-endeavor in #2902
- Map/setup exec (#2898) by @wild-endeavor in #2903
- Add top-level access to FlyteRemote, FlyteFile, and FlyteDirectory and convenience class methods for FlyteRemote (#2836) by @eapolinario in #2904
Full Changelog: v1.13.12...v1.13.13
v1.14.0b5
What's Changed
- Add support for ContainerTask in PERIAN agent + os-storage parameter by @otarabai in #2867
- Restrict Python Version Mismatch between Pickled Object and Remote Envrionment by @Mecoli1219 in #2848
- Default nb task resolver msg by @cosmicBboy in #2889
- [BUG] Blob uri isn't converted to str when source path is used as uri by @JiangJiaWei1103 in #2881
New Contributors
- @JiangJiaWei1103 made their first contribution in #2881
Full Changelog: v1.14.0b4...v1.14.0b5
v1.13.12
What's Changed
- fix enum type assertion with python versions less than 3.12 (#2873) by @eapolinario in #2880
Full Changelog: v1.13.11...v1.13.12
v1.14.0b4
What's Changed
- [TypeTransformer] Support frozen dataclasses by @Future-Outlier in #2823
- add class methods, unit tests for flytefile and flytedirectory by @granthamtaylor in #2852
- Remove pickle batching by @wild-endeavor in #2857
- Update comments in _make_dataclass_serializable by @mao3267 in #2856
- Added V5E tpu and slices to accelerators by @pryce-turner in #2838
- add __hash__ method to FlyteFile to fix bug during interactive mode by @granthamtaylor in #2853
- Updated jupyter interaction by @kumare3 in #2858
- [Docs] Flytekit README link not working in the File an Issue section by @400Ping in #2864
- Show traceback by default by @pingsutw in #2862
- Support Identifier in generate_console_url by @thomasjpfan in #2868
- Support overriding node metadata for array node by @pvditt in #2865
- Fix Jupyter Versioning by @Mecoli1219 in #2866
- improved output handling in notebooks by @kumare3 in #2869
- Restrict Eager Task for Interactive Mode by @Mecoli1219 in #2871
- Async/Batching of coroutines by @wild-endeavor in #2855
- fix enum type assertion with python versions less than 3.12 by @dansola in #2873
- Pydantic Transformer V2 by @Future-Outlier in #2792
Full Changelog: v1.14.0b3...v1.14.0b4
v1.13.11
What's Changed
- Backport 2845 v1.13 by @wild-endeavor in #2851
- No-op commit to trigger new version by @eapolinario in #2854
Full Changelog: v1.13.9...v1.13.11
[Beta] v1.14.0b3
What's Changed
- Supports importing modules in current path by @kumare3 in #2830
- Enable Resolve Attr Path for List or Dict of Promise by @Mecoli1219 in #2828
- Adds actual current working directory path by @thomasjpfan in #2832
- Catch mistake in structured dataset by @wild-endeavor in #2834
- Small change to clean up unit test. by @wild-endeavor in #2835
- Fix tree printing by @wild-endeavor in #2837
- handle case where error may not have args by @blaketastic2 in #2831
- Bump pyspark from 3.3.1 to 3.3.2 in /plugins/flytekit-greatexpectations by @dependabot in #2818
- Pull secrets from environment when running locally by @thomasjpfan in #2800
- Support executing launchplans from CLI by @kumare3 in #2839
- Add top-level access to FlyteRemote, FlyteFile, and FlyteDirectory and convenience class methods for FlyteRemote by @granthamtaylor in #2836
- Config for_endpoint doesn't respect config file by @wild-endeavor in #2843
- Union/enum handling by @wild-endeavor in #2845
- update docs for FlyteRemote by @granthamtaylor in #2847
- add great_tables renderer by @cosmicBboy in #2846
- Restrict Dynamic Workflow for Interactive Mode by @Mecoli1219 in #2849
- Async/data persistence by @wild-endeavor in #2829
Full Changelog: v1.14.0b2...v1.14.0b3