I have a flow running in production on a 2-minute schedule, and its artifacts and metadata are being stored. I am trying to interact with those artifacts through the client API. However, requests to /flows/<flow_id>/runs (made by instantiating Flow("FlowName")) are now failing with
Metadata request: (/flows/<flow_id>/runs) failed (code 500): {"message": "Internal server error"}
Requests to other endpoints, like flows/<flow_id>/runs/<run_id>, are going through just fine. After taking a closer look at API Gateway, I generated the same request through the console there, and got
Execution failed due to configuration error: Integration response of reported length 28729085 is larger than allowed maximum of 10485760 bytes.
Tue May 31 22:49:46 UTC 2022 : Method completed with status: 500
Basically the response payload is larger than the non-configurable 10MB limit on API Gateway.
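To put the two numbers from the error message side by side, here is a small sketch of the arithmetic. The constant and function names are only illustrative; the byte counts are copied from the log above.

```python
# Byte counts copied from the API Gateway error message above.
LIMIT_BYTES = 10 * 1024 * 1024   # 10485760, the maximum quoted in the error
RESPONSE_BYTES = 28_729_085      # reported integration response length


def exceeds_limit(payload_bytes: int, limit_bytes: int = LIMIT_BYTES) -> bool:
    """Return True when a payload is larger than the allowed maximum."""
    return payload_bytes > limit_bytes


# How many times larger the response is than the limit.
ratio = RESPONSE_BYTES / LIMIT_BYTES

print(exceeds_limit(RESPONSE_BYTES))
print(f"{ratio:.2f}x the limit")
```

So the runs listing for this flow is roughly 2.7 times the size API Gateway will pass through.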
I can get around this by requesting individual runs directly, but it would be great to still be able to use the flow APIs to interact with the child runs of the flow via the client (perhaps by adding the ability to pass filtering params in the requests, like fetching the last n runs). I'm also curious whether others have run into this when deploying flows to production where many runs are produced and stored, and whether there are any workarounds or things I am missing.
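For reference, the "request individual runs directly" workaround could look roughly like the sketch below. It assumes the run ID is already known from somewhere else (a scheduler log, for example); `run_pathspec` and `load_run` are hypothetical helper names, not part of Metaflow. Only `Run("<flow_name>/<run_id>")` is the actual client call.

```python
def run_pathspec(flow_name: str, run_id) -> str:
    """Build the "<flow_name>/<run_id>" pathspec that Run() accepts."""
    return f"{flow_name}/{run_id}"


def load_run(flow_name: str, run_id):
    """Fetch one run by ID, without listing every run of the flow.

    This goes through flows/<flow_id>/runs/<run_id>, which still works,
    instead of the oversized flows/<flow_id>/runs listing.
    """
    # Imported lazily so the pathspec helper can be used without Metaflow.
    from metaflow import Run

    return Run(run_pathspec(flow_name, run_id))


# Example (needs a configured Metaflow environment and a real run ID):
# run = load_run("FlowName", "<run_id>")
```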
For others who have run across this issue, one workaround is to use the ui-backend interface (assuming you're using the default Outerbounds Terraform module), which has much richer server-side filtering features. The original cause of the issue above is that the Metaflow client requests all runs in bulk from the metadata-service API, which lacks filtering features, and then does the filtering client-side.
E.g. to get the latest run for a given flow with given tags:
import requests
from metaflow import Run

response = requests.get(f"https://<api_gateway_hostname>/api/runs?_order=-ts_epoch&_limit=30&_group_limit=31&_tags=<tags>&flow_id=<flow_name>&_page=1").json()
if not response['data']:
    print("No data")
else:
    # Results are ordered by -ts_epoch, so the first entry is the latest run.
    run_id = response['data'][0]['run_id']
    run = Run(f"{flow_name}/{run_id}")
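If the placeholders need to be filled in programmatically, the query string can be built with `urllib.parse.urlencode` rather than by hand. This is only a sketch: `latest_runs_url` is a hypothetical helper, it keeps just a subset of the query parameters from the request above, and it treats the tags value as an opaque string because the exact format the ui-backend expects is not spelled out here.

```python
from urllib.parse import urlencode


def latest_runs_url(hostname: str, flow_name: str, tags: str, limit: int = 30) -> str:
    """Build a ui-backend /api/runs URL that returns the newest runs first."""
    params = {
        "_order": "-ts_epoch",  # newest first, as in the request above
        "_limit": limit,        # cap the number of runs returned
        "_tags": tags,          # passed through as-is (format is an assumption)
        "flow_id": flow_name,
        "_page": 1,
    }
    return f"https://{hostname}/api/runs?{urlencode(params)}"


# Example with made-up values:
print(latest_runs_url("example.invalid", "FlowName", "prod", limit=1))
```

Keeping the limit small means the response stays far below the 10485760-byte maximum no matter how many runs the flow has accumulated.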
Are there any thoughts on making the metadata API adopt the server-side filtering behavior of the UI backend API?