Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow count worfklow executions by different filters #18

Merged

Conversation

phishermaan
Copy link
Contributor

add unit tests and fake acceptance tests.

@coveralls
Copy link

coveralls commented Nov 15, 2016

Coverage Status

Coverage increased (+1.03%) to 89.384% when pulling ef7a7ca on phishermaan:allow-count-worfklow-executions-by-state into 296c47c on Yelp:master.

Copy link

@movefast1 movefast1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good overall :) It'll be great if you can also make the naming & docstrings of functions easier for users to understand and choose from since we've got plenty of apis.

latest_start_date=latest_start_date,
)

def count_open_workflow_by_type(self, name, oldest_start_date=None, latest_start_date=None, version=None):

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this function takes in a time filter in addition to the type filter, maybe we could also reflect this behavior in its name to distinguish from count_open_workflow_by_time?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah you mean like count_open_workflow_by_type_and_time() ? I think it is fine. Well I think the function name reflects the purpose or args we really need. Since time range is an optional argument, do you think it is OK to keep it and thus have a shorter name? I think the doc string may help?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right, since they are optional params a clear docs string should suffice.

)


def _build_time_filter_dict(oldest_start_date=None, latest_start_date=None, oldest_close_date=None, latest_close_date=None):

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can probably make it much simpler by providing separate apis when filtering by start time vs close time since they are mutually exclusive. In that case we won't need the validation or the exception .

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I think separate them may be more reasonable.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the value for breaking these into separate APIs? And what's the value for providing a custom validation exception instead of utilizing the existing boto3 exceptions?

The goal of this library is to function as a thin wrapper around boto3 and providing the Amazon API in a pythonic manner. One way is to provide as close to a one-to-one mapping of the APIs provided by boto3, and to present them as pythonic as possible. This makes it easier for people to use this library as much of the documentation provided by Amazon also applies here. Adding custom validation increases cognitive load since not only is there validation by boto3, there is validation in this library too.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm I think the your point on validation part is reasonable. boto client should warn us when calling it wrongly. I will remove the new exception class.
For separating APIs by start date & close date, I think it makes more sense because start_date and close_date are mutually exclusive. A date arg is required for these boto API calls. Separating them makes users easy to understand what date args is required, like a start date is always needed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just in case that the diff is not updated, I have already stopped date validation in _build_time_filter_dict after talking with @movefast1 . Currently I only use date_tuple to get oldest_date and latest_date

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For separating APIs by start date & close date, I think it makes more sense because start_date and close_date are mutually exclusive. A date arg is required for these boto API calls. Separating them makes users easy to understand what date args is required, like a start date is always needed.

Multiplexing the API based on the inputs increases the number of functions this library exposes and complicates the library. Having lots of functions that perform the same operation make it harder for a developer to figure out which function to use, and harder for us to maintain since it increases the surface area of behavior division when we want to change the behavior of counting workflows. Again, since boto3 does validation, it'll complain if both start date and close dates are used at the same time. We could have one time input, and a flag specifying whether it's a start date, or a close date, and the rest of the filters as optional kwargs. So the function signature could look like:

def count_executions(self, latest_date, oldest_date, count_open=True, filter_by_start=True, workflow_id=None, tag=None, type=None)

Thoughts?

Copy link
Contributor Author

@phishermaan phishermaan Dec 8, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm I think it is possible to combine all the filter type in this count_executions. In that case my question is do I have to document the usage in great detail so that the user won't get confused when using count_executions? Or it is up to the user to learn by having exceptions? Like to know which filters can be used together while others cannot?

The reason that I separate count open workflow from count close workflow is their filter parameters don't really mirror each other(open workflow only take start date, while close workflow can take start & close date, and it can take a dedicated close_status filter). I could combine APIs in count_close_workflows to take one date paramter to reduce API numbers, while keep the API separated by filter type. How do you think of that?

Copy link
Contributor

@quantsini quantsini Dec 9, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having high quality documentation I think is a good idea.

You bring up a good point that Amazon separates the API into count count_open_workflow_executions and count_closed_workflow_executions, though, I still think we should minimize having to multiplex these two calls into every permutation of arguments puissant to the arguments defined by the SWF API. Looking at one function and reading a single place for documentation seems easier than having to read each function's name, then reading each docstring for each function to determine what it does.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm I think maybe we can reduce the API into 2, which is count_open_workflow, count_closed_workflow. Then with proper document it should be easy to understand.



def _build_tag_filter_dict(tag):
filter_dict = {'tag': tag}
Copy link

@movefast1 movefast1 Nov 29, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we don't have to define this var since it's used in the next line immediately after the definition.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh yeah you are right.

@coveralls
Copy link

coveralls commented Dec 1, 2016

Coverage Status

Coverage increased (+0.8%) to 89.199% when pulling 344bdb2 on phishermaan:allow-count-worfklow-executions-by-state into 296c47c on Yelp:master.

@coveralls
Copy link

coveralls commented Dec 1, 2016

Coverage Status

Coverage increased (+1.2%) to 89.539% when pulling 991fc96 on phishermaan:allow-count-worfklow-executions-by-state into 296c47c on Yelp:master.

@coveralls
Copy link

coveralls commented Dec 1, 2016

Coverage Status

Coverage increased (+1.2%) to 89.539% when pulling 53609ba on phishermaan:allow-count-worfklow-executions-by-state into 296c47c on Yelp:master.

@@ -348,7 +362,7 @@ def _count_closed_workflow_executions(

def _build_time_filter_dict(oldest_start_date=None, latest_start_date=None, oldest_close_date=None, latest_close_date=None):
"""
Build time_filter_dict for count open/closed workflow executions.
Build time_filter_dict for calls to _count_closed_workflow_executions and _count_open_workflow_executions.
Return result is a dict: {time_filter_type : filter_dict}
filter_dict must contains key 'oldestDate'.
Copy link

@movefast1 movefast1 Dec 2, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like we only removed the validations here but didn't break it into two separate functions? Did you just forget or it was intentional?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah I did separate the count by start_date and close_date. Take count_closed_workflow_by_id_and_start_time and count_closed_workflow_by_id_and_close_time by example. Since the function args make sure we only get start_date or close_date, that is why I remove the validation.

1.2.0 (2016-11-14)
++++++++++++++++++

* WorkflowClient now supports count open or closed workflows by time range and other filter
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's be more explicit on what "other filter" means.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure thing. I will be more specific on the change.

)


def _build_time_filter_dict(oldest_start_date=None, latest_start_date=None, oldest_close_date=None, latest_close_date=None):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the value for breaking these into separate APIs? And what's the value for providing a custom validation exception instead of utilizing the existing boto3 exceptions?

The goal of this library is to function as a thin wrapper around boto3 and providing the Amazon API in a pythonic manner. One way is to provide as close to a one-to-one mapping of the APIs provided by boto3, and to present them as pythonic as possible. This makes it easier for people to use this library as much of the documentation provided by Amazon also applies here. Adding custom validation increases cognitive load since not only is there validation by boto3, there is validation in this library too.

@coveralls
Copy link

coveralls commented Dec 7, 2016

Coverage Status

Coverage increased (+1.3%) to 89.691% when pulling 8f1e029 on phishermaan:allow-count-worfklow-executions-by-state into 296c47c on Yelp:master.

:param workflow_filter_dict: dict. filter type as key and filter criteria as value. filter criteria is a dictionary
:return: total number of closed workflows that meet the filter criteria
"""
if not workflow_filter_dict:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: explicitly check for is None. Also, better to inline these as kwargs, instead of having to instantiate a dictionary that will be passed in as kwargs anyway to the underlying count function.

@coveralls
Copy link

coveralls commented Dec 9, 2016

Coverage Status

Coverage decreased (-0.6%) to 87.755% when pulling 0c5607e on phishermaan:allow-count-worfklow-executions-by-state into 296c47c on Yelp:master.

@coveralls
Copy link

coveralls commented Dec 9, 2016

Coverage Status

Coverage decreased (-0.6%) to 87.755% when pulling 269a39d on phishermaan:allow-count-worfklow-executions-by-state into 296c47c on Yelp:master.

@coveralls
Copy link

coveralls commented Dec 9, 2016

Coverage Status

Coverage decreased (-2.2%) to 86.163% when pulling c8a276d on phishermaan:allow-count-worfklow-executions-by-state into 296c47c on Yelp:master.

@coveralls
Copy link

coveralls commented Dec 9, 2016

Coverage Status

Coverage decreased (-0.04%) to 88.323% when pulling 43810ca on phishermaan:allow-count-worfklow-executions-by-state into 296c47c on Yelp:master.

@coveralls
Copy link

coveralls commented Dec 9, 2016

Coverage Status

Coverage increased (+0.4%) to 88.788% when pulling 6efafc0 on phishermaan:allow-count-worfklow-executions-by-state into 296c47c on Yelp:master.

elif close_status:
workflow_filter_dict = _build_close_status_filter_dict(close_status)
else:
workflow_filter_dict = None
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if this returned an empty dict, the caller doesn't have to special case None.

@coveralls
Copy link

coveralls commented Dec 12, 2016

Coverage Status

Coverage increased (+0.4%) to 88.72% when pulling 6affa00 on phishermaan:allow-count-worfklow-executions-by-state into 296c47c on Yelp:master.

close_status=close_status,
)

if workflow_filter_dict is None:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

allowing _build_time_filter_dict to only return dictionaries would make this is None check not necessary

"""
Count the number of open workflows for a domain. You can pass in filtering criteria.
The results are best effort and may not exactly reflect recent updates and changes.
http://docs.aws.amazon.com/amazonswf/latest/apireference/API_CountOpenWorkflowExecutions.html
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

protip: our sphinx configuration is set up so that you can link to the external SWF docs. Example:
Passthrough to :meth:~SWF.Client.count_open_workflow_executions``

"""
Count the number of closed workflows for a domain. You can pass in filtering criteria.
The results are best effort and may not exactly reflect recent updates and changes.
http://docs.aws.amazon.com/amazonswf/latest/apireference/API_CountClosedWorkflowExecutions.html#SWF-CountClosedWorkflowExecutions-request-executionFilter
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto the comment on built in sphinx configuration

date_tuple = (oldest_close_date, latest_close_date)

if not date_tuple:
return None
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Returning an empty dictionary would make it easier for callers since they wouldn't need to special case a None check.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Get that


def _build_type_filter_dict(name, version):
filter_dict = {'name': name}
if version:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's unclear whether this should be checking for Noneness. Should '' (empty string) be a valid input?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah good catch. empty string is allowed.

if oldest_start_date:
filter_type = 'startTimeFilter'
date_tuple = (oldest_start_date, latest_start_date)
elif oldest_close_date:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's explicitly check for None. The default argument is None and I assume we want to change behavior based on its Noneness.

Build time_filter_dict for calls to _count_closed_workflow_executions and _count_open_workflow_executions.
Return result is a dict: {time_filter_type : filter_dict}
filter_dict must contains key 'oldestDate'.
return example:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you also include the input for this example? That would help people make the connection between the inputs and the example return. Bonus points for more examples showcasing the behavior.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure.

return None

oldest_date, latest_date = date_tuple
filter_dict = dict(oldestDate=oldest_date)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Let's consistently use {} or dict(). There's a lot of resources online about which one is better. http://stackoverflow.com/questions/2972212/creating-an-empty-list-in-python is one example.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure.

workflow_id=None,
close_status=None,
):
if workflow_name:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are these mutually exclusive? (for example, I wouldn't be able to search on the conjunction of both tag and close_status.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the docstring of count open/closed workflows there is a description saying that these filters are mutually exclusive.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

return count_closed_workflow_within_time_range(**workflow_filter_dict)


def _build_time_filter_dict(oldest_start_date=None, latest_start_date=None, oldest_close_date=None, latest_close_date=None):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this function would be smaller if it dumbly built the dictionary allowing for "invalid" configurations. We can rely on the boto3 library to raise an appropriate error.

This current implementation can have silent errors since the caller can provide startTimeFilter and closeTimeFilter and this library would assume startTimeFilter (when the caller may only meant to do closeTimeFilter). boto3 would otherwise error with invalid inputs.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If that is what you are worried about, I will try building both startTimeFilter dict and closeTimeFilter and put them in the result. In this case if user passes in both start date and close date then boto3 will get them both. This will cause exceptions and prevent silent errors.

@coveralls
Copy link

Coverage Status

Coverage increased (+0.3%) to 88.668% when pulling ad6e450 on phishermaan:allow-count-worfklow-executions-by-state into 296c47c on Yelp:master.

@coveralls
Copy link

coveralls commented Dec 13, 2016

Coverage Status

Coverage increased (+0.3%) to 88.668% when pulling 5fd5e76 on phishermaan:allow-count-worfklow-executions-by-state into 296c47c on Yelp:master.

@@ -4,4 +4,4 @@

__author__ = 'Henri Bai'
__email__ = 'hbai@yelp.com'
__version__ = '1.2.0'
__version__ = '1.2.0-a1-abcde'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's up with this?

latest_close_date=latest_close_date,
)

count_closed_workflow_within_time_range = partial(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this may read better if we don't use the partial since we're not re-using the partial, and we're just passing in **workflow_filter_dict immediately after.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we can unpack workflow_filter_dict while having the time_filter_dict unpacked as key word dict. Or am I understanding it wrong?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this specifically combining dicts? You can do this with dict.update: https://docs.python.org/2/library/stdtypes.html#dict.update
Using partial for combining **kwargs seems overkill.

@@ -112,3 +113,71 @@ def test_walk_execution_history(workflow_client, decision_client):
decision_task = poll_decision_and_respond(decision_client)
terminate_workflow(workflow_client, workflow_id)
walk_execution_history(decision_client, workflow_id, decision_task.workflow_run_id)


def test_count_open_workflow_executions(workflow_client):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add tests that assert validation errors are raised for invalid input?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm I think the boto_client we are using is a mock thus cannot raise exceptions, since we want the validation to happen on boto_client only? The integration test has covered it though if you want to have a look.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The line of code this comment is tagged is for the acceptence tests. This boto client is not a mock and actually calls out to SWF. Although our current travis ci integration doesn't run them properly due to #19

https://github.com/Yelp/pyswf/blob/master/tests/acceptance/conftest.py#L42 shows how the boto client is created in a pytest conftest under the acceptance folder.

@@ -54,3 +77,239 @@ def test_terminate_workflow(workflow_config, workflow_client, boto_client):
workflowId='workflow_id',
reason='reason',
)


def test_build_time_filter_dict(oldest_start_date, latest_start_date):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since you're already unit testing with different inputs, let's break this up into individual tests.

assert time_filter_dict['closeTimeFilter'] == {'oldestDate': start_date, 'latestDate': end_date}


def test_count_open_workflow_executions(workflow_config, workflow_client, boto_client, oldest_start_date, latest_start_date):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto breaking up the big test into smaller unit tests.

)


def test_count_closed_workflow_executions(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto breaking up the big test into smaller tests.

_assert_boto_method_called_with(boto_client.count_open_workflow_executions, workflow_config.domain, **kwargs)


def _assert_count_closed_workflow_executions_called_with(boto_client, workflow_config, **kwargs):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's remove these calls since the level of indirection isn't necessary and would make things more explicitly when asserting calls.

@coveralls
Copy link

coveralls commented Dec 13, 2016

Coverage Status

Coverage increased (+0.5%) to 88.855% when pulling c581995 on phishermaan:allow-count-worfklow-executions-by-state into 296c47c on Yelp:master.

@coveralls
Copy link

coveralls commented Dec 13, 2016

Coverage Status

Coverage increased (+0.5%) to 88.839% when pulling 8329e6c on phishermaan:allow-count-worfklow-executions-by-state into 296c47c on Yelp:master.


return self.boto_client.count_open_workflow_executions(
domain=self.workflow_client_config.domain,
startTimeFilter=start_time_filter_dict['startTimeFilter'],
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this also mimic the count_closed_workflow_executions so that we combine the dicts before unpacking?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think since there is only one type of time filter, there is no benefits of packing them together. Passing them separately makes users be clear on what args they are expecting I think.

@coveralls
Copy link

coveralls commented Dec 13, 2016

Coverage Status

Coverage decreased (-0.4%) to 88.0% when pulling 3bf180f on phishermaan:allow-count-worfklow-executions-by-state into 296c47c on Yelp:master.

@coveralls
Copy link

coveralls commented Dec 14, 2016

Coverage Status

Coverage increased (+11.09%) to 99.448% when pulling 8b51a5e on phishermaan:allow-count-worfklow-executions-by-state into 296c47c on Yelp:master.

@quantsini
Copy link
Contributor

LGTM!!

@quantsini quantsini merged commit 1f073e5 into Yelp:master Dec 14, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants