Changed writer and doc indices to optional in get_clusters_batch()
#202
Changes to get_clusters_batch()
If the user supplies writer and doc indices, the output data frame contains the columns 'docname', 'writer', and 'doc', plus columns for cluster assignments and graph measurements. If the user does not supply them, the output data frame contains the 'docname' column but no 'writer' or 'doc' columns. The cluster assignment and graph measurement columns are the same in both cases.
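As a minimal sketch of this optional-column behavior, the base R helper below adds 'writer' and 'doc' only when both index arguments are supplied. The helper name add_writer_doc() and the example docnames are hypothetical. It assumes each index argument is a c(start, stop) pair of character positions within 'docname' and that the ID columns should sit right after 'docname'. It is an illustration, not the handwriter implementation of get_clusters_batch().

```r
# Hypothetical helper, not the actual get_clusters_batch() code.
# Assumption: writer_indices and doc_indices are c(start, stop) character
# positions within 'docname'; both default to NULL (not supplied).
add_writer_doc <- function(df, writer_indices = NULL, doc_indices = NULL) {
  if (is.null(writer_indices) || is.null(doc_indices)) {
    # No indices: keep 'docname' and add no 'writer' or 'doc' columns
    return(df)
  }
  df$writer <- substr(df$docname, writer_indices[1], writer_indices[2])
  df$doc <- substr(df$docname, doc_indices[1], doc_indices[2])
  # Place the ID columns right after 'docname'; other columns keep their order
  id_cols <- c("docname", "writer", "doc")
  df[, c(id_cols, setdiff(names(df), id_cols))]
}

# Hypothetical cluster assignments for two documents
clusters <- data.frame(
  docname = c("w0001_s01_pLND_r01", "w0002_s01_pLND_r01"),
  cluster = c(3L, 7L),
  stringsAsFactors = FALSE
)

with_ids <- add_writer_doc(clusters, writer_indices = c(2, 5), doc_indices = c(7, 18))
without_ids <- add_writer_doc(clusters)
names(with_ids)     # "docname" "writer" "doc" "cluster"
names(without_ids)  # "docname" "cluster"
```

In this sketch the cluster columns are untouched either way; only the presence of 'writer' and 'doc' depends on the arguments.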
Changes to get_cluster_fill_counts()
get_cluster_fill_counts() now checks whether the 'writer' and 'doc' columns are present in the input data frame. If they are, the function groups by 'docname', 'writer', 'doc', and 'cluster'. If they are not, it groups by 'docname' and 'cluster' only.
Tests
I created tests for get_clusters_batch() and get_cluster_fill_counts() that cover the case where writer and doc indices are not used. I also renamed the test files to match the corresponding R files so that they are easier to find.
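As a rough illustration of the column check in get_cluster_fill_counts(), and of the no-indices case the new tests cover, the base R sketch below picks its grouping columns based on whether 'writer' and 'doc' exist, then counts rows per group with stats::aggregate(). The function name count_cluster_fills() and the input data are hypothetical. The sketch returns a long table of counts and makes no claim about the real function's output format.

```r
# Hypothetical sketch, not the actual get_cluster_fill_counts() code.
# Counts rows per group; the grouping columns depend on whether the input
# data frame has both 'writer' and 'doc' columns.
count_cluster_fills <- function(df) {
  group_cols <- c("docname", "cluster")
  if (all(c("writer", "doc") %in% names(df))) {
    group_cols <- c("docname", "writer", "doc", "cluster")
  }
  aggregate(
    list(count = rep(1L, nrow(df))),
    by = df[group_cols],
    FUN = length
  )
}

# Hypothetical input with writer and doc columns
fills <- data.frame(
  docname = c("a", "a", "b"),
  writer = c("w1", "w1", "w2"),
  doc = c("d1", "d1", "d2"),
  cluster = c(1L, 1L, 2L),
  stringsAsFactors = FALSE
)

counts_with_ids <- count_cluster_fills(fills)
counts_without_ids <- count_cluster_fills(fills[c("docname", "cluster")])
```

Checking for the columns inside the function, rather than passing a flag, lets the same function accept output from get_clusters_batch() whether or not indices were supplied.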