# Changed writer and doc indices to optional in `get_clusters_batch()` #202

Merged: 1 commit, Nov 6, 2024

Author: stephaniereinders (Member)

## Changes to `get_clusters_batch()`
If the user supplies writer and doc indices, the output data frame contains the columns 'docname', 'writer', and 'doc', plus the columns for cluster assignments and graph measurements. If the user does not supply writer and doc indices, the output data frame has the 'docname' column but no 'writer' or 'doc' columns. The cluster assignment and graph measurement columns are the same in both cases.
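
A minimal Python sketch of the column behavior described above; it is not the package's R code. The function name `label_documents_sketch` is hypothetical, and the sketch assumes `writer_indices` and `doc_indices` are (start, stop) character positions within `docname`, 1-based and inclusive like R's `substr()`. Only the identification columns are modeled; the cluster assignment and graph measurement columns, which are the same either way, are omitted.

```python
def label_documents_sketch(docnames, writer_indices=None, doc_indices=None):
    """Build one record per document (hypothetical helper, not the package API).

    'writer' and 'doc' are added only when both index pairs are supplied.
    Index pairs are assumed to be 1-based, inclusive (start, stop) positions,
    mirroring R's substr().
    """
    use_indices = writer_indices is not None and doc_indices is not None
    records = []
    for name in docnames:
        record = {"docname": name}
        if use_indices:
            w_start, w_stop = writer_indices
            d_start, d_stop = doc_indices
            # Convert 1-based inclusive positions to a Python slice.
            record["writer"] = name[w_start - 1:w_stop]
            record["doc"] = name[d_start - 1:d_stop]
        records.append(record)
    return records


# Hypothetical document name: writer ID in characters 1-5, doc ID in 7-9.
print(label_documents_sketch(["w0001_s01"], writer_indices=(1, 5), doc_indices=(7, 9)))
# [{'docname': 'w0001_s01', 'writer': 'w0001', 'doc': 's01'}]
print(label_documents_sketch(["w0001_s01"]))
# [{'docname': 'w0001_s01'}]
```

Treating the two index arguments as "both or neither" matches the PR's description of the two output shapes; how the real function handles only one of them being supplied is not covered here.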

## Changes to `get_cluster_fill_counts()`
`get_cluster_fill_counts()` now checks whether the 'writer' and 'doc' columns are present in the input data frame. If they are present, the function groups by them, along with 'docname' and 'cluster'. If they are not present, the function groups by 'docname' and 'cluster' alone.
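
The conditional grouping can be sketched with the Python standard library, representing the data frame as a list of dicts. This is an illustration under assumptions, not the R implementation: `cluster_fill_counts_sketch` is a hypothetical name, and it returns raw counts per group rather than the layout the R function produces.

```python
from collections import Counter


def cluster_fill_counts_sketch(rows):
    """Count rows per group (hypothetical helper, not the package API).

    Groups by docname, writer, doc, and cluster when the 'writer' and 'doc'
    columns exist; otherwise groups by docname and cluster alone.
    """
    # Column presence is checked on the first row; empty input has no columns.
    has_indices = bool(rows) and "writer" in rows[0] and "doc" in rows[0]
    if has_indices:
        group_cols = ("docname", "writer", "doc", "cluster")
    else:
        group_cols = ("docname", "cluster")
    counts = Counter(tuple(row[col] for col in group_cols) for row in rows)
    return group_cols, dict(counts)


rows_no_indices = [
    {"docname": "doc1", "cluster": 3},
    {"docname": "doc1", "cluster": 3},
    {"docname": "doc1", "cluster": 7},
    {"docname": "doc2", "cluster": 3},
]
print(cluster_fill_counts_sketch(rows_no_indices))
# (('docname', 'cluster'), {('doc1', 3): 2, ('doc1', 7): 1, ('doc2', 3): 1})
```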

## Tests
I created tests for `get_clusters_batch()` and `get_cluster_fill_counts()` when writer and doc indices are not used.

I also renamed the test files to match the corresponding R files to make finding things easier.

stephaniereinders merged commit e15d397 into master on Nov 6, 2024. 1 check passed.

stephaniereinders deleted the 201-clusters-no-indices branch on November 6, 2024 at 23:36.
Development: successfully merging this pull request may close the issue "Change `get_clusters_batch()` to not require writer and doc indices".