Skip to content

Commit

Permalink
Possible bug in starting_communities (#26)
Browse files Browse the repository at this point in the history
* This commit intends to address a possible bug one of our users is finding.

The user has timeseries networks and they want to use the previous timeslice's communities generated as a starting point for the next timeslice's community partitioning.

For whatever reason they're running into the one and only `panic!` macro literally in my code, which indicates we got a subnetwork with zero nodes in it. Since we carefully compress our `Clustering` object after each step I am sure that it isn't in the main leiden algorithm (or, rather, I'm **pretty** sure, but maybe past @daxpryce was the world's biggest jerk.)

Anyway, I **think** the problem, which I have been unable to replicate using synthetic erdos renyi graphs which I will commit into `graspologic` regardless of whether I can repro or not, is in this initial mediator mapping phase where I turn the provided `HashMap`/`Dict` into a `Clustering` object.  I tried to take too many short cuts, prematurely optimizing too much, and I get the sense there is an edge case I hadn't considered, and am not sure if I can repro.

Since this user's data is sensitive I'm doing the old tried and true method of making some changes and tossing a build over a wall and seeing if it sorts out their problems.

* This time with formatting applied

* No longer throw an error if we can't find a node that corresponds to the mapping provided

* cargo fmt fix again

* wow

* This changes the behavior a bit above and beyond a bug fix

* Updating our pydocs now that we no longer throw a InvalidCommunityMappingError
  • Loading branch information
daxpryce authored Nov 15, 2021
1 parent 0ab6bfa commit 0f42264
Show file tree
Hide file tree
Showing 3 changed files with 29 additions and 28 deletions.
6 changes: 3 additions & 3 deletions packages/pyo3/Cargo.toml
Original file line number Diff line number Diff line change
@@ -1,15 +1,15 @@
[package]
name = "graspologic_native"
version = "1.0.1"
version = "1.1.0"
authors = ["dwpryce@microsoft.com"]
edition = "2018"
license = "MIT"
description = "Python native companion module to the graspologic library"
readme = "README.md"

[package.metadata.maturin]
maintainer = "Dwayne Pryce"
maintainer-email = "dwpryce@microsoft.com"
maintainer = "Dax Pryce"
maintainer-email = "daxpryce@microsoft.com"
requires-python = ">=3.6,<3.11"
project-url = {"Github" = "https://github.com/microsoft/graspologic-native", "Graspologic"="https://github.com/microsoft/graspologic"}
classifier = ["Development Status :: 3 - Alpha", "License :: OSI Approved :: MIT License", "Programming Language :: Python :: Implementation :: CPython", "Programming Language :: Python :: 3.6", "Programming Language :: Python :: 3.7", "Programming Language :: Python :: 3.8", "Programming Language :: Python :: 3.9", "Topic :: Scientific/Engineering :: Mathematics"]
Expand Down
6 changes: 0 additions & 6 deletions packages/pyo3/src/lib.rs
Original file line number Diff line number Diff line change
Expand Up @@ -102,9 +102,6 @@ impl PyObjectProtocol for HierarchicalCluster {
/// :rtype: Dict[str, int]
/// :raises ClusterIndexingError:
/// :raises EmptyNetworkError:
/// :raises InvalidCommunityMappingError: The starting communities dictionary either did not include
/// a community for each node in the edge list or the edge list did not contain a node present
/// in the starting community mapping.
/// :raises InternalNetworkIndexingError: An internal algorithm error. Please report with reproduction steps.
/// :raises ParameterRangeError: One of the parameters provided did not meet the requirements in the documentation.
/// :raises UnsafeInducementError: An internal algorithm error. Please report with reproduction steps.
Expand Down Expand Up @@ -197,9 +194,6 @@ fn leiden(
/// :rtype: List[HierarchicalCluster]
/// :raises ClusterIndexingError:
/// :raises EmptyNetworkError:
/// :raises InvalidCommunityMappingError: The starting communities dictionary either did not include
/// a community for each node in the edge list or the edge list did not contain a node present
/// in the starting community mapping.
/// :raises InternalNetworkIndexingError: An internal algorithm error. Please report with reproduction steps.
/// :raises ParameterRangeError: One of the parameters provided did not meet the requirements in the documentation.
/// :raises UnsafeInducementError: An internal algorithm error. Please report with reproduction steps.
Expand Down
45 changes: 26 additions & 19 deletions packages/pyo3/src/mediator.rs
Original file line number Diff line number Diff line change
Expand Up @@ -188,29 +188,36 @@ fn communities_to_clustering(
network: &LabeledNetwork<String>,
communities: HashMap<String, usize>,
) -> Result<Clustering, PyLeidenError> {
// if node not found in internal network, bounce
// if max(communities) > len(set(communities)), collapse result
// if all nodes are mapped, cool
// if not all nodes are mapped, generate integers for each new value

let mut clustering: Clustering = Clustering::as_self_clusters(network.num_nodes());

let mut all_communities: HashSet<usize> = HashSet::new();
// best effort for using a provided community mapping and keeping generated singleton ids
// otherwise

let mut clustering: Clustering = match communities.values().max() {
Some(max_community_id) => {
// if we have values, we have the max community id, which means we'll actually create
// a cluster mapping where every node gets a singleton cluster starting after the actual
// last max cluster.
let mut cluster_mapping: Vec<usize> = Vec::with_capacity(network.num_nodes());
let start: usize = max_community_id + 1;
let end: usize = start + network.num_nodes();
cluster_mapping.extend(start..end);
Clustering::as_defined(cluster_mapping, end)
}
None => Clustering::as_self_clusters(network.num_nodes()),
};

for (node, community) in communities {
all_communities.insert(community);
let mapped_node: CompactNodeId = network
.compact_id_for(node)
.ok_or(PyLeidenError::InvalidCommunityMappingError)?;
clustering
.update_cluster_at(mapped_node, community)
.map_err(|_| PyLeidenError::InvalidCommunityMappingError)?;
let mapping: Option<CompactNodeId> = network.compact_id_for(node);
if mapping.is_some() {
let compact_node_id: CompactNodeId = mapping.unwrap();
clustering
.update_cluster_at(compact_node_id, community)
.map_err(|_| PyLeidenError::ClusterIndexingError)?;
}
}

if clustering.next_cluster_id() != all_communities.len() {
// we have some gaps, compress
clustering.remove_empty_clusters();
}
// we can't easily know if there are empty clusters, so we'll just presume there are
// and compress any gaps
clustering.remove_empty_clusters();

return Ok(clustering);
}

0 comments on commit 0f42264

Please sign in to comment.