convert_graph_formats() breaks for bipartite networkx graphs, where the nodes are neither strings nor ints #241
Comments
The root cause is that the logic inside the bipartite case branch was treating nodes that are just hashable (but neither strings nor ints) as strings. The fix is to properly handle such nodes by maintaining a mapping from strings to corresponding nodes:
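A minimal sketch of that mapping idea in plain Python, not the actual patch: it assumes the bipartite side of each node is already known, and the helper name `bipartite_types_by_name` and the tuple nodes are hypothetical. It builds the string labels an igraph-style vertex list would carry, keeps a dictionary from each label back to its original node, and uses that dictionary to look up the `type` values.

```python
def bipartite_types_by_name(side_of_node):
    """Return (names, types) for a vertex list keyed by string labels.

    side_of_node maps each hashable node to its bipartite side (0 or 1).
    Hypothetical helper for illustration; not part of cdlib.
    """
    name_to_node = {}
    for node in side_of_node:
        name = str(node)
        # Two distinct nodes may share a string form; fail loudly if so.
        if name in name_to_node:
            raise ValueError(f"duplicate string label {name!r}")
        name_to_node[name] = node

    names = list(name_to_node)
    # Look up the side through the original node, not through the string.
    types = [bool(side_of_node[name_to_node[name]]) for name in names]
    return names, types


# Nodes are tuples here: hashable, but neither str nor int.
sides = {("u", 1): 0, ("u", 2): 0, ("v", 1): 1}
names, types = bipartite_types_by_name(sides)
```

The explicit duplicate check matters because nothing guarantees that distinct objects have distinct string representations.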
…that are neither strings nor ints
Hi, the main reason for the current node type constraints is that cdlib accommodates several graph representations (networkx, graph_tool, igraph, ad-hoc ones...). Allowing generic objects as nodes complicates some cross-representation transformations and could affect downstream visualization/evaluation tasks that require printable node labels: we cannot assume that all objects will come with well-defined and unique string reprs. Moreover, using complex objects as nodes usually generates a significant memory footprint: if such information is not needed during computation (and in CD it rarely is), it is better to keep it separate from the graph topology (and perhaps use it for post-process analysis).
Thank you! I understand that cdlib needs to work with diverse graph representations, and I agree that working with complex objects can have an adverse memory impact, but real-world applications using cdlib will often already deal with complex node objects. As a library, cdlib will be more "ergonomic" if it can consume such objects as is. The good thing is that this should be relatively easy for cdlib to do, by taking advantage of the fact that node objects are hashable, by simply invoking
I already see similar code in cdlib:
which makes me suspect the approach I'm suggesting should be easy to apply. Let me know what you think! I'm actually willing to help with this as well.
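To make the suggestion concrete, here is a minimal sketch in plain Python of how hashability alone is enough to translate arbitrary nodes to integers and back. The helper `index_nodes`, the example edges, and the made-up partition are assumptions for illustration, not cdlib code.

```python
def index_nodes(nodes):
    """Assign a consecutive integer id to each distinct hashable node."""
    node_to_id = {}
    for node in nodes:
        # setdefault keeps the first id given to a node that repeats.
        node_to_id.setdefault(node, len(node_to_id))
    id_to_node = list(node_to_id)  # position i holds the node with id i
    return node_to_id, id_to_node


# Tuple nodes: hashable, but neither str nor int.
edges = [(("user", "ann"), ("item", 7)), (("user", "bob"), ("item", 7))]
node_to_id, id_to_node = index_nodes(n for edge in edges for n in edge)

# Integer-only edge list that an integer-based backend could consume.
int_edges = [(node_to_id[u], node_to_id[v]) for u, v in edges]

# Stand-in for the output of an integer-only algorithm (made-up result).
int_communities = [[0, 1, 2]]

# Translate the result back to the caller's original node objects.
communities = [[id_to_node[i] for i in c] for c in int_communities]
```

Because the translation lives in two small lookup tables, the complex objects never need to enter the backend graph, which also addresses the memory concern raised above.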
There is a similar issue with other methods such as walktrap: when returning the NodeClustering object, we create a Clustering object that attempts to cast each node back from a string into an integer. Whenever non-integer nodes are used, this causes an error. It might be a good idea to disallow non-integer nodes if you do not support them, since it takes a bit of digging before you realize that this is the root cause. One possible solution is a dictionary that maps the hashes of nodes to the nodes themselves, so that the hashes are used in place of the actual nodes.
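The "fail early" option could look roughly like the sketch below. The function name `check_node_types` and the allowed types are hypothetical and only illustrate the idea of rejecting unsupported nodes up front with a clear message, rather than failing later during a string-to-integer cast.

```python
def check_node_types(nodes, allowed=(int, str)):
    """Raise TypeError for the first node whose type is not in `allowed`.

    Hypothetical validation helper; not part of cdlib.
    """
    for node in nodes:
        if not isinstance(node, allowed):
            expected = ", ".join(t.__name__ for t in allowed)
            raise TypeError(
                f"unsupported node {node!r} of type {type(node).__name__}; "
                f"expected one of: {expected}"
            )


check_node_types([1, 2, "a"])  # passes silently

try:
    check_node_types([1, ("user", "ann")])
    message = None
except TypeError as exc:
    message = str(exc)
```

A check like this runs in a single pass over the nodes and points directly at the offending node, so the root cause does not have to be rediscovered from a downstream traceback.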
Describe the bug
When the input graph for the method cdlib.utils.convert_graph_formats() is a networkx graph where the node type is neither a string nor an int (but is hashable, as networkx expects), and the graph is bipartite, this method fails when converting to igraph format with the following error when attempting to populate the gi.vs["type"] value:
To Reproduce
Steps to reproduce the behavior:
Here's a script (test.py) to repro the issue:
Simply run with:
Expected behavior
I'd expect that call to convert_graph_formats() to work.
Screenshots
NA
Additional context
I also have a fix that I'm working on! I'll produce a PR shortly.