
Facing Issue while constructing the KG. #3

Open
bphariharan opened this issue Dec 24, 2024 · 9 comments

Comments

@bphariharan commented Dec 24, 2024

Hello Team,

We've gone through the repository source code and we really like it. We tried executing your construct_kg.ipynb, but we encountered an issue.

ValueError: You are trying to merge on float64 and object columns for key 'x_id'. If you wish to proceed you should use pd.concat

At this line of construct_kg.ipynb:
edge_df = new_df.merge(new_node_df.loc[new_node_df['node_source']=='PHECODE'], left_on='x_id', right_on='node_id', how='left')

We're also trying to load new_homo_hg_hms.pt from the Harvard Dataverse provided here, which gave us the error UnpicklingError: invalid load key, '?'. It would be great if you could suggest a possible solution and/or workaround so that we can continue to use it.

I tried loading the .pt file using:

torch.load(<file_path>)

and:

with open("C:/Users/admin/Downloads/new_homo_hg_hms.pt", 'rb') as file:
    data = pickle.Unpickler(file, fix_imports=True, encoding='ASCII', errors='strict', buffers=None).load()

I'm also attaching the necessary screenshots for your reference. Do let me know if you need any other information.

  1. construct_kg.ipynb error:
     [screenshot]

  2. Unpickling error:
     [screenshot]

Thanks & Regards,

@bphariharan (Author)

Hey @ruthjohnson95,
Hope you're doing well.
I would like to know if there's any way you can help us with the issue above.

Thanks & Regards,
Hariharan

@ruthjohnson95 (Collaborator)

Hi @bphariharan, thank you for bringing this to our attention. It's at the top of our list of items to address once the holiday break is over. I appreciate your patience!

@ruthjohnson95 (Collaborator)

Accidentally closed this; reopening.

@ruthjohnson95 (Collaborator)

For the unpickling error, the file is a DGL graph object, so it needs to be read using dgl.load_graphs(...). An example is below:

import dgl

# load_graphs returns (graph_list, label_dict); [0][0] takes the first graph.
homo_hg = dgl.load_graphs("/n/home01/ruthjohnson/kg_paper/construct_kg/phekg/new_homo_hg_hms.pt")
homo_hg = homo_hg[0][0]

Also see https://docs.dgl.ai/en/1.1.x/generated/dgl.graph.html
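
As a quick check that the file deserialized correctly, a minimal sketch (assuming homo_hg was loaded as in the snippet above):

# Print a summary of the loaded graph (node/edge counts and feature schemes).
print(homo_hg)
print(homo_hg.num_nodes(), homo_hg.num_edges())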

@ruthjohnson95 (Collaborator)

For the first error, it looks like new_df automatically casts the x_id column to float. We can just cast it to string type, and that should address the error.

new_df['x_id'] = new_df['x_id'].astype(str)
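
For context, a minimal sketch of how the cast fits with the merge line from the notebook (frame and column names are taken from the snippets above; the surrounding notebook setup is assumed):

# Cast the merge key to string so both sides of the join have matching dtypes.
new_df['x_id'] = new_df['x_id'].astype(str)

# The notebook's merge should then run without the float64/object mismatch.
edge_df = new_df.merge(
    new_node_df.loc[new_node_df['node_source'] == 'PHECODE'],
    left_on='x_id',
    right_on='node_id',
    how='left',
)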

Let us know if this resolves the errors. We are glad you are enjoying the software.

@bphariharan (Author) commented Jan 8, 2025

> For the first error, it looks like new_df automatically casts the x_id column to float. We can just cast it to string type, and that should address the error.
>
> new_df['x_id'] = new_df['x_id'].astype(str)
>
> Let us know if this resolves the errors. We are glad you are enjoying the software.

Hi @ruthjohnson95,
Thanks for the reply.

The unpickling error has been resolved by loading the graph with the dgl.load_graphs() method. But upon trying the fix for the first error (the type mismatch), I came across another issue.

[screenshots]

Apart from this, I have a few more queries:

  1. What tool/software did you use to visualize the graph? I've been trying to visualize it with tools such as networkx, Gephi, Plotly and Matplotlib, but because the graph is huge (close to 67k nodes and 1.3 million edges) they break after a few hours of execution. I feel this is necessary for me to validate the neighboring nodes I retrieve by providing the index of a node.
  2. While looking into the execution of construct_kg.ipynb, I also found that there's a mismatch between the CSV file versions used in the code and those given in the repository. I understand why the terminologies are provided with the _filtered suffix, but for the other CSVs, especially phecode_definitions1.1.csv, I'm not quite sure why the older version is provided in the repository.
  3. In the new_node_map_df.tab provided here, what is the meaning of the node_id column?
  4. Lastly, would it be possible for you to tell us the embedding technique used, as it will help us perform queries on the graph?

Thanks & Regards,
Hariharan B P

@bphariharan (Author)

Hey @ruthjohnson95,

Please let me know if you have any updates for us.

@ruthjohnson95 (Collaborator)

Hi @bphariharan,

Can you provide a screenshot of the dataframe after you do new_df['x_id'] = new_df['x_id'].astype(str)?

  1. Because the graph is so large, I haven't found software that can visualize the full network. To inspect a node's neighbors, you can retrieve its edges in DGL; see the sketch after this list.

  2. The 1.1 version is used to maintain continuity, since we originally constructed the KG using this version last year. Also see issue #1 (PheMAP files not exist).

  3. node_id refers to the clinical code (often numeric) that corresponds to each clinical vocabulary term. For example, the node id for the ICD9 code "Diabetes without complications" is 250.0

  4. We've provided all methodological details in the preprint here: https://www.medrxiv.org/content/10.1101/2024.12.03.24318322v2. Please feel free to reach out with any specific questions you might have.
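
For item 1, a minimal sketch of that neighbor lookup (assuming homo_hg is the graph loaded earlier with dgl.load_graphs, and node_idx is a hypothetical node index you want to inspect):

# homo_hg is assumed to be the DGLGraph loaded via dgl.load_graphs(...)[0][0].
node_idx = 0  # hypothetical node index to inspect

# Neighbors reachable via outgoing / incoming edges of node_idx.
out_neighbors = homo_hg.successors(node_idx)
in_neighbors = homo_hg.predecessors(node_idx)

# The corresponding edges as (source, destination) node ID tensors.
src, dst = homo_hg.out_edges(node_idx)

print(out_neighbors)
print(in_neighbors)

The returned tensors hold graph node indices, which you can then map back to entries in the node map file if needed.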

@bphariharan (Author)

Hi @ruthjohnson95,

Thank you for the reply.

Sure, here are the screenshots of edge_df, new_df and new_node_df:

  1. edge_df
     [screenshot]

  2. new_df
     [screenshot]

  3. new_node_df
     [screenshot]

> node_id refers to the clinical code (often numeric) that corresponds to each clinical vocabulary term. For example, the node id for the ICD9 code "Diabetes without complications" is 250.0

While going through the CSV provided, I found some exceptional cases where the CPT codes are more than 5 digits. Assuming these are also in the list of nodes, is there any reason for these exceptions?

Thanks & Regards,
Hariharan B P
