-
Notifications
You must be signed in to change notification settings - Fork 256
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Csv format #138
base: master
Are you sure you want to change the base?
Csv format #138
Conversation
src/common.cu
Outdated
if (!isMainThread()){ | ||
return; | ||
} | ||
persistanceMode = csvName != ""; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd rather call everything outputMode
rather than persistanceMode
. Not sure "persistance" would be well understood.
Is there a strong reason to use hierarchical columns? That seems like it might be unique to PANDAS, it certainly isn't universal among relational data models, and my preference would be to keep the CSV as simple as possible. My proposal is to only have singular busbw and algbw columns naked of qualifications for in/out of place, and add an additinoal two-valued column to indicate in/out of place. This would result in there being different rows for the in-place and out of place runs. Example:
I think this fits the conceptual mapping of independent/dependent variables as columns better, since "in-place busbw" isn't a different measurement than "out-of-place busbw", the measurement is just "busbw" ran under two different setups which are described by the independent variables. |
Good point. I wanted to keep it as close to stdout table as possible and I remember working with hierarchical columns in R (I think that hierarchical columns are just general concept with tables and not related to a python library), the idea is that users can the reshuffle dataframe however they want it. If there's not going to be anyone else against this I will move to schema you propose and remove column with units. Just note that there's a lot of columns you didn't include in your example (parameters for the nccl-tests) and I'll keep them in - they were actually also a reason why I didn't look further as user has to normalize the table either way to put it in relational DB and then might as well keep it close to stdout table - but I agree your proposal is a good middle ground. |
Any change of this getting merged? Parsing logs to generate CSV is a bit cumbersome. This would be a very welcome change imo. |
Adding CSV format support as discussed in #120.
cc: @sjeaugey @jbachan I don't have permissions to add reviewers
Example CSV