It's taking more than 20h to sample the data #14
Comments
Same issue here.
Same issue here. It's been more than a day for 5 lakh (500,000) rows × 32 columns.
Same issue!
Me too, it's extremely slow on relatively large datasets. A CUDA implementation and/or an n_jobs option would be great.
I think I have a potential solution for this problem, and it MIGHT work for you. My problem was calling SMOGN with the default settings, without specifying anything, and that code was extremely slow. The moment I started tinkering with the parameters it got about 15 times faster: a run that used to take me 6 hours now takes only 30 minutes! Here is how I changed my code, applying SMOGN to balance the dataset; I hope similar tinkering helps you too. PS: in my project I wrote a special function to handle all missing data because I had special cases, so the drop_na_col and drop_na_row parameters are just there for good measure.
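The before/after snippets referenced in that comment did not survive in this copy of the thread, so the following is only an illustrative sketch of what an explicitly parameterized smogn.smoter call might look like. The column name "target", the file name, and every parameter value are assumptions, not the commenter's actual settings; drop_na_col and drop_na_row are the two parameters the commenter mentions by name.

```python
import pandas as pd
import smogn

# Hypothetical training data; "target" is an assumed name for the continuous response column.
df = pd.read_csv("train.csv")

# Slow version: relying entirely on the library defaults.
# df_balanced = smogn.smoter(data=df, y="target")

# Explicitly parameterized call (illustrative values, not the commenter's actual settings):
df_balanced = smogn.smoter(
    data=df,
    y="target",             # continuous response column (assumed name)
    k=5,                    # number of neighbors used when interpolating synthetic rows
    samp_method="balance",  # "balance" generates fewer synthetic rows than "extreme"
    drop_na_col=True,       # the commenter kept these two "for good measure"
    drop_na_row=True,
    rel_thres=0.80,         # relevance threshold separating "rare" from "normal" values
    rel_method="auto",      # let smogn derive the relevance function from box-plot statistics
    rel_xtrm_type="both",   # treat both tails of the distribution as rare
    rel_coef=1.50,          # box-plot coefficient used by the "auto" relevance method
)
```

The thread does not say which of these settings accounts for the reported 15x speed-up, so treat the values above as a starting point for experimentation rather than a recipe.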
Hi Nick,
I am seeing a huge runtime for my input data, which is 28K × 59.
It's been running for more than a day.
I have even standardized the input data.
Is there any possible solution?
dist_matrix: 5%|4 | 276/5671 [50:48<16:50:38, 11.24s/it]
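For a sense of scale, the ETA in that progress line is consistent with simple arithmetic over the remaining iterations (a quick check added here, not part of the original report):

```python
# Sanity check of the tqdm ETA above: 5671 total iterations,
# 276 already done, at roughly 11.24 seconds per iteration.
remaining_seconds = (5671 - 276) * 11.24
print(remaining_seconds / 3600)  # ~16.8 hours, matching the 16:50:38 estimate
```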