Add a new class "Scaling" under processing. #8

Open
sagnik1511 opened this issue Jan 23, 2022 · 4 comments
Labels
enhancement New feature or request JWOC This issue/pull request will be considered for JWOC 2k22. medium Points will be: 3(1st Phase), 4(2nd Phase). 2-3 days will be allotted.

Comments

@sagnik1511
Owner

sagnik1511 commented Jan 23, 2022

  1. Create a new class, "Scaling", under the processing module.
  2. Write its functions with a clear purpose and add appropriate comments.
  3. Add a function "run" inside the "Scaling" class that goes through every feature, e.g. link.
  4. Add the function under the class Preprocessing.
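
The steps above could be sketched roughly as follows. This is only an illustration, not the project's code: the dict-of-lists data layout, the method names ("minmax", "standard", "none"), and the `per_column` argument are all assumptions, and wiring the class into Preprocessing is left out because that class is not shown in this thread.

```python
from typing import Callable, Dict, List, Optional

Column = List[float]


def min_max(values: Column) -> Column:
    """Rescale values to [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(values), max(values)
    span = hi - lo
    return [0.0 if span == 0 else (v - lo) / span for v in values]


def standardize(values: Column) -> Column:
    """Shift to zero mean and unit (population) standard deviation."""
    mean = sum(values) / len(values)
    std = (sum((v - mean) ** 2 for v in values) / len(values)) ** 0.5
    return [0.0 if std == 0 else (v - mean) / std for v in values]


class Scaling:
    """Hypothetical scaler that visits every feature column in turn."""

    # Assumed method names; a real implementation may offer others.
    METHODS: Dict[str, Callable[[Column], Column]] = {
        "minmax": min_max,
        "standard": standardize,
        "none": lambda values: list(values),
    }

    def __init__(self, default: str = "minmax",
                 per_column: Optional[Dict[str, str]] = None) -> None:
        self.default = default
        self.per_column = per_column or {}

    def run(self, data: Dict[str, Column]) -> Dict[str, Column]:
        """Return a scaled copy of `data`, one column at a time."""
        scaled = {}
        for name, values in data.items():
            # Use the column's own method if given, else the default.
            method = self.per_column.get(name, self.default)
            if method not in self.METHODS:
                raise ValueError(f"unknown scaling method: {method!r}")
            scaled[name] = self.METHODS[method](values)
        return scaled
```

For example, `Scaling(per_column={"age": "standard"}).run(table)` would standardize the "age" column and min-max scale every other column of a hypothetical `table` dict.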

Follow the contributing guidelines in README.md.

@sagnik1511 sagnik1511 added enhancement New feature or request JWOC This issue/pull request will be considered for JWOC 2k22. medium Points will be: 3(1st Phase), 4(2nd Phase). 2-3 days will be allotted. labels Jan 23, 2022
@Tihsrah
Contributor

Tihsrah commented Feb 1, 2022

I would like to suggest that we scale the data directly using sklearn.preprocessing:

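from sklearn.preprocessing import MinMaxScaler

scaler = MinMaxScaler()
x_train = scaler.fit_transform(x_train)  # learn min/max from the training data, then scale it
x_val = scaler.transform(x_val)          # reuse the training min/max on the validation data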

Adding these lines to the training.py file lets us scale the data without looping over each feature, which would be slower and more prone to bugs.
We could also add a condition for when the model is a regression model, ask the user which scaling function they want, and apply it to x_train and x_val.
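
To make the fit-on-train, transform-on-validation split concrete, here is a small pure-Python stand-in for what the two scaler calls do for a single column. The helper names `fit_min_max` and `apply_min_max` and the sample numbers are made up for illustration; this is not scikit-learn's implementation.

```python
from typing import List, Tuple


def fit_min_max(train: List[float]) -> Tuple[float, float]:
    """Learn the minimum and range from the training column only."""
    lo, hi = min(train), max(train)
    return lo, hi - lo


def apply_min_max(values: List[float], lo: float, span: float) -> List[float]:
    """Rescale with previously learned statistics (no refitting)."""
    if span == 0:
        return [0.0 for _ in values]
    return [(v - lo) / span for v in values]


# Made-up sample data for one feature column.
x_train = [10.0, 20.0, 30.0]
x_val = [15.0, 40.0]

lo, span = fit_min_max(x_train)                     # statistics come from x_train
x_train_scaled = apply_min_max(x_train, lo, span)   # [0.0, 0.5, 1.0]
x_val_scaled = apply_min_max(x_val, lo, span)       # [0.25, 1.5]
```

Because the validation column reuses the training minimum and range, its values can land outside [0, 1], as the 1.5 here shows.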

If you agree to this idea then please assign this Issue to me.

@sagnik1511
Owner Author

sagnik1511 commented Feb 4, 2022

@Tihsrah, min-max scaling is not the better choice for continuous data, so in some cases it is better to just scale the values down or up.
So basically the function should be flexible enough to handle every single data column.

If you have a flexible idea for this, please drop it in the comments. If it is clear, I'll assign the issue to you.

@Tihsrah
Contributor

Tihsrah commented Feb 11, 2022

What if we first apply a "Robust Scaler" to the data and then a "Standard Scaler"?
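
One thing worth checking before chaining the two: per column, both steps are just a shift and a positive rescale, so standardizing after robust scaling should give the same numbers as standardizing alone, as long as the interquartile range is non-zero. The sketch below uses pure-Python stand-ins (hypothetical `robust_scale` and `standard_scale` helpers, made-up data), not scikit-learn's classes.

```python
import statistics
from typing import List


def robust_scale(values: List[float]) -> List[float]:
    """Center on the median and divide by the interquartile range."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    median = statistics.median(values)
    return [(v - median) / (q3 - q1) for v in values]


def standard_scale(values: List[float]) -> List[float]:
    """Center on the mean and divide by the population standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]


column = [1.0, 2.0, 3.0, 4.0, 100.0]  # made-up data with one outlier

chained = standard_scale(robust_scale(column))  # robust first, then standard
direct = standard_scale(column)                 # standard only

# Compare element-wise with a small tolerance for floating-point error.
same = all(abs(a - b) < 1e-9 for a, b in zip(chained, direct))
```

Here `same` comes out `True`, which suggests the robust step adds nothing when a standard step follows it on the same column.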

@sagnik1511
Owner Author

@Tihsrah, I would suggest you keep the class flexible, as I stated earlier.
