[MRG] Add the SMOTE-RSB oversampling technique #789

zoj613 · 2021-02-04T19:13:56Z

Reference Issue

Related to checklist in #105

What does this implement/fix? Explain your changes.

Adds the SMOTE-RSB oversampling technique.

Any other comments?

NA

pep8speaks · 2021-02-04T19:14:01Z

Hello @zoj613! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

In the file imblearn/over_sampling/_smote.py:

Line 609:17: W503 line break before binary operator
Line 857:89: E501 line too long (91 > 88 characters)
Line 1467:46: W504 line break after binary operator

Comment last updated at 2021-02-05 00:10:14 UTC

zoj613 · 2021-02-04T23:52:42Z

Not sure why there are so many build failures. The tests pass locally.

zoj613 · 2021-02-05T00:12:08Z

Ping @glemaitre @hayesall @chkoar. I think this is ready for review.

zoj613 · 2021-02-16T19:09:59Z

ping @glemaitre

glemaitre · 2021-02-18T11:03:35Z

@zoj613 I think that we should prioritize which SMOTE variants to include in imbalanced-learn. I think that the benchmark done there https://github.com/analyticalmindsltd/smote_variants/ is quite interesting.

I would be more in favor of implementing the 6 versions mentioned in the comment: analyticalmindsltd/smote_variants#14 (comment)

I need to update the issue in this regard.

glemaitre · 2021-02-18T11:11:08Z

imblearn/over_sampling/_smote.py

+            raise TypeError("`similarity_func` must be a callable")
+
+    # VERY slow! cython might be better suited for this function
+    def _make_similarity_matrix(self, X_s, X_m, maxmin_diff):


I looked quickly at the paper. It seems that the similarity is used to find some neighbours.
I am not entirely sure, but a NearestNeighbors with a given radius (i.e. the similarity value: the distance normalized by the max distance) would be a way to get a faster implementation. Managing categorical and numerical values would mean implementing a different distance (e.g. a precomputed ValueDifferenceMetric, as in SMOTEN).
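
A minimal sketch of the radius-based idea suggested above, for numerical features only. It is not the code from this PR and not necessarily the procedure from the SMOTE-RSB paper: the function name `filter_synthetic_by_radius`, the `radius` parameter, the min-max scaling, and the rule "keep a synthetic sample only if no majority sample falls inside the radius" are all assumptions made for illustration. Only `NearestNeighbors` and its `radius_neighbors` method come from scikit-learn.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors


def filter_synthetic_by_radius(X_s, X_m, radius=0.2):
    """Hypothetical helper: keep synthetic samples with no majority
    sample within a normalized distance of `radius`.

    X_s: synthetic minority samples, shape (n_synthetic, n_features)
    X_m: majority samples, shape (n_majority, n_features)
    """
    X_s = np.asarray(X_s, dtype=float)
    X_m = np.asarray(X_m, dtype=float)

    # Assumption: min-max scale each feature over both sets so that
    # every feature contributes on the same [0, 1] scale.
    X_all = np.vstack([X_s, X_m])
    lo = X_all.min(axis=0)
    span = X_all.max(axis=0) - lo
    span[span == 0] = 1.0  # constant features: avoid division by zero
    X_s_scaled = (X_s - lo) / span
    X_m_scaled = (X_m - lo) / span

    # Largest possible Euclidean distance in the unit hypercube.
    # Multiplying by it expresses `radius` as a fraction of the max distance.
    max_dist = np.sqrt(X_s.shape[1])

    nn = NearestNeighbors(radius=radius * max_dist).fit(X_m_scaled)
    # One array of majority indices per synthetic sample.
    neighbours = nn.radius_neighbors(X_s_scaled, return_distance=False)

    keep = np.array([len(idx) == 0 for idx in neighbours], dtype=bool)
    return X_s[keep], keep
```

The tree-based neighbour search replaces the explicit pairwise similarity matrix, which is where the speed-up would come from. Mixed categorical and numerical data would need a different metric, as the comment notes, and is not covered by this sketch.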

zoj613 mentioned this pull request Feb 4, 2021

New methods #105

Open

19 tasks

zoj613 changed the title ~~[WIP] Add the SMOTE-RSB oversampling technique~~ [MRG] Add the SMOTE-RSB oversampling technique Feb 5, 2021

ENH: Add the SMOTERSB oversampling technique

51fe311

glemaitre reviewed Feb 18, 2021

zoj613 closed this Mar 10, 2021

zoj613 deleted the smote-rsb branch March 10, 2021 15:42
