The current implementation is not optimized in terms of memory footprint, so I propose performing some optimizations.

The first one would be to use smaller dtypes. An example is optimizing the one-hot encoded representation of features (used in sliceline/sliceline/slicefinder.py, line 542 at commit ddf2ac7). Scikit-learn's OneHotEncoder returns a sparse matrix with a float64 dtype:
import numpy as np
from sklearn.preprocessing import OneHotEncoder

o = OneHotEncoder()
float_r = o.fit_transform([("a", "b", "c"), ("a", "a", "a")])
print(repr(float_r))
# <2x5 sparse matrix of type '<class 'numpy.float64'>'
#     with 6 stored elements in Compressed Sparse Row format>
print(float_r.data.nbytes)
# 48

# Casting the stored values from float64 (8 bytes) to int8 (1 byte)
# divides the memory used by the data buffer by 8.
int8_r = float_r.astype(np.int8)
print(int8_r.data.nbytes)
# 6
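As a side note, OneHotEncoder also exposes a dtype parameter, so the smaller type could be requested up front rather than cast after the fact:

# Asking the encoder for int8 directly avoids materializing the float64 buffer.
o_int8 = OneHotEncoder(dtype=np.int8)
int8_direct = o_int8.fit_transform([("a", "b", "c"), ("a", "a", "a")])
print(int8_direct.data.nbytes)
# 6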
Other dtype optimizations could also be done, as sketched below.
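As an illustration (the errors array below is hypothetical, not taken from sliceline), downcasting a dense float64 intermediate such as a per-sample error vector halves its footprint, provided the reduced precision is acceptable:

import numpy as np

errors = np.random.rand(1_000_000)      # float64 by default: 8 bytes per value
print(errors.nbytes)
# 8000000
errors_f32 = errors.astype(np.float32)  # half the memory, at reduced precision
print(errors_f32.nbytes)
# 4000000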
Another potential optimization would be to avoid unnecessary copies and to force garbage collection. For example, in the same _search_slice method:
x_encoded = self._one_hot_encoder.fit_transform(input_x)
Here, input_x is kept alive even though it is no longer used, and could be deleted.
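A minimal sketch of what that could look like, assuming input_x is not referenced elsewhere (del only removes the local name, so the memory is actually reclaimed only once no other reference remains):

import gc

# Hypothetical sketch of the suggested change inside _search_slice:
x_encoded = self._one_hot_encoder.fit_transform(input_x)
del input_x   # drop this frame's reference to the raw features
gc.collect()  # force a collection pass so the space is reclaimed immediately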