The current implementation is not optimized in terms of memory footprint, so I propose performing some optimizations.

The first one would be to use smaller dtypes. An example is optimizing the one-hot encoded representation of features (used in sliceline/sliceline/slicefinder.py, line 542 at commit ddf2ac7). Scikit-learn's OneHotEncoder returns a sparse matrix with a float64 dtype:
import numpy as np
from sklearn.preprocessing import OneHotEncoder

o = OneHotEncoder()
float_r = o.fit_transform([("a", "b", "c"), ("a", "a", "a")])
print(repr(float_r))
# <2x5 sparse matrix of type '<class 'numpy.float64'>'
#     with 6 stored elements in Compressed Sparse Row format>
print(float_r.data.nbytes)
# 48

# Casting the stored values from float64 (8 bytes) to int8 (1 byte)
# divides the memory used by the data buffer by 8.
int8_r = float_r.astype(np.int8)
print(int8_r.data.nbytes)
# 6
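As a side note, OneHotEncoder also exposes a dtype parameter, so the smaller type could be requested up front rather than cast after the fact:

# Asking the encoder for int8 directly avoids materializing the float64 buffer.
o_int8 = OneHotEncoder(dtype=np.int8)
int8_direct = o_int8.fit_transform([("a", "b", "c"), ("a", "a", "a")])
print(int8_direct.data.nbytes)
# 6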
Other dtype optimizations could also be done, as sketched below.
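As an illustration (the errors array below is hypothetical, not taken from sliceline), downcasting a dense float64 intermediate such as a per-sample error vector halves its footprint, provided the reduced precision is acceptable:

import numpy as np

errors = np.random.rand(1_000_000)      # float64 by default: 8 bytes per value
print(errors.nbytes)
# 8000000
errors_f32 = errors.astype(np.float32)  # half the memory, at reduced precision
print(errors_f32.nbytes)
# 4000000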
Another potential optimization would be to avoid unnecessary copies and to force garbage collection. For example, in the same _search_slice method:
x_encoded = self._one_hot_encoder.fit_transform(input_x)
Here, input_x is kept alive even though it is no longer used, and could be deleted.
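A minimal sketch of what that could look like, assuming input_x is not referenced elsewhere (del only removes the local name, so the memory is actually reclaimed only once no other reference remains):

import gc

# Hypothetical sketch of the suggested change inside _search_slice:
x_encoded = self._one_hot_encoder.fit_transform(input_x)
del input_x   # drop this frame's reference to the raw features
gc.collect()  # force a collection pass so the space is reclaimed immediately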