
Map cells with 0 hits - reshape error #94

Open
sho-87 opened this issue Nov 22, 2018 · 7 comments

@sho-87

sho-87 commented Nov 22, 2018

I'm following the AirFlights example with my own dataset, but I run into a problem when I try to plot the real-value component planes.

I get a reshape error when I reach the following code:

plot_hex_map(empirical_codebook.reshape(sm.codebook.mapsize + [empirical_codebook.shape[-1]]), 
             titles=df.columns[:-1], shape=[4, 5], colormap=None)

ValueError: cannot reshape array of size 476 into shape (10,10,7)

I checked the values in df['bmus']. The max bmu value is 99, but there are only 68 unique bmu values, and 68 × 7 = 476 matches the array size in the error (a full 10x10 map would need 100 × 7 = 700 values). So I have too little data for my mapsize: some map cells are never chosen as the bmu, and those cells are therefore not represented in df['bmus']. Plotting the hitsmap confirms this, as it shows a number of cells with 0 hits.

I can fix this by reducing my map size, but it only works if I go down to 3x3, which is pretty pointless

Is there any other way to get around this problem? View2D allows me to plot the prototype planes with no problem using the same model, despite some cells not being chosen as the best unit. Any way to do this with the real value planes as well? Maybe fill in the missing bmu values somehow?
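
A minimal sketch of the "fill in the missing bmu values" idea, assuming the empirical codebook is built by grouping the data on the BMU column and averaging (which is what the 476 = 68 × 7 size suggests). The function name `empirical_codebook_full` and the synthetic data are hypothetical; only pandas and NumPy are used. Nodes with no hits are reindexed in as NaN rows so the array has one row per map cell and the reshape succeeds. Whether plot_hex_map renders NaN cells cleanly has not been verified here.

```python
import numpy as np
import pandas as pd

def empirical_codebook_full(df, bmu_col, mapsize):
    """Mean of the real values per node, with NaN rows for nodes that have no hits."""
    n_nodes = mapsize[0] * mapsize[1]
    per_node = df.groupby(bmu_col).mean()          # one row per node that was hit
    per_node = per_node.reindex(range(n_nodes))    # add NaN rows for the 0-hit nodes
    return per_node.values.reshape(list(mapsize) + [per_node.shape[1]])

# Synthetic stand-in for the real data: 7 variables, only some nodes ever hit
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(200, 7)), columns=[f"v{i}" for i in range(7)])
df["bmus"] = rng.choice(np.arange(0, 100, 3), size=200)

grid = empirical_codebook_full(df, "bmus", [10, 10])
print(grid.shape)
```

The NaN cells could then be left blank, or filled from neighbouring data as discussed later in the thread.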

@sevamoo
Owner

sevamoo commented Dec 12, 2018

Your question is not really clear to me.
All the nodes have their own weight vectors, whether or not they are bmus. That is why the component planes visualization works.

If you want to visualize the values of the data points, you need to write your own plot. Just convert the bmus to xy values and then use some sort of scatter plots for the training data.

When there are nodes between bmu nodes, this means there is a clear cluster border and those nodes are in those borders. Therefore, reducing the som size to 3x3 is not a good idea.
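
A rough sketch of the "convert the bmus to xy values and scatter" suggestion. It assumes the BMU index is a row-major flattening of the map grid (so index = row * n_cols + col); that layout is an assumption here, not something confirmed in this thread. The helper names `bmus_to_xy` and `scatter_on_map` are hypothetical, and a small random jitter is added so points sharing a BMU do not overlap completely.

```python
import numpy as np

def bmus_to_xy(bmus, mapsize):
    """Convert flat BMU indices to (x, y) grid coordinates, assuming row-major order."""
    rows, cols = np.unravel_index(np.asarray(bmus, dtype=int), tuple(mapsize))
    return cols, rows

def scatter_on_map(bmus, values, mapsize, jitter=0.3, seed=0):
    """Scatter the training points at their BMU position, coloured by a real value."""
    import matplotlib.pyplot as plt  # imported here so bmus_to_xy works without matplotlib
    rng = np.random.default_rng(seed)
    x, y = bmus_to_xy(bmus, mapsize)
    x = x + rng.uniform(-jitter, jitter, size=len(x))
    y = y + rng.uniform(-jitter, jitter, size=len(y))
    fig, ax = plt.subplots()
    points = ax.scatter(x, y, c=values, s=12)
    fig.colorbar(points, ax=ax)
    ax.set_xlim(-0.5, mapsize[1] - 0.5)
    ax.set_ylim(mapsize[0] - 0.5, -0.5)  # row 0 at the top
    return fig, ax

# Coordinates for a few BMU indices on a 10x10 map
x, y = bmus_to_xy([0, 9, 10, 99], [10, 10])
```

With a data frame like the one in the question, a call such as scatter_on_map(df['bmus'], df[some_column], [10, 10]) would draw one panel per variable; nodes with 0 hits simply stay empty.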

@germayneng

germayneng commented Jan 9, 2019

I have the same error for the real-value component heatmaps in the example notebook. With my own data, the reshape fails. The error comes from this part of the example:

image

It would be good if someone could unblock us so we can plot the real-value component heatmaps as well.

@sho-87
Author

sho-87 commented Jan 9, 2019

@sevamoo taking a step back: the general problem is that the reshaping you used in the notebook (as posted by @germayneng) doesn't work on all data, even though it looks like a technique that should generalize to any data.

What do you think might be causing the reshaping errors, if it isn't the unique bmu count problem I described in the original question?
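
One way to check the unique-bmu explanation is to count hits per node and compare the size of the grouped array with the size the reshape needs. This is a sketch with a hypothetical helper `hit_report` and made-up BMU values chosen to mirror the numbers in the original question (68 distinct BMUs, 7 variables, 10x10 map).

```python
import numpy as np

def hit_report(bmus, mapsize, n_features):
    """Count hits per node and compare grouped-array size with the size reshape expects."""
    n_nodes = int(np.prod(mapsize))
    hits = np.bincount(np.asarray(bmus, dtype=int), minlength=n_nodes)
    n_hit_nodes = int((hits > 0).sum())
    return {
        "n_nodes": n_nodes,
        "n_hit_nodes": n_hit_nodes,
        "empty_nodes": np.flatnonzero(hits == 0),   # node indices with 0 hits
        "grouped_size": n_hit_nodes * n_features,   # size after grouping on bmus
        "required_size": n_nodes * n_features,      # size reshape(mapsize + [n_features]) needs
    }

# Made-up BMU assignments: 68 distinct nodes, largest index 99
bmus = np.tile(np.concatenate([np.arange(67), [99]]), 3)
report = hit_report(bmus, [10, 10], n_features=7)
print(report["grouped_size"], report["required_size"])  # 476 700
```

If grouped_size and required_size differ, the reshape error comes from the 0-hit nodes listed in empty_nodes.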

@ricardomourarpm
Contributor

ricardomourarpm commented Feb 26, 2019

To see selected components:

# To visualize the component maps of a few selected variables.
# som_chosen is the trained SOM; Labels holds the names of the training columns.
import numpy as np
import pandas as pd
import matplotlib
from sompy.visualization.plot_tools import plot_hex_map

selected_vars = ['Temp', 'Consumo', 'WorkingDays']
# Denormalize the codebook so the node weights are in the units of the raw data
Nodes = pd.DataFrame(
    som_chosen._normalizer.denormalize_by(som_chosen.data_raw, som_chosen.codebook.matrix),
    columns=Labels)
# .copy() so that columns can be added later without a SettingWithCopyWarning
Nodes_with_selected_variables = Nodes[selected_vars].copy()

matplotlib.rcParams.update({'font.size': 8})

plot_hex_map(np.flip(Nodes_with_selected_variables.values.reshape(
                 som_chosen.codebook.mapsize + [Nodes_with_selected_variables.values.shape[-1]]), axis=0),
             titles=Nodes_with_selected_variables.columns, shape=[1, 3], colormap=None)

Since my data has map nodes with 0 BMU hits:

image

I had to write different code:

# Using the normalized training data and the normalized node weights (the codebook),
# find each node's nearest neighbours among the training samples and average their
# values to estimate the exogenous variable empirically.
from sklearn.neighbors import NearestNeighbors

Nodes_normalized = pd.DataFrame(som_chosen.codebook.matrix, columns=Labels)  # node weights, normalized
Data_normalized = pd.DataFrame(som_chosen._data, columns=Labels)  # training data, normalized

Knearmodel = NearestNeighbors(n_neighbors=5)  # 5 nearest neighbours, Minkowski distance with p=2
Knearmodel.fit(Data_normalized.values)

TotalMinutesDay_regression = []
for i in range(len(Nodes)):  # impute an empirical TotalMinutesDay for every node
    distances, indices = Knearmodel.kneighbors([Nodes_normalized.loc[i].values])
    # kneighbors returns positional row indices, so select with iloc; this assumes
    # Cargadata_relevant has its rows in the same order as the training data
    closest_vectors_to_node = Cargadata_relevant.iloc[indices[0]]
    TotalMinutesDay_regression.append(np.mean(closest_vectors_to_node.TotalMinutesDay))

Nodes_with_selected_variables["TotalMinutesDay"] = TotalMinutesDay_regression

# Plot the hex map with all the variables needed
plot_hex_map(np.flip(Nodes_with_selected_variables.values.reshape(
                 som_chosen.codebook.mapsize + [Nodes_with_selected_variables.values.shape[-1]]), axis=0),
             titles=Nodes_with_selected_variables.columns, shape=[1, 4], colormap=None)

image

With k nearest neighbours I can assign every node a value for a variable that was not used to train the SOM.
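
The nearest-neighbour idea above can be tried without a trained SOM. The sketch below is self-contained and swaps sklearn's NearestNeighbors for a brute-force NumPy distance computation; the function name `knn_node_values` and the synthetic codebook, data and target are all made up for illustration. Every node gets a value, including nodes with 0 hits, so the result always reshapes to the map size.

```python
import numpy as np

def knn_node_values(codebook, data, target, k=5):
    """For each node, average `target` over the k training samples nearest to its weights.

    codebook: (n_nodes, n_features) node weights, normalized like `data`
    data:     (n_samples, n_features) normalized training data
    target:   (n_samples,) exogenous variable, rows aligned with `data`
    """
    # Squared Euclidean distance between every node and every sample
    d2 = ((codebook[:, None, :] - data[None, :, :]) ** 2).sum(axis=2)
    # Indices of the k closest samples per node (order within the k is arbitrary)
    nearest = np.argpartition(d2, k - 1, axis=1)[:, :k]
    return target[nearest].mean(axis=1)

# Synthetic stand-ins for a 10x10 map trained on 3 variables
rng = np.random.default_rng(0)
data = rng.normal(size=(300, 3))
codebook = rng.normal(size=(100, 3))
target = 2.0 * data[:, 0] + rng.normal(scale=0.1, size=300)

node_values = knn_node_values(codebook, data, target, k=5)
grid = node_values.reshape(10, 10)  # one value per map cell, no missing nodes
```

The brute-force distance matrix is fine for a few thousand samples; for larger data a tree-based neighbour search such as the sklearn model used above is the more practical choice.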

@ricardomourarpm
Contributor

NameError: name 'som_chosen' is not defined

Dear akol67,

The code I posted is only meant to be read as an illustration; it does not run out of context.
som_chosen is the SOM I trained for my example.

@ricardomourarpm
Contributor

NameError: name 'som_chosen' is not defined

In my context, the variables Temp, Consumo, WorkingDays and TotalMinutesDay are exogenous, and the only way I can see to construct an exogenous component map is to take the mean of the nearest neighbours of each neuron. Even if you set value=1 where the BMU hit count is zero, you would still have the problem that some neurons (nodes) are not the BMU of any data point, so you cannot use the sompy code that retrieves the exogenous values from the data mapped to each neuron.

@akol67

akol67 commented May 17, 2021 via email
