
Only plot particle clusters for example with k-means #84

Open
svenseeberg opened this issue Dec 3, 2023 · 8 comments
Labels
enhancement New feature or request Future Feature Idea for future features prio:high

Comments


svenseeberg commented Dec 3, 2023

We can use k-means clustering to calculate k clusters such that no end-position particle is further than 3 km from its cluster center. This can serve as the basis for simple search patterns (best route through all cluster centers).

The cluster centers can be ordered so that a path through all centers is the shortest from the current position of the SAR vessel.

The coordinates can be attached as KML or GPX track.
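The ordering step could be sketched with a greedy nearest-neighbour pass (a minimal sketch; `order_centers` is a hypothetical helper, and coordinates are assumed to already be in a projected, metric CRS):

```python
import numpy as np

def order_centers(start, centers):
    """Greedy nearest-neighbour ordering of cluster centers,
    starting from the SAR vessel's current position."""
    remaining = list(range(len(centers)))
    route = []
    pos = np.asarray(start, dtype=float)
    while remaining:
        # pick the closest remaining center, then continue from there
        dists = [np.hypot(*(centers[i] - pos)) for i in remaining]
        nxt = remaining[int(np.argmin(dists))]
        route.append(nxt)
        pos = centers[nxt]
        remaining.remove(nxt)
    return route

centers = np.array([[0.0, 0.0], [10.0, 0.0], [5.0, 1.0]])
print(order_centers([1.0, 0.0], centers))  # → [0, 2, 1]
```

This is only an approximation of the shortest route (the exact problem is a travelling-salesman instance), but for a handful of cluster centers it should be good enough.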

@svenseeberg svenseeberg added Future Feature Idea for future features enhancement New feature or request labels Dec 3, 2023

svenseeberg commented Dec 3, 2023

The challenge here is to use a valid distance metric in the k-means function. scikit-learn, for example, only supports the Euclidean metric, which does not really work in our case: there is a significant difference in distance between 1° of longitude and 1° of latitude. That means we need a k-means implementation that accepts a distance function for coordinates on a sphere. It seems that NLTK implements a k-means method that supports a custom distance function.
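A minimal sketch of the NLTK route, assuming `nltk` is installed. One caveat: `KMeansClusterer` only uses the custom distance for assigning points; cluster means are still arithmetic averages of lon/lat, which is an approximation that degrades near the poles and the antimeridian.

```python
import random
import numpy as np
from nltk.cluster import KMeansClusterer

def haversine(u, v):
    """Great-circle distance in km between two [lon, lat] points (degrees)."""
    lon1, lat1, lon2, lat2 = np.radians([u[0], u[1], v[0], v[1]])
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * np.arcsin(np.sqrt(a))

# Two well-separated groups of end positions as [lon, lat] (example data)
points = [np.array(p) for p in [[10.0, 54.0], [10.01, 54.01], [10.02, 53.99],
                                [11.0, 55.0], [11.01, 55.01], [10.99, 54.99]]]
clusterer = KMeansClusterer(2, distance=haversine, repeats=5,
                            rng=random.Random(42), avoid_empty_clusters=True)
labels = clusterer.cluster(points, assign_clusters=True)
```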

The alternative would be to transform the lon/lat coordinates into x/y/z coordinates and apply k-means on them: https://github.com/shezmic/Geodetic-To-Cartesian-and-Vice-Versa/blob/master/converter.py
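The x/y/z route is straightforward with NumPy (a sketch assuming a spherical Earth; for compact clusters the 3-D chord distance closely approximates the great-circle distance, so plain Euclidean k-means becomes valid again):

```python
import numpy as np

def lonlat_to_xyz(lon, lat, radius=6371.0):
    """Map lon/lat (degrees) to Cartesian x/y/z (km) on a sphere."""
    lon, lat = np.radians(lon), np.radians(lat)
    return np.column_stack((radius * np.cos(lat) * np.cos(lon),
                            radius * np.cos(lat) * np.sin(lon),
                            radius * np.sin(lat)))

xyz = lonlat_to_xyz(np.array([0.0, 90.0]), np.array([0.0, 0.0]))
# (0°E, 0°N) -> (R, 0, 0); (90°E, 0°N) -> (0, R, 0)
```

The resulting cluster centers lie slightly inside the sphere and would need to be projected back onto the surface before converting to lon/lat.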


julled commented Dec 3, 2023

I guess it always depends on the geometry how many clusters k make sense; do you think there is a generic answer to this question?
Another option is to use DBSCAN, which also estimates the k of the clustering.
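For reference, scikit-learn's DBSCAN accepts a haversine metric directly, so no k needs to be chosen (a sketch; the input must be [lat, lon] in radians, and eps is the neighbourhood distance divided by the Earth radius):

```python
import numpy as np
from sklearn.cluster import DBSCAN

# End positions as [lat, lon] in degrees: two tight groups plus one stray point
points = np.array([[54.000, 10.000], [54.005, 10.005], [54.003, 9.998],
                   [55.000, 11.000], [55.004, 11.002], [54.997, 10.996],
                   [60.000, 20.000]])

# eps = 2 km neighbourhood on a 6371 km Earth radius
db = DBSCAN(eps=2.0 / 6371.0, min_samples=3, metric="haversine").fit(np.radians(points))
labels = db.labels_          # -1 marks noise
n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
```

Unlike k-means, DBSCAN returns no cluster centers, so those would have to be computed afterwards (e.g. per-cluster means in a projected CRS).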


SteffenME commented Dec 4, 2023

I also think that a density-based approach like DBSCAN is better and can be used with meaningful parameters, e.g. 2 km for the neighbourhood distance. If k is too large, you miss the highest densities. Another option would be to calculate local maxima of a density map.
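The density-map idea could look like this (a sketch using SciPy; the bin count, smoothing, and threshold are placeholder parameters, and x/y are assumed to already be in a metric CRS):

```python
import numpy as np
from scipy.ndimage import gaussian_filter, maximum_filter

def density_peaks(x, y, bins=50):
    """Local maxima of a smoothed 2-D histogram of particle end positions."""
    H, xedges, yedges = np.histogram2d(x, y, bins=bins)
    H = gaussian_filter(H, sigma=1.0)                      # smooth raw counts
    # a cell is a peak if it equals the local maximum and is clearly populated
    is_peak = (H == maximum_filter(H, size=5)) & (H > 0.3 * H.max())
    ix, iy = np.nonzero(is_peak)
    xc = (xedges[ix] + xedges[ix + 1]) / 2                 # bin centers
    yc = (yedges[iy] + yedges[iy + 1]) / 2
    return np.column_stack((xc, yc))

rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(200, 50, 1000), rng.normal(800, 50, 1000)])
y = np.concatenate([rng.normal(200, 50, 1000), rng.normal(800, 50, 1000)])
peaks = density_peaks(x, y)
```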

@SteffenME
Collaborator

Transforming the lat/lon to x/y is necessary for any approach. I'd suggest transforming to UTM, like we do in satsearch or whenever we calculate distances. The code is already there. It can be done with pyproj or geopandas.
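A minimal sketch of the pyproj route (EPSG:32632, UTM zone 32N, is only an example and must be chosen to match the operating area):

```python
from pyproj import Transformer

# always_xy=True fixes the axis order to lon/lat regardless of the CRS definition
to_utm = Transformer.from_crs("EPSG:4326", "EPSG:32632", always_xy=True)
x, y = to_utm.transform(9.0, 54.0)   # lon/lat in degrees -> easting/northing in m
# 9°E is the central meridian of zone 32, so x is exactly the 500 000 m false easting
```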



@svenseeberg svenseeberg changed the title K-Means clustering Only plot particle clusters for example with k-means Jan 27, 2024
@svenseeberg
Member Author

A random interesting resource for k-means clustering with Python: https://domino.ai/blog/getting-started-with-k-means-clustering-in-python

@julled
Copy link
Collaborator

julled commented Jan 28, 2024

I am not a big fan of k-means clustering, as the user always needs to provide the parameter k.
I am more in favour of DBSCAN, as it estimates the number of cluster centers automatically.

@SteffenME
Collaborator

Some clustering ideas with plots:

from sklearn.cluster import KMeans
import numpy as np
from pyproj import Transformer
import matplotlib.pyplot as plt

# Project WGS84 lon/lat into a metric CRS; the UTM EPSG code
# (32632 here, as an example) must match the operating area.
epsg = 32632
transform_from_4326 = Transformer.from_crs(4326, epsg, always_xy=True).transform


def distance(a, b):
    # Euclidean distance in the projected (metric) CRS
    return np.hypot(a[0] - b[0], a[1] - b[1])


# lonlats is assumed to hold the particle trajectories; use the end positions
end_lon, end_lat = lonlats[..., -1]
X = []
for lon, lat in zip(end_lon, end_lat):
    X.append(transform_from_4326(lon, lat))
X = np.array(X)

# Increase k until no particle is further than 2 km from its cluster center
clusters = 2
while True:
    clustering = KMeans(n_clusters=clusters, max_iter=1000).fit(X)
    labels = clustering.labels_
    cluster_centers = clustering.cluster_centers_
    # note: [[]] * clusters would alias one shared list
    distances = [[] for _ in range(clusters)]
    for i in range(clusters):
        for xx in X[labels == i]:
            distances[i].append(distance(cluster_centers[i], xx))
    if max(max(d) for d in distances if d) > 2000:
        clusters += 1
    else:
        break

for i in range(clusters):
    color = np.random.randint(0, 256, size=3) / 256
    plt.scatter(X[labels == i, 0], X[labels == i, 1], color=color)
    # scale the center marker with the cluster size
    s = max(100, len(labels[labels == i]) * 50)
    plt.scatter(cluster_centers[i][0], cluster_centers[i][1], s=s, color=color)

[plot: particle end positions colored by cluster, with scaled center markers]


3 participants