
Only plot particle clusters for example with k-means #84

Open
svenseeberg opened this issue Dec 3, 2023 · 8 comments
Labels
enhancement New feature or request Future Feature Idea for future features prio:high

Comments


svenseeberg commented Dec 3, 2023

We can use k-means clustering to calculate k clusters such that no end-position particle is further than 3 km from its cluster center. This can serve as the basis for simple search patterns (best route through all cluster centers).

The cluster centers can be ordered so that a path through all centers is the shortest from the current position of the SAR vessel.

The coordinates can be attached as KML or GPX track.
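The ordering step could be sketched with a greedy nearest-neighbour pass (a minimal sketch; `order_centers` is a hypothetical helper, and coordinates are assumed to already be in a projected, metric CRS):

```python
import numpy as np

def order_centers(start, centers):
    """Greedy nearest-neighbour ordering of cluster centers,
    starting from the SAR vessel's current position."""
    remaining = list(range(len(centers)))
    route = []
    pos = np.asarray(start, dtype=float)
    while remaining:
        # pick the closest remaining center, then continue from there
        dists = [np.hypot(*(centers[i] - pos)) for i in remaining]
        nxt = remaining[int(np.argmin(dists))]
        route.append(nxt)
        pos = centers[nxt]
        remaining.remove(nxt)
    return route

centers = np.array([[0.0, 0.0], [10.0, 0.0], [5.0, 1.0]])
print(order_centers([1.0, 0.0], centers))  # → [0, 2, 1]
```

This is only an approximation of the shortest route (the exact problem is a travelling-salesman instance), but for a handful of cluster centers it should be good enough.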

@svenseeberg svenseeberg added Future Feature Idea for future features enhancement New feature or request labels Dec 3, 2023

svenseeberg commented Dec 3, 2023

The challenge here is to use a valid distance metric in the k-means function. scikit-learn, for example, only supports the Euclidean metric, which does not really work in our case: there is a significant difference in distance between 1° of longitude and 1° of latitude. That means we need a k-means implementation that accepts a distance function for coordinates on a sphere. It seems that NLTK implements a k-means method that supports a custom distance function.
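A minimal sketch of the NLTK route, assuming `nltk` is installed. One caveat: `KMeansClusterer` only uses the custom distance for assigning points; cluster means are still arithmetic averages of lon/lat, which is an approximation that degrades near the poles and the antimeridian.

```python
import random
import numpy as np
from nltk.cluster import KMeansClusterer

def haversine(u, v):
    """Great-circle distance in km between two [lon, lat] points (degrees)."""
    lon1, lat1, lon2, lat2 = np.radians([u[0], u[1], v[0], v[1]])
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * np.arcsin(np.sqrt(a))

# Two well-separated groups of end positions as [lon, lat] (example data)
points = [np.array(p) for p in [[10.0, 54.0], [10.01, 54.01], [10.02, 53.99],
                                [11.0, 55.0], [11.01, 55.01], [10.99, 54.99]]]
clusterer = KMeansClusterer(2, distance=haversine, repeats=5,
                            rng=random.Random(42), avoid_empty_clusters=True)
labels = clusterer.cluster(points, assign_clusters=True)
```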

The alternative would be to transform the lon/lat coordinates into x/y/z coordinates and apply k-means on them: https://github.com/shezmic/Geodetic-To-Cartesian-and-Vice-Versa/blob/master/converter.py
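The x/y/z route is straightforward with NumPy (a sketch assuming a spherical Earth; for compact clusters the 3-D chord distance closely approximates the great-circle distance, so plain Euclidean k-means becomes valid again):

```python
import numpy as np

def lonlat_to_xyz(lon, lat, radius=6371.0):
    """Map lon/lat (degrees) to Cartesian x/y/z (km) on a sphere."""
    lon, lat = np.radians(lon), np.radians(lat)
    return np.column_stack((radius * np.cos(lat) * np.cos(lon),
                            radius * np.cos(lat) * np.sin(lon),
                            radius * np.sin(lat)))

xyz = lonlat_to_xyz(np.array([0.0, 90.0]), np.array([0.0, 0.0]))
# (0°E, 0°N) -> (R, 0, 0); (90°E, 0°N) -> (0, R, 0)
```

The resulting cluster centers lie slightly inside the sphere and would need to be projected back onto the surface before converting to lon/lat.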


julled commented Dec 3, 2023

I guess it always depends on the geometry how many clusters k make sense; do you think there is a generic answer to this question?
Another option is to use DBSCAN, which also estimates the k of the clustering.
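For reference, scikit-learn's DBSCAN accepts a haversine metric directly, so no k needs to be chosen (a sketch; the input must be [lat, lon] in radians, and eps is the neighbourhood distance divided by the Earth radius):

```python
import numpy as np
from sklearn.cluster import DBSCAN

# End positions as [lat, lon] in degrees: two tight groups plus one stray point
points = np.array([[54.000, 10.000], [54.005, 10.005], [54.003, 9.998],
                   [55.000, 11.000], [55.004, 11.002], [54.997, 10.996],
                   [60.000, 20.000]])

# eps = 2 km neighbourhood on a 6371 km Earth radius
db = DBSCAN(eps=2.0 / 6371.0, min_samples=3, metric="haversine").fit(np.radians(points))
labels = db.labels_          # -1 marks noise
n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
```

Unlike k-means, DBSCAN returns no cluster centers, so those would have to be computed afterwards (e.g. per-cluster means in a projected CRS).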


SteffenME commented Dec 4, 2023

I also think that a density-based approach like DBSCAN is better and can be used with meaningful parameters, e.g. 2 km for the neighbourhood distance. If k is too large, you miss the highest densities. Another option would be to calculate local maxima of a density map.
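The density-map idea could look like this (a sketch using SciPy; the bin count, smoothing, and threshold are placeholder parameters, and x/y are assumed to already be in a metric CRS):

```python
import numpy as np
from scipy.ndimage import gaussian_filter, maximum_filter

def density_peaks(x, y, bins=50):
    """Local maxima of a smoothed 2-D histogram of particle end positions."""
    H, xedges, yedges = np.histogram2d(x, y, bins=bins)
    H = gaussian_filter(H, sigma=1.0)                      # smooth raw counts
    # a cell is a peak if it equals the local maximum and is clearly populated
    is_peak = (H == maximum_filter(H, size=5)) & (H > 0.3 * H.max())
    ix, iy = np.nonzero(is_peak)
    xc = (xedges[ix] + xedges[ix + 1]) / 2                 # bin centers
    yc = (yedges[iy] + yedges[iy + 1]) / 2
    return np.column_stack((xc, yc))

rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(200, 50, 1000), rng.normal(800, 50, 1000)])
y = np.concatenate([rng.normal(200, 50, 1000), rng.normal(800, 50, 1000)])
peaks = density_peaks(x, y)
```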

@SteffenME
Collaborator

Transforming the lat/lon to x/y is necessary for any approach. I'd suggest transforming to UTM, like we do in satsearch or whenever we calculate distances. The code is already there. It can be done with pyproj or geopandas.
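A minimal sketch of the pyproj route (EPSG:32632, UTM zone 32N, is only an example and must be chosen to match the operating area):

```python
from pyproj import Transformer

# always_xy=True fixes the axis order to lon/lat regardless of the CRS definition
to_utm = Transformer.from_crs("EPSG:4326", "EPSG:32632", always_xy=True)
x, y = to_utm.transform(9.0, 54.0)   # lon/lat in degrees -> easting/northing in m
# 9°E is the central meridian of zone 32, so x is exactly the 500 000 m false easting
```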



@svenseeberg svenseeberg changed the title K-Means clustering Only plot particle clusters for example with k-means Jan 27, 2024
@svenseeberg
Member Author

A random interesting resource for k-means clustering with Python: https://domino.ai/blog/getting-started-with-k-means-clustering-in-python

@julled
Copy link
Collaborator

julled commented Jan 28, 2024

I am not a big fan of k-means clustering, as the user always needs to provide the parameter k.
I am more in favour of DBSCAN, as it estimates the number of cluster centers automatically.

@SteffenME
Collaborator

Some clustering ideas with plots:

from sklearn.cluster import KMeans
import numpy as np
from pyproj import Transformer
import matplotlib.pyplot as plt

# Project WGS84 lon/lat into a metric CRS; the UTM EPSG code
# (32632 here, as an example) must match the operating area.
epsg = 32632
transform_from_4326 = Transformer.from_crs(4326, epsg, always_xy=True).transform


def distance(a, b):
    # Euclidean distance in the projected (metric) CRS
    return np.hypot(a[0] - b[0], a[1] - b[1])


# lonlats is assumed to hold the particle trajectories; use the end positions
end_lon, end_lat = lonlats[..., -1]
X = []
for lon, lat in zip(end_lon, end_lat):
    X.append(transform_from_4326(lon, lat))
X = np.array(X)

# Increase k until no particle is further than 2 km from its cluster center
clusters = 2
while True:
    clustering = KMeans(n_clusters=clusters, max_iter=1000).fit(X)
    labels = clustering.labels_
    cluster_centers = clustering.cluster_centers_
    # note: [[]] * clusters would alias one shared list
    distances = [[] for _ in range(clusters)]
    for i in range(clusters):
        for xx in X[labels == i]:
            distances[i].append(distance(cluster_centers[i], xx))
    if max(max(d) for d in distances if d) > 2000:
        clusters += 1
    else:
        break

for i in range(clusters):
    color = np.random.randint(0, 256, size=3) / 256
    plt.scatter(X[labels == i, 0], X[labels == i, 1], color=color)
    # scale the center marker with the cluster size
    s = max(100, len(labels[labels == i]) * 50)
    plt.scatter(cluster_centers[i][0], cluster_centers[i][1], s=s, color=color)

[plot: particle end positions colored by cluster, with scaled center markers]


3 participants