-
Notifications
You must be signed in to change notification settings - Fork 3
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Only plot particle clusters for example with k-means #84
Comments
The challenge here is to use a valid distance metric in the k-means function. scikit-learn for example can only use the euclidian metric, which does not really work in our case. There is a significant difference in distance when comparing 1° in longitude to 1° in latitude. That means we need a k-means implementation that can use a distance function for coordinates on a sphere. It seems that NLTK implements a k-means method that supports a custom distance function. The alternative would be to transform the lon/lat coordinates into x/y/z coordinates and apply k-means on them: https://github.com/shezmic/Geodetic-To-Cartesian-and-Vice-Versa/blob/master/converter.py |
i guess it always depends on the geometry how many k make sense, do you think there is a generic answer to this question? |
I also think that a density based approach like dbscan is better and can be used with meaningful parameters e.g. 2km for neighbourhood distance. If you have two many k you miss the highest densities. Another option would be to calculate local Maxima of density map. |
Transforming the lat/Lon to X/Y is necessary for any approach, I'd suggest to transform to UTM like we do in satsearch, or whenever we calculate distances. The code is already there. Can be done with pyproj or geopandas.
|
A random interesting resource for k-means clustering with Python: https://domino.ai/blog/getting-started-with-k-means-clustering-in-python |
i am not a big fan of k-means clustering, as the user always need to provide the parameter "k". |
Some Clustering ideas with plots: from sklearn.cluster import KMeans
import numpy as np
from pyproj import Transformer
import matplotlib.pyplot as plt
transform_from_4326=Transformer_to_epsg=Transformer.from_crs(4326,epsg).transform
def distance(a,b):
return np.sqrt(np.abs((a[0]-b[0])**2+(a[1]-b[1])**2))
end_lon,end_lat=lonlats[...,-1]
for point in zip(end_lon,end_lat):
xx,yy = transform_from_4326(point[0],point[1])
X.append([xx,yy])
X=np.array(X)
clusters = 2
while True:
clustering=KMeans(n_clusters=clusters,max_iter=1000).fit(X)
labels=clustering.labels_
cluster_centers=clustering.cluster_centers_
distances=[[]]*clusters
for i in range(clusters):
for xx in X[labels==i]:
distances[i].append(distance(cluster_centers[i],xx))
if np.array(distances).max()>2000:
clusters+=1
else:
break
for i in range(clusters):
color=np.zeros((100,3))+np.random.randint(0, 256, size=3)/256
plt.scatter(np.array(x)[labels==i],np.array(y)[labels==i],c=color[labels==i])
s=(max(100,(len(labels[labels==i])*50)))
plt.scatter(cluster_centers[i][0],cluster_centers[i][1],s=s,c=color[0]) |
We can use k-means clustering to calculate k clusters so that no end position particle is further than 3km from each cluster center. This can be used as a base for simple search patterns (best route through all cluster centers).
The cluster centers can be ordered so that a path through all centers is the shortest from the current position of the SAR vessel.
The coordinates can be attached as KML or GPX track.
The text was updated successfully, but these errors were encountered: