An implementation of mini-batch k-means in the PySpark distributed framework, with performance tests on standard synthetic datasets.

W-Mrt/Mini-batch-k-Means-Clustering

Web-Scale K-Means Clustering

Management and analysis of physical dataset project

Implement and benchmark alternative versions of common clustering algorithms in a Spark environment, without relying on the clustering functions that Spark already provides.

The project is thus focused on the efficient implementation of algorithms in a distributed system.

Main topics:

Mini-batch k-means, k-means++, k-means||
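
As a rough illustration of the first topic, here is a minimal single-machine sketch of the mini-batch k-means update in plain Python. It is not the code from this repository: the function names (`mini_batch_kmeans`, `nearest_center`, `squared_distance`), the default parameters, and the uniform random initialization are all assumptions made for the example. In a PySpark version, the mini-batch would presumably be drawn from a distributed dataset (for instance with `RDD.takeSample`) rather than from a local list.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_center(point, centers):
    """Index of the center closest to `point`."""
    return min(range(len(centers)),
               key=lambda j: squared_distance(point, centers[j]))


def mini_batch_kmeans(points, k, batch_size=32, n_iter=100, seed=0):
    """Hypothetical helper: mini-batch k-means with per-center learning rates.

    Assumption: centers are initialized by sampling k points uniformly at
    random (k-means++ or k-means|| seeding could be substituted here).
    """
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    counts = [0] * k  # number of points each center has absorbed so far

    for _ in range(n_iter):
        # Draw a small random batch instead of scanning the full dataset.
        batch = rng.sample(points, min(batch_size, len(points)))

        # Assign every batch point while the centers are held fixed.
        assignments = [nearest_center(p, centers) for p in batch]

        # Move each center toward its assigned points with step 1 / count,
        # so the center becomes the running mean of the points it absorbed.
        for p, j in zip(batch, assignments):
            counts[j] += 1
            eta = 1.0 / counts[j]
            centers[j] = [(1.0 - eta) * c + eta * x
                          for c, x in zip(centers[j], p)]

    return centers
```

Each iteration touches only `batch_size` points, which is what makes the method attractive for large datasets: the cost per step does not depend on the dataset size, at the price of noisier center updates than full-batch k-means.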
