We use the following graphs from the Stanford Network Analysis Project (SNAP): ca-GrQc, Oregon-1, roadNet-CA, soc-Epinions1, and web-NotreDame (http://snap.stanford.edu/data/index.html). The project description is in project.pdf and the final report is in report.pdf.
Kamada-Kawai visualization of the ca-GrQc graph and its clustering using the spectral embedding.
Graph | #vertices | #edges | #clusters |
---|---|---|---|
ca-GrQc | 4158 | 13428 | 2 |
Oregon-1 | 10670 | 22002 | 5 |
soc-Epinions1 | 75877 | 405739 | 10 |
web-NotreDame | 325729 | 1117563 | 20 |
roadNet-CA | 1957027 | 2760388 | 50 |
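Vertex and edge counts like those in the table can be computed directly from an edge-list file. The sketch below assumes a whitespace-separated `u v` format with `#` comment lines, which is the convention SNAP downloads use; the input files this project expects may be preprocessed differently, and `read_edge_list` is a hypothetical helper, not part of the repository.

```python
def read_edge_list(lines):
    """Return (vertices, edges) parsed from whitespace-separated 'u v' lines."""
    vertices, edges = set(), set()
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment/header lines
        u, v = map(int, line.split()[:2])
        vertices.update((u, v))
        if u != v:
            # Treat the graph as undirected: store each edge once, drop self-loops.
            edges.add((min(u, v), max(u, v)))
    return vertices, edges


# Toy input: one duplicated edge (listed in both directions) and one self-loop.
sample = ["# toy graph", "0 1", "1 0", "1 2", "2 2"]
vertices, edges = read_edge_list(sample)
print(len(vertices), len(edges))
```

For a real file, pass an open file handle instead of the list, for example `read_edge_list(open(path))`, where `path` points to the downloaded graph.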
Use Python 3 and install the dependencies:
pip install -r requirements.txt
Using virtualenv is recommended to isolate the package libraries and runtime.
Run the clustering algorithm from the main Python file graph_clustering.py. You can read the help for each argument and find example commands in EXPERIMENTS.sh. The arguments are:
- seed: Random seed.
- iterations: Number of iterations, each with a different seed.
- file: Path of the input graph file.
- outputs_path: Path to save the outputs.
- clustering: Use "kmeans", "custom_kmeans", "kmeans_sklearn", "xmeans" or "agglomerative".
- random_centroids: Random centroids initialization for "custom_kmeans".
- distance_metric: Distance metric for "custom_kmeans": "MINKOWSKI", "CHEBYSHEV", "EUCLIDEAN".
- compute_eig: Compute eigenvectors or load them.
- k: Number of desired clusters.
- networkx: Use the networkx library to compute the Laplacian.
- eig_kept: Number of eigenvectors kept.
- normalize_laplacian: Normalize Laplacian.
- invert_laplacian: Invert Laplacian.
- second: Use only the second-smallest eigenvector.
- eig_normalization: Normalization of eigenvectors by "vertex", "eig" or "None".
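To illustrate what an option like `second` refers to, here is a minimal, dependency-free sketch of spectral bisection with the second-smallest eigenvector of the unnormalized Laplacian L = D - A. It is not the code in graph_clustering.py: it approximates the eigenvector by power iteration on a shifted Laplacian instead of calling an eigensolver, and it assumes a connected, undirected graph with vertices labeled 0 to n-1. The name `fiedler_vector` and the toy graph are illustrative only.

```python
def fiedler_vector(n, edges, iterations=500):
    """Approximate the eigenvector of the second-smallest eigenvalue of L = D - A."""
    neighbors = [[] for _ in range(n)]
    for u, v in edges:
        neighbors[u].append(v)
        neighbors[v].append(u)
    degree = [len(nb) for nb in neighbors]
    shift = 2 * max(degree)  # upper bound on the largest eigenvalue of L

    x = [float(i) for i in range(n)]  # deterministic start vector
    for _ in range(iterations):
        # Project out the constant vector, the eigenvector of eigenvalue 0.
        mean = sum(x) / n
        x = [xi - mean for xi in x]
        # Multiply by (shift * I - L), whose dominant remaining eigenvector
        # is the one belonging to the second-smallest eigenvalue of L.
        y = [
            (shift - degree[i]) * x[i] + sum(x[j] for j in neighbors[i])
            for i in range(n)
        ]
        norm = sum(yi * yi for yi in y) ** 0.5
        x = [yi / norm for yi in y]
    return x


# Toy graph: two triangles (0,1,2) and (3,4,5) joined by the edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
embedding = fiedler_vector(6, edges)

# With k = 2, splitting on the sign of the embedding gives the two clusters.
labels = [0 if value < 0 else 1 for value in embedding]
print(labels)
```

For more than two clusters, the usual approach is to keep several eigenvectors as coordinates for each vertex and run a clustering method such as k-means on them, which is the role the `eig_kept`, `k`, and `clustering` arguments appear to play.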
👤 Álvaro Orgaz Expósito (alvarorgaz)
👤 Adrià Cabeza (adriacabeza)