completed configurable backends

maniospas committed Jun 10, 2024
1 parent d329967 commit 7b2cc72
Showing 21 changed files with 168 additions and 92 deletions.
34 changes: 12 additions & 22 deletions docs/userguide/convergence.md → docs/advanced/ranking.md
@@ -1,18 +1,15 @@
# Demo
# Node Ranking

As a quick start, let us construct a graph
and a set of nodes. The graph's class can be
imported either from the `networkx` library or from
`pygrank` itself. The two are in large part interoperable
and both can be parsed by our algorithms.
But our implementation is tailored to graph signal
processing needs and thus tends to be faster and consume
only a fraction of the memory.
Here we will see how an appropriate convergence manager
can be used to speed up a node ranking process, where
nodes obtain ordinal values 1,2,3,... based on their
importance in the graph structure (1 is the most
important node). For starters, let us construct some data to test with:

```python
from pygrank import Graph
import pygrank as pg

graph = Graph()
graph = pg.Graph()
graph.add_edge("A", "B")
graph.add_edge("B", "C")
graph.add_edge("C", "D")
@@ -24,16 +21,8 @@ seeds = {"A", "B"}
```

We now run a personalized PageRank
to score the structural relatedness of graph nodes to the ones of the given set.
First, let us import the library:

```python
import pygrank as pg
```

For instructional purposes,
we experiment with (personalized) *PageRank*
and make it output the node order of ranks.
to score the structural relatedness of graph nodes to the ones of the given set
and apply a postprocessor that ranks nodes based on their score:

```python
ranker = pg.PageRank(alpha=0.85, tol=1.E-6, normalization="auto") >> pg.Ordinals()
@@ -61,6 +50,7 @@ print(ordinals["B"], ordinals["D"], ordinals["E"])
# 3.0 5.0 4.0
```

Close to the previous results at a fraction of the time! For large graphs,
This is close to the previous results at a fraction of the time!
For large graphs,
most ordinals would be near the ideal ones. Note that convergence time
does not take into account the time needed to preprocess graphs.
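
As a rough sketch of this speedup, a convergence manager can simply cap the
number of iterations; the `ConvergenceManager` arguments below mirror those in
this commit's `examples/playground/run_backend.py`, and the resulting ordinals
may differ slightly from fully converged ones.

```python
import pygrank as pg

# Stop after a fixed number of iterations instead of waiting for
# numerical convergence; node ordinals typically stabilize much
# earlier than exact scores do.
ranker = pg.PageRank(
    alpha=0.85,
    convergence=pg.ConvergenceManager(max_iters=38, error_type="iters"),
) >> pg.Ordinals()
```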
4 changes: 2 additions & 2 deletions docs/userguide/quickstart.md
@@ -1,7 +1,7 @@
# Quickstart

## 1. Install and import
Install the library using `pip install pygrank` and import it. Construct a node ranking algorithm from a graph filter by incrementally applying postprocessors using >>. There are many components and parameters available. You can use [autotuning](autotuning.md) to find good configurations.
Install the library using `pip install pygrank` and import it. Construct a node ranking algorithm from a graph filter by incrementally applying postprocessors using `>>`. There are many components and parameters available. Use [autotuning](autotuning.md) to find good configurations.

```python
import pygrank as pg
@@ -28,4 +28,4 @@ Evaluate the scores using a stochastic generalization of the unsupervised conduc
measure = pg.Conductance() # an evaluation measure
pg.benchmark_print_line("My conductance", measure(scores)) # pretty
print("Cite this algorithm as:", hk5_advanced.cite())
```~~
```
51 changes: 39 additions & 12 deletions docs/userguide/setup.md
@@ -7,6 +7,17 @@ and install or upgrade to the latest version of `pygrank` with:
pip install --upgrade pygrank
```

## Creating graphs

When working on practical problems,
use `networkx` to construct graphs
by adding edges between Python objects.
`pygrank` also provides its own `pygrank.Graph` class
that implements a subset of `networkx.Graph` operations
with several optimizations; it tends to be faster when
constructing large graphs and consumes
only a fraction of the memory.
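
As a minimal sketch of this interoperability, both classes expose the same
edge-construction calls, and either graph can be passed to `pygrank` algorithms:

```python
import networkx as nx
import pygrank as pg

# networkx graph built from arbitrary hashable Python objects
nx_graph = nx.Graph()
nx_graph.add_edge("A", "B")

# pygrank's lighter-weight equivalent, preferable for large graphs
pg_graph = pg.Graph()
pg_graph.add_edge("A", "B")
```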

## Backends

Several popular computational backends are supported.
@@ -41,12 +52,13 @@ The same message points to a configuration file stored under *home/.pygrank*.
In addition to automatically downloaded content, there is a JSON configuration
file specifying the default backend to be set upon first import and the option
to silence the reminder message. The configuration looks like this and can either be
edited directly or programmatically set with `pg.set_backend_preference(name, reminder=True)`):
edited directly or programmatically set with `pg.set_backend_preference(mod_name, remind_where_to_find=True, **init)`:

```json
{
    "backend": "numpy",
    "reminder": "true"
    "reminder": "true",
    "init": {}
}
```
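
As a sketch grounded in this commit's `pygrank/core/backend/__init__.py`, the
preference can also be written programmatically; extra keyword arguments are
stored under `"init"` and forwarded to the backend when it is first loaded:

```python
import pygrank as pg

# Persist "pytorch" as the default backend, silence the reminder message,
# and store device="cuda" under "init" in the ~/.pygrank configuration.
pg.set_backend_preference("pytorch", remind_where_to_find=False, device="cuda")
```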

@@ -64,28 +76,26 @@ necessarily be the fastest option for dense or very sparse graphs.

### <span class="component">tensorflow</span>
<b class="parameters">About</b><br>Performs computations within the `tensorflow` execution environment.
The latter is an open-source platform for machine learning developed by the Google Brain team.
It allows for efficient computation across multiple CPUs and GPUs, making it suitable for
performant large-scale data processing and deep learning applications.
The latter is an open-source platform for machine learning developed by the Google Brain team.
There
are two modes in which this backend can be executed: `"dense"` (default) and `"sparse"`.
The mode may be provided as additional arguments to the
`pg.set_backend("tensorflow", mode=...)` call.
`pg.set_backend("tensorflow", mode="dense" device="auto")` call.
In dense mode, the tensorflow backend attempts to store graphs in dense square
matrices that take full advantage of tensorflow's parallelization.
If there is not enough memory to allocate a dense adjacency matrix,
the backend generates a sparse version and creates a warning.
<br>
The backend's initialization also accepts a device string or object to
which computations should be internally transferred. This needs to
be one among tensorflow's available devices.
<br>
<b class="parameters">Installation</b><br> `pip install tensorflow[and-cuda]`<br>On Windows install WSL2 (Windows Subsystem for Linux) first.<br>
<b class="parameters">Links</b><br> [tensorflow](https://www.tensorflow.org/install)


### <span class="component">pytorch</span>
<b class="parameters">About</b><br>Performs computations within the `pytorch` execution environment.
The latter is an open-source platform for machine learning developed by Meta's AI Research lab.
It is known for its flexibility, ease of use, and dynamic computation graph, which makes it popular
in research and production.
The latter is an open-source platform for machine learning developed by Meta's AI Research lab.
Similarly to `"tensorflow"`,
there are two modes in which this backend can be executed: `"dense"` (default) and `"sparse"`.
The mode may be provided as additional arguments to the
@@ -94,9 +104,26 @@ In dense mode, the pytorch backend attempts to store graphs in dense square
matrices that take full advantage of pytorch's parallelization.
If there is not enough memory to allocate a dense adjacency matrix,
the backend generates a sparse version and creates a warning.
The backend's initialization also accepts a device string or object to
which computations should be internally transferred. This needs to
be one among pytorch's available devices (typically `"cuda"` or `"cpu"`).
<br>
<b class="parameters">Installation</b><br> For full installation instructions visit pytorch's website in the links below.<br>
<b class="parameters">Links</b><br> [pytorch](https://pytorch.org/get-started/locally)

### <span class="component">torch_sparse</span>
<b class="parameters">About</b><br>Performs computations within the `pytorch` execution environment,
but contrary to the `"pytorch"` backend uses the sparse computations of the `torch_sparse` library.
The latter is an open-source extension library of optimized sparse matrix operations for pytorch.
Unlike `"tensorflow"` and `"pytorch"`, this backend does not expose a mode option.
The backend's initialization only accepts a device string or object to
which computations should be internally transferred. This needs to
be one among pytorch's available devices (typically `"cuda"` or `"cpu"`).
!!! info
    `"torch_sparse"` is much more computationally efficient than `"pytorch"`
    for computations with sparse data structures.

<b class="parameters">Installation</b><br> For full installation instructions visit pytorch's website in the links below.<br>
<b class="parameters">Links</b><br> [pytorch](https://pytorch.org/get-started/locally) <br>
[torch_sparse](https://github.com/rusty1s/pytorch_sparse)
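
A usage sketch with the scoped backend switch that this commit's
`examples/playground/run_backend.py` also uses; the graph and seed set here are
placeholders:

```python
import pygrank as pg

# Temporarily run all computations on torch_sparse, restoring the
# previous backend when the block exits.
with pg.Backend("torch_sparse", device="cuda"):
    graph = pg.Graph()
    graph.add_edge("A", "B")
    scores = pg.PageRank()(graph, {"A"})
```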
11 changes: 8 additions & 3 deletions examples/run_backend.py → examples/playground/run_backend.py
@@ -1,17 +1,22 @@
import pygrank as pg
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")


with pg.Backend("torch_sparse", device=device):
    _, graph, community = next(pg.load_datasets_one_community(["amazon"]))
    ppr = pg.PageRank(alpha=0.9, normalization="symmetric", assume_immutability=True,
                      convergence=pg.ConvergenceManager(max_iters=38, error_type="iters"))
    ppr = pg.PageRank(
        alpha=0.9,
        normalization="symmetric",
        assume_immutability=True,
        convergence=pg.ConvergenceManager(max_iters=38, error_type="iters"),
    )
    ppr.preprocessor(graph)
    signal = pg.to_signal(graph, {node: 1.0 for node in community})
    torch.cuda.synchronize()  # correct timing
    scores = ppr(signal)
    print(ppr.convergence)
    print(scores["B00005MHUG"])  # 0.00508212111890316
    print(scores["B00006RGI2"])  # 0.70645672082901
    print(scores["0006497993"])  # 0.19633759558200836
    print(scores["0006497993"])  # 0.19633759558200836
2 changes: 2 additions & 0 deletions mkdocs.yml
@@ -13,8 +13,10 @@ nav:
    - 'userguide/autotuning.md'
    - 'userguide/preprocessing.md'
    - Applications:
        - 'advanced/ranking.md'
        - 'advanced/community.md'
        - 'advanced/gnn.md'
        - 'advanced/fairness.md'
    - R&D:
        - 'tips/citations.md'
        - 'tips/big.md'
2 changes: 1 addition & 1 deletion pygrank/algorithms/convergence.py
@@ -141,7 +141,7 @@ def start(self, restart_timer: bool = True):

def has_converged(self, new_ranks: BackendPrimitive) -> bool:
    # TODO: convert to any backend
    new_ranks = np.array(new_ranks).squeeze()
    new_ranks = backend.to_numpy(new_ranks).squeeze()
    self.accumulated_ranks = (
        self.accumulated_ranks * self.iteration + new_ranks
    ) / (self.iteration + 1)
Expand Down
12 changes: 6 additions & 6 deletions pygrank/core/backend/__init__.py
@@ -110,8 +110,8 @@ def converted(*args, **kwargs):
                return converted

            setattr(thismod, api, converter(mod.__dict__[api]))
        else:  # pragma: no cover
            raise Exception("Missing implementation for " + str(api))
        #else: # pragma: no cover
        # raise Exception("Missing implementation for " + str(api))
    return mod.backend_init(*args, **kwargs)


@@ -157,9 +157,9 @@ def get_backend_preference(): # pragma: no cover
return {"mod_name": mod_name, **init_parameters}


def set_backend_preference(mod_name: str ,
remind_where_to_find: bool = True,
**kwargs): # pragma: no cover
def set_backend_preference(
mod_name: str, remind_where_to_find: bool = True, **kwargs
): # pragma: no cover
default_dir = os.path.join(os.path.expanduser("~"), ".pygrank")
if not os.path.exists(default_dir):
os.makedirs(default_dir)
@@ -169,7 +169,7 @@ def set_backend_preference(mod_name: str ,
        {
            "backend": mod_name.lower(),
            "reminder": str(remind_where_to_find).lower(),
            "init": {str(k): str(v) for k, v in kwargs.items()}
            "init": {str(k): str(v) for k, v in kwargs.items()},
        },
        config_file,
    )
5 changes: 3 additions & 2 deletions pygrank/core/backend/ddask.py
@@ -18,7 +18,7 @@ def backend_init(*args, splits: int = 8, client=None, **kwargs):
    if client is None:
        client = dsk.distributed.Client(*args, **kwargs)
        __pygrank_dask_config["client"] = client
    else:
    elif client is not None:
        __pygrank_dask_config["client"] = client
    return __pygrank_dask_config["client"]

@@ -117,7 +117,8 @@ def multiply_and_collect(signal, split):

    # Use Dask to parallelize the multiplication
    futures = [
        __pygrank_dask_config["client"].submit(multiply_and_collect, signal, split) for split in M_splits
        __pygrank_dask_config["client"].submit(multiply_and_collect, signal, split)
        for split in M_splits
    ]
    results = __pygrank_dask_config["client"].gather(futures)

2 changes: 1 addition & 1 deletion pygrank/core/backend/numpy.py
@@ -46,7 +46,7 @@ def to_array(obj, copy_array=False):
    if obj.__class__.__module__ == "tensorflow.python.framework.ops":
        return obj.numpy()
    if obj.__class__.__module__ == "torch":
        return obj.detach().numpy()
        return obj.detach().cpu().numpy()
    return np.array(obj)


41 changes: 29 additions & 12 deletions pygrank/core/backend/pytorch.py
@@ -34,10 +34,15 @@ def diag(x, offset=0):
def backend_init(mode="dense", device=None):
    __pygrank_torch_config["mode"] = mode
    if device is not None and device == "auto":
        if not isinstance(__pygrank_torch_config["device"], str) or __pygrank_torch_config["device"] != "auto":
        if (
            not isinstance(__pygrank_torch_config["device"], str)
            or __pygrank_torch_config["device"] != "auto"
        ):
            return
        device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
        warnings.warn(f"[pygrank.backend.pytorch] Automatically detected device to run on {device}: {torch.cuda.get_device_name(device)}")
        warnings.warn(
            f"[pygrank.backend.pytorch] Automatically detected device to run on {device}: {torch.cuda.get_device_name(device)}"
        )
    if device is not None and isinstance(device, str):
        device = torch.device(device)
    __pygrank_torch_config["device"] = device
@@ -94,14 +99,22 @@ def scipy_sparse_to_backend(M):
        return torch.FloatTensor(M.todense()).to(__pygrank_torch_config["device"])
    except MemoryError:
        warnings.warn(
            f"[pygrank.backend.pytorch] Not enough memory to convert a scipy sparse matrix with shape {M.shape} to a numpy dense matrix before moving it to your device.\nWill create a torch.sparse_coo_tensor instead.\nAdd the option mode=\"sparse\" to the backend's initialization to hide this message,\nbut prefer switching to the torch_sparse backend for a performant implementation.")
            f"[pygrank.backend.pytorch] Not enough memory to convert a scipy sparse matrix with shape {M.shape} "
            f"to a numpy dense matrix before moving it to your device.\nWill create a torch.sparse_coo_tensor instead."
            f'\nAdd the option mode="sparse" to the backend\'s initialization to hide this message,'
            f"\nbut prefer switching to the torch_sparse backend for a performant implementation."
        )

    coo = M.tocoo()
    return torch.sparse_coo_tensor(
        torch.LongTensor(np.vstack((coo.col, coo.row))),
        torch.FloatTensor(coo.data),
        coo.shape,
    ).coalesce().to(__pygrank_torch_config["device"])
    return (
        torch.sparse_coo_tensor(
            torch.LongTensor(np.vstack((coo.col, coo.row))),
            torch.FloatTensor(coo.data),
            coo.shape,
        )
        .coalesce()
        .to(__pygrank_torch_config["device"])
    )


def to_array(obj, copy_array=False):
@@ -111,12 +124,16 @@ def to_array(obj, copy_array=False):
            return torch.clone(obj).to(__pygrank_torch_config["device"])
        return obj.to(__pygrank_torch_config["device"])
    return torch.ravel(obj).to(__pygrank_torch_config["device"])
    return torch.ravel(torch.FloatTensor(np.array([v for v in obj], dtype=np.float32))).to(__pygrank_torch_config["device"])
    return torch.ravel(
        torch.FloatTensor(np.array([v for v in obj], dtype=np.float32))
    ).to(__pygrank_torch_config["device"])


def to_primitive(obj):
    if isinstance(obj, float):
        return torch.tensor(obj, dtype=torch.float32).to(__pygrank_torch_config["device"])
        return torch.tensor(obj, dtype=torch.float32).to(
            __pygrank_torch_config["device"]
        )
    return torch.FloatTensor(obj).to(__pygrank_torch_config["device"])


@@ -132,9 +149,9 @@ def self_normalize(obj):


def conv(signal, M):
    #if M.is_sparse:
    # if M.is_sparse:
    return torch.mv(M, signal)
    #return M@signal.reshape((-1,1))
    # return M@signal.reshape((-1,1))


def length(x):
10 changes: 10 additions & 0 deletions pygrank/core/backend/specification.py
@@ -146,3 +146,13 @@ def epsilon() -> float: # pragma: no cover

def shape0(M: BackendPrimitive) -> int: # pragma: no cover
    pass


def to_numpy(obj):
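    """Convert tensorflow or torch tensors, or any array-like object, to a numpy array."""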
    import numpy as np

    if obj.__class__.__module__ == "tensorflow.python.framework.ops":
        return obj.numpy()
    if obj.__class__.__module__ == "torch":
        return obj.detach().cpu().numpy()
    return np.array(obj)