Remove Sparsity from metrics logging #124

Merged
merged 1 commit on Oct 31, 2024
2 changes: 1 addition & 1 deletion docs/algos/performances.md
@@ -12,7 +12,7 @@ For single-policy algorithms, the metric used will be the scalarized return of t

### Multi-policy algorithms
For multi-policy algorithms, we propose to rely on various metrics to assess the quality of the **discounted** Pareto Fronts (PF) or Convex Coverage Set (CCS). In general, we want to have a metric that is able to assess the convergence of the PF, a metric that is able to assess the diversity of the PF, and a hybrid metric assessing both. The metrics are implemented in `common/performance_indicators`. We propose to use the following metrics:
- * (Diversity) Sparsity: average distance between each consecutive point in the PF. From the PGMORL paper [1]. Keyword: `eval/sparsity`.
+ * **[Do not use]** (Diversity) Sparsity: average distance between each consecutive point in the PF. From the PGMORL paper [1]. Keyword: `eval/sparsity`.
* (Diversity) Cardinality: number of points in the PF. Keyword: `eval/cardinality`.
* (Convergence) IGD: a SOTA metric from Multi-Objective Optimization (MOO) literature. It requires a reference PF that we can compute a posteriori. That is, we do a merge of all the PFs found by the method and compute the IGD with respect to this reference PF. Keyword: `eval/igd`.
* (Hybrid) Hypervolume: a SOTA metric from MOO and MORL literature. Keyword: `eval/hypervolume`.
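
For illustration, a minimal sketch (editor's addition, not part of this PR) of how the remaining indicators could be computed on a toy two-objective front. It reuses only the call shapes visible in this PR's diff, `hypervolume(hv_ref_point, front)`, `igd(known_front, current_estimate)` and `cardinality(front)`, and assumes all three are importable from `morl_baselines.common.performance_indicators`; the numeric values are invented for the example.

```python
# Sketch: computing the logged indicators on a small, hand-made Pareto front.
import numpy as np

from morl_baselines.common.performance_indicators import cardinality, hypervolume, igd

# Toy 2-objective (maximization) front found by some algorithm.
current_front = [np.array([1.0, 3.0]), np.array([2.0, 2.0]), np.array([3.0, 1.0])]

# Hypervolume needs a reference point that is dominated by every point of the front.
hv_ref_point = np.array([0.0, 0.0])

# IGD needs a reference front, e.g. the a-posteriori merge of all fronts found by all methods.
reference_front = [np.array([1.0, 3.5]), np.array([2.5, 2.5]), np.array([3.5, 1.0])]

print("eval/hypervolume:", hypervolume(hv_ref_point, current_front))  # hybrid indicator
print("eval/cardinality:", cardinality(current_front))                # diversity indicator
print("eval/igd:", igd(reference_front, current_front))               # convergence indicator
```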
4 changes: 0 additions & 4 deletions morl_baselines/common/evaluation.py
@@ -16,7 +16,6 @@
hypervolume,
igd,
maximum_utility_loss,
- sparsity,
)
from morl_baselines.common.weights import equally_spaced_weights

@@ -156,7 +155,6 @@ def log_all_multi_policy_metrics(

Logged metrics:
- hypervolume
- - sparsity
- expected utility metric (EUM)
If a reference front is provided, also logs:
- Inverted generational distance (IGD)
@@ -172,14 +170,12 @@
"""
filtered_front = list(filter_pareto_dominated(current_front))
hv = hypervolume(hv_ref_point, filtered_front)
- sp = sparsity(filtered_front)
eum = expected_utility(filtered_front, weights_set=equally_spaced_weights(reward_dim, n_sample_weights))
card = cardinality(filtered_front)

wandb.log(
{
"eval/hypervolume": hv,
"eval/sparsity": sp,
"eval/eum": eum,
"eval/cardinality": card,
"global_step": global_step,
3 changes: 3 additions & 0 deletions morl_baselines/common/performance_indicators.py
@@ -42,6 +42,9 @@ def igd(known_front: List[np.ndarray], current_estimate: List[np.ndarray]) -> float:
def sparsity(front: List[np.ndarray]) -> float:
"""Sparsity metric from PGMORL.

+ (!) This metric only considers the points from the PF identified by the algorithm, not the full objective space.
+ Therefore, it is misleading (e.g. learning only one point is considered good) and we recommend not using it when comparing algorithms.

Basically, the sparsity is the average distance between each point in the front.

Args:
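
To make the new warning concrete, here is a minimal sketch (editor's addition, not the library's exact implementation) of a PGMORL-style sparsity computation: the average squared gap between consecutive points when each objective is sorted independently, with lower values read as "better". It shows why the docstring calls the metric misleading: a degenerate single-point front gets the best possible score.

```python
# Sketch of a PGMORL-style sparsity metric (illustrative only).
from typing import List

import numpy as np


def sparsity_sketch(front: List[np.ndarray]) -> float:
    """Average squared gap between consecutive points, per objective (lower is "better")."""
    if len(front) < 2:
        return 0.0  # a single-point front already achieves the lowest possible value
    points = np.array(front)
    total = 0.0
    for dim in range(points.shape[1]):
        sorted_vals = np.sort(points[:, dim])
        total += float(np.sum(np.square(np.diff(sorted_vals))))
    return total / (len(front) - 1)


# A well-spread 3-point front scores worse than a degenerate 1-point front,
# even though the former is clearly the more useful result.
print(sparsity_sketch([np.array([1.0, 3.0]), np.array([2.0, 2.0]), np.array([3.0, 1.0])]))  # 2.0
print(sparsity_sketch([np.array([3.0, 3.0])]))                                              # 0.0
```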