Contrib dist stat #346

Open · wants to merge 11 commits into master

Conversation

arthurPignet (Collaborator)

@arthurPignet arthurPignet commented May 22, 2021

New contributivity measurement based on statistical distances between two distributions:

  • The partner-specific probability distribution of the labels given the input (estimated via maximum likelihood on the partner's data)
  • The latent joint probability distribution of the labels given the input (estimated via maximum likelihood on the joint dataset)

The difference between these distributions is interpreted as noise, which allows us to use a multi-headed adaptation of the S-model method for the multi-partner case to estimate and quantify this pseudo-noise.

These contributivity metrics only require inference on the trained model (trained via FedSmodel).

The additional computational cost is thus negligible.
The method doesn't need a 'perfect', global test dataset.
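
As a hedged illustration of the inference-only step above, here is a minimal sketch. It assumes a trained Keras-style classifier `model` with softmax outputs; the function name and shapes are illustrative, not part of the mplc API.

```python
# Minimal sketch (illustrative, NOT the mplc API): estimate the label
# distribution predicted by the trained model on a given dataset.

def predicted_label_distribution(model, x, batch_size=256):
    """Average the trained model's softmax outputs over a dataset.

    Only inference is needed, which is why the additional
    computational cost of the metric is negligible.
    """
    probs = model.predict(x, batch_size=batch_size)  # (n_samples, n_labels)
    return probs.mean(axis=0)                        # (n_labels,)

# Per-partner distribution vs. the distribution on the joint dataset:
# p_k = predicted_label_distribution(model, x_partner_k)
# p_joint = predicted_label_distribution(model, x_joint)
```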

For now, three distances are implemented (see the sketch after this list):

  • Kullback-Leibler divergence
  • Hellinger metric
  • Bhattacharyya distance

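Here is a self-contained NumPy sketch of these three distances between two discrete label distributions `p` and `q` (e.g. a per-partner distribution vs. the joint one). The helpers are illustrative, not necessarily this PR's exact implementation.

```python
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    """Kullback-Leibler divergence D(p || q)."""
    p = np.clip(p, eps, 1.0)
    q = np.clip(q, eps, 1.0)
    return float(np.sum(p * np.log(p / q)))

def hellinger(p, q):
    """Hellinger metric, bounded in [0, 1]."""
    return float(np.sqrt(0.5 * np.sum((np.sqrt(p) - np.sqrt(q)) ** 2)))

def bhattacharyya(p, q, eps=1e-12):
    """Bhattacharyya distance: -log of the Bhattacharyya coefficient."""
    bc = np.sum(np.sqrt(p * q))
    return float(-np.log(max(bc, eps)))

# Toy example with two 3-label distributions:
p = np.array([0.7, 0.2, 0.1])
q = np.array([0.5, 0.3, 0.2])
print(kl_divergence(p, q), hellinger(p, q), bhattacharyya(p, q))
```
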
These metrics are tested on the reference scenarios; see the Colab notebook:
https://colab.research.google.com/drive/1DN1lLdd1b1ZmttmEiQKpx8xW5guEf_f_?usp=sharing

TODO

  • Add documentation
  • Investigate the S-model bug when using the Advanced or Flexible splitter
  • Handle dict-valued contributivity scores in the results dataframe
  • Investigate the standard deviation computations

This class is equivalent to the previous FedGDO_reset_local and FedGDO_persistent.
The reset of the local optimizers between global batches can be set via an mpl parameter.

Add FedGDO in the mpl __init__

@arthurPignet arthurPignet self-assigned this May 23, 2021
@arthurPignet arthurPignet added the experiment (An experiment on data) and need test labels May 23, 2021
fix log
flake it

Fix kwargs use for mpl in contributivity

S-model initialization could fail if the confusion matrix doesn't have the right shape, which can happen when some labels are missing from a partner's dataset. By the way, I noticed that S-model only works with datasets that have 10 labels, i.e. only CIFAR-10 and MNIST. I opened an issue about that.
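
A possible workaround for the shape issue, sketched below under the assumption that the confusion matrix is built with scikit-learn (the helper name is hypothetical, not this PR's code):

```python
# Illustrative sketch (not this PR's code): force the confusion matrix
# onto the full label set, so it keeps shape (n_labels, n_labels) even
# when a partner's dataset is missing some labels.
from sklearn.metrics import confusion_matrix

def full_confusion_matrix(y_true, y_pred, n_labels):
    # Passing `labels` adds all-zero rows/columns for labels this
    # partner never sees, instead of silently shrinking the matrix.
    return confusion_matrix(y_true, y_pred, labels=list(range(n_labels)))
```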

codecov-commenter commented May 23, 2021

Codecov Report

Merging #346 (078cbca) into master (ecc3ea8) will decrease coverage by 0.19%.
The diff coverage is 80.37%.


@@            Coverage Diff             @@
##           master     #346      +/-   ##
==========================================
- Coverage   80.68%   80.49%   -0.20%     
==========================================
  Files          15       15              
  Lines        3045     3128      +83     
==========================================
+ Hits         2457     2518      +61     
- Misses        588      610      +22     
Impacted Files                              Coverage Δ
mplc/multi_partner_learning/__init__.py    100.00% <ø> (ø)
mplc/multi_partner_learning/basic_mpl.py    84.98% <ø> (-0.29%) ⬇️
mplc/multi_partner_learning/fast_mpl.py     61.09% <55.31%> (-0.81%) ⬇️
mplc/contributivity.py                      77.23% <100.00%> (+0.67%) ⬆️
mplc/scenario.py                            83.27% <100.00%> (+0.77%) ⬆️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update ecc3ea8...078cbca.

@arthurPignet arthurPignet marked this pull request as ready for review May 23, 2021 16:04
@bowni bowni requested a review from HeytemBou August 17, 2021 10:03
Labels: experiment (An experiment on data), need test
Projects: None yet
2 participants