Contrib dist stat #346

Open · wants to merge 11 commits into master

Conversation

arthurPignet (Collaborator)

@arthurPignet arthurPignet commented May 22, 2021

New contributivity measurement based on statistical distances between two distributions:

  • The partner-specific probability distribution of the labels given the input (estimated via maximum likelihood on the partner's data)
  • The latent joint probability distribution of the labels given the input (estimated via maximum likelihood on the joint dataset)

The difference between these distributions is interpreted as noise, which allows us to use a multi-headed adaptation of the S-model method for the multi-partner case to estimate and quantify this pseudo-noise.

These contributivity metrics only require inference on the trained model (trained via FedSmodel).

The additional computational cost is thus negligible.
The method doesn't need a 'perfect', global test dataset.
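
As a hedged illustration of the inference-only step above, here is a minimal sketch. It assumes a trained Keras-style classifier `model` with softmax outputs; the function name and shapes are illustrative, not part of the mplc API.

```python
# Minimal sketch (illustrative, NOT the mplc API): estimate the label
# distribution predicted by the trained model on a given dataset.

def predicted_label_distribution(model, x, batch_size=256):
    """Average the trained model's softmax outputs over a dataset.

    Only inference is needed, which is why the additional
    computational cost of the metric is negligible.
    """
    probs = model.predict(x, batch_size=batch_size)  # (n_samples, n_labels)
    return probs.mean(axis=0)                        # (n_labels,)

# Per-partner distribution vs. the distribution on the joint dataset:
# p_k = predicted_label_distribution(model, x_partner_k)
# p_joint = predicted_label_distribution(model, x_joint)
```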

For now, three distances are implemented (see the sketch after this list):

  • Kullback-Leibler divergence
  • Hellinger metric
  • Bhattacharyya distance

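Here is a self-contained NumPy sketch of these three distances between two discrete label distributions `p` and `q` (e.g. a per-partner distribution vs. the joint one). The helpers are illustrative, not necessarily this PR's exact implementation.

```python
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    """Kullback-Leibler divergence D(p || q)."""
    p = np.clip(p, eps, 1.0)
    q = np.clip(q, eps, 1.0)
    return float(np.sum(p * np.log(p / q)))

def hellinger(p, q):
    """Hellinger metric, bounded in [0, 1]."""
    return float(np.sqrt(0.5 * np.sum((np.sqrt(p) - np.sqrt(q)) ** 2)))

def bhattacharyya(p, q, eps=1e-12):
    """Bhattacharyya distance: -log of the Bhattacharyya coefficient."""
    bc = np.sum(np.sqrt(p * q))
    return float(-np.log(max(bc, eps)))

# Toy example with two 3-label distributions:
p = np.array([0.7, 0.2, 0.1])
q = np.array([0.5, 0.3, 0.2])
print(kl_divergence(p, q), hellinger(p, q), bhattacharyya(p, q))
```
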
These metrics are tested on the reference scenarios; see the Colab notebook:
https://colab.research.google.com/drive/1DN1lLdd1b1ZmttmEiQKpx8xW5guEf_f_?usp=sharing

TODO

  • Add documentation
  • Investigate the S-model bug when using the Advanced or Flexible splitter
  • Handle dict-valued contributivity scores in the results dataframe
  • Investigate the standard deviation computations

This class is equivalent to the previous FedGDO_reset_local and FedGDO_persistent.
The reset of the local optimizers between global batches can be set via an mpl parameter.

Add FedGDO in the mpl __init__

@arthurPignet arthurPignet self-assigned this May 23, 2021
@arthurPignet arthurPignet added the experiment (An experiment on data) and need test labels May 23, 2021
fix log
flake it

Fix kwargs use for mpl in contributivity

S-model initialization could fail if the confusion matrix doesn't have the right shape, which can happen when some labels are missing from a partner's dataset. By the way, I noticed that S-model only works with datasets that have 10 labels, i.e. only CIFAR-10 and MNIST. I opened an issue about that.
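
A possible workaround for the shape issue, sketched below under the assumption that the confusion matrix is built with scikit-learn (the helper name is hypothetical, not this PR's code):

```python
# Illustrative sketch (not this PR's code): force the confusion matrix
# onto the full label set, so it keeps shape (n_labels, n_labels) even
# when a partner's dataset is missing some labels.
from sklearn.metrics import confusion_matrix

def full_confusion_matrix(y_true, y_pred, n_labels):
    # Passing `labels` adds all-zero rows/columns for labels this
    # partner never sees, instead of silently shrinking the matrix.
    return confusion_matrix(y_true, y_pred, labels=list(range(n_labels)))
```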

codecov-commenter commented May 23, 2021

Codecov Report

Merging #346 (078cbca) into master (ecc3ea8) will decrease coverage by 0.19%.
The diff coverage is 80.37%.


@@            Coverage Diff             @@
##           master     #346      +/-   ##
==========================================
- Coverage   80.68%   80.49%   -0.20%     
==========================================
  Files          15       15              
  Lines        3045     3128      +83     
==========================================
+ Hits         2457     2518      +61     
- Misses        588      610      +22     
Impacted Files                              Coverage Δ
mplc/multi_partner_learning/__init__.py    100.00% <ø> (ø)
mplc/multi_partner_learning/basic_mpl.py    84.98% <ø> (-0.29%) ⬇️
mplc/multi_partner_learning/fast_mpl.py     61.09% <55.31%> (-0.81%) ⬇️
mplc/contributivity.py                      77.23% <100.00%> (+0.67%) ⬆️
mplc/scenario.py                            83.27% <100.00%> (+0.77%) ⬆️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update ecc3ea8...078cbca.

@arthurPignet arthurPignet marked this pull request as ready for review May 23, 2021 16:04
@bowni bowni requested a review from HeytemBou August 17, 2021 10:03
Labels: experiment (An experiment on data), need test
Projects: None yet
2 participants