Some doubts and questions #7

fvarno opened this issue Sep 1, 2021 · 15 comments

fvarno commented Sep 1, 2021

I understand that, for whatever reason, you might not have been able to release your complete code, but I would highly appreciate it if you could help me answer some questions about your implementation.

  1. How much data does the validation set on the server contain, and is it taken from the original training set (before partitioning) or from the test set?
  2. Do you train your DQN with one optimization step after each communication round (after pushing the latest experience into the replay memory) or with multiple steps? Do you wait for the memory to collect some experience first, or do you train the DQN even with a single entry? What is the DQN training batch size?
  3. What optimization algorithm and learning rate are used to train the DQN?
  4. How often is the target network updated from the learning DQN?
  5. Do you use learning rate decay as in FedAvg? If so, does it match their values?
  6. Do you use a discount factor for the reward (γ in your paper)?

Thank you in advance!
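As a point of reference for questions 2, 4 and 6, here is a minimal sketch of a generic DQN update schedule, not the authors' code: a replay memory that waits for a warm-up amount of experience, minibatch sampling, TD targets with a discount factor, and periodic target-network syncing. All constants (`GAMMA`, `BATCH_SIZE`, `WARMUP`, `TARGET_SYNC_EVERY`) and helper names are hypothetical placeholders, since the paper's values are exactly what is being asked; the target network is stood in for by a plain function returning one Q value per action. It uses the standard DQN target; a Double DQN variant would pick the argmax action with the online network and evaluate it with the target network.

```python
import random
from collections import deque

# Hypothetical placeholder hyperparameters: these are the unknowns the
# questions above ask about, NOT values taken from the paper.
GAMMA = 0.99            # discount factor for the reward (question 6)
BATCH_SIZE = 4          # DQN training batch size (question 2)
WARMUP = 8              # transitions collected before the first update (question 2)
TARGET_SYNC_EVERY = 5   # rounds between target-network syncs (question 4)


class ReplayMemory:
    """Fixed-capacity FIFO buffer of (state, action, reward, next_state, done)."""

    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)

    def push(self, transition):
        self.buffer.append(transition)

    def ready(self):
        # Only start training once enough experience has been collected.
        return len(self.buffer) >= WARMUP

    def sample(self, batch_size):
        return random.sample(list(self.buffer), batch_size)


def td_targets(batch, target_q):
    """Standard DQN targets: y = r + GAMMA * max_a Q_target(s', a), or r if terminal."""
    targets = []
    for _state, _action, reward, next_state, done in batch:
        bootstrap = 0.0 if done else GAMMA * max(target_q(next_state))
        targets.append(reward + bootstrap)
    return targets


def should_sync_target(round_idx):
    # Copy the online network into the target network every TARGET_SYNC_EVERY rounds.
    return round_idx > 0 and round_idx % TARGET_SYNC_EVERY == 0


def dummy_target_q(state):
    # Hypothetical stand-in for the target network: one Q value per action.
    return [0.1, 0.5, 0.2]


if __name__ == "__main__":
    random.seed(0)
    memory = ReplayMemory(capacity=100)
    updates, syncs = 0, 0
    for rnd in range(1, 13):  # 12 simulated communication rounds
        memory.push((rnd, rnd % 3, 1.0, rnd + 1, False))  # one transition per round
        if memory.ready():
            batch = memory.sample(BATCH_SIZE)
            targets = td_targets(batch, dummy_target_q)
            updates += 1  # an optimizer step on (Q_online(s, a) - y)^2 would go here
        if should_sync_target(rnd):
            syncs += 1  # in PyTorch: target_net.load_state_dict(online_net.state_dict())
    print(f"updates={updates}, target syncs={syncs}")
```

With these placeholder settings the loop prints `updates=5, target syncs=2`: no update happens until the memory holds `WARMUP` transitions, which is one possible answer to "do you wait for the memory to collect some experience".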

tian1327 commented Oct 5, 2022

Hi, did you get any answers?

fvarno commented Oct 5, 2022

I haven't, unfortunately.

firewood1996 commented

Hi, based on my understanding of this paper, I have reproduced the whole code. Although I still have some minor bugs, I can obtain the final results. I will release the complete code in the future; after that, we can discuss it.

fvarno commented Nov 2, 2022

Cool! If you ever decide to implement it using FedSim, I can help with the setup.

firewood1996 commented

Thanks! I have developed the code based on OpenAI's Gym and this repository. I will try my best to do that! :)

tian1327 commented Nov 2, 2022

@firewood1996 Hi, thanks for letting us know! Would you mind sharing your OpenAI Gym implementation so far, so that maybe I can help debug it?

firewood1996 commented

Thanks! But there are just a few problems left, and I think I can handle them :). I will try my best!

tian1327 commented Nov 2, 2022

Cool! Looking forward to seeing your code!

tian1327 commented

@firewood1996 Hi there, I am wondering about your implementation: during DDQN training, did you select only one client in each communication round? If so, only one device would report its local weights to the server, so there would be no FedAvg aggregation on the server. I have implemented this scheme, but the results were horrible. I doubt the accuracy can improve under this scheme, because in every round the single selected device cannot benefit from the weight updates of the other devices. Would you mind sharing your experience? Thank you!
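For reference, the server-side FedAvg step is an average of the reported client weights, weighted by each client's local sample count; with a single selected client it reduces to copying that client's model, so no real averaging takes place. A minimal sketch, assuming models are flat lists of floats (real models hold per-layer tensors) and using a hypothetical helper name `fedavg`:

```python
def fedavg(client_weights, client_sizes):
    """Average flat weight vectors, weighting each client by its local sample count."""
    if not client_weights or len(client_weights) != len(client_sizes):
        raise ValueError("need one sample count per client update")
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("total sample count must be positive")
    averaged = [0.0] * len(client_weights[0])
    for weights, n in zip(client_weights, client_sizes):
        for i, w in enumerate(weights):
            averaged[i] += w * n / total
    return averaged


if __name__ == "__main__":
    # Single selected client: the "average" is just that client's weights.
    print(fedavg([[0.5, -1.0]], [120]))
    # Two clients with equal data: element-wise mean.
    print(fedavg([[0.0, 4.0], [2.0, 0.0]], [1, 1]))
```

The first call prints `[0.5, -1.0]` and the second `[1.0, 2.0]`.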

firewood1996 commented Nov 30, 2022

According to the paper, they sort the Q values of all 100 clients and then select the 10 clients with the largest Q values. When training the Q network, they use only the largest Q value. I have tried many approaches, but the performance still does not reach what the paper reports. In fact, I strongly doubt that DDQN actually works here.
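The selection rule described above (rank all 100 clients by Q value, keep the 10 largest) can be sketched with the standard library's `heapq.nlargest`. The Q values below are random placeholders standing in for the DQN's per-client outputs, and `select_top_k_clients` and `greedy_action` are hypothetical helper names; this illustrates the rule as summarized in the comment, not the paper's code.

```python
import heapq
import random


def select_top_k_clients(q_values, k):
    """Return indices of the k clients with the largest Q values, best first."""
    if not 0 < k <= len(q_values):
        raise ValueError("k must be between 1 and the number of clients")
    return heapq.nlargest(k, range(len(q_values)), key=q_values.__getitem__)


def greedy_action(q_values):
    """Index of the single highest-Q client (the value used for the Q-network update)."""
    return max(range(len(q_values)), key=q_values.__getitem__)


if __name__ == "__main__":
    random.seed(0)
    num_clients, clients_per_round = 100, 10
    # Placeholder Q values; in the described setup these would come from the DQN.
    q = [random.random() for _ in range(num_clients)]
    print("selected clients:", select_top_k_clients(q, clients_per_round))
    print("greedy (top-1) client:", greedy_action(q))
```

Because each client's Q value is ranked independently, this rule does not model how the selected clients interact with one another.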

  1. Using DDQN makes training very time-consuming. With only one GPU, training takes more than two days, and using PCA adds even more time.
  2. The federated learning model converges with or without the DQN, so the rewards the DQN receives early and late in training are unlikely to differ dramatically. By contrast, in traditional reinforcement learning environments such as Gym, nothing converges on its own early in training, which is quite different from federated learning.

I have given up on optimizing with reinforcement learning, since none of the many methods I tried achieved the results reported in the paper. And the fact that the authors still haven't open-sourced the code suggests there are many problems.

I am very sorry that I could not reproduce it. If you have new ideas, feel free to discuss them with me.

tian1327 commented Dec 6, 2022

@firewood1996 Hi, thanks a lot for sharing your experience! I have implemented the DQN based on flsim but still cannot reproduce the training performance. Based on my experiments, I strongly agree with you that the "FL model will converge with or without DQN". In other words, during DQN training, even if the same device is selected in every round, the test accuracy improves as more communication rounds go by. I also believe that DQN is not suitable for this type of device selection problem because of the strong dependency between actions.

In case you are interested, you can find our implementation, a short presentation, slides, and the report here: https://github.com/tian1327/flsim_dqn

firewood1996 commented

I have read your work; it is excellent! I strongly agree with your view that DQN is not suitable and that the reward design does not capture the intrinsic connections between different clients. By the way, would you mind adding me on QQ (2497978657) so we can discuss FL further?

tian1327 commented Dec 8, 2022

@firewood1996 Sure, but I don't use QQ. Would you please send your WeChat ID to my email address, skytamu6@gmail.com?
Thanks

fvarno commented Dec 9, 2022

Dear @tian1327 and @firewood1996,
I hope you can find some clear answers to our questions regarding this project very soon 🚀.
I'm interested in hearing about your attempts. If you have any updates, I would be thrilled to hear about them.

Best of luck!

tian1327 commented Dec 9, 2022

Feel free to check out our implementation and results (code, report, slides, YouTube video) here: https://github.com/tian1327/flsim_dqn
