Question on SSE #2
Hi, thanks for asking. In the code I implemented it in the SGD stage because that is easier, but it really should work for any embeddings, no matter where the embedding layer sits. If the embedding layer is at the bottom of the architecture (often the case) or at the top (for the label part), then it is equivalent to data/label augmentation (as is done in BERT pre-training, or as in label smoothing). But if you read my other NeurIPS paper on Stochastic Shared Embeddings, you will find that we are actually solving a different loss function, one with a smoother loss landscape and therefore easier to optimize; the replacement does not even have to be random but can be driven by a graph. Theoretical analysis shows that it improves generalization error.
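To make the mechanism concrete, here is a minimal sketch of SSE-SE as described in this thread: before the embedding lookup, each user index and each item index in a sequence is swapped for a uniformly random one with small probability. The function name `sse_se_replace` and the default probabilities are illustrative assumptions, not taken from the repository's actual code.

```python
import random

def sse_se_replace(user_ids, item_seqs, num_users, num_items,
                   p_user=0.01, p_item=0.01):
    """Sketch of SSE-SE (hypothetical helper, not the repo's API).

    With probability p_user, a user index is replaced by a uniformly
    random user; with probability p_item, each item index in a sequence
    is replaced by a uniformly random item. Applied per SGD batch,
    before the embedding lookup.
    """
    new_users = [random.randrange(num_users) if random.random() < p_user else u
                 for u in user_ids]
    new_seqs = [[random.randrange(num_items) if random.random() < p_item else i
                 for i in seq]
                for seq in item_seqs]
    return new_users, new_seqs
```

Setting the probabilities to zero recovers ordinary training, which is why SSE-SE is cheap to bolt onto an existing SGD loop.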
…On Sat, Jan 8, 2022 at 12:01 AM Chunpai ***@***.***> wrote:
Thank you for your great work. I am not sure if I understand SSE-SE
correctly. Based on your code, it seems you randomly replace the items in a
sequence with a random item, or replace the user with another random user,
during SGD. Am I right? Also, can I view SSE-SE as a kind of data
augmentation technique? Thanks.
I also demonstrated how SSE can be used in computer vision by treating feature maps as embeddings in this CVPR paper: https://openaccess.thecvf.com/content_CVPR_2020/papers/Abavisani_Multimodal_Categorization_of_Crisis_Events_in_Social_Media_CVPR_2020_paper.pdf "We treat feature maps of images as embeddings and use class labels to construct knowledge graphs. The feature maps of two images are connected by an edge in the graph, if and only if they belong to the same class (e.g. they are both labeled “affected individuals”). We follow the same procedure for text embeddings and construct a knowledge graph for text embeddings as well. Finally, we connect the nodes associated with the knowledge graph of image feature maps with an edge to nodes in text’s knowledge graph if and only if they belong to the same class."
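The quoted construction can be sketched as follows: given a class label per embedding, connect two nodes with an edge if and only if they share a label. This is a simplified single-modality sketch; the paper builds separate image and text graphs plus cross-modal edges, and `build_class_graph` is a hypothetical name.

```python
from itertools import combinations

def build_class_graph(labels):
    """Edge set over embedding indices: (i, j) is an edge iff
    labels[i] == labels[j] (same class), following the rule quoted
    from the CVPR paper. Returns pairs with i < j."""
    return {(i, j)
            for i, j in combinations(range(len(labels)), 2)
            if labels[i] == labels[j]}
```

Graph-driven SSE then replaces an embedding with one of its graph neighbors rather than a uniformly random one, which is the non-random variant mentioned above.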
Thank you so much for your response. This is very helpful to me.