<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>14_BERT_BART</title>
<link rel="stylesheet" href="https://stackedit.io/style.css" />
</head>
<body class="stackedit">
<div class="stackedit__html"><h1 id="bert--bart">14 BERT & BART</h1>
<h2 id="assignment">Assignment</h2>
<ol>
<li>TASK 1: Train BERT using the code mentioned <a href="https://drive.google.com/file/d/1Zp2_Uka8oGDYsSe5ELk-xz6wIX8OIkB7/view?usp=sharing">here</a> on the SQuAD dataset for 20% of the overall samples (1/5 epochs). Show results on 5 samples.</li>
<li>TASK 2: Reproduce <a href="https://mccormickml.com/2019/07/22/BERT-fine-tuning/">these</a> results, and show output on 5 samples.</li>
<li>TASK 3: Reproduce the training explained in this <a href="https://towardsdatascience.com/bart-for-paraphrasing-with-simple-transformers-7c9ea3dfdd8c">blog</a>. You can decide to pick a smaller dataset.</li>
<li>Proceed to Session 14 - Assignment Solutions page and:
<ol>
<li>Submit README link for Task 1 (training log snippets and 5 sample results along with BERT description must be available) - 750 points</li>
<li>Submit README link for Task 2 (training log snippets and 5 sample results) - 250 points</li>
<li>Submit README link for Task 3 (training log snippets and 5 sample results along with BART description must be available) - 1000 points</li>
</ol>
</li>
</ol>
<h2 id="solution">Solution</h2>
<table>
<thead>
<tr>
<th></th>
<th>NBViewer</th>
<th>Google Colab</th>
</tr>
</thead>
<tbody>
<tr>
<td><strong>TASK 1</strong>: BERT QA Bot on SQUAD</td>
<td><a href="https://nbviewer.jupyter.org/github/satyajitghana/TSAI-DeepNLP-END2.0/blob/main/14_BERT_BART/BERT_Tutorial_How_To_Build_a_Question_Answering_Bot.ipynb"><img alt="Open In NBViewer" src="https://img.shields.io/badge/render-nbviewer-orange?logo=Jupyter"></a></td>
<td><a href="https://githubtocolab.com/satyajitghana/TSAI-DeepNLP-END2.0/blob/main/14_BERT_BART/BERT_Tutorial_How_To_Build_a_Question_Answering_Bot.ipynb"><img alt="Open In Colab" src="https://colab.research.google.com/assets/colab-badge.svg"></a></td>
</tr>
<tr>
<td><strong>TASK 2</strong>: BERT Sentence Classification</td>
<td><a href="https://nbviewer.jupyter.org/github/satyajitghana/TSAI-DeepNLP-END2.0/blob/main/14_BERT_BART/BERT_Fine_Tuning_Sentence_Classification_v2.ipynb"><img alt="Open In NBViewer" src="https://img.shields.io/badge/render-nbviewer-orange?logo=Jupyter"></a></td>
<td><a href="https://githubtocolab.com/satyajitghana/TSAI-DeepNLP-END2.0/blob/main/14_BERT_BART/BERT_Fine_Tuning_Sentence_Classification_v2.ipynb"><img alt="Open In Colab" src="https://colab.research.google.com/assets/colab-badge.svg"></a></td>
</tr>
<tr>
<td><strong>TASK 3</strong>: BART Paraphrasing</td>
<td><a href="https://nbviewer.jupyter.org/github/satyajitghana/TSAI-DeepNLP-END2.0/blob/main/14_BERT_BART/BART_For_Paraphrasing_w_Simple_Transformers.ipynb"><img alt="Open In NBViewer" src="https://img.shields.io/badge/render-nbviewer-orange?logo=Jupyter"></a></td>
<td><a href="https://githubtocolab.com/satyajitghana/TSAI-DeepNLP-END2.0/blob/main/14_BERT_BART/BART_For_Paraphrasing_w_Simple_Transformers.ipynb"><img alt="Open In Colab" src="https://colab.research.google.com/assets/colab-badge.svg"></a></td>
</tr>
</tbody>
</table><h3 id="task-1-results">Task 1 Results</h3>
<p>BERT QA Bot on SQUAD Dataset</p>
<p><strong>Training Logs</strong></p>
<pre><code>Train loss: 1.682131371960044
Saving model checkpoint to checkpoint-1000
</code></pre>
<p><img src="https://github.com/satyajitghana/TSAI-DeepNLP-END2.0/blob/main/14_BERT_BART/assets/bert_training.png?raw=true" alt="bert qa model training loss"></p>
<p><strong>Model Evaluation</strong></p>
<pre><code>{
"exact": 59.230009871668315,
"f1": 61.210163805868284,
"total": 6078,
"HasAns_exact": 45.49828178694158,
"HasAns_f1": 49.634149694868604,
"HasAns_total": 2910,
"NoAns_exact": 71.84343434343434,
"NoAns_f1": 71.84343434343434,
"NoAns_total": 3168,
"best_exact": 59.493254359986835,
"best_exact_thresh": -0.10016250610351562,
"best_f1": 61.38637161248114,
"best_f1_thresh": -0.08133554458618164
}
</code></pre>
<p><strong>Sample Outputs</strong></p>
<pre><code>question >> How does HT strive to give up power?
model's answer >> through "ideological struggle
question >> When did Kublai ban the international Mongol slave trade?
model's answer >> 1291
question >> What is the mayor of Warsaw called?
model's answer >> President
question >> In what geographical portion of England is Abercynon located?
model's answer >> south Wales
question >> The successful searches for what showed that the elementary particles are not observable?
model's answer >> free quarks
question >> Where did France win a war in the 1950's
model's answer >> Algeria
question >> What does the world's first Museum of Riding have one of the largest collections of in the world?
model's answer >> art posters
question >> What ethnicity was Frederick William, Elector of Brandenburg?
model's answer >> Prussia
question >> Where is a palm house with tropic plants from all over the world on display?
model's answer >> the New Orangery
question >> Who had issued the Edict of Nantes?
model's answer >> Louis XIV
question >> Whose activities were the French able to gain knowledge of?
model's answer >> Shirley and Johnson
question >> According to Ellen Churchill Semple what type of climate was necessary for humans to become fully human?
model's answer >> the temperate zone
question >> Who has a codified constitution?
model's answer >> the European Union
question >> When was Radcliffe's curriculum secularized?
model's answer >> the 18th century
question >> Minister Robert Dinwiddie had an investment in what significant company?
model's answer >> the Ohio
</code></pre>
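<p>As a rough illustration (not code from the notebook), answers like the ones above could be pulled from a fine-tuned checkpoint with the Hugging Face <code>question-answering</code> pipeline; the checkpoint path and the context passage below are assumptions:</p>
<pre><code># Minimal sketch: query a fine-tuned BERT QA checkpoint (paths are assumptions).
from transformers import pipeline

# "checkpoint-1000" mirrors the directory name saved in the training log above.
qa = pipeline("question-answering", model="checkpoint-1000", tokenizer="checkpoint-1000")

context = (
    "Warsaw is the capital of Poland. "
    "The city's mayor holds the title of President of Warsaw."
)

result = qa(question="What is the mayor of Warsaw called?", context=context)
print(result["answer"], result["score"])
</code></pre>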
<h3 id="task-2-results">Task 2 Results</h3>
<p>BERT Sentence Classification</p>
<p><strong>Training Logs</strong></p>
<pre><code>======== Epoch 1 / 4 ========
Training...
Batch 40 of 241. Elapsed: 0:00:14.
Batch 80 of 241. Elapsed: 0:00:28.
Batch 120 of 241. Elapsed: 0:00:43.
Batch 160 of 241. Elapsed: 0:00:57.
Batch 200 of 241. Elapsed: 0:01:12.
Batch 240 of 241. Elapsed: 0:01:27.
Average training loss: 0.40
Training epcoh took: 0:01:27
Running Validation...
Accuracy: 0.81
Validation took: 0:00:04
======== Epoch 2 / 4 ========
Training...
Batch 40 of 241. Elapsed: 0:00:15.
Batch 80 of 241. Elapsed: 0:00:30.
Batch 120 of 241. Elapsed: 0:00:45.
Batch 160 of 241. Elapsed: 0:01:00.
Batch 200 of 241. Elapsed: 0:01:15.
Batch 240 of 241. Elapsed: 0:01:30.
Average training loss: 0.27
Training epcoh took: 0:01:30
Running Validation...
Accuracy: 0.83
Validation took: 0:00:04
======== Epoch 3 / 4 ========
Training...
Batch 40 of 241. Elapsed: 0:00:15.
Batch 80 of 241. Elapsed: 0:00:31.
Batch 120 of 241. Elapsed: 0:00:46.
Batch 160 of 241. Elapsed: 0:01:02.
Batch 200 of 241. Elapsed: 0:01:17.
Batch 240 of 241. Elapsed: 0:01:32.
Average training loss: 0.18
Training epcoh took: 0:01:33
Running Validation...
Accuracy: 0.82
Validation took: 0:00:04
======== Epoch 4 / 4 ========
Training...
Batch 40 of 241. Elapsed: 0:00:15.
Batch 80 of 241. Elapsed: 0:00:31.
Batch 120 of 241. Elapsed: 0:00:47.
Batch 160 of 241. Elapsed: 0:01:02.
Batch 200 of 241. Elapsed: 0:01:18.
Batch 240 of 241. Elapsed: 0:01:33.
Average training loss: 0.13
Training epcoh took: 0:01:34
Running Validation...
Accuracy: 0.82
Validation took: 0:00:04
Training complete!
</code></pre>
<p><img src="https://github.com/satyajitghana/TSAI-DeepNLP-END2.0/blob/main/14_BERT_BART/assets/bert-classification-training.png?raw=true" alt="bert classification model training loss plot"></p>
<p><strong>Sample Outputs</strong></p>
<pre><code>sentence > the table was wiped by john clean .
predicted < unacceptable
true cls = unacceptable
sentence > the book surprised many people .
predicted < acceptable
true cls = acceptable
sentence > of whom are you thinking ?
predicted < acceptable
true cls = acceptable
sentence > they were interested in his .
predicted < acceptable
true cls = acceptable
sentence > the dog bit the cat .
predicted < acceptable
true cls = acceptable
sentence > who always drinks milk ?
predicted < acceptable
true cls = acceptable
sentence > john is aware of it that bill is here .
predicted < unacceptable
true cls = unacceptable
sentence > in which way is clinton anxious to find out which budget dilemma ##s pan ##etta would be willing to solve ?
predicted < acceptable
true cls = unacceptable
sentence > john was seen the book .
predicted < unacceptable
true cls = unacceptable
sentence > the argument was sum ##med by the coach up .
predicted < unacceptable
true cls = unacceptable
</code></pre>
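<p>A minimal sketch of how such acceptability predictions could be produced from the fine-tuned classifier; the saved-model directory and the label ordering are assumptions, not taken verbatim from the notebook:</p>
<pre><code># Minimal sketch: grammatical-acceptability inference with a fine-tuned BERT
# classifier (the saved-model path and label order are assumptions).
import torch
from transformers import BertTokenizer, BertForSequenceClassification

model_dir = "./model_save"  # wherever the fine-tuned model was saved
tokenizer = BertTokenizer.from_pretrained(model_dir)
model = BertForSequenceClassification.from_pretrained(model_dir)
model.eval()

sentences = ["the book surprised many people .", "the table was wiped by john clean ."]
labels = ["unacceptable", "acceptable"]  # CoLA convention: 0 = unacceptable, 1 = acceptable

inputs = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

for sentence, pred in zip(sentences, logits.argmax(dim=-1).tolist()):
    print(sentence, "->", labels[pred])
</code></pre>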
<h3 id="task-3-results">Task 3 Results</h3>
<p>Training Logs: <a href="https://wandb.ai/satyajit_meow/Paraphrasing%20with%20BART?workspace=user-satyajit_meow">https://wandb.ai/satyajit_meow/Paraphrasing%20with%20BART?workspace=user-satyajit_meow</a></p>
<p>BART Paraphrasing</p>
<pre><code>Text > How can I be less insecure about my short height?
Pred < How can I be less insecure about my height?
Truth = How can I be less insecure about my short height and looks?
Text > What are tips for saving money?
Pred < What are some tips for saving money?
Truth = What are the best methods of saving money?
Text > Why November 14, birthday of Jawaharlal is celebrated as Children's day? What exceptional thing did he do to be considered as dearest to children?
Pred < Why is the birthday of Jawaharlal Nehru celebrated as Children's Day? What exceptional thing did he do to be considered as dearest to children?
Truth = Why we celebrate Nehru's birthday as children's day?
Text > What is it like to visit India?
Pred < What is it like to visit India?
Truth = What's it like to visit India?
Text > Is a third world war coming?
Pred < Is there a third world war on the way?
Truth = Will there be a World War III soon?
Text > Can we pursue biotechnology after B.tech in mechanical engineering?
Pred < Can we pursue biotechnology after B.tech in mechanical engineering?
Truth = After completing B.Tech in mechanical engineering, what job secured course should you take?
Text > What does the MATLAB application do?
Pred < What does MATLAB application do?
Truth = What are application of MATLAB?
Text > Who is the best online dating headshot photographer in San Francisco?
Pred < Who is the best online dating photographer in San Francisco?
Truth = Who are the best Tinder photographers in San Francisco?
Text > How should I prepare for an Amazon SDE internship interview?
Pred < How do I prepare for Amazon SDE internship interview?
Truth = How should I prepare for the amazon internship written test and interview?
Text > English cooking dominated early national cooking ; but as new immigrants arrived from other countries, other colonial soups gained popularity.
Pred < Paraphrase: English cooking dominated early national cooking ; but as new immigrants arrived from other countries, other colonial soups gained popularity.
Truth = Cooking dominated early national cuisine, but as new immigrants arrived from other countries, other colonial soups gained popularity.
Text > What is the smallest decision you have made leading to the biggest impact on your life?
Pred < What is the smallest decision you have made leading to the biggest impact on your life?
Truth = What seemingly unimportant decision made the most profound impact on your life?
Text > What are the easy ways to earn money online?
Pred < How can I earn money online?
Truth = What is a way to make money online?
Text > What can I do with C after learning the basics?
Pred < What can I do after learning C?
Truth = What can I do after learning Basics of C?
Text > Where can I get professional advice for buying and selling a property in Sydney?
Pred < Where can I get professional advice for buying and selling a property in Sydney?
Truth = Where can I get necessary advice for any property transaction in Sydney?
Text > Whàt is the dark web?
Pred < What is the dark web?
Truth = Wat is the dark web?
</code></pre>
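<p>The paraphrases above come from fine-tuning BART through the Simple Transformers <code>Seq2SeqModel</code> wrapper, roughly along the lines of the sketch below; the hyperparameter values and the one-row DataFrame are illustrative assumptions, not the exact settings of the run:</p>
<pre><code># Rough sketch of a Simple Transformers setup for BART paraphrasing
# (hyperparameter values here are illustrative, not the exact run settings).
import pandas as pd
from simpletransformers.seq2seq import Seq2SeqModel, Seq2SeqArgs

# Training data is a DataFrame with "input_text" / "target_text" columns
# (e.g. question pairs treated as paraphrase pairs).
train_df = pd.DataFrame(
    [["What are tips for saving money?", "What are the best methods of saving money?"]],
    columns=["input_text", "target_text"],
)

model_args = Seq2SeqArgs()
model_args.num_train_epochs = 2
model_args.max_seq_length = 128
model_args.train_batch_size = 8

model = Seq2SeqModel(
    encoder_decoder_type="bart",
    encoder_decoder_name="facebook/bart-large",
    args=model_args,
)

model.train_model(train_df)
print(model.predict(["How can I be less insecure about my short height?"]))
</code></pre>
<p>The actual run uses the paraphrase datasets described in the blog; the tiny DataFrame above only shows the expected column layout.</p>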
<h2 id="bert">BERT</h2>
<p><strong>B</strong>idirectional <strong>E</strong>ncoder <strong>R</strong>epresentations from <strong>T</strong>ransformers</p>
<p>BERT is essentially a trained Transformer encoder stack. Both BERT models (Base and Large) have a large number of encoder layers (Transformer blocks): 12 for the Base version and 24 for the Large version.</p>
<p>Model Input: The first input token is <code>[CLS]</code>, which stands for Classification. Just like a normal Transformer, BERT takes a sequence of words as input.<br>
Model Outputs: Each position of the sequence outputs a vector of size <code>hidden_size</code>. For sentence classification we focus only on the output at the first position (the <code>[CLS]</code> token position). That vector is then fed to a classifier to predict the class you are interested in; if you have more classes, only this last layer (the classifier network) changes.</p>
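<p>As a rough illustration of that last point, here is a minimal sketch (not code from the notebooks) that takes the vector at the <code>[CLS]</code> position from the plain encoder and puts a small linear head on top of it; the two-class head is an arbitrary assumption:</p>
<pre><code># Sketch: classify a sentence from the [CLS] position of the BERT encoder.
# The 2-class linear head is illustrative; swap in as many classes as needed.
import torch
from torch import nn
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
encoder = BertModel.from_pretrained("bert-base-uncased")
classifier = nn.Linear(encoder.config.hidden_size, 2)  # only this layer changes per task

inputs = tokenizer("the book surprised many people .", return_tensors="pt")
with torch.no_grad():
    outputs = encoder(**inputs)

cls_vector = outputs.last_hidden_state[:, 0]  # vector at the [CLS] position
logits = classifier(cls_vector)               # shape: (1, num_classes)
print(logits.softmax(dim=-1))
</code></pre>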
<p>As opposed to directional models, which read the text input sequentially (left-to-right or right-to-left), the Transformer encoder reads the entire sequence of words at once. Therefore it is considered bidirectional, though it would be more accurate to say that it’s non-directional.</p>
<p>Before feeding word sequences into BERT, 15% of the words in each sequence are replaced with a <code>[MASK]</code> token. The model then attempts to predict the original value of the masked words, based on the context provided by the other, non-masked, words in the sequence.</p>
<p>The BERT loss function takes into consideration only the prediction of the masked values and ignores the prediction of the non-masked words. As a consequence, the model converges slower than directional models.</p>
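<p>A minimal sketch of this masked-word objective at inference time, using the pretrained masked-LM head from Hugging Face Transformers (the example sentence is arbitrary):</p>
<pre><code># Sketch: predict a [MASK]ed word with BERT's masked-language-model head.
import torch
from transformers import BertTokenizer, BertForMaskedLM

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForMaskedLM.from_pretrained("bert-base-uncased")

text = "The capital of France is [MASK]."
inputs = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

# Locate the masked position and take the highest-scoring vocabulary entry.
mask_pos = (inputs["input_ids"] == tokenizer.mask_token_id).nonzero(as_tuple=True)[1]
predicted_id = logits[0, mask_pos].argmax(dim=-1)
print(tokenizer.decode(predicted_id))  # expected to be something like "paris"
</code></pre>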
<p>In the BERT training process, the model receives pairs of sentences as input and learns to predict if the second sentence in the pair is the subsequent sentence in the original document. During training, 50% of the inputs are a pair in which the second sentence is the subsequent sentence in the original document, while in the other 50% a random sentence from the corpus is chosen as the second sentence. The assumption is that the random sentence will be disconnected from the first sentence.</p>
<h2 id="bart">BART</h2>
<p><strong>B</strong>idirectional and <strong>A</strong>uto-<strong>R</strong>egressive <strong>T</strong>ransformers</p>
<p>BART is a denoising autoencoder built with a sequence-to-sequence model that is applicable to a very wide range of end tasks.</p>
<h3 id="pretraining-fill-in-the-span">Pretraining: Fill In the Span</h3>
<p>BART is pretrained on tasks where spans of text are replaced by masked tokens, and the model must learn to reconstruct the original document from this corrupted text.</p>
<p>BART improves on BERT by replacing BERT’s fill-in-the-blank cloze task with a more complicated mix of pretraining tasks.</p>
<p><img src="https://github.com/satyajitghana/TSAI-DeepNLP-END2.0/blob/main/14_BERT_BART/assets/text_infilling.png?raw=true" alt="text infilling"></p>
<p>In the above example the original text is <code>A B C D E</code>. The span <code>C, D</code> is replaced with a single mask token, and an additional mask is inserted between <code>A</code> and <code>B</code> (masking a zero-length span), so the corrupted document becomes <code>A _ B _ E</code>. The encoder takes this corrupted sequence as input, encodes it, and passes the encoding to the decoder.</p>
<p>The decoder must then use this encoding to reconstruct the original document, <code>A B C D E</code>.</p>
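<p>A short sketch of this fill-in-the-span behaviour with the pretrained <code>facebook/bart-large</code> checkpoint; the input sentence is arbitrary, and a single <code>&lt;mask&gt;</code> token stands in for a whole span, just like the <code>_</code> placeholders above:</p>
<pre><code># Sketch: BART reconstructing a masked span (the sentence is arbitrary).
import torch
from transformers import BartTokenizer, BartForConditionalGeneration

tokenizer = BartTokenizer.from_pretrained("facebook/bart-large")
model = BartForConditionalGeneration.from_pretrained("facebook/bart-large")

# A single &lt;mask&gt; token can cover a whole span of the original text.
text = "UN Chief Says There Is No &lt;mask&gt; in Syria"
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    output_ids = model.generate(inputs["input_ids"], num_beams=4, max_length=20)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
</code></pre>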
<hr>
<p align="center">
satyajit<br>
:wq 🐈⬛
</p>
</div>
</body>
</html>