---
layout: default
title: Blog
description: Intuitive explanations for some of my papers.
menu: yes
order: 1
---
<style>
.thumbnail {
box-shadow: 0 5px 10px rgba(0,0,0,0.19), 0 3px 3px rgba(0,0,0,0.23);
}
.thumbnail:hover {
box-shadow: 0 12px 24px rgba(0,0,0,0.19), 0 8px 8px rgba(0,0,0,0.23);
}
.fullCard {
width: 750px;
border: 1px solid #ccc;
border-radius: 5px;
margin: 10px 5px;
padding: 4px;
}
.cardContent {
padding: 10px;
}
.center {
display: block;
margin-left: auto;
margin-right: auto;
}
</style>
<div class="fullCard" id="thumbnail" >
<div class="cardContent">
<h1 style="font-size:28px;">Neurons in LLMs: Dead, N-gram, Positional</h1>
<span style="font-size:14px;">
This is a post for the paper
<a href="https://arxiv.org/pdf/2309.04827.pdf" target="_blank">
Neurons in Large Language Models: Dead, N-gram, Positional.
</a>
</span>
<a class="float-right">
<img src="../resources/posts/ffn_neurons/suppressed_concepts-min.png" alt=""
style="max-width:300px; height:auto; float: right; margin-left:15px; margin-top:25px"/>
</a>
<br/>
<br/>
<span style="font-size:15px;">
<p>With scale, LMs become more exciting but, at the same time, harder to analyze.
We show that even with
simple methods and a single GPU, you can do a lot! We analyze OPT models up to 66B and find that
</p>
<ul>
<li>neurons inside LLMs can be:
<ul style="margin-left:30px;">
<li><u>dead</u>, i.e. never activate on a large dataset;</li>
<li><u>n-gram</u> detectors that explicitly remove information about the current input token;</li>
<li><u>positional</u>, i.e. encode "where" regardless of "what", which questions the key-value memory view of FFNs;</li>
</ul>
</li>
<li>with scale, models have more dead neurons and token detectors and are less focused on absolute position.</li>
</ul>
</span>
<a class="pull-right" href="/posts/neurons_in_llms_dead_ngram_positional.html" onMouseOver="document.readmore8.src='../resources/posts/buttons/button_read_more_push-min.png';" onMouseOut="document.readmore8.src='../resources/posts/buttons/button_read_more-min.png';">
<img src="../resources/posts/buttons/button_read_more-min.png" name="readmore8" width=120px class="pull-right"></a>
<a class="pull-right" href="https://arxiv.org/pdf/2309.04827.pdf" onMouseOver="document.readpaper8.src='../resources/posts/buttons/button_read_paper_push-min.png';" onMouseOut="document.readpaper8.src='../resources/posts/buttons/button_read_paper-min.png';">
<img src="../resources/posts/buttons/button_read_paper-min.png" name="readpaper8" width=120px class="pull-right"></a>
<span style="font-size:15px; text-align: right; float: right; color:gray">September 2023</span>
</div>
</div>
<!-- ################################################################################### -->
<div class="fullCard" id="thumbnail" >
<div class="cardContent">
<h1 style="font-size:28px;">NMT Training Process though the Lens of SMT</h1>
<span style="font-size:14px;">
This is a post for the EMNLP 2021 paper
<a href="https://arxiv.org/abs/2109.01396" target="_blank">
Language Modeling, Lexical Translation, Reordering:
The Training Process of NMT through the Lens of Classical SMT.
</a>
</span>
<a class="float-right">
<img src="../resources/posts/nmt_training/morda-min.png" alt=""
style="max-width:300px; height:auto; float: right; margin-left:15px; margin-top:25px"/>
</a>
<br/>
<br/>
<span style="font-size:15px;">
<p>In SMT, different competences are modelled by distinct components.
In NMT, the whole translation task is modelled
with a single neural network.
How and when does NMT get to learn all these competences? We show that</p>
<ul>
<li>during training, NMT undergoes three different stages:
<ul style="margin-left:30px;">
<li>target-side language modeling,</li>
<li>learning how to use the source and approaching word-by-word translation,</li>
<li>refining translations, visible in increasingly complex reorderings
but not in metrics such as BLEU;</li>
</ul>
</li>
<li>not only is this fun, but it can also help in practice! For example, in settings where
data complexity matters, such as non-autoregressive NMT.</li>
</ul>
</span>
<a class="pull-right" href="/posts/nmt_training_through_smt_lens.html" onMouseOver="document.readmore7.src='../resources/posts/buttons/button_read_more_push-min.png';" onMouseOut="document.readmore7.src='../resources/posts/buttons/button_read_more-min.png';">
<img src="../resources/posts/buttons/button_read_more-min.png" name="readmore7" width=120px class="pull-right"></a>
<a class="pull-right" href="https://arxiv.org/abs/2109.01396" onMouseOver="document.readpaper7.src='../resources/posts/buttons/button_read_paper_push-min.png';" onMouseOut="document.readpaper7.src='../resources/posts/buttons/button_read_paper-min.png';">
<img src="../resources/posts/buttons/button_read_paper-min.png" name="readpaper7" width=120px class="pull-right"></a>
<span style="font-size:15px; text-align: right; float: right; color:gray">September 2021</span>
</div>
</div>
<!-- ################################################################################### -->
<div class="fullCard" id="thumbnail" >
<div class="cardContent">
<h1 style="font-size:28px;">Neural Machine Translation Inside Out</h1>
<a class="float-right">
<img src="../resources/posts/nmt_inside_out/morda_test.png" alt=""
style="max-width:300px; height:auto; float: right; margin-left:15px; margin-top:25px"/>
</a>
<span style="font-size:14px;">
This is a blog version of my talk at the ACL 2021 workshop
<a href="https://sites.google.com/view/repl4nlp-2021/" target="_blank">Representation
Learning for NLP</a> (and an updated version
of that at NAACL 2021 workshop
<a href="https://sites.google.com/view/deelio-ws/" target="_blank">
Deep Learning Inside Out (DeeLIO)</a>).
</span>
<br/>
<br/>
<span style="font-size:15px;">
In the last decade, machine translation has shifted from traditional statistical approaches
with distinct components and hand-crafted features to end-to-end neural ones.
We try to understand how NMT works and show that:
<ul>
<li>NMT model components can learn to extract features which in SMT were modelled explicitly;</li>
<li>we can also look at how NMT balances the two types of context: the source and the target prefix;</li>
<li>NMT training consists of stages in which it focuses on competences
mirroring the three core SMT components.</li>
</ul>
</span>
<a class="pull-right" href="/posts/nmt_inside_out.html" onMouseOver="document.readmore6.src='../resources/posts/buttons/button_read_more_push-min.png';" onMouseOut="document.readmore6.src='../resources/posts/buttons/button_read_more-min.png';">
<img src="../resources/posts/buttons/button_read_more-min.png" name="readmore6" width=120px class="pull-right"></a>
<span style="font-size:15px; text-align: right; float: right; color:gray">July 2021</span>
</div>
</div>
<div class="fullCard" id="thumbnail" >
<div class="cardContent">
<h1 style="font-size:28px;">Source and Target Contributions to NMT Predictions</h1>
<video width="300" height="auto" style="float: right; margin-left: 15px;" loop autoplay muted>
<source src="../resources/posts/src_dst_nmt/src_dst_main.mp4" type="video/mp4">
</video>
<span style="font-size:14px;">
This is a post for the ACL 2021 paper
<a href="https://arxiv.org/pdf/2010.10907.pdf" target="_blank">
Analyzing the Source and Target Contributions to Predictions in Neural Machine Translation.
</a>
</span>
<br/>
<br/>
<span style="font-size:15px;">
In NMT, the generation of a target token is based on two types of context: the source and the prefix of the target sentence.
We show how to evaluate the relative contributions of source and target to NMT predictions and find that:
<ul>
<li>models suffering from exposure bias are more prone to over-relying on target history (and hence to hallucinating) than
ones where exposure bias is mitigated;</li>
<li>models trained with more data rely more on the source and do so more confidently;</li>
<li>the training process is non-monotonic with several distinct stages.</li>
</ul>
</span>
<a class="pull-right" href="/posts/source_target_contributions_to_nmt.html" onMouseOver="document.readmore5.src='../resources/posts/buttons/button_read_more_push-min.png';" onMouseOut="document.readmore5.src='../resources/posts/buttons/button_read_more-min.png';">
<img src="../resources/posts/buttons/button_read_more-min.png" name="readmore5" width=120px class="pull-right"></a>
<a class="pull-right" href="https://arxiv.org/pdf/2010.10907.pdf" onMouseOver="document.readpaper5.src='../resources/posts/buttons/button_read_paper_push-min.png';" onMouseOut="document.readpaper5.src='../resources/posts/buttons/button_read_paper-min.png';">
<img src="../resources/posts/buttons/button_read_paper-min.png" name="readpaper5" width=120px class="pull-right"></a>
<a class="pull-right" href="https://github.com/lena-voita/the-story-of-heads" onMouseOver="document.viewcode5.src='../resources/posts/buttons/button_view_code_push-min.png';" onMouseOut="document.viewcode5.src='../resources/posts/buttons/button_view_code-min.png';">
<img src="../resources/posts/buttons/button_view_code-min.png" name="viewcode5" width=120px></a>
<span style="font-size:15px; text-align: right; float: right; color:gray">October 2020</span>
</div>
</div>
<!-- ################################################################################### -->
<div class="fullCard" id="thumbnail" >
<div class="cardContent">
<h1 style="font-size:28px;">Information-Theoretic Probing with MDL</h1>
<a class="float-right">
<img src="../resources/posts/mdl_probes/probe_main_orange-min.png" alt=""
style="max-width:350px; height:auto; float: right; margin-left:15px; margin-top:25px"/>
</a>
<span style="font-size:14px;">
This is a post for the EMNLP 2020 paper
<a href="https://arxiv.org/pdf/2003.12298.pdf" target="_blank">
Information-Theoretic Probing with Minimum Description Length.
</a>
</span>
<br/>
<br/>
<span style="font-size:15px;">
Probing classifiers often fail to adequately reflect differences in representations
and can show different results depending on hyperparameters.
<br/>
As an alternative to the standard probes,
<ul>
<li>we propose information-theoretic probing which measures
<font face="arial">minimum description length</font> (MDL) of labels given representations;</li>
<li>we show that MDL characterizes both <font face="arial">probe quality</font> and
<font face="arial">the amount of effort</font> needed to achieve it;</li>
<li>we explain how to easily measure MDL on top of standard probe-training pipelines;</li>
<li>we show that results of MDL probes are more informative and stable than those of standard probes.</li>
</ul>
</span>
<a class="pull-right" href="/posts/mdl_probes.html" onMouseOver="document.readmore4.src='../resources/posts/buttons/button_read_more_push-min.png';" onMouseOut="document.readmore4.src='../resources/posts/buttons/button_read_more-min.png';">
<img src="../resources/posts/buttons/button_read_more-min.png" name="readmore4" width=120px class="pull-right"></a>
<a class="pull-right" href="https://arxiv.org/pdf/2003.12298.pdf" target="_blank" onMouseOver="document.readpaper4.src='../resources/posts/buttons/button_read_paper_push-min.png';" onMouseOut="document.readpaper4.src='../resources/posts/buttons/button_read_paper-min.png';">
<img src="../resources/posts/buttons/button_read_paper-min.png" name="readpaper4" width=120px class="pull-right"></a>
<a class="pull-right" href="https://github.com/lena-voita/description-length-probing" target="_blank" onMouseOver="document.viewcode4.src='../resources/posts/buttons/button_view_code_push-min.png';" onMouseOut="document.viewcode4.src='../resources/posts/buttons/button_view_code-min.png';">
<img src="../resources/posts/buttons/button_view_code-min.png" name="viewcode4" width=120px></a>
<span style="font-size:15px; text-align: right; float: right; color:gray">March 2020</span>
</div>
</div>
<!-- ################################################################################### -->
<div class="fullCard" id="thumbnail" >
<div class="cardContent">
<h1 style="font-size:28px;">Evolution of Representations in the Transformer</h1>
<a class="float-right">
<img src="../resources/posts/emnlp19_evolution/fugue_logo_on_white-min-min.png" alt="" style="max-width:350px; height:auto; float: right"/>
</a>
<span style="font-size:14px;">
This is a post for the EMNLP 2019 paper
<a href="https://arxiv.org/abs/1909.01380" target="_blank">
The Bottom-up Evolution of Representations in the Transformer: A Study with Machine Translation and Language Modeling Objectives.
</a>
</span>
<br/>
<br/>
<span style="font-size:15px;">
We look at the evolution of representations of individual tokens in Transformers trained with different
training objectives (MT, LM, MLM - BERT-style) from the
<a href="https://www.cs.huji.ac.il/labs/learning/Papers/allerton.pdf">Information Bottleneck</a>
perspective and show that:
<ul>
<li>LMs gradually forget the past when forming predictions about the future;</li>
<li>for MLMs, the evolution proceeds in two stages of
<font face="arial">context encoding</font> and <font face="arial">token reconstruction</font>;</li>
<li>MT representations get refined with context,
but less processing happens overall.</li>
</ul>
</span>
<a class="pull-right" href="/posts/emnlp19_evolution.html" onMouseOver="document.readmore3.src='../resources/posts/buttons/button_read_more_push-min.png';" onMouseOut="document.readmore3.src='../resources/posts/buttons/button_read_more-min.png';">
<img src="../resources/posts/buttons/button_read_more-min.png" name="readmore3" width=120px class="pull-right"></a>
<a class="pull-right" href="https://arxiv.org/abs/1909.01380" target="_blank" onMouseOver="document.readpaper3.src='../resources/posts/buttons/button_read_paper_push-min.png';" onMouseOut="document.readpaper3.src='../resources/posts/buttons/button_read_paper-min.png';">
<img src="../resources/posts/buttons/button_read_paper-min.png" name="readpaper3" width=120px class="pull-right"></a>
<span style="font-size:15px; text-align: right; float: right; color:gray">September 2019</span>
</div>
</div>
<!-- ################################################################################### -->
<div class="fullCard" id="thumbnail" >
<div class="cardContent">
<h1 style="font-size:28px;">When a Good Translation is Wrong in Context</h1>
<video width="380" height="auto" style="float: right" loop autoplay muted>
<source src="../resources/posts/acl19_ctx/cadec_post_crop.mp4" type="video/mp4">
</video>
<span style="font-size:14px;">
This is a post for the ACL 2019 paper
<a href="https://www.aclweb.org/anthology/P19-1116" target="_blank">
When a Good Translation is Wrong in Context: Context-Aware Machine Translation Improves on Deixis, Ellipsis, and Lexical Cohesion.
</a>
</span>
<br/>
<br/>
<span style="font-size:15px;">
From this post, you will learn:
<ul>
<li>which phenomena cause context-agnostic translations to be inconsistent with each other</li>
<li>how we create test sets addressing the most frequent phenomena</li>
<li>about a novel set-up for context-aware NMT with a large amount of sentence-level data and
much less document-level data</li>
<li>about a new model for this set-up (<font color="#CA6F1E">C</font>ontext-<font color="#CA6F1E">A</font>ware
<font color="#CA6F1E">Dec</font>oder, aka <font color="#CA6F1E">CADec</font>) - a two-pass MT model which first produces a draft translation of the current sentence, then corrects it using context.</li>
</ul>
</span>
<a class="pull-right" href="/posts/acl19_context.html" onMouseOver="document.readmore2.src='../resources/posts/buttons/button_read_more_push-min.png';" onMouseOut="document.readmore2.src='../resources/posts/buttons/button_read_more-min.png';">
<img src="../resources/posts/buttons/button_read_more-min.png" name="readmore2" width=120px class="pull-right"></a>
<a class="pull-right" href="https://www.aclweb.org/anthology/P19-1116" target="_blank"onMouseOver="document.readpaper2.src='../resources/posts/buttons/button_read_paper_push-min.png';" onMouseOut="document.readpaper2.src='../resources/posts/buttons/button_read_paper-min.png';">
<img src="../resources/posts/buttons/button_read_paper-min.png" name="readpaper2" width=120px class="pull-right"></a>
<a class="pull-right" href="https://github.com/lena-voita/good-translation-wrong-in-context" target="_blank" onMouseOver="document.viewcode2.src='../resources/posts/buttons/button_view_code_push-min.png';" onMouseOut="document.viewcode2.src='../resources/posts/buttons/button_view_code-min.png';">
<img src="../resources/posts/buttons/button_view_code-min.png" name="viewcode2" width=120px></a>
<span style="font-size:15px; text-align: right; float: right; color:gray">July 2019</span>
</div>
</div>
<!-- ################################################################################### -->
<div class="fullCard" id="thumbnail" >
<div class="cardContent">
<h1 style="font-size:28px;">The Story of Heads</h1>
<a class="float-right">
<img src="../img/paper/acl19_heads-min.png" alt="" style="max-width:350px; height:auto; float: right"/>
</a>
<span style="font-size:14px;">
This is a post for the ACL 2019 paper
<a href="https://www.aclweb.org/anthology/P19-1580" target="_blank">
Analyzing Multi-Head Self-Attention: Specialized Heads Do the Heavy Lifting, the Rest Can Be Pruned.
</a>
</span>
<br/>
<br/>
<span style="font-size:15px;">
From this post, you will learn:
<ul>
<li>how we evaluate the importance of attention heads in the Transformer</li>
<li>which functions the most important encoder heads perform</li>
<li>how we prune the vast majority of attention heads in the Transformer without seriously affecting quality</li>
<li>which types of model attention are most sensitive to the number of attention heads and on which layers</li>
</ul>
</span>
<a class="pull-right" href="/posts/acl19_heads.html" onMouseOver="document.readmore.src='../resources/posts/buttons/button_read_more_push-min.png';" onMouseOut="document.readmore.src='../resources/posts/buttons/button_read_more-min.png';">
<img src="../resources/posts/buttons/button_read_more-min.png" name="readmore" width=120px class="pull-right"></a>
<a class="pull-right" href="https://www.aclweb.org/anthology/P19-1580" target="_blank" onMouseOver="document.readpaper.src='../resources/posts/buttons/button_read_paper_push-min.png';" onMouseOut="document.readpaper.src='../resources/posts/buttons/button_read_paper-min.png';">
<img src="../resources/posts/buttons/button_read_paper-min.png" name="readpaper" width=120px class="pull-right"></a>
<a class="pull-right" href="https://github.com/lena-voita/the-story-of-heads" target="_blank" onMouseOver="document.viewcode.src='../resources/posts/buttons/button_view_code_push-min.png';" onMouseOut="document.viewcode.src='../resources/posts/buttons/button_view_code-min.png';">
<img src="../resources/posts/buttons/button_view_code-min.png" name="viewcode" width=120px></a>
<span style="font-size:15px; text-align: right; float: right; color:gray">June 2019</span>
</div>
</div>