---
layout: default
title: Blog
description: Intuitive explanations for some of my papers.
menu: yes
order: 1
---
<style>
.thumbnail {
box-shadow: 0 5px 10px rgba(0,0,0,0.19), 0 3px 3px rgba(0,0,0,0.23);
}
.thumbnail:hover {
box-shadow: 0 12px 24px rgba(0,0,0,0.19), 0 8px 8px rgba(0,0,0,0.23);
}
.fullCard {
width: 750px;
border: 1px solid #ccc;
border-radius: 5px;
margin: 10px 5px;
padding: 4px;
}
.cardContent {
padding: 10px;
}
.center {
display: block;
margin-left: auto;
margin-right: auto;
}
</style>
<div class="fullCard" id="thumbnail" >
<div class="cardContent">
<h1 style="font-size:28px;">Neurons in LLMs: Dead, N-gram, Positional</h1>
<span style="font-size:14px;">
This is a post for the paper
<a href="https://arxiv.org/pdf/2309.04827.pdf" target="_blank">
Neurons in Large Language Models: Dead, N-gram, Positional.
</a>
</span>
<a class="float-right">
<img src="../resources/posts/ffn_neurons/suppressed_concepts-min.png" alt=""
style="max-width:300px; height:auto; float: right; margin-left:15px; margin-top:25px"/>
</a>
<br/>
<br/>
<span style="font-size:15px;">
<p>With scale, LMs become more exciting but, at the same time, harder to analyze.
We show that even with
simple methods and a single GPU, you can do a lot! We analyze OPT models up to 66B and find that
</p>
<ul>
<li>neurons inside LLMs can be:
<ul style="margin-left:30px;">
<li><u>dead</u>, i.e. never activate on a large dataset;</li>
<li><u>n-gram</u> detectors that explicitly remove information about the current input token;</li>
<li><u>positional</u>, i.e. encode "where" regardless of "what", which questions the key-value memory view of FFNs;</li>
</ul>
</li>
<li>with scale, models have more dead neurons and token detectors and are less focused on absolute position.</li>
</ul>
</span>
<a class="pull-right" href="/posts/neurons_in_llms_dead_ngram_positional.html" onMouseOver="document.readmore8.src='../resources/posts/buttons/button_read_more_push-min.png';" onMouseOut="document.readmore8.src='../resources/posts/buttons/button_read_more-min.png';">
<img src="../resources/posts/buttons/button_read_more-min.png" name="readmore8" width=120px class="pull-right"></a>
<a class="pull-right" href="https://arxiv.org/pdf/2309.04827.pdf" onMouseOver="document.readpaper8.src='../resources/posts/buttons/button_read_paper_push-min.png';" onMouseOut="document.readpaper8.src='../resources/posts/buttons/button_read_paper-min.png';">
<img src="../resources/posts/buttons/button_read_paper-min.png" name="readpaper8" width=120px class="pull-right"></a>
<span style="font-size:15px; text-align: right; float: right; color:gray">September 2023</span>
</div>
</div>
<!-- ################################################################################### -->
<div class="fullCard" id="thumbnail" >
<div class="cardContent">
<h1 style="font-size:28px;">NMT Training Process though the Lens of SMT</h1>
<span style="font-size:14px;">
This is a post for the EMNLP 2021 paper
<a href="https://arxiv.org/abs/2109.01396" target="_blank">
Language Modeling, Lexical Translation, Reordering:
The Training Process of NMT through the Lens of Classical SMT.
</a>
</span>
<a class="float-right">
<img src="../resources/posts/nmt_training/morda-min.png" alt=""
style="max-width:300px; height:auto; float: right; margin-left:15px; margin-top:25px"/>
</a>
<br/>
<br/>
<span style="font-size:15px;">
<p>In SMT, different competences are modelled by distinct components.
In NMT, the whole translation task is modelled
with a single neural network.
How and when does NMT get to learn all these competences? We show that</p>
<ul>
<li>during training, NMT undergoes three different stages:
<ul style="margin-left:30px;">
<li>target-side language modeling,</li>
<li>learning how to use the source and approaching word-by-word translation,</li>
<li>refining translations, visible in increasingly complex reorderings
but not in metrics such as BLEU;</li>
</ul>
</li>
<li>not only is this fun, but it can also help in practice! For example, in settings where
data complexity matters, such as non-autoregressive NMT.</li>
</ul>
</span>
<a class="pull-right" href="/posts/nmt_training_through_smt_lens.html" onMouseOver="document.readmore7.src='../resources/posts/buttons/button_read_more_push-min.png';" onMouseOut="document.readmore7.src='../resources/posts/buttons/button_read_more-min.png';">
<img src="../resources/posts/buttons/button_read_more-min.png" name="readmore7" width=120px class="pull-right"></a>
<a class="pull-right" href="https://arxiv.org/abs/2109.01396" onMouseOver="document.readpaper7.src='../resources/posts/buttons/button_read_paper_push-min.png';" onMouseOut="document.readpaper7.src='../resources/posts/buttons/button_read_paper-min.png';">
<img src="../resources/posts/buttons/button_read_paper-min.png" name="readpaper7" width=120px class="pull-right"></a>
<span style="font-size:15px; text-align: right; float: right; color:gray">September 2021</span>
</div>
</div>
<!-- ################################################################################### -->
<div class="fullCard" id="thumbnail" >
<div class="cardContent">
<h1 style="font-size:28px;">Neural Machine Translation Inside Out</h1>
<a class="float-right">
<img src="../resources/posts/nmt_inside_out/morda_test.png" alt=""
style="max-width:300px; height:auto; float: right; margin-left:15px; margin-top:25px"/>
</a>
<span style="font-size:14px;">
This is a blog version of my talk at the ACL 2021 workshop
<a href="https://sites.google.com/view/repl4nlp-2021/" target="_blank">Representation
Learning for NLP</a> (and an updated version
of that at NAACL 2021 workshop
<a href="https://sites.google.com/view/deelio-ws/" target="_blank">
Deep Learning Inside Out (DeeLIO)</a>).
</span>
<br/>
<br/>
<span style="font-size:15px;">
In the last decade, machine translation has shifted from traditional statistical approaches
with distinct components and hand-crafted features to end-to-end neural ones.
We try to understand how NMT works and show that:
<ul>
<li>NMT model components can learn to extract features which in SMT were modelled explicitly;</li>
<li>we can also look at how NMT balances the two types of context: the source and the target prefix;</li>
<li>NMT training consists of stages in which it focuses on competences
mirroring the three core SMT components.</li>
</ul>
</span>
<a class="pull-right" href="/posts/nmt_inside_out.html" onMouseOver="document.readmore6.src='../resources/posts/buttons/button_read_more_push-min.png';" onMouseOut="document.readmore6.src='../resources/posts/buttons/button_read_more-min.png';">
<img src="../resources/posts/buttons/button_read_more-min.png" name="readmore6" width=120px class="pull-right"></a>
<span style="font-size:15px; text-align: right; float: right; color:gray">July 2021</span>
</div>
</div>
<div class="fullCard" id="thumbnail" >
<div class="cardContent">
<h1 style="font-size:28px;">Source and Target Contributions to NMT Predictions</h1>
<video width="300" height="auto" style="float: right; margin-left: 15px;" loop autoplay muted>
<source src="../resources/posts/src_dst_nmt/src_dst_main.mp4" type="video/mp4">
</video>
<span style="font-size:14px;">
This is a post for the ACL 2021 paper
<a href="https://arxiv.org/pdf/2010.10907.pdf" target="_blank">
Analyzing the Source and Target Contributions to Predictions in Neural Machine Translation.
</a>
</span>
<br/>
<br/>
<span style="font-size:15px;">
In NMT, the generation of a target token is based on two types of context: the source and the prefix of the target sentence.
We show how to evaluate the relative contributions of source and target to NMT predictions and find that:
<ul>
<li>models suffering from exposure bias are more prone to over-relying on target history (and hence to hallucinating) than
ones where exposure bias is mitigated;</li>
<li>models trained with more data rely more on the source and do so more confidently;</li>
<li>the training process is non-monotonic with several distinct stages.</li>
</ul>
</span>
<a class="pull-right" href="/posts/source_target_contributions_to_nmt.html" onMouseOver="document.readmore5.src='../resources/posts/buttons/button_read_more_push-min.png';" onMouseOut="document.readmore5.src='../resources/posts/buttons/button_read_more-min.png';">
<img src="../resources/posts/buttons/button_read_more-min.png" name="readmore5" width=120px class="pull-right"></a>
<a class="pull-right" href="https://arxiv.org/pdf/2010.10907.pdf" onMouseOver="document.readpaper5.src='../resources/posts/buttons/button_read_paper_push-min.png';" onMouseOut="document.readpaper5.src='../resources/posts/buttons/button_read_paper-min.png';">
<img src="../resources/posts/buttons/button_read_paper-min.png" name="readpaper5" width=120px class="pull-right"></a>
<a class="pull-right" href="https://github.com/lena-voita/the-story-of-heads" onMouseOver="document.viewcode5.src='../resources/posts/buttons/button_view_code_push-min.png';" onMouseOut="document.viewcode5.src='../resources/posts/buttons/button_view_code-min.png';">
<img src="../resources/posts/buttons/button_view_code-min.png" name="viewcode5" width=120px></a>
<span style="font-size:15px; text-align: right; float: right; color:gray">October 2020</span>
</div>
</div>
<!-- ################################################################################### -->
<div class="fullCard" id="thumbnail" >
<div class="cardContent">
<h1 style="font-size:28px;">Information-Theoretic Probing with MDL</h1>
<a class="float-right">
<img src="../resources/posts/mdl_probes/probe_main_orange-min.png" alt=""
style="max-width:350px; height:auto; float: right; margin-left:15px; margin-top:25px"/>
</a>
<span style="font-size:14px;">
This is a post for the EMNLP 2020 paper
<a href="https://arxiv.org/pdf/2003.12298.pdf" target="_blank">
Information-Theoretic Probing with Minimum Description Length.
</a>
</span>
<br/>
<br/>
<span style="font-size:15px;">
Probing classifiers often fail to adequately reflect differences in representations
and can show different results depending on hyperparameters.
<br/>
As an alternative to the standard probes,
<ul>
<li>we propose information-theoretic probing which measures
<font face="arial">minimum description length</font> (MDL) of labels given representations;</li>
<li>we show that MDL characterizes both <font face="arial">probe quality</font> and
<font face="arial">the amount of effort</font> needed to achieve it;</li>
<li>we explain how to easily measure MDL on top of standard probe-training pipelines;</li>
<li>we show that results of MDL probes are more informative and stable than those of standard probes.</li>
</ul>
</span>
<a class="pull-right" href="/posts/mdl_probes.html" onMouseOver="document.readmore4.src='../resources/posts/buttons/button_read_more_push-min.png';" onMouseOut="document.readmore4.src='../resources/posts/buttons/button_read_more-min.png';">
<img src="../resources/posts/buttons/button_read_more-min.png" name="readmore4" width=120px class="pull-right"></a>
<a class="pull-right" href="https://arxiv.org/pdf/2003.12298.pdf" target="_blank" onMouseOver="document.readpaper4.src='../resources/posts/buttons/button_read_paper_push-min.png';" onMouseOut="document.readpaper4.src='../resources/posts/buttons/button_read_paper-min.png';">
<img src="../resources/posts/buttons/button_read_paper-min.png" name="readpaper4" width=120px class="pull-right"></a>
<a class="pull-right" href="https://github.com/lena-voita/description-length-probing" target="_blank" onMouseOver="document.viewcode4.src='../resources/posts/buttons/button_view_code_push-min.png';" onMouseOut="document.viewcode4.src='../resources/posts/buttons/button_view_code-min.png';">
<img src="../resources/posts/buttons/button_view_code-min.png" name="viewcode4" width=120px></a>
<span style="font-size:15px; text-align: right; float: right; color:gray">March 2020</span>
</div>
</div>
<!-- ################################################################################### -->
<div class="fullCard" id="thumbnail" >
<div class="cardContent">
<h1 style="font-size:28px;">Evolution of Representations in the Transformer</h1>
<a class="float-right">
<img src="../resources/posts/emnlp19_evolution/fugue_logo_on_white-min-min.png" alt="" style="max-width:350px; height:auto; float: right"/>
</a>
<span style="font-size:14px;">
This is a post for the EMNLP 2019 paper
<a href="https://arxiv.org/abs/1909.01380" target="_blank">
The Bottom-up Evolution of Representations in the Transformer: A Study with Machine Translation and Language Modeling Objectives.
</a>
</span>
<br/>
<br/>
<span style="font-size:15px;">
We look at the evolution of representations of individual tokens in Transformers trained with different
training objectives (MT, LM, MLM - BERT-style) from the
<a href="https://www.cs.huji.ac.il/labs/learning/Papers/allerton.pdf">Information Bottleneck</a>
perspective and show that:
<ul>
<li>LMs gradually forget the past when forming predictions about the future;</li>
<li>for MLMs, the evolution proceeds in two stages of
<font face="arial">context encoding</font> and <font face="arial">token reconstruction</font>;</li>
<li>MT representations get refined with context,
but less processing happens overall.</li>
</ul>
</span>
<a class="pull-right" href="/posts/emnlp19_evolution.html" onMouseOver="document.readmore3.src='../resources/posts/buttons/button_read_more_push-min.png';" onMouseOut="document.readmore3.src='../resources/posts/buttons/button_read_more-min.png';">
<img src="../resources/posts/buttons/button_read_more-min.png" name="readmore3" width=120px class="pull-right"></a>
<a class="pull-right" href="https://arxiv.org/abs/1909.01380" target="_blank" onMouseOver="document.readpaper3.src='../resources/posts/buttons/button_read_paper_push-min.png';" onMouseOut="document.readpaper3.src='../resources/posts/buttons/button_read_paper-min.png';">
<img src="../resources/posts/buttons/button_read_paper-min.png" name="readpaper3" width=120px class="pull-right"></a>
<span style="font-size:15px; text-align: right; float: right; color:gray">September 2019</span>
</div>
</div>
<!-- ################################################################################### -->
<div class="fullCard" id="thumbnail" >
<div class="cardContent">
<h1 style="font-size:28px;">When a Good Translation is Wrong in Context</h1>
<video width="380" height="auto" style="float: right" loop autoplay muted>
<source src="../resources/posts/acl19_ctx/cadec_post_crop.mp4" type="video/mp4">
</video>
<span style="font-size:14px;">
This is a post for the ACL 2019 paper
<a href="https://www.aclweb.org/anthology/P19-1116" target="_blank">
When a Good Translation is Wrong in Context: Context-Aware Machine Translation Improves on Deixis, Ellipsis, and Lexical Cohesion.
</a>
</span>
<br/>
<br/>
<span style="font-size:15px;">
From this post, you will learn:
<ul>
<li>which phenomena cause context-agnostic translations to be inconsistent with each other</li>
<li>how we create test sets addressing the most frequent phenomena</li>
<li>about a novel set-up for context-aware NMT with a large amount of sentence-level data and
much less document-level data</li>
<li>about a new model for this set-up (<font color="#CA6F1E">C</font>ontext-<font color="#CA6F1E">A</font>ware
<font color="#CA6F1E">Dec</font>oder, aka <font color="#CA6F1E">CADec</font>) - a two-pass MT model which first produces a draft translation of the current sentence, then corrects it using context.</li>
</ul>
</span>
<a class="pull-right" href="/posts/acl19_context.html" onMouseOver="document.readmore2.src='../resources/posts/buttons/button_read_more_push-min.png';" onMouseOut="document.readmore2.src='../resources/posts/buttons/button_read_more-min.png';">
<img src="../resources/posts/buttons/button_read_more-min.png" name="readmore2" width=120px class="pull-right"></a>
<a class="pull-right" href="https://www.aclweb.org/anthology/P19-1116" target="_blank"onMouseOver="document.readpaper2.src='../resources/posts/buttons/button_read_paper_push-min.png';" onMouseOut="document.readpaper2.src='../resources/posts/buttons/button_read_paper-min.png';">
<img src="../resources/posts/buttons/button_read_paper-min.png" name="readpaper2" width=120px class="pull-right"></a>
<a class="pull-right" href="https://github.com/lena-voita/good-translation-wrong-in-context" target="_blank" onMouseOver="document.viewcode2.src='../resources/posts/buttons/button_view_code_push-min.png';" onMouseOut="document.viewcode2.src='../resources/posts/buttons/button_view_code-min.png';">
<img src="../resources/posts/buttons/button_view_code-min.png" name="viewcode2" width=120px></a>
<span style="font-size:15px; text-align: right; float: right; color:gray">July 2019</span>
</div>
</div>
<!-- ################################################################################### -->
<div class="fullCard" id="thumbnail" >
<div class="cardContent">
<h1 style="font-size:28px;">The Story of Heads</h1>
<a class="float-right">
<img src="../img/paper/acl19_heads-min.png" alt="" style="max-width:350px; height:auto; float: right"/>
</a>
<span style="font-size:14px;">
This is a post for the ACL 2019 paper
<a href="https://www.aclweb.org/anthology/P19-1580" target="_blank">
Analyzing Multi-Head Self-Attention: Specialized Heads Do the Heavy Lifting, the Rest Can Be Pruned.
</a>
</span>
<br/>
<br/>
<span style="font-size:15px;">
From this post, you will learn:
<ul>
<li>how we evaluate the importance of attention heads in the Transformer</li>
<li>which functions the most important encoder heads perform</li>
<li>how we prune the vast majority of attention heads in the Transformer without seriously affecting quality</li>
<li>which types of model attention are most sensitive to the number of attention heads and on which layers</li>
</ul>
</span>
<a class="pull-right" href="/posts/acl19_heads.html" onMouseOver="document.readmore.src='../resources/posts/buttons/button_read_more_push-min.png';" onMouseOut="document.readmore.src='../resources/posts/buttons/button_read_more-min.png';">
<img src="../resources/posts/buttons/button_read_more-min.png" name="readmore" width=120px class="pull-right"></a>
<a class="pull-right" href="https://www.aclweb.org/anthology/P19-1580" target="_blank" onMouseOver="document.readpaper.src='../resources/posts/buttons/button_read_paper_push-min.png';" onMouseOut="document.readpaper.src='../resources/posts/buttons/button_read_paper-min.png';">
<img src="../resources/posts/buttons/button_read_paper-min.png" name="readpaper" width=120px class="pull-right"></a>
<a class="pull-right" href="https://github.com/lena-voita/the-story-of-heads" target="_blank" onMouseOver="document.viewcode.src='../resources/posts/buttons/button_view_code_push-min.png';" onMouseOut="document.viewcode.src='../resources/posts/buttons/button_view_code-min.png';">
<img src="../resources/posts/buttons/button_view_code-min.png" name="viewcode" width=120px></a>
<span style="font-size:15px; text-align: right; float: right; color:gray">June 2019</span>
</div>
</div>