
\chapter*{Abstract}
\addcontentsline{toc}{chapter}{Abstract}

Communication, as a system of messages, symbols, and cultural exchanges, is ubiquitous across all species.  Scholars have argued that communication represents one of the most transformative evolutionary transitions in life's history \cite{smith1997major}, alongside pivotal developments like chromosomal mechanisms, eukaryotic formation, sexual reproduction, and multicellular life. Its unique capacity to enable cooperation and facilitate the unlimited transmission of cultural information grants species an unprecedented form of adaptive flexibility \cite{kirby2008cumulative}.

Because of the critical role communication plays in the survival and advancement of species, it has been studied since ancient times. The earliest known work on communication, the \textit{Precepts} of Ptah-Hotep, appeared more than 4000 years ago ($\sim$2300 BCE) \cite{gray1946precepts}. Since then, the study of communication has seen three distinct waves of intensified interest: the first in Ancient Greece, with thinkers like Aristotle, Plato, and Isocrates producing seminal works like \textit{Rhetoric}, \textit{Phaedrus}, and \textit{Antidosis} \cite{hackforth1972plato,rapp2002aristotle,norlin1928isocrates}; the second with the rise of print, the Reformation, the Renaissance, and the European colonial pursuits \cite{mack2011history}; and the third and most recent during the Second World War \cite{brinol2012history}.
We currently stand at the cusp of a fourth such phase, precipitated not by political upheaval (like the ideas of democracy or world war) or mechanical innovation (like the printing press and the steam engine), but by the unprecedented accumulation of digital content and behavioral data. This data now serves as the foundation for developing large language and diffusion models, which hold transformative potential for behavioral scientific inquiry. We will show in this thesis that these tools, while still in their infancy, have the potential to solve many problems considered ambitious in behavioral sciences. 


Communication is composed of seven modalities: the communicator, message, channel, time of receipt, receiver, time of behavior, and receiver's behavior \cite{shannon-weaver-1949,lasswell1948structure,lasswell1971propaganda}. Critically, the behavior in each communication turn becomes the message of the subsequent turn, rendering communication a strategic interaction between sender and receiver aimed at optimizing shared or individual objectives \cite{smith2003animal}. Examples such as legal defense and prosecution, scientific discourse, mating, organizational communication, diplomacy, political propaganda, and culture (like folk songs and maxims) present different types of goals.


This thesis explores the enduring mission of the behavioral sciences, first articulated by Aristotle 2,500 years ago, of identifying and leveraging persuasive mechanisms \cite{rapp2002aristotle}. The field has traditionally bifurcated into two epistemological approaches: explanation and prediction. Historically, behavioral scientists have sought explanations that provide interpretable causal mechanisms behind human and societal functioning. However, societies and humans do not lend themselves to clean-cut equations and formulas, as evidenced by the limited success of behavioral explanations in predicting behavior. The emergence of extensive digital behavioral repositories has consequently shifted the focus towards more robust predictive methodologies.


% We then advance to constructing generalized behavior models—Large Content and Behavior Models (LCBMs)—trained on extensive digital analytics repositories. These models aim to comprehend behavior holistically, unlike task-specific approaches. Our investigation critically examines large language models' limitations in addressing behavioral challenges, revealing that behavioral training data is often inadvertently filtered out as statistical noise. We demonstrate that reintegrating behavioral data not only restores models' behavioral capabilities but enables novel inferential approaches—such as deriving content insights through receiver behavioral responses. Finally, we pioneer content generation research across text and image domains, focusing on metrics of performance and engagement. This includes developing the first automated arena for benchmarking text-to-image model engagement potential, establishing new standards for evaluating and improving content generation systems.


In this thesis, we start with the more traditional approach of behavior explanation, where we cover persuasion strategies in advertising images and videos. We construct the largest set of generic persuasion strategies based on theoretical and empirical studies in the marketing, social psychology, and machine learning literature. We introduce the first dataset for studying persuasion strategies in advertisements. We further introduce methods called universal adversarial triggers to mine behavioral models and understand what they learn. While persuasion strategies help a human correlate and understand content and behavior, universal adversarial triggers help reveal what models learn that makes them successful in predicting behavior.

Next, we turn our attention towards behavior prediction by constructing general behavior models. These models, similar to large language models, aim to understand behavior \textit{in general}, as opposed to being designed for a specific behavioral task. We use large repositories of digital analytics to train these models. The format of this data follows the general communication model consisting of the communicator, message, time of message, channel, receiver, time of effect, and effect. We call these models Large Content and Behavior Models (LCBMs). We further show that large language models (LLMs), while used as general-purpose models for a variety of tasks in different domains, are unable to solve behavioral problems. We investigate the reason for this and find that, while training LLMs, behavioral data is removed as noise, which causes them to lose their behavioral capabilities.


We also show that adding the behavioral training data back has other positive side effects. Namely, since behavior is an aftereffect of content (the message), we can make inferences about content by looking at receiver behavior. For example, blood pressure or pupil dilation levels while watching the movie Jurassic Park indicate the excitement level of different scenes. We show results for this hypothesis on more than 40 content understanding tasks across all four modalities of text, image, video, and audio.


Finally, we make initial strides towards solving the problem of generating performant content. We show this for both text, taking memorability as an illustrative behavior, and images, by generating images that are more engaging. We also develop mechanisms to measure the engagement potential of text-to-image generation models. We show that existing metrics for benchmarking the quality of text-to-image generation models are not correlated with engagement. We develop a model to measure the engagement potential of an image. We release the first automated arena to benchmark the engagement of text-to-image models. We rank several popular text-to-image models on their ability to generate engaging images and encourage the community to submit their models to the arena.