The Halo Cross Media Measurement Framework
**Welcome to the Halo Cross Media Measurement wiki.**
For as long as advertising has been around, an important question for marketers has been to understand how it is performing. It’s never been an easy problem to solve but in today’s complex media ecosystem, with rapidly changing consumer behaviours as well as with the proliferation of media channels, it takes on many new dimensions. A key one of those is Cross Media Measurement (CMM).
Advertisers seek a better understanding of the effectiveness of their media investments across a number of traditional and new media forms. We want to avoid audiences being excessively exposed to our ads while our consumers demand increased transparency and control over their data. In short, CMM is an extraordinarily important utility.
Yet our ability to conduct it has been rather limited, being rooted in the bygone era of a much simpler media landscape. It’s our view that the panel approach, which has traditionally underpinned audience measurement, is not sufficient on its own. Likewise, techniques which are over-reliant on identifiers, like cookies and device IDs, are not in step with the evolution of our ecosystem towards ‘privacy-first’ and comprehensiveness. Measurement systems had to evolve.
So we’ve sparked an extraordinary effort which, although not complete yet, is surely one of the industry’s most successful collaborations. From the development of North Star principles to the identification of a technological concept, to today - where we are now code-complete on the first full release of a new cross-media measurement system – we are driving change. And in this new system we are advocating for a hybrid approach, combining panels and census data, in privacy-safe and scalable ways.
It’s an incredible journey, and a significant achievement by anyone’s yardstick. The core concepts we’ve developed provide advertisers with the means to plan and optimise their media investments with more ease and accuracy, paving ways to revolutionise the way media is measured.
So it is very exciting to see this system now being deployed by two pioneering advertiser associations, ANA and ISBA/Origin, in two of the prominent markets, the US and the UK, respectively. While these two local pilots have yet to reach maturity, they are both showing considerable promise.
As a design principle, and as ANA and ISBA/Origin have shown in practice, it is up to local market groups to decide if and how they want to follow the routes available under the Halo CMM system towards their own local deployment. Markets will need to go through similar processes: building governance groups, potentially standing up new panel assets and deploying the Halo code to local environments. Not necessarily a straightforward exercise, let’s admit. But building on the experience acquired from these two initial deployments, we are looking forward to what can be scaled to other markets.
And beyond Reach and Frequency, we’re excited to see how we can further develop the core measurement technologies, potentially establishing the means to measure other outputs, and even outcomes.
This journey is just getting started and there’s much more to come. As ever, we encourage all actors in the industry to join us on the mission towards enabling a modern and holistic approach to cross-media measurement, that will serve not only the advertisers or the media owners, but most importantly, our consumers!
Atin Kulkarni, VP Global Media & Commercial Capabilities, PepsiCo
Supported by global brands and national advertiser associations, including ANA and ISBA, WFA has been facilitating a powerful programme of work (‘Halo’) designed to expedite the implementation of a new wave of cross-media measurement solutions globally. Partners from across the ecosystem have participated in the programme.
The Halo Framework incorporates an innovative set of technologies and software components that empower local markets to use data assets like panels, exposure events, non-census measurement data, and other inputs to create measurement reports that are available via a set of Halo APIs. The resulting measurement is intended to meet the needs of the most ambitious advertiser use cases, and the demands of media owners for content measurement.
The initial objective is the secure computation of reach and frequency measurement reports, but with a roadmap that includes outcomes measurement.
The Halo Framework is built around two technological pillars: (1) the Virtual People Framework; and (2) the Private Reach and Frequency Estimator (PRFE). Halo’s Virtual People Framework provides a distributed method for calibrating and mapping raw measurement data, including digital events and non-census measurement data alike, onto a unified census-level representation of the population under measurement.
Three core data assets are required to achieve this: (1) a population Enumeration Survey of the population under measurement; (2) panel(s) for modelling and debiasing; and (3) when available, raw exposure event data, first party identifiers, and demographic profiles for the panellists (used for modelling) and for all users (used for measurement) from as many sources as possible. Additional non-census data, such as from set-top boxes or smart TVs, may also be used if desired, and access to a single-source panel is preferred.
The Private Reach and Frequency Estimator accepts the output of the Virtual People Framework and produces privacy preserving estimates of cross-media/publisher reach and frequency. This is done with a secure multiparty computation (MPC) protocol, which allows for the computation of desired cross-party outputs while guaranteeing the secrecy of all inputs and intermediate values through advanced cryptographic methods.
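To give a feel for how a multiparty computation can produce a joint output without exposing any party's input, the following is a toy sketch of additive secret sharing, one of the simplest MPC building blocks. It is not the PRFE protocol, whose details are not covered in this document. The function names, the modulus and the example counts are all illustrative assumptions, and a plain sum does not deduplicate audiences the way the real estimator does.

```python
import secrets

# Assumed modulus for this toy scheme; any value larger than the true total works.
MODULUS = 2**61 - 1


def split_into_shares(value, n_parties):
    """Split `value` into n additive shares that sum to `value` mod MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(n_parties - 1)]
    last = (value - sum(shares)) % MODULUS
    return shares + [last]


def secure_sum(private_inputs):
    """Simulate all parties in one process and return the sum of their inputs."""
    n = len(private_inputs)
    # Each party splits its private input and hands one share to every party.
    all_shares = [split_into_shares(v, n) for v in private_inputs]
    # Party j adds up the shares it received; on its own this partial sum
    # looks uniformly random and says nothing about any single input.
    partial_sums = [
        sum(all_shares[i][j] for i in range(n)) % MODULUS for j in range(n)
    ]
    # Only the combination of all partial sums reveals the total.
    return sum(partial_sums) % MODULUS


# Hypothetical per-provider counts, used only to exercise the sketch.
inputs = [1200, 850, 430]
total = secure_sum(inputs)
```

In this sketch only the final total is reconstructed; the individual inputs never appear outside the party that owns them.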
The Halo Framework is not a global centralised measurement service. Rather the Halo Framework and its constituent software components, which are available under the Apache 2.0 Licence, are intended to be deployed under the auspices of a local market as a Local Measurement Service. The open source software code is a critical foundation of the service but significant local effort will be required to address the issues of governance, commercials, contracts and auditing, not to mention the technical and operational customisation of the Halo Framework, given local data assets and market requirements.
At first reading, the technical infrastructure that surrounds the core hybrid measurement methodology may appear overly complex. However, the system is the first of its kind to deliver privacy-first media measurement while enabling the secure and fair use of census data combined with first party identifiers and demographics. If available, such data, corrected by the panel assets at the heart of the Halo Framework, help overcome the limits of those very same panels in the face of fragmented and personalised media. As such, the Halo Framework passes the parsimony test of Ockham’s razor.
This document outlines what is included in the Halo Framework and proposes how local markets can get started to evaluate and deploy their Local Measurement Service, if they decide to use it.
Throughout this document ‘Halo’ is principally used to describe the technologies underpinning the new Framework. But Halo is also used to refer to the industry consortium that has built this schematic, initially in service to the two markets leading local pilots of the tech.
Companies involved in the development process to date include advertisers, advertiser associations, digital platforms, measurement companies and others.
After two years of development work the Halo group is pleased to announce that the first release of the open-source Halo Cross-Media Measurement System (CMMS) is available under the Apache 2.0 software licence.
The Halo Framework employs a new set of technologies – primarily a Virtual People Framework and a Private Reach and Frequency Estimator (PRFE) – to enable comprehensive, always-on, privacy-preserving cross-media measurement. These technologies allow the system to deduplicate reach and frequency across multiple data providers’ (e.g. broadcasters and publishers) census data and non-census data alike, while using a panel for calibration and correction. Outputs are computed securely and in a way that preserves user privacy, without compromising accuracy.
However, the Halo Framework is not just a reference architecture and set of technology recommendations. Today it provides a set of documented ‘common components’ that include both software libraries and fully deployable systems that implement these technologies. Collectively, Halo's deployable components are called the Halo Cross-Media Measurement System (CMMS).
The CMMS orchestrates the interactions between participants that provide inputs like panels and event exposures, and others who desire measurement reports. To achieve this the CMMS provides various mechanisms for collecting the inputs and transforming them into the desired outputs. This broader framework that is composed of the CMMS and the participants with their inputs and desired outputs is called the Halo Cross-Media Measurement Framework (Halo Framework). To simplify integrations, the Halo Framework also provides a set of software libraries, documentation, and support channels.
The inputs to the CMMS and the outputs it produces allow for a range of configurations that can be determined by each market’s Local Measurement Service. A Local Measurement Service is operated by a local market under its own governance, commercial, auditing and contractual terms, keeping a view of global best practices in mind. These relationships are summarised in the following diagram.
Pic goes here
This document provides a high-level view of the Halo Framework and its CMMS, which includes both technological and non-technological aspects. The Halo Framework is designed to be deployed in multiple markets, so local markets need to understand what decisions are required to correctly deploy the Halo CMMS.
Therefore Halo is working on the next level of documentation, which is split into two parts:
- Halo Core Specification: describes the roles and responsibilities of participants in any Local Measurement Service and the decisions that must be made to successfully deploy the CMMS locally.
- Halo Local Specification: provides a template that allows a local market to consolidate all the local decisions and configurations in one place.
These documents are introduced here so that the reader can appreciate in the following sections how local market deployments require local decisions, and allow for local market variations.
This document continues with a review of the requirements that drove the creation of the Halo Framework. We then move on to an overview of the technologies that power it. This is followed by a more detailed dive into the Halo Framework’s architecture. We conclude with a discussion of local market deployment considerations and proposed next steps.
The WFA Industry Framework, Establishing Principles For A New Approach To Cross-Media Measurement, articulates a set of North Star advertiser and advertiser-supported industry requirements for cross-media measurement. These are summarised below.
- Full Life Cycle Measurement: enable all phases of measurement
  - Pre-campaign audience planning
  - Intra-campaign audience and frequency management and optimization
  - Post-campaign audience evaluation
- Continuous: always-on data capture, no buy-side tagging required
  - Advertisers who have opted-in and met other requirements as specified by the local market (e.g., subscriptions, cleared legal and data sharing aspects, etc.) can access measurement on an ongoing basis, rather than campaign-by-campaign, limited-duration, fee-based campaign tracking.
- Comprehensive: cross-media reporting across all media formats
  - Full comparable cross-channel measurement, including linear TV and all digital formats inclusive of all digital and traditional media platforms and publishers regardless of size, relationship to users, technical expertise, etc.
  - Measurement of an entire campaign without regard to media format
  - Applicability to all major global markets with the ability to adapt to market-specific innovations
- Full Funnel: reach, frequency management and outcomes
  - Deduplicated reach and frequency of media campaigns
  - Integration of outcomes measurement to enable media audience and related analytics, such as attribution modelling, media mix modelling, brand lift, and sales lift
- Privacy Centric: respect the consumer and safeguard user privacy by incorporating the highest level of technical privacy standards and guarantees in order to meet the privacy requirements of today and to provide a development roadmap for meeting them in the future
- Fair & Objective: provide apples-to-apples comparison across TV and digital advertising through technology that solves for cross-media, open standards, and a neutral governance model
- Global Trust & Transparency: technical design and implementation are sufficiently transparent to build trust in the measurement service via open-source implementations, audits and verification, etc.
- Advertising & Content: capable of supporting both ads and content measurement with priority given to ads measurement
Privacy principles have a deep influence on technical design choices and use of specific privacy-centric technologies is required to meet current and future regulatory requirements, and evolving user expectations. The Halo Framework was designed to:
- minimise the risk of re-identifying consumers;
- allow users to control the collection and use of their data;
- protect panellist identity;
- allow for compliance with applicable global and local privacy laws and regulations.
As such, the Halo Framework does not rely on identity graphs, browser fingerprinting, or third party cookies, although it is compatible with all of these.
Re-identification
Re-identification is the process by which records in a de-identified data set can be linked to specific individuals by combining them with records from another dataset, and is a risk that many industries face today. While it is impossible to fully eliminate the possibility of re-identification, to minimise its likelihood, data providers must have both strong contractual protections and quantifiable technical guarantees that guard against re-identification. As such, the Halo Framework incorporates technical measures like differential privacy and secure multiparty computation to help ensure that no participating entity learns more identifying information about individual users than the entity had before participating in this system, and that the ability to re-identify users is minimised.
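As a rough illustration of the differential privacy idea mentioned above, the sketch below adds Laplace noise to a count before it is released. It is a minimal example under stated assumptions and not the mechanism used by the Halo CMMS, which this document does not specify. The function names, the `epsilon` value and the example count are hypothetical, and a production system would draw noise from a cryptographically secure source rather than Python's `random` module.

```python
import random


def laplace_noise(scale, rng):
    """Draw one Laplace(0, scale) sample as the difference of two exponentials."""
    # expovariate takes a rate, so rate = 1/scale gives exponentials with mean `scale`.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_count(true_count, epsilon, rng, sensitivity=1.0):
    """Release a count with Laplace noise calibrated to sensitivity / epsilon.

    `sensitivity` is the most one individual can change the count (assumed 1 here).
    A smaller `epsilon` means more noise and stronger privacy.
    """
    scale = sensitivity / epsilon
    return true_count + laplace_noise(scale, rng)


# Hypothetical reach count, released with an assumed privacy budget of 0.5.
rng = random.Random(0)  # seeded only so the sketch is repeatable
released = noisy_count(10_000, epsilon=0.5, rng=rng)
```

The released value stays close to the true count for large audiences, while the presence or absence of any one individual has only a bounded effect on what is published.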
User Control
Data providers should be able to provide their users with transparency and control over the collection and use of their data as it pertains to its availability in the measurement system. The Halo Framework has allowed for this through the adoption of measurement technologies that allow user data to remain on systems that are either controlled by the data provider itself or when that is not feasible, a trusted delegate. The system also employs technical guarantees that prevent the data from being used for anything other than verified measurement use cases.
Special Consideration for Panellists
Panellists are those individuals and households who have provided consent for data providers to share their data for measurement. As such, to ensure fair and objective measurement, panellist identities must not be divulged to any data provider, save for those that are responsible for the panel itself and, when necessary, for the construction of reach models by a model provider. As with re-identification, the Halo Framework assumes a set of strong contractual protections and quantifiable technical guarantees to ensure this, and further assumes that any controls available to non-panellist users are available to panellists as well. Finally, as the technology progresses, we look forward to model training techniques that would preclude even the model provider from learning panellist identities.
Privacy Laws and Regulations
While the Halo Framework has been constructed with an eye toward complying with applicable global and local privacy laws and regulations, whether the solution actually achieves this is left to local markets to determine as their particular circumstances warrant.
As built, data from publishers, including campaign metadata, event data, and any derived aggregate, may only be used for the purposes of enabling specific cross-media advertising measurement use cases for individual advertisers, agencies, or other authorised users as determined by local markets. Similarly, once computed, outputs of the system are only made available to those authorised users (or their delegates) who requested them. Achieving this currently requires that each advertiser or agency opt-in to using a Local Measurement Service deployment of the Halo CMMS with each participating data provider. Additional mechanisms are being explored to make this process less onerous.
Once opted-in, a cryptographic consent signalling system, the details of which are beyond the scope of this document, ensures that data can only be accessed and decrypted by authorised parties.
As part of the construction of the initial Halo 2020 Blueprint, the below requirements were consolidated and divided into three categories: foundational features, reach and frequency use cases, and an advanced feature set to include outcomes. The following stack rank was the result of this exercise:
Insert table here
We are pleased to report that the progress on this list has been substantial and, as of April 2023, the Halo CMMS supports appropriately granular, always-on reach and frequency metrics for the set of basic segments, while an additional component supports the R/F reporting capability. APIs are provided for all functions. We will return to this list in the last section of this document, when considering Halo’s future directions.
The Halo Framework is built around two technological pillars: (1) the Virtual People Framework; and (2) the Private Reach and Frequency Estimator (PRFE). Together these technologies allow the system to offer accurate measurement of reach, frequency and other key metrics, while simultaneously preserving consumer privacy. The next two sections explore each of these pillars in more detail.
Insert Pic here
A common method for measuring reach in the target population is needed to compute deduplicated reach across data providers and media types. This method must allow data providers (also known as Event Data Providers or EDPs) to combine their exposure data at the census level, and the process must be consistent across all channels. The method must also be capable of accommodating any raw measurement data, including single-source panels, measurement panels, non-census sources like set-top box (STB) data, and full-census sources like digital event logs. Achieving this while providing a high-quality solution that adheres to the above privacy principles is a considerable challenge.
Halo’s Virtual People Framework (VID Framework) consists of several processes, the overall job of which is to calibrate and map raw measurement data, including digital events and non-census measurement data, onto a unified census-level representation of the population under measurement. Three core data assets are required to achieve this: (1) a population enumeration survey of the population under measurement; (2) one or more panels for modelling and debiasing; and (3) when available, raw exposure event data, first party identifiers, and demographic profiles for the panellists (used for modelling) and for all users (used for measurement) from as many sources as possible. Additional non-census data may also be used if desired. Note that use of a single-source panel is preferred. The diagram below shows the overall flow, which is expanded upon in the next several sections.
Insert Pic here
The Virtual People Framework uses the Universe Estimates from the Enumeration Survey to generate a model of the population under measurement. In this process an identifier, which is called a Virtual Person ID or VID, is generated for each individual in the target population. The VID is then assigned demographic attributes and other metadata, for example a geographic location, so that the distribution of the characteristics of the VIDs matches that of the population under measurement. Note, however, that the population of VIDs is synthetic: while all individuals in the population are represented by the set of VIDs, there is no correspondence between any particular VID and a real person. The output of this step is a Virtual Population that represents the population under measurement.
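The step described above can be sketched in a few lines. This is an illustrative toy, not the Halo implementation: the `VirtualPerson` class, the `build_virtual_population` function, the choice of demographic fields and the tiny universe estimates are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VirtualPerson:
    vid: int          # synthetic identifier, not tied to any real person
    age_group: str
    gender: str
    region: str


def build_virtual_population(universe_estimates):
    """Create one VID per individual counted in the universe estimates.

    `universe_estimates` maps (age_group, gender, region) to a population count,
    so the resulting VIDs reproduce the demographic distribution exactly.
    """
    population = []
    next_vid = 1
    for (age_group, gender, region), count in sorted(universe_estimates.items()):
        for _ in range(count):
            population.append(VirtualPerson(next_vid, age_group, gender, region))
            next_vid += 1
    return population


# Hypothetical universe estimates for a nine-person population.
universe = {
    ("18-34", "F", "North"): 3,
    ("18-34", "M", "North"): 2,
    ("35+", "F", "South"): 4,
}
virtual_population = build_virtual_population(universe)
```

Each cell of the estimates is represented by exactly as many VIDs as it has people, yet no VID corresponds to a particular individual.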
Insert Pic here
Virtual People Modelling
A Virtual People Model (VID Model) is created by combining the Virtual Population, the panel(s), and the panellists’ raw exposure event data from all available sources, along with any other desired measurement data. All panellists should be associated with the participating data providers’ ad and/or content exposure event information, including user device identifiers, and other contextual and demographic information useful for developing measurement models. To the extent that multiple panels are available (e.g., a single-source panel and other single-media panels), [they may be combined](https://research.google/pubs/pub42246/).
Figure: Virtual People Modelling
To ensure completeness of exposure event information, the Halo CMMS requires that all panellists consent to having their digital exposure event logs shared with the Panel Provider. Digital exposure event logs for all panellists can then be provided by participating Census Event Data Providers (an Event Data Provider that provides census data to the CMMS), whose logs are queried by the Panel Provider via a double-blind cryptographic protocol. The protocol ensures that Census EDPs do not learn the identity of any panellist, while third-party audits ensure that Panel Providers do not query for logs to which they are not entitled. Throughout this process only data that belongs to consenting panellists leaves the Census EDP’s environment. Additional details are provided in the Design Overview below.
When combined, the panel data and the census logs allow the modeller to learn the relationship between email addresses, device IDs and other identifiers for panellists across all media properties, the details of which are used to construct a VID Model that faithfully reproduces the reach curve for each media property. As part of this process, demographic data associated with the panel serves as ground truth to debias the demographic signals present in the digital event data. During this process the relationship to non-census measurement data may also be considered.
In summary, common steps for building VID Models are:
- de-biasing census data and data providers' demographic information,
- determining the relationship between user IDs and people across sources, and
- determining how media consumption varies across different devices and media.
Virtual People Labelling
Virtual People Labelling uses the Virtual People Model to label raw digital event data with VIDs. During this process, each event record and its accompanying exposure metadata is labelled with one or more VIDs, where multiple VIDs may be produced to accommodate co-viewing or account sharing scenarios. The result is that each record is labelled with one or more VIDs, which can be easily counted and deduplicated.
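The counting and deduplication that labelling makes possible can be illustrated with a small sketch. A plain lookup table stands in for the VID Model here, which is a strong simplification: a real model is learned from panel and census data as described above. The publisher names, identifiers, VIDs and function names are all hypothetical.

```python
from collections import Counter

# Toy stand-in for a VID Model: maps (publisher, identifier) to one or more VIDs.
# Two VIDs for a single identifier represent a co-viewing household.
toy_vid_model = {
    ("tv_network", "household-17"): [101, 102],
    ("video_site", "cookie-abc"): [101],
    ("news_site", "device-xyz"): [205],
}


def label_events(events, vid_model):
    """Yield one VID per labelled exposure; an event may yield several VIDs."""
    for publisher, identifier in events:
        for vid in vid_model[(publisher, identifier)]:
            yield vid


def reach_and_frequency(events, vid_model):
    """Return deduplicated reach and a histogram of impressions per VID."""
    impressions_per_vid = Counter(label_events(events, vid_model))
    reach = len(impressions_per_vid)  # distinct VIDs across all publishers
    frequency_histogram = dict(Counter(impressions_per_vid.values()))
    return reach, frequency_histogram


# Hypothetical exposure events from three publishers.
events = [
    ("tv_network", "household-17"),
    ("video_site", "cookie-abc"),
    ("video_site", "cookie-abc"),
    ("news_site", "device-xyz"),
]
reach, frequency_histogram = reach_and_frequency(events, toy_vid_model)
```

Because VID 101 appears under both the TV and video identifiers, it is counted once in reach and contributes three impressions to the frequency histogram.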