Publisher transformation: Next-gen data providers for audience targeting
This article focuses on three types of modelling that publishers and brands are exploring to scale data and increase audience reach.
As regulators and browsers increasingly focus on privacy practices in digital advertising, advertisers can apply audience targeting to only 30% of the open web. Equipped with the right tools and expertise in machine-learned modelling, publishers can fill this growing data gap, ensuring advertisers don’t miss out on valuable audiences while growing their own advertising revenue.
Publishers are becoming the new generation of data providers for audience targeting. This goes beyond their ability to collect data in a privacy-compliant way and to recognise all of their users when creating endemic and non-endemic audiences. It’s about their ability to model out niche and hard-to-scale datasets, including high-quality, self-declared ones.
In this article, the first in a two-part series, we’re exploring three types of modelling that publishers can leverage to build a scaled data offering. See part two for what is required to operationalise those models at web scale.
Modelling isn’t new to the ecosystem. Traditional data providers have long taken high-quality datasets and applied proprietary modelling to scale them for digital advertising. As those data providers lose that capability, publishers equipped with the right tools are stepping up.
Here are the three types of modelling that publishers and brands are exploring.
1. Lookalike Modelling
Lookalike modelling is the most prevalent modelling technique across ad tech and marketing platforms, and a powerful way to extend an interest-, behaviour- or intent-based audience. Put simply, it finds users who look similar to a specific audience, known as the seed dataset. Looking similar typically means sharing characteristics with the seed audience, such as engaging with similar content or exhibiting comparable on-site behaviour.
With lookalike modelling, there are usually two sets: the seed dataset (positive set) and all other users who do not belong to the seed (audience pool). The audience pool is the audience base where we search for users who are similar to the seed dataset. Users in the audience pool will have various degrees of similarity to users in the seed dataset; some will be very similar, while others will be very dissimilar. The lookalike model assigns every user in the audience pool a similarity value, which can then be used to make targeting decisions.
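The scoring step above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not how any particular platform implements lookalikes: each user is represented by a hypothetical numeric feature vector, similarity is measured as cosine similarity to the average (centroid) of the seed dataset, and the 0.9 cut-off is an arbitrary example threshold.

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two feature vectors (0.0 if either vector is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def centroid(vectors: list[list[float]]) -> list[float]:
    """Average feature vector of a group of users."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]

def score_audience_pool(
    seed: dict[str, list[float]], pool: dict[str, list[float]]
) -> dict[str, float]:
    """Give every user in the audience pool a similarity value relative to the seed dataset."""
    seed_centroid = centroid(list(seed.values()))
    return {user: cosine_similarity(features, seed_centroid) for user, features in pool.items()}

# Hypothetical features per user: [sport pages read, finance pages read, sessions per week]
seed = {"s1": [9.0, 1.0, 5.0], "s2": [8.0, 0.0, 4.0]}
pool = {"u1": [7.0, 1.0, 4.0], "u2": [0.0, 9.0, 1.0], "u3": [3.0, 3.0, 2.0]}

scores = score_audience_pool(seed, pool)
# Keep pool users at or above a hypothetical similarity threshold.
expanded = sorted(user for user, score in scores.items() if score >= 0.9)
```

Choosing the threshold is where the reach versus precision trade-off is made: a lower threshold expands the audience further but admits less similar users.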
There are many use cases for which lookalike models work very well, including when you want to find users that look similar to existing customers, similar to those that have engaged with a specific campaign or those that have shown a specific interest or intent.
2. Classification Modelling
However, there are use cases where you want to categorise users into discrete groups (classes). That is typically the case for socio-demographic targeting, where you expect distinct groups of users. If you were to build an independent lookalike model for each group, you would often find significant overlap between the expanded audiences.
This is because the two lookalike (LAL) models run independently of each other. Using the example from the above illustration, even if a user has a higher similarity score for one of the two models, they would erroneously be bucketed into both expanded audiences if they meet the minimum threshold for both.
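A toy example makes the overlap concrete. The scores below are made-up outputs of two hypothetical, independently trained lookalike models, and the 0.7 threshold is an assumption; because each model applies its threshold on its own, a user can pass both.

```python
# Hypothetical similarity scores from two independently trained lookalike models.
scores_single = {"u1": 0.82, "u2": 0.75, "u3": 0.40}        # seed: declared "single"
scores_relationship = {"u1": 0.71, "u2": 0.30, "u3": 0.78}  # seed: declared "in a relationship"
THRESHOLD = 0.7  # hypothetical minimum similarity for inclusion

def expand(scores: dict[str, float], threshold: float) -> set[str]:
    """Each LAL model applies its own threshold, unaware of the other model."""
    return {user for user, score in scores.items() if score >= threshold}

audience_single = expand(scores_single, THRESHOLD)
audience_relationship = expand(scores_relationship, THRESHOLD)
overlap = audience_single & audience_relationship  # users bucketed into both audiences
```

Here `u1` scores higher for the "single" model, yet still ends up in both expanded audiences.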
The other challenge is that lookalike models only know of the seed audience (positive set) and “everyone else” (audience pool). If the positive set is users who declared themselves “single”, the negative set would include both users who declared they are “in a relationship” and users with no label at all. This can lead the model to prioritise the wrong features. Declared data is often only available for registered users, so the model might pick up on registration itself and predict that registered users are more likely to be single. So not only would you get overlapping expanded audiences where you expect none, but you might also get suboptimal performance.
Classification models help solve those challenges. A single model takes multiple seed audiences and, therefore, knows of all possible classes for a specific category or attribute. In our example, it would know of users who are labelled as “in a relationship”, those who are labelled as “single”, and those for whom no relationship data is available. This has two main benefits:
- Feature selection: The model picks the features that best distinguish between the different classes, which it can do because it knows all of the possible classes.
- Mutual exclusivity: When the model makes a prediction for a user, it returns a single classification: the class it is most confident applies to that user.
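To illustrate mutual exclusivity, here is a minimal multi-class sketch. It uses a nearest-centroid classifier with a softmax over distances, chosen purely for brevity; it does not demonstrate the feature-selection benefit, and production classification models are typically more sophisticated. The classes, features and users are hypothetical. The point is that all seed audiences are learned together and each unlabelled user receives exactly one class.

```python
import math

def centroid(vectors: list[list[float]]) -> list[float]:
    """Average feature vector of a group of users."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]

def train(labelled: dict[str, list[list[float]]]) -> dict[str, list[float]]:
    """Learn one centroid per class from all seed audiences at once."""
    return {label: centroid(vectors) for label, vectors in labelled.items()}

def classify(model: dict[str, list[float]], features: list[float]) -> tuple[str, float]:
    """Return exactly one class for the user, plus the model's confidence in it."""
    # Closer centroids get higher logits (negative squared Euclidean distance).
    logits = {
        label: -sum((f - c) ** 2 for f, c in zip(features, cen))
        for label, cen in model.items()
    }
    # Softmax turns logits into probabilities; subtracting the max avoids overflow.
    top = max(logits.values())
    weights = {label: math.exp(logit - top) for label, logit in logits.items()}
    total = sum(weights.values())
    probabilities = {label: w / total for label, w in weights.items()}
    best = max(probabilities, key=probabilities.get)
    return best, probabilities[best]

# Hypothetical seed audiences; features: [dating-content pages read, wedding-content pages read]
labelled = {
    "single": [[6.0, 0.0], [5.0, 1.0]],
    "in a relationship": [[0.0, 5.0], [1.0, 6.0]],
}
model = train(labelled)
unlabelled = {"u1": [4.0, 1.0], "u2": [2.0, 4.0]}
predictions = {user: classify(model, features) for user, features in unlabelled.items()}
```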
Below is an illustration of how a classification model takes three seed labels (let’s say three age buckets) as an input and then classifies the rest of the audience into one of those:
3. Event Predictions
Another powerful type of modelling in digital marketing is event predictions. These models give you a prediction of how likely a user is to perform a certain action. Sometimes, this model type is also called conversion prediction, but it’s not limited to a user purchasing a product. It can be any action that you want to optimise towards. This could be a user clicking on an ad, purchasing a subscription, or clicking on an affiliate link.
Lookalike and classification models describe who a user is, so that a campaign reaches the audience most likely to respond well to it. Event prediction models, in contrast, are not about how a user can be described but about how likely they are to perform a certain action.
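One common way to build an event prediction model is logistic regression, which outputs a probability between 0 and 1. The sketch below trains it with plain batch gradient descent; the features, labels and hyperparameters (`lr`, `epochs`) are hypothetical and chosen only to keep the example self-contained.

```python
import math

def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict_probability(weights: list[float], bias: float, features: list[float]) -> float:
    """Predicted probability that the user performs the event."""
    return sigmoid(sum(w * x for w, x in zip(weights, features)) + bias)

def train(
    X: list[list[float]], y: list[int], lr: float = 0.5, epochs: int = 5000
) -> tuple[list[float], float]:
    """Fit logistic regression with batch gradient descent on the mean log-loss."""
    weights, bias = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(weights), 0.0
        for features, label in zip(X, y):
            error = predict_probability(weights, bias, features) - label  # d(log-loss)/d(logit)
            grad_w = [g + error * x for g, x in zip(grad_w, features)]
            grad_b += error
        weights = [w - lr * g / len(X) for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / len(X)
    return weights, bias

# Hypothetical training data, features scaled to [0, 1]:
# [recent visit frequency, paywall encounters]; label 1 = purchased a subscription.
X = [[0.1, 0.0], [0.2, 0.1], [0.3, 0.0], [0.8, 0.7], [0.9, 0.8], [1.0, 0.9]]
y = [0, 0, 0, 1, 1, 1]
weights, bias = train(X, y)
p_engaged = predict_probability(weights, bias, [0.9, 0.8])
p_casual = predict_probability(weights, bias, [0.1, 0.0])
```

The output is a likelihood rather than a description of the user, so it can be used directly to rank users for whatever action you want to optimise towards.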
You can take this one step further with uplift modelling. While prediction models tell you how likely a user is to convert (or perform any other event), uplift modelling takes into account how that likelihood can be influenced by a certain action, such as showing a message. The difference is that uplift modelling compares how likely something is to happen with and without the action, and can therefore optimise for the incremental uplift the action creates. An action could, for example, be showing a special offer for subscriptions.
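Formally, using a standard textbook definition rather than anything specific to this article, the uplift for a user with features x is the difference between two conversion probabilities, where Y = 1 means the user converts and T = 1 means they receive the action (for example, the subscription offer):

```latex
\text{uplift}(x) = P(Y = 1 \mid x,\, T = 1) - P(Y = 1 \mid x,\, T = 0)
```

A positive value means the action helps, a negative value means it hurts, and a value near zero means it makes little difference.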
Uplift models typically categorise users into four groups:
1. sure things: users who are likely to convert anyway
2. persuadables: users who will only convert if they are shown the offer
3. do not disturbs: users who will only convert if they don’t receive the offer
4. lost causes: users who won’t convert regardless
Uplift modelling requires a control group and a variation group to measure the net effect of an action. Compared to pure prediction modelling, it’s about causation rather than correlation.
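The four groups can be illustrated with a simplified, segment-level sketch. Assumptions: a hypothetical experiment log recording, per user, a segment, whether they were in the variation group (shown the offer) and whether they converted; every segment has users in both groups; and the `margin` and `high` cut-offs are arbitrary. Real uplift models usually estimate these probabilities per user from features (for example, by training separate models on the control and variation groups), but the comparison logic is the same.

```python
from collections import defaultdict

# Hypothetical experiment log: (segment, shown_offer, converted)
log = [
    ("loyal_readers", True, True), ("loyal_readers", True, True),
    ("loyal_readers", False, True), ("loyal_readers", False, True),
    ("paywall_hitters", True, True), ("paywall_hitters", True, True),
    ("paywall_hitters", False, False), ("paywall_hitters", False, False),
    ("offer_averse", True, False), ("offer_averse", True, False),
    ("offer_averse", False, True), ("offer_averse", False, True),
    ("casual_visitors", True, False), ("casual_visitors", True, False),
    ("casual_visitors", False, False), ("casual_visitors", False, False),
]

def conversion_rates(rows) -> dict[str, tuple[float, float]]:
    """Per segment: (rate in variation group, rate in control group).
    Assumes every segment has users in both groups."""
    counts = defaultdict(lambda: {True: [0, 0], False: [0, 0]})  # group -> [conversions, users]
    for segment, shown_offer, converted in rows:
        counts[segment][shown_offer][0] += int(converted)
        counts[segment][shown_offer][1] += 1
    return {
        seg: (c[True][0] / c[True][1], c[False][0] / c[False][1])
        for seg, c in counts.items()
    }

def uplift_group(p_variation: float, p_control: float, margin: float = 0.1, high: float = 0.5) -> str:
    """Map the net effect of the offer to one of the four uplift groups."""
    uplift = p_variation - p_control
    if uplift > margin:
        return "persuadables"      # convert only if shown the offer
    if uplift < -margin:
        return "do not disturbs"   # the offer puts them off
    # Offer makes little difference: split by baseline conversion.
    return "sure things" if p_control >= high else "lost causes"

groups = {seg: uplift_group(pv, pc) for seg, (pv, pc) in conversion_rates(log).items()}
```

Only the persuadables justify spending the action on; targeting sure things wastes it, and targeting do not disturbs actively costs conversions.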
The opportunity ahead
Privacy is becoming one of the industry’s most important macro trends, driven by regulators and browser vendors. This creates a huge opportunity for publishers: they are in a unique position to let advertisers keep reaching their customers in a privacy-safe and scalable way across all platforms on the open web. However, publishers need the right tools to create a robust and differentiated audience offering, and machine learning is an important one. They need access to the right modelling solutions for specific modelling problems, and the infrastructure to run those models at web scale.