
Publisher transformation: Machine learning experts

This article explores what is required to operationalise modelling to scale data and increase audience reach.


As regulators and browsers increasingly focus on privacy practices in digital advertising, advertisers can only apply audience targeting to 30% of the open web. Equipped with the right tools and expertise in machine-learned modelling, publishers can fill the growing data gap and ensure advertisers don’t miss out on valuable audiences while growing revenue from advertising.

Publishers are becoming the new generation of data providers for audience targeting. This recognition goes beyond publishers’ ability to collect data in a privacy-compliant way and to recognise all users to create endemic and non-endemic audiences. It’s about their ability to model out niche and hard-to-scale datasets, including high-quality, self-declared ones. 

In this article, the second in a two-part series, we’re exploring what is required to operationalise modelling at web scale. See part one for three types of modelling that publishers can leverage to build a scaled data offering.

Taking this a step further and operationalising these models – making them scale to millions of users in real time – requires machine learning expertise and the right system design. 

Operationalising Models

In the new era of digital advertising, publishers have to become experts in machine learning. This is a complex task that requires not only data science expertise but also the technical infrastructure to operationalise models and make them scale to millions of users in real time. Generally, there are four components to this:

  • Seed Data

The most important part of any ML effort is access to the right seed data. A model can only be as good as the data you feed into it. For publishers, this can be interest data collected from user interactions, declared data from registered users or surveys, or declared data supplied by data partners. Generally, the larger the seed dataset, the better the model can be trained. However, you don’t want to compromise quality for quantity: feed the model loads of low-quality data and you shouldn’t expect great results.
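To make this concrete, here is a minimal sketch of how a seed dataset might be assembled, with declared data supplying the labels and on-site interactions supplying the features. It is illustrative only; the identifiers, field names and Python structures are assumptions rather than a description of any particular platform.

```python
# Illustrative sketch: build a labelled seed dataset by joining declared data
# (labels) with behavioural events (features). All field names are made up.
from collections import defaultdict

declared = [                       # e.g. survey or registration responses
    {"user_id": "u1", "interest": "automotive"},
    {"user_id": "u2", "interest": "travel"},
]
events = [                         # e.g. on-site page views
    {"user_id": "u1", "page_category": "suv-reviews"},
    {"user_id": "u2", "page_category": "city-breaks"},
    {"user_id": "u3", "page_category": "suv-reviews"},
]

# Declared data provides the labels; behavioural events become the features.
labels = {d["user_id"]: d["interest"] for d in declared}
features = defaultdict(list)
for e in events:
    features[e["user_id"]].append(e["page_category"])

seed = [
    {"user_id": uid, "features": features.get(uid, []), "label": label}
    for uid, label in labels.items()
]
print(seed)  # only users with a trusted label end up in the training set
```

Even in this toy example the point about quality holds: user u3 has behavioural data but no trusted label, so they are left out of the seed set, and the quality of the declared source directly bounds the quality of anything the model later predicts.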

  • Feature Engineering

If you want to maximise scale and quality, you will have to make predictions in real time. This requires you to think about feature engineering, meaning the aggregation of raw events into a format (a user state) that a model trains against and that is then used to make predictions. This step can be the hardest part: As a publisher, you need to update millions of user states in real time, so that you can target users before they leave your site (and possibly never return). Edge computing can help with this: Moving the feature engineering to the device removes network latency and is highly scalable.
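As a simplified illustration of that idea, the sketch below folds raw page-view events incrementally into a compact user state that is ready for scoring at any moment. The event shape and feature names are assumptions for the example, not a description of any specific system.

```python
# Illustrative sketch: incremental feature engineering. Each raw event is
# folded into an aggregated "user state", which can then be flattened into
# the feature format a model trains against and predicts from.
from collections import Counter
from dataclasses import dataclass, field

@dataclass
class UserState:
    pageviews: int = 0
    category_counts: Counter = field(default_factory=Counter)

    def update(self, event: dict) -> None:
        # Fold one raw event into the aggregate; cheap enough to run per event.
        self.pageviews += 1
        self.category_counts[event["category"]] += 1

    def to_features(self) -> dict:
        # Flatten the state into model features, e.g. share of views per category.
        total = max(self.pageviews, 1)
        return {f"share_{c}": n / total for c, n in self.category_counts.items()}

state = UserState()
state.update({"category": "sport"})
state.update({"category": "finance"})
print(state.to_features())  # {'share_sport': 0.5, 'share_finance': 0.5}
```

Because each update touches only one user’s state, the same logic can run on the device itself, which is what makes the edge approach both low-latency and easy to scale.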

  • Model Selection

It’s critical to select the most appropriate model for your given use case. This requires an understanding of the problem at hand and a pragmatic decision of how complex your model needs to be. Luckily, there are a lot of algorithms to pick from, but it requires expertise, experience and experimentation to find the right mix of complexity and practicability. More complexity isn’t always better: The simpler your model can be to achieve a desired outcome, the better.
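As a rough sketch of that experimentation, the example below compares a simple and a more complex candidate on held-out data and keeps the simpler one unless the extra complexity clearly pays for itself. The dataset is synthetic and stands in for real audience features; scikit-learn is used purely for illustration.

```python
# Illustrative sketch: pragmatic model selection. Start with the simplest
# candidate and only accept complexity that measurably improves results.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for a labelled seed dataset.
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)

candidates = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "gradient_boosting": GradientBoostingClassifier(),
}
for name, model in candidates.items():
    auc = cross_val_score(model, X, y, cv=5, scoring="roc_auc").mean()
    print(f"{name}: mean AUC = {auc:.3f}")
# If the simpler model scores within tolerance of the complex one, prefer it:
# it is cheaper to run, easier to explain and simple enough to score on device.
```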

  • Inference

Once you have trained your model and you have access to a user’s real-time state, you can feed that into the model to get predictions. This requires an inference service that returns predictions in milliseconds. For simpler models, like logistic regression, the inference could happen on device, eliminating any latency. For complex and large models using a neural network for deep learning, it might be more practical for the inference to happen in the cloud.
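As an illustration of why simple models suit on-device inference, the sketch below scores a user state with exported logistic-regression weights: the whole prediction is a weighted sum and a sigmoid, with no network round trip. The weights and feature names are invented for the example.

```python
# Illustrative sketch: on-device inference with a trained logistic regression.
# The model reduces to a weighted sum plus a sigmoid, so a prediction takes
# microseconds and needs no server call. Weights here are made up.
import math

WEIGHTS = {"share_sport": 1.8, "share_finance": -0.4}  # exported from training
BIAS = -0.9

def predict(features: dict) -> float:
    """Probability that the user belongs to the modelled audience segment."""
    z = BIAS + sum(WEIGHTS.get(name, 0.0) * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))

print(predict({"share_sport": 0.5, "share_finance": 0.5}))  # ~0.45
```

For a large neural network, the same prediction would instead come from a cloud inference service, trading a network round trip for the extra model capacity.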

Sourcing the right seed data and building the appropriate model are fundamental steps. But scaling the model to millions of users and making real-time predictions is the hard part. And real-time predictions are crucial for publishers: If you get your predictions in a nightly batch job, you will always be one step behind. This also means you’ve lost your chance to address all those users who have one session and then never return to your site. Slow predictions aren’t so much an issue when you want to send personalised emails, but they have a direct impact on your revenue when you need to serve campaigns to users while they are on your site.

The opportunity ahead 

Privacy is becoming one of the most important macro trends in the industry, driven by both regulators and browser vendors. This creates a huge opportunity for publishers, as they are in a unique position to allow advertisers to continue reaching their customers, and to do so in a privacy-safe and scalable way across all platforms on the open web. However, publishers need the right tools to create a robust and differentiated audience offering, and machine learning is an important tool for achieving that. Publishers need access to the right modelling solutions for specific modelling problems, and the infrastructure to run those models at web scale.

 

