
Publisher transformation: Machine learning experts

This article explores what is required to operationalise modelling to scale data and increase audience reach.


As regulators and browsers increasingly focus on privacy practices in digital advertising, advertisers can apply audience targeting to only 30% of the open web. Equipped with the right tools and expertise in machine learning modelling, publishers can fill the growing data gap, ensuring advertisers don’t miss out on valuable audiences while publishers grow their advertising revenue.

Publishers are becoming the new generation of data providers for audience targeting. This role goes beyond their ability to collect data in a privacy-compliant way and to recognise all users in order to create endemic and non-endemic audiences: it’s about their ability to model out niche and hard-to-scale datasets, including high-quality, self-declared ones.

In this article, the second in a two-part series, we’re exploring what is required to operationalise modelling at web scale. See part one for three types of modelling that publishers can leverage to build a scaled data offering.

Taking this a step further and operationalising these models – making them scale to millions of users in real time – requires machine learning expertise and the right system design. 

Operationalising models

In the new era of digital advertising, publishers have to become experts in machine learning. This is a complex task that requires not only data science expertise but also the technical infrastructure to operationalise models and serve predictions to millions of users in real time. Generally, there are four components to this:

  • Seed Data

The most important part of any ML effort is access to the right seed data: a model can only be as good as the data you feed into it. For publishers, this can be interest data collected from user interactions, declared data from registered users or surveys, or declared data supplied by data partners. Generally, the larger the seed dataset, the better the model can be trained. However, don’t compromise quality for quantity: if you use loads of low-quality data, don’t expect great results.
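To make this concrete, here is a minimal sketch of assembling a seed set; the file names, column schema and activity threshold are all illustrative assumptions, not a prescribed setup:

```python
# A minimal sketch of assembling a seed dataset (illustrative schema).
import pandas as pd

# Raw interaction events: one row per pageview or click.
events = pd.read_parquet("events.parquet")      # user_id, category, ts
# Self-declared labels, e.g. from registrations or surveys.
declared = pd.read_parquet("declared.parquet")  # user_id, interest, source

# Favour quality over quantity: keep only labels from trusted sources,
# and only users with enough behavioural history to learn from.
trusted = declared[declared["source"].isin(["registration", "survey"])]
event_counts = events.groupby("user_id").size()
active_users = event_counts[event_counts >= 5].index

seed = trusted[trusted["user_id"].isin(active_users)]
print(f"Seed set: {len(seed)} labelled users")
```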

  • Feature Engineering

If you want to maximise scale and quality, you will have to make predictions in real time. This requires feature engineering: aggregating raw events into a format (the user state) that a model trains against and then uses to make predictions. This step can be the hardest part: As a publisher, you need to update millions of user states in real time, enabling you to target users before they leave your site (and may never return). Edge computing can help with this: Moving the feature engineering to the device removes any latency and is highly scalable.
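As an illustration, here is a sketch of that aggregation step in Python; on-device, the same logic could run as JavaScript in the browser. The event fields and category vocabulary are assumptions for the example:

```python
import time
from collections import Counter, defaultdict

class UserState:
    """Features for one user, updated incrementally on every raw event
    rather than recomputed in a nightly batch."""

    def __init__(self):
        self.category_counts = Counter()  # pageviews per content category
        self.last_seen = 0.0

    def update(self, event: dict) -> None:
        self.category_counts[event["category"]] += 1
        self.last_seen = event.get("ts", time.time())

    def features(self, vocab: list) -> list:
        # Fixed-length vector the model trains against and predicts from.
        total = max(sum(self.category_counts.values()), 1)
        return [self.category_counts[c] / total for c in vocab]

VOCAB = ["sport", "finance", "travel"]  # illustrative categories
states = defaultdict(UserState)

def on_event(event: dict) -> list:
    # Called once per event, so the user state is always prediction-ready
    # while the user is still on the site.
    state = states[event["user_id"]]
    state.update(event)
    return state.features(VOCAB)

print(on_event({"user_id": "u1", "category": "sport"}))  # [1.0, 0.0, 0.0]
```

Because the state is updated per event rather than per batch, a prediction made on the very next pageview already reflects everything the user has done in the current session.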

  • Model Selection

It’s critical to select the most appropriate model for your use case. This requires an understanding of the problem at hand and a pragmatic decision about how complex the model needs to be. Luckily, there are plenty of algorithms to pick from, but it takes expertise, experience and experimentation to find the right mix of complexity and practicality. More complexity isn’t always better: The simpler your model can be while achieving the desired outcome, the better.
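To illustrate that experimentation loop, here is a hedged sketch using scikit-learn on synthetic data: compare a simple baseline against a more complex candidate and keep the simpler one unless the complex one clearly wins. The candidate models and the AUC metric are illustrative choices:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic stand-ins for the seed features and labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))
y = (X[:, 0] + 0.1 * rng.normal(size=1000) > 0).astype(int)

candidates = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "gradient_boosting": GradientBoostingClassifier(),
}
for name, model in candidates.items():
    # Cross-validated AUC: one number per candidate to compare on.
    auc = cross_val_score(model, X, y, cv=5, scoring="roc_auc").mean()
    print(f"{name}: mean AUC = {auc:.3f}")
# If the simple model matches the complex one, prefer it: it trains
# faster, is easier to debug, and can even run on-device.
```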

  • Inference

Once you have trained your model and have access to a user’s real-time state, you can feed that state into the model to get predictions. This requires an inference service that returns predictions in milliseconds. For simpler models, like logistic regression, inference can happen on device, eliminating latency entirely. For large, complex models, such as deep neural networks, it may be more practical for inference to happen in the cloud.
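As a sketch of why simple models suit on-device inference: scoring a logistic regression is just a dot product and a sigmoid, cheap enough to run in the browser on every pageview. The weights below are invented for illustration:

```python
import math

WEIGHTS = [1.8, -0.4, 0.9]  # illustrative coefficients exported from training
BIAS = -0.2

def predict(features: list) -> float:
    """Probability that the user belongs to the modelled segment."""
    z = BIAS + sum(w * x for w, x in zip(WEIGHTS, features))
    return 1.0 / (1.0 + math.exp(-z))  # sigmoid

# e.g. the normalised category vector from the feature-engineering sketch
print(round(predict([0.6, 0.1, 0.3]), 3))
```

A deep neural network, by contrast, would sit behind a cloud inference endpoint, trading a network round trip for greater expressive power.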

Sourcing the right seed data and building the appropriate model are fundamental steps. But scaling the model to millions of users and making real-time predictions is the hard part. And real-time predictions are crucial for publishers: If you get your predictions in a nightly batch job, you will always be one step behind. This also means you’ve lost your chance to address all those users who have one session and then never return to your site. Slow predictions aren’t so much an issue when you want to send personalised emails, but they have a direct impact on your revenue when you need to serve campaigns to users while they are on your site.

The opportunity ahead 

Privacy is becoming one of the most important macro trends in the industry, driven by both regulators and browser vendors. This creates a huge opportunity for publishers, who are in a unique position to let advertisers continue reaching their customers in a privacy-safe and scalable way across all platforms on the open web. To seize it, publishers need the right tools to create a robust and differentiated audience offering, and machine learning is central to that: they need access to the right modelling solutions for specific modelling problems, and the infrastructure to operate those models at web scale.

 

Want to know more? Speak to an expert
