
Publisher transformation: Machine learning experts

This article explores what is required to operationalise modelling to scale data and increase audience reach.


As regulators and browsers increasingly focus on privacy practices in digital advertising, advertisers can apply audience targeting to only 30% of the open web. Equipped with the right tools and expertise in machine learning modelling, publishers can fill the growing data gap, ensuring advertisers don’t miss out on valuable audiences while publishers grow their advertising revenue.

Publishers are becoming the new generation of data providers for audience targeting. This role goes beyond their ability to collect data in a privacy-compliant way and to recognise all users in order to create endemic and non-endemic audiences: it’s about their ability to model out niche and hard-to-scale datasets, including high-quality, self-declared ones.

In this article, the second in a two-part series, we’re exploring what is required to operationalise modelling at web scale. See part one for three types of modelling that publishers can leverage to build a scaled data offering.

Taking this a step further and operationalising these models – making them scale to millions of users in real time – requires machine learning expertise and the right system design. 

Operationalising models

In the new era of digital advertising, publishers have to become experts in machine learning. This is a complex task that requires not only data science expertise but also the technical infrastructure to operationalise models and serve predictions to millions of users in real time. Generally, there are four components to this:

  • Seed Data

The most important part of any ML effort is access to the right seed data: a model can only be as good as the data you feed into it. For publishers, this can be interest data collected from user interactions, declared data from registered users or surveys, or declared data supplied by data partners. Generally, the larger the seed dataset, the better the model can be trained. However, don’t compromise quality for quantity: if you use loads of low-quality data, don’t expect great results.
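To make this concrete, here is a minimal sketch of assembling a seed set; the file names, column schema and activity threshold are all illustrative assumptions, not a prescribed setup:

```python
# A minimal sketch of assembling a seed dataset (illustrative schema).
import pandas as pd

# Raw interaction events: one row per pageview or click.
events = pd.read_parquet("events.parquet")      # user_id, category, ts
# Self-declared labels, e.g. from registrations or surveys.
declared = pd.read_parquet("declared.parquet")  # user_id, interest, source

# Favour quality over quantity: keep only labels from trusted sources,
# and only users with enough behavioural history to learn from.
trusted = declared[declared["source"].isin(["registration", "survey"])]
event_counts = events.groupby("user_id").size()
active_users = event_counts[event_counts >= 5].index

seed = trusted[trusted["user_id"].isin(active_users)]
print(f"Seed set: {len(seed)} labelled users")
```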

  • Feature Engineering

If you want to maximise scale and quality, you will have to make predictions in real time. This requires feature engineering: aggregating raw events into a format (the user state) that a model trains against and then uses to make predictions. This step can be the hardest part: As a publisher, you need to update millions of user states in real time, enabling you to target users before they leave your site (and may never return). Edge computing can help with this: Moving the feature engineering to the device removes any latency and is highly scalable.
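As an illustration, here is a sketch of that aggregation step in Python; on-device, the same logic could run as JavaScript in the browser. The event fields and category vocabulary are assumptions for the example:

```python
import time
from collections import Counter, defaultdict

class UserState:
    """Features for one user, updated incrementally on every raw event
    rather than recomputed in a nightly batch."""

    def __init__(self):
        self.category_counts = Counter()  # pageviews per content category
        self.last_seen = 0.0

    def update(self, event: dict) -> None:
        self.category_counts[event["category"]] += 1
        self.last_seen = event.get("ts", time.time())

    def features(self, vocab: list) -> list:
        # Fixed-length vector the model trains against and predicts from.
        total = max(sum(self.category_counts.values()), 1)
        return [self.category_counts[c] / total for c in vocab]

VOCAB = ["sport", "finance", "travel"]  # illustrative categories
states = defaultdict(UserState)

def on_event(event: dict) -> list:
    # Called once per event, so the user state is always prediction-ready
    # while the user is still on the site.
    state = states[event["user_id"]]
    state.update(event)
    return state.features(VOCAB)

print(on_event({"user_id": "u1", "category": "sport"}))  # [1.0, 0.0, 0.0]
```

Because the state is updated per event rather than per batch, a prediction made on the very next pageview already reflects everything the user has done in the current session.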

  • Model Selection

It’s critical to select the most appropriate model for your use case. This requires an understanding of the problem at hand and a pragmatic decision about how complex the model needs to be. Luckily, there are plenty of algorithms to pick from, but it takes expertise, experience and experimentation to find the right mix of complexity and practicality. More complexity isn’t always better: The simpler your model can be while achieving the desired outcome, the better.
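To illustrate that experimentation loop, here is a hedged sketch using scikit-learn on synthetic data: compare a simple baseline against a more complex candidate and keep the simpler one unless the complex one clearly wins. The candidate models and the AUC metric are illustrative choices:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic stand-ins for the seed features and labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))
y = (X[:, 0] + 0.1 * rng.normal(size=1000) > 0).astype(int)

candidates = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "gradient_boosting": GradientBoostingClassifier(),
}
for name, model in candidates.items():
    # Cross-validated AUC: one number per candidate to compare on.
    auc = cross_val_score(model, X, y, cv=5, scoring="roc_auc").mean()
    print(f"{name}: mean AUC = {auc:.3f}")
# If the simple model matches the complex one, prefer it: it trains
# faster, is easier to debug, and can even run on-device.
```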

  • Inference

Once you have trained your model and have access to a user’s real-time state, you can feed that state into the model to get predictions. This requires an inference service that returns predictions in milliseconds. For simpler models, like logistic regression, inference can happen on device, eliminating latency entirely. For large, complex models, such as deep neural networks, it may be more practical for inference to happen in the cloud.
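As a sketch of why simple models suit on-device inference: scoring a logistic regression is just a dot product and a sigmoid, cheap enough to run in the browser on every pageview. The weights below are invented for illustration:

```python
import math

WEIGHTS = [1.8, -0.4, 0.9]  # illustrative coefficients exported from training
BIAS = -0.2

def predict(features: list) -> float:
    """Probability that the user belongs to the modelled segment."""
    z = BIAS + sum(w * x for w, x in zip(WEIGHTS, features))
    return 1.0 / (1.0 + math.exp(-z))  # sigmoid

# e.g. the normalised category vector from the feature-engineering sketch
print(round(predict([0.6, 0.1, 0.3]), 3))
```

A deep neural network, by contrast, would sit behind a cloud inference endpoint, trading a network round trip for greater expressive power.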

Sourcing the right seed data and building the appropriate model are fundamental steps. But scaling the model to millions of users and making real-time predictions is the hard part. And real-time predictions are crucial for publishers: If you get your predictions in a nightly batch job, you will always be one step behind. This also means you’ve lost your chance to address all those users who have one session and then never return to your site. Slow predictions aren’t so much an issue when you want to send personalised emails, but they have a direct impact on your revenue when you need to serve campaigns to users while they are on your site.

The opportunity ahead 

Privacy is becoming one of the most important macro trends in the industry, driven by both regulators and browser vendors. This creates a huge opportunity for publishers, who are in a unique position to let advertisers continue reaching their customers in a privacy-safe and scalable way across all platforms on the open web. To seize it, publishers need the right tools to create a robust and differentiated audience offering, and machine learning is central to that: they need access to the right modelling solutions for specific modelling problems, and the infrastructure to operate those models at web scale.

 

Want to know more? Speak to an expert
