
Publisher transformation: Machine learning experts

This article explores what is required to operationalise modelling to scale data and increase audience reach.


As regulators and browsers increasingly focus on privacy practices in digital advertising, advertisers can only apply audience targeting to 30% of the open web. Equipped with the right tools and expertise in machine-learned modelling, publishers can fill the growing data gap and ensure advertisers don’t miss out on valuable audiences while growing revenue from advertising.

Publishers are becoming the new generation of data providers for audience targeting. This recognition goes beyond publishers’ ability to collect data in a privacy-compliant way and to recognise all users to create endemic and non-endemic audiences. It’s about their ability to model out niche and hard-to-scale datasets, including high-quality, self-declared ones. 

In this article, the second in a two-part series, we’re exploring what is required to operationalise modelling at web scale. See part one for three types of modelling that publishers can leverage to build a scaled data offering.

Taking this a step further and operationalising these models – making them scale to millions of users in real time – requires machine learning expertise and the right system design. 

Operationalising Models

In the new era of digital advertising, publishers have to become experts in machine learning. This is a complex task that requires not only data science expertise but also the technical infrastructure to operationalise models and make them scale to millions of users in real time. Generally, there are four components to this:

  • Seed Data

The most important part of any ML effort is access to the right seed data. A model can only be as good as the data you feed into it. For publishers, this can be interest data collected from user interactions, declared data from registered users or surveys, or declared data supplied by data partners. Generally, the larger the seed dataset, the better the model can be trained. However, you don’t want to compromise quality for quantity: feed the model loads of low-quality data and you shouldn’t expect great results.
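To make this concrete, here is a minimal sketch of how a seed dataset might be assembled, with declared data supplying the labels and on-site interactions supplying the features. It is illustrative only; the identifiers, field names and Python structures are assumptions rather than a description of any particular platform.

```python
# Illustrative sketch: build a labelled seed dataset by joining declared data
# (labels) with behavioural events (features). All field names are made up.
from collections import defaultdict

declared = [                       # e.g. survey or registration responses
    {"user_id": "u1", "interest": "automotive"},
    {"user_id": "u2", "interest": "travel"},
]
events = [                         # e.g. on-site page views
    {"user_id": "u1", "page_category": "suv-reviews"},
    {"user_id": "u2", "page_category": "city-breaks"},
    {"user_id": "u3", "page_category": "suv-reviews"},
]

# Declared data provides the labels; behavioural events become the features.
labels = {d["user_id"]: d["interest"] for d in declared}
features = defaultdict(list)
for e in events:
    features[e["user_id"]].append(e["page_category"])

seed = [
    {"user_id": uid, "features": features.get(uid, []), "label": label}
    for uid, label in labels.items()
]
print(seed)  # only users with a trusted label end up in the training set
```

Even in this toy example the point about quality holds: user u3 has behavioural data but no trusted label, so they are left out of the seed set, and the quality of the declared source directly bounds the quality of anything the model later predicts.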

  • Feature Engineering

If you want to maximise scale and quality, you will have to make predictions in real time. This requires you to think about feature engineering, meaning the aggregation of raw events into a format (a user state) that a model trains against and that is then used to make predictions. This step can be the hardest part: As a publisher, you need to update millions of user states in real time, so that you can target users before they leave your site (and possibly never return). Edge computing can help with this: Moving the feature engineering to the device removes network latency and is highly scalable.
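As a simplified illustration of that idea, the sketch below folds raw page-view events incrementally into a compact user state that is ready for scoring at any moment. The event shape and feature names are assumptions for the example, not a description of any specific system.

```python
# Illustrative sketch: incremental feature engineering. Each raw event is
# folded into an aggregated "user state", which can then be flattened into
# the feature format a model trains against and predicts from.
from collections import Counter
from dataclasses import dataclass, field

@dataclass
class UserState:
    pageviews: int = 0
    category_counts: Counter = field(default_factory=Counter)

    def update(self, event: dict) -> None:
        # Fold one raw event into the aggregate; cheap enough to run per event.
        self.pageviews += 1
        self.category_counts[event["category"]] += 1

    def to_features(self) -> dict:
        # Flatten the state into model features, e.g. share of views per category.
        total = max(self.pageviews, 1)
        return {f"share_{c}": n / total for c, n in self.category_counts.items()}

state = UserState()
state.update({"category": "sport"})
state.update({"category": "finance"})
print(state.to_features())  # {'share_sport': 0.5, 'share_finance': 0.5}
```

Because each update touches only one user’s state, the same logic can run on the device itself, which is what makes the edge approach both low-latency and easy to scale.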

  • Model Selection

It’s critical to select the most appropriate model for your given use case. This requires an understanding of the problem at hand and a pragmatic decision of how complex your model needs to be. Luckily, there are a lot of algorithms to pick from, but it requires expertise, experience and experimentation to find the right mix of complexity and practicability. More complexity isn’t always better: The simpler your model can be to achieve a desired outcome, the better.
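As a rough sketch of that experimentation, the example below compares a simple and a more complex candidate on held-out data and keeps the simpler one unless the extra complexity clearly pays for itself. The dataset is synthetic and stands in for real audience features; scikit-learn is used purely for illustration.

```python
# Illustrative sketch: pragmatic model selection. Start with the simplest
# candidate and only accept complexity that measurably improves results.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for a labelled seed dataset.
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)

candidates = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "gradient_boosting": GradientBoostingClassifier(),
}
for name, model in candidates.items():
    auc = cross_val_score(model, X, y, cv=5, scoring="roc_auc").mean()
    print(f"{name}: mean AUC = {auc:.3f}")
# If the simpler model scores within tolerance of the complex one, prefer it:
# it is cheaper to run, easier to explain and simple enough to score on device.
```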

  • Inference

Once you have trained your model and you have access to a user’s real-time state, you can feed that into the model to get predictions. This requires an inference service that returns predictions in milliseconds. For simpler models, like logistic regression, the inference could happen on device, eliminating any latency. For complex and large models using a neural network for deep learning, it might be more practical for the inference to happen in the cloud.
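As an illustration of why simple models suit on-device inference, the sketch below scores a user state with exported logistic-regression weights: the whole prediction is a weighted sum and a sigmoid, with no network round trip. The weights and feature names are invented for the example.

```python
# Illustrative sketch: on-device inference with a trained logistic regression.
# The model reduces to a weighted sum plus a sigmoid, so a prediction takes
# microseconds and needs no server call. Weights here are made up.
import math

WEIGHTS = {"share_sport": 1.8, "share_finance": -0.4}  # exported from training
BIAS = -0.9

def predict(features: dict) -> float:
    """Probability that the user belongs to the modelled audience segment."""
    z = BIAS + sum(WEIGHTS.get(name, 0.0) * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))

print(predict({"share_sport": 0.5, "share_finance": 0.5}))  # ~0.45
```

For a large neural network, the same prediction would instead come from a cloud inference service, trading a network round trip for the extra model capacity.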

Sourcing the right seed data and building the appropriate model are fundamental steps. But scaling the model to millions of users and making real-time predictions is the hard part. And real-time predictions are crucial for publishers: If you get your predictions in a nightly batch job, you will always be one step behind. This also means you’ve lost your chance to address all those users who have one session and then never return to your site. Slow predictions aren’t so much an issue when you want to send personalised emails, but they have a direct impact on your revenue when you need to serve campaigns to users while they are on your site.

The opportunity ahead 

Privacy is becoming one of the most important macro trends in the industry, driven by both regulators and browser vendors. This creates a huge opportunity for publishers, as they are in a unique position to allow advertisers to continue reaching their customers, and to do so in a privacy-safe and scalable way across all platforms on the open web. However, publishers need the right tools to create a robust and differentiated audience offering, and machine learning is an important tool for achieving that. Publishers need access to the right modelling solutions for specific modelling problems, and the infrastructure to run those models at web scale.

 

