Audience data, both 1st- and 3rd-party, has so far not been discussed as a concern in the fight against ad fraud. As a result, there are now serious misconceptions about what might be effective as a method for reducing exposure to ad fraud.
Recently there have been suggestions in the adtech industry that buying media directly is a solution for reducing, or even removing, exposure to ad fraud. These claims are not founded on facts, and are easy to show to be misleading at best and dangerous at worst. To understand why such claims are unfounded, further investigation of the topic is required.
Now that 1st- or 3rd-party data is used together with media buys, an increasingly popular practice, all such buys become vulnerable to fraud. This applies to smart-TV media buys as well, and smart-TV botnets are something the industry should be talking about much more. Instead, there seems to be an assumption that with TV and more direct buying things would somehow be better.
The claim this article makes, and provides evidence and reasoning for, is that as long as a direct buy is in any way associated with 1st- or 3rd-party data, the associated transactions will be vulnerable to fraud.
To understand this claim better, as a starting point, we have identified four principal modes of direct buying. The buy is direct in all four cases; the modes differ in whether the audience extension technique is used and whether 1st- or 3rd-party data is used.
The four resulting variations are:
- Direct buy, no audience extension or data used (A1)
- Direct buy, no audience extension used, data used (A2)
- Direct buy, audience extension used, no data used (B1)
- Direct buy, audience extension used, data used (B2)
Case A1 is the only one of the four where it could be argued that, in some cases, well-known vulnerabilities with regard to fraud are eliminated or at the very least significantly reduced. Even this becomes questionable, as it is increasingly hard to distinguish genuine premium major sites from spammy major sites.
As more spam sites gain legitimacy, and increasingly populate global and local top-500 website rankings, it will become essential to develop a deep understanding of the ‘major site’ in question. It is no longer enough to trust the name or appearance of a publisher. A great example of this phenomenon is the countless buzzfeed.com clone sites, some of which have massive audiences.
In the case of A2, we can conclude that as long as 1st or 3rd-party data is used in association with executing media buys, those transactions become vulnerable to fraud. This will be explained in some detail below.
The key point here is that in the current programmatic infrastructure, both 1st- and 3rd-party data are vectors that can be used to effectively expose any audience-targeted media transaction to fraud, on virtually any website.
In both B1 and B2, the fact that audience extension is used creates two risks. One is that there is no established baseline for exposure levels on major publishers, which means that it cannot be confirmed with confidence whether a given major site is one of the major sites that have lower levels of ad fraud. The other is that it is not even clear whether major sites actually have lower levels of ad fraud than other legit sites. As spam sites grow bigger (and some have grown very big), ‘major’ increasingly means a website that might not have any employees listed on LinkedIn. Major does not always mean the Times.
Further, B2 also suffers from the same vulnerability as A2, where the inventory buy is poisoned using data.
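The four modes and the exposures attributed to them above can be summarised in a small sketch. It is only an illustration of the reasoning, not a risk model: the names `DirectBuy`, `mode` and `exposures` are hypothetical, and the exposure labels are shorthand for the arguments made in this article.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DirectBuy:
    audience_extension: bool  # inventory may be bought on sites other than the publisher's own
    data_used: bool           # 1st- or 3rd-party data drives the targeting


def mode(buy: DirectBuy) -> str:
    """Return the article's label (A1, A2, B1 or B2) for a direct buy."""
    letter = "B" if buy.audience_extension else "A"
    digit = "2" if buy.data_used else "1"
    return letter + digit


def exposures(buy: DirectBuy) -> list[str]:
    """List the exposures this article attributes to the buy (shorthand labels)."""
    # Every mode, including A1, depends on the 'major site' being genuine.
    risks = ["site quality"]
    if buy.audience_extension:
        risks.append("unknown sites")
    if buy.data_used:
        risks.append("data poisoning")
    return risks


for ext in (False, True):
    for data in (False, True):
        buy = DirectBuy(audience_extension=ext, data_used=data)
        print(mode(buy), exposures(buy))
```

Only A1 is left with the site-quality question alone; every other mode adds at least one exposure that has nothing to do with the site named in the buy.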
The Three Stages of Evolution in Ad Fraud
To better understand the arguments in this article, we have to realise that not all fraud is connected to the website. Because not all fraud is connected to the website, buying direct (from a specific website) does not provide the protection major publishers would like buyers to think it does. Later, the same will apply to TV.
There are three principal modes of fraud:
- fraud where a website is used as a front
- fraud where a platform (e.g. YouTube) is used as a front
- fraud where data is used as the front
While for the first 20 years fraud was mostly focused on the mode where a website is used as a front, it is becoming increasingly popular to use a 3rd-party platform, or data, as the front instead. Data fraud is the key to understanding how TV-related fraud will play out, eventually even with respect to the biggest and most popular TV shows.
Audience Extension, the Dirty Little Secret of Publishers
Audience extension is the practice where a publisher has an audience (cookies) and uses those cookies to buy media on websites other than its own. Part of this opaque practice is not making it evident to the buyer that the seller (the publisher) does not know on which sites inventory is being bought.
In addition, the publisher is at this stage unlikely to be sophisticated enough to keep their own audience data clean, creating another threat. Now when they buy against that data (or sell their own inventory associated with it), the data effectively “poisons” the otherwise possibly legit inventory.
To state it simply, audience extension is the publisher variation of retargeting and is widely used by major publishers as a way to make money.
While something similar had been practiced from the beginning of online ad networks, it was with Right Media in 2005 that the approach became scalable. Before 2005 it was only possible to target the available inventory within a given ad network (run of network, RON) on one given platform. In run-of-network, the entire ad network is monitored for inventory that matches a given cookie; in other words, for pages within the sites in the ad network where browsers associated with certain cookies land.
What Right Media introduced to larger audiences was the idea of “run of exchange” (ROE), a bigger and badder version of RON. Now in 2013 DSPs aggregate exchanges, allowing buyers to execute buys against “run of exchanges” targeting. It’s the ultimate spray and pray tactic of the programmatic era. It looks for matches with a given cookie across up to 100 exchanges, in some cases covering a substantial part of the web traffic of any given user at any given time. It’s much better than email spamming ever was.
According to a reliable source in the adtech industry, run-of-exchanges buys are very common. They remain the accepted best practice for campaigns where 1st- or 3rd-party data is used for targeting, such as retargeting or audience buy campaigns.
The premise of run-of-exchanges targeting is to say something along the lines of “I want to buy this audience I have on any website where they go”.
Many major publishers use audience extension, and buy run-of-exchanges inventory based on their own audience. This means neither the buyer nor the seller has certainty about which sites will be bought. For this reason alone, whenever there is no 100% certainty about audience extension NOT BEING USED, the argument that direct buys are a solution for fraud breaks down completely and is left without merit, unless some of the basic building blocks of the current system are changed.
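The “any website where they go” premise can be made concrete with a minimal sketch. It assumes a greatly simplified bid request with only three fields. `BidRequest`, `run_of_exchanges`, the exchange names and the `.example` domains are all hypothetical and do not correspond to any real DSP or exchange API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BidRequest:
    exchange: str
    site: str
    cookie_id: str


def run_of_exchanges(audience: set[str], requests: list[BidRequest]) -> list[BidRequest]:
    """Buy every impression whose cookie is in the audience, on any site."""
    # Neither the site nor the exchange is consulted: the cookie alone decides.
    return [r for r in requests if r.cookie_id in audience]


# Hypothetical audience and bid stream, for illustration only.
audience = {"c1", "c2"}
requests = [
    BidRequest("exchange-1", "major-publisher.example", "c1"),
    BidRequest("exchange-2", "unknown-site.example", "c1"),
    BidRequest("exchange-3", "unknown-site.example", "c9"),
    BidRequest("exchange-3", "another-site.example", "c2"),
]

bought = run_of_exchanges(audience, requests)
sites = sorted({r.site for r in bought})
print(sites)
```

In this sketch the list of sites that end up being bought is a by-product of where the cookies happen to appear, which is why neither party can state it in advance.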
Getting Our Hands Dirty with Data Fraud
Many of the vulnerabilities in adtech are not hard to exploit. This is to say that it’s too easy to compromise programmatic ad trading systems in various ways. Data fraud is no different, and is not difficult at all. Actually, it’s easier to execute than website or platform fraud. It’s also much harder for researchers to catch someone who is engaging in data-fraud-related spamming activities. The fact that NOBODY is talking about this vulnerability makes it very dangerous. There is absolutely no way to patch against this with the available tools and know-how when the spammer takes the required precautions.
The fact that it is now roughly laid out here in this article perhaps suggests that something should be done to make it harder to do. So what is done, and just how does it work? Below, a very simple outline (for obvious reasons) is provided.
1 Create a botnet that is just like any other botnet, capable of sending fake traffic to websites and capable of controlling the information bots reveal about themselves to log files and other endpoints. It’s very easy to set up a botnet or rent access to one.
2 Send the bots to visit major publisher sites, and by doing so make your bots become part of what publishers consider 1st-party data and use for audience extension purposes. The bots should appear in log files roughly like ordinary web users.
3 Send the bots to visit major advertiser sites, and by doing so make your bots become part of what advertisers consider 1st-party data and use for audience targeting purposes. This leaves the advertisers’ data poisoned.
4 Make your data available in data market places, and by doing so allow yourself to make money from all your bots in the fifth and final step in this outline. This will allow the actual DIRECT_BUY_POISONING attack to take place.
5 Have your bots frequently visit the sites with the most liquidity, and by doing so allow yourself to make money from campaigns where 3rd-party data is used, on any website you’ve set your bots to keep visiting. Black hat mission successful.
In case you are wondering, there are multiple ways to “make your data available in data market places”. These include websites that you control, social sharing buttons and other widgets you control for 3rd-party websites and dropping cookies on ad impressions that you control.
In the scenario where audience extension is in use by the publisher, your bot data will be considered a valid audience and will be bought on the websites you are sending the bots to. You will not make money from this, but the buyer is getting bots on spam sites, even though they think they are buying humans on a major site.
The above spam site scenario was a common problem in the early days of the Right Media exchange, where it was not uncommon for large advertising brands that had done a direct buy with one of the seat holders to have their ads on porn sites instead. In some cases, sites even worse than porn.
The scenario where the money is being made is when a buyer uses a 3rd-party audience as the basis for executing direct buys. Because your data is available in the market places, and the same bots are frequently visiting common websites, there will be many opportunities for the algorithms to decide on buying your spam.
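A back-of-the-envelope sketch shows why frequent visits matter from the buyer's side. It assumes that every matched impression is equally likely to be bought, which is a simplification. The function name `bot_impression_share` and the example figures are hypothetical, chosen only for illustration and not measured anywhere.

```python
def bot_impression_share(bot_cookie_share: float, visit_ratio: float) -> float:
    """Estimate the share of matched impressions that land on bots.

    bot_cookie_share: fraction of cookies in the audience segment that are bots (0..1)
    visit_ratio: impressions generated per bot cookie for every one per human cookie
    """
    if not 0.0 <= bot_cookie_share <= 1.0:
        raise ValueError("bot_cookie_share must be between 0 and 1")
    if visit_ratio < 0:
        raise ValueError("visit_ratio must not be negative")
    bot_impressions = bot_cookie_share * visit_ratio
    human_impressions = 1.0 - bot_cookie_share
    total = bot_impressions + human_impressions
    return bot_impressions / total if total else 0.0


# Hypothetical figures: 5% of the segment is bots, each ten times as active as a human.
share = bot_impression_share(0.05, 10)
print(f"{share:.1%} of matched impressions go to bots")
```

Under these assumed figures, a segment that is 95% human by cookie count still sends roughly a third of its matched impressions to bots, simply because the bots show up more often.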