MEDIA5K | OPEN AD FRAUD INTELLIGENCE

OPEN COUNTER AD FRAUD INTELLIGENCE KIT

100% transparent and open, MEDIA5K is administered by a non-profit organization and is actively built by a community of interdisciplinary researchers and technologists.

- a scored index covering 5,000 top .com sites (media5000 index)
- a way to export the data to a variety of formats
- a detailed explanation of how the scores were created and how to read them
- all the code that was used for creating the scores
- details on an open internet challenge for researchers and technologists

WHAT IS MEDIA5000 INDEX?

It is a vision of a community-driven, open and transparent scoring system that is managed, operated and audited to a standard of rigor beyond what is possible in centralized approaches. It is curated by a non-profit foundation and developed in close partnership with a growing network of people spanning over 4 continents, each and every one motivated by the same goal: cracking the website spam code. Nothing hurts ad fraud revenues more than understanding and mitigating the spam site problem.

TRAFFIC

- So: share of all traffic
- Ra: rank vs. actual traffic
- Sp: traffic normality
- Ip: unique vs. total traffic

SOCIAL

- Tg: twitter network graph
- Te: shannon entropy of tweets
- Tu: all sharing vs. unique sharing

WEBSITE

- Tr: website trust factors
- En: entropy of the landing page
- Qu: bounce rate vs. search visit
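
Two of the scores listed above (Te and En) are described as entropy measures. As a rough illustration of what Shannon entropy computes, here is a minimal Python sketch. It is not the project's actual scoring code: the function name `shannon_entropy` is hypothetical, and treating each character as a symbol is an assumption, since the page does not say whether characters, words or some other unit are used, nor how the raw value is mapped to a score.

```python
import math
from collections import Counter


def shannon_entropy(text: str) -> float:
    """Return the Shannon entropy of `text` in bits per character.

    Assumption: each character is one symbol. The real M5K scores may
    use a different unit (words, tokens, HTML tags) and normalization.
    """
    if not text:
        return 0.0
    counts = Counter(text)
    total = len(text)
    # H = sum over symbols of p * log2(1 / p), where p = count / total
    return sum((n / total) * math.log2(total / n) for n in counts.values())


print(shannon_entropy("aaaa"))  # one repeated symbol: 0.0 bits
print(shannon_entropy("abab"))  # two equally likely symbols: 1.0 bit
```

Highly repetitive text yields a value near zero, while varied text yields a higher value. How that relates to spam detection in the index is covered by the project's own documentation, not by this sketch.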

ID  Do                  So    Ra    Sp    Ip    Tg    Te    Tu    Tr    En    Qu
1   yahoo.com           7.60  4.50  4.60  1.40  0.00  4.90  9.90  9.40  0.00  5.40
2   answers.com         9.90  6.50  5.00  1.80  0.00  6.10  9.90  8.90  7.50  6.80
3   dailymotion.com     7.90  5.60  5.70  0.90  0.00  8.80  9.90  9.10  8.10  7.80
4   coolmath-games.com  6.20  7.10  0.00  0.30  7.60  6.10  0.50  8.30  0.00  6.80
5   spotify.com         3.30  5.80  4.50  2.60  0.00  7.20  9.90  8.90  7.50  6.50
6   topix.com           0.10  6.50  4.00  4.90  0.00  6.50  9.90  8.50  7.60  6.60
7   ebay.com            4.30  4.30  3.30  2.60  0.00  7.50  9.90  9.40  7.90  6.00
8   wikia.com           1.60  5.00  6.00  1.70  0.00  6.70  9.90  8.80  7.80  8.40
9   drudgereport.com    3.30  5.90  3.60  1.30  2.80  6.90  0.60  8.80  0.00  5.70
10  youtube.com         3.20  3.00  5.30  2.70  0.00  6.30  9.90  9.70  8.40  6.40
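
The kit promises a way to export the data to a variety of formats. As a hedged sketch of what that could look like for rows copied from the table above, the Python below parses whitespace-separated rows and writes them out as CSV and JSON. The helper names (`parse_rows`, `to_csv`, `to_json`) are hypothetical and not part of any MEDIA5K tool. The column names reuse the two-letter codes from the table header, with the first column called `ID` here as an assumption.

```python
import csv
import io
import json

# Column names: row id, domain, then the ten two-letter score codes.
COLUMNS = ["ID", "Do", "So", "Ra", "Sp", "Ip", "Tg", "Te", "Tu", "Tr", "En", "Qu"]

# Two rows copied from the table, used as sample input.
SAMPLE = """\
1 yahoo.com 7.60 4.50 4.60 1.40 0.00 4.90 9.90 9.40 0.00 5.40
2 answers.com 9.90 6.50 5.00 1.80 0.00 6.10 9.90 8.90 7.50 6.80
"""


def parse_rows(text):
    """Turn whitespace-separated lines into a list of dicts."""
    rows = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != len(COLUMNS):
            continue  # skip blank or malformed lines
        row = {"ID": int(parts[0]), "Do": parts[1]}
        for name, value in zip(COLUMNS[2:], parts[2:]):
            row[name] = float(value)
        rows.append(row)
    return rows


def to_csv(rows):
    """Serialize rows as CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=COLUMNS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


def to_json(rows):
    """Serialize rows as a JSON array of objects."""
    return json.dumps(rows, indent=2)


rows = parse_rows(SAMPLE)
print(to_csv(rows))
print(to_json(rows))
```

Keeping the rows as plain dictionaries means the same parsed data can feed either serializer, and further formats can be added without touching the parsing step.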


M5K CHALLENGE: BLOCKCHAIN COMES TO ADVERTISING

DEMO: WHAT IS THE DEMO ON THIS SITE FOR?

DEMO: WHY INCLUDE ONLY 5000 .COM SITES?

DEMO: HOW TO USE IT FOR COUNTER AD FRAUD INTELLIGENCE?

DEMO: READING THE FOUR TRAFFIC-BASED SCORES

DEMO: READING THE THREE SOCIAL-BASED SCORES

DEMO: READING THE THREE SITE-BASED SCORES

DEMO: WHAT DATA SOURCES WERE USED? 

DEMO: WHAT ARE THE KNOWN CAVEATS?

SCIENCE: WHAT STATISTICAL MODELS WERE USED?

SCIENCE: HOW DID WE MANAGE SCALE?

SCIENCE: HOW WERE THE CALCULATIONS MADE?

CHALLENGE: WHAT IS MEDIA5K?

CHALLENGE: WHAT QUALIFIES AS A WINNING SOLUTION?

CHALLENGE: WHY IS MEDIA5K A HARD PROBLEM?

CHALLENGE: HOW CAN I BECOME PART OF IT? 

CHALLENGE: WHAT IS THE PRIZE FOR WINNING? 

BECOME A CONTRIBUTOR

Learn more about how to contribute data or resources, or donate money, to botlab.io to support the continuous development of the MEDIA5K Challenge.

SHARE MEDIA5K

The only reason we volunteer to do this is awareness. Please help. 


THANK YOU!

Massive thanks go to those who contributed to the exchange dataset, which ended up growing into tens of billions of rows. Thanks to both A and R1, who tirelessly answer anything from pre-algebra to snap, crackle and pop. Thanks to Ruben Cuevas from Universidad Carlos III de Madrid, who together with his team has been an invaluable partner for botlab.io. Last but not least, I want to personally thank everyone who in any way contributed to the fact that, together with the botlab.io team and our partners, I have been able to work as a full-time volunteer on the MEDIA5K project since April 2014. I also want to thank my teachers, without whose wisdom and patience even simple things would be hard.

Mikko Kotila, researcher @ botlab.io