WhoTracks.me: Find out where you're being tracked on the web

Part two of our blog series about WhoTracks.me explains how you can easily check how many and which trackers are active on a particular website, using the most comprehensive transparency tool for online tracking on the web. You will also learn where the data powering WhoTracks.me comes from.

Björn Greif, Editor

In the first part of our blog series, we looked at the scope and level of detail of the tracker information available on WhoTracks.me. The website, which is currently in a pilot phase, also provides a similar level of detail on the tracking landscape of the 500 most visited websites.

The website overview page tells you that currently nearly 83 percent of all traffic to the top 500 websites contains tracking. On average, there are nine trackers on each website and 33 tracking requests per page spying on you.

If you click on one of the 500 listed websites, you will be taken to a profile page which displays

  • how much data trackers consume per page.
  • how many trackers were found on average per page.
  • how many trackers have been discovered and to which categories they belong.
  • which tracking methods were detected.
  • what share of all page loads (total traffic) contains tracking requests.
  • the number of tracking requests per page load.
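To make these per-site figures more concrete, here is a minimal Python sketch of how such metrics could be aggregated from individual page-load records. The `PageLoad` record shape, the `site_stats` helper, and the sample tracker names are hypothetical illustrations, not the actual WhoTracks.me data format or code.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class PageLoad:
    """One page load on a site (hypothetical record shape).

    Maps each tracker seen on the page to the number of tracking
    requests it made during that page load.
    """
    tracker_requests: Dict[str, int] = field(default_factory=dict)


def site_stats(page_loads: List[PageLoad]) -> Dict[str, float]:
    """Aggregate per-page-load records into site-level metrics."""
    n = len(page_loads)
    if n == 0:
        raise ValueError("need at least one page load")

    # Page loads that made at least one tracking request.
    with_tracking = sum(1 for p in page_loads if sum(p.tracker_requests.values()) > 0)
    # Average number of trackers present per page load.
    trackers_per_page = sum(len(p.tracker_requests) for p in page_loads) / n
    # Average number of tracking requests per page load.
    requests_per_page = sum(sum(p.tracker_requests.values()) for p in page_loads) / n
    # All trackers discovered across every page load on the site.
    distinct = set()
    for p in page_loads:
        distinct.update(p.tracker_requests)

    return {
        "tracking_share": with_tracking / n,
        "trackers_per_page": trackers_per_page,
        "requests_per_page": requests_per_page,
        "distinct_trackers": len(distinct),
    }


loads = [
    PageLoad({"google_analytics": 3, "doubleclick": 5}),
    PageLoad({"google_analytics": 2}),
    PageLoad({}),
    PageLoad({"facebook": 1, "doubleclick": 1}),
]
print(site_stats(loads))
# {'tracking_share': 0.75, 'trackers_per_page': 1.25, 'requests_per_page': 3.0, 'distinct_trackers': 3}
```

In this toy data, three of four page loads contain tracking, which is the kind of ratio behind a figure like "nearly 83 percent of all traffic contains tracking."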

A “Tracker Map” shows to which categories and companies the trackers found on a website belong. Below the tracker map, all recognized trackers are listed again individually and can be sorted by frequency or company (alphabetically). This way, you can easily find out which trackers are active on the site and to what extent. A click on an entry takes you to the corresponding profile page of the tracker.

Where does the data come from?

WhoTracks.me builds on the anti-tracking technology that powers Cliqz and Ghostery. Its database contains only anonymous statistical data from Cliqz users participating in the Human Web. For each page loaded in the browser (except in private tabs), Cliqz receives a message that describes the third-party requests required to load that page. The following measures ensure that Human Web data is irreversibly anonymized:

  • Any personally identifiable information (PII) is removed before the data is transferred. The page address is split into hostname and path, and both are obfuscated with a truncated hash. This means that Cliqz can only recover well-known hostnames and paths, while private pages remain undecipherable. Take twitter.com/username as an example: Cliqz can only work out twitter.com, because we already know that twitter.com is a popular website. But unless username is itself well known, Cliqz won’t be able to reverse it.
  • Third-party requests are aggregated at the subdomain level, all paths are removed (since they may contain PII), and Cliqz sends only counters of signals. For example, Cliqz would send only the number of requests to a domain that contained a cookie header.
  • Each page load is sent as an independent message via a proxy network, which obfuscates the sender’s IP address. This prevents the server from, for example, linking page loads back together by looking at the sender’s IP address. Thus, subsequent deanonymization is impossible.
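The first two measures can be illustrated with a minimal Python sketch of how a client might assemble one page-load message. The hash function (SHA-256 truncated to 16 hex characters), the message layout, the counter names, and all function names are assumptions for illustration, not Cliqz’s actual implementation; the proxy network from the third measure is not modeled.

```python
from __future__ import annotations

import hashlib
from collections import defaultdict
from urllib.parse import urlsplit

# Assumption: SHA-256 truncated to 16 hex characters. The real hash
# function and truncation length used by Cliqz are not stated here.
HASH_CHARS = 16


def truncated_hash(value: str) -> str:
    return hashlib.sha256(value.encode("utf-8")).hexdigest()[:HASH_CHARS]


def obfuscate_page(url: str) -> tuple[str, str]:
    """Split the page address into hostname and path, then hash both."""
    parts = urlsplit(url)
    return truncated_hash(parts.hostname or ""), truncated_hash(parts.path)


def reveal(hashed: str, known_values: list[str]) -> str | None:
    """Server side: a hash can only be 'reversed' by hashing values
    that are already known (e.g. popular hostnames) and comparing."""
    lookup = {truncated_hash(v): v for v in known_values}
    return lookup.get(hashed)


def aggregate_third_parties(requests: list[dict]) -> dict[str, dict[str, int]]:
    """Count signals per third-party subdomain; paths and queries are dropped."""
    counters: dict[str, dict[str, int]] = defaultdict(
        lambda: {"requests": 0, "with_cookie": 0}
    )
    for req in requests:
        host = urlsplit(req["url"]).hostname or ""  # keep the subdomain only
        counters[host]["requests"] += 1
        if req.get("has_cookie"):
            counters[host]["with_cookie"] += 1
    return dict(counters)


def build_message(page_url: str, third_party_requests: list[dict]) -> dict:
    """One independent message per page load (proxy transport not modeled)."""
    host_hash, path_hash = obfuscate_page(page_url)
    return {
        "host": host_hash,
        "path": path_hash,
        "third_parties": aggregate_third_parties(third_party_requests),
    }


msg = build_message(
    "https://twitter.com/username",
    [
        {"url": "https://platform.twitter.com/widgets.js?id=42", "has_cookie": True},
        {"url": "https://platform.twitter.com/impressions", "has_cookie": False},
        {"url": "https://www.google-analytics.com/collect?cid=123", "has_cookie": False},
    ],
)
print(reveal(msg["host"], ["twitter.com", "example.org"]))  # twitter.com
print(reveal(msg["path"], ["/home", "/login"]))  # None
```

The `reveal` helper shows why only well-known values can be recovered: the server can match the hashed hostname against a list of popular sites, but the hashed `/username` path stays opaque unless that exact path is already on such a list.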

The data collection was audited by external researchers in April 2017. They found some theoretical attacks that could link messages together, affecting a small subset of messages. These issues were subsequently fixed, closing this attack vector.

The Human Web data is primarily used to automatically generate the list of tracking domains that Cliqz anti-tracking acts on. As a side effect, WhoTracks.me can also use this data to generate its census of trackers across the web.

Full transparency thanks to open source

New “Tracking the Trackers” studies will be published on WhoTracks.me, which will highlight interesting anomalies discovered in the tracker data.

The tracker and website data shown on WhoTracks.me, as well as the code that generates the site itself, is already open source on GitHub. An API to make loading the data easier will be provided soon.

In the medium term, you will be able to help maintain the database by reporting trackers. Help with visualizing the data is also welcome. This way, you can actively contribute to the most comprehensive transparency tool for online tracking on the web, which is available free of charge to all Internet users worldwide.