This page describes how Cliqzbot currently uses the robots.txt web-crawler control directives and indexing directives.

What is Cliqzbot?

Cliqzbot is the web crawler of Cliqz, a start-up company based in Munich, Germany, and majority-owned by the renowned media group Hubert Burda Media. Cliqz offers a new kind of user experience by searching directly in the browser. The company operates its own self-developed search technology. Cliqzbot is the automated mechanism that fetches web pages and processes them for inclusion in Cliqz’s website index.

You don’t want Cliqz to access your web content? Here’s how to block web crawlers

Website owners can instruct Cliqzbot, or any other internet bot, on how to crawl a website by using a robots.txt file. You can restrict Cliqzbot from visiting your website in one of the ways listed below.

Protect server directories using passwords

The simplest and most effective way to block URLs from appearing on Cliqz is to store them in a password-protected directory on your web server. Neither Cliqzbot nor any other web bot can access content in password-protected directories.

Robots.txt

The first thing a well-behaved internet bot looks at when visiting a website is the robots.txt file. The robots.txt file defines how a search engine spider like Cliqzbot should interact with the pages and files of your website. You can use a robots.txt file to restrict access to all or part of the website. Please note that the instructions in robots.txt are only directives that bots follow voluntarily, so we recommend that you also use the first blocking method (password protection) on your server. A general robots.txt file might look like this:

User-agent: Cliqzbot
Disallow: [the URL path you want to block]
Allow: [the URL path of a subdirectory, within a blocked parent directory, that you want to unblock]
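
Before publishing a robots.txt file, you may want to check locally what it would permit. The sketch below uses Python's standard urllib.robotparser module to do so. The rules, the example.com URLs and the helper name can_cliqzbot_fetch are made up for illustration, and the result shows how Python's parser reads the file, which is not necessarily identical to how Cliqzbot or any other crawler reads it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: block /private/ for Cliqzbot, but unblock one subdirectory.
# The Allow line is listed first because urllib.robotparser applies the first
# rule that matches; other parsers may prefer the most specific path instead.
ROBOTS_TXT = """\
User-agent: Cliqzbot
Allow: /private/press/
Disallow: /private/
"""


def can_cliqzbot_fetch(robots_txt: str, url: str) -> bool:
    """Return True if the given rules let the user agent 'Cliqzbot' fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch("Cliqzbot", url)


if __name__ == "__main__":
    for url in (
        "https://example.com/index.html",
        "https://example.com/private/report.html",
        "https://example.com/private/press/release.html",
    ):
        verdict = "allowed" if can_cliqzbot_fetch(ROBOTS_TXT, url) else "blocked"
        print(f"{url}: {verdict}")
```

Listing the Allow line before the Disallow line keeps the outcome the same for parsers that stop at the first match and for parsers that pick the longest matching path.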

URL blocking commands to use in robots.txt file

# The entire site with a forward slash (/)
User-agent: Cliqzbot
Disallow: /

# A directory and its contents by following the directory name with a forward slash
User-agent: Cliqzbot
Disallow: /example-directory/

# A webpage by listing the page after the slash
User-agent: Cliqzbot
Disallow: /private_file.html

# Files of a specific file type (for example, .jpeg)
User-agent: Cliqzbot
Disallow: /*.jpeg$

# Any other scenario
User-agent: Cliqzbot
Disallow: /private_file.html
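
To see which URL paths a Disallow pattern would cover, a small matcher can help. The sketch below is only an illustration and assumes the widely used wildcard convention in which * matches any run of characters and a trailing $ marks the end of the URL; the function names rule_to_regex and is_blocked are hypothetical, and this is not Cliqzbot's actual implementation. Python's standard urllib.robotparser matches by path prefix only, so the wildcard case is handled by hand here.

```python
import re


def rule_to_regex(path_pattern: str) -> "re.Pattern[str]":
    """Translate a robots.txt path pattern into a compiled regular expression.

    Assumption: '*' matches any run of characters and a trailing '$'
    anchors the pattern to the end of the URL path.
    """
    anchored = path_pattern.endswith("$")
    body = path_pattern[:-1] if anchored else path_pattern
    # Escape the literal pieces and join them with '.*' where '*' appeared.
    regex = ".*".join(re.escape(part) for part in body.split("*"))
    return re.compile(regex + ("$" if anchored else ""))


def is_blocked(path_pattern: str, url_path: str) -> bool:
    """Return True if url_path falls under the Disallow pattern."""
    # match() anchors at the start, so plain patterns behave as prefixes.
    return rule_to_regex(path_pattern).match(url_path) is not None


if __name__ == "__main__":
    checks = [
        ("/", "/any/page.html"),
        ("/example-directory/", "/example-directory/page.html"),
        ("/private_file.html", "/private_file.html"),
        ("/*.jpeg$", "/images/photo.jpeg"),
        ("/*.jpeg$", "/images/photo.png"),
    ]
    for pattern, path in checks:
        verdict = "blocked" if is_blocked(pattern, path) else "not blocked"
        print(f"Disallow: {pattern}  {path}  {verdict}")
```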

If you have any issues to report or you want Cliqzbot to fetch/re-fetch your website, you can drop us a note at cliqzbot@cliqz.com. We will be happy to help!