How to Set a Robots.txt File to Prevent Crawling of a Subdomain

Fajar Nindyo
Previously I had bought an expired domain to develop as a PBN and tried to monetize the site directly.

But an obstacle arose: the expired domain has a subdomain containing many articles, and to this day its pages still appear in Google search results even though clicking any of the URLs shows no content. I suspect this URL error on the subdomain is the reason my Google AdSense application was rejected, so I have to fix the problem before resubmitting.
Some blogging experts say that to restrict Google from crawling a subdomain, you need to modify the robots.txt file on that subdomain (usually configured through cPanel hosting). But what if we only use free hosting such as Blogger, where we have previously set up a custom domain?

The alternative is to register the subdomain with a free hosting service and upload a modified robots.txt file to its root folder so that web crawlers avoid crawling all content on the subdomain.
Next, check whether a robots.txt file already exists on the subdomain via the URL https://www.google.com/webmasters/tools/robots-testing-tool. In my case, the tool reported "robots.txt not found (404)" with the further explanation: "It seems like you don't have a robots.txt file. In such cases, we assume that there are no restrictions and crawl all content on your site."
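
If you want to double-check outside Google's tool, fetching the robots.txt URL directly shows the same thing. Below is a minimal sketch using only Python's standard library; the subdomain URL is a placeholder, so substitute your own:

import urllib.request
import urllib.error

# Placeholder host; replace with your actual subdomain.
url = "https://subdomain.ourdomain.com/robots.txt"

try:
    with urllib.request.urlopen(url) as response:
        # A 200 response means a robots.txt file is already in place.
        print(response.status)
        print(response.read().decode("utf-8"))
except urllib.error.HTTPError as e:
    # A 404 matches the "robots.txt not found" message above:
    # crawlers then assume no restrictions and crawl everything.
    print("robots.txt not found:", e.code)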
Because no robots.txt file was stored on the subdomain yet, the next task is to add the rules in the robots.txt editor provided by the tool. Take care that the robots.txt input is correct, because a mistake can affect how crawlers treat the pages on your main domain that Google has already indexed. The robots.txt file already installed on your domain usually has the following structure:

User-agent: Mediapartners-Google
Disallow:

User-agent: *
Disallow: /search
Allow: /

Sitemap: https://www.ourdomain.com/sitemap.xml
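
To see how these rules behave, you can evaluate them the way a compliant crawler would with Python's built-in urllib.robotparser. This is just an illustrative sketch, assuming the file above is live at www.ourdomain.com (a placeholder domain):

import urllib.robotparser

# Load the live robots.txt from the main domain (placeholder URL).
rp = urllib.robotparser.RobotFileParser()
rp.set_url("https://www.ourdomain.com/robots.txt")
rp.read()

# "Disallow: /search" blocks search result pages for generic crawlers...
print(rp.can_fetch("*", "https://www.ourdomain.com/search?q=test"))      # False
# ...while "Allow: /" keeps ordinary pages crawlable.
print(rp.can_fetch("*", "https://www.ourdomain.com/some-article.html"))  # True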

Setting the Robots.txt File on a Subdomain to Block Googlebot and Other Web Crawlers

The next step is to arrange the lines in the robots.txt file that will be placed in the subdomain root on the free hosting prepared earlier. Modify the robots.txt file above to the following:

User-agent: *
Disallow: /

With the syntax above, web crawlers are told not to crawl any page on the domain or subdomain (depending on the property), including the homepage. After entering the code above in GSC, run the test first to see its effect. Once you are sure, click the "Submit" button to apply the robots.txt file to the subdomain you are targeting.
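
You can also sanity-check the disallow-all rule offline before submitting. The sketch below feeds the two lines straight into urllib.robotparser, with no live fetch needed; the subdomain is again a placeholder:

import urllib.robotparser

# Parse the disallow-all rules directly, without fetching anything.
rp = urllib.robotparser.RobotFileParser()
rp.parse(["User-agent: *", "Disallow: /"])

# Every path, including the homepage, is now off-limits to all bots.
for path in ["/", "/index.html", "/2020/01/old-article.html"]:
    print(path, rp.can_fetch("*", "https://subdomain.ourdomain.com" + path))
# All three lines print False.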
The next step is to download the robots.txt file and place it in the subdomain's root directory through your hosting panel.
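
How exactly you place the file depends on your host's panel. If the free host provides FTP access, a script like the following does the same upload; the hostname, credentials, and directory are all placeholders for your own details:

from ftplib import FTP

# Placeholder FTP details; use the ones from your hosting panel.
with FTP("ftp.freehost.example") as ftp:
    ftp.login(user="your-username", passwd="your-password")
    ftp.cwd("/public_html")  # the subdomain's root folder on this host
    with open("robots.txt", "rb") as f:
        ftp.storbinary("STOR robots.txt", f)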
If it is stored correctly, then when you check it in a browser by typing the robots.txt URL, a result like the example below will appear. From now on, web crawlers will no longer crawl any of the content on the subdomain.
[Screenshot: the robots.txt file as served on the subdomain]
