Brief Description About robots.txt Files


What is a robots.txt file?

  • A robots.txt file allows you to control how Google's crawlers crawl and index the publicly accessible pages of a website.
  • On 1st July 2019, Google announced that it was working to make the robots.txt protocol (the Robots Exclusion Protocol) an official Internet standard.
  • robots.txt is a simple text file that sits in the root directory of your site. It tells "robots", or you can say crawlers, which pages on your site to crawl and which pages to ignore.
  • In other words, content or a web page that we do not want to be crawled or indexed is blocked through the robots.txt file. For instance, if a website owner does not want the admin panel to be crawled by Google's spiders, this is done through robots.txt.
  • You can define these rules using a robots.txt document or by using meta tags.


1st method
           
By uploading a document to the root directory of the website. Because it has a .txt extension, the document can be written in any plain-text editor such as Notepad.

Syntax

User-agent: *
Disallow: /AdminPanel/
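The syntax above can be tested without a live site. The sketch below, using Python's standard-library `urllib.robotparser`, feeds in the same two rules and asks whether a crawler may fetch a given URL (the `example.com` URLs and the `/AdminPanel/` path are illustrative, matching the sample rules):

```python
from urllib.robotparser import RobotFileParser

# The same rules as in the syntax example above
RULES = """\
User-agent: *
Disallow: /AdminPanel/
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())

# "*" matches every crawler, including Googlebot:
# the homepage is allowed, the admin panel is not.
print(parser.can_fetch("Googlebot", "https://example.com/"))
print(parser.can_fetch("Googlebot", "https://example.com/AdminPanel/login"))
```

Running this prints `True` for the homepage and `False` for the admin-panel URL, which is exactly what `Disallow: /AdminPanel/` asks crawlers to do.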

2nd method-
By using meta tags, which must be placed in the head section of the page that you do not want the Google spider to crawl.

Syntax-
<meta name="robots" content="nofollow">
or
<meta name="googlebot" content="noindex"> 
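A crawler discovers these directives by parsing the page's HTML. As a rough sketch of that step, the class below (a hypothetical helper built on Python's standard-library `html.parser`) collects the `content` of any `robots` or `googlebot` meta tag it finds in a page:

```python
from html.parser import HTMLParser

class RobotsMetaFinder(HTMLParser):
    """Collects the content of <meta name="robots"> / <meta name="googlebot"> tags."""
    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "meta" and a.get("name", "").lower() in ("robots", "googlebot"):
            self.directives.append(a.get("content", ""))

# A minimal page carrying both meta tags from the syntax above
page = """<html><head>
<meta name="robots" content="nofollow">
<meta name="googlebot" content="noindex">
</head><body></body></html>"""

finder = RobotsMetaFinder()
finder.feed(page)
print(finder.directives)  # the directives the crawler would obey
```

This prints `['nofollow', 'noindex']`, showing that both tags in the head are picked up; a real crawler does the same kind of extraction before deciding whether to index the page or follow its links.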

Other options-

1) 404 and 410 HTTP status codes- Both tell Google the page is gone ("page not found"), so it is dropped from the index.
2) Meta tags- Along with the Search Console Remove URL tool and Disallow in robots.txt.

Alternate options-

1) Noindex- Stops the page from showing in search results.
2) Disallow- Stops the page from being crawled.
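The two options interact in a way that is easy to miss: if a page is disallowed in robots.txt, the crawler never fetches it, so it never sees a noindex tag on that page. The sketch below illustrates this with the standard-library robots.txt parser plus a hypothetical `PAGE_META` mapping standing in for the meta tags a crawler would find on each page (the URLs and `/private/` path are made up for the example):

```python
from urllib.robotparser import RobotFileParser

# Illustrative robots.txt: block everything under /private/
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

# Hypothetical per-page meta directives a crawler would see on fetching each page
PAGE_META = {
    "https://example.com/old-page": "noindex",
    "https://example.com/private/report": "noindex",
}

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

def crawler_view(url):
    """Sketch of how Disallow and noindex combine from a crawler's point of view."""
    if not parser.can_fetch("Googlebot", url):
        # Disallowed: the page is never fetched, so its noindex tag is never seen
        return "not crawled (noindex tag invisible)"
    if PAGE_META.get(url) == "noindex":
        return "crawled, but excluded from search results"
    return "crawled and eligible for indexing"

print(crawler_view("https://example.com/old-page"))
print(crawler_view("https://example.com/private/report"))
```

The first URL is crawled and then excluded by its noindex tag; the second is never crawled at all, so its noindex tag has no effect. This is why, to reliably remove a page from search results with noindex, the page must remain crawlable.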
