
Published on May 31st, 2018 📆 | 5619 Views ⚑


The robots.txt



What is a robots.txt file?

Search engines use programs called robots (also known as spiders) that automatically visit pages on the Internet and collect their contents.
You can create a plain text file, robots.txt, on your website to declare which parts of the site robots should not access. This lets you keep part or all of the site’s content out of search engines, or specify that search engines include only certain content.

Where is the robots.txt file?

The robots.txt file should be placed in the root directory of the site. For example, when a robot visits a website (such as http://www.abc.com ), it first checks whether http://www.abc.com/robots.txt exists. If the robot finds this file, it uses the file’s contents to determine the scope of its access.
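The lookup described above can be sketched in a few lines of Python. This is only an illustration: the function name `robots_url` is hypothetical, and the sketch assumes the robot starts from an arbitrary page URL and keeps the same scheme and host.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url: str) -> str:
    """Return the robots.txt URL for the site that serves page_url (hypothetical helper)."""
    parts = urlsplit(page_url)
    # Keep the scheme and host, point the path at the site root,
    # and drop any query string or fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_url("http://www.abc.com/help/index.html"))  # http://www.abc.com/robots.txt
```

A robot would fetch this URL before requesting any other page on the site.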

The format of the robots.txt file

The “robots.txt” file contains one or more records separated by blank lines (each line ends with CR, CR/NL, or NL). Each line of a record has the following format:





"<field>:<optionalspace><value><optionalspace>"
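As a rough sketch of how a robot might read one such line, the Python function below splits a line at the first colon and strips the optional spaces around the value. The name `parse_line` is hypothetical, and lowercasing the field name is an assumption made here so that "Disallow" and "disallow" compare equal.

```python
def parse_line(line: str):
    """Split '<field>:<optionalspace><value><optionalspace>' into (field, value)."""
    field, sep, value = line.partition(":")
    if not sep:
        return None  # no colon, so this is not a field line
    # Assumption: field names are compared case-insensitively.
    return field.strip().lower(), value.strip()


print(parse_line("Disallow: /help "))  # ('disallow', '/help')
print(parse_line("User-agent:*"))      # ('user-agent', '*')
```

Splitting only at the first colon keeps values that themselves contain a colon intact.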

 

Comments can be added to the file with #, following the same convention as in UNIX. A record usually begins with one or more User-agent lines, followed by a number of Disallow lines, as follows:

  • User-agent:
    The value of this field is the name of the search engine robot that the record applies to. If a “robots.txt” file contains multiple User-agent lines, multiple robots are restricted by the protocol; the file needs at least one User-agent line. If the value is set to *, the record applies to any robot. The “robots.txt” file may contain only one “User-agent: *” record.
  • Disallow:
    The value of this field is a URL that you do not want robots to visit. It can be a complete path or only the beginning of one: any URL that starts with the Disallow value will not be accessed by the robot. For example, “Disallow: /help” blocks search engine access to both /help.html and /help/index.html, while “Disallow: /help/” allows the robot to access /help.html but not /help/index.html. An empty Disallow value indicates that all parts of the site may be accessed. The “/robots.txt” file needs at least one Disallow line. If “/robots.txt” is an empty file, the site is open to all search engine robots.
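The prefix rule for Disallow can be illustrated with a minimal Python sketch. The function `is_allowed` is a hypothetical helper, not part of any library; it assumes the Disallow values of the record that applies to the robot have already been collected into a list.

```python
def is_allowed(disallow_values, path):
    """Return False if path starts with any non-empty Disallow value."""
    for prefix in disallow_values:
        # An empty Disallow value blocks nothing, so skip it.
        if prefix and path.startswith(prefix):
            return False
    return True


print(is_allowed(["/help"], "/help.html"))         # False
print(is_allowed(["/help"], "/help/index.html"))   # False
print(is_allowed(["/help/"], "/help.html"))        # True
print(is_allowed(["/help/"], "/help/index.html"))  # False
print(is_allowed([""], "/help/index.html"))        # True
```

The five calls reproduce the cases from the text: “/help” blocks both URLs, “/help/” blocks only the one inside the directory, and an empty value leaves the whole site open.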


