A Few Points on robots.txt

Knowing this, we presumably understand why robots.txt matters. So what exactly should you watch for when writing one? Here are my own views:

3. When writing User-agent lines, a few things need attention.

Sitemap: defines the address of the sitemap that search engines should crawl

Allow: defines the addresses that search engines are allowed to crawl
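As a rough illustration of the Sitemap directive, the sketch below reads the sitemap address back out of a robots.txt body using Python's standard urllib.robotparser module (site_maps() needs Python 3.8 or later). The domain example.com and the rules themselves are made up for the example.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; example.com is a placeholder domain.
sitemap_rules = """\
User-agent: *
Disallow: /tmp/
Sitemap: https://example.com/sitemap.xml
"""

sitemap_parser = RobotFileParser()
sitemap_parser.parse(sitemap_rules.splitlines())

# site_maps() returns the list of Sitemap URLs, or None if there are none.
sitemaps = sitemap_parser.site_maps()
print(sitemaps)
```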

Anyone doing SEO must understand robots.txt; it is knowledge that every qualified SEO practitioner needs. So what exactly do you need to know about robots.txt?

To restrict a particular type of file, you need to know about the $ symbol. The $ marks the end of the URL, as in /*.jpg$, which matches any URL ending in .jpg.
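To make the $ rule concrete, here is a minimal sketch of how a crawler might match such a pattern. It assumes the common convention that * matches any run of characters and a trailing $ anchors the end of the URL. The helper names pattern_to_regex and path_matches are invented for this example, and real crawlers may differ in the details.

```python
import re

def pattern_to_regex(pattern: str) -> "re.Pattern[str]":
    """Translate a robots.txt path pattern into a compiled regular expression."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape each literal piece and join the pieces with ".*" for each "*".
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    # Without "$" the pattern is only a prefix match; with it, the end is fixed.
    return re.compile("^" + body + (r"\Z" if anchored else ""))

def path_matches(pattern: str, path: str) -> bool:
    """Return True if the URL path is covered by the pattern."""
    return pattern_to_regex(pattern).match(path) is not None

print(path_matches("/*.jpg$", "/images/photo.jpg"))            # True
print(path_matches("/*.jpg$", "/images/photo.jpg?size=big"))   # False
print(path_matches("/*.jpg", "/images/photo.jpg?size=big"))    # True
```

The last two lines show what the $ adds: without it, the pattern also catches URLs that merely contain .jpg followed by something else.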

First, a qualified SEO practitioner must understand that robots.txt is a protocol, not a command: well-behaved crawlers follow it voluntarily, and nothing enforces it. robots.txt is the first file a search engine looks at when it visits a site. It tells the spider which files on the server may be viewed and which may not be crawled.


User-agent: *

User-agent: defines which search engine crawler the rules apply to

1. robots.txt must be placed in the root directory of a site, and the file name must be all lowercase.
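Because the file always sits at the root, its address can be derived from any page URL on the site. The sketch below shows one way to do that with Python's standard library; the function name robots_url and the example.com address are assumptions for illustration.

```python
from urllib.parse import urlsplit, urlunsplit

def robots_url(page_url: str) -> str:
    """Build the robots.txt address for the site that hosts page_url."""
    parts = urlsplit(page_url)
    # Keep the scheme and host; replace the path with the lowercase file
    # name at the root and drop any query string or fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))

print(robots_url("https://example.com/blog/post.html?id=3"))
```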

Normally, when a spider arrives at your site, it first checks whether robots.txt exists in the root directory. If it does, the spider follows the rules in it; if not, the spider crawls every file on the site by default.
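The decision described above can be sketched as follows, again with urllib.robotparser. This is only an approximation of what a crawler does: the function may_crawl and the bot name ExampleBot are hypothetical, and passing None stands in for "no robots.txt was found".

```python
from typing import Optional
from urllib.robotparser import RobotFileParser

def may_crawl(robots_body: Optional[str], user_agent: str, url: str) -> bool:
    """Decide whether user_agent may fetch url.

    robots_body is the text of robots.txt, or None when the site has no
    such file, in which case everything is treated as crawlable.
    """
    parser = RobotFileParser()
    lines = robots_body.splitlines() if robots_body is not None else []
    parser.parse(lines)  # an empty rule set allows every URL
    return parser.can_fetch(user_agent, url)

rules_text = "User-agent: *\nDisallow: /tmp/\n"
print(may_crawl(rules_text, "ExampleBot", "https://example.com/tmp/a.html"))
print(may_crawl(None, "ExampleBot", "https://example.com/tmp/a.html"))
```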




The * stands for all search engine crawlers; it is a wildcard. To restrict one specific spider, write its name instead. For example, to give instructions only to Google's spider, write User-agent: Googlebot. This line defines which crawler the group applies to; the specific rules, such as Allow and Disallow, go on the lines below it.
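A short sketch of how crawler-specific groups behave, using urllib.robotparser. The rules are invented for the example: a crawler named Googlebot gets its own group, while every other crawler falls back to the * group. OtherBot is a made-up name.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: one group for Googlebot, one for everyone else.
agent_rules = """\
User-agent: Googlebot
Disallow: /private/

User-agent: *
Disallow: /
"""

agent_parser = RobotFileParser()
agent_parser.parse(agent_rules.splitlines())

page = "https://example.com/public/page.html"
google_ok = agent_parser.can_fetch("Googlebot", page)   # uses the Googlebot group
other_ok = agent_parser.can_fetch("OtherBot", page)     # falls back to the * group
print(google_ok, other_ok)
```

A crawler that matches a named group uses that group only, which is why Googlebot is not affected by the blanket Disallow in the * group here.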

Disallow: defines the addresses that search engines are forbidden to crawl

4. Disallow and Allow mean "do not crawl" and "may crawl" respectively, and each is followed by the address it applies to. For example, Disallow: /tmp/ prohibits crawling of the /tmp/ directory.
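The /tmp/ example can be checked with urllib.robotparser, as in the sketch below. The Allow line for /tmp/public/ and the bot name ExampleBot are additions made only for illustration. Python's parser applies the first rule that matches, so the more specific Allow line is placed before the Disallow line.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: block /tmp/ but keep one subdirectory open.
path_rules = """\
User-agent: *
Allow: /tmp/public/
Disallow: /tmp/
"""

path_parser = RobotFileParser()
path_parser.parse(path_rules.splitlines())

def allowed(path: str) -> bool:
    """Check a path on the placeholder site example.com."""
    return path_parser.can_fetch("ExampleBot", "https://example.com" + path)

print(allowed("/tmp/cache.html"))        # False: under /tmp/
print(allowed("/tmp/public/logo.png"))   # True: matched by Allow first
print(allowed("/index.html"))            # True: no rule applies
```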

