# --------------------------------------------------------
# More info on robots.txt
# --------------------------------------------------------
# Def: http://www.robotstxt.org/wc/robots.html
# Tutorial: http://www.outfront.net/tutorials_02/adv_tech/robots.htm
# --------------------------------------------------------
# Known Spiders/crawlers (user-agents)
# --------------------------------------------------------
# Google - Googlebot
# AltaVista - Scooter
# Lycos - T-Rex
# --------------------------------------------------------
# Syntax
# --------------------------------------------------------
# User-Agent: [Spider or Bot name]
# Disallow: [Directory or File Name]

# Allow all crawlers to index the site, except the directories listed below
User-Agent: *
Disallow: /_vti_bin/
Disallow: /_vti_cnf/
Disallow: /_vti_log/
Disallow: /_vti_map/
Disallow: /_vti_pvt/
Disallow: /_vti_txt/
Disallow: /aspx/
Disallow: /guld/
Disallow: /lang/

# --------------------------------------------------------
# Examples
# --------------------------------------------------------
# 1. Exclude a file from an individual Search Engine
#
# User-Agent: Googlebot
# Disallow: /private/privatefile.htm
#
# 2. Exclude a section of your site from all spiders and bots
#
# User-Agent: *
# Disallow: /newsection/
#
# Note the forward slash at the beginning and end of the directory name:
# it indicates that you do not want any files in that directory indexed.
#
# 3. Allow all spiders to index everything
#
# Leave the second line, Disallow, empty, so that nothing is disallowed.
#
# User-agent: *
# Disallow:
#
# 4. Allow no spiders to index any part of your site
#
# This differs from the example above by a single slash, so be careful!
#
# User-agent: *
# Disallow: /
#
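# 5. Exclude several directories from an individual Search Engine
#
# A sketch combining examples 1 and 2, following the same pattern as the
# live rules in this file: one Disallow line per directory under a single
# User-Agent line. The directory names are placeholders, not real paths.
#
# User-Agent: Googlebot
# Disallow: /private/
# Disallow: /newsection/
#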