demosthenes.info

I’m Dudley Storey, the author of Pro CSS3 Animation. This is my blog, where I talk about web design and development. To receive more information, including news, updates, and tips, you should follow me on Twitter or add me on Google+.

Search Engine Robots

There is one further use of meta tags that is of immediate interest: controlling the behaviour of search engine spiders. At the level of the individual page, a meta tag with a name attribute of robots can take the following content values:

| Value | Description |
| --- | --- |
| noindex | Do not index the content of this page (i.e. do not add it to a search engine database, and do not show it in search results). |
| nofollow | Do not follow outbound links from this page. |

Together, they are typically used as follows:

<meta name="robots" content="noindex, nofollow" />

This can be used to keep publicly available information out of search results: personal information that you have no problem sharing if someone comes across it, but do not want to turn up in Google, for example.

(The opposite commands, index and follow, are assumed – they do not have to be placed on a page in order to have it searched and indexed.)
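To see how a spider might read these values, here is a minimal sketch in Python that pulls the robots directives out of a page using the standard library's html.parser module. The class name RobotsMetaParser, the helper robots_directives, and the sample page are all assumptions made up for illustration; real search engines use their own, far more tolerant parsers.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the content values of <meta name="robots"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # Self-closing tags such as <meta ... /> also arrive here.
        if tag != "meta":
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").lower() != "robots":
            return
        content = attributes.get("content") or ""
        for value in content.split(","):
            value = value.strip().lower()
            if value:
                self.directives.add(value)


def robots_directives(html):
    """Return the set of robots directives found in an HTML string."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


# Hypothetical page containing the tag from the article.
page = '<html><head><meta name="robots" content="noindex, nofollow" /></head></html>'
directives = robots_directives(page)

# index and follow are assumed unless the page says otherwise.
may_index = "noindex" not in directives
may_follow = "nofollow" not in directives
print(may_index, may_follow)
```

A page with no robots meta tag yields an empty set, so both flags stay true, which matches the default behaviour described above.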

To control access to many pages at a time, or to folders, a different approach is used. Before they attempt to index a site, well-behaved search engine robots look for a file called robots.txt at the root of the site (i.e. alongside the index.html page). The file will usually contain two very simple lines. The first is:

User-agent:

User-agent is the name of the spider that you wish to control: Google’s, for example, is called Googlebot. This lets you give different instructions to the spiders of different search engines. Typically, however, you want every spider to do the same thing, and so use a wildcard:

User-agent: *

This means “the following command is true for all spiders”.

The next line is the actual command to the spider. The most common is:

Disallow:

Meaning: do not crawl (and therefore do not index the content of) the file specified after the colon, or any file in the specified directory. To keep spiders away from everything on your site, use a single forward slash: /.
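As a rough check of how these two lines work together, the sketch below feeds two hypothetical robots.txt files to Python's standard urllib.robotparser module and asks whether a spider may fetch a given address. The /private/ folder, the example.com URLs, and the build_parser helper are assumptions for illustration only, and real spiders may treat edge cases differently from this parser.

```python
from urllib.robotparser import RobotFileParser


def build_parser(robots_txt):
    """Parse robots.txt text held in a string (no network access)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


# Hypothetical file: keep every spider out of one folder.
block_folder = build_parser("""\
User-agent: *
Disallow: /private/
""")

# Hypothetical file: keep every spider out of the whole site.
block_site = build_parser("""\
User-agent: *
Disallow: /
""")

# Ask, on behalf of a spider named googlebot, whether each page may be fetched.
print(block_folder.can_fetch("googlebot", "https://example.com/private/notes.html"))
print(block_folder.can_fetch("googlebot", "https://example.com/index.html"))
print(block_site.can_fetch("googlebot", "https://example.com/index.html"))
```

With the first file, only addresses under /private/ are refused; with the second, every address on the site is refused.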
