demosthenes.info

I’m Dudley Storey, the author of Pro CSS3 Animation. This is my blog, where I talk about web design and development. To receive more information, including news, updates, and tips, follow me on Twitter.

Search Engine Robots

There is one further use of meta tags that is of immediate interest: controlling the behaviour of search engine spiders. At the level of the individual page, a meta tag with a name attribute of "robots" can take the following content values:

Value     Description
noindex   Do not index the content of this page (i.e. do not add it to a search engine database, and do not show it in search results).
nofollow  Do not follow outbound links from this page.

Together, they are typically used as follows:

  <meta name="robots" content="noindex, nofollow" />

This can be used to keep publicly available information out of search results: for example, personal information you are happy to share with anyone who comes across the page, but do not want findable through Google.

(The opposite commands, index and follow, are assumed – they do not have to be placed on a page in order to have it searched and indexed.)
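To make the defaults concrete, here is a minimal Python sketch of how a spider might read the robots meta tag on a page. It is an illustration only, not how any particular search engine works: the class name RobotsMetaParser and the function robots_policy are hypothetical names chosen for this example, and only the noindex and nofollow values described above are handled.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives found in <meta name="robots"> tags (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # Self-closing tags such as <meta ... /> are also routed here by HTMLParser.
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").strip().lower() != "robots":
            return
        # The content attribute is a comma-separated list, e.g. "noindex, nofollow".
        for token in attr.get("content", "").split(","):
            token = token.strip().lower()
            if token:
                self.directives.add(token)


def robots_policy(html):
    """Return whether a page may be indexed and whether its links may be followed."""
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    # index and follow are assumed unless the page explicitly opts out.
    return {
        "index": "noindex" not in parser.directives,
        "follow": "nofollow" not in parser.directives,
    }


if __name__ == "__main__":
    page = '<html><head><meta name="robots" content="noindex, nofollow" /></head></html>'
    print(robots_policy(page))
```

A page with no robots meta tag at all comes back as indexable and followable, which matches the assumed defaults.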

To control access to many pages at a time, or to entire folders, a different approach is used. Before they attempt to index a site, well-behaved search engine robots look for a file called robots.txt at the root of the site (i.e. alongside the index.html page). The file will usually contain two very simple lines. The first is:

  User-agent:

“User-agent” is the name of the spider that you wish to control: Google’s, for example, is called “googlebot”. In this way you can command different spiders from different search engines to do different things. Typically, however, you want to command all spiders to do the same thing, and so use a wildcard:

  User-agent: *

This means “the following command applies to all spiders”.

The next line is the actual command to the spider. The most common is:

  Disallow:

Meaning: do not crawl (and therefore do not index the content of) the file specified after the colon, or the files in the specified directory. To keep spiders away from everything on your site, use a single forward slash (/) as the value.
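To see how the two lines work together, the sketch below feeds two sample robots.txt files to Python's standard urllib.robotparser module and asks which URLs a given spider may fetch. The file contents, the /private/ folder, the example.com URLs and the helper name build_parser are all assumptions made up for this illustration. Real spiders may interpret edge cases differently from this module.

```python
from urllib.robotparser import RobotFileParser

# The two-line file described above: every spider, whole site off limits.
BLOCK_ALL = """\
User-agent: *
Disallow: /
"""

# A hypothetical file that treats one spider differently from the rest.
PER_SPIDER = """\
User-agent: googlebot
Disallow: /

User-agent: *
Disallow: /private/
"""


def build_parser(robots_txt):
    """Parse robots.txt text supplied as a string (hypothetical helper)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


if __name__ == "__main__":
    everyone_blocked = build_parser(BLOCK_ALL)
    print(everyone_blocked.can_fetch("googlebot", "https://example.com/index.html"))

    mixed = build_parser(PER_SPIDER)
    # googlebot matches its own record, so the whole site is disallowed for it.
    print(mixed.can_fetch("googlebot", "https://example.com/index.html"))
    # Any other spider falls back to the wildcard record.
    print(mixed.can_fetch("bingbot", "https://example.com/index.html"))
    print(mixed.can_fetch("bingbot", "https://example.com/private/notes.html"))
```

Note that a spider is only bound by the record that matches its name. The wildcard record applies to spiders that have no record of their own.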


creative commons licensed

The content of this blog is free to use in whatever way you wish under the Creative Commons license.