Restricting search indexing to sections of a web page

If you think of a website as a set of page types, each made up of sections such as content, navigation, and footer, it becomes apparent that the value of a particular page lies in the content that is unique to it. Sections like footers or site-wide navigation are repeated on every page and add no page-specific value.

So, it would be helpful to be able to instruct search engine robots not to index specific areas of a web page. Here’s a wireframe of what I’m thinking.

Wireframe showing regions of a page that should and shouldn't be indexed by search engines.

How? Well, I haven’t found a real solution. Here’s an idea though.

We could extend XHTML with a schema that would include the ability to add attributes to elements like DIVs, ULs, OLs, Ps, and so on.

The attributes could be along these lines:

<div robot-follow="yes" robot-index="no">Stuff you don’t want indexed here</div>

Of course, the makers of the bots would then need to program their crawlers to heed these attributes.
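To make that concrete, here is a rough sketch of what heeding the attribute might look like on the bot side. It is written in Python with the standard-library `html.parser`, and it is only an illustration: `robot-index` is the made-up attribute from this post, not something any real crawler honors, and the names `IndexableTextExtractor` and `indexable_text` are hypothetical. It also assumes well-formed markup, where every non-void element is explicitly closed.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class IndexableTextExtractor(HTMLParser):
    """Collects page text, skipping any element marked robot-index="no".

    robot-index is the hypothetical attribute proposed in this post.
    """

    def __init__(self):
        super().__init__()
        self.skip_depth = 0  # > 0 while inside a non-indexed element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self.skip_depth > 0:
            # Already inside a skipped region: track nesting only.
            self.skip_depth += 1
        elif (dict(attrs).get("robot-index") or "").lower() == "no":
            self.skip_depth = 1

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        if self.skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def indexable_text(html):
    """Return the text a bot honoring robot-index would keep for indexing."""
    parser = IndexableTextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)
```

Feeding it a page whose navigation list sits inside a `robot-index="no"` DIV would return only the text outside that DIV. Handling `robot-follow` would be a separate pass over the links, which this sketch leaves out.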

So, yeah, all in all, a fairly impractical idea, since nothing is implemented. However, if it were, I would use it on many websites.

XML file of shooting ranges in Michigan

As another small step in this process of manipulating a data set to upload to Google Maps, I took the cleaned XHTML I had from a few days ago and used TextWrangler to run some quick search-and-replace passes on the source code in order to produce this XML file.
ranges-data.xml

Next, I think, I’ll load this XML file into PHP using the SimpleXML extension, which will make it easy to run the data through a PHP-based geocoding processor that I’m sure I can dig up. The goal is to convert the addresses of the ranges into latitude/longitude points, which seem to be required pieces of data for the KML file I’m trying to piece together.
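Here is a rough sketch of that load-and-geocode step, written in Python rather than PHP/SimpleXML just to show the shape of it. Several things are assumptions: the element names (`range`, `name`, `address`, `url`) are a guess at how ranges-data.xml might be laid out, `geocode_ranges` is a hypothetical helper, and the geocoder is passed in as a plain function because no particular geocoding service has been chosen yet.

```python
import xml.etree.ElementTree as ET


def geocode_ranges(xml_text, geocode):
    """Return (located, failed) record lists for every <range> in xml_text.

    `geocode` is any callable that takes an address string and returns
    (latitude, longitude), or None when the address cannot be resolved.
    The <range>/<name>/<address>/<url> layout is an assumed structure.
    """
    located, failed = [], []
    root = ET.fromstring(xml_text)
    for node in root.iter("range"):
        record = {
            "name": (node.findtext("name") or "").strip(),
            "address": (node.findtext("address") or "").strip(),
            "url": (node.findtext("url") or "").strip(),
        }
        # Skip the lookup entirely when there is no address to send.
        point = geocode(record["address"]) if record["address"] else None
        if point is None:
            failed.append(record)  # keep for manual cleanup later
        else:
            record["lat"], record["lon"] = point
            located.append(record)
    return located, failed
```

Keeping the failures in their own list means the addresses the geocoder could not handle can be fixed by hand and rerun, instead of silently dropping those ranges.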

I may at the same time output the whole thing into KML format, since I’ll be in there with the data nodes anyway.

Sample KML structure for the shooting ranges data

And here is a sample of what the intended shooting ranges KML feed will look like.

A couple notes:

  • the Placemark node will repeat for every shooting range
  • I’ll have to find a way to process the address information and generate latitude/longitude points. The geocoder is bound to have trouble parsing some addresses, though I’ve dealt with this before on a prior Web development project

<?xml version="1.0" encoding="UTF-8"?>
<kml xmlns="http://earth.google.com/kml/2.1">
<Document>
<name>Shooting ranges in Michigan</name>
<description><![CDATA[Places to shoot in Michigan: Public/DNR ranges, shooting clubs, and businesses with firing ranges available.]]></description>

<Placemark>
<name>Flushing Rifle &amp; Pistol Club</name>
<description><![CDATA[165 Industrial Dr., Flushing, MI 48433<br>http://www.flushingrifleandpistol.com/<br>]]></description>
<Point>
<coordinates>-83.866898,43.068909,0.000000</coordinates>
</Point>
</Placemark>

<!-- Repeat Placemark for each range -->

</Document>
</kml>
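
For what it’s worth, here is a minimal sketch of generating a feed with the same shape as the sample above, again in Python as an illustration rather than the PHP I expect to use. `build_kml` and the record keys (`name`, `address`, `url`, `lat`, `lon`) are hypothetical. One difference from the sample: ElementTree escapes the `<br>` markup in the description instead of wrapping it in CDATA, which an XML parser reads as the same text.

```python
import xml.etree.ElementTree as ET

# Namespace copied from the sample feed above.
KML_NS = "http://earth.google.com/kml/2.1"


def build_kml(title, ranges):
    """Build a KML document string with one Placemark per range record."""
    ET.register_namespace("", KML_NS)  # serialize KML as the default namespace

    def tag(name):
        return f"{{{KML_NS}}}{name}"

    kml = ET.Element(tag("kml"))
    doc = ET.SubElement(kml, tag("Document"))
    ET.SubElement(doc, tag("name")).text = title

    for r in ranges:
        placemark = ET.SubElement(doc, tag("Placemark"))
        ET.SubElement(placemark, tag("name")).text = r["name"]
        ET.SubElement(placemark, tag("description")).text = (
            f"{r['address']}<br>{r['url']}<br>"
        )
        point = ET.SubElement(placemark, tag("Point"))
        # KML coordinate order is longitude,latitude,altitude.
        ET.SubElement(point, tag("coordinates")).text = (
            f"{r['lon']:.6f},{r['lat']:.6f},0.000000"
        )

    return ET.tostring(kml, encoding="unicode")
```

The longitude-first ordering is the part most likely to trip things up, since most geocoders hand back latitude first.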