Monday, June 11, 2007

The Google Freshness Factor

There is a patent application at the US Patent and Trademark Office from Monika Henzinger, published in July 2005, that describes a method for determining a document’s “freshness.” In an attempt to associate this new term with Google’s other patented terminology (namely PageRank and TrustRank), forum posters are now referring to the concept as “FreshRank.”

The abstract of this patent application states that one of the problems with determining the freshness of a document indexed in a search engine is that the “Last-Modified” header isn’t always correct. Some webmasters figured out that they could change the modification date, and a pattern of abuse developed. It doesn’t fool Google, because what Google looks for is actual modified content. Exactly how Google determines how old or “fresh” a document is remains somewhat of a secret. Lately, in the estimation of many, Google has done a very poor job of determining which websites present the freshest content in relation to relevancy.
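The difference between a claimed date and an actual change is easy to demonstrate. Here is a minimal Python sketch of the idea (not Google’s actual method): it records the Last-Modified header, which a server can set to anything it likes, alongside a hash of the page body, which only changes when the content really does.

```python
import hashlib
import urllib.request

def check_freshness(url, previous_hash=None):
    """Compare a page's claimed freshness against its actual content.

    The Last-Modified header is whatever the server chooses to send,
    but a hash of the body changes only when the content itself does.
    """
    with urllib.request.urlopen(url) as response:
        body = response.read()
        claimed = response.headers.get("Last-Modified", "not provided")

    actual_hash = hashlib.sha256(body).hexdigest()
    content_changed = previous_hash is not None and actual_hash != previous_hash

    return {
        "claimed_last_modified": claimed,   # easily faked by the webmaster
        "content_hash": actual_hash,        # reflects real edits only
        "content_changed": content_changed,
    }

# Usage: store the hash from one crawl and compare it on the next.
first = check_freshness("http://example.com/")
later = check_freshness("http://example.com/", previous_hash=first["content_hash"])
print(later["content_changed"])
```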

This brings to mind a pertinent question: how does the freshness factor rank in determining relevancy? Some have concluded that it doesn’t necessarily matter how fresh a document is to Google, especially if that document has many inbound links pointing to it. Henzinger is attempting to patent a more explicit measure of freshness, noting that not all search engines use the “Last-Modified” header anyway, and that search engines need a more reliable way of detecting updated content.
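How those two signals might trade off is easy to show with a toy formula. The Python sketch below is pure illustration; the weights, the logarithmic link signal, and the 90-day half-life are my own assumptions, not anything from the patent.

```python
import math
import time

def combined_score(link_count, last_modified_ts, now=None,
                   link_weight=1.0, freshness_weight=0.5, half_life_days=90):
    """Toy relevancy score: a strong link profile can outweigh staleness.

    Freshness decays exponentially with document age; the link signal
    grows logarithmically with inbound links. All constants are invented
    for illustration.
    """
    now = now or time.time()
    age_days = (now - last_modified_ts) / 86400
    freshness = math.exp(-age_days * math.log(2) / half_life_days)  # 1.0 when brand new
    link_signal = math.log1p(link_count)
    return link_weight * link_signal + freshness_weight * freshness

# A year-old page with 5,000 inbound links still beats a brand-new page with 3.
old_popular = combined_score(5000, time.time() - 400 * 86400)
new_obscure = combined_score(3, time.time())
print(old_popular > new_obscure)  # True
```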

Unfortunately, with the implementation of the duplicate content penalty, we’ve been seeing problems with the freshness attribute of documents. Google’s filter for weeding out duplicate content, in particular, doesn’t appear to take into consideration the actual origin of the content. For many, this has become a major point of frustration. Given the technological advances Google has put into the public realm within the last decade, it seems impractical, almost ridiculous, that it would leave out the ability to determine the source of fresh content. Yahoo and MSN do not appear to have this particular problem, so why does Google?
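One way such a filter could credit the source is to remember where each piece of content was first crawled. The sketch below is hypothetical: it uses exact hashes where a real engine would use near-duplicate detection such as shingling, and it also exposes the flaw everyone is complaining about, since whoever happens to get crawled first, scraper or author, is treated as the origin.

```python
import hashlib

# Map each content fingerprint to the URL and crawl date where it was
# first seen; later copies of the same content are treated as duplicates.
first_seen = {}

def classify(url, content, crawl_date):
    """Return 'original' for the earliest-crawled copy, 'duplicate' after."""
    fingerprint = hashlib.sha256(content.encode()).hexdigest()
    if fingerprint not in first_seen:
        first_seen[fingerprint] = (url, crawl_date)
        return "original"
    origin_url, origin_date = first_seen[fingerprint]
    return f"duplicate of {origin_url} (first crawled {origin_date})"

print(classify("http://author-site.com/post", "some article text", "2007-01-10"))
print(classify("http://scraper.com/copy", "some article text", "2007-03-02"))
```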

Another problem recently presented to the freshness factor is Google’s own URL Removal Tool. Experiences with this tool have been largely unpleasant, when it has been useful at all. The tool is often mentioned “as a cure against many diseases,” diseases such as duplicate content or temporary redirects, for example. While I have used it from time to time, I have done so cautiously, and never on a commercial website; only on my personal website or blog. Some of the side effects observed are definitely worth mentioning here, and I know I’m not alone.
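For anyone who hasn’t used it: as I recall, the tool will only honor a request for a URL that is already gone or blocked, for example via a robots.txt entry like the one below (the path is hypothetical) or a noindex meta tag such as `<meta name="robots" content="noindex">` in the page’s head.

```
User-agent: Googlebot
Disallow: /old-page.html
```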

If you’ve ever used the removal tool, you’ll notice that the page count of the website in question does not change; the pages simply fail to show up. Why is this? Because with the URL Removal Tool, these URLs have not been deleted; they have only been filtered out. So even though the pages appear to have been removed, they are certainly still in the database somewhere.
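A small sketch makes the distinction concrete. Assume, and this is only an assumption about how such a system might work, that removals live on a suppression list applied at query time, while the index itself stays untouched.

```python
# Toy sketch of "filtered, not deleted": the index keeps every URL
# (so page counts don't change), and removals are applied at query time.
index = {
    "http://example.com/a": "page a text",
    "http://example.com/b": "page b text",
    "http://example.com/c": "page c text",
}
removed = {"http://example.com/b"}  # suppression list, not a deletion

def page_count():
    return len(index)  # still counts the "removed" page

def search(term):
    return [url for url, text in index.items()
            if term in text and url not in removed]

print(page_count())    # 3 -- the removed URL still counts
print(search("page"))  # b is filtered out of the results

# When the removal window expires, dropping the URL from the suppression
# list makes it reappear, exactly as if it had never left.
removed.clear()
print(search("page"))
```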

The period of time Google takes to remove these URLs from its index is anywhere between three and six months. I say three to six months because, even though the Google documentation tells us 180 days, in my personal experience it has been more like 90 days. Regardless of the period of time, rest assured, they are actually still there. How do I know this? Two reasons: one, as I mentioned before, the number of pages is still listed at the same amount as before the pages were removed; two, after the removal period, they show right back up in the index, as if they’d never left.

Steve Buchanan writes articles on many topics, including John Deere Lawn Mowers, Honda Generators, and Snow Blowers.

http://www.articles-hub.com/Article/156421.html