Duplicate Content / Non-Canonical URLs
Posted November 17th, 2008 by Daniel Schulman
Categories: SEO
I just got back from PubCon 2008 in Las Vegas, where I went to a very good session on duplicate content.
The information on identifying duplicate content, the problems it causes, and how to deal with it was thorough and clearly presented. My main issue, though, was that most of what was being called duplicate content is not what I would call duplicate content.
Many of the presentations dealt with spider traps, useless parameters on URLs, lack of a preferred domain (i.e., www or no www), etc. In other words, situations where different URLs load the exact same web page. If there is not one and only one URL for a web page, though, I would call this a lack-of-canonical-URL problem.
This got me thinking: what, then, is all the stuff that I do call duplicate content, such as:
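To make the canonical URL idea concrete, here is a minimal Python sketch that collapses several URL variants onto one form. The preferred host `www.example.com`, the `IGNORED_PARAMS` list, and the function name `canonicalize` are all hypothetical choices for illustration; a real site would need its own list of parameters that do not change the page.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Hypothetical parameters that do not change what the page shows.
IGNORED_PARAMS = {"sessionid", "utm_source", "utm_medium", "utm_campaign", "ref"}


def _strip_www(host):
    """Return the host without a leading 'www.' label."""
    return host[4:] if host.startswith("www.") else host


def canonicalize(url, preferred_host="www.example.com"):
    """Map URL variants of the same page onto a single URL (assumed rules)."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()

    # Collapse the www / no-www variants onto the preferred domain.
    if _strip_www(host) == _strip_www(preferred_host):
        host = preferred_host
    netloc = host if parts.port is None else f"{host}:{parts.port}"

    # Drop useless parameters and sort the rest so order does not matter.
    query = sorted(
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in IGNORED_PARAMS
    )

    # Treat an empty path as the root and discard the fragment.
    path = parts.path or "/"
    return urlunsplit((parts.scheme.lower(), netloc, path, urlencode(query), ""))
```

With these assumed rules, `http://example.com/shoes?utm_source=x&color=red` and `http://www.example.com/shoes?color=red` both come out as the same URL, which is the "one and only one URL" property described above.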
- syndicated articles on different web sites with different templates
- overlapping product or category descriptions where the writer only changes a few words here or there
- poorly implemented A/B testing that lets multiple versions of a page get indexed
In other words, situations where page content is close enough to trigger a search engine’s duplicate content penalty. Ironically, though, none of these situations is an exact, 100% page duplicate.
I came to the conclusion that if content is exactly the same, one most likely has a canonical URL problem; if content is close but not exact, it is a duplicate content problem.
I am not sure if this distinction is useful. Certainly, the problems caused by a lack of canonical URLs and by duplicate content are the same. However, I think they might have different sources and, consequently, different solutions.
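The exact-versus-close distinction can be sketched in a few lines of Python. This is only an illustration: word shingles with Jaccard similarity are one common way to measure near-duplication, not necessarily what any search engine uses, and the `threshold` of 0.5 and the function names are assumptions made for the example.

```python
import re


def shingles(text, k=3):
    """Return the set of k-word sequences in the text (lowercased)."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Share of shingles the two sets have in common, from 0.0 to 1.0."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def classify(text_a, text_b, threshold=0.5):
    """Label a pair of page texts using the distinction from this post."""
    # Identical content reached through different URLs.
    if text_a == text_b:
        return "canonical URL problem"
    # Close but not exact: compare overlapping word sequences.
    similarity = jaccard(shingles(text_a), shingles(text_b))
    if similarity >= threshold:
        return "duplicate content problem"
    return "distinct content"
```

Two product descriptions that differ by a single word would land in the "duplicate content problem" bucket under this threshold, while the same page text fetched from two URLs would be flagged as a canonical URL problem.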


