How Does Duplicate Content Hurt Your SEO Efforts?

by Roberto Mejia on March 8, 2013 in Visibility

Duplicate content has always existed on the Internet, but it came to the forefront with the last two Google algorithm updates, known casually as Panda and Penguin. These updates enabled Google's algorithms to identify content that is substantially duplicated elsewhere on the web. Duplicate content largely differs from plagiarism in that it usually isn't malicious or an attempt to "pass off" someone else's work as your own; much of the time it's accidental, the result of a simple misunderstanding of how indexing works.

Why It's a Problem

So if it isn't malicious, why is it a problem? Because of search engine optimization (SEO), that's why. To offer comprehensive results to users, search engines index every site on the web that they can read. When an engine finds two or more identical blocks of text, it doesn't know which one is the original or in what order to rank them in query results. In addition, with so much "junk" on the Internet, engines are trying to identify and rank sites that provide real value to users while weeding out the more dubious ones. Sites that appear to have copied others get pushed down the list because they may be scraping content rather than creating it.

So How Does It Happen?

Of course, there are occasions when duplication is deliberate. More often, though, it's the result of one of the following issues:

  • Over-categorization of products: Some product types fit into several categories. A home décor site promoting products such as paint, hardware and electrical appliances appeals to a range of target markets, including homeowners, renters and builders. If the site is organized by target market and each category contains the same products, the same product copy ends up published at several different URLs, and the search engines treat it as duplicated content (see the example URLs after this list).

  • Poor site architecture: Apart from bad tagging, a common habit of inexperienced webmasters is publishing placeholder pages: web pages with little or no content that carry a heading and a "Page under construction" or similar message. When the search engines index the site, they find several pages with different URLs and the same near-empty content, and immediately assume a duplication issue.

  • Too many similar product descriptions: When one installer of standby home generators created its website, it set up a page for each generator model it carried. Apart from capacity, the various models were all very similar, and so were the descriptions on each individual page. The result: Google identified the site as carrying duplicate content and removed those pages from the results.

  • Enthusiastic boilerplates: A common PR practice when issuing a press release is to add a boilerplate message about the company in the footer. A line or two is fine, but when the boilerplate runs to 10 lines and the same wording is used every time, it constitutes a "block" of identical information that appears on every press release on the website, and you have a duplicate content issue once again.
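
To make the over-categorization problem concrete, here is a hypothetical URL structure for the home décor site described above (the domain and paths are invented for illustration). All three addresses serve the same product copy, so the engines see three duplicate pages:

  www.example.com/homeowners/interior-paint
  www.example.com/renters/interior-paint
  www.example.com/builders/interior-paint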

Best Practices

Avoiding unintentional duplicate content is relatively easy. Before you publish any copy, run it through a dupe checker such as Copyscape, which highlights any site where five or more consecutive identical words appear. Rewrite common information such as product descriptions, or run them through spin software to get a new angle on each sentence. Expand on or combine pages with brief copy, and use the noindex metatag to keep empty pages out of the index, and out of penalty territory, until they are populated. While the "penalty" is less something that happens to you than something that doesn't (your pages simply fail to appear), the non-indexing of your site by the engines can cost you a fair amount of business.
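
The noindex directive mentioned above is a standard robots metatag placed in the head of any page you want kept out of the index. Here is a minimal sketch of a placeholder page carrying it; the page title and copy are invented for illustration:

  <!DOCTYPE html>
  <html>
  <head>
    <!-- Ask search engine crawlers not to index this under-construction page -->
    <meta name="robots" content="noindex">
    <title>Generators - Page Under Construction</title>
  </head>
  <body>
    <p>This page is under construction. Please check back soon.</p>
  </body>
  </html>

Once the page carries real copy, remove the metatag so the engines can start indexing it again.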


Roberto Mejia

While specializing in web development and inbound marketing, Roberto Mejia prides himself on always learning and improving as much as possible.