As a business owner and webmaster, you may have heard about duplicate content and wondered what it is. Perhaps you’ve worried that you may unintentionally have some and wondered how it might affect your SEO rankings.

Duplication can happen innocently – for instance, when you create similar landing pages or republish your content as guest posts.

So let’s figure out what duplicate content really means, how it occurs, and what you can do about it in order to protect your SEO rankings.

What is duplicate content?

Any two pages on the internet that are identical or have a lot of common content could be considered duplicates by a search engine.

Google defines duplicate content as “substantive blocks of content within or across domains that either completely match other content or are appreciably similar.”

The two similar pages may be on the same website, which is internal duplication, or on two different sites, which is external duplication.

Why does duplicate content matter?

You may have noticed that the Google algorithms are shrouded in mystery. Experts can only guess how much your SEO could be affected by duplicate content.

Search engines ignore duplicated content and will show only one version in search results, so some internet pundits say duplicate content penalties are mostly a myth.

However, others say that webmasters need to be aware of possible duplicates because innocent practices could reduce their SEO ranking. This school of thought claims that search engines could split ranking credit between similar pages, which would dilute each page’s strength in the rankings.

You don’t want your site’s cornerstone content to lose steam because of your own attempts to promote it with internal linking strategies and guest posts that reuse blocks of text from the post.

No one can say for certain whether a site that has innocently duplicated its own content might be penalized by Google – and it’s even harder to know what could happen in the next update to the search engine’s algorithm. That’s why most content creators try to eliminate potential duplicates.

How does duplicate content happen?

It’s easy to think you don’t need to worry about duplication if all your content is original, but that’s not necessarily the case. Even websites that are careful not to plagiarize can innocently host content that a search engine might count as duplicate.

Here are some ways that a webmaster might innocently create duplicate content:

  • Sharing an identical “About Me” description on multiple sites
  • Repeating reusable blocks of content word-for-word across multiple pages
  • Running an A/B test on two landing page designs with the same text
  • Creating separate product pages for the same item in different colors
  • Using product descriptions supplied by manufacturers that other resellers might also use
  • Republishing sections of content as a guest blogger (or hosting guest bloggers who do this)
  • Having your material scraped or plagiarized by another website

The way your system organizes information could potentially cause duplication as well. 

  • The same content appears on www, non-www, HTTP, HTTPS, and AMP versions of your site
  • A page might be in your system only once but appear in multiple subdomains or directories
  • Session IDs created to track visitor actions could create multiple URLs for the same content
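
To see how these structural variants multiply, here is a minimal Python sketch that collapses them into one preferred form. It is only an illustration: the choice of HTTPS without www as the preferred form, the placeholder domain, and the session parameter names in SESSION_PARAMS are all assumptions to adjust for your own system.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Hypothetical parameter names; use whatever your own system appends to URLs.
SESSION_PARAMS = {"sessionid", "sid", "phpsessid"}


def normalize_url(url: str) -> str:
    """Collapse common URL variants into one form (HTTPS, no www, no session ID)."""
    parts = urlsplit(url)
    # .hostname is lowercased and excludes any port (ports are ignored for brevity).
    host = (parts.hostname or "").lower()
    if host.startswith("www."):
        host = host[len("www."):]
    # Drop session-tracking parameters but keep every other parameter in order.
    query = [(key, value)
             for key, value in parse_qsl(parts.query, keep_blank_values=True)
             if key.lower() not in SESSION_PARAMS]
    path = parts.path or "/"
    return urlunsplit(("https", host, path, urlencode(query), ""))


variants = [
    "http://yoursite.com/pricing",
    "https://www.yoursite.com/pricing",
    "https://yoursite.com/pricing?sessionid=abc123",
]
unique = {normalize_url(url) for url in variants}
print(len(variants), "URLs point to", len(unique), "distinct page(s)")
```

Running the same function over a crawl export or your server logs is one way to estimate how many addresses a crawler might see for each page.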

How to check for duplicate content

Free tools are available that let you screen for duplicate content. Bookmark these tools and use them as needed:

  • Copyscape tells you if any of your content is being scraped or stolen by other websites.
  • Siteliner analyzes your site for internal duplicate content, with benchmarks against industry averages.
  • SEO Review Tools checks for duplicates within your site and across the web.
  • Grammarly offers a plagiarism checker alongside its legendary grammar tools, so you can ensure that content is original before you publish.

Sometimes these tools will turn up duplicates that are not really an issue. For instance, a testimonial quote may appear across several of your service pages, and you’re not able to edit it.

But in some cases, you may learn that the structure of your site is creating duplicates, or that a scraper is stealing your content. In those cases, you’ll want to take action. 
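
If you’re curious what “appreciably similar” can mean in practice, the sketch below compares two pieces of text by the share of three-word sequences they have in common (a Jaccard similarity over word “shingles”). This is a generic text-comparison technique, not the method used by Google or by any of the tools listed above, and the sample texts are made up for illustration.

```python
def shingles(text: str, size: int = 3) -> set:
    """Return the set of overlapping word sequences ("shingles") in the text."""
    words = text.lower().split()
    if len(words) < size:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def similarity(a: str, b: str) -> float:
    """Jaccard similarity of the two shingle sets: 0.0 (no overlap) to 1.0 (identical)."""
    set_a, set_b = shingles(a), shingles(b)
    if not set_a and not set_b:
        return 1.0  # two empty texts are trivially identical
    return len(set_a & set_b) / len(set_a | set_b)


page_a = "we sell handmade leather wallets in three colors"
page_b = "we sell handmade leather wallets in five sizes"
print(f"similarity: {similarity(page_a, page_b):.2f}")
```

A score close to 1.0 suggests two pages are near copies. Where to draw the line is a judgment call, so treat any threshold you pick as a rough guide rather than a rule.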

Solutions for duplicate issues

If you discover that you have a duplicate content problem – either internally or externally – there are ways to fix the situation. Here are solutions to some of the most common duplicate content issues you are likely to encounter:

Someone is scraping your content

Scrapers often use your RSS feed to repost your content on their own sites. Discovering for the first time that someone else is publishing your content as their own can stir up strong feelings of violation and even rage, but you probably don’t need to worry too much about its impact on your SEO.

Search engines tend to ignore sites that are populated with stolen content. In fact, those are the sites that need to worry most about duplicate content penalties, because scraping is what the algorithms are designed to weed out. Google shouldn’t be sending any of your potential search traffic to these competitors.

Still, there are a few actions you can take to prevent any damage.

Your content is covered by copyright law from the moment you publish it – no paperwork or filing is necessary. If someone steals your work, you can file a request with Google and ask them to delist the plagiarizer.

If you’re using the Yoast SEO plugin, it’s already adding a snippet to your RSS feed that says, “This article first appeared on yourwebsite.com.” This should help both Google crawlers and human visitors find your original content. If you’re not a Yoast user, the RSS Footer plugin will let you add a similar snippet yourself.

You host lookalike pages

You may be hosting pages that are very similar on purpose. For instance, you may use multiple landing pages that describe the same services or products but are slightly altered for different target markets. Some webmasters use a cloning tool to create similar pages; others just copy and paste. Either way, the results can be confusing to a search engine.

To prevent SEO issues, it’s a good idea to change the titles, topic headings, and some of the text on each of these pages to make them unique. Replacing a few of the images is a good idea, too.

If you’re hosting similar versions of the same content, you can let Google know which page you want to be treated as the primary one with a rel=canonical tag, which looks like this:

<link rel="canonical" href="https://www.examplewebsite.com/" />

Add this tag to all the secondary pages and point it to the primary one. 
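
Once the tags are in place, it’s worth confirming that each secondary page really points to the primary one. Below is a minimal sketch that uses Python’s built-in HTML parser to pull the canonical URL out of a page’s markup. The sample page and its URL are hypothetical placeholders, and fetching the HTML from your live site is left out.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Collect the href of every <link rel="canonical"> tag in a page."""

    def __init__(self):
        super().__init__()
        self.canonicals = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr_map = dict(attrs)
        rel_values = (attr_map.get("rel") or "").lower().split()
        if "canonical" in rel_values and attr_map.get("href"):
            self.canonicals.append(attr_map["href"])


def find_canonical(html: str):
    """Return the page's canonical URL, or None if it is missing or ambiguous."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonicals[0] if len(finder.canonicals) == 1 else None


# Hypothetical secondary landing page pointing at its primary version.
secondary_page = """
<html><head>
  <title>Services for small retailers</title>
  <link rel="canonical" href="https://www.yoursite.com/services" />
</head><body>...</body></html>
"""
print(find_canonical(secondary_page))
```

The function returns None when a page has no canonical tag or more than one, since both cases deserve a closer look. Also note that HTML attributes need plain straight quotes: a tag pasted from a word processor with curly quotes will likely not be read as a canonical tag at all.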

You’re republishing content from someone else

There are several cases where you might be republishing text word for word. Some examples include:

  • Republishing press releases
  • Using manufacturer product descriptions
  • Taking calendar event information from other sites
  • Extensively quoting resource material

The best way to avoid a duplicate content penalty in these situations is to rewrite the content. If you’re handling a large amount of content – like hundreds of products from a catalogue – at least replace some of the key words with synonyms and shuffle the paragraph order a bit.

For your most important pages, it’s best to hire a writer who understands SEO and can draw from more than one source to rewrite the copy in original language.

You republish material as guest posts

Republishing your own posts on another site is an easy way to get inbound links, reach a new audience, and establish your expertise.

To make sure Google knows which is the original post, get the page indexed by submitting it through Google Search Console, then wait a few weeks before offering it elsewhere.

You can also ask the site you’re working with to add a rel=canonical tag to the guest post.

An SSL certificate or www prefix creates a duplicate site

If you’ve installed an SSL certificate on your website, you now have two versions of your entire site – one with an HTTP protocol and a new, secure version with an HTTPS protocol. Google can view these as two sites with tremendous overlap.

The same problem can occur with a www prefix. Google can see www.yoursite.com as a subdomain of yoursite.com.

Google’s Webmaster Tools used to let you set a preferred domain for your site, but that feature disappeared in an update a few years ago. The search engine is now set to choose a preferred domain based on environmental cues. Hopefully the developers made sure that it would overlook duplicates in this process.

If you want to take matters into your own hands, Google recommends four methods for setting your preferred domain:

  • Use rel=“canonical” link tag on HTML pages
  • Use rel=“canonical” HTTP header
  • Use a sitemap
  • Use 301 redirects for retired URLs

Using a sitemap is probably the simplest to implement across an entire website. 
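
For the sitemap route, the idea is to list only the version of each URL you want indexed. Many content management systems and SEO plugins generate a sitemap for you, but as a rough sketch of what the file contains, the following Python builds one from a list of preferred URLs. The URLs are placeholders, and the function name build_sitemap is made up for this example.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return sitemap XML (as a string) with one <url><loc> entry per URL."""
    ET.register_namespace("", SITEMAP_NS)  # emit xmlns="..." without a prefix
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for url in urls:
        entry = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(entry, f"{{{SITEMAP_NS}}}loc").text = url
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


# List only the preferred version of each page (here: HTTPS with www).
preferred_urls = [
    "https://www.yoursite.com/",
    "https://www.yoursite.com/services",
]
sitemap_xml = build_sitemap(preferred_urls)
print(sitemap_xml)
```

Save the output as a UTF-8 file (commonly named sitemap.xml) and submit it through Search Console. Leave out the HTTP, non-www, and session-ID variants so the sitemap always points to the same preferred address for each page.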

An ounce of prevention

Once you’ve resolved any duplicate content issues that might be impacting your SEO, follow these practices to prevent any problems from arising in the future:

  • Keep creating original content for your site.
  • Don’t use posts on your site that are only lightly changed versions of someone else’s work.
  • Allow robots to crawl your site so search engines can figure out which content is primary or original.
  • If you use session IDs to create an interactive site, set them up to store data in cookies instead of creating new URLs, which can be read as duplicates.
  • In your discussion settings, turn off comment pagination so that long discussions don’t create duplicate pages.

A sensible step

In general, smart search engine algorithms should be able to tell innocent cases of content duplication from blatant plagiarism and website piracy. Still, it’s good website hygiene to think about how your content might appear when a robot crawler is indexing it, and to prevent any confusion about which pages are meant to be the primary sources.

Now that you understand how unintentional – or intentional – duplicate content can occur, take some time to experiment with locating it. These strategies will help you to eliminate any duplicate content you find and to prevent it from occurring in the first place.

Your SEO rankings will shine.