Duplicate Content in WordPress – What It Is and How to Fix It
For most websites, a large share of traffic comes from search engines. Internet users look up the content they are interested in, and a search engine serves them links to it. It stands to reason, then, that a website operator will do their best to avoid getting penalized by search engines and to get the best search rankings possible. That’s just scratching the surface of search engine optimization (SEO). To a casual user, duplicate content doesn’t sound like a big deal. In terms of SEO best practices, however, duplicate content is anything but innocuous. Not only can it harm a website’s rankings, it can also lead to the website being penalized by a search engine.
But what exactly is duplicate content? How can you make sure a search engine doesn’t penalize you for it? Can it be fixed, and how? We will try to answer these questions, and more, in this article.
Superficially, the term is self-explanatory: it refers to exact or near-exact copies of content which appear in more than one place, be it on the same website or across different websites. If the same content is available at two different URLs, you are looking at duplicate content.
There are a variety of reasons this could happen. An article may have been republished on multiple websites, or the same block of text, such as a quote, may have been used in multiple articles. Ecommerce websites often simply have a lot of similar content spread across a large number of pages: how many useful descriptors can you think of for a T-shirt? You could have cloned a page and saved it by accident, or your CMS could allow the same content to be reached through different URLs. You could also have generated printer-friendly versions of pages for your visitors’ convenience, or someone may have scraped or otherwise stolen your content and published it as their own.
Whatever the reason, though, search engines don’t like duplicate content. And there are consequences to that.
Pages that attract more visits and links tend to rank better in searches. So, if a search engine finds the exact same content in two places in response to a query, it might not be able to tell which one is the original. It may rank them similarly and therefore split the traffic between the two links. This leaves each copy with fewer views and a poorer ranking than a single page would have had.
Sometimes, search engines can figure this out and group all the URLs linking to the same content into one cluster. But it doesn’t always happen.
Furthermore, search engines do not actually search the internet the moment they receive a query. Instead, they do it beforehand, using programs called bots to crawl websites and build an index. Your search results come from that index, rather than from the internet at large. A bot spends only a limited amount of time on any one website, so every duplicate page it crawls is time not spent on new or updated pages. This means that duplicate content can slow down the indexing of the website hosting it and reduce how often it is recrawled, which matters for content that gets updated from time to time. Duplicate content, therefore, makes it less likely that your visitors will get the latest from you.
Finally, search engines may penalize a website if they perceive its duplicate content as SEO spam. Simply put, a search engine might remove from its index a website or page that violates its guidelines. In the specific case of Google, the most frequently used search engine in the world, this means you should avoid creating multiple pages that host duplicate content and avoid overreliance on content scraping.
You could attempt to search for your content and see whether it appears multiple times, but search engine results may be inconsistent, and the whole process is bound to be grueling, especially if you are working with a lot of content. Fortunately, there are solutions.
Online tools such as Siteliner, Copyscape, and DupliChecker’s Plagiarism Checker are useful detectors of copied content and plagiarism. Siteliner scans your website and alerts you to duplicated content hosted on one domain. Copyscape can be used to find copies of a page across the internet, while Plagiarism Checker can look for blocks of text and copies of documents online. In fact, you should consider adding copy and plagiarism checks to your regular content audit.
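To give a feel for what "near-exact copy" can mean in practice, here is a minimal Python sketch that compares pages by the word sequences they share. It is not how any of the tools above work internally; the function names, the three-word window, and the 0.8 threshold are all assumptions chosen for illustration.

```python
import re


def shingles(text: str, size: int = 3) -> set:
    """Return the set of overlapping word sequences ("shingles") in text."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < size:
        # Very short texts become a single shingle (or none if empty).
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def similarity(a: str, b: str, size: int = 3) -> float:
    """Jaccard similarity: shared shingles divided by all distinct shingles."""
    sa, sb = shingles(a, size), shingles(b, size)
    if not sa and not sb:
        return 1.0  # two empty texts are treated as identical
    return len(sa & sb) / len(sa | sb)


def find_near_duplicates(pages: dict, threshold: float = 0.8) -> list:
    """Return (url_a, url_b, score) for every pair scoring at or above threshold.

    `pages` maps a URL to the plain text of that page (hypothetical input).
    """
    urls = sorted(pages)
    pairs = []
    for i, first in enumerate(urls):
        for second in urls[i + 1:]:
            score = similarity(pages[first], pages[second])
            if score >= threshold:
                pairs.append((first, second, score))
    return pairs
```

Comparing every pair of pages gets slow on a large site, so treat this as a way to understand the idea rather than a replacement for a dedicated checker.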
Deleting Your Duplicated Content
It should go without saying that you could solve your duplicate content problem simply by deleting the superfluous pages and getting your website recrawled. We recommend this if you have generated duplicate content by accident. Deleting a page, however, may result in broken links and lost link equity, that is, the ranking value the page has built up through links pointing to it, which can mean a poorer ranking on search engine results pages (SERPs).
In a lot of cases you’ll want to keep all the URLs. Specifically, if there are other websites linking to them, you can avoid creating broken links, and in fact improve your page’s ranking in SERPs, by creating a 301 (permanent) redirect from each duplicate URL to the page you want to keep. This way, all traffic is ascribed to one page for the purposes of ranking, and you don’t lose any equity.
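On a WordPress site, a 301 redirect is usually set up through a redirect plugin or the web server’s configuration rather than hand-written code. Purely to show what the server sends back, here is a minimal Python sketch of a WSGI application that answers requests for duplicate URLs with a 301 status and a Location header. The paths in REDIRECTS are made up for the example.

```python
# Hypothetical map of duplicate paths to the page that should keep the ranking.
REDIRECTS = {
    "/blog/my-post-copy/": "/blog/my-post/",
    "/print/my-post/": "/blog/my-post/",
}


def app(environ, start_response):
    """Minimal WSGI app: permanently redirect known duplicates, serve the rest."""
    path = environ.get("PATH_INFO", "/")
    target = REDIRECTS.get(path)
    if target is not None:
        # 301 tells browsers and crawlers that the move is permanent.
        start_response("301 Moved Permanently", [("Location", target)])
        return [b""]
    start_response("200 OK", [("Content-Type", "text/plain; charset=utf-8")])
    return [f"Content for {path}".encode("utf-8")]
```

To try it locally, the app can be served with make_server from Python’s standard wsgiref.simple_server module.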
Another way to go about it is to set up one of your pages as canonical, using a <link> element with the rel="canonical" attribute between the <head> tags of a page. It tells search engines that the page is a copy of another page, and for ranking purposes it has much the same effect as a 301 redirect, except that visitors can still open the copy. To indicate a canonical page, simply add this code to your copy page:
<link href="ORIGINALURL" rel="canonical" />
You will, of course, need to substitute ORIGINALURL with the URL of your original page. This makes your original page the official, canonical copy, and ascribes equity to it.
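Once the tag is in place, it is worth checking that the copy page really declares the URL you intended. The following Python sketch, using only the standard library, pulls every canonical URL out of a page’s HTML. The class and function names are made up for this example, and how you obtain the HTML is left to you.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Collects the href of every <link rel="canonical"> in a document."""

    def __init__(self):
        super().__init__()
        self.canonicals = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attributes = dict(attrs)
        # rel may hold several space-separated values, so split before checking.
        rel_values = (attributes.get("rel") or "").lower().split()
        if "canonical" in rel_values and attributes.get("href"):
            self.canonicals.append(attributes["href"])


def canonical_urls(html: str) -> list:
    """Return all canonical URLs declared in the given HTML."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonicals
```

A page should normally declare exactly one canonical URL, so an empty list or a list with several different entries is a sign that something needs fixing.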
Nofollow Links And Noindex Tags
You can choose not to delete your content, but instead hide it from search engine crawlers or tell them not to include it in their indices. This way, you keep all your content but only show the search engines what they need to see. Consider noindexing pages that host duplicate content, typically with a robots meta tag in the page’s <head>, or marking links to them as nofollow with the rel="nofollow" attribute. We recommend noindexing tag and category pages, as these are generated by your CMS and you have less direct control over their contents. This is especially useful for ecommerce websites, where filters can be applied in a different order to browse the same products, creating loads of pages with different URLs and exactly the same content.
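In HTML, a noindex instruction usually takes the form of a tag such as <meta name="robots" content="noindex"> in the page’s head, and a nofollow link carries rel="nofollow". As a rough way to confirm that a page has the settings you expect, the Python sketch below reads a page’s HTML and reports both. The class and function names are made up for this example.

```python
from html.parser import HTMLParser


class RobotsAudit(HTMLParser):
    """Records whether a page is noindexed and which links are nofollowed."""

    def __init__(self):
        super().__init__()
        self.noindex = False
        self.nofollow_links = []

    def handle_starttag(self, tag, attrs):
        attributes = {name: (value or "") for name, value in attrs}
        if tag == "meta" and attributes.get("name", "").lower() == "robots":
            # content is a comma-separated list such as "noindex, follow".
            content = attributes.get("content", "")
            directives = [part.strip().lower() for part in content.split(",")]
            if "noindex" in directives:
                self.noindex = True
        elif tag == "a" and "nofollow" in attributes.get("rel", "").lower().split():
            self.nofollow_links.append(attributes.get("href", ""))


def audit(html: str) -> RobotsAudit:
    """Parse the HTML and return the filled-in audit object."""
    result = RobotsAudit()
    result.feed(html)
    return result
```

After calling audit on a tag or category page, the noindex attribute should be True if the page has been noindexed as recommended above.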
Google Search Console
If your Google ranking is what’s mainly bothering you, you can use Google Search Console to influence how Google’s bots treat your website. To avoid www and non-www URLs leading to the same content, you can set your preferred domain version to include or exclude www, where that setting is available, so that Google doesn’t index two URLs for the same content. As you may gather, though, this will only affect Google. Other search engines may still see duplicates of your content, which is why we suggest you try some of the above methods, which work for all search engines.
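Whichever search engine you are targeting, it helps to know which of your URLs are really the same page in disguise. The Python sketch below groups URLs that differ only in the www prefix or in the order of their query parameters, such as ecommerce filters. The normalization rules are assumptions that will not hold for every site, and the function names are made up for this example.

```python
from collections import defaultdict
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def normalize(url: str) -> str:
    """Map URL variants that are assumed to serve the same page to one form."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]  # treat www and non-www as the same site
    # Sort query parameters so that filter order does not matter.
    query = urlencode(sorted(parse_qsl(parts.query, keep_blank_values=True)))
    path = parts.path or "/"
    # Fragments are dropped because they do not change the page served.
    return urlunsplit((parts.scheme.lower(), host, path, query, ""))


def group_variants(urls) -> dict:
    """Group URLs by normalized form; keep only groups with several members."""
    groups = defaultdict(list)
    for url in urls:
        groups[normalize(url)].append(url)
    return {key: members for key, members in groups.items() if len(members) > 1}
```

Each group returned is a candidate for a redirect or a canonical tag pointing at a single preferred URL.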
Duplicate content happens, especially as your website grows and develops. As you generate more and more content, chances grow that you will inadvertently repeat yourself or reuse the same quote. Small amounts of duplicated content are nothing to worry about, and in most cases duplicate content you have created yourself can be easily fixed. Stopping others from duplicating your content is more of a challenge, but there are steps you can take to remedy it. Whatever your duplicate content trouble is, there’s a fix in this article. Think of it as your toolkit and pick the right troubleshooting tool for you.