A guide on duplicate content, URLs, and canonicalization

Duplicate content and duplicate URLs are a high-level SEO topic that many organizations struggle to solve correctly. In this post we’ll look at canonicalization: choosing the right URL, how people commonly get it wrong, and how you should go about fixing it.

WWW vs Non WWW

The most common duplicate URL issue is having your website work for both “http://www.example.com” and “http://example.com”. Your IT department may have told you this is “better” because your website works no matter what someone types, but they are wrong. Without getting too technical, “www” is considered a subdomain, so serving the same content at both addresses creates a duplicate content situation, and Google will simply choose one version to show in its search results. However, this “let Google figure it out” mindset doesn’t work to your advantage.

By having both versions in the search engines you are dividing up your inbound link equity, so instead of having one strong domain you end up with two weaker ones. The correct course of action is to choose one and have the other send a 301 redirect to the browsers and search engines.

Which one do you choose? Look at your existing online and offline marketing materials. Do you use one version more than the other? If so, that’s probably the right one to choose. The other thing you want to consider is your inbound link equity. Does the “www” or the “non-www” version have more links or stronger links? You’ll need to use a tool like Majestic SEO, Moz’s Open Site Explorer, or Ahrefs to help you make a full assessment.
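Once you have picked a version, the redirect logic itself is simple. The following is a minimal Python sketch of the decision, not a production setup (most sites would do this in the web server configuration). It assumes the “www” version was chosen; the names PREFERRED_HOST, ALTERNATE_HOSTS, and host_redirect are made up for illustration.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumption for this sketch: the "www" version is the preferred one.
PREFERRED_HOST = "www.example.com"
ALTERNATE_HOSTS = {"example.com"}


def host_redirect(url):
    """Return (301, target) if the URL uses a non-preferred host, else None."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    if host not in ALTERNATE_HOSTS:
        return None
    # Keep scheme, path, query, and fragment; only the host changes.
    # (Any explicit port in the original URL is dropped in this sketch.)
    target = urlunsplit(
        (parts.scheme, PREFERRED_HOST, parts.path or "/", parts.query, parts.fragment)
    )
    return 301, target


print(host_redirect("http://example.com/page?a=1"))
print(host_redirect("http://www.example.com/page?a=1"))
```

The first call returns a 301 pointing at the “www” URL; the second returns None because the request is already on the preferred host.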

HTTP vs HTTPS

Having your website viewable on both “http://example.com” and “https://example.com” at the same time is another example of duplicate URLs. Each page should be served as either HTTP or HTTPS, not both, or you will have created a condition where the page’s strength is divided into two weaker pages.

That’s not to say that you can’t have a site where some pages are HTTP and others are HTTPS; it means that one page should only exist at one URL, not both. So when a user requests an HTTPS page over HTTP, a 301 redirect should send them to the correct version. It is perfectly acceptable for a site to have most of its pages in HTTP and only the contact, lead generation, and checkout sections in HTTPS.
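As a rough illustration of “one page, one URL” for a mixed site, the Python sketch below decides which scheme a path belongs to and returns a 301 when the request came in on the other one. The path prefixes in SECURE_PREFIXES and the function name scheme_redirect are hypothetical; a real site would list its own secure sections.

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical path prefixes that should only be served over HTTPS.
SECURE_PREFIXES = ("/contact/", "/checkout/")


def scheme_redirect(url):
    """Return (301, target) if the URL uses the wrong scheme for its path, else None."""
    parts = urlsplit(url)
    # str.startswith accepts a tuple, so this is a simple prefix match.
    wanted = "https" if parts.path.startswith(SECURE_PREFIXES) else "http"
    if parts.scheme == wanted:
        return None
    target = urlunsplit((wanted, parts.netloc, parts.path, parts.query, parts.fragment))
    return 301, target


print(scheme_redirect("http://example.com/checkout/cart"))
print(scheme_redirect("https://example.com/about/"))
```

Both calls return a redirect: the checkout page is sent to HTTPS, and the ordinary page is sent back to HTTP, so each page ends up with exactly one address.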

In August of 2014, Google announced they were going to use HTTPS as a ranking signal, which means that sites where the entire site is secure would receive a ranking boost. Before you rush out and change everything, understand this is a pretty dramatic change that is fairly complex to do properly and is not without some downside risk. (At the time this was written, the boost for switching to HTTPS was marginal and not something most sites should do yet.)

Parameters

Parameters in your URL are when your URL ends with a question mark followed by a group of terms and values like this:

http://example.com/?utm_source=email&utm_content=backtoschool&utm_campaign=backtoschool14

Search engines like Google are very hit and miss about how they handle things like that. If you use common, well-established tracking parameters like the ones shown above, or any of the ones used by the Google Campaign Tracker, you can be reasonably confident Google will handle them properly and ignore them. If you start making up your own, like the ones shown below, you can’t count on Google always doing the right thing.

http://example.com/bluewidget/?src=email427&link=newtext
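If you want to audit your own links, a short Python sketch like the one below can list which parameters in a URL fall outside the standard “utm_” campaign set. The function name custom_parameters is made up for this example, and the set only covers the five standard campaign parameters.

```python
from urllib.parse import urlsplit, parse_qsl

# The five standard Google Analytics campaign parameters.
STANDARD_PARAMETERS = {
    "utm_source",
    "utm_medium",
    "utm_campaign",
    "utm_term",
    "utm_content",
}


def custom_parameters(url):
    """Return the names of query parameters that are not standard campaign tags."""
    query = urlsplit(url).query
    pairs = parse_qsl(query, keep_blank_values=True)
    return [name for name, _value in pairs if name not in STANDARD_PARAMETERS]


print(custom_parameters(
    "http://example.com/?utm_source=email&utm_content=backtoschool&utm_campaign=backtoschool14"
))
print(custom_parameters("http://example.com/bluewidget/?src=email427&link=newtext"))
```

The first URL yields an empty list; the second yields the two home-made parameters, “src” and “link”, which are the ones to worry about.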

If you need to push information through the URL and can’t use well-known URL parameters, there are two main options:

Use hash parameters, which follow a “#” instead of a “?”. Be aware Google Analytics won’t track these properly using the standard tracking script.
Use the URL Parameters section in Google’s Webmaster Central Console to tell Google how to handle these parameters. Be aware that Google only takes this as a recommendation and there is no guarantee they will do what you tell them to.
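To make the first option concrete, here is a small Python sketch that moves a query string into the hash portion of a URL and then shows that, once the hash is removed, only the clean page URL is left. The function name query_to_fragment is hypothetical, and this only rewrites the link; reading the values back on the page would need your own script.

```python
from urllib.parse import urlsplit, urlunsplit, urldefrag


def query_to_fragment(url):
    """Move the query string of a URL into its fragment (the part after '#')."""
    parts = urlsplit(url)
    # The query slot is left empty and the old query becomes the fragment.
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", parts.query))


tagged = query_to_fragment("http://example.com/bluewidget/?src=email427&link=newtext")
print(tagged)

# urldefrag splits off the fragment, leaving the address of the page itself.
print(urldefrag(tagged).url)
```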

Another point to keep in mind with parameters is their order. For example, most search engines will consider these two different URLs:

http://example.com/bluewidget/?src=email427&link=newtext
http://example.com/bluewidget/?link=newtext&src=email427

It’s a best practice to always keep your URL parameters in the same order.
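One simple way to guarantee a consistent order is to sort the parameters whenever a link is generated. The Python sketch below does that alphabetically; the function name sort_parameters is made up, and alphabetical order is just one possible convention.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode


def sort_parameters(url):
    """Return the URL with its query parameters in alphabetical order."""
    parts = urlsplit(url)
    pairs = sorted(parse_qsl(parts.query, keep_blank_values=True))
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, urlencode(pairs), parts.fragment)
    )


first = sort_parameters("http://example.com/bluewidget/?src=email427&link=newtext")
second = sort_parameters("http://example.com/bluewidget/?link=newtext&src=email427")
print(first)
print(first == second)
```

Both inputs come out as the same URL, so every link on the site points at a single version.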


File Extensions and Page Types

Another common situation is to have links on your website to both a directory and the same directory with a file name attached. An example of two pages that are the same but have different URLs is shown below:

http://example.com/page/
http://example.com/page/index.jsp

Your developer or IT department may try to convince you these are the same file and it’s nothing to worry about, but in the eyes of a search engine these are two different URLs with identical content. The first thing you want to do is fix any internal and external links that point to different versions. Usually, the best URL to keep is the one without the file extension. If this is a very severe problem, you will want to set up 301 redirects to clean up the issue.
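The redirect rule for this case can be sketched in a few lines of Python. The list of index file names in INDEX_FILES is an assumption (use whatever your platform actually serves), and strip_index_file is a made-up name.

```python
import posixpath
from urllib.parse import urlsplit, urlunsplit

# Assumed default document names; adjust to match your own server.
INDEX_FILES = {"index.jsp", "index.html", "index.htm", "index.php"}


def strip_index_file(url):
    """Return (301, target) pointing at the bare directory URL, else None."""
    parts = urlsplit(url)
    directory, filename = posixpath.split(parts.path)
    if filename.lower() not in INDEX_FILES:
        return None
    # posixpath.split drops the trailing slash, so put it back.
    path = directory if directory.endswith("/") else directory + "/"
    target = urlunsplit((parts.scheme, parts.netloc, path, parts.query, parts.fragment))
    return 301, target


print(strip_index_file("http://example.com/page/index.jsp"))
print(strip_index_file("http://example.com/page/"))
```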

Trailing Slashes

Another fairly common, but much more minor, canonicalization problem is the inconsistent use of trailing slashes, like the examples shown below:

http://example.com/page/
http://example.com/page

Usually this is just caused by sloppy coding, and the easiest way to fix it is to correct all internal and external links to use the same format. In most cases, forcing the trailing slash is the best choice. If the problem is fairly widespread and can’t be easily corrected by cleaning up your code, you’ll want to modify your .htaccess file. Be careful whenever you touch your .htaccess file. You can break your entire website if even one mistake is made.
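Before touching the server configuration, it can help to spell out exactly which URLs should be redirected. This Python sketch forces the trailing slash on page URLs while leaving files alone. Treating any last segment that contains a dot as a file is an assumption of this sketch, and force_trailing_slash is a made-up name.

```python
import posixpath
from urllib.parse import urlsplit, urlunsplit


def force_trailing_slash(url):
    """Return (301, target) with a trailing slash added, or None if not needed."""
    parts = urlsplit(url)
    path = parts.path or "/"
    last_segment = posixpath.basename(path)
    # Skip URLs that already end in a slash, and anything that looks like
    # a file (for example /logo.png), which is an assumption of this sketch.
    if path.endswith("/") or "." in last_segment:
        return None
    target = urlunsplit(
        (parts.scheme, parts.netloc, path + "/", parts.query, parts.fragment)
    )
    return 301, target


print(force_trailing_slash("http://example.com/page"))
print(force_trailing_slash("http://example.com/page/"))
```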

Printer Friendly Pages

Many sites have printer friendly pages that offer a simpler, easier to read layout. However, if the search engines come across both pages, you’ll end up with two weaker pages instead of one stronger one. The easiest way to solve this problem is to block printer friendly pages from being crawled or indexed. You can do this using the “noindex” robots meta tag or by blocking the pages in your robots.txt file. Be careful that you are only blocking the printer friendly page and not the main one (it’s a very easy mistake to make).
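Because blocking the wrong page is such an easy mistake, it is worth testing your robots.txt rules before you publish them. The Python sketch below uses the standard library’s robots.txt parser to check both versions of a page. It assumes, purely for illustration, that the printer friendly copies live under a “/print/” directory.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical layout: printer friendly copies live under /print/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /print/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def is_crawlable(url):
    """Return True if a generic crawler may fetch the URL under these rules."""
    return parser.can_fetch("*", url)


print(is_crawlable("http://example.com/print/page/"))  # printer friendly copy
print(is_crawlable("http://example.com/page/"))        # main page
```

With these rules the printer friendly copy is blocked and the main page is still crawlable, which is the result you want to confirm.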

Mobile Only Pages

In recent years it was pretty common to have a mobile version of your site that existed on a subdomain. While this may have made things easier for your IT department, it didn’t help you with the search engines. Common mobile subdomains include “m.example.com” and “mobile.example.com”.

Again, what usually happens here is that you end up with two weaker pages in the search engines instead of one stronger page. The best way to solve this problem is not to have a separate mobile website and to use a responsive design instead. If you can’t do that, you’ll need to develop a rock solid user detection program that determines whether the users coming to your site are on desktop or mobile devices and directs them to the correct version of the site. If a desktop user requests a mobile page, you’ll want to do a 301 redirect to the desktop version, and vice versa for mobile users requesting the desktop version.
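The skeleton of such a detection program might look like the Python sketch below. It is far from rock solid: the user agent pattern is a deliberately crude assumption, and the host names and the function name device_redirect are hypothetical. It only shows the shape of the redirect decision.

```python
import re
from urllib.parse import urlsplit, urlunsplit

# Hypothetical host names for the two versions of the site.
DESKTOP_HOST = "www.example.com"
MOBILE_HOST = "m.example.com"

# A deliberately crude pattern; real detection needs a maintained device list.
MOBILE_PATTERN = re.compile(r"Mobile|Android|iPhone", re.IGNORECASE)


def device_redirect(url, user_agent):
    """Return (301, target) if the visitor is on the wrong version, else None."""
    parts = urlsplit(url)
    is_mobile = bool(MOBILE_PATTERN.search(user_agent or ""))
    wanted_host = MOBILE_HOST if is_mobile else DESKTOP_HOST
    if parts.hostname == wanted_host:
        return None
    target = urlunsplit(
        (parts.scheme, wanted_host, parts.path, parts.query, parts.fragment)
    )
    return 301, target


print(device_redirect("http://m.example.com/page/", "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"))
print(device_redirect("http://www.example.com/page/", "Mozilla/5.0 (iPhone; CPU iPhone OS 8_0 like Mac OS X)"))
```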

Another option is to block the mobile subdomain from the search engines. However, this will prevent you from appearing in the search results for mobile searches, so it’s not something I would do unless you know something is causing problems.

Canonical Tag

Another way to help the search engines use the correct URL is to give each page a canonical tag. This tag allows you to say which URL you want the search engines to use should they encounter any of the issues mentioned above.

While this can be helpful, it’s important to understand that the search engines only take this as a recommendation and may not use the URL you specify.
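For reference, the canonical tag is a link element with rel="canonical" placed in the head of the page. The Python sketch below builds one and then reads it back out of a page, which is handy for spot checking templates. The names canonical_tag, CanonicalFinder, and find_canonical are made up for this example.

```python
from html import escape
from html.parser import HTMLParser


def canonical_tag(url):
    """Build the canonical link element for the given URL."""
    return '<link rel="canonical" href="%s">' % escape(url, quote=True)


class CanonicalFinder(HTMLParser):
    """Remember the href of the first rel="canonical" link element seen."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        rel = (attributes.get("rel") or "").lower()
        if tag == "link" and rel == "canonical" and self.canonical is None:
            self.canonical = attributes.get("href")


def find_canonical(html):
    """Return the canonical URL declared in an HTML document, or None."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonical


page = "<html><head>" + canonical_tag("http://example.com/page/") + "</head><body></body></html>"
print(canonical_tag("http://example.com/page/"))
print(find_canonical(page))
```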

Conclusion

In this post we’ve addressed the most common duplicate URL and canonicalization conditions and given you some recommendations on how to fix the issues so they don’t become bigger problems in the future. By making sure that content exists on only one URL as defined by the search engines, and not your developer or IT department, you will get the best results. When you remove the guesswork and the ability for search engines to make the wrong decision, from a technical perspective, your website will be at its strongest.


Michael Gray

With over 20 years of SEO and Internet Marketing experience, Michael Gray has helped companies develop and implement effective online campaigns. His unique approach regarding SEO strategies has brought high levels of success for many eCommerce and informational websites. Michael has been a speaker at many SEO conferences including SMX, PubCon and Search Engine Strategies.
