Motivation

Unique - Courtesy Irina Souiki


Google and the other search engines are trying to convince webmasters to use so-called Canonical URLs. Canonical URLs help search engines recognize duplicated content that is reachable through different URLs on the same domain. For example:

all lead to the same information: the entry point to this portal. A person does not care what they typed as long as they get the information they expect, but a search engine can get confused: it sees the identical result, from the same domain, under three different URLs. Which one should it index?

Canonical URL Explained


The Canonical URL is just a simple “link tag” added to the <head> section of your page. This link lets the owner of the page tell search engines which URL is the preferred one for that page.

<link rel="canonical" href="http://www.avhumboldt.net/index.php" />

The above example is borrowed from the official Google Webmaster Central Blog, and more information on the canonical URL can be found in the article Specify your canonical. Although the Google blog that announces this article does not include a canonical link itself (funny, huh?), it is good practice to have this feature on your website.
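For readers not using a framework, here is a minimal sketch of emitting such a tag from plain PHP. The function name canonicalLinkTag() is hypothetical; the only point it makes is that the href should be HTML-escaped before it is written into the page.

```php
<?php
// Hypothetical helper: renders a canonical <link> tag for the given URL.
// htmlspecialchars() with ENT_QUOTES escapes quotes and angle brackets,
// so the URL cannot break out of the href attribute.
function canonicalLinkTag($href)
{
    return '<link rel="canonical" href="'
        . htmlspecialchars($href, ENT_QUOTES, 'UTF-8')
        . '" />';
}

echo canonicalLinkTag('http://www.avhumboldt.net/index.php'), "\n";
```

For an ordinary URL like the one above, the output is exactly the tag shown in the example.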

Implementation

If your project runs on Zend Framework, it is easy to create a canonical link by retrieving the controller, action and parameters used in the current URL. (If you have no experience with ZF, the rest of this post may look like Chinese to you.) If we take a closer look at Canonical URLs, we realize that the only thing we really have to decide is which domain should be used for the content. The rest of the URL (the parameter part) is the same as in the address used to reach the page; in some cases we just need to remove some arguments (like color/red/). In other words, if my URL is

http://www.avhumboldt.net/humboldt/publications/books/did/25/title/Aspects-of-Nature

or

http://avhumboldt.net/humboldt/publications/books/did/25/title/Aspects-of-Nature

I have to decide, via the canonical link, which one is my domain of choice and tell the search engines to use that one (the rest is handled by the search engine; we don't have to care about it anymore). So all we need to do is insert a

<link rel="canonical" href="http://www.avhumboldt.net/humboldt/publications/books/did/25/title/Aspects-of-Nature" />

in the head of the page. The href in the canonical link above can be divided into two parts: the domain name (together with the subdirectory where I have placed my project) and the parameters that determine the content:

  1. Domain Name (+ subdirectory): www.avhumboldt.net/humboldt
  2. Parameters: /publications/books/did/25/title/Aspects-of-Nature
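The split into a preferred domain and an unchanged parameter part can be sketched in plain PHP, independent of Zend Framework. The function toCanonical() is hypothetical and assumes an http:// scheme and a host you pass in; it shows that both addresses above collapse to the same canonical URL.

```php
<?php
// Sketch: rebuild a URL on the preferred host while keeping its path
// (subdirectory + ZF parameters). The scheme, host, query string and
// fragment of the incoming URL are discarded. $preferredHost is an
// assumption: whichever host you have chosen as canonical.
function toCanonical($url, $preferredHost)
{
    $path = parse_url($url, PHP_URL_PATH);
    // parse_url() returns null when there is no path, false on bad input.
    if (!is_string($path) || $path === '') {
        $path = '/';
    }
    return 'http://' . $preferredHost . $path;
}

$variants = array(
    'http://www.avhumboldt.net/humboldt/publications/books/did/25/title/Aspects-of-Nature',
    'http://avhumboldt.net/humboldt/publications/books/did/25/title/Aspects-of-Nature',
);
foreach ($variants as $url) {
    echo toCanonical($url, 'www.avhumboldt.net'), "\n";
}
```

Both variants print the www address. Dropping the query string is a deliberate choice in this sketch; removing path arguments such as color/red/ would need extra, application-specific logic.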

Since the parameters are always the same, for the canonical URL we only need to pick the domain name we prefer and append the parameters to it. For those familiar with the Zend Framework MVC, the parameter part of the URL is composed as /controller/action/parameters+ (more information can be found in the Zend Framework documentation). A quick solution would be to use:

'http://www.avhumboldt.net' . $_SERVER['REQUEST_URI'];

as the Canonical Link. $_SERVER['REQUEST_URI'] returns the URI that was used to access the page (the path, including the project subdirectory, plus any query string). Although this looks easy, I do not recommend using it as it is: the value is supplied by the client, so writing it into the page unescaped opens the door to cross-site scripting, and any extra query arguments end up in the canonical URL as well. (There are plenty of posts and resources out there about parameter security.) A better approach is to rebuild the parameter part from what the application actually recognized, which is easy with Zend Framework's request object. Within a controller you can call:

$this->getRequest()->getControllerName() – to return the controller name
$this->getRequest()->getActionName() – to return the action name

and

$this->getRequest()->getParams() – to return an array with the parameters used in the URL

So we can get the controller and action straightforwardly by calling getControllerName() and getActionName(). We also need a small function that collects all the parameter/value pairs returned by getRequest()->getParams() into a single string.

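public function canonicalUrl()
{
    $request = Zend_Controller_Front::getInstance()->getRequest();
    // Keep only letters and digits; "true" also allows whitespace.
    $filter = new Zend_Filter_Alnum(true);
    $params = array();
    foreach ($request->getParams() as $key => $value) {
        // Routing keys are already part of the URL as /controller/action.
        if (in_array($key, array('controller', 'action', 'module'), true)) {
            continue;
        }
        // Arrays and objects cannot be written as a single /key/value pair.
        if (!is_scalar($value)) {
            continue;
        }
        // Percent-encode the filtered value so spaces are valid in the URL.
        $params[] = $key . '/' . rawurlencode($filter->filter($value));
    }
    return implode('/', $params);
}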
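Because canonicalUrl() needs a running Zend Framework request, here is a framework-free sketch of the same loop that can be tried on its own. The name canonicalParamPath() is hypothetical, the array stands in for what getParams() returns, and the regular expression is only an approximation of Zend_Filter_Alnum(true), not its actual implementation.

```php
<?php
// Hypothetical framework-free version of canonicalUrl(): $params plays the
// role of the array returned by $request->getParams().
function canonicalParamPath(array $params)
{
    $reserved = array('controller', 'action', 'module');
    $parts = array();
    foreach ($params as $key => $value) {
        // Routing keys are already part of /controller/action; arrays and
        // objects cannot be written as a single /key/value pair.
        if (in_array($key, $reserved, true) || !is_scalar($value)) {
            continue;
        }
        // Approximation of Zend_Filter_Alnum(true): keep Unicode letters,
        // digits and whitespace, then percent-encode the result.
        $clean = (string) preg_replace('/[^\p{L}\p{N}\s]/u', '', (string) $value);
        $parts[] = rawurlencode($key) . '/' . rawurlencode($clean);
    }
    return implode('/', $parts);
}

echo canonicalParamPath(array(
    'controller' => 'publications',
    'action'     => 'books',
    'did'        => '25',
    'title'      => 'Aspects of Nature',
)), "\n";
```

A filter of this kind also removes hyphens, so a title passed as Aspects-of-Nature comes out as AspectsofNature and the canonical path no longer matches the URL the visitor used; if your slugs contain hyphens, add - to the allowed characters.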

Once we have all the parameters ordered in a varname/value/varname2/value… fashion, all we need to do is combine them into a Canonical URL, and the best way to do that is to create a view variable in the preDispatch() method of my controller.
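A minimal sketch of what preDispatch() has to assemble, assuming the base URL http://www.avhumboldt.net/humboldt chosen above. The helper canonicalHref() is hypothetical; it joins the preferred base URL (domain plus subdirectory), the controller and action names, and the parameter path returned by canonicalUrl().

```php
<?php
// Hypothetical helper: joins the preferred base URL, the controller and
// action names and the parameter path into one canonical href, with
// exactly one slash between the pieces.
function canonicalHref($base, $controller, $action, $paramPath)
{
    $segments = array(rtrim($base, '/'), $controller, $action);
    $paramPath = trim($paramPath, '/');
    if ($paramPath !== '') {
        $segments[] = $paramPath;
    }
    return implode('/', $segments);
}

echo canonicalHref(
    'http://www.avhumboldt.net/humboldt',
    'publications',
    'books',
    'did/25/title/Aspects-of-Nature'
), "\n";
```

Inside the controller, preDispatch() could then store the result for the view scripts, for example $this->view->canonicUrl = canonicalHref('http://www.avhumboldt.net/humboldt', $this->getRequest()->getControllerName(), $this->getRequest()->getActionName(), $this->canonicalUrl()); where only preDispatch(), $this->view and the request methods come from Zend_Controller_Action and the helper is an assumption.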

The view variable is created for all the actions of the controller and can be accessed from any of its view scripts (the .phtml files under the views/scripts/controllername folder). Inserting it into the page is as easy as calling:

<?php $this->headLink(array('rel' => 'canonical', 'href' => $this->canonicUrl), 'PREPEND'); ?>

The code above goes into the Zend Framework view scripts and registers a link tag that is then rendered from the main layout. In the main layout script (layout.phtml by default with Zend_Layout) just add:

<?php echo "\n" . $this->headLink() . "\n"; ?>

somewhere in the <head> section. You should now have canonical URLs on every page generated by your controller. A better way would be to create a plugin that adds canonical URLs for every controller, but this is what I have needed so far.