How we solved Magento core_url issues once and for all

website speedToday, we tell you the story of how we solved our Magento core_url issues, which might be very familiar and similar to you, since you’ve landed here.

If you have been working with Magento for long enough, chances are that sooner than later you’d have learnt that there are a two fundamental steps to follow if you are having issues:

  • It’s always the cache! Whenever something doesn’t work, clear the cache, and I mean, all of them! And the problem will (or not) go away, it’s like rebooting in Windows!
  • If that doesn’t work, run a full reindex, and repeat the above step 🙂

That’s why it’s is not only recommended, but also it’s very common to run reindexes daily, and most likely even more often, whenever we want to immediately reflect some changes in the front-end, such a new stock, new products, or new category-products assignments.

Reindexing often shouldn’t be a problem for small stores, but the more products you have, the more probable is that reindexing is going to cause you trouble due the long time that it takes, the amount of resources that it needs, and so on, as well as the fact that they need to run overnight to avoid slowing down your website dramatically.

Whenever a full site reindex is taking a long time, 99% of the cases “catalog_url” reindex is the one spending most of the time, specially if you have a large catalog of products and or if you have a multi-store setup.

Unfortunately or otherwise, that was our case at our bikes shop. We had several thousand products and, even though a big part of them are currently disabled, the reindex was taking far too long, and the table core_url_rewrite was growing exponentially, causing other problems such as slowing down the backup process, or preventing the sitemap generation from running, due big SQL queries taking far too long to run.

Like in most existing online stores, all of these could have been easily avoided if it had been addressed properly from the beginning, but you know… could’ve, would’ve, should’ve… but didn’t!

So, there we were, with a core_url_rewrite table with over 210k rows, and expanding beyond maintainability. We had only two chances:

  • Upgrade the servers and ignore/delay the problem until we reach to this same stage in a couple of months time.
  • Tackle the problem and invest the time needed to find out what’s going on, and sort it out properly

We love challenges, and the first option wasn’t even considered, as it wasn’t a solution really, so we got our hands on it and started to look into the possible causes.

We identified two major problems looking into core_url_rewrite table:

  • There were several thousand rows with URLs about products that were disabled or not visible individually. We had already installed Dn’D module that skips the reindex of disabled and not visible products, speeding up the reindex by keeping them out of the table, but there “leftovers” from the past, before the module was installed. If you don’t have Dn’D Patch Index Url module installed, please take a few minutes and install it right know, it’s a must have for any Magento store.
  • We noticed a very interesting issue: Every time we run a catalog_url reindex, the number of rows of the core_url_rewrite table was increased, even when there were no recent changes on the catalog. So, basically, the table was growing and growing each time without any obvious reason.

In order to address the first problem, we built two queries to identify the rows corresponding to:

Disabled products:

SELECT count(*) FROM core_url_rewrite WHERE product_id IN 
(SELECT entity_id FROM catalog_product_entity_int WHERE attribute_id = (SELECT attribute_id FROM eav_attribute WHERE attribute_code ='status') AND VALUE = 2 AND entity_type_id = 4);

Roughly 2k rows.

Then, we did the same to identify the products with Visibility set to Not Visible Individually:

SELECT count(*) FROM core_url_rewrite WHERE product_id IN 
(SELECT entity_id FROM catalog_product_entity_int WHERE attribute_id = (SELECT attribute_id FROM eav_attribute WHERE attribute_code ='visibility') AND VALUE = 1 AND entity_type_id = 4);

Nearly 8k rows.

Since the above queries return rows that correspond to products not being displayed on the front-end (because they are disabled of not visible in the catalogue), they can safely be removed without any impact on the front-end/SEO.

The infinite growing “duplicated” redirects on core_url_rewrite

This is a very interesting issue which is widely known among Magento developers. There is a good thread on StackExchange discussing it, where you can find responses of all kinds. In the above link, people have done a great job analysing the problem, and some even propose (partial) decent solutions to it, but given the complexity, many people have also given up to find a fix. In fact, Alan Storm himself, one of the most popular Magento developers advises so, and his answer is the second most voted, which by the way is pretty disappointing to say the least.

In our case, it was unacceptable to leave it as it was, because we were aware of the issue, and we had been delaying it for a while. Eventually, it was causing us serious issues, and it was having a pretty bad impact in the overall performance of our store, so it had to be addressed.

As we mentioned previously, every time we reindex the “catalog_url”, the amount of rows on the table core_url_rewrite was increasing, and we were trying to figure out why. It seems that the problem is caused when there are two or more products/categories with the same URL key. During the reindex, Magento checks whether the url key is unique, if it’s not, it appends the entity_id of the product to the URL to make it unique. The problem is that next time the reindex runs, it will do the same process, but the new URL (url_key + entity_id) already exists, so Magento tries increasing the numeric part of the url key (product’s entity id) until it finds a unused URL, and it keeps creating crazy redirects on each run.

One way to avoid/overcame this issue is making sure that the url_keys of the products are unique. We tested that approach in first place, running a query that modified the url_key (attribute of the product) of each product that has duplicates, appending the entity_id followed by a dash. Ie. product with id 123 and url_key “my_product” would become “my_product-123”. This way when the reindex runs, all URLs would be unique, and Magento wouldn’t have to do “his thing”. However, this would be messing up with existing URL redirects in core_rewrite_url and potentially would lead to broken links. Also, it wouldn’t fully resolve the issue, as nothing would be preventing new products from being created/imported with duplicated URL keys in the future, so we had to find a better solution.

Since many people had been investigating and debugging this problem, we decided to have a look at the proposed solutions on StackExchange and tested some of them. The solution proposed by @Simon was the one that made more sense for us, and it seemed to do the job. After applying the code changes, reindexing wouldn’t increase the number of rows of the table. However, the proposed solution was overriding the core, which is something we never do unless it’s something unavoidable. It’s a nice solution, wrapped  as an “unofficial Magento patch”, but we didn’t want to take the risk of having future upgrade/patching issues. So, we decided to create a new Magento extension, which would override the model Mage/Catalog/Model/Url.php, and we put Simon’s fix there, leaving the core untouched. It all seemed to work fine, and the table core_url_rewrite stabilised so the first step was done, now it was time for the tricky part, the remaining “leftovers”.

We had 210k rows (well, actually 200k after the previous clean-up of disabled/not visible products) on the core_url_rewrite table, but as we mentioned previously, we have only a couple of thousand products enabled in our catalog, so presumably many rows were not really “needed”, and most likely the “oversize” of the table was related to the infinite rewrites/redirects.

Looking at the latest entries of the core_url_rewrite, there was a clear pattern on the redirects:

Product_idRequest_pathTarget Path
1234product-url-key-1234.htmlproduct-url-key-1235.html
12product-url-key-13.htmlproduct-url-key-14.html

Both, request_path and target_path look the same except for the number prepended to the url_key. So, we came up with this query to identify all the redirects:

SELECT count(*)  FROM core_url_rewrite WHERE product_id IS NOT NULL AND LEFT(request_path, length(`request_path`) - 5 - length(product_id)) = LEFT(target_path, length(`target_path`) - 5 - length(product_id)) AND
is_system = 0

Result: 195k rows!!! Unbelievable, we thought those rewrites were bad, but we didn’t expect that they would be THAT BAD!

Right, so we wanted to get rid of all that “junk” redirects, but we couldn’t just delete them. Well, technically we could, but that could have a very bad SEO impact, it would generate many 404 errors for products already indexed in the search engines, and there would be existing links to the products in newsletters, social media, etc. So we needed an strategy to fix it.

After a lot of testing and back and forth, eventually we did the following:

  • Create a table core_url_rewrite_tmp which contains only the redirects (can be done with one query, using create table as select):
    SELECT category_id, product_id, store_id, request_path, target_path  FROM core_url_rewrite WHERE product_id IS NOT NULL AND LEFT(request_path, length(`request_path`) - 5 - length(product_id)) = LEFT(target_path, length(`target_path`) - 5 - length(product_id));
  • Override Mage_Cms_IndexController::noRouteAction to capture 404 events
  • Then, whenever a 404 happens, check if the request matches the below pattern: '/\\/(.*)-\\d+.html/'
    • If it doesn’t, let Magento handle the 404 normally
    • If it does, query our new table and search for matches on the current store, fetching the product_id or category_id.
      • If there are no matches, let Magento handle the 404 normally
      • If there are matches, redirect the user to the corresponding product/category (301 – permanent redirect) URL

After having this kind of fallback mechanism, the redirect entries can be removed from the core_url_rewrite table.

Then, the core_url_rewrite table is completely clean and light, which, being one of the critical and most used and joined tables of Magento,  has a shocking impact in the overall performance of the site, not to mention the massive speed increase on heavy tasks such as sitemap generation, core url reindex, and so on.

In our case, as a result of this whole clean-up, the core_url_rewrite now has 5k rows. We still have to keep the “core_url_rewrite_tmp”, but that table is queried in very rare occasions (vs the frequency of core_url_rewrite), when an “old URL” is requested.

Last but not least, since we are sending all the requests to the right destination with a 301 redirect, within a couple of months, all the search engines would have updated their indexes and the table can be “safely” removed.

Note that there might still be old links in emails, social media, etc., but the fallback can then be amended to redirect users either to the catalog search (searching for the requested URI), or it can do a best-match on the core_url_rewrite and redirect the user to  the closest match. This won’t be 100% accurate as the products with the duplicated URL keys would have similar url keys/names. However, on this fast moving age, after 6 months those links are highly unlikely to be hit.

We deleted our table after 3 months, and two weeks later we haven’t found any related 404 in our reports.

Finally, our zipped database (excluding orders) dropped from 80Mb to 5Mb, all our sitemap generation and reindex issues were resolved, and our website overall speed/performance improved noticeably, so we couldn’t be happier 🙂

Magento 2 Composer Module Update

People often asks us how to do a Magento 2 Composer Module Update for their extensions whenever they want to release a new version of their modules.

There are different ways of doing it, we will show you how to do with with git tags and a semantic versioning strategy for the releases, which we thing is the most professional and reliable way.

We will continue with the example of our previous post where we showed you how to create a hello world extension in Magento 2 with composer.

For the newcomers to composer, et me quickly give you an brief explanation of how does it work, more specifically, how we use it together with Magento 2 while developing new modules:

  • To install a new extension developed by ourselves (or by a third party): We normally use “composer require vendor-name/extension-name” from the root folder of Magento 2. Which, by default (depending on the value we’ve set for “minimum-stability” and “prefer-stable” in our composer.json file) will attempt to install the latest version of that extension that meets its dependencies/requirements within your system/application. This means that, for example, if you are using php 5.5.x and the latest version of the extension that you are installing has a dependency (composer.json “require” section) like:
    "php":">=5.6"

    Then, the latest version of the extension cannot be installed on your system. Composer will firstly try to install a newer version of the dependency, and failing that, will search backwards previous versions of the extension until it finds the most recent version that works with php 5.5. Eventually, it will fail if it cannot find any version installable (that meets all the dependencies) in your system . Normally, you just run the command and everything runs smoothly, composer will take care of everything from you, but sometimes it will fail with an error like below:

    composer require lumbrales-software/magento2-first-module 
    Using version dev-master for lumbrales-software/magento2-first-module     
    ./composer.json has been updated
    Loading composer repositories with package information
    Updating dependencies (including require-dev)                             
    Your requirements could not be resolved to an installable set of packages.
    
      Problem 1
        - magento/project-community-edition dev-master requires magento/product-community-edition 2.0.2 -> no matching package found.
        - magento/project-community-edition 2.0.2 requires magento/product-community-edition 2.0.2 ->  no matching package found.
        - magento/project-community-edition 2.0.1 requires magento/product-community-edition 2.0.1 ->  no matching package found.
        - magento/project-community-edition 2.0.0-rc2 requires magento/product-community-edition 2.0.0-rc2 ->  no matching package found.
    

    This can sometimes require a system update, or like in the above example, adding a custom repository where composer can find the required package.

  • To release a new version (code changes) of your extension: This is the interesting part. People are usually confused and run “composer update” and expect it to do the job, but the problem is that the above command can often lead to undesired and unexpected results, since it will try to update every single package, which is not always (almost never) what we want. Normally, you want to update an specific package, or maybe even a whole vendor, but not everything at once. Therefore, we recommend to tell composer what to do, like in the example of our previous post:
    # Update an specific package
    composer update lumbrales-software/magento2-first-module
    # Update all packages of the vendor
    composer update lumbrales-software/*
    

Having said that, now let’s focus on how to release a new version of our extension.

The process is actually simpler than it might seem, all you have to do is implement your new functionality, code changes, whatsoever, using whichever git branching strategy you want, and once you are ready, you can do the following steps:

  • Merge your changes into the master branch
  • Choose a version for the update/release, following the guidelines of semantic versioning:
    Given a version number MAJOR.MINOR.PATCH, increment the:

    1. MAJOR version when you make incompatible API changes,
    2. MINOR version when you add functionality in a backwards-compatible manner, and
    3. PATCH version when you make backwards-compatible bug fixes.

    Additional labels for pre-release and build metadata are available as extensions to the MAJOR.MINOR.PATCH format.
    Note: If you want to release a stable version, it has to be at least 1.0.0, otherwise it will be considered unstable by composer and might lead to issues while installing it.

  • Update your composer.json with the new version (ie. “version”: “1.0.1”,) and commit the changes of the file.
  • Create a git tag with the new version:
    git tag 1.0.1
  • Push the changes and the tag:
    git push origin master
    git push --tag
  • That’s it! Now, you should be able to run composer update your-vendor/your-module and it will fetch your changes:
composer update lumbrales-software/magento2-first-module -v
Loading composer repositories with package information
Reading composer.json of lumbrales-software/magento2-first-module (1.0.2)
Importing tag 1.0.2 (1.0.2.0)
Reading composer.json of lumbrales-software/magento2-first-module (1.0.1)
Importing tag 1.0.1 (1.0.1.0)
Reading composer.json of lumbrales-software/magento2-first-module (master)
Importing branch master (dev-master)
Updating dependencies (including require-dev)
  - Removing lumbrales-software/magento2-first-module (1.0.1)
  - Installing lumbrales-software/magento2-first-module (1.0.2)
    Downloading: 100%         
    Extracting archive

Writing lock file
Generating autoload files

Note: Sometimes composer seems to keep cached the repository changes and won’t fetch your new code changes, I found a workaround for that by clearing the cache with the below command (run it as your own risk):

rm -fr ~/.composer/cache/*

That’s about it, you now should be able to release code changes nicely, and for every new code changes you just need to bump the version accordingly and remember to push both the version update in the composer.json and the git tag.

Good luck!

Let us know if you have any issues with it and we’ll try to assist you.

Speeding up Magento (evil extensions)

Speeding up magentoSpeeding up Magento is the headache of many shop owners, which usually contact us frustrated after seeing how their page loads are way above the market average, causing a really high bounce rate.

Last week, we were tasked to investigate some spikes in a website that were even temporarily bringing the server down in some cases. Thankfully, our customer had Newrelic, which was very helpful pointing us to the right direction.

Tinkering around within Newrelic, we noticed quite a few issues:

  • It was a multi-website shop, with four different domains. One of them had the “shell” folder had directory listing enabled, and some malware/malicious attempt/bot had been crawling it and calling multiple times a php script that performs heavy operations, and was taking nearly 30 minutes to finish.
  • There quite a few 404 entries in newrelic with a load time suspiciously high.
  • There were also some other pages with an abnormally high load time, and they were all coming from the same user agent, also the 404 ones, which seemed to be some sort of SQL injection attempts by the looks of the URLs:
/checkout/-1%22%20OR%203%2b936-936-1%3d0%2b0%2b0%2b1%20--%20/add/uenc/aHR0cDovL3Nob3AuZ2Zpbml0eS5uZXQvb3RoZXItc3R1ZmYuaHRtbA,,/product/13953/form_key/5kBBRqrthVBCRE1e

/category-name/subcategory/shop%25'%20AND%202%2b1-1-1%3d0%2b0%2b0%2b1%20AND%20'FAEs'%21%3d'FAEs%25/custom-filter/custom-filter-2.html

Which url-decoded looks like:

/checkout/-1" OR 3+936-936-1=0+0+0+1 -- /add/uenc/aHR0cDovL3Nob3AuZ2Zpbml0eS5uZXQvb3RoZXItc3R1ZmYuaHRtbA,,/product/13953/form_key/5kBBRqrthVBCRE1e
/category-name/subcategory/shop%' AND 2+1-1-1=0+0+0+1 AND 'FAEs'!='FAEs%/custom-filter/custom-filter-2.html

Then, we started looking at the stack traces, and spotted a third party extension which was the slowest part of execution flow:

Slowest componentsCountDuration%
QuBit_UniversalVariable_Model_Uv::_getLineItems111,100 ms31%

A quick look at the code confirmed our suspicions, we had stumbled upon with another evil magento extension.

The third party extension had an observer on the basket that for each item, was loading the product, and for each product, it was loading each of its assigned categories, with the purpose of generating a JSON file for tracking purposes. It was running “fine” with low traffic, and with a low amount of items in the basket, but the load time was growing exponentially was new products were added to the basket, with together with a malicious behaviour adding over 400 products, was capable of bringing the server down. Interestingly, the tracking code was only required for the checkout cart page, but it was actually running on every single page, which is why the 404 and other non-cached pages were affected.

We deactivated the quotes with over 200 products in the basket (ie. not real users), did a few amendments in the code and deployed the changes. Below you can see the result:

Speeding up magento

A massive drop in the average response time, and guess what? the spikes problems disappeared 🙂

Once again, please be aware of the risks of installing third party extensions in your store, as they can really hinder the user experience of your customers and damage your business.

If you are facing similar problems with your store, do not hesitate to get in touch.

7 ways to speed up Magento on micro-instances

Magento Speed upThere are plenty of posts showing ways to speed up Magento, and this is yet another one, but we’ll try to come up with some original techniques that hopefully will help you to improve the performance of your store.

It’s worth mentioning that all the below tips have been tested thoroughly, and we are currently taking advantage of them on many production websites. Today, we’ll focus on how we did it for our motorbikes shop.

The infrastructure where store is currently running is a skinny micro-instance with 0.6GB of RAM, shared CPU, and 10GB of hard disk. We’ve been constantly tunning all the settings trying to get the best performance out of it, and here you have the major changes that made a difference:

  1. Cache:
    1. Start using FPC (Full Page Cache) if you aren’t already, this is definitely the most important thing and should be your priority. We recommend Varnish + Turpentine even though it might not be the easiest to install & configure, depending on the amount of customizations of your store. Having FPC is essential to be able to scale and handle multiple simultaneous requests, specially important in a machine where you’ve a very limited amount of RAM and CPU.
    2. Start using the PHP extension APC (or OPCACHE + APCU) and use APC for the Magento config cache (you can enable it on the local.xml). This will definitely increase the speed of your site for the non-fpc-cached pages.
  2. Memcached for sessions: Many other people use Redis, and that might be even better, but as long as you don’t use files or database, you are doing just fine. You can change this setting in your local.xml. If you don’t use it for anything else, you don’t need more than 16/32MB until you have a serious amount of users.
  3. Robots.txt & Blocking bots: When you start running a new website, (I completely made up that number) probably around 90% of your traffic is just bots, and even when you grow, it will still represent a very high amount of the number of requests of your site. Therefore our advise is to spend some time carefully tunning your robots.txt to prevent bots from crawling pages or images or whatever you don’t want to be crawled. Moreover, there are just too many bots out there and unless you are really interested in them all to crawl your site, my advise is to block them either by the robots.txt or by any other mechanism, such as nginx/apache config by looking at the user agent. Depending on how aggressive you are willing to be against unwanted bots, you can drastically reduce the number of requests of your server and save all those resources to delivery a much better user experience (aka speed) to your actual customers.
  4. Reindex optimizations: This is one of the most painful tasks, and it can only get worse as your catalog of products increase. A good start is usually installing this url reindexer patch extension, which basically skips the disabled or not visible products from the catalog url reindex. This has proven to reduce the reindex time from nearly an hour to just a few minutes, but it will depend a lot on your catalog of products.
  5. MySQL tunning: A proper configuration of your mysql instance can have a huge impact on your site’s performance. Give mysqltunner a go, it will most likely give you useful advise, but once again, it all depends on each custom setup. One setting that works well for one store, might not work for another one. Always monitor your config changes and if something doesn’t work, revert the changes.
  6. Varnish/Turpentine tunning: After installing and configuring varnish+turpentine, we noticed a boost in the capacity and performance of the site, but we noticed as well that, due the CSRF protection (aka formkeys) added in the latest versions of Magento, Turpentine’s FPC stopped working as it used to do. Now, the default config will never serve a cached page to a new user, as it has to be processed by Magento in order to generate a valid session and be able to get a valid formkey through an ESI. This is a very good solution to preserve the FPC functionality and the CSRF protection, but has the downside of always giving a non-cached page to the first time visitors, which if it’s not quick enough, can make de difference between bouncing or not. Therefore, we recommend enabling the setting “Use VCL Fix” in Turpentine, but that would imply disabling the CSRF protection as well, which you should do at your own risk. Be very careful doing this as this can potentially break the add to cart and login in your store. Please, note that after enabling that setting you might still notice that the first time visitor still gets served a non-cached page, if thats the case, check this pull request and make sure you have those changes in your VLC file.
  7. CDN: This one is pretty obvious, but somehow many people have the impression that having a CND is very complex and/or costly. We’ve tried a few of them and the best one we’ve found so far is KeyCDN (register through this link and you’ll get 10$ credit month trial!), mostly because of the price, but also because of the performance and the features.

Conclusion: The above list is a small subset of the many ways to speed up Magento, and it should be just the beginning of all the things that you should do in order to make your store quicker and better. There is an endless list of things that you can try, but this list highlights the changes that had a bigger impact in our micro-instance, and hopefully will be useful for not only low resource machines, but also for the majority of the Magento store setups.

If you have any questions and/or need any help, do not hesitate to get in touch.

How-To Install Magento 2 on OpenShift – II

Magento 2

Magento 2

Last week we posted a guide to Install Magento 2 on OpenShift. It was just an experiment, and we didn’t intend to do anything else after we showed the proof of concept. However, since the guide has become very popular, we’ve decided to dig out a bit more into this and see if there was a less hacky way of install Magento 2 in OpenShift, and there  we are glad to say that we’ve managed to get it up and running with MySQL 5.7, by using another community cartridge.

It’s been very challenging due the hard disk limitation of 1GB, and we’ve had to delete everything that wasn’t absolutely necessary, as well as give up on performance in MySQL settings to squeeze a few more extra space, but eventually it worth the effort!

The steps to spot a new instance are almost the same as in our previous post, but this time we’ve simplified them even more within a simple bash script.

All you have to do is copy and paste this in a .sh file, set your API KEY & SECRET and run it from your favourite terminal (sh filename.sh). Remember that you need to have rhc installed and your Magento API Keys (check our previous post if you need any help with that):

#!/bin/bash

#CHANGE THIS SETTINGS
myapp='myapp' #App name
API_KEY='YOUR API KEY'
API_SECRET='YOUR API SECRET'
# Database settings, change them if you want
USERNAME='magento'
PASSWORD='123456'

regex='Git remote: \K(.*) (?=Cloned)'
echo Creating app...
newapp=$(rhc create-app $myapp http://cartreflect-claytondev.rhcloud.com/github/boekkooi/openshift-cartridge-nginx)
echo $newapp
GIT_REMOTE=`echo $newapp | grep -Po "$regex"`
echo $GIT_REMOTE

echo Setting env variables
rhc set-env API_KEY=$API_KEY API_SECRET=$API_SECRET OPENSHIFT_MYSQL_DB_USERNAME=$USERNAME OPENSHIFT_MYSQL_DB_PASSWORD=$PASSWORD -a $myapp

echo Installing cartridges
rhc cartridge add -a $myapp http://cartreflect-claytondev.rhcloud.com/github/icflorescu/openshift-cartridge-mysql
rhc cartridge add -a $myapp http://cartreflect-claytondev.rhcloud.com/github/boekkooi/openshift-cartridge-php

echo Cloning repository
git clone https://github.com/javilumbrales/magento2-openshift.git
cd magento2-openshift
git pull
git checkout magento-2

echo Deploying!
git remote add $myapp $GIT_REMOTE
git push $myapp magento-2:master -f

That’s it, your Magento 2 instance should be up and running on http://$myapp-$namespace.rhcloud.com.

The admin url and credentials are the same (/admin123 with admin / OpenShiftAdmin123)

Please, note that after deploying the app, the gear is almost out of space, so you would only be able to add minor data/changes to it. Hopefully, it should be enough for you to test it and see how it performs, and if you like it you could always upgrade your account to get some more space.

Enjoy!

Show me the code