Giuliana Benedetti - Can Magento handle 1M products?

“Your catalog is way too big, your e-commerce won’t work.” Well... wrong!
We shared our approach to huge catalogs and our experience running an e-commerce site with 700K products.
Following this peculiar case, we decided to go further: we created a 1M-product catalog to test how Magento responds to an environment with such a huge amount of data. We talked about Magento’s potential and analyzed the weak points that emerged during our development work.

2.
What’s in the menu?
• Huge catalog
• What we did
• What we are doing

3.
Once upon a time...
• The project began as a migration from a proprietary platform to
Magento 1 Community
• Shoes and accessories e-commerce
• We developed the integration with their management software,
which handled product master data, warehouse stock and order
data
• Integration with Amazon and eBay

7.
Updating the catalog (1/2)
• Initially 150k products. This is what we planned:
• Massive initial import
• Frequent updates during the day via web service
• As the catalog grew, the data exchange volumes via
web service became unsustainable. The exchange procedure needed a
redesign.

8.
Updating the catalog (2/2)
• Today we have 700k products
• Based on Magmi and CSV file exchange (product master data)
• Nighttime update – the DIFF
• Whole-catalog update only in exceptional cases
• The client accepted that new products are published with a delay of 1
day
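
The nightly DIFF idea can be illustrated with a minimal sketch. It assumes two CSV exports keyed by a `sku` column; the column names and sample rows are hypothetical, and this is not the authors' actual procedure. Only rows that are new or changed would be handed to the importer.

```python
import csv
import io


def load_products(csv_text, key="sku"):
    """Parse a CSV export into a dict keyed by SKU."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row[key]: row for row in reader}


def diff_catalog(previous, current):
    """Return rows that are new or changed, plus SKUs that disappeared."""
    changed = [row for sku, row in current.items() if previous.get(sku) != row]
    removed = sorted(set(previous) - set(current))
    return changed, removed


# Hypothetical exports from two consecutive nights.
yesterday = load_products("sku,name,price\nA1,Boot,90\nA2,Sandal,40\nA3,Belt,25\n")
today = load_products("sku,name,price\nA1,Boot,85\nA2,Sandal,40\nA4,Scarf,30\n")

changed, removed = diff_catalog(yesterday, today)
print([row["sku"] for row in changed])  # ['A1', 'A4']
print(removed)                          # ['A3']
```

With a 700k-product catalog, importing only the changed rows keeps the nightly job proportional to the day's edits rather than to the catalog size.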

9.
Warehouse update (1/2)
• No warehouse fully dedicated to the web
• Stock is shared with the offline shops
• It’s not possible to update stock only at night and rely on those
quantities during the day
• Frequent updates are needed

11.
Reindex (1/2)
• The bigger the catalog, the slower the reindex
• Initially, the reindex was launched after each update (15 min)
• After a while, the reindex became too time-consuming:
a new update cycle would start while the previous update’s reindex cycle
was still running.
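
A small arithmetic sketch of why the cycles pile up. It assumes the 15 minutes above is the interval between updates, which the slide does not state explicitly; the durations are illustrative.

```python
from math import ceil


def concurrent_reindexes(update_interval_min, reindex_duration_min):
    """Reindex runs active at once in steady state when a new run
    starts every `update_interval_min` minutes."""
    return ceil(reindex_duration_min / update_interval_min)


# Assumed: one update every 15 minutes, reindex time growing with the catalog.
for duration in (10, 20, 75):
    print(duration, concurrent_reindexes(15, duration))
# 10 1
# 20 2
# 75 5
```

As soon as the reindex takes longer than the update interval, more than one run competes for the same database.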

12.
Reindex (2/2)
• Solution:
• All reindexes have been disabled during the day, except for the stock reindex
• All other reindexes are now performed after the nighttime import
• Today a full reindex takes around 75 minutes and generates a heavy
load on the database

13.
Catalog_url_rewrite (1/2)
• Magento 1 has a critical weakness in the URL rewrite process:
• All product URLs are rewritten, including simple products that are «Not visible
individually» and exist only to be associated with a configurable.
• With a 700k-product catalog, this meant:
• Creating millions of rows in the catalog_url_rewrite table
• A URL rewrite process that takes hours to complete
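
A rough back-of-the-envelope model of how the row count reaches the millions. It assumes one rewrite per product and store view, plus one per category the product appears under. The store view count, category count and number of visible products are hypothetical; only the 700k figure comes from the slides.

```python
def estimated_rewrite_rows(products, store_views, avg_categories_per_product):
    """Rough model: one rewrite per product and store view, plus one
    per category path the product appears under."""
    return products * store_views * (1 + avg_categories_per_product)


all_products = 700_000      # from the slides
visible_products = 140_000  # hypothetical: products visible individually

print(estimated_rewrite_rows(all_products, 2, 3))      # 5600000
print(estimated_rewrite_rows(visible_products, 2, 3))  # 1120000
```

Under these assumptions, most of the rows belong to simple products that no customer ever reaches by URL.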

15.
Images generation (1/2)
• One of the main problems we had to face was the generation of product
thumbnails, done by ImageMagick
• Every day hundreds of products are published
• We verified that the frontend CPUs were often under stress because of
the ImageMagick process and the write operations on the database

16.
Images generation (2/2)
• We found a solution in generating the thumbnails during the massive
import, so ImageMagick works as part of the import
procedure
• At night, the images are generated and saved on a dedicated server,
without interfering with user navigation
• Today we have around 881K images saved
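
A minimal sketch of pre-generating thumbnails as a batch step. It only builds the ImageMagick command lines, using `convert` with the `-thumbnail` option. The paths, sizes and cache layout are hypothetical and do not reproduce Magento's real media cache structure.

```python
from pathlib import PurePosixPath


def thumbnail_jobs(image_paths, sizes, cache_root="media/cache"):
    """Build one ImageMagick command per (image, size) pair."""
    jobs = []
    for src in image_paths:
        name = PurePosixPath(src).name
        for width, height in sizes:
            dst = f"{cache_root}/{width}x{height}/{name}"
            # ImageMagick 6 syntax; version 7 uses the `magick` binary instead.
            jobs.append(["convert", src, "-thumbnail", f"{width}x{height}", dst])
    return jobs


# Hypothetical imported image and thumbnail sizes.
jobs = thumbnail_jobs(["media/import/shoe.jpg"], [(135, 135), (265, 265)])
for job in jobs:
    print(" ".join(job))
```

Each command could then be executed by the nightly import (for example with `subprocess.run(job, check=True)`), so the frontend servers never resize images on a customer request.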

17.
Server response time
• With such a huge catalog, some categories hold hundreds of
products
• The first load of a page (when it is not cached) is slow
• We activated caching with Redis and Varnish
• That was not enough: the first load was still too slow
18.
Solutions 1/2
• Moved the cache clearing process to the night
• At 8 in the morning, website navigation was starting to suffer
• We scheduled a job to pre-cache all the critical pages
• Minimized cache invalidation
• Cache is cleared only for products whose stock quantity was updated via
web service
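
The pre-caching job can be sketched as a simple cache warmer: request every critical URL once after the nightly cache clear, so the first real visitor gets a cached page. This is a minimal illustration, not the authors' job. The URL list and the injected `fetch` function are assumptions made so the example runs without a network.

```python
from urllib.request import urlopen


def http_status(url):
    """Fetch a URL and return its HTTP status code."""
    with urlopen(url, timeout=30) as response:
        return response.status


def warm_cache(urls, fetch=http_status):
    """Request each URL once; return the URLs that did not answer 200."""
    failed = []
    for url in urls:
        try:
            status = fetch(url)
        except OSError:  # covers URLError and HTTPError
            status = None
        if status != 200:
            failed.append(url)
    return failed


# Stand-in for real HTTP calls: a hypothetical URL -> status table.
fake_responses = {"/shoes.html": 200, "/bags.html": 503}
print(warm_cache(fake_responses, fetch=fake_responses.get))  # ['/bags.html']
```

Returning the failed URLs lets the job retry or report them before the morning traffic starts.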

19.
Solutions 2/2
• Client training to handle cache clearing better
• Minimized the number of filters in layered navigation
• Each filter increases the reindex time and the number of uncached page combinations
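
A quick sketch of how filters multiply the number of distinct pages a cache has to hold. It assumes single-select filters, where each filter is either unset or set to one of its options. The option counts are hypothetical, and pagination and sort order would multiply the result further.

```python
from math import prod


def filtered_page_variants(options_per_filter):
    """Distinct filter states of one category page (unfiltered included)."""
    return prod(n + 1 for n in options_per_filter)


print(filtered_page_variants([10, 8]))      # 99
print(filtered_page_variants([10, 8, 12]))  # 1287
```

Adding a third filter with 12 options turns 99 variants per category into 1287, which is why removing filters shrinks the uncached surface so quickly.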

21.
Solutions
• Initially, a new backoffice server was introduced
• This did not solve the MySQL load problem, nor the reindex and re-caching issues.
• We introduced a new process to handle the catalog, using an Excel
file
• This improved the efficiency of the people managing the master data
• Massive Excel file import performed every 3 days via FTP
• Categories are still handled from the backoffice

22.
Third party modules integration
• Critical point
• Not all the modules found in the Marketplace are developed in an
optimal way
• They «simply» load the products collection without pagination
• They execute nested queries
• They loop over collections and initialize all products unnecessarily
• …
• A lot of profiling and optimization work was needed
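
The pagination point can be illustrated with a language-neutral sketch: iterate over a large collection page by page instead of loading it in one go. `fetch_page` is a hypothetical stand-in for a paged query; in Magento 1 collections the comparable knobs are `setPageSize()` and `setCurPage()`.

```python
def iter_in_pages(fetch_page, page_size=500):
    """Yield items page by page instead of loading the full collection."""
    page = 1
    while True:
        items = fetch_page(page, page_size)
        if not items:
            return
        yield from items
        if len(items) < page_size:
            return  # a short page means there is nothing left
        page += 1


# Hypothetical in-memory catalog standing in for the database.
catalog = [f"SKU-{i}" for i in range(1, 1201)]


def fetch_page(page, page_size):
    start = (page - 1) * page_size
    return catalog[start:start + page_size]


count = sum(1 for _ in iter_in_pages(fetch_page))
print(count)  # 1200
```

Memory use stays bounded by the page size, whatever the catalog size.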

23.
Feed export (Google Shopping & Co.) 1/2
• As the catalog grew, the feed export time increased
as well
• In the very beginning, the exports were handled by a Magento
module

24.
Feed export (Google Shopping & Co.) 2/2
• Solution steps:
• The module has been replaced with ad-hoc, highly optimized
procedures
• The export jobs run on the backoffice server during the night, so as
not to load the frontend
• A MySQL slave has been introduced as the data source, so as not to load the master
and, as a consequence, the website
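
A minimal sketch of an ad-hoc feed export that streams rows to a tab-separated file as they arrive, rather than building the whole feed in memory. The row source is a hypothetical generator standing in for a query against the MySQL slave, and the column set is an assumption.

```python
import csv
import io


def write_feed(rows, out, columns=("id", "title", "price", "availability")):
    """Stream rows to a tab-separated feed; return the number written."""
    writer = csv.DictWriter(
        out, fieldnames=columns, delimiter="\t", extrasaction="ignore"
    )
    writer.writeheader()
    written = 0
    for row in rows:
        writer.writerow(row)  # columns not in the feed are dropped
        written += 1
    return written


def replica_rows():
    """Hypothetical rows, as they might come from a read replica cursor."""
    yield {"id": "A1", "title": "Boot", "price": "85.00 EUR",
           "availability": "in stock", "internal_cost": "40"}
    yield {"id": "A4", "title": "Scarf", "price": "30.00 EUR",
           "availability": "out of stock"}


buffer = io.StringIO()
print(write_feed(replica_rows(), buffer))  # 2
```

Because `rows` is consumed lazily, the same function works whether the source yields two rows or a million.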

29.
Elasticsearch
• For two reasons:
• Improve the search functionality offered to the client
• Minimize the load produced by the Magento internal search engine
• Critical issues to be faced:
• Catalog index time
• Only configurable products?
• What about the sizes?
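
One possible answer to the two open questions, sketched under assumptions: index one document per configurable product and fold the sizes of its in-stock simple products into a list field. The field names and sample data are hypothetical, and the slides do not say which option was chosen.

```python
def configurable_document(configurable, children):
    """Build one search document per configurable, carrying the sizes
    of its in-stock simple products."""
    sizes = sorted({child["size"] for child in children if child["qty"] > 0})
    return {
        "sku": configurable["sku"],
        "name": configurable["name"],
        "sizes": sizes,
        "in_stock": bool(sizes),
    }


# Hypothetical configurable with three simple products.
doc = configurable_document(
    {"sku": "BOOT-1", "name": "Leather boot"},
    [
        {"size": "40", "qty": 2},
        {"size": "38", "qty": 0},
        {"size": "42", "qty": 1},
    ],
)
print(doc["sizes"])  # ['40', '42']
```

This keeps the index at one document per configurable while still allowing a size filter, at the cost of re-indexing the parent whenever a child's stock changes.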

30.
1M products
• Expected growth: in 1 year we’ll have 1M products
• At the moment we are performing tests with fake products
• We didn’t detect any other critical aspects
• So far, we had to further optimize some data exchange and feed
generation procedures

31.
More sales, more page views
• Sessions are increasing → the number of uncached page views is
increasing
• Pre-caching extension
• Increasing Varnish cache TTL
• Minimize products in categories and filters used
• Sales are increasing → the frequency of out-of-stock
products is increasing as well
• To be evaluated: the impact of the new reindex and re-caching policies on the client

32.
What if..?
• We’re planning a Magento 2 migration with the client
• We started our tests by migrating the current Magento 1 environment
(700K products) to a Magento 2 installation
• We collected the results and are still performing further tests

37.
Magento 2 migration (4/4)
• We had some issues with the Catalog Fullsearch reindex (Magento 2)
• We had to apply a patch →
https://github.com/magento/magento2/issues/5146
• Without the patch, the Catalog Fullsearch reindex took around 2 hours; with
the patch applied, it took around 1 hour, so the times are quite comparable
02:12:37