Dealing with large collections in Magento

Large collections are inevitable. If your Magento store has been running for more than a minute, you’ll have orders. You might have hundreds of orders. Thousands of orders. Hundreds of thousands of orders! When you’re constructing an extension for Magento, in all likelihood you’re not rocking a database with hundreds of thousands of items to test against.

Let’s run with orders as an example, and let’s build something that exports those orders.

You might come up with something like this: a quick and dirty script to export…

$orders = Mage::getModel('sales/order')->getCollection();

foreach ($orders as $order) {
    echo $order->getIncrementId() . PHP_EOL;
}

Let me explain why it’s quick and dirty…

Say you’ve got twenty orders. This is probably going to be “not too bad”. It’s probably going to take a couple of seconds, and you’ll get what you want.

Let’s break this down a little.

$orders = Mage::getModel('sales/order') tells Magento to go and get a sales/order model – Mage_Sales_Model_Order. When you call getCollection(), you're getting, essentially, a collection of order objects. At this point, all Magento does is start building an SQL query. Nothing gets executed. Magento hasn't gone anywhere near the database. You can add attributes, add extra filters, you can even decide, hey, I don't want to use this any more. No expensive database calls. This is all thanks to Magento's lazy loading. Essentially, Magento will only go to the database when you need access to the data – in our case, when you start the foreach.
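You can see the lazy loading for yourself by inspecting the query before anything runs. Here's a sketch using standard Magento 1 collection methods (the 'complete' status filter is just an illustrative example):

```php
<?php
// Build a collection - no SQL has been executed yet.
$orders = Mage::getModel('sales/order')->getCollection();

// Refine the query - still no database access.
$orders->addFieldToFilter('status', 'complete');

// Inspect the SQL Magento has built so far.
echo $orders->getSelect()->__toString() . PHP_EOL;

// Only now, when we iterate, does Magento run the query and load the data.
foreach ($orders as $order) {
    echo $order->getIncrementId() . PHP_EOL;
}
```

Everything up to the foreach is just query building; dropping the collection before iterating would cost you nothing on the database side.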

So, let's take our example above. It's going to give you something like – SELECT main_table.* FROM sales_flat_order AS main_table – which goes and gets everything, loading every row into memory as order objects.

Using the same code, let’s say you’ve got a thousand orders. It’s probably going to take a while longer, but you can wait.

Let's jump up to 100,000 orders. You're probably going to be tapping that foot for a while. While you're waiting, think about your server. It's probably screaming. If the script doesn't time out, it could run out of memory or resources before you get all 100,000 orders exported.

So, let's have a think about how we can do this in a more lightweight, server-friendly (and hopefully quicker) way.

Using an Iterator

PHP provides an array_walk function that will send each item of an array through another function, one at a time. Magento provides a collection iterator – core/resource_iterator – Mage_Core_Model_Resource_Iterator – which will send each element of a collection through a callback in the same way. Given Magento's lazy loading, this has the effect of processing each row one at a time, so only a single order needs to be held in memory at once.
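As a plain-PHP illustration of the array_walk idea (not Magento-specific), each element is handed to the callback individually:

```php
<?php
// array_walk() hands each element of the array to the callback, one at a
// time, so we never build a second copy of the whole data set.
$incrementIds = array('100000001', '100000002', '100000003');

array_walk($incrementIds, function ($incrementId) {
    echo $incrementId . PHP_EOL; // prints each increment ID on its own line
});
```

The resource iterator applies the same idea to database rows instead of array elements.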

The Mage_Core_Model_Resource_Iterator class has a walk method which allows you to iterate over your collection:

$orders = Mage::getModel('sales/order')->getCollection();

Mage::getSingleton('core/resource_iterator')->walk(
    $orders->getSelect(),
    array(array($this, 'ordersCallback')), // our function to pass each order into
    array('store_id' => $storeId)
);

public function ordersCallback($args)
{
    $order = Mage::getModel('sales/order')->setData($args['row']);
    // process
}

Paging a collection

So, we've got a solution that uses less memory and blocks less, but it comes at the expense of hammering the database with a large number of small fetches. In some respects this is a lot better than the first solution, but by combining techniques from the first and second solutions, we can produce something better still.

What we're looking for is a way of dealing with collections in manageable chunks. On the PHP side, this is a question of keeping arrays and objects to a "manageable" size. On the database side, we need to pull out a set number of results at a time – in chunks big enough to make processing efficient, but small enough not to overload the database.

In MySQL, what we're looking at is using LIMIT so that we can "page" through results. LIMIT takes two values: the first is the offset (where to start, counting from zero), and the second is how many rows you want. So you could use LIMIT 0,10 to get the first 10 results, then LIMIT 10,10 to get the next 10.

Magento implements this using two functions.

setPageSize – tells Magento how many results you want per page.
setCurPage – tells Magento which page to fetch, which determines the offset.
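As a sketch (standard Magento 1 collection methods), you can see the paging applied to the generated SQL once the collection has loaded:

```php
<?php
// Ask for page 2 with 10 results per page, then load that page.
$orders = Mage::getModel('sales/order')->getCollection();
$orders->setPageSize(10)->setCurPage(2);
$orders->load();

// After loading, the select carries the paging that was applied -
// something along the lines of "... LIMIT 10 OFFSET 10".
echo $orders->getSelect()->__toString() . PHP_EOL;
```

The page size and current page together give exactly the two LIMIT values described above.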

Example code

$ordersCollection = Mage::getModel('sales/order')->getCollection();
$ordersCollection->setPageSize(100);

$pages = $ordersCollection->getLastPageNumber();
$currentPage = 1;

do {
    $ordersCollection->setCurPage($currentPage);
    $ordersCollection->load();

    foreach ($ordersCollection as $_order) {
        // process
    }

    $currentPage++;

    // clear the collection (if not done, the same page will be loaded on each loop) - this also frees memory
    $ordersCollection->clear();
} while ($currentPage <= $pages);

This seems like a very nice solution and is one that I try to use where possible – as you never know how big a collection is likely to be.

One issue we do have though is that of moving goalposts. As each of our queries hit the database at different times and we don’t grab all the data at once, there is a possibility that new orders will bump our paging out, or that new orders will be added to a page that we haven’t captured.
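One way to mitigate this (a sketch using standard Magento 1 collection methods, not something from the script above) is to pin the result set before paging, by only including orders that existed when the export started:

```php
<?php
// Find the highest order entity_id at the moment we start, so orders
// created mid-export can't shift our pages underneath us.
$maxId = Mage::getModel('sales/order')->getCollection()
    ->setOrder('entity_id', 'DESC')
    ->setPageSize(1)
    ->getFirstItem()
    ->getId();

// Only page through orders up to that ID, in a deterministic order.
$ordersCollection = Mage::getModel('sales/order')->getCollection()
    ->addFieldToFilter('entity_id', array('lteq' => $maxId))
    ->setOrder('entity_id', 'ASC');
$ordersCollection->setPageSize(100);
```

New orders will still be missed by this particular run, but the pages you do fetch stay stable.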

Using PHPStorm

If you're using PHPStorm, I'd suggest setting up a "live template" containing the paging code above, to make it quick and easy for you to deal with large collections.

Worth looking at

There is a generic batched iterator by Kalen Jordan which can be found here: https://gist.github.com/kalenjordan/5483065 – this introduces the concept of using an iterator whilst batch processing a collection. It's really neat to keep in a core module, as you can pass any collection to it and reuse the code.

