I have a huge file that I would like to read so I can fill my MySQL database. I tried to use the PHPExcel library, but I get an error when I try to load the file:

I have already increased the value of memory_limit in the php.ini file, but it's still not enough. My Excel file is 55MB (5,109,719 cells). Do you have any idea how to solve this problem? Thanks :)

There's plenty been written about the memory usage of PHPExcel here on the forum, so reading through some of those previous discussions might give you a few ideas. PHPExcel holds an "in memory" representation of a spreadsheet, so it is susceptible to PHP memory limitations.

The "rule of thumb" that I've always used is an average of about 1k/cell, so your 5M cell workbook is going to require 5GB of memory. However, there are a number of ways that you can reduce that requirement. These can be combined, depending on exactly
what information you need to access within your workbook, and what you want to do with it.
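As a back-of-envelope check, that rule of thumb is simple arithmetic; here is a self-contained sketch (the 1kB/cell figure is this thread's estimate, not a measured constant):

```php
// Rough estimate of the PHP memory needed to hold a workbook in PHPExcel,
// using the ~1kB/cell rule of thumb from this thread (an estimate only).
function estimateWorkbookMemory($cellCount, $bytesPerCell = 1024)
{
    return $cellCount * $bytesPerCell;
}

$cells = 5109719; // the 55MB workbook from the question
printf("~%.2f GB of PHP memory needed\n", estimateWorkbookMemory($cells) / (1024 ** 3));
```

For the 5M-cell file above this works out to roughly 4.9GB, in line with the 5GB figure quoted.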

If you have multiple worksheets, but don't need to load all of them, then you can limit the worksheets that the Reader will load using the setLoadSheetsOnly() method.
To load a single named worksheet:

$inputFileType = 'Excel5';
$inputFileName = './sampleData/example1.xls';
$sheetname = 'Data Sheet #2';

/** Create a new Reader of the type defined in $inputFileType **/
$objReader = PHPExcel_IOFactory::createReader($inputFileType);
/** Advise the Reader of which WorkSheets we want to load **/
$objReader->setLoadSheetsOnly($sheetname);
/** Load $inputFileName to a PHPExcel Object **/
$objPHPExcel = $objReader->load($inputFileName);

Or you can specify several worksheets with one call to setLoadSheetsOnly() by passing an array of names:

$inputFileType = 'Excel5';
$inputFileName = './sampleData/example1.xls';
$sheetnames = array('Data Sheet #1','Data Sheet #3');

/** Create a new Reader of the type defined in $inputFileType **/
$objReader = PHPExcel_IOFactory::createReader($inputFileType);
/** Advise the Reader of which WorkSheets we want to load **/
$objReader->setLoadSheetsOnly($sheetnames);
/** Load $inputFileName to a PHPExcel Object **/
$objPHPExcel = $objReader->load($inputFileName);

If you only need to access part of a worksheet, then you can define a Read Filter to identify just which cells you actually want to load:

$inputFileType = 'Excel5';
$inputFileName = './sampleData/example1.xls';

/** Define a Read Filter class implementing PHPExcel_Reader_IReadFilter **/
class ChunkReadFilter implements PHPExcel_Reader_IReadFilter
{
    private $_startRow = 0;
    private $_endRow   = 0;

    /** Set the rows that we want to read for the current "chunk" **/
    public function setRows($startRow, $chunkSize) {
        $this->_startRow = $startRow;
        $this->_endRow   = $startRow + $chunkSize;
    }

    /** Should this cell be read? **/
    public function readCell($column, $row, $worksheetName = '') {
        // Read the heading row, plus the rows in the current "chunk"
        return ($row == 1) || ($row >= $this->_startRow && $row < $this->_endRow);
    }
}

/** Create a new Reader of the type defined in $inputFileType **/
$objReader = PHPExcel_IOFactory::createReader($inputFileType);
/** Define how many rows we want to read for each "chunk" **/
$chunkSize = 2048;
/** Instantiate the Read Filter **/
$chunkFilter = new ChunkReadFilter();
/** Tell the Reader that we want to use the Read Filter that we've instantiated **/
$objReader->setReadFilter($chunkFilter);

/** Loop to read our worksheet in "chunk size" blocks **/
/** $startRow is set to 2 initially because we always read the headings in row #1 **/
for ($startRow = 2; $startRow <= 65536; $startRow += $chunkSize) {
    /** Tell the Read Filter the limits on which rows we want to read this iteration **/
    $chunkFilter->setRows($startRow, $chunkSize);
    /** Load only the rows that match our filter from $inputFileName to a PHPExcel Object **/
    $objPHPExcel = $objReader->load($inputFileName);

    // Do some processing here

    // Free up some of the memory
    $objPHPExcel->disconnectWorksheets();
    unset($objPHPExcel);
}

If you don't need to load formatting information, but only the worksheet data, then the setReadDataOnly() method will tell the reader only to load cell values, ignoring any cell formatting:

$inputFileType = 'Excel5';
$inputFileName = './sampleData/example1.xls';

/** Create a new Reader of the type defined in $inputFileType **/
$objReader = PHPExcel_IOFactory::createReader($inputFileType);
/** Advise the Reader that we only want to load cell data, not formatting **/
$objReader->setReadDataOnly(true);
/** Load $inputFileName to a PHPExcel Object **/
$objPHPExcel = $objReader->load($inputFileName);

Use cell caching. This is a method for reducing the PHP memory that is required for each cell, at a cost in speed. It works by storing the cell objects in a compressed format, or outside of PHP's memory (e.g. disk, APC, memcache)... but the more memory you save, the slower your scripts will execute. You can, however, reduce the memory required by each cell to about 300 bytes, so your 5M cells would require about 1.4GB of PHP memory.

Cell caching is described in section 4.2.1 of the Developer Documentation
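For reference, enabling cell caching is a single settings call made before the workbook is instantiated or loaded; this sketch follows the pattern in the PHPExcel 1.7.x developer documentation (the file names are illustrative):

```php
/** Cell caching must be configured BEFORE the workbook is instantiated or loaded **/
$cacheMethod = PHPExcel_CachedObjectStorageFactory::cache_to_phpTemp;
PHPExcel_Settings::setCacheStorageMethod($cacheMethod, array('memoryCacheSize' => '8MB'));

// Other storage methods include cache_in_memory_gzip (compressed, still in PHP
// memory) and cache_to_apc / cache_to_memcache (outside PHP's memory entirely).

$objReader = PHPExcel_IOFactory::createReader('Excel5');
$objPHPExcel = $objReader->load('./sampleData/example1.xls');
```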

Thanks a lot for your answers!! I tried to implement some of the above solutions.

This works fine... except for the processing time. I access just 10 records to display them, and it took 5 minutes. Is my code well written? Is there a way to improve the processing speed?

I have to agree that it seems excessively slow, and I can't explain why... even my little netbook should process 10 records from a 5M cell worksheet a lot faster than that. However, I wouldn't use a chunk size of 1; I'd set the value as high as I could without exceeding memory limits. For every "chunk" that's loaded, you're still reading the entire file, so you're reading and parsing 55MB on every iteration of the chunk loop... that will take time. You should try to minimize the number of iterations: read "chunks" of 100 rows at a time, or 500 rows at a time... as many as you can handle in memory.

If you remove the lines

$objPHPExcel->disconnectWorksheets();
unset($objPHPExcel);

you will get a degree of memory leakage, but the loop will also run faster.

For every "chunk" that's loaded, you're still reading the entire file; so you're reading and parsing 55MB every iteration of the chunk loop.... that will take time. You should try to minimize the number of iterations... read "chunks" of 100 rows at a time,
or 500 rows at a time, as many as you can handle in memory.

As far as I understand, each time it parses the whole file to know where to read the needed rows. So it creates a certain map of the file.

So why should it create the map of the file each time? Couldn't it be cached? It would create the cache only on the first pass, and then either use the cache (if it exists or is passed to the method) or generate a new one. There could be an option to use the cache (and possibly a link to the cache file) or to generate a new one.

For each "chunk" it simply instantiates a new PHPExcel object, reads through the (entire) Excel file, and tests each cell read from the file against the read filter to determine whether to store it or discard it. If the choice is "discard" then no memory
is used within the PHPExcel object for that cell, only in the reader. It doesn't create any "map" during this process.

First point to note is that readFilter is designed simply to allow reading of part of a worksheet rather than the entirety of that worksheet... it doesn't create any map, because that is not a part of its purpose which is simply to identify whether a cell
should be read or not. Reading a file in chunks is simply a way of taking advantage of the existence of this filtering feature to process a large file in several stages rather than all in one go. The filter doesn't even know it's being used in a loop.

If you were to create such a map, it would require a substantial amount of memory.... almost exactly the amount of memory required as though you'd actually read in every cell... so there would be no real benefit, and a lot of drawbacks.

I want to read a line, process it with some tasks of my own, read the next line, and so on.

This is how I work with a huge CSV. I have a class, and here is the logic:
1. I read line by line and check whether the elapsed time is getting close to the time limit. Reading a single line doesn't use much memory; I never hold more than one line in memory.
2. If I'm close to the time limit, I save the current file position to a class variable, store my class to a file (JSON encoded), and rerun my web page with some parameters in the query string, like "continue".
3. When my web page is loaded, it reads the query string, and if there is a key to continue, it reads the stored class, decodes it (remember, it knows the file position stored in the class), and reads the next line.

Time is not important in my case. The memory limit is still important, but only for one step.
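The three steps above can be sketched as a small self-contained function (the function name and the state-file format are illustrative, not the poster's actual class):

```php
// Sketch of the resume-able, line-by-line CSV processing described above.
// The byte offset is persisted between requests, so the next run can fseek()
// straight to where the previous one stopped.
function processCsvChunk($csvFile, $stateFile, $maxLines)
{
    $offset = is_file($stateFile) ? (int) file_get_contents($stateFile) : 0;
    $fh = fopen($csvFile, 'r');
    fseek($fh, $offset);

    $rows = array();
    while (count($rows) < $maxLines && ($row = fgetcsv($fh)) !== false) {
        $rows[] = $row; // in the real script: process one row, keep nothing
    }

    $done = feof($fh);                                   // true when fully consumed
    file_put_contents($stateFile, (string) ftell($fh));  // save position for next run
    fclose($fh);
    return array($rows, $done);
}
```

Each web request would call this once with as many lines as fit in the time limit, then redirect with "continue" until `$done` comes back true.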

Even if time is not important, you note above that chunksize=1 is extremely slow. I'm afraid of the word "extremely".

You suggest: "However, I wouldn't use a chunksize of 1, but would set the value as high as I could without exceeding memory limits." This can be a way, but it demands some extra logic to determine the optimal chunk size. And the XML is non-uniform: some rows can be small, others can be larger. One row is always within the limits, but whether a number of rows is... depends.

I used the above code, and it is working fine with an .xls file. But I am facing an issue with an .xlsx file: I need to get the total number of rows of the sheet, but with the .xlsx file it always returns 1 total row. When I removed the chunk filter code, it returned the correct total rows. Following is the code.

I also checked by increasing or decreasing the chunk size, but I always got 1 total row in the case of the .xlsx file. Please note again that there is no issue with the .xls file. I also checked by setting read data only ($objReader->setReadDataOnly(true);), but it made no difference.

Thanks. It works. But this way we need to load the file twice: first without the read filter & the second time with the read filter. Doesn't it take extra time to load the file twice? If we have a big file, then it will take a long time.

But you say you're already loading the entire file once, before you reread it using the read filter... I'm clearly missing something in what you're trying to do, because I can't see why you should need to load it twice: once without the read filter ("first without read filter") and the second time with it ("second time with read filter")?

Thanks for your kind help. I understand, there is no need to reload the file. Thanks.

Actually, I am facing an "out of memory" issue on writing when I process a large file. I have a file of about 4MB with 55,308 rows & 16 columns. I am processing this file on 64-bit Windows Server 2008 with 8GB RAM. I assigned a memory limit of up to 6GB for the PHP script.

Following is flow of the program.

Step 1: Verify Header Columns & Reset Styles
In this step, the program verifies the header row & resets the background colors and borders of all cells. First the program verifies the column headings; if required column headings are missing from the sheet, the program automatically adds these new columns at the appropriate column indexes & resets the styles. Then the program saves the file.

When I process the above file of 4MB, I get the following error.

Fatal error: Out of memory (allocated 1431568384) (tried to allocate 58 bytes) in C:\WAMP\www\PHPExcel-1.7.7\Classes\PHPExcel\Worksheet.php on line 1161

Please check the following image to see the system memory status before processing this file.

http://s7.postimage.org/jbkhppqvv/Before.png

The following image was taken while the file was in process; this is the peak memory status.

http://s17.postimage.org/f5cyw7rf3/In_process.png

As you can notice, there is a lot of free memory available, but the program still generates an out of memory error.

Can you please let me know how I can process this file with above code?

Step 2: Reading Data
In this step, the program reloads the saved file & reads all the data & stores it into the database. The memory issue is resolved here by using the above technique.

Step 3: Export Sheet
In this step, the program performs some calculations & updates data in the database. To export this calculated data into a new sheet, the program takes a long time writing & at the end an out of memory error is generated.

Please let me know how to resolve this issue. Is there any way to write data into multiple files & merge these files into one file at the end?

>> I assigned memory limit up to 6GB for PHP script.
>> Fatal error: Out of memory (allocated 1431568384) (tried to allocate 58 bytes) in C:\WAMP\www\PHPExcel-1.7.7\Classes\PHPExcel\Worksheet.php on line 1161

1,431,568,384 doesn't equate to a 6GB PHP memory limit; it's about 1.33GB. Double-check how much memory PHP is really set to use.

If you're running 32-bit PHP, you have an upper limit of 4GB, irrespective of the 64-bit Operating System. My "rule of thumb" is about 1k/cell on 32-bit PHP, 1.6k/cell on 64-bit PHP.

Rather than saving the file then reloading and doing the additional processing, simply loop through the worksheet a second time doing your database update, and only save the file and present it to the user at the very end of the script once all processing
has been finished.

1. Yes, I have rechecked: the memory limit is 6GB. I do not know why it stops at 1.33GB. Is it restricted by the PHPExcel library?

2. If 1.6k/cell then this file should be processed within 1.36GB.

3. Actually, after storing the data into the database, there are a lot of operations performed on this data & a lot of reports & analysis generated from it. The user also has the ability to perform many operations. So due to the large number of operations, I needed to store it in the database. The program also populates a lot of other data from this data. So in the export step, the program exports this data in different ways depending on the user's selection. The 55,308 rows may increase up to 3 times. That's why the data is stored in the database & the program creates a new Excel file on export. The file is exported with formatting; every cell or every row may contain different styles.
All this functionality works fine with a small file. The only issue occurs with a large file, on writing.

4. I used cell caching & I got the following error.

Fatal error: Call to a member function getCellCacheController() on a non-object in C:\WAMP\www\PHPExcel-1.7.7\Classes\PHPExcel\Cell.php on line 108

This error occurred when we reset the background color and borders of all cells. If I commented out this code, then it worked fine.

I also found that resetting the background color & borders takes more resources. When I commented out this code & enabled cell caching, I was able to process this file, but it took a long time. But the reset background color & borders functionality is also necessary, so I need to process this file with this functionality. Please let me know more about this. Thanks.

5. Is there any way to write data into multiple files & merge these files into one file at the end?

1. Yes, I have rechecked: the memory limit is 6GB. I do not know why it stops at 1.33GB. Is it restricted by the PHPExcel library?

PHPExcel does nothing to restrict the size of files that it can process, other than applying Excel's own column and row restrictions within the writers. The only memory limitations and execution time limitations are those imposed by PHP itself.

And I'm still confused by the figures in your error message.

sohailisonline wrote:

2. If 1.6k/cell then this file should be processed within 1.36GB.

This is a rule of thumb for maintaining the workbook in memory. Using a writer almost doubles that, but it is still well within your 6GB.

4. I used cell caching & I got following error.

Fatal error: Call to a member function getCellCacheController() on a non-object in C:\WAMP\www\PHPExcel-1.7.7\Classes\PHPExcel\Cell.php on line 108

This error occurred when we reset the background color and borders of all cells. If I commented out this code, then it worked fine.

I also found that resetting the background color & borders takes more resources. When I commented out this code & enabled cell caching, I was able to process this file, but it took a long time. But the reset background color & borders functionality is also necessary, so I need to process this file with this functionality. Please let me know more about this. Thanks.

Styling does take more resources.

Can you provide more details of the cell caching you're using? If there's a problem with one or other of the caching mechanisms, it would be useful to know which one; otherwise I have to check 3,000 lines of code for potential errors rather than a couple of hundred.

sohailisonline wrote:

5. Is there any way we write data into multiple files & at the end merge these files into one file?

You can create multiple files, but you'd still need to merge them, and that would use all the memory required for the fully merged PHPExcel object, as well as for (at least) the last file you loaded, so it would take even more memory.

I just found that the above issue occurs when we include the last row in the range to reset the background color & borders. For example, if we have a total of 50 rows in the sheet & we provide a range like "A1:AB50", then the above cache error occurs. If we provide a range one row less, like "A1:AB49", then it works fine.

I am facing an issue with an .xlsx file. I have another large .xlsx file of 13MB, with 50,000 rows & 60 columns. The memory limit is 7GB. When I process this file with the above code, I get the following warning message. After this, the program is unable to read cell values; it always returns empty cell values.

The problem with the SimpleXML parser used by PHPExcel's readers (for the XML-based formats such as .xlsx files) is that it loads the entire file into memory for parsing, though I'm surprised that 7GB of memory can't handle the file. I assume you're using a 64-bit version of PHP (otherwise you're limited to 4GB), which is about 60% more greedy in memory usage than 32-bit PHP.

The roadmap for this year includes plans to switch to using a pull parser (XMLReader) which doesn't load the entire file into memory, only those elements that are explicitly requested, as and when they are needed, and the elements are removed from memory
when no longer being accessed.
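To illustrate the difference: XMLReader walks the document node by node instead of building the whole tree, so memory stays flat regardless of file size. A minimal self-contained sketch (the worksheet XML here is a toy fragment, not real SpreadsheetML):

```php
// Pull-parsing with XMLReader: the document is consumed node by node, so only
// the current node's state lives in memory. For a real .xlsx you would open the
// zipped sheet directly, e.g. $reader->open('zip://book.xlsx#xl/worksheets/sheet1.xml').
function countRows($xml)
{
    $reader = new XMLReader();
    $reader->XML($xml);
    $rows = 0;
    while ($reader->read()) {
        if ($reader->nodeType === XMLReader::ELEMENT && $reader->name === 'row') {
            $rows++;
        }
    }
    $reader->close();
    return $rows;
}
```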

Then I have two different parsing methods, one using SimpleXML as at present, the second using XMLReader. Working with a simple worksheet of 4096x16. For the reader techniques, I get the following results:

(Memory columns are in bytes.)

Reader technique                                    Extract+Load (s)   Mem after Load   Mem after Parse   Peak Memory   Total (s)
getFromName() read string to SimpleXML                        0.5349          524,288        1,048,576     4,980,736      0.8330
getStream() + stream_get_contents() to SimpleXML              0.5874          524,288        1,048,576     4,980,736      0.8855
getStream() + fread() to SimpleXML                            0.5888          524,288        1,048,576     4,980,736      0.8869
getFromName() read string to XMLReader                        0.0348          524,288          524,288     4,980,736      1.3534
getStream() + stream_get_contents() to XMLReader              0.0711          524,288          524,288     4,980,736      1.3896
getStream() + fread() to XMLReader                            0.0824          524,288          524,288     4,980,736      1.4009
open() zip stream directly using XMLReader                    0.0009          524,288          524,288       524,288      1.3194

Clearly, streaming the XML directly from the zip file to XMLReader is incredibly fast, and eliminates all the memory overheads of the extraction, or of SimpleXML. Parsing using XMLReader is slower than SimpleXML, but overall it's well worth the performance cost for the memory savings. It's the old trade-off... I'd expected XMLReader to be slower than SimpleXML, but not quite as slow as that.

I'm looking at a possible hybrid method using XMLReader to locate the nodes and then SimpleXML to parse them, which may or may not improve the XMLReader method speed... initial experiments suggest that it's even slower than straight XMLReader.
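The hybrid being described would look something like this sketch: XMLReader locates each node, then hands just that subtree to SimpleXML for convenient access (toy XML again, not SpreadsheetML):

```php
// Hybrid parse: XMLReader locates each <row>, SimpleXML parses only that subtree.
function firstCellPerRow($xml)
{
    $reader = new XMLReader();
    $reader->XML($xml);
    $doc = new DOMDocument();
    $result = array();
    while ($reader->read()) {
        if ($reader->nodeType === XMLReader::ELEMENT && $reader->name === 'row') {
            // expand() converts only the current node to DOM; wrap it in SimpleXML
            $row = simplexml_import_dom($doc->importNode($reader->expand(), true));
            $result[] = (string) $row->c[0];
        }
    }
    $reader->close();
    return $result;
}
```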

The real drawback: it adds more complexity to the code, and it's a significant effort to convert the code to use these methods. This isn't something that can be done for the next release, but I will target at least the Excel2007 (.xlsx) Reader for the 1.7.9
release, with the other zipped-xml readers before the end of the year. Rather than a simple replacement for the existing readers, I'll try to make it configurable, offering the choice of SimpleXML or XMLReader.

It already is possible to get the list of worksheets without loading the entire file using the listWorksheetNames() or listWorksheetInfo() methods of the Readers. listWorksheetNames() returns just an array of the actual worksheet names, while listWorksheetInfo()
returns the size of each worksheet as well. It is also possible to define a sheet (or sheets) to load rather than to load all worksheets using the Reader's setLoadSheetsOnly() method, and to specify a range of cells (rather than all cells) to read using a
Read Filter. These options are already available for all Readers (xlsx, xls, ods, csv, gnumeric, etc.).
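This also gives a cheap way to find a worksheet's row count before any chunked read; a sketch (the file name is illustrative, and the array keys shown are those returned by the 1.7.x readers):

```php
$objReader = PHPExcel_IOFactory::createReader('Excel2007');

// Worksheet metadata without loading any cell data
$worksheetInfo = $objReader->listWorksheetInfo('./sampleData/example1.xlsx');
foreach ($worksheetInfo as $sheet) {
    echo $sheet['worksheetName'], ': ',
         $sheet['totalRows'], ' rows x ',
         $sheet['totalColumns'], ' columns', PHP_EOL;
}

// Just the names
$sheetNames = $objReader->listWorksheetNames('./sampleData/example1.xlsx');
```

Using `listWorksheetInfo()` for the row count would also sidestep the earlier problem of a chunk-filtered load reporting only the rows that were actually read.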

The listWorksheetNames() and listWorksheetInfo() methods don't return anything more than I've indicated; and while named ranges would be possible, it's probably not a requirement that a majority of users would consider necessary (especially as it would
slow down those methods). The main benefit would be that you could configure a read filter based on the value of the named range, so I can add it to the TODO list, but it won't be a high priority.

Streaming is still likely to be version 1.7.9 (probably somewhere around November/December). Because it's a trade-off between speed/memory, I need to work out how to make it configurable so that people who want speed and aren't too worried about memory
can still use the simpleXML Readers, while those who need the memory and are less worried about speed can use the XMLReader-based Reader... without giving myself too much duplication to maintain between two versions of the Readers.

I'm also starting work on some changes to the cell definition to handle array formulae, which entails deprecating some cell methods, so that will be staged across several releases (culminating in a complete rewrite of the calculation engine) which may be
the one thing that delays the XMLReader alternative to SimpleXML - especially as I'll need to modify every single reader/writer that needs to support array formulae to handle this... I want to get that change complete before I branch the XML-based readers
into SimpleXML/XMLReader variants. If my timing is right, both those changes will be in 1.7.9, but it might push November into December.

I got the same issue (Fatal error: Allowed memory size of 134217728 bytes exhausted (tried to allocate 115 bytes)) when reading an Excel file. I have tried some solutions like increasing the memory limit or using the cache, but none of them solved the issue. Please help!