Parameters

length

Must be greater than the longest line (in characters) to be found in
the CSV file (allowing for trailing line-end characters). Otherwise the
line is split into chunks of length characters,
unless the split would occur inside an enclosure.

Omitting this parameter (or setting it to 0 in PHP
5.1.0 and later) means the maximum line length is not limited, which is
slightly slower.

delimiter

The optional delimiter parameter sets the field delimiter (one character only).

enclosure

The optional enclosure parameter sets the field enclosure character (one character only).

escape

The optional escape parameter sets the escape character (one character only).

Note:
Usually an enclosure character is escaped inside
a field by doubling it; however, the escape
character can be used as an alternative. So for the default parameter
values "" and \" have the same
meaning. Other than allowing the
enclosure character to be escaped, the
escape character has no special meaning; it isn't
even meant to escape itself.
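As a rough illustration of the two escaping styles, using str_getcsv(), which shares fgetcsv()'s parsing (the sample strings are made up):

```php
<?php
// Doubled enclosure: the standard way to embed the quote character.
var_dump(str_getcsv('"foo""bar"'));  // one field: foo"bar

// Backslash escape: also keeps the parser inside the enclosure, but the
// backslash itself stays in the returned data.
var_dump(str_getcsv('"foo\"bar"'));  // one field: foo\"bar
```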

Return Values

Returns an indexed array containing the fields read.

Note:

A blank line in a CSV file will be returned as an array
comprising a single null field, and will not be treated
as an error.

Note: If PHP is not properly recognizing
the line endings when reading files either on or created by a Macintosh
computer, enabling the
auto_detect_line_endings
run-time configuration option may help resolve the problem.

fgetcsv() returns NULL if an invalid
handle is supplied or FALSE on other errors,
including end of file.
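A minimal read loop that checks both the open and the per-line result (the sample file is created inline so the sketch is self-contained):

```php
<?php
// Write a tiny sample file so the example is self-contained.
file_put_contents('sample.csv', "a,b,c\n1,2,3\n");

// Open the file, then loop until fgetcsv() returns FALSE (EOF or error).
if (($handle = fopen('sample.csv', 'r')) !== false) {
    while (($data = fgetcsv($handle, 0, ',')) !== false) {
        // $data is an indexed array of the fields on this line
        echo count($data), " fields\n";
    }
    fclose($handle);
}
```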

This won't work; you will still get concatenated fields at the new-line position, because auto_detect_line_endings must be set before the file is opened:

<?php
$handle = fopen('/path/to/file', 'r');
ini_set('auto_detect_line_endings', TRUE);
while (($data = fgetcsv($handle)) !== FALSE) {
    // process
}
ini_set('auto_detect_line_endings', FALSE);
?>
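For contrast, a sketch with the option set before fopen(), which is when it takes effect (the ini setting is deprecated as of PHP 8.1; the sample file is created inline):

```php
<?php
// Set the option BEFORE opening the file; it is read when the stream opens.
ini_set('auto_detect_line_endings', true);

file_put_contents('mac.csv', "a,b\rc,d\r");   // classic-Mac \r line endings

$handle = fopen('mac.csv', 'r');
while (($data = fgetcsv($handle)) !== false) {
    // process $data
    echo implode('|', $data), "\n";
}
fclose($handle);
ini_set('auto_detect_line_endings', false);
```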

Here is an OOP-based importer similar to the one posted earlier. However, this one is slightly more flexible in that you can import huge files without running out of memory; you just have to pass a limit to the get() method.

I had to write a script to validate and import a large amount of data into MySQL, and it would stop running. I noticed, after trawling the internet for a solution, that quite a few people have had similar problems.

Solution? Like a dope, I had forgotten to put set_time_limit() within the loop, so the script would time out before it finished importing all the data.

Moral of the story? Only suppress errors after your script works with a large amount of test data!

I needed a function to analyse a file for delimiters and line endings prior to importing the file into MySQL using LOAD DATA LOCAL INFILE

I wrote this function to do the job; the results are (mostly) very accurate and it works nicely with large files too.

<?php
function analyse_file($file, $capture_limit_in_kb = 10) {
    // capture starting memory usage
    $output['peak_mem']['start'] = memory_get_peak_usage(true);

    // log the limit of how much of the file was sampled (in KB)
    $output['read_kb'] = $capture_limit_in_kb;
    // ...

Only problem with fgetcsv(), at least in PHP 4.x: any stray backslash in the data that happens to come before a double-quote enclosure will break it, i.e. cause the enclosure character to be escaped. I can't find a direct way to deal with it, since fgetcsv() doesn't give you a chance to manipulate the line before it reads and parses it. I've had to change all occurrences of '\"' to '"' in the file before feeding it to fgetcsv(). Otherwise this is perfect for that Microsoft-CSV format and deals gracefully with all the issues.
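A sketch of that kind of preprocessing; here I double the quote instead of stripping the backslash, which keeps the field intact in standard CSV form (the sample line is made up, and str_getcsv() needs PHP 5.3+):

```php
<?php
// Turn backslash-escaped quotes into CSV-style doubled quotes before parsing.
$line = 'field1,"he said \"hi\"",field3';
$line = str_replace('\"', '""', $line);

$fields = str_getcsv($line);
// $fields: field1 | he said "hi" | field3
```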

The fgetcsv function seems to follow the MS excel conventions, which means:

- The quoting character is escaped by itself, not by the backslash. (E.g., let's use the double quote (") as the quoting character:

Two double quotes "" will give a single " once parsed, if they are inside a quoted field (otherwise neither of them will be removed).

\" will give \" whether it is in a quoted field or not (same for \\) , and

if a single double quote is inside a quoted field, it will be removed. (If it is not inside a quoted field, it will stay.)

- leading and trailing spaces (\s or \t) are never removed, regardless of whether they are in quoted fields or not.

- Line breaks within fields are dealt with correctly if they are in quoted fields. (So previous comments stating the opposite are wrong, unless they are using a different PHP version.... I am using 4.4.0.)

So fgetcsv is actually very complete and can deal with every possible situation. (It does need help with Macintosh line breaks though, as mentioned in the help files.)

I wish I knew all this from the start. From my own benchmarks fgetcsv strikes a very good compromise between memory consumption and speed.

-------------------------
Note: If backslashes are used to escape quotes, they can easily be removed afterwards. Same for leading and trailing spaces.

Note that fgetcsv, at least in PHP 5.3 or earlier, will NOT work with UTF-16 encoded files. Your options are to convert the entire file to ISO-8859-1 (or latin1), or to read line by line, converting each line to ISO-8859-1 and then using str_getcsv (or a backwards-compatible implementation). If you need to read non-latin alphabets, it's probably best to convert to UTF-8.

See str_getcsv for a backwards-compatible implementation for PHP < 5.3, and see utf8_decode for a function written by Rasmus Andersson which provides utf16_decode. The modification I added handles the fact that the BOM appears at the top of the file, but not on subsequent lines. So you need to store the endianness and then re-supply it for each subsequent line decoding. This modified version returns the endianness if it's not available:

But that's perhaps because my platform didn't support it. However, fgetcsv only supports single characters for the delimiter, etc. and complains if you pass in a UTF-16 version of said character, so I gave up on that rather quickly.
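A sketch of the whole-file conversion route, assuming the mbstring extension is available (the sample file is built inline; real files may carry a BOM that mb_convert_encoding handles via the endianness it declares):

```php
<?php
// Build a small UTF-16 sample so the example is self-contained.
file_put_contents('utf16.csv', mb_convert_encoding("a,b\nc,d\n", 'UTF-16', 'UTF-8'));

// Convert the whole file to UTF-8, then parse line by line.
$raw  = file_get_contents('utf16.csv');
$utf8 = mb_convert_encoding($raw, 'UTF-8', 'UTF-16');

$rows = array();
foreach (explode("\n", $utf8) as $line) {
    if ($line === '') {
        continue;    // skip the trailing empty line
    }
    $rows[] = str_getcsv($line);
}
```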

Here's something I put together this morning. It allows you to read rows from your CSV and get values based on the name of the column. This works great when your header columns are not always in the same order; like when you're processing many feeds from different customers. Also makes for cleaner, easier to manage code.

function __construct($sFileName)
{
    // quick and dirty opening and processing; you may wish to clean this up
    if ($this->m_fp = fopen($sFileName, 'r'))
    {
        $this->processHeader();
    }
}

function __call($sMethodName, $saArgs)
{
    // check whether this is a set() or get() request, and extract the name
    if (preg_match("/[sg]et(.*)/", $sMethodName, $saFound))
    {
        // convert the name portion of the [gs]et to uppercase for header checking
        $sName = strtoupper($saFound[1]);
        // ...
    }
}

// Get a nicely formatted header name. This will take product_id and make
// it PRODUCTID in the header map, so you won't need to worry about whether
// to call getProductID, getproductid, or getProductId; all will work.
public static function GetNiceHeaderName($sName)
{
    return strtoupper(preg_replace('/[^A-Za-z0-9]/', '', $sName));
}

// Process the header entry so we can map named header fields to a numerical
// index, which we'll use when we call fgetcsv().
private function processHeader()
{
    $sLine = fgets($this->m_fp);
    // you'll want to make this configurable (split() is deprecated; use explode())
    $saFields = explode(",", $sLine);

    $nIndex = 0;
    foreach ($saFields as $sField)
    {
        // get the nice name to use for "get" and "set"
        $sField = trim($sField);
        // ...
    }
}

Currently, I'm working on a script that takes data exported from one database and converts it so that it can be imported into another. I have been given a bunch of CSV files containing data to make the translation possible -- actually, the values are semicolon-separated, not comma-separated.

The following script reads a CSV file with field names on the first line (for an example, see below the script), and turns it into a hash table where the key is one of the fields.

$index_out = $index_in;
}
else
{
// If a column that is used as part of the key to the hash table is supplied
// as a name rather than as an integer, then determine that named column's
// integer index in the $names array, because the integer index is used, below.
$get_index = array_keys($names, $index_in);
$index_out = $get_index[0];

if(is_null($index_out))
{
// A column name was given (as opposed to an integer index), but the
// name was not found in the first row that was read from the CSV file.
fclose($handle);
return -3;
}
}
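The overall approach the note describes can be sketched like this (the function and parameter names are mine, not from the original script):

```php
<?php
// Read a CSV with field names on the first line into a hash table
// keyed on one named column (sketch; minimal error handling).
function csv_to_hash($filename, $key_column)
{
    if (($handle = fopen($filename, 'r')) === false) {
        return -1;
    }
    $names = fgetcsv($handle);                // first row: field names
    $index = array_search($key_column, $names);
    if ($index === false) {
        fclose($handle);
        return -3;                            // named column not found
    }
    $hash = array();
    while (($row = fgetcsv($handle)) !== false) {
        if ($row === array(null)) {
            continue;                         // skip blank lines
        }
        $hash[$row[$index]] = array_combine($names, $row);
    }
    fclose($handle);
    return $hash;
}

// Usage with a small inline sample file:
file_put_contents('people.csv', "id,name\n1,alice\n2,bob\n");
$hash = csv_to_hash('people.csv', 'id');
// $hash['2']['name'] is "bob"
```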

I used fgetcsv to read pipe-delimited data files, and ran into the following quirk.

The data file contained data similar to this:

RECNUM|TEXT|COMMENT
1|hi!|some comment
2|"error!|another comment
3|where does this go?|yet another comment
4|the end!"|last comment

I read the file like this:

<?php
$row = fgetcsv($fi, $length, '|');
?>

This causes a problem on record 2: the quote immediately after the pipe causes the file to be read up to the following quote --in this case, in record 4. Everything in between was stored in a single element of $row.

In this particular case it is easy to spot, but my script was processing thousands of records and it took me some time to figure out what went wrong.

The annoying thing is that there doesn't seem to be an elegant fix. You can't tell PHP not to use an enclosure -- for example, like this:

<?php
$row = fgetcsv($fi, $length, '|', '');
?>

(Well, you can tell PHP that, but it doesn't work.)

So you'd have to resort to a solution where you use an extremely unlikely enclosure, but since the enclosure can only be one character long, it may be hard to find.

Alternatively (and IMNSHO: more elegantly), you can choose to read these files like this, instead:

<?php
$line = fgets($fi, $length);
$row = explode('|', $line);
?>

As it's more intuitive and resilient, I've decided to favor this 'construct' over fgetcsv from now on.

I had a csv file whose fields included data with line endings (CRLF created by hitting the carriage returns in html textarea). Of course, the LF in these fields was escaped by MySQL during the creation of the csv. Problem is I could NOT get fgetcsv to work correctly here, since each and every LF was regarded as the end of a line of the csv file, even when it was escaped!

Since what I wanted was to get THE FIRST LINE of the csv file, then count the number of fields by exploding on all unescaped commas, I had to resort to this:

<?php
/* First five lines of the csv: the 4th row has a line break within a data field.
   The LFs represent line feeds (\n):

1,okonkwo joseph,nil,2010-01-12 17:41:40LF
2,okafor john,cq and sulphonamides,2010-01-12 17:58:03LF
3,okoye andrew,lives with hubby in abuja,2011-03-30 13:39:19LF
4,okeke peter,In 2001\, had appendicectomy in AbaCR\LFIn 2004\, had ELCS at a private hoapital in Lagos,2011-03-30 13:39:19LF
5,adewale chris,cq and sulphonamides,2010-01-12 17:58:03LF
*/

Note that fgetcsv() uses the system locale setting to make assumptions about character encoding. So if you are trying to process a UTF-8 CSV file on an EUC-JP server (for example), you will need to do something like this before you call fgetcsv():

setlocale(LC_ALL, 'ja_JP.UTF8');

[Also note that setlocale() doesn't *permanently* affect the system locale setting.]

The $escape parameter is completely unintuitive, but it is not broken. Here is a breakdown of fgetcsv()'s behaviour. In the examples I've used underscores (_) to show spaces and brackets ([]) to show individual fields:

- Leading whitespace in each field will be stripped if it comes immediately before an enclosure: ___"foo" -> [foo]

- There can only be one enclosure per field, although it will be concatenated with any data that appears between the end enclosure and the next delimiter/new line, including any trailing whitespace: ___"foo"_"bar"__ -> [foo_"bar"__]

- If the field does not start with (leading whitespace +) an enclosure, the whole field is interpreted as raw data, even if enclosure characters appear elsewhere within the field: _foo"bar"_ -> [_foo"bar"_]

- Delimiters cannot be escaped outside enclosures; they have to be enclosed instead. Delimiters don't need to be escaped inside enclosures: "foo,bar","baz,qux" -> [foo,bar][baz,qux]; foo\,bar -> [foo\][bar]; "foo\,bar" -> [foo\,bar]

- Double enclosures inside single enclosures are converted to single enclosures: "foobar" -> [foobar]; "foo""bar" -> [foo"bar]; """foo""" -> ["foo"]; ""foo"" -> [foo""] (empty enclosure followed by raw data)

- The $escape parameter works as expected, but unlike enclosures DOES NOT get unescaped. It is necessary to unescape the data elsewhere in the code: "\"foo\"" -> [\"foo\"]; "foo\"bar" -> [foo\"bar]

Note: the following data (which is a very common problem) is invalid: "\". Its structure is equivalent to "@ or in other words, an open enclosure, some data and no closing enclosure.

Caution: ideally, there shouldn't be any unescaped escape characters outside enclosures; the field should be enclosed and escaped instead. If there are any, they could end up being removed as well, depending on the function used.
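A few of these cases, checked with str_getcsv(), which uses the same parser:

```php
<?php
// Delimiters cannot be escaped outside enclosures; enclose the field instead.
var_dump(str_getcsv('"foo,bar","baz,qux"'));  // two fields: foo,bar / baz,qux
var_dump(str_getcsv('foo\,bar'));             // two fields: foo\ / bar

// The escape character survives parsing; unescape it yourself afterwards.
var_dump(str_getcsv('"foo\"bar"'));           // one field: foo\"bar
```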

For anyone else struggling with disappearing non-latin characters in one-byte encodings - setting LANG env var (as the manual states) does not help at all. Look at LC_ALL instead.

In my case it was set to "pl_PL.utf8", but since my input file was in CP1250, most of the Polish characters (but not all of them!) went missing, and the city of "Łódź" became just "dź". I've "fixed" it with "pl_PL".

NOTE: I think the example on this page should be changed. I think the following line should be added just inside the while loop:

if( $data === true ) die('PAY ATTENTION DUMMY');

The reasoning for this: if the user happens to take out the seemingly "extra" parentheses around the "$data" part, they will get a whole bunch of rows where all they get is the value true instead of an array. This way, if the user borrows the code but later removes the seemingly "extra" parentheses, they have something keeping them safe from themselves, at least.

I say this because this just happened to some poorly designed code in a system I manage, that as a result caused a whole bunch of blank records to be imported into a database.

To my surprise, this function under 5.3.2 on Windows now returns FALSE when it reads an empty line, instead of an empty array as in previous versions. This affects code that does not check the result and tries something like this:

If you want to get your CSV rows (in this case a file posted from a form) into one big array, you can use this code. You may have to play around with the ord() and explode separators depending on your line endings. You can switch to the commented-out explode to use the " delimiter.

This function takes a csv line and splits it into an array, much like fgetcsv does but you can use it on data that isn't coming in from a file, or you can read data from a file some other way (like if your Mac files aren't being read correctly) and use this to split it. If you have any corrections, comments (good or bad), etc. I would appreciate an email to the above address.

The extra spaces behind a few fields in the example (which are useful, when you manually manage a small csv database to align the columns) were ignored by fgetcsv from PHP 4.3. With the new 4.4.1 release they get appended to the string, so you end up with "Riverstreet " instead the expected "Riverstreet".
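If you relied on the old stripping behaviour, a one-line workaround is to trim each field yourself (a sketch with made-up data):

```php
<?php
// Trim the trailing alignment spaces that PHP 4.4.1+ preserves.
$data = array('Riverstreet      ', '42 ');
$data = array_map('trim', $data);
// $data is now: Riverstreet | 42
```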

I've had a lot of projects recently dealing with CSV files, so I created the following class to read a CSV file and return an array of arrays with the column names as keys. The only requirement is that the 1st row contain the column headings.

I only wrote it today, so I'll probably expand on it in the near future.

The fgetcsv() function is really great. CSV files are not as simple as they seem. They may span multiple lines, contain unmatched double quotes (like: 32" screen size) and lots of other surprising variants. My problem was that I had an ISO-8859-1 encoded file and fgetcsv() expected it to be UTF-8 encoded. I tried a range of regular expressions I found on the Internet, but none of them worked 100%. So, if you haven't discovered it yet: fgetcsv() is king. My server doesn't have str_getcsv(), so I had to figure out some way to use fgetcsv() for my ISO-8859-1 files.

Using the example above with a length of 1000 will truncate any CSV rows longer than 1000 bytes; the remainder of that line will be returned the next time you call $data = fgetcsv(). One solution I have seen is to use filesize("test.csv") as the length argument; however, with large CSV files you may encounter errors for exceeding the memory limit. To remedy this, I read the CSV file into an array, looped through that array to find the longest line, used that value as my length argument, and then called unset($array) to free up the memory. I'm open to better solutions.
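The workaround described above, sketched (it assumes the file still fits in memory during the scan; the sample file is created inline):

```php
<?php
// Sample file standing in for the real CSV.
file_put_contents('sample.csv', "a,b\nlonger,line,here\n");

// Find the longest line first, then free the array before the real parse.
$lines  = file('sample.csv');
$length = 0;
foreach ($lines as $line) {
    $length = max($length, strlen($line));
}
unset($lines);   // free the memory again

// $length + 1 is now safe to pass as the fgetcsv() length argument
echo $length + 1, "\n";
```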

I was using the function to import a CSV file that had some values with "\" included. This confused the import function into ignoring the end of line, and hence an incorrect number of rows was returned to me.

I suggest you replace that \ (and { brackets as well, since they act as an escape character in the same way) before you import the file.