moritz's quotesplit didn't work for me. It seemed to split on a comma even though it was between a pair of quotes. However, this did work:

function quotesplit($s, $splitter=','){
    //First step is to split it up into the bits that are surrounded by quotes
    //and the bits that aren't. Adding the delimiter to the ends simplifies
    //the logic further down
    $getstrings = split('\"', $splitter.$s.$splitter);

    //$instring toggles so we know if we are in a quoted string or not
    $delimlen = strlen($splitter);
    $instring = 0;

    while (list($arg, $val) = each($getstrings)) {
        if ($instring==1) {
            //Add the whole string, untouched to the result array.
            $result[] = $val;
            $instring = 0;
        } else {
            //Break up the string according to the delimiter character
            //Each string has extraneous delimiters around it (inc the ones we
            //added above), so they need to be stripped off
            $temparray = split($splitter, substr($val, $delimlen, strlen($val)-$delimlen-$delimlen ) );

The example from ramkumar rajendran did not work: $line = split("/\n", $input_several_lines_long); I do not know why this does not work for me.

The following has worked for me to get a maximum of 2 array parts, separated at the first newline (independent of whether the input was saved under UNIX or Windows): $line = preg_split('/[\n\r]+/', $input_several_lines_long, 2); Empty lines are also not considered here.
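A minimal sketch of the preg_split() call above, assuming a made-up three-line input with mixed line endings. One likely reason the split("/\n", ...) version fails is that split() takes a POSIX pattern without delimiters, so the leading slash is matched literally; preg_split() is the function that expects the /.../ delimiters.

```php
<?php
// Hypothetical input: three lines with mixed Windows and UNIX line endings.
$input_several_lines_long = "first line\r\nsecond line\nthird line";

// Limit 2: split only at the first run of newline characters.
$line = preg_split('/[\n\r]+/', $input_several_lines_long, 2);

// $line[0] holds the first line, $line[1] holds everything after it.
print_r($line);
```

Because the pattern matches a run of \r and \n characters, consecutive blank lines at the split point are swallowed along with the first newline.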

If you are looking for EITHER open square brackets OR close square brackets, then '[[]]' won't work (reasonably expected), but neither will '[\[\]]', nor with any number of escapes. HOWEVER, if your pattern is '[][]' it will work.
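A short sketch of the '[][]' pattern, written with preg_split() because split() is no longer available in current PHP. The sample string is made up. Note that the PCRE engine behind preg_split() also accepts the escaped form '[\[\]]', unlike the POSIX engine described above.

```php
<?php
// Hypothetical sample string containing both kinds of square bracket.
$subject = 'a[b]c';

// A "]" placed first inside the bracket expression is taken literally,
// so this class matches either "]" or "[".
$parts = preg_split('/[][]/', $subject);

print_r($parts);
```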

I use charset UTF-8. When I use the char &#65533;, the split function adds an empty string between "2" and "12"... Why?

Explanation:
============

UTF-8 encodes some characters (like the "&#65533;" character) as two bytes. In fact, the regular expression "[&#65533;]" contains 4 bytes (4 non-Unicode characters). To demonstrate the real situation I wrote the following example:
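The original example did not survive, so here is a rough stand-in rather than the author's code. It assumes the section sign (bytes C2 A7 in UTF-8) as the two-byte character and uses preg_split(), whose byte mode behaves the same way as described and whose "u" modifier shows the contrast.

```php
<?php
// Assumption: "\xC2\xA7" (the section sign in UTF-8) stands in for the
// two-byte character from the comment, which did not survive in the text.
$char = "\xC2\xA7";
$subject = '1' . $char . '2' . $char . '12';

// Byte mode: the class holds two separate bytes, so each byte of the
// character is a delimiter and an empty string appears between them.
$byteParts = preg_split('/[' . $char . ']/', $subject);

// UTF-8 mode (the "u" modifier): the class holds one two-byte character.
$utf8Parts = preg_split('/[' . $char . ']/u', $subject);

var_dump(count($byteParts), count($utf8Parts));
```

Dropping the brackets also avoids the problem in byte mode, since the two bytes are then matched as a sequence instead of as alternatives.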

Split is acting exactly as it should; it splits on regular expressions. A period is a regular expression pattern for any single character. So, an actual period must be escaped with a backslash: '\.' A period within brackets is not an any-character pattern, because it does not make sense in that context.

Beware that regular expressions can be confusing because there are a few different varieties of patterns.
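To illustrate the three cases, here is a small sketch using preg_split(), which treats the period the same way. The host name is a made-up sample.

```php
<?php
$host = 'www.example.com'; // hypothetical sample input

// Backslash-escaped period: matches only a literal ".".
$escaped = preg_split('/\./', $host);

// Period inside brackets: also a literal ".".
$bracketed = preg_split('/[.]/', $host);

// Unescaped period: every character is a delimiter, so only the empty
// strings between the characters are left.
$anyChar = preg_split('/./', 'a.b');

print_r($escaped);
```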

Oops! It seems that neither explode nor split REALLY takes a STRING, but only a single character, as the delimiter for splitting the string. I found this problem in some of my code when trying to split a string using ";\n" as the breaking string. As a result, only ";" was taken... the rest of the string was ignored. The same happened when I tried to substitute "\n" with anything else. :(
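For comparison, a quick check with made-up data suggests that explode() does honor a multi-character delimiter. One possible cause of the behavior described, offered only as a guess, is quoting: inside single quotes, \n is a backslash followed by the letter n and not a newline.

```php
<?php
$data = "a;\nb;\nc"; // hypothetical sample: fields end with ";" plus a newline

// Double quotes: "\n" is a real newline, and the whole two-character
// delimiter is matched.
$doubleQuoted = explode(";\n", $data);

// Single quotes: '\n' is a backslash followed by "n", which never occurs
// in $data, so nothing is split.
$singleQuoted = explode(';\n', $data);

print_r($doubleQuoted);
```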

split() doesn't like NUL characters within the string: it treats the first one it meets as the end of the string. So if the data you want to split can contain a NUL character, you'll need to convert it into something else first, e.g.:
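The original example is missing, so this is only a sketch of the conversion idea. The placeholder token and the sample data are assumptions, and explode() stands in for split(). explode() and preg_split() are binary-safe, so the detour matters only for functions that stop at NUL.

```php
<?php
// Hypothetical data: comma-separated fields, one containing a NUL byte.
$data = "ab\0cd,ef";

// Assumption: this placeholder never occurs in the real data.
$placeholder = '{NUL}';

// Swap NUL out, split, then swap it back in each piece.
$safe  = str_replace("\0", $placeholder, $data);
$parts = explode(',', $safe);
$parts = str_replace($placeholder, "\0", $parts);

var_dump(count($parts));
```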

[Ed. note: Close. The pipe *is* an operator in PHP, but
the reason this fails is because it's also an operator
in the regex syntax. The distinction here is important
since a PHP operator inside a string is just a character.]
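A brief sketch of the point in the note, using preg_split() and a made-up input: inside the PHP string the pipe is an ordinary character, and it only needs escaping because the regex engine reads it as alternation.

```php
<?php
$row = 'a|b|c'; // hypothetical sample input

// Escaped by hand: "\|" is a literal pipe in the regex.
$parts = preg_split('/\|/', $row);

// Escaped by preg_quote(), which is handy when the delimiter is a variable.
$quoted = preg_split('/' . preg_quote('|', '/') . '/', $row);

print_r($parts);
```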

wchris's quotesplit assumes that anything that is quoted must also be a complete delimiter-separated entry by itself. This version does not. It also uses split's argument order.

function quotesplit( $splitter=',', $s ) {
    //First step is to split it up into the bits that are surrounded by quotes
    //and the bits that aren't. Adding the delimiter to the ends simplifies
    //the logic further down
    $getstrings = explode('"', $splitter.$s.$splitter);

    //$instring toggles so we know if we are in a quoted string or not
    $delimlen = strlen($splitter);
    $instring = 0;

    while (list($arg, $val) = each($getstrings)) {
        if ($instring==1) {
            //Add the whole string, untouched to the previous value in the array
            $result[count($result)-1] = $result[count($result)-1].$val;
            $instring = 0;
        } else {
            //Break up the string according to the delimiter character
            //Each string has extraneous delimiters around it (inc the ones we added
            //above), so they need to be stripped off
            $temparray = split($splitter, substr($val, $delimlen, strlen($val)-$delimlen-$delimlen+1 ) );

Actually, this version is better than the last I submitted. The goal here is to be able to engage in *multiple* delimiter removal passes; for all but the last pass, set the third argument to "1", and everything should go well.

function quotesplit( $splitter=',', $s, $restore_quotes=0 ) {
    //First step is to split it up into the bits that are surrounded by quotes
    //and the bits that aren't. Adding the delimiter to the ends simplifies
    //the logic further down
    $getstrings = explode('"', $splitter.$s.$splitter);

    //$instring toggles so we know if we are in a quoted string or not
    $delimlen = strlen($splitter);
    $instring = 0;

    while (list($arg, $val) = each($getstrings)) {
        if ($instring==1) {
            if( $restore_quotes ) {
                //Add the whole string, untouched to the previous value in the array
                $result[count($result)-1] = $result[count($result)-1].'"'.$val.'"';
            } else {
                //Add the whole string, untouched to the array
                $result[] = $val;
            }
            $instring = 0;
        } else {
            //Break up the string according to the delimiter character
            //Each string has extraneous delimiters around it (inc the ones we added
            //above), so they need to be stripped off
            $temparray = split($splitter, substr($val, $delimlen, strlen($val)-$delimlen-$delimlen+1 ) );

This *should* work for any valid CSV string, regardless of what it contains inside its quotes (using RFC 4180). It should also be faster than most of the others I've seen. It's very simple in concept, and thoroughly commented.
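The function this note refers to is not included here. As a reference point for checking any hand-written splitter, PHP's built-in str_getcsv() can be run on the same input. The sample line is made up, and passing an empty escape character (accepted since PHP 7.4) is an assumption chosen to follow RFC 4180's doubled-quote convention.

```php
<?php
// Hypothetical CSV line with an embedded comma and doubled quotes.
$line = 'one,"two, with comma","say ""hi""",four';

// Delimiter ",", enclosure '"', and no separate escape character.
$fields = str_getcsv($line, ',', '"', '');

print_r($fields);
```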

Though this is obvious, the manual is a bit incorrect when claiming that the return will always be 1 + the number of times the split pattern occurs. If the split pattern is the first part of the string, the return will still be 1. E.g.

$a = split("zz", "zzxsj.com"); count($a);

=> 1.

This return value cannot in any way be distinguished from the return value when the split pattern is not found.

I kept running into the same issue Chris Tyler experienced with lewis [ at t] hcoms [d dot t] co [d dot t] uk's function before realizing that Chris had come up with a solution. However, that solution was just a little off it seems, unless your CSV only contains one line.

If you simply add another --length; in the place you suggested, then the function will always trim the last two characters on the line. Since the newline character is the last character on the line and the redundant quote (or other enclosure) is the second to last character, this works for the final segment. But when parsing segments that do not include a newline character, you end up trimming the redundant enclosure and the last character before the enclosure.

For example,

"he","she","him","her"\r\n

becomes

[0] => h
[1] => sh
[2] => hi
[3] => her

Since the segment could end with the enclosure (i.e., ") or the enclosure followed by the newline (i.e., "\r\n), you have to make sure you are only adding another --length; when the latter is the case. Replacing the code block that you suggested with the following will do the trick.

# Is the last thing a newline?
if( $char == $newline ){
    # Well then get rid of it
    --$length;
}

# Is the last thing a quote?
if( $trim_quote ){
    # Well then get rid of it
    --$length;
}

I've tested this only for the purposes of the script I'm working on at this time. So, there could be other bugs I haven't come across, but this seems like the easiest way to eliminate the redundant enclosure.