The result of split is a list of the substrings of string that are delimited by splitChars, which is a list of characters. If adjacent characters in string are both in splitChars, the result will include the empty substring between those adjacent characters. If the first character of string is in splitChars, the result will include the empty substring before the first character, and if the last character of string is in splitChars, the result will include the empty substring after the last character.
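
A minimal sketch of the empty-element rules described above, using a comma as the only split character. The sample strings and variable names are made up for illustration:

```tcl
# Adjacent separators produce an empty element between them.
set middle [split "a,,b" ","]
puts $middle    ;# a {} b

# A leading or trailing separator produces an empty element at that end.
set edges [split ",a,b," ","]
puts $edges     ;# {} a b {}
```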

Strick: Oops, I forgot to actually use split in my script above. So now I test four different notions of whitespace, and get three different answers. I understand why Tcl's builtin list-splitting rules must be fixed, regardless of locale. But it seems split should use either the list-splitting rule or the string is space rule; instead it uses its own (pre-Unicode?) rule:

escargo 2006-01-27: If split used chars 9 10 11 12 13 32 then there would be only two sets, with the smaller set as a proper subset of the larger set. The two characters that would have to be added are the vertical tab and form feed.
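
One way to check this on your own interpreter is sketched below. It compares, for each of the six character codes mentioned, whether split with no splitChars argument treats the character as a separator and whether string is space accepts it. The helper name splitsOnByDefault is hypothetical, and the sketch assumes the default split set is space, tab, newline and carriage return, which matches the Tcl releases discussed here:

```tcl
# Hypothetical helper: report whether split, called with no splitChars,
# treats the character with the given code as a separator.
proc splitsOnByDefault {code} {
    set ch [format %c $code]
    return [expr {[llength [split "a${ch}b"]] == 2}]
}

# Print one line per character code: default split behaviour next to
# the answer from string is space.
foreach code {9 10 11 12 13 32} {
    set ch [format %c $code]
    puts [format "%2d  split: %d  string is space: %d" \
        $code [splitsOnByDefault $code] [string is space $ch]]
}
```

If the assumption holds, the rows for 11 (vertical tab) and 12 (form feed) are the only ones where the two columns disagree.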

splitChars is a series of 0 to n individual characters. However, if you want to split on a specific sequence of 2 or more characters together, or if you want to split on a regular expression, split will not work for you. See Tcllib's textutil::splitx, or ycl::string::delimit for that functionality.
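
A short sketch of what "individual characters" means in practice. The sample strings are made up; the point is only that a two-character splitChars is two separate separators, and that an empty splitChars splits between every character:

```tcl
# "=>" is read as two split characters, "=" and ">", not as one delimiter,
# so an empty element appears between them.
set pieces [split "key=>value" "=>"]
puts $pieces    ;# key {} value

# An empty splitChars splits the string into its individual characters.
set chars [split "abc" ""]
puts $chars     ;# a b c
```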

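This version is recursive, so it may be better to rewrite it if you plan to use the function against very long strings with many separators. The difference between wsplit and splitx is that splitx treats the separator as a regular expression, so a separator that is not known in advance may contain regexp metacharacters and be misinterpreted, whereas wsplit matches the separator literally.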

IL 2005-01-03: on the near anniversary of this proc, the iterative version, quick-n-dirty since I'm in a hurry to parse some html...

proc wsplit {str sep} {
    set out {}
    set sepLen [string length $sep]
    if {$sepLen < 2} {
        return [split $str $sep]
    }
    while {[set idx [string first $sep $str]] >= 0} {
        # the left part : the current element
        lappend out [string range $str 0 [expr {$idx-1}]]
        # get the right part and iterate with it
        set str [string range $str [incr idx $sepLen] end]
    }
    # there is no separator anymore, but keep in mind the right part must be
    # appended
    lappend out $str
    return $out
}

escargo: So what should you use when you don't care how many spaces were between tokens, you just want the non-blank tokens in the list and none of the separators?
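
One common idiom for this, sketched here as a possible answer rather than the only one, is to collect the runs of non-whitespace characters with regexp instead of splitting on the separators. The proc name words is hypothetical:

```tcl
# Hypothetical helper: return only the non-blank tokens of str,
# ignoring how much whitespace separates them.
proc words {str} {
    return [regexp -all -inline {\S+} $str]
}

set tokens [words "  alpha   beta\tgamma  "]
puts $tokens    ;# alpha beta gamma
```

Unlike split, this never produces empty elements for leading, trailing or repeated whitespace, because it matches the tokens and discards everything between them.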