From these 2 examples I feel inclined to conclude that the Split method recursively tokenizes as it goes through each element of the array from left to right.

However, once separators that contain alphanumeric characters are thrown into the equation, it is clear that the above theory is wrong.

"5.x.7".Split(new String[]{".x", "x."}, StringSplitOptions.None)

results in: 1. "5" 2. ".7"

"5.x.7".Split(new String[]{"x.", ".x"}, StringSplitOptions.None)

results in: 1. "5" 2. ".7"

This time we obtain the same output in both cases, which means that the rule theorized from the first set of examples no longer applies (i.e., if separator precedence were always determined by the position of the separator within the array, the last example would have produced "5." & "7" instead of "5" & ".7").

As to why I am wasting my time trying to guess how .NET's standard APIs work: I want to implement similar functionality for my Java apps, but neither StringTokenizer nor org.apache.commons.lang.StringUtils provides the ability to split a String using multiple multi-character separators (and even if I were to find an API that does, it would be hard to know whether it always tokenizes using the same algorithm as the String.Split method).

The String#split method in Java takes a regex as its split criterion, so you can merge as many split criteria as you like using the pipe (|). Further, it would be better if you posted the real problem here, rather than equivalent code in another language. Not everyone knows multiple languages.
– Rohit Jain, Feb 7 '13 at 22:43

@RohitJain: Even so, I would be interested in learning what .NET's algorithm is.
– Matthew, Feb 7 '13 at 22:44
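
For reference, the regex route suggested in the comment above might look roughly like this in Java. This is only a sketch: the names RegexSplit and split are made up for illustration, each separator goes through Pattern.quote so that characters such as "." are taken literally, and a limit of -1 is passed so that trailing empty strings are kept. Whether this agrees with .NET in every edge case is an assumption, not something verified here, and an empty separator list is not handled.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical helper, not part of any library.
class RegexSplit {
    static List<String> split(String input, String... separators) {
        // Quote every separator so regex metacharacters are literal,
        // then join the alternatives with '|' in array order.
        String regex = Arrays.stream(separators)
                .map(Pattern::quote)
                .collect(Collectors.joining("|"));
        // Limit -1 keeps trailing empty strings instead of dropping them.
        return Arrays.asList(input.split(regex, -1));
    }
}
```

With this sketch, RegexSplit.split("5.x.7", ".x", "x.") and RegexSplit.split("5.x.7", "x.", ".x") both give "5" and ".7", since the regex engine looks for the leftmost match first and only then tries the alternatives in order.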

3 Answers

To avoid ambiguous results when strings in separator have characters
in common, the Split operation proceeds from the beginning to the end
of the value of the instance, and matches the first element in
separator that is equal to a delimiter in the instance. The order in
which substrings are encountered in the instance takes precedence over
the order of elements in separator.

So, in the first case ".." and "..." are found at the same position, and their order in separator decides which one is used. In the second case, ".x" is found before "x.", so the order of elements in separator does not come into play.
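
Since the question was about reproducing this in Java, here is a minimal sketch of the rule quoted above, written by hand rather than with a regex. It is not .NET's actual implementation. The names MultiSplit and split are hypothetical, only the StringSplitOptions.None behaviour is imitated, and skipping null or empty separators is an assumption on my part.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of any library.
class MultiSplit {
    static List<String> split(String input, String... separators) {
        List<String> parts = new ArrayList<>();
        int start = 0; // where the current token begins
        int i = 0;     // current scan position
        while (i < input.length()) {
            String hit = null;
            // Position comes first: only separators matching at i are
            // considered, and among those the earliest in the array wins.
            for (String sep : separators) {
                if (sep != null && !sep.isEmpty() && input.startsWith(sep, i)) {
                    hit = sep;
                    break;
                }
            }
            if (hit == null) {
                i++; // nothing matches here, move one character on
            } else {
                parts.add(input.substring(start, i));
                i += hit.length(); // resume after the separator
                start = i;
            }
        }
        parts.add(input.substring(start)); // remainder, possibly empty
        return parts;
    }
}
```

With this sketch, "5.x.7" gives "5" and ".7" for both {".x", "x."} and {"x.", ".x"}, because ".x" matches at an earlier position than "x." either way. Array order only matters when two separators match at the same position: on a made-up input "a...b", {"..", "..."} gives "a" and ".b", while {"...", ".."} gives "a" and "b".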

String.Split splits at the first match of any of the separators passed as arguments. As a simple case, say you call Split("a", "b")
and the string contains "appaleisbigapll". The algorithm is simple: start with the first character and try to match either a or b. When it finds one of them, it splits there and carries on with the next character. In your example:

5.x.7 with ".x", "x.". The separators are combined with an "or", so it finds .x first and then checks the remaining .7. As no separator matches there, it leaves .7 as it is. Result: 5 and .7

The same happens in your second example: it finds .x and, since the rule is ".x or x.", it continues with .7.
Precedence is not applied here. And for your first set of examples, yes, it does the split operation recursively.

"Yes it does the split operation recursively" I would say is incorrect. It only appears that way in the first example because they are in the order that it matches. It doesn't actually happen that way.
– Simon Whitehead, Feb 7 '13 at 23:40