The .NET Framework defines a text element as a unit of text that is displayed as a single character, that is, a grapheme. A text element can be a base character, a surrogate pair, or a combining character sequence. The Unicode Standard defines a surrogate pair as a coded character representation for a single abstract character that consists of a sequence of two code units, where the first unit of the pair is a high surrogate and the second is a low surrogate. The Unicode Standard defines a combining character sequence as a combination of a base character and one or more combining characters. A surrogate pair can represent a base character or a combining character.

The StringInfo class enables you to work with a string as a series of text elements rather than as individual Char objects.
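The difference between Char objects and text elements can be sketched as follows. This is a minimal illustration, not part of the original sample: the class name TextElementCountDemo and the helper CountTextElements are hypothetical, while String.Length, Char.ConvertFromUtf32, and StringInfo.LengthInTextElements are existing framework members.

```csharp
using System;
using System.Globalization;

public static class TextElementCountDemo
{
    // Hypothetical helper: returns the number of text elements in a string.
    public static int CountTextElements(string s)
    {
        return new StringInfo(s).LengthInTextElements;
    }

    public static void Main()
    {
        // 'a' followed by COMBINING MACRON (U+0304): a combining character sequence.
        string combining = "a\u0304";

        // GREEK ACROPHONIC FIVE TALENTS (U+10148): stored in UTF-16 as a surrogate pair.
        string surrogate = Char.ConvertFromUtf32(0x10148);

        Console.WriteLine("{0} Char objects, {1} text element(s)",
                          combining.Length, CountTextElements(combining));
        Console.WriteLine("{0} Char objects, {1} text element(s)",
                          surrogate.Length, CountTextElements(surrogate));
    }
}
```

Each of the two strings holds two Char objects but only one text element, which is why indexing by Char can split what a reader sees as a single character.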

To instantiate a StringInfo object that represents a specified string, you can do either of the following:

Call the StringInfo(String) constructor and pass it the string that the StringInfo object is to represent.

Call the default StringInfo() constructor, and assign the string that the StringInfo object is to represent to the String property.
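The two approaches above can be sketched as follows. The class name StringInfoCreationDemo and the helpers FromConstructor and FromProperty are hypothetical names introduced only for illustration; the constructors and the String property are the members named in the text.

```csharp
using System;
using System.Globalization;

public static class StringInfoCreationDemo
{
    // Approach 1: pass the string to the StringInfo(String) constructor.
    public static StringInfo FromConstructor(string s)
    {
        return new StringInfo(s);
    }

    // Approach 2: call the default constructor, then assign the String property.
    public static StringInfo FromProperty(string s)
    {
        StringInfo si = new StringInfo();
        si.String = s;
        return si;
    }

    public static void Main()
    {
        string text = "a\u0304bc";
        StringInfo first = FromConstructor(text);
        StringInfo second = FromProperty(text);

        // Both objects represent the same string.
        Console.WriteLine(first.String == second.String);
        Console.WriteLine(first.LengthInTextElements == second.LengthInTextElements);
    }
}
```

Both objects end up representing the same string. The property-based form may be convenient when one StringInfo instance is reused for several strings.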

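You can work with the individual text elements in a string in two ways:

By enumerating each text element. To do this, call the GetTextElementEnumerator method, and then repeatedly call the MoveNext method on the returned TextElementEnumerator object until it returns false.

By calling the ParseCombiningCharacters method to retrieve an array that contains the starting index of each text element.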

The following example illustrates both ways of working with the text elements in a string. It creates two strings:

strCombining, which is a string of Arabic characters that includes three text elements with multiple Char objects. The first text element is the base character ARABIC LETTER ALEF (U+0627) followed by ARABIC HAMZA BELOW (U+0655) and ARABIC KASRA (U+0650). The second text element is ARABIC LETTER HEH (U+0647) followed by ARABIC FATHA (U+064E). The third text element is ARABIC LETTER BEH (U+0628) followed by ARABIC DAMMATAN (U+064C).

strSurrogates, which is a string that includes three surrogate pairs: GREEK ACROPHONIC FIVE TALENTS (U+10148) from the Supplementary Multilingual Plane, U+20026 from the Supplementary Ideographic Plane, and U+F1001 from Supplementary Private Use Area-A. The UTF-16 encoding of each character is a surrogate pair that consists of a high surrogate followed by a low surrogate.
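A minimal sketch of how two such strings might be built and inspected is shown here. It assumes each string contains exactly the code points listed in the two descriptions and nothing else; the class name DescribedStringsDemo and the helpers BuildCombining, BuildSurrogates, and Describe are hypothetical and do not come from the original sample.

```csharp
using System;
using System.Globalization;
using System.Text;

public static class DescribedStringsDemo
{
    // Assumption: only the three combining character sequences described in the text.
    public static string BuildCombining()
    {
        return "\u0627\u0655\u0650"   // ALEF + HAMZA BELOW + KASRA
             + "\u0647\u064E"         // HEH + FATHA
             + "\u0628\u064C";        // BEH + DAMMATAN
    }

    // Assumption: only the three supplementary code points described in the text.
    public static string BuildSurrogates()
    {
        return Char.ConvertFromUtf32(0x10148)
             + Char.ConvertFromUtf32(0x20026)
             + Char.ConvertFromUtf32(0xF1001);
    }

    // Hypothetical helper: lists each text element with its starting index
    // and the UTF-16 code units it is made of.
    public static string Describe(string s)
    {
        StringBuilder sb = new StringBuilder();
        TextElementEnumerator te = StringInfo.GetTextElementEnumerator(s);
        while (te.MoveNext())
        {
            sb.AppendFormat("Text element at index {0}:", te.ElementIndex);
            foreach (char c in te.GetTextElement())
                sb.AppendFormat(" {0:X4}", (int)c);
            sb.AppendLine();
        }
        return sb.ToString();
    }

    public static void Main()
    {
        Console.WriteLine(Describe(BuildCombining()));
        Console.WriteLine(Describe(BuildSurrogates()));

        // The same boundaries, retrieved as an array of starting indexes.
        int[] starts = StringInfo.ParseCombiningCharacters(BuildSurrogates());
        Console.WriteLine(String.Join(", ", starts));
    }
}
```

Printing code units in hexadecimal rather than the characters themselves avoids depending on whether the console font can render Arabic marks or supplementary-plane characters.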

using System;
using System.Text;
using System.Globalization;

public sealed class App {
   static void Main() {
      // The string below contains combining characters.
      String s = "a\u0304\u0308bc\u0327";

      // Show each 'character' in the string.
      EnumTextElements(s);

      // Show the index in the string where each 'character' starts.
      EnumTextElementIndexes(s);
   }

   // Show how to enumerate each real character (honoring surrogates) in a string.
   static void EnumTextElements(String s) {
      // This StringBuilder holds the output results.
      StringBuilder sb = new StringBuilder();

      // Use the enumerator returned from GetTextElementEnumerator
      // method to examine each real character.
      TextElementEnumerator charEnum = StringInfo.GetTextElementEnumerator(s);
      while (charEnum.MoveNext()) {
         sb.AppendFormat(
            "Character at index {0} is '{1}'{2}",
            charEnum.ElementIndex, charEnum.GetTextElement(),
            Environment.NewLine);
      }

      // Show the results.
      Console.WriteLine("Result of GetTextElementEnumerator:");
      Console.WriteLine(sb);
   }

   // Show how to discover the index of each real character (honoring surrogates) in a string.
   static void EnumTextElementIndexes(String s) {
      // This StringBuilder holds the output results.
      StringBuilder sb = new StringBuilder();

      // Use the ParseCombiningCharacters method to
      // get the index of each real character in the string.
      Int32[] textElemIndex = StringInfo.ParseCombiningCharacters(s);

      // Iterate through each real character showing the character and the index where it was found.
      for (Int32 i = 0; i < textElemIndex.Length; i++) {
         sb.AppendFormat(
            "Character {0} starts at index {1}{2}",
            i, textElemIndex[i], Environment.NewLine);
      }

      // Show the results.
      Console.WriteLine("Result of ParseCombiningCharacters:");
      Console.WriteLine(sb);
   }
}

// This code produces the following output.
//
// Result of GetTextElementEnumerator:
// Character at index 0 is 'ā̈'
// Character at index 3 is 'b'
// Character at index 4 is 'ç'
//
// Result of ParseCombiningCharacters:
// Character 0 starts at index 0
// Character 1 starts at index 3
// Character 2 starts at index 4