In this article

Best Practices for Using Strings in .NET

In this article

.NET provides extensive support for developing localized and globalized applications, and makes it easy to apply the conventions of either the current culture or a specific culture when performing common operations such as sorting and displaying strings. But sorting or comparing strings is not always a culture-sensitive operation. For example, strings that are used internally by an application typically should be handled identically across all cultures. When culturally independent string data, such as XML tags, HTML tags, user names, file paths, and the names of system objects, are interpreted as if they were culture-sensitive, application code can be subject to subtle bugs, poor performance, and, in some cases, security issues.

This topic examines the string sorting, comparison, and casing methods in .NET, presents recommendations for selecting an appropriate string-handling method, and provides additional information about string-handling methods. It also examines how formatted data, such as numeric data and date and time data, is handled for display and for storage.

Specifying String Comparisons Explicitly

Most of the string manipulation methods in .NET are overloaded. Typically, one or more overloads accept default settings, whereas others accept no defaults and instead define the precise way in which strings are to be compared or manipulated. Most of the methods that do not rely on defaults include a parameter of type StringComparison, which is an enumeration that explicitly specifies rules for string comparison by culture and case. The following table describes the StringComparison enumeration members.

We recommend that you select an overload that does not use default values, for the following reasons:

Some overloads with default parameters (those that search for a Char in the string instance) perform an ordinal comparison, whereas others (those that search for a string in the string instance) are culture-sensitive. It is difficult to remember which method uses which default value, and easy to confuse the overloads.

The intent of the code that relies on default values for method calls is not clear. In the following example, which relies on defaults, it is difficult to know whether the developer actually intended an ordinal or a linguistic comparison of two strings, or whether a case difference between protocol and "http" might cause the test for equality to return false.

Dim protocol As String = GetProtocol(url)
If String.Equals(protocol, "http") Then
' ...Code to handle HTTP protocol.
Else
Throw New InvalidOperationException()
End If

In general, we recommend that you call a method that does not rely on defaults, because it makes the intent of the code unambiguous. This, in turn, makes the code more readable and easier to debug and maintain. The following example addresses the questions raised about the previous example. It makes it clear that ordinal comparison is used and that differences in case are ignored.

The Details of String Comparison

String comparison is the heart of many string-related operations, particularly sorting and testing for equality. Strings sort in a determined order: If "my" appears before "string" in a sorted list of strings, "my" must compare less than or equal to "string". Additionally, comparison implicitly defines equality. The comparison operation returns zero for strings it deems equal. A good interpretation is that neither string is less than the other. Most meaningful operations involving strings include one or both of these procedures: comparing with another string, and executing a well-defined sort operation.

However, evaluating two strings for equality or sort order does not yield a single, correct result; the outcome depends on the criteria used to compare the strings. In particular, string comparisons that are ordinal or that are based on the casing and sorting conventions of the current culture or the invariant culture (a locale-agnostic culture based on the English language) may produce different results.

String Comparisons that Use the Current Culture

One criterion involves using the conventions of the current culture when comparing strings. Comparisons that are based on the current culture use the thread's current culture or locale. If the culture is not set by the user, it defaults to the setting in the Regional Options window in Control Panel. You should always use comparisons that are based on the current culture when data is linguistically relevant, and when it reflects culture-sensitive user interaction.

However, comparison and casing behavior in .NET changes when the culture changes. This happens when an application executes on a computer that has a different culture than the computer on which the application was developed, or when the executing thread changes its culture. This behavior is intentional, but it remains non-obvious to many developers. The following example illustrates differences in sort order between the U.S. English ("en-US") and Swedish ("sv-SE") cultures. Note that the words "ångström", "Windows", and "Visual Studio" appear in different positions in the sorted string arrays.

Case-insensitive comparisons that use the current culture are the same as culture-sensitive comparisons, except that they ignore case as dictated by the thread's current culture. This behavior may manifest itself in sort orders as well.

Comparisons that use current culture semantics are the default for the following methods:

In any case, we recommend that you call an overload that has a StringComparison parameter to make the intent of the method call clear.

Subtle and not so subtle bugs can emerge when non-linguistic string data is interpreted linguistically, or when string data from a particular culture is interpreted using the conventions of another culture. The canonical example is the Turkish-I problem.

For nearly all Latin alphabets, including U.S. English, the character "i" (\u0069) is the lowercase version of the character "I" (\u0049). This casing rule quickly becomes the default for someone programming in such a culture. However, the Turkish ("tr-TR") alphabet includes an "I with a dot" character "İ" (\u0130), which is the capital version of "i". Turkish also includes a lowercase "i without a dot" character, "ı" (\u0131), which capitalizes to "I". This behavior occurs in the Azerbaijani ("az") culture as well.

Therefore, assumptions made about capitalizing "i" or lowercasing "I" are not valid among all cultures. If you use the default overloads for string comparison routines, they will be subject to variance between cultures. If the data to be compared is non-linguistic, using the default overloads can produce undesirable results, as the following attempt to perform a case-insensitive comparison of the strings "file" and "FILE" illustrates.

This comparison could cause significant problems if the culture is inadvertently used in security-sensitive settings, as in the following example. A method call such as IsFileURI("file:") returns true if the current culture is U.S. English, but false if the current culture is Turkish. Thus, on Turkish systems, someone could circumvent security measures that block access to case-insensitive URIs that begin with "FILE:".

Public Shared Function IsFileURI(path As String) As Boolean
Return path.StartsWith("FILE:", StringComparison.OrdinalIgnoreCase)
End Function

Ordinal String Operations

Specifying the StringComparison.Ordinal or StringComparison.OrdinalIgnoreCase value in a method call signifies a non-linguistic comparison in which the features of natural languages are ignored. Methods that are invoked with these StringComparison values base string operation decisions on simple byte comparisons instead of casing or equivalence tables that are parameterized by culture. In most cases, this approach best fits the intended interpretation of strings while making code faster and more reliable.

Ordinal comparisons are string comparisons in which each byte of each string is compared without linguistic interpretation; for example, "windows" does not match "Windows". This is essentially a call to the C runtime strcmp function. Use this comparison when the context dictates that strings should be matched exactly or demands conservative matching policy. Additionally, ordinal comparison is the fastest comparison operation because it applies no linguistic rules when determining a result.

Strings in .NET can contain embedded null characters. One of the clearest differences between ordinal and culture-sensitive comparison (including comparisons that use the invariant culture) concerns the handling of embedded null characters in a string. These characters are ignored when you use the String.Compare and String.Equals methods to perform culture-sensitive comparisons (including comparisons that use the invariant culture). As a result, in culture-sensitive comparisons, strings that contain embedded null characters can be considered equal to strings that do not.

The following example performs a culture-sensitive comparison of the string "Aa" with a similar string that contains several embedded null characters between "A" and "a", and shows how the two strings are considered equal.

Case-insensitive ordinal comparisons are the next most conservative approach. These comparisons ignore most casing; for example, "windows" matches "Windows". When dealing with ASCII characters, this policy is equivalent to StringComparison.Ordinal, except that it ignores the usual ASCII casing. Therefore, any character in A, Z matches the corresponding character in a,z. Casing outside the ASCII range uses the invariant culture's tables. Therefore, the following comparison:

Both StringComparison.Ordinal and StringComparison.OrdinalIgnoreCase use the binary values directly, and are best suited for matching. When you are not sure about your comparison settings, use one of these two values. However, because they perform a byte-by-byte comparison, they do not sort by a linguistic sort order (like an English dictionary) but by a binary sort order. The results may look odd in most contexts if displayed to users.

Ordinal semantics are the default for String.Equals overloads that do not include a StringComparison argument (including the equality operator). In any case, we recommend that you call an overload that has a StringComparison parameter.

String Operations that Use the Invariant Culture

Comparisons with the invariant culture use the CompareInfo property returned by the static CultureInfo.InvariantCulture property. This behavior is the same on all systems; it translates any characters outside its range into what it believes are equivalent invariant characters. This policy can be useful for maintaining one set of string behavior across cultures, but it often provides unexpected results.

Case-insensitive comparisons with the invariant culture use the static CompareInfo property returned by the static CultureInfo.InvariantCulture property for comparison information as well. Any case differences among these translated characters are ignored.

The LATIN SMALL LETTER A character "a" (\u0061), when it is next to the COMBINING RING ABOVE character "+ " ̊" (\u030a), is interpreted as the LATIN SMALL LETTER A WITH RING ABOVE character "å" (\u00e5). As the following example shows, this behavior differs from ordinal comparison.

When interpreting file names, cookies, or anything else where a combination such as "å" can appear, ordinal comparisons still offer the most transparent and fitting behavior.

On balance, the invariant culture has very few properties that make it useful for comparison. It does comparison in a linguistically relevant manner, which prevents it from guaranteeing full symbolic equivalence, but it is not the choice for display in any culture. One of the few reasons to use StringComparison.InvariantCulture for comparison is to persist ordered data for a cross-culturally identical display. For example, if a large data file that contains a list of sorted identifiers for display accompanies an application, adding to this list would require an insertion with invariant-style sorting.

Common String Comparison Methods in .NET

String.Compare

As the operation most central to string interpretation, all instances of these method calls should be examined to determine whether strings should be interpreted according to the current culture, or dissociated from the culture (symbolically). Typically, it is the latter, and a StringComparison.Ordinal comparison should be used instead.

Types that implement the IComparable and IComparable<T> interfaces implement this method. Because it does not offer the option of a StringComparison parameter, implementing types often let the user specify a StringComparer in their constructor. The following example defines a FileName class whose class constructor includes a StringComparer parameter. This StringComparer object is then used in the FileName.CompareTo method.

Public Class FileName : Implements IComparable
Dim fname As String
Dim comparer As StringComparer
Public Sub New(name As String, comparer As StringComparer)
If String.IsNullOrEmpty(name) Then
Throw New ArgumentNullException("name")
End If
Me.fname = name
If comparer IsNot Nothing Then
Me.comparer = comparer
Else
Me.comparer = StringComparer.OrdinalIgnoreCase
End If
End Sub
Public ReadOnly Property Name As String
Get
Return fname
End Get
End Property
Public Function CompareTo(obj As Object) As Integer _
Implements IComparable.CompareTo
If obj Is Nothing Then Return 1
If Not TypeOf obj Is FileName Then
obj = obj.ToString()
Else
obj = CType(obj, FileName).Name
End If
Return comparer.Compare(Me.fname, obj)
End Function
End Class

String.Equals

The String class lets you test for equality by calling either the static or instance Equals method overloads, or by using the static equality operator. The overloads and operator use ordinal comparison by default. However, we still recommend that you call an overload that explicitly specifies the StringComparison type even if you want to perform an ordinal comparison; this makes it easier to search code for a certain string interpretation.

String.ToUpper and String.ToLower

You should be careful when you use these methods, because forcing a string to a uppercase or lowercase is often used as a small normalization for comparing strings regardless of case. If so, consider using a case-insensitive comparison.

Array.Sort and Array.BinarySearch

When you store any data in a collection, or read persisted data from a file or database into a collection, switching the current culture can invalidate the invariants in the collection. The Array.BinarySearch method assumes that the elements in the array to be searched are already sorted. To sort any string element in the array, the Array.Sort method calls the String.Compare method to order individual elements. Using a culture-sensitive comparer can be dangerous if the culture changes between the time that the array is sorted and its contents are searched. For example, in the following code, storage and retrieval operate on the comparer that is provided implicitly by the Thread.CurrentThread.CurrentCulture property. If the culture can change between the calls to StoreNames and DoesNameExist, and especially if the array contents are persisted somewhere between the two method calls, the binary search may fail.

' Incorrect.
Dim storedNames() As String
Public Sub StoreNames(names() As String)
Dim index As Integer = 0
ReDim storedNames(names.Length - 1)
For Each name As String In names
Me.storedNames(index) = name
index+= 1
Next
Array.Sort(names) ' Line A.
End Sub
Public Function DoesNameExist(name As String) As Boolean
Return Array.BinarySearch(Me.storedNames, name) >= 0 ' Line B.
End Function

A recommended variation appears in the following example, which uses the same ordinal (culture-insensitive) comparison method both to sort and to search the array. The change code is reflected in the lines labeled Line A and Line B in the two examples.

' Correct.
Dim storedNames() As String
Public Sub StoreNames(names() As String)
Dim index As Integer = 0
ReDim storedNames(names.Length - 1)
For Each name As String In names
Me.storedNames(index) = name
index+= 1
Next
Array.Sort(names, StringComparer.Ordinal) ' Line A.
End Sub
Public Function DoesNameExist(name As String) As Boolean
Return Array.BinarySearch(Me.storedNames, name, StringComparer.Ordinal) >= 0 ' Line B.
End Function

If this data is persisted and moved across cultures, and sorting is used to present this data to the user, you might consider using StringComparison.InvariantCulture, which operates linguistically for better user output but is unaffected by changes in culture. The following example modifies the two previous examples to use the invariant culture for sorting and searching the array.

Const initialTableCapacity As Integer = 100
Dim h As Hashtable
Public Sub PopulateFileTable(dir As String)
h = New Hashtable(initialTableCapacity, _
StringComparer.OrdinalIgnoreCase)
For Each filename As String In Directory.GetFiles(dir)
h.Add(filename, File.GetCreationTime(filename))
Next
End Sub
Public Sub PrintCreationTime(targetFile As String)
Dim dt As Object = h(targetFile)
If dt IsNot Nothing Then
Console.WriteLine("File {0} was created at {1}.", _
targetFile, _
CDate(dt))
Else
Console.WriteLine("File {0} does not exist.", targetFile)
End If
End Sub

Displaying and Persisting Formatted Data

When you display non-string data such as numbers and dates and times to users, format them by using the user's cultural settings. By default, the String.Format method and the ToString methods of the numeric types and the date and time types use the current thread culture for formatting operations. To explicitly specify that the formatting method should use the current culture, you can call an overload of a formatting method that has a provider parameter, such as String.Format(IFormatProvider, String, Object[]) or DateTime.ToString(IFormatProvider), and pass it the CultureInfo.CurrentCulture property.

You can persist non-string data either as binary data or as formatted data. If you choose to save it as formatted data, you should call a formatting method overload that includes a provider parameter and pass it the CultureInfo.InvariantCulture property. The invariant culture provides a consistent format for formatted data that is independent of culture and machine. In contrast, persisting data that is formatted by using cultures other than the invariant culture has a number of limitations:

The data is likely to be unusable if it is retrieved on a system that has a different culture, or if the user of the current system changes the current culture and tries to retrieve the data.

The properties of a culture on a specific computer can differ from standard values. At any time, a user can customize culture-sensitive display settings. Because of this, formatted data that is saved on a system may not be readable after the user customizes cultural settings. The portability of formatted data across computers is likely to be even more limited.

International, regional, or national standards that govern the formatting of numbers or dates and times change over time, and these changes are incorporated into Windows operating system updates. When formatting conventions change, data that was formatted by using the previous conventions may become unreadable.

The following example illustrates the limited portability that results from using culture-sensitive formatting to persist data. The example saves an array of date and time values to a file. These are formatted by using the conventions of the English (United States) culture. After the application changes the current thread culture to French (Switzerland), it tries to read the saved values by using the formatting conventions of the current culture. The attempt to read two of the data items throws a FormatException exception, and the array of dates now contains two incorrect elements that are equal to MinValue.