Ideas and thoughts on programming and software development

On strings, methods, return variables and IL code

A few days ago, while reviewing some old code, I found a method that performed an operation on a string received as an argument, stored the result back in that same parameter, and returned it at the end.

It looked weird, so I wanted to know what was really happening, and to check whether there is any real difference between overwriting the parameter, creating a new variable, or directly returning the result of the call.
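The three variants being compared could look roughly like the following sketch. The class and method names (`StringSamples`, `Overwrite`, `ReturnDirectly`, `UseLocal`) are hypothetical, chosen only for illustration, and the methods are assumed to be static; the `Substring(4)` call matches the operation described in the walkthrough.

```csharp
public static class StringSamples
{
    // Variant 1: overwrite the parameter with the result, then return it.
    public static string Overwrite(string value)
    {
        value = value.Substring(4);
        return value;
    }

    // Variant 2: return the result of the call directly.
    public static string ReturnDirectly(string value)
    {
        return value.Substring(4);
    }

    // Variant 3: store the result in a new local variable, then return it.
    public static string UseLocal(string value)
    {
        string result = value.Substring(4);
        return result;
    }
}
```

All three return the same string: for example, `StringSamples.Overwrite("abcdefgh")` yields `"efgh"`. Because the string reference is passed by value, reassigning `value` in the first variant does not change the caller's variable.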

For that reason, I created a small sample project and then used ILDasm to see what was really going on under the hood. ILDasm is a disassembler for the Intermediate Language (IL) that the C# compiler emits when we compile our code.

Just before we start, some quick notes:

IL looks like an assembly-style language, but it is stack based: operands are pushed onto an evaluation stack, and the result of a function call is left on that stack before returning.

Arguments and local variables are index based, so when we execute ldarg.0 we are really operating on the argument located at index 0.

The result of a call to an external method is also left on the stack.

IL is not the code that finally executes: at runtime the .NET JIT compiler translates it into native machine code, so the code that actually runs may be slightly different.

In the first case, where the argument is overwritten, the method begins by declaring a local variable whose type matches the return type specified in the method header. This variable, placed at index 0, will hold the method's return value.

Afterwards, we load the arguments onto the stack; in this case, a single argument.

Before calling the Substring method we must push the other argument onto the stack: a 4-byte integer with the value 4.

Then we call the Substring method, specifying both the assembly and the full namespace that contain the String class. The result of that call is pushed back onto the stack.

After the call we pop the value from the stack and store it back into the argument, replacing the reference it held.

We load the value of the argument onto the stack again and store it in local variable 0, the return variable.

Finally, before returning from the function, we push the value of the return variable onto the stack so that the caller can access it.
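The steps just described can be lined up against the source of the first variant. The sketch below is an approximation, not the original listing: the names `OverwriteSample` and `Overwrite` are hypothetical, the method is assumed to be static (so the string is argument 0), and the IL in the comments is what a debug build typically produces. The assembly name in brackets is assumed to be `mscorlib`, as on .NET Framework; other runtimes show a different one.

```csharp
public static class OverwriteSample
{
    // Hypothetical method: overwrites the parameter and returns it.
    public static string Overwrite(string value)
    {
        // Approximate debug-build IL for the assignment:
        //   .locals init (string V_0)   // compiler-generated return variable
        //   nop
        //   ldarg.0                     // push the argument
        //   ldc.i4.4                    // push the 4-byte integer 4
        //   callvirt instance string [mscorlib]System.String::Substring(int32)
        //   starg.s value               // pop the result back into the argument
        value = value.Substring(4);

        // Approximate debug-build IL for the return:
        //   ldarg.0                     // push the argument again
        //   stloc.0                     // pop it into the return variable
        //   br.s (next instruction)     // debug-only jump to the return sequence
        //   ldloc.0                     // push the return variable for the caller
        //   ret
        return value;
    }
}
```

Calling `OverwriteSample.Overwrite("abcdefgh")` returns `"efgh"`. The exact instructions can vary with the compiler version and build configuration.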

There are some instructions, such as br and nop, that the compiler adds in debug mode to allow better step-by-step debugging. There is a discussion about this on Stack Overflow, linked at the end of the article.

As we can see here, we load and store the same value repeatedly, which may not be necessary at all.

In the second case, where the result of the call is returned directly, the code begins in the same way, but after calling the Substring method the result is moved straight from the stack into the return variable, with no extra copying and no overwriting of the argument.

This looks like a more efficient way of working, because we save an extra read/write operation.
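For comparison, here is a sketch of the direct-return variant with the IL a debug build typically produces shown as comments. As before, the names `DirectReturnSample` and `ReturnDirectly` are hypothetical, the method is assumed to be static, and the `mscorlib` assembly name assumes .NET Framework.

```csharp
public static class DirectReturnSample
{
    // Hypothetical method: returns the result of the call directly.
    public static string ReturnDirectly(string value)
    {
        // Approximate debug-build IL:
        //   .locals init (string V_0)   // compiler-generated return variable
        //   nop
        //   ldarg.0                     // push the argument
        //   ldc.i4.4                    // push the 4-byte integer 4
        //   callvirt instance string [mscorlib]System.String::Substring(int32)
        //   stloc.0                     // pop the result straight into the return variable
        //   br.s (next instruction)     // debug-only jump to the return sequence
        //   ldloc.0                     // push the return variable for the caller
        //   ret
        return value.Substring(4);
    }
}
```

Compared with the first variant, the `starg.s` / `ldarg.0` pair is gone: the result goes from the stack to the return variable in a single store.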

Let’s see what happens in the last case, which uses an extra variable defined inside the scope of the function.

The first notable difference is in the local variable declaration, which now defines a second string variable that will hold our intermediate value.

The main difference from the first function is that the argument is not loaded or stored again but, as we save the result in a variable before returning it, we have the same double read/write problem as in the first case.
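A sketch of this third variant, again with the IL a debug build typically produces shown as comments. The names `ExtraLocalSample`, `UseLocal` and `result` are hypothetical, the method is assumed to be static, and the `mscorlib` assembly name assumes .NET Framework.

```csharp
public static class ExtraLocalSample
{
    // Hypothetical method: stores the result in a new local, then returns it.
    public static string UseLocal(string value)
    {
        // Approximate debug-build IL for the declaration:
        //   .locals init (string V_0, string V_1)   // V_0: result, V_1: return variable
        //   nop
        //   ldarg.0                     // push the argument
        //   ldc.i4.4                    // push the 4-byte integer 4
        //   callvirt instance string [mscorlib]System.String::Substring(int32)
        //   stloc.0                     // pop the result into the local 'result'
        string result = value.Substring(4);

        // Approximate debug-build IL for the return:
        //   ldloc.0                     // push 'result'
        //   stloc.1                     // pop it into the return variable
        //   br.s (next instruction)     // debug-only jump to the return sequence
        //   ldloc.1                     // push the return variable for the caller
        //   ret
        return result;
    }
}
```

The argument is touched only once, but the value still travels through two local variables before it is returned.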

To sum up, if we directly return the result of a function instead of assigning it to a variable, we avoid the double read/write. The third option, while it looks interesting, defines another local variable, which takes additional memory.