I always read things about how certain functions within the C programming language are optimized by being written in assembly. Let me apologize if that sentence sounds a little misguided.

So, I'll put it clearly: How is it that when you call some functions like strlen on UNIX/C systems, the actual function you're calling is written in assembly? Can you write assembly right into C programs somehow or is it an external call situation? Is it part of the C standard to be able to do this, or is it an operating system specific thing?

One of the most important pieces of information you need is a description of how your C compiler passes the arguments and return address to a subroutine. This is called the "calling convention" for that machine or processor. For example, it is typical on 32-bit x86 to pass arguments and the return address on the stack. This can only work for variable-argument functions if the arguments are pushed onto the stack in right-to-left order, and then the return address is pushed. If you write your assembly language function to expect this stack layout (the "activation record"), then the only important...
– Heath Hunnicutt, Jan 22 '13 at 18:24

...additional realization is that your asm function, once assembled and linked, is assigned an address in your program's code segment. Thus, your C code may transfer execution of the processor to the address of your asm function. At that point, as long as your function does the correct thing with registers (some must be preserved for the caller, such as EBP), knows how to find the arguments and return address on the stack, and returns its result in the correct place (a 32-bit return value goes in EAX on x86), nothing disqualifies it from being called like any C function.
– Heath Hunnicutt, Jan 22 '13 at 18:26

6 Answers

The C standard dictates what each library function must do rather than how it is implemented.

Almost all known implementations of C are compiled into machine language. It is up to the implementers of the C compiler/library how they choose to implement functions like strlen. They could choose to implement it in C and compile it to an object, or they could choose to write it in assembly and assemble it to an object. Or they could implement it some other way. It doesn't matter so long as you get the right effect and result when you call strlen.
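To illustrate the "implement it in C" option, here is a minimal sketch of a portable version. The names `my_strlen` and `check_my_strlen` are made up for this example; real libraries usually ship something far more tuned, often in assembly, but the observable behaviour has to be the same.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical plain-C take on what the standard requires of strlen:
 * count the characters that come before the terminating '\0'. */
size_t my_strlen(const char *s)
{
    const char *p = s;
    while (*p != '\0')
        p++;
    return (size_t)(p - s);
}

/* Hypothetical self-check: the sketch and the library version must agree,
 * whatever language the library version was written in. */
void check_my_strlen(void)
{
    assert(my_strlen("") == strlen(""));
    assert(my_strlen("assembly") == strlen("assembly"));
}
```

A caller cannot tell from the result whether it got the C sketch or a hand-written assembly routine, which is exactly the freedom the standard leaves to implementers.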

Now, as it happens, many C toolsets do allow you to write inline assembly, but that is absolutely not part of the standard. Any such facilities have to be provided as extensions to the C standard.

In the end, compiled programs and programs written in assembly are all machine language, so they can call each other. This works because the assembly code follows the same calling convention (how to prepare for a call, where to put the parameters, and so on) as the program written in C. An overview of popular calling conventions for x86 processors can be found here.

Many (most?) C compilers do happen to support inline assembly, though it's not part of the standard. That said, there's no strict need for a compiler to support any such thing.

First, recognize that assembly is mostly just human (semi-)readable machine code, and that C ends up as machine code anyway.

"Calling" a C function just generates a set of instructions that prepare registers, the stack, and/or some other machine-dependent mechanism according to some established calling convention, and then jumps to the start of the called function.
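A small sketch of the "jump to the start of the called function" idea, using a function pointer. The names `length_fn`, `call_through_pointer` and `length_via_library` are invented for this example. The caller only holds an address and a signature; it has no way of knowing whether the code at that address came from C or from assembly.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical type: the address of any function that takes a string
 * and returns a length. */
typedef size_t (*length_fn)(const char *);

/* The compiler sets up the argument according to the platform's calling
 * convention and transfers control to whatever address fn holds. */
size_t call_through_pointer(length_fn fn, const char *s)
{
    return fn(s);
}

/* Passes the library's strlen, however the library implemented it. */
size_t length_via_library(const char *s)
{
    length_fn fn = strlen;
    return call_through_pointer(fn, s);
}

/* Hypothetical self-check for the two helpers above. */
void check_call_through_pointer(void)
{
    assert(length_via_library("abc") == 3);
}
```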

A block of assembly code can conform to the appropriate calling convention, and thus generate a blob of machine code that another blob of machine code that was originally written in C is able to call. The reverse is, of course, also possible.
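Here is a rough sketch of such a blob, under some loud assumptions: GCC or Clang, x86-64 Linux, and the System V AMD64 calling convention (first two integer arguments in RDI and RSI, result in RAX). The function name `add_two` is made up. Normally the assembly would live in its own `.s` file and be linked in; a file-scope `__asm__` block (a compiler extension) is used here only to keep everything in one file. On any other platform the sketch falls back to plain C.

```c
#include <assert.h>

/* Hypothetical function: declared to C like any other function. */
long add_two(long a, long b);

#if defined(__GNUC__) && defined(__x86_64__) && defined(__linux__)
/* Defined in assembly. Assumes the System V AMD64 convention:
 * a arrives in RDI, b in RSI, and the result is returned in RAX. */
__asm__(
    ".pushsection .text\n"
    ".globl add_two\n"
    ".type add_two, @function\n"
    "add_two:\n"
    "    leaq (%rdi,%rsi), %rax\n"
    "    ret\n"
    ".size add_two, .-add_two\n"
    ".popsection\n"
);
#else
/* Fallback for other platforms: the same function written in C. */
long add_two(long a, long b)
{
    return a + b;
}
#endif

/* Hypothetical self-check: the C caller looks the same in both cases. */
void check_add_two(void)
{
    assert(add_two(2, 3) == 5);
}
```

The C side only sees the declaration, so the call site is identical whichever branch of the `#if` ends up providing the definition.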

The details of the calling convention, the assembly process, and the linking process (to link the assembly-generated object file with the C-generated object file) may all vary wildly between platforms, compilers, and linkers. A good assembly tutorial for your platform of choice will probably cover such details.

I happen to like the x86-centric PC Assembly Tutorial, which specifically addresses interfacing assembly and C code.

When C code is compiled by gcc, it's first compiled to assembler instructions, which are then assembled into object code and linked into a binary, machine-executable file. You can see the generated assembler instructions by specifying -S, as in gcc file.c -S, which writes them to file.s.

Hand-written assembler code simply skips the first stage, the C-to-assembler compilation, and from then on is indistinguishable from code compiled from C.

One way to do it is to use inline assembler. That means you can write assembler code directly into your C code. The specific syntax is compiler-specific. For example, see GCC syntax and MS Visual C++ syntax.
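As a minimal sketch of the GCC flavour, assuming GCC or Clang targeting x86 with the default AT&T assembler syntax (the function name `add_inline` is made up, and other compilers use a different syntax entirely):

```c
#include <assert.h>

/* Hypothetical helper: adds two ints with one x86 instruction written
 * as GCC-style extended inline assembly. */
int add_inline(int a, int b)
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    int result = a;
    __asm__("addl %1, %0"      /* AT&T order: result += b */
            : "+r"(result)     /* %0: read and written, kept in a register */
            : "r"(b)           /* %1: input, kept in a register */
            : "cc");           /* the add instruction changes the flags */
    return result;
#else
    /* Fallback where this inline assembly syntax does not apply. */
    return a + b;
#endif
}

/* Hypothetical self-check for the helper above. */
void check_add_inline(void)
{
    assert(add_inline(2, 3) == 5);
}
```

The constraint strings tell the compiler which C variables the instruction reads and writes, so it can pick registers and fit the snippet into the surrounding generated code.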