self-modifying code

Software which deliberately modifies the machine instructions that comprise it in memory. This technique obviously only works when writing in assembly language. It is of questionable value unless you are working in an environment where memory is so restricted that incredibly tight code is a necessity. It is a bitch to debug.

A simple example: say you have a program containing a variable foo which is only ever used by two instructions: one loads it into a register, the other increments it (we'll assume the architecture has an inc instruction which can be applied to a memory location):

start: load r1, foo
;
; other stuff...
;
inc foo
jmp start

Then instead of wasting memory by having foo stored somewhere outside the code, we can instead change things so that the load instruction contains a literal value (which we set to foo's initial value), and the inc instruction references the actual memory location of the load instruction's operand:

start: load r1, 42
;
; other stuff...
;
inc (start+1)
jmp start

Thus after execution of inc (start+1) the load instruction stored in memory will actually be load r1, 43--the code has self-modified.
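The effect can be simulated in a high-level language by treating the instruction stream as mutable data. Here is a toy sketch in Python; the instruction encoding is invented purely for illustration:

```python
# Toy interpreter: the "program" is a mutable list of instructions,
# so an inc aimed at another instruction's operand self-modifies the code.
program = [
    ["load", "r1", 42],   # start: load r1, 42  (literal operand)
    ["inc_op", 0],        # inc (start+1): bump the load's literal in place
    ["halt"],
]

def run(program, passes):
    regs = {"r1": 0}
    for _ in range(passes):
        pc = 0
        while True:
            insn = program[pc]
            if insn[0] == "load":
                regs[insn[1]] = insn[2]
            elif insn[0] == "inc_op":
                program[insn[1]][2] += 1   # modify the stored instruction
            elif insn[0] == "halt":
                break
            pc += 1
    return regs

regs = run(program, 3)
# After three passes the load instruction stored in "memory" now
# reads ["load", "r1", 45] -- the code has self-modified.
```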

This, of course, is a very weak example. Much more interesting things happen when you start modifying the actual opcodes, so that completely different instructions will be executed on each pass through the loop.

This technique is sometimes used in polymorphic viruses. It's fairly common to see one modify its own code to remain as small as possible, or change its signature. It could also be used to disguise the code as something harmless as a virus scanner passes by...

Self-modifying code is a programming technique where the program
modifies itself as it runs. This technique is generally frowned upon
except when used in extremely limited ways, and has been largely made
impossible, undesirable, or useless by modern computer architectures.
Self-modifying code was most useful on architectures with a
very limited number of registers and limited (less than 64k) ram.

Instructions that are modified in memory are not modified in the cpu's instruction cache, so the modification is ignored until the cache line is flushed or evicted. This could be exploited, of course, but then you have to totally understand how the instruction cache works.

read-only text segments

Executable code in memory may be marked as read only by the operating system so it can be shared...

shared text segments

Executable pages may be shared between separate processes, and thus modifying one page would affect other users' processes.
This is generally not allowed in multiuser operating systems.

The linker may patch unresolved jump statements in a jump table or in the code itself at or immediately before runtime;
an unresolved symbol may be expressed as a jump to a routine
that would backpatch the original jump to the correct address,
thus allowing demand linking.
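The backpatching trick can be sketched in Python with a patchable dispatch table standing in for the linker's jump table; an unresolved entry points at a stub that resolves the real routine, patches the table, and still answers the first call. The names here are invented for illustration:

```python
# Lazy binding via a patchable dispatch table, in the spirit of a
# linker's jump table: the first call goes through a resolver stub
# that backpatches the table entry with the real routine.
def real_sqrt(x):
    return x ** 0.5

def resolver_stub(x):
    table["sqrt"] = real_sqrt      # backpatch: later calls skip the stub
    return real_sqrt(x)            # ...but still answer this first call

table = {"sqrt": resolver_stub}

first = table["sqrt"](9.0)         # resolved on demand
second = table["sqrt"](16.0)       # now a direct call to real_sqrt
```

This is essentially how lazy binding through an ELF PLT works: the expensive resolution happens once, and every later call jumps straight to the resolved address.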

The Linux kernel does (or at one time did) include
emulation for cpu instructions and features, such as math instructions, that
were not available on all cpus. When such an instruction is
encountered the first time, a trap is generated and code
is called to patch the instruction into a more efficient subroutine call to
emulate the instruction next time instead of generating the trap.

On-the-fly generation of temporary code which may load
or switch banks to run another piece of code; this was
especially popular on bank-switched machines, where the
addressable memory was smaller than the available memory,
and in systems that used overlays.

Rather than modifying code in place, the call is made through a function pointer (an indirect jump in assembly) which is given a value at runtime. This has the advantage that type checking can still be done, but it may be less efficient on some architectures.
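In a high-level language the "function pointer" is just a variable holding a callable; a minimal Python sketch, with invented function names:

```python
# Indirect dispatch: instead of patching the call site, store the
# target in a variable and reassign it at runtime.
def greet_en(name):
    return "hello, " + name

def greet_fr(name):
    return "bonjour, " + name

greet = greet_en          # the "function pointer"
a = greet("ada")
greet = greet_fr          # retarget the call site without touching any code
b = greet("ada")
```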

Some operating systems have support for linking in additional code at runtime, either via the use of function pointers to activate the code once linked in, or via unresolved symbols that cause the additional code to be automatically linked.
(This uses the same mechanism as shared libraries.)
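Python's import machinery can serve as an analogue of this mechanism (dlopen on Unix-like systems): the module name need not be known until runtime, and symbols are bound from the newly loaded code by name. The choice of json here is arbitrary:

```python
# Runtime linking, sketched with Python's import machinery: the
# module to load is chosen at runtime, then a "symbol" is bound from it.
import importlib

module_name = "json"               # decided at runtime in a real program
mod = importlib.import_module(module_name)
encode = getattr(mod, "dumps")     # bind a symbol from the linked-in code
out = encode([1, 2, 3])
```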

Some object oriented languages allow functions to be overloaded (defined multiple times in different ways), and linking of overloaded functions may actually change at runtime depending on what modules are loaded or the current context.
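Python's functools.singledispatch gives a small taste of this: registering a new overload at runtime changes which definition a later call is linked to, even though the call site never changes. A sketch:

```python
# Dispatch that changes at runtime: registering a new overload
# alters which definition the same call site reaches.
from functools import singledispatch

@singledispatch
def describe(x):
    return "something"

@describe.register(int)
def _(x):
    return "an int"

before = describe(1.5)             # falls back to the generic version

@describe.register(float)          # akin to loading a module that adds
def _(x):                          # an overload for a new type
    return "a float"

after = describe(1.5)              # same call, different definition
```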

Some languages (java, lisp, perl, others) allow code to be stored
in or with a variable as a thunk or closure; the key is that the thunk may be created and passed to another piece of code (carrying along with it some of its execution environment) where it is later executed, similar to a trampoline.
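A closure in Python shows the idea: the thunk is created in one place, carries part of its creation environment with it, and is executed later by whoever receives it:

```python
# A thunk: code stored in a variable, carrying captured state,
# executed later by another piece of code.
def make_counter(start):
    count = [start]               # captured environment travels with the thunk
    def thunk():
        count[0] += 1
        return count[0]
    return thunk

tick = make_counter(10)           # create the thunk here...
values = [tick(), tick(), tick()] # ...execute it later, elsewhere
```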

This was brought to you by the Save Our Archaic
Technical Terms Society.

Oddly, no one has mentioned Lisp. Lisp is the original self-modifying programming language. Sometimes it's elegant, sometimes it's just dirty. Lisp code is just data, so you could store the text of a function in a variable, modify it as needed, and then redefine the function in the global context if you want; or you could keep the compiled version of the function in a variable and call that (as ever in Common Lisp, other options probably exist). Alternatively, to access the text of a function foo, you could do: