In common commodity microprocessors, the instruction set is either hardwired or implemented in a MicroCode ROM.

However, a few CPUs include a user-modifiable microcode RAM.

One such processor is the WISC Technologies' CPU/32 (ca 1987),
mentioned in the JFAR (http://www.jfar.org/volume5.html). This was built
from discrete TTL components, allowing considerable flexibility in system
configuration.

Another variation, the RTX 32P (see
http://www.ece.cmu.edu/~koopman/stack_computers/sec5_3.html), was optimized
for ForthLanguage and designed
to execute two microinstructions for each main memory access, including
instruction fetches.
This was a single-chip CMOS microprocessor, with
on-chip micro-program RAM (and 2 stacks).
It ran at 8 MHz, which was relatively fast for its time.

Another processor, the Cjip (Imsys Technologies), uses writable 72-bit-wide
microcode instructions optimized for any of the following:

There are others. The attractive thing about writable instruction sets
is that they allow profiling techniques to optimize not only the software
for a given task, but also the hardware -- if an application is more
heavily weighted toward math, you would tweak the CPU in that direction;
if comms or data collection were required, that would be a different tweak.
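
As a rough illustration of the profiling idea above, here is a minimal Python sketch that counts adjacent opcode pairs in an execution trace. The most frequent pairs would be candidates for a single fused microcoded instruction. The opcode names and the trace are made up for illustration; a real profiler would read them from a simulator or hardware trace.

```python
from collections import Counter

def hot_pairs(trace, top=3):
    """Count adjacent opcode pairs in a trace; return the most common ones."""
    pairs = Counter(zip(trace, trace[1:]))
    return pairs.most_common(top)

# Hypothetical trace: a math-heavy inner loop plus a little control flow.
trace = ["LOAD", "MUL", "ADD", "STORE"] * 50 + ["LOAD", "CMP", "BRANCH"] * 5

for pair, count in hot_pairs(trace):
    print(pair, count)
```

A pair such as MUL followed by ADD showing up at the top would suggest a multiply-accumulate microinstruction; a comms-heavy trace would point somewhere else.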

I've worked for some companies that could have benefitted hugely from this
kind of technology, but I could never sell the idea. AssemblyLanguage is
cool, but MicroCode is where the RealEngineers hang out.

How would that work in a multi-threading OS? I imagine swapping out MicroCode is something that one wouldn't do every context-switch...

One approach is to just change an internal switch to point to a different range of microcode.

(For the same reason, many processors change an internal switch to point to a different register bank during interrupts -- such as the PDP-11, the Z80, the ARM, ___, and ___. The Sun SPARC has "sliding register windows" that are ever so much more clever. "Interrupts on Pipelined Systems" http://www.cs.uiowa.edu/~jones/arch/notes/34int.html ).

However, since context-switch time is often terrible for other reasons,
the slight improvement in overall performance usually
isn't worth the cost of extra bank(s) of microstore
(or the extra banks of registers).
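
The bank-switch approach described above can be sketched as a toy model in Python. Everything here (the ToyCPU class, the opcode number 0x10, the two banks) is hypothetical and only meant to show that a context switch changes a selector rather than reloading the microstore.

```python
class ToyCPU:
    """Toy model: several microcode banks and one bank-select register."""

    def __init__(self, banks):
        self.banks = banks      # list of dicts: opcode -> micro-routine
        self.bank_select = 0    # the "internal switch"

    def context_switch(self, bank):
        # No microstore reload: only the selector changes.
        self.bank_select = bank

    def execute(self, opcode, a, b):
        # Look up the opcode in whichever bank is currently selected.
        return self.banks[self.bank_select][opcode](a, b)

# Two hypothetical tasks give the same opcode number different meanings.
math_bank = {0x10: lambda a, b: a * b}    # task 0: opcode 0x10 multiplies
comms_bank = {0x10: lambda a, b: a ^ b}   # task 1: opcode 0x10 is XOR

cpu = ToyCPU([math_bank, comms_bank])
cpu.context_switch(0)
r0 = cpu.execute(0x10, 6, 7)
cpu.context_switch(1)
r1 = cpu.execute(0x10, 6, 7)
print(r0, r1)
```

The cost mentioned above shows up here as the length of the banks list: every task that wants its own instruction set needs its own bank of microstore.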

"MVP Microcoded CPU/16 Architecture"
http://www.ece.cmu.edu/~koopman/forth/rochester_86b.pdf
describes a machine that was built (apparently in 1986, perhaps earlier) out of
fewer than 100 chips:
4 static RAMs (data stack, return stack, program and data RAM, and micro-program memory),
and the rest 74LS00-series MSI components.

DavidCary thinks it looks very similar to the
TTL design in ComputationStructures (ISBN 0262231395)
(identical ALU design, similar microcode),
with this major added feature:
the CPU/16 could be halted by the host,
and all the RAMs
(including the micro-program memory!)
and most of the registers
could then be examined and modified by the host.
The host could also single-step the CPU or let it run freely (~ 3 MHz, which is pretty fast for 74LS00 discrete-logic CPUs).

I used to think writable instruction sets were really, really cool.
However, now I see other solutions to the problems they solve:

I can't do exactly what I want to do in the time of a single machine-language instruction -->

... and I process lots of data. Solutions: WISC or instruction cache or ???. With an instruction cache, the bottleneck typically becomes getting data in and out of the CPU. Even if you could add a specialized instruction to do something in a single CPU cycle (rather than a few cycles of subroutine), it wouldn't run any faster than an in-cache subroutine -- both are still waiting on RAM for the data. Both WISC and instruction caches slow down context-switches.
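
A back-of-the-envelope version of that argument, as a Python sketch. It assumes compute and memory traffic overlap perfectly, so the slower of the two sets the pace per data item; the cycle counts are invented for illustration, not measurements of any real CPU.

```python
def time_per_item(compute_cycles, memory_cycles):
    """Cycles per data item, assuming compute overlaps with memory traffic:
    the slower of the two sets the pace (a simplifying assumption)."""
    return max(compute_cycles, memory_cycles)

MEMORY_CYCLES = 10   # assumed cycles to move one item to or from RAM

# A 4-cycle in-cache subroutine versus a 1-cycle custom microcoded op.
subroutine = time_per_item(compute_cycles=4, memory_cycles=MEMORY_CYCLES)
custom_op = time_per_item(compute_cycles=1, memory_cycles=MEMORY_CYCLES)
print(subroutine, custom_op)
```

Under these assumptions both variants cost the same per item, because memory is the limit in both cases. The custom instruction only wins when the compute cycles exceed the memory cycles.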

I can't do exactly what I want to do in the space of a single machine-language instruction -->

Solutions: WISC or "call subroutine" instruction or ???. Once the subroutine has been written, calling it from as many places as you want costs only a single "call subroutine" instruction per call site.
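
The space trade-off can be put in numbers with a small Python sketch. The word counts (a 1-word call, a 1-word return, a 5-word body, 20 call sites) are assumptions chosen for illustration.

```python
def inline_size(body_words, call_sites):
    """Total size if the instruction sequence is repeated at every use."""
    return body_words * call_sites

def subroutine_size(body_words, call_sites, call_words=1, return_words=1):
    """Total size with one shared body plus one call per use."""
    return body_words + return_words + call_words * call_sites

inline_total = inline_size(body_words=5, call_sites=20)
shared_total = subroutine_size(body_words=5, call_sites=20)
print(inline_total, shared_total)
```

With a single-word call, each use costs about the same space as a custom instruction would, and the body is paid for once.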

The instruction set is ugly -- I hate writing so many instructions to do a simple "call" -->

Solutions: WISC or an interpreted language like ForthLanguage (that interprets a short "call" instruction into the appropriate sequence at run time) or higher-level languages (that expand a short "call" into the appropriate sequence at compile time) or ???.