To really get down to basics:
And
Or
Left shift by 1
Right Shift by 1
Invert/Complement
Load
Store
Jump on true/false
Call
Return
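Then you can add the hard way:
And the two operands (this gives the carries)
Or the two operands
And the Or with the inverted And (Or with ~And) for Xor (this gives the partial sum)
Left Shift the And to move each carry up one bit
Repeat the above with the Xor and the shifted And until no Carries remain, to propagate the Carries for a Ripple Carry Adder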
Then when you get tired of hand coding the Least Instruction Set Computer (LISC), you will appreciate Subroutines and figure out a way to Call the Add subroutine with parameters.
The Jump needs some work, but it would use flags like the old mainframes. Do a compare to set Gtr and Less. A conditional Jump on False that tests both Gtr and Less would jump on Equal, and a Jump on True would jump if not equal.
Any logic function can be done with enough Ands, Ors and Inverts ... why, you can even build a computer.
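
The "add the hard way" steps above can be sketched in Python as a rough illustration. The 4-bit word width (MASK) and the function names are assumptions for the sketch, not part of anyone's actual design; only And, Or, Invert and Left Shift by 1 are used.

```python
MASK = 0xF  # assumption: 4-bit words


def invert(a):
    """Invert/Complement, kept to 4 bits."""
    return ~a & MASK


def xor(a, b):
    """Xor built as (a Or b) And Not (a And b)."""
    return (a | b) & invert(a & b)


def add(a, b):
    """Add two 4-bit values by repeating And / Xor / Left Shift until no carries remain."""
    while b != 0:
        carry = ((a & b) << 1) & MASK  # And, then Left Shift by 1
        a = xor(a, b)                  # partial sum without carries
        b = carry                      # carries still to be propagated
    return a
```

For example, add(5, 3) takes four passes through the loop before the carries die out. Any carry out of the top bit is simply dropped, so the result wraps modulo 16.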

When your 4-bit processor is all done and debugged, why don't you send it over to Ken Chapman at Xilinx? He has already created a VHDL design of an 8-bit processor called the "Picoblaze". If you haven't seen/used the Picoblaze you should check it out - it's very cool. Perhaps your 4-bit design could become an even simpler "Femtoblaze".

This is another thing Joe and I are bouncing around. On the one hand we could make the CPU very minimalist and then use more instructions ... but this might not be so much fun (and not so intuitive) for beginners.
Or we can have a more sophisticated CPU that supports more interesting instructions.
Watch this space...

I started on the basis of just having an accumulator -- also not having an interrupt structure or a stack -- but Joe says that if this is supposed to be educational then we should have a primitive interrupt structure and a stack. I think you'll be surprised with what we've come up with ... more soon -- Max

I agree -- XOR is just too useful. One criterion for me is how many times you use an instruction -- I use XOR a lot. Joe says we don't need a CMP (compare) instruction, but I'm fighting for that one also :-)

You can replace AND, OR and NOT with NAND (AND followed by NOT) or NOR (OR followed by NOT).
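For instance, using M2 as scratch storage,
M0 AND M1 is
LOAD A, M0
NAND A, M1 ; A = NOT (M0 AND M1)
LOAD M2, A ; store A in M2
NAND A, M2 ; NAND of a value with itself inverts it
; A has M0 AND M1
M0 XOR M1 is
LOAD A, M0
NAND A, M0 ; A = NOT M0
LOAD M2, A ; M2 = NOT M0
LOAD A, M1
NAND A, M1 ; A = NOT M1
NAND A, M2 ; A = NOT (NOT M1 AND NOT M0)
;A now has M0 OR M1
LOAD M2, A ; M2 = M0 OR M1
LOAD A, M0
NAND A, M1 ; A = M0 NAND M1
NAND A, M2 ; A = NOT ((M0 NAND M1) AND (M0 OR M1)) = M0 XNOR M1
LOAD M2, A
NAND A, M2 ; invert the XNOR
; A has M0 XOR M1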
Basically you can reduce the number of opcodes needed if you are willing to perform more operations. If you want to really have a minimal set of arithmetic/logic operations, you can build ADD up out of XOR (and AND) and so do all the arithmetic and logic ops with just NAND.
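
As a quick sanity check of the NAND-only sequences above, here is a small Python sketch that mirrors them step by step. The 4-bit width and the function names (nand, and_, or_, xor_) are assumptions made for the sketch.

```python
MASK = 0xF  # assumption: 4-bit words


def nand(a, b):
    """The single primitive: AND followed by NOT, kept to 4 bits."""
    return ~(a & b) & MASK


def and_(a, b):
    t = nand(a, b)
    return nand(t, t)  # NAND of a value with itself inverts it


def or_(a, b):
    # NOT a NAND NOT b = a OR b
    return nand(nand(a, a), nand(b, b))


def xor_(a, b):
    t = nand(nand(a, b), or_(a, b))  # this is a XNOR b
    return nand(t, t)                # invert to get a XOR b
```

Counting calls to nand shows the trade-off: AND costs two, OR three and XOR six, so fewer opcodes means more operations per result.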

Good mental exercise. I've never thought about 16 instructions--I would assume only 1 register (the accumulator) and two flags, carry and zero.
For load/store, I think you would need:
LOAD A,(addr) (load from memory),
LOAD (addr),A (store to memory), and
LOAD A, immediate (load constant).
For math, I would suppose:
CC (clear carry),
ADDC A,(addr), (add w/carry)
SUBC A,(addr), (subtract w/borrow)
COM A (1's complement),
OR A, (addr),
AND A, (addr)
Shifts/rotates could be done with
ROTL A and
ROTR A;
(Executing CC first would turn these rotates through carry into plain shifts.)
Jumps need
JUMP NC, (addr) and
JUMP NZ, (addr);
...an unconditional jump can be handled by executing CC and then JUMP NC. I would also lobby for:
CALL (addr) and
RET.
That's 15 instructions. INC and DEC can be handled by storing 1 in memory and performing ADDC or SUBC.
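
To make the carry tricks above concrete, here is a rough Python model of a few of these instructions. It assumes a 4-bit accumulator and that ROTL/ROTR rotate through the carry flag (which is what lets CC turn them into shifts); the function names and the (value, carry) return convention are invented for the sketch.

```python
MASK = 0xF  # assumption: 4-bit accumulator


def rotl(a, carry):
    """ROTL A through carry: returns (new A, new carry)."""
    new_carry = (a >> 3) & 1
    return ((a << 1) | carry) & MASK, new_carry


def rotr(a, carry):
    """ROTR A through carry: returns (new A, new carry)."""
    new_carry = a & 1
    return (a >> 1) | (carry << 3), new_carry


def addc(a, m, carry):
    """ADDC A,(addr): add memory value m plus carry, returns (new A, new carry)."""
    total = a + m + carry
    return total & MASK, total >> 4


# CC followed by ROTL behaves as a plain left shift: carry in is 0.
shifted, carry_out = rotl(0b1001, 0)

# INC as CC followed by ADDC with a memory cell holding 1.
incremented, carry_out = addc(0b0111, 1, 0)
```

With the carry cleared, rotl(0b1001, 0) gives 0b0010 with the old top bit landing in carry, and addc(0b0111, 1, 0) gives 0b1000, which is the INC case described above.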