I wrote an analysis of the architecture and a modified proposal that (coincidentally) addresses almost all of these points and a few others :) (except for the first one, but I changed it to a load-store architecture so that one becomes irrelevant).

I notice this still says "Special opcodes always have their lower four bits unset", but don't we actually need five bits unset, so that the lower 5 bits are 0 and coincide with the 5-bit "special instruction" opcode?

Then the next question is: is the special opcode 6 bits with a 5-bit value, or 5 bits with a 6-bit value? I think I'd go for the 6-bit value so that the literals fit in this single value, which means a 5-bit opcode here.
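
To make the bit counting concrete, here is a rough sketch of the decode I have in mind. The layout is my assumption, not something the spec text quoted above confirms: a 16-bit word with the 6-bit value in the top bits, then the 5-bit special opcode, then five zero bits. The name `decode_special` is made up for illustration.

```python
# Assumed layout (hypothetical, for illustration only): aaaaaa ooooo 00000
#   bits 15..10: 6-bit value
#   bits  9..5 : 5-bit special opcode
#   bits  4..0 : all zero, marking the word as a special instruction

def decode_special(word):
    """Return (opcode, value) for a special instruction, else None."""
    if word & 0x1F:
        # At least one of the lower five bits is set: not a special instruction.
        return None
    opcode = (word >> 5) & 0x1F   # 5-bit special opcode
    value = (word >> 10) & 0x3F   # 6-bit value field
    return opcode, value
```

Note that a word like `0x0010` has its lower four bits unset but its fifth bit set, so a four-bit check would wrongly treat it as special; the five-bit mask `0x1F` rejects it.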

Minor comment: update the version number.
Interesting changes, with lots of room for optimization in generated code. Literal -1 will be very useful. 1970s coding styles often used -1 as an "undefined" flag, in addition to the other examples cited.