dcupthings

This is the dcputhings, assorted tools for DCPU-16 development. This
repository is maintained by Kang Seonghoon and contains
the following softwares:

dcpu.c and dcpuopt.c, my early attempts to build a DCPU-16 emulator.

DcpuAsm, an Ocaml DSL for DCPU-16 assembly.

DcpuAsm

DcpuAsm is a DCPU-16 assembler embedded in the Ocaml syntax. It allows
easier DCPU-16 code generation, but it can also be used as an ordinary macro
assembler if you understand a bit of Ocaml.

Basically, the assembly is represented as an Ocaml list:

[SETA,4;SETB,5;%loop:SETPC,%loop;]

More specifically, SET A, 4, SET B, 5 and %loop: SET PC, %loop evaluates
to the internal representation via Camlp4. Note that Ocaml allows the trailing
semicolon in the brackets, so the last ; is just fine.

This will resolve all the labels to the fixed offset. to_binary_le converts
the static code into the little-endian byte string. The big-endian counterpart
is to_binary_be, and you can get an array of words using to_words.

By default ASM assumes the origin at 0x0000. This can be changed using ASM
~origin:0x1000 [...] syntax; in fact, ASM is just a shorter alias to
DcpuAsm.asm function.

Basic opcodes has two arguments, and special opcodes has one of them.
Multiple arguments are separated with , as much like other assemblers.

DAT has one or more arguments. The argument can be a typical immediate (see
below for the syntax) which occupies exactly one word, or a string which
occupies the same number of words (so that "ok" equals to 'o', 'k').
One can also use _ for placeholders which value can be ignored; it is mostly
equivalent to 0 but _s at the end of the binary will be ignored.
Arguments can have a TIMES prefix as like DAT 3 TIMES 0x1234, where the
repeat count can be any expression including labels. Repeating string is also
allowed (e.g. 3 TIMES "hello?").

ORG x is equivalent to DAT (x-%_) TIMES _, and will set the current
assembly position to x if possible. It will raise an error if it is
impossible. x should be a positive integer.

ALIGN x is equivalent to DAT ((x-(%_ MOD x)) MOD x) TIMES _, and will set
the current assembly position to the next multiple of x. (Therefore it will
add at most x-1 zeroes.)

NOP is equivalent to SET A, A. It does nothing but take one cycle. While
DCPU-16 has lots of nops, this encoding is chosen because of the simplicity of
its binary encoding (0x0001). It may change if 0x0000 also turns out to be a
nop.

JMP a sets the PC to a in the fastest or at least shortest way. There are
4 possible encodings for JMP: SET PC, ..., XOR PC, ..., AND PC, ...
and SUB PC, .... (Among them XOR is fastest but not applicable for all
cases.) Note that the plain SET PC, a won't be optimized; you must
explicitly use JMP a instead.

PUSH a is equivalent to SET PUSH, a. POP a is equivalent to SET a,
POP. You can also use [SP] instead of PEEK, and [SP+...] instead of PICK
....

RET is equivalent to SET PC, POP, and used for returning from the
subroutine initiated by JSR instruction.

BRK and HLT are equivalent to SUB PC, 1. This forms a simple infinite
loop, and used as a de facto instruction to terminate the emulator.

PASS does not emit the binary at all; it can be used as a placeholder.

DcpuAsm does not support EQU pseudo-instruction or similar; you can use an
ordinary let x = ... in ... construct to define constants, however.

Labels

DcpuAsm supports labels. Labels are a valid Ocaml name (always starts with a
lowercase letter or _) prepended by %; %asdf, %_foo_bar, %loop42
are valid labels, for example.

There are two ways to use labels:

It can occur in the expression, and evaluates to the location pointed by
the label. The instruction may contain labels defined after it.

It can also occur at the front of the instruction (e.g. %foo: SET A, 3)
to declare the label. The colon (:) is optional for predefined
instructions, but you are recommended to keep the colon as it allows
multiple label definitions. Skipping a colon may be natural for DAT
instructions however.

You can define labels at the end of list; the PASS statement will be
implicitly added:

[JMP%garbage;DAT1,2,3,4;%garbage:]

DcpuAsm will automatically resolve labels to the appropriate position. Since
the length of instructions may vary depending on the position of labels,
DcpuAsm runs multiple passes to settle them down. If it is not stabilized
after given number of passes DcpuAsm gives up. The default limit is 50, but
can be configured like ASM ~maxpass:10 [...].

It is possible to have free (undefined) labels in the assembly. DcpuAsm will
make sure that these labels, while unresolved, will not affect the other parts
of generated code. This is done by forcing all remaining immediates to always
use a longer form.

It is advised to prepend _ to local labels. DcpuAsm has a special support
for these local labels (see below).

The special label %_, when used in the expression, resolves to the position
of the current instruction. For example SET PC, %_ will be same as %_temp:
SET PC, %_temp. You cannot define a label named %_.

Expressions

Expression can occur as an instruction's argument. It may contain registers,
numbers, labels, memory references (enclosed in []) and expressions.

DcpuAsm supports all general and special registers: A, B, C, X, Y,
Z, I, J, SP, PC, EX, IA. (IA cannot really be used, but it is
there for the better error handling.) It also supports PUSH, PEEK, POP and
PICK ...; they cannot be used in the expression.

DcpuAsm supports numbers in base 2 (0b101), base 8 (0o337), base 10 and
base 16 (0x1337) just like Ocaml. Additionally a character literal ('A')
will be equal to its numerical code (i.e. int_of_char 'A'). All numbers are
treated as built-in Ocaml numbers (31 or 63 bits long depending on the
platform) so you should be aware of it.

DcpuAsm supports all ordinary arithmetic and bitwise operations: +, -,
*, DIV, MOD, NOT, AND, OR, XOR, SHL, SHR. While arithmetic
operations are permitted for registers, the resulting expression has to be in
the form register + other expression or [register + other expression] due
to the constraint of DCPU-16. (The intermediate expression does not have to
however: [3*A+2*(2*B-A)-(8 DIV 2)*B] will be resolved to [A], which is
perfectly valid in DCPU-16.)

As mentioned before, labels in the expression evaluate to their positions. You
can do something like this:

IMM (e) (note the parentheses) and PTR [e] is a longer form of e and
[e], respectively, and you should use them outside the assembly instruction.

Normal Ocaml expression can also be used; VAL e will evaluate e as an
Ocaml expression and use its value as an immediate. Similarly, STR e uses
its value as a string (only useful in DAT arguments). If the Ocaml
expression is simple enough (e.g. a single identifier) then you can omit
VAL entirely. This is very useful for compile-time constants:

DcpuAsm tries to generate the shortest code for given assembly, but you can
override this behavior by SHORT and LONG prefixes. SHORT e will cause an
error when e does not fit in the range of -1--30 or it is used as a first
operand (cannot use a short literal there), and LONG e will generate a longer
form of given immediate (not the instruction, so should use it twice for basic
opcodes). This only applies to a literal value; it is silently ignored in other
kind of values.

A special value NEXT can also be used as a part of an expression. It won't
generate the "next words" required for long literals and register-relative
addressing, so whatever the next instruction is it (or its first word) will be
the next word. It can be used for simple self-modifying programs in combination
with SHORT (to ensure that the next instruction is always one word long), for
example. Note that NEXT is not canonicalized, so [A+NEXT*2-NEXT] (for
example) is invalid. Only a form of NEXT, [reg+NEXT], [NEXT+reg] is valid.

Blocks

DcpuAsm supports a block as a unit of assembly instructions. They can be used
as a building block:

BLOCK e evaluates e as an Ocaml expression (which includes,
incidentally, a list containing assembly instructions) and makes an
instruction block out of it. You don't have use blocks if you have exactly one
instruction, as the case n = 1 of the above code suggests.

BLOCK LOCAL will make all defined labels starting with _ local. It is not
possible to access these local labels outside of the block (unless you use a
nasty hack). Non-local blocks (in this case, case 1 and case 2) will not
affect this procedure. This is very useful for generated codes.

You can manually give a list of local labels using BLOCK LOCAL %a, %b, %c
syntax; or by an Ocaml list using BLOCK LOCAL *["a"; "b"; "c"].
DcpuAsmExample.ml contains some extreme example of local blocks.

Macros

There are no separate macro feature in DcpuAsm, but you can trivially make a
simple macro with local blocks and Ocaml let construct.

DcpuAsm does support some additional features for macros. While VAL and
STR allows the insertion of arbitrary immediate value or string, you cannot
insert registers or other expression in this way. Therefore DcpuAsm supports a
quoted form #e of an Ocaml expression, which can appear in:

Expressions (e.g. 3 + #reg). The expression should evaluate to the
internal representation of expression; an immediate should be quoted using
IMM prefix.

Labels (e.g. %#labelname). The expression should evaluate to a string.
While you can use any character in the label name (even an empty string is
permitted), you should restrict yourself to the normal identifier. Local
labels, for example, start with . character internally. DcpuAsm.gensym
function can be used to generate unique symbols.

Caveats / To-do List

As always you should expect the following caveats:

You cannot use codes like [(%label1, ...); (%label2, ...); ...] in the
normal Ocaml code because (% will be treated as one token. (( %label1
etc. will work.) The assembly syntax is specially crafted to separate those
two, however. Any suggestions about this problem are welcomed.