Maybe a better way would be to set a flag in the VBI and poll that flag in the mainloop?
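A rough sketch of that idea in C, to make it concrete. All names here (vblank_flag, vblank_isr, consume_vblank, wait_vblank) are made up for illustration, and installing the handler on the real interrupt vector is left out:

```c
#include <stdint.h>

/* Hypothetical flag shared between the VBI handler and the main loop. */
static volatile uint8_t vblank_flag = 0;

/* Body of the vertical-blank interrupt handler: only signal the new frame. */
void vblank_isr(void)
{
    vblank_flag = 1;
}

/* Non-blocking poll: returns 1 once per vblank and clears the flag. */
int consume_vblank(void)
{
    if (vblank_flag) {
        vblank_flag = 0;
        return 1;
    }
    return 0;
}

/* Main-loop side: spin until the handler has run at least once. */
void wait_vblank(void)
{
    while (!consume_vblank()) {
        /* busy-wait; the interrupt handler sets the flag */
    }
}
```

In C the test and the clear are two separate steps, so this is only a model; on the 68000 a single instruction can do both at once.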

As far as I remember, some games use this technique (for example the unfinished Polish game Robbo, http://korin.ppa.pl/). A rare example of using VERTB as the main loop is GordianTomb (everything is done in VERTB and the main loop only draws the screen, as I remember).

I execute "stuff to happen at Vblank" with a Vblank interrupt, or use a lib function. If you have a copper list running, you can have it write to INTREQ and poll a bit. If you have random interrupts running, you can use a bhs-wait for a rasterline and a bhi-wait to make sure you leave it in case you do nothing that frame.

The first waits for line 0, the second waits for line 256. (It can also wait for line 512, 1024 and so on, but I think this was about standard video modes.)
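For reference, a small C sketch of how the beam line can be pulled out of the position registers. It assumes a 32-bit read of VPOSR and VHPOSR together (VPOSR in the high word, at $DFF004 on real hardware) and the usual OCS/ECS layout where the line number sits in bits 16..8. The function names are hypothetical, and the register pointer is passed in so the code can also be tried on a fake value:

```c
#include <stdint.h>

/* Extract the vertical beam position from a combined VPOSR:VHPOSR read.
   Assumption: bit 16 is V8 and bits 15..8 are V7..V0 (standard modes). */
uint32_t beam_line(uint32_t vposr_vhposr)
{
    return (vposr_vhposr & 0x0001FF00u) >> 8;
}

/* Returns 1 when the beam is at line 256 or later, i.e. V8 is set. */
int beam_past_255(uint32_t vposr_vhposr)
{
    return (int)((vposr_vhposr >> 16) & 1u);
}

/* Spin until the beam has reached at least `line`. Comparing with
   "at least" rather than "equal" means a line missed because of an
   interrupt does not stall the loop for a whole frame. */
void wait_line(volatile const uint32_t *pos_reg, uint32_t line)
{
    while (beam_line(*pos_reg) < line) {
        /* busy-wait on the beam position */
    }
}
```

Testing only V8 is what separates "line 0 to 255" from "line 256 and up", which is why a single-bit test is enough for those two waits.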

Technically, a byte read from a custom register is "illegal", but it has always worked perfectly fine. Everyone did that.

Byte writes to custom registers may not work properly with all accelerator models (Blizzard 1260 for example, only to odd or even bytes, if I remember correctly)

I don't remember any official docs saying that byte reads/writes are illegal. Are you sure?

I do remember that the HRM says it's legal to read/write a pair of registers with a long access, and that since many (or all?) registers are either read-only or write-only, it is illegal to access a register with a CPU instruction that performs both a read and a write on its operand, like CLR on the 68000, or BSET/BCLR. But BTST should only read its operand, I think.

I said "technically" illegal, as "illegal" as word or long CIA accesses and because byte writes can have different results on different Amigas:

A500: a 0x12 byte write is written as 0x1212, but on some accelerators it can be 0x0012 or even 0x<garbage byte>12 or something similar. (AFAIK the WHDLoad AUDxVOL byte-write fixes on 68060 are needed because of the above "feature".)
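To illustrate the difference, here is a small C model. The first function only mimics the A500 behaviour described above (the byte showing up in both halves of the word). The second shows the safe alternative of always writing the full 16-bit register. Both names and the register pointer are hypothetical:

```c
#include <stdint.h>

/* Model of the A500 case: a byte write ends up in both halves
   of the 16-bit word, so 0x12 arrives as 0x1212. */
uint16_t byte_write_on_68000(uint8_t value)
{
    return (uint16_t)(((uint16_t)value << 8) | value);
}

/* Safer alternative: always write the whole word, so the result
   does not depend on how the CPU or accelerator handles byte writes.
   `reg` stands in for a word-wide custom register such as AUDxVOL. */
void write_reg_word(volatile uint16_t *reg, uint16_t value)
{
    *reg = value;
}
```

With a full word write the register receives 0x0012 on every machine.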

A bit off-topic, but I'd give yourself some more processing time by waiting for the last displayable scanline instead of VB. The best place to begin running code for the next frame if single-buffering is often just after the last displayable line, especially if some memory needs to be cleared. You get almost double the normal memory bandwidth when there's no other DMA going on. Pairing MOVEM with a blitter memory clear is especially fast. A blit using channels BCD (a rectangular bitmap move, for example) runs nicely in parallel with the CPU during the overscan areas, even with blitter-nasty turned on.

If you're double buffering, beginning your processing at the first line is often best, especially if you're calling anything in ROM.

It's nice because I know that the main loop will not continue until after the vblank ISR has finished (maybe that's guaranteed anyway, but I'm not sure about that), and the wait code is guaranteed to complete only once per vblank because it clears the flag in conjunction with the test.

Your approach is recommendable, although the PC-relative addressing mode is not allowed here.

In Sqrxz I am incrementing a frame counter during a copper interrupt, caused at the bottom of the display. The rendering routine can then compare its frame counter with the hardware frame counter and decide to either drop the frame, when the hw-counter is higher, or wait until both are equal again.
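The comparison could look roughly like this in C. This is only a sketch of the idea as described, not the actual Sqrxz code. The names and the exact rule for the "equal" case are my assumptions, and counter wrap-around is ignored by using 32-bit counters:

```c
#include <stdint.h>

/* Hypothetical counter incremented once per frame by the interrupt. */
static volatile uint32_t hw_frames = 0;

typedef enum {
    FRAME_DRAW_THEN_WAIT,  /* on time: draw, then wait for hw counter */
    FRAME_DROP             /* late: skip drawing to catch up */
} frame_action;

/* Interrupt side: count displayed frames. */
void frame_isr(void)
{
    hw_frames++;
}

/* Pure decision: `hw` is the hardware frame counter, `mine` is the
   frame number the renderer is about to draw. */
frame_action decide_frame(uint32_t hw, uint32_t mine)
{
    if (hw > mine) {
        return FRAME_DROP;        /* display is already past this frame */
    }
    return FRAME_DRAW_THEN_WAIT;  /* renderer is level or ahead */
}

/* After drawing frame `mine`, spin until the hardware counter reaches it. */
void wait_for_frame(uint32_t mine)
{
    while (hw_frames < mine) {
        /* busy-wait; frame_isr() advances hw_frames */
    }
}
```

Keeping the decision in a function that takes both counters as plain arguments makes it easy to check without any interrupt running.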

Autovectored interrupts on the 68000 have the annoying property that the 68000 must sync with an internally generated E-clock running at CLK/10 during interrupt handling. This can add up to 18 cycles on top of the time needed to push state onto the stack -- up to 44 cycles total.

And then you have to worry about the time it takes for the current instruction to complete before interrupt handling can even begin. A DIVU can delay interrupt handling by 100+ cycles. This can be helped by executing a STOP and waiting for an interrupt, but the E-clock still complicates things. There are even situations where an interrupt handler must be padded with a variable number of instructions based on the predicted phase of the E-clock so as to perfectly time 68000 writes to chip registers. This might happen when reprogramming the sprite registers with a MOVEM in the middle of a scanline.

Well, you need 2x movem (if you do more than just testing the VBL flag in the interrupt, at least) plus the rte, which take up a fair amount of processor time. It can make quite a difference not to use interrupts in time-critical situations.